Every great data analyst speaks Python. Not because it is fashionable, but because it lets you do in three lines what takes thirty clicks in a spreadsheet — and then repeat it forever, error-free. This module builds the exact Python foundation an analyst needs: nothing wasted, everything practical. You will write real code from the very first topic.
1. Set up your toolkit & write your first script
An analyst's job is a pipeline: get data in, shape it, understand it, and tell the story. Before we write code, picture the whole journey — this course follows it stage by stage.
Install your toolkit (pick one path)
- Zero-install (recommended to start): Google Colab — Python runs in your browser, nothing to install. Just visit colab.research.google.com.
- On your computer: install the Anaconda distribution — it bundles Python 3, Jupyter Notebook, and the data libraries we will use later (Pandas, NumPy, Matplotlib).
- Editor (optional): VS Code for writing longer scripts.
We will use notebooks to explore, and .py scripts when we want to automate something to run on its own.
Your first analytics script
Let us compute a simple sales summary — the kind of thing you will automate by the end of this module.
# A tiny sales summary — your first analytics script
sales = [1200, 980, 1450, 760, 2100] # daily sales for one week
total = sum(sales)
average = total / len(sales)
best_day = max(sales)
print('Total sales :', total)
print('Average/day :', round(average, 2))
print('Best day :', best_day)
Output:
Total sales : 6490
Average/day : 1298.0
Best day : 2100
Four lines of logic plus three print() calls, and you have a total, an average and a peak. print() shows a value; sum(), len(), max() and round() are built-in functions you will use constantly.
- The analytics workflow: Collect → Clean → Explore → Analyse → Visualise → Communicate.
- Use notebooks (Jupyter / Colab) to explore; use .py scripts to automate.
- Built-ins like sum(), len(), max(), round() already do a lot of an analyst's arithmetic.
2. Variables, data types & operators
A variable is a labelled box that holds a value. Python figures out the type automatically — you never declare it. Knowing the core types is essential, because each one behaves differently when you analyse it.
| Type | Example | Used in analytics for… |
|---|---|---|
| int | units = 42 | counts, IDs, quantities |
| float | price = 19.99 | money, rates, measurements |
| str | city = 'Mumbai' | names, categories, labels |
| bool | is_repeat = True | flags, filters (yes/no) |
| list | [10, 20, 30] | an ordered column of values |
| dict | {'name': 'Asha', 'age': 30} | one labelled record (a row) |
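To make the last two rows of the table concrete, here is a minimal sketch of a list used as a column and a dict used as a row. The variable names and the second customer name are made up for illustration.

```python
# A list holds one column of values; a dict holds one labelled record.
amounts = [1200, 980, 1450]                     # a "column" of order amounts
order = {'customer': 'Asha', 'amount': 1200}    # one "row" with named fields

first_amount = amounts[0]       # lists are indexed by position, starting at 0
customer = order['customer']    # dicts are indexed by key

# A small table is often held as a list of dicts (one dict per row).
rows = [
    {'customer': 'Asha', 'amount': 1200},
    {'customer': 'Ravi', 'amount': 980},    # 'Ravi' is an invented example name
]
row_total = sum(row['amount'] for row in rows)

print(first_amount, customer, row_total)    # 1200 Asha 2180
```

This list-of-dicts shape is the same one csv.DictReader produces when it reads a file row by row.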
Working with variables and operators
price = 250.0 # float
quantity = 4 # int
customer = 'Asha' # str
is_member = True # bool
revenue = price * quantity
discount = revenue * 0.10 if is_member else 0
final = revenue - discount
print(customer, 'pays', final)
print('Type of revenue:', type(revenue))
Output:
Asha pays 900.0
Type of revenue: <class 'float'>
Notice price * quantity produced a float (because one operand was a float). Python's arithmetic operators are + - * /, plus // (whole-number divide), % (remainder) and ** (power).
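The less familiar operators are easiest to understand side by side. The sketch below uses invented packing numbers (units and box_size are assumptions, not part of the sales example).

```python
units = 47       # hypothetical number of items to pack
box_size = 12    # hypothetical number of items per box

full_boxes = units // box_size    # whole-number divide: 3
left_over = units % box_size      # remainder: 11
true_ratio = units / box_size     # / always returns a float
squared = box_size ** 2           # power: 144

print(full_boxes, left_over, round(true_ratio, 2), squared)    # 3 11 3.92 144
```

Note that / returns a float even when both operands are ints, which is why // exists for whole-number results.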
Check type(x) when results look odd: mixing a number stored as text ('42') with a real number is one of the most common beginner bugs.
- Six core types power most analysis: int, float, str, bool, list, dict.
- A list is like a column; a dict is like a single labelled row.
- Use type(x) to debug: text that looks like a number ('42') will not do maths correctly. '42' + 1 raises a TypeError, and '42' * 2 gives '4242'.
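A short sketch of the text-versus-number problem, assuming a value that arrived as a string (for example, read from a file). The variable names are illustrative.

```python
raw_units = '42'    # text that looks like a number
real_units = 42     # an actual int

print(type(raw_units), type(real_units))

doubled_text = raw_units * 2           # repeats the string: '4242'
doubled_number = int(raw_units) * 2    # converts first, then multiplies: 84
print(doubled_text, doubled_number)

try:
    raw_units + 1                      # str + int is not allowed
except TypeError as err:
    message = str(err)
    print('TypeError:', message)
```

Converting with int() or float() as soon as the data is read in keeps this class of bug out of the later calculations.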
3. Control flow: conditionals, loops & comprehensions
Control flow lets your code make decisions and repeat work — the heart of automation.
Decisions with if / elif / else
score = 78
if score >= 90:
    grade = 'A'
elif score >= 75:
    grade = 'B'
else:
    grade = 'C'
print('Grade:', grade)
Output:
Grade: B
Indentation (4 spaces by convention) is how Python groups code; there are no curly braces. Comparison operators are == != > < >= <=.
Repeating work with a for loop
orders = [1200, 980, 1450, 760, 2100]
big_orders = 0
for amount in orders:
    if amount > 1000:
        big_orders = big_orders + 1
print('Orders over 1000:', big_orders)
Output:
Orders over 1000: 3
The analyst's favourite: list comprehensions
A list comprehension builds a new list in one readable line — perfect for filtering and transforming.
orders = [1200, 980, 1450, 760, 2100]
# keep only the big orders
big = [a for a in orders if a > 1000]
# apply 18% GST to every order
with_tax = [round(a * 1.18, 2) for a in orders]
print('Big orders:', big)
print('With tax :', with_tax)
Output:
Big orders: [1200, 1450, 2100]
With tax : [1416.0, 1156.4, 1711.0, 896.8, 2478.0]
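It can help to see that a comprehension is shorthand for a loop that appends to a list. The sketch below builds the same filtered list both ways, then combines the filter and the transform in one comprehension (the name big_with_tax is illustrative).

```python
orders = [1200, 980, 1450, 760, 2100]

# Loop version: start with an empty list and append the matching items
big_loop = []
for a in orders:
    if a > 1000:
        big_loop.append(a)

# Comprehension version: same result in one line
big_comp = [a for a in orders if a > 1000]
print(big_loop == big_comp)    # True

# Filter and transform together: 18% GST on the big orders only
big_with_tax = [round(a * 1.18, 2) for a in orders if a > 1000]
print(big_with_tax)            # [1416.0, 1711.0, 2478.0]
```

When the logic needs several statements per item, the plain loop is usually the more readable choice.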
- Indentation (4 spaces) defines blocks in Python — not braces.
- Use for loops to walk through data; track results in an accumulator variable.
- List comprehensions filter and transform a list in one line, which is the foundation of column-wise thinking.
4. Functions, modules & error handling
Once you have written logic, wrap it in a function so you can reuse it without copy-pasting. Functions keep analysis clean, testable and shareable.
Define a reusable metric
def summary(values):
    return {
        'count': len(values),
        'total': sum(values),
        'average': round(sum(values) / len(values), 2),
    }
week = [1200, 980, 1450, 760, 2100]
print(summary(week))
Output:
{'count': 5, 'total': 6490, 'average': 1298.0}
Borrow power from modules
A module is a library of ready-made functions. import brings it in.
import statistics as stats
week = [1200, 980, 1450, 760, 2100]
print('Mean :', stats.mean(week))
print('Median:', stats.median(week))
print('Stdev :', round(stats.stdev(week), 2))
Output:
Mean : 1298
Median: 1200
Stdev : 516.35
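To see what stats.stdev is doing, the sketch below recomputes the sample standard deviation by hand and compares it with the module's result. The intermediate names (squared_devs, sample_variance, by_hand) are illustrative.

```python
import math
import statistics as stats

week = [1200, 980, 1450, 760, 2100]

mean = sum(week) / len(week)
squared_devs = [(x - mean) ** 2 for x in week]           # squared distance from the mean
sample_variance = sum(squared_devs) / (len(week) - 1)    # divide by n - 1 for a sample
by_hand = math.sqrt(sample_variance)

# The two values should agree
print(round(by_hand, 2), round(stats.stdev(week), 2))
```

stats.stdev treats the data as a sample and divides by n - 1; stats.pstdev divides by n and is the one to use when the list is the whole population.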
Handle messy reality with try / except
Real data is dirty — a blank cell or stray text will crash a naive script. try / except lets your code recover gracefully.
raw = ['1200', '980', 'N/A', '760', '']
clean = []
for value in raw:
    try:
        clean.append(float(value))
    except ValueError:
        clean.append(None) # mark unreadable values
print(clean)
Output:
[1200.0, 980.0, None, 760.0, None]
Catch the specific error (ValueError), not a bare except: that swallows everything; that is how silent, wrong numbers reach a report.
- Functions (def) make analysis reusable and testable: write once, reuse everywhere.
- import a module (e.g. statistics) to reuse battle-tested code.
- Wrap risky parsing in try / except, and catch the specific error, not everything.
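Combining the two ideas from this topic, the parsing logic can live in a small function. This is a sketch, not the only way to do it: to_float is a hypothetical helper name, and it assumes every input is a string.

```python
def to_float(value):
    """Return value as a float, or None if the text cannot be parsed."""
    try:
        return float(value)
    except ValueError:    # only the error we expect from bad text
        return None

raw = ['1200', '980', 'N/A', '760', '']
parsed = [to_float(v) for v in raw]
valid = [v for v in parsed if v is not None]
skipped = len(parsed) - len(valid)

print('Valid  :', valid)                      # [1200.0, 980.0, 760.0]
print('Skipped:', skipped)                    # 2
print('Average:', sum(valid) / len(valid))    # 980.0
```

Counting the skipped values, rather than dropping them silently, lets a report state how much of the data was actually used.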
5. Reading & writing data files: CSV, JSON, text
Data lives in files. The three formats an analyst meets daily are CSV (spreadsheets), JSON (data from APIs and the web) and plain text (logs and reports). Python reads them all.
Read a CSV
Imagine a file sales.csv:
date,region,amount
2024-01-01,North,1200
2024-01-01,South,980
2024-01-02,North,1450

import csv
total = 0
with open('sales.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        total = total + float(row['amount'])
print('Total sales:', total)
Output:
Total sales: 3630.0
csv.DictReader turns each row into a dict keyed by the header, so row['amount'] reads that column. The with open(...) pattern safely opens and closes the file for you.
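Because each row is a dict, other columns can be used to group the amounts. The sketch below totals sales per region; it keeps the same three rows in memory with io.StringIO so it runs without a file on disk, and the names totals_by_region and best_region are illustrative.

```python
import csv
import io

# Same rows as sales.csv, held in memory so the sketch needs no file
csv_text = """date,region,amount
2024-01-01,North,1200
2024-01-01,South,980
2024-01-02,North,1450
"""

totals_by_region = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    region = row['region']
    # .get(region, 0) starts each new region at zero
    totals_by_region[region] = totals_by_region.get(region, 0) + float(row['amount'])

best_region = max(totals_by_region, key=totals_by_region.get)
print(totals_by_region)               # {'North': 2650.0, 'South': 980.0}
print('Best region:', best_region)    # Best region: North
```

With a real file, replacing io.StringIO(csv_text) with the file object from with open('sales.csv') as f: gives the same loop.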
Write a report (plain text)
report = f'Sales report\nTotal: {total}\nRows read: 3\n'
with open('report.txt', 'w') as f:
    f.write(report)
print('Saved report.txt')
Output:
Saved report.txt
Read & write JSON
import json
record = {'region': 'North', 'amount': 2650, 'currency': 'INR'}
# Python object → JSON text
text = json.dumps(record)
print(text)
# JSON text → Python object
back = json.loads(text)
print(back['amount'])
Output:
{"region": "North", "amount": 2650, "currency": "INR"}
2650
With Pandas, one of the libraries used later, the same CSV can be read in one line: pd.read_csv('sales.csv').
- Use with open(path) as f: because it opens and closes files safely.
- csv.DictReader gives each row as a dict keyed by the column headers.
- json.dumps() turns a Python object into text; json.loads() turns text back into an object.
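json.dumps() and json.loads() work with text in memory; their siblings json.dump() and json.load() work directly with an open file. The sketch below is a minimal round trip, written into a temporary folder so it leaves no files behind. The file name report.json is only an example.

```python
import json
import os
import tempfile

summary = {'region': 'North', 'amount': 2650, 'currency': 'INR'}

# Use a temporary folder so the sketch does not leave files on disk
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'report.json')    # example file name
    with open(path, 'w') as f:
        json.dump(summary, f, indent=2)           # Python object -> JSON file
    with open(path) as f:
        loaded = json.load(f)                     # JSON file -> Python object

print(loaded == summary)    # True
```

The indent=2 argument only affects how the file looks to a human reader; the loaded data is the same either way.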
6. Reproducible, version-controlled analysis
Professional analysts do not just get an answer — they make it reproducible (anyone can re-run it) and version-controlled (every change is saved and reversible). Two habits make this happen.
Notebooks: Jupyter & Google Colab
- A notebook is a series of cells. Run a cell with Shift+Enter and its output appears right below.
- Mix code, charts and written explanation in one document, which is ideal for sharing analysis with a team.
- Colab needs no install and runs in your browser; Jupyter runs locally via Anaconda.
List the libraries your analysis depends on in requirements.txt so others get the same results.
Git & GitHub: a safety net + your portfolio
Git records snapshots of your project; GitHub stores them online so your work is backed up and shareable. The everyday workflow is four commands:
git init # start tracking this folder (once)
git add analysis.py # stage the changes you want to save
git commit -m 'First analysis' # save a snapshot with a message
git push # upload to GitHub (a remote must be configured first)
- Notebooks (Jupyter/Colab) mix code, output and explanation; restart & run-all to prove reproducibility.
- Git saves reversible snapshots; the core loop is add → commit → push.
- A documented GitHub repo is your portfolio, so start committing from day one.
★ Hands-on Project — Automate a Sales Data Report
Put the whole module together. You will write one Python script that reads raw sales data and produces a clean, formatted report automatically — no spreadsheet clicking.
- Create a file sales.csv with columns date, region, amount and at least 10 rows (invent realistic numbers, include one blank/invalid amount on purpose).
- Write sales_report.py that opens the CSV with csv.DictReader and loops over the rows.
- Use try / except to skip or flag any row whose amount is missing or not a number.
- Compute the total sales, the average per row, the best region, and the count of valid vs skipped rows.
- Write the results to report.txt in a tidy, human-readable format using an f-string.
- Also save the summary as report.json with json.dump so another program could read it.
- Initialise Git in the folder and make your first commit: git add . && git commit -m 'Sales report automation'.
- Bonus: push the project to a new public GitHub repository; this is the first piece of your analytics portfolio.
Ready to test yourself?
Take the module quiz. Score 70% or more to mark this module complete.