🐍 Module 1

Python Foundations for Analytics

⏱ 10 hours · Beginner · 6 topics

🎯 By the end: set up a professional Python environment, write clean scripts with variables, loops and functions, read and write CSV/JSON/text files, and track your work with Git & GitHub.

Every great data analyst speaks Python. Not because it is fashionable, but because it lets you do in three lines what takes thirty clicks in a spreadsheet — and then repeat it forever, error-free. This module builds the exact Python foundation an analyst needs: nothing wasted, everything practical. You will write real code from the very first topic.

1. Set up your toolkit & write your first script

An analyst's job is a pipeline: get data in, shape it, understand it, and tell the story. Before we write code, picture the whole journey — this course follows it stage by stage.

The analytics workflow — every module of this course maps to a stage here.

Install your toolkit (pick one path)

Zero-install (recommended to start): Google Colab — Python runs in your browser, nothing to install. Just visit colab.research.google.com.
On your computer: install the Anaconda distribution — it bundles Python 3, Jupyter Notebook, and the data libraries we will use later (Pandas, NumPy, Matplotlib).
Editor (optional): VS Code for writing longer scripts.

Jupyter vs a script? Analysts live in notebooks (Jupyter / Colab) for exploration — you run code cell by cell and see results instantly. We use .py scripts when we want to automate something to run on its own.

Your first analytics script

Let us compute a simple sales summary — the kind of thing you will automate by the end of this module.

# A tiny sales summary — your first analytics script
sales = [1200, 980, 1450, 760, 2100]   # daily sales for one working week (5 days)

total = sum(sales)
average = total / len(sales)
best_day = max(sales)

print('Total sales :', total)
print('Average/day :', round(average, 2))
print('Best day    :', best_day)

▶ Output

Total sales : 6490
Average/day : 1298.0
Best day    : 2100

A handful of lines, and you have a total, an average and a peak. print() shows a value; sum(), len(), max() and round() are built-in functions you will use constantly.
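A few more built-ins follow the same pattern. The sketch below reuses the sales list from the first script; min() and sorted() are standard Python built-ins, and the variable names are only illustrative.

```python
# Same sales figures as in the first script
sales = [1200, 980, 1450, 760, 2100]

worst_day = min(sales)                 # smallest value in the list
ranked = sorted(sales, reverse=True)   # a new list, largest first
spread = max(sales) - min(sales)       # gap between best and worst day

print('Worst day :', worst_day)
print('Ranked    :', ranked)
print('Spread    :', spread)
```

sorted() returns a new list and leaves sales unchanged, which is usually what you want when the original order (here, the days) still matters.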

Key points

The analytics workflow: Collect → Clean → Explore → Analyse → Visualise → Communicate.
Use notebooks (Jupyter / Colab) to explore; use .py scripts to automate.
Built-ins like sum(), len(), max(), round() already do a lot of an analyst's arithmetic.

2. Variables, data types & operators

A variable is a labelled box that holds a value. Python figures out the type automatically — you never declare it. Knowing the core types is essential, because each one behaves differently when you analyse it.

Type	Example	Used in analytics for…
`int`	`units = 42`	counts, IDs, quantities
`float`	`price = 19.99`	money, rates, measurements
`str`	`city = 'Mumbai'`	names, categories, labels
`bool`	`is_repeat = True`	flags, filters (yes/no)
`list`	`[10, 20, 30]`	an ordered column of values
`dict`	`{'name': 'Asha', 'age': 30}`	one labelled record (a row)

Working with variables and operators

price = 250.0          # float
quantity = 4           # int
customer = 'Asha'      # str
is_member = True       # bool

revenue = price * quantity
discount = revenue * 0.10 if is_member else 0
final = revenue - discount

print(customer, 'pays', final)
print('Type of revenue:', type(revenue))

▶ Output

Asha pays 900.0
Type of revenue: <class 'float'>

Notice price * quantity produced a float (because one operand was a float). Python's arithmetic operators are + - * /, plus // (whole-number divide), % (remainder) and ** (power).
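To see the less familiar operators in action, here is a small sketch. The packing scenario and the variable names are invented for illustration.

```python
units = 47        # items to ship (illustrative value)
box_size = 6      # items per box (illustrative value)

full_boxes = units // box_size   # whole-number divide: 7
left_over = units % box_size     # remainder: 5
share = units / 4                # / always gives a float: 11.75
squared = box_size ** 2          # power: 36

print(full_boxes, left_over, share, squared)
```

Note that / returns a float even when both operands are whole numbers, while // keeps the result as an int when both operands are ints.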

Dynamic typing: you can reassign a variable to a different type at any time. This is powerful, but check your types with type(x) when results look odd: mixing a number stored as text ('42') with a real number is one of the most common beginner bugs.
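A short sketch of that text-versus-number trap, assuming a value that arrived as text (for example from a file). The variable names are illustrative.

```python
units_text = '42'     # a number stored as text
units = 42            # a real number

print(type(units_text))        # <class 'str'>
print(units_text * 2)          # 4242 : the text is repeated, not doubled
print(int(units_text) * 2)     # 84   : convert first, then do the maths

try:
    units_text + units         # mixing str and int raises an error
except TypeError as err:
    print('TypeError:', err)
```

Converting with int() or float() as soon as the data is read keeps the rest of the analysis working on real numbers.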

Key points

Six core types power most analysis: int, float, str, bool, list, dict.
A list is like a column; a dict is like a single labelled row.
Use type(x) to debug — text that looks like a number ('42') will not do maths correctly.

3. Control flow: conditionals, loops & comprehensions

Control flow lets your code make decisions and repeat work — the heart of automation.

Decisions with if / elif / else

score = 78

if score >= 90:
    grade = 'A'
elif score >= 75:
    grade = 'B'
else:
    grade = 'C'

print('Grade:', grade)

▶ Output

Grade: B

Indentation (4 spaces by convention) is how Python groups code; there are no curly braces. Comparison operators are == != > < >= <=.

Repeating work with a for loop

orders = [1200, 980, 1450, 760, 2100]

big_orders = 0
for amount in orders:
    if amount > 1000:
        big_orders = big_orders + 1

print('Orders over 1000:', big_orders)

▶ Output

Orders over 1000: 3
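The same loop can carry more than one accumulator. This sketch tracks both the count and the value of the big orders; the += shorthand is standard Python, and the names big_count and big_total are illustrative.

```python
orders = [1200, 980, 1450, 760, 2100]

big_count = 0
big_total = 0
for amount in orders:
    if amount > 1000:
        big_count += 1          # shorthand for big_count = big_count + 1
        big_total += amount     # running total of the big orders

print('Big orders  :', big_count)
print('Their value :', big_total)
print('Share (%)   :', round(big_total / sum(orders) * 100, 1))
```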

The analyst's favourite: list comprehensions

A list comprehension builds a new list in one readable line — perfect for filtering and transforming.

orders = [1200, 980, 1450, 760, 2100]

# keep only the big orders
big = [a for a in orders if a > 1000]

# apply 18% GST to every order
with_tax = [round(a * 1.18, 2) for a in orders]

print('Big orders:', big)
print('With tax  :', with_tax)

▶ Output

Big orders: [1200, 1450, 2100]
With tax  : [1416.0, 1156.4, 1711.0, 896.8, 2478.0]

Where this is heading: comprehensions teach the mindset of transforming a whole column at once. In Module 3 you will do exactly this on millions of rows with Pandas, in one call and with no explicit loop.
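Filtering and transforming can also be combined in a single comprehension. The sketch below does both and checks the result against an explicit loop; it reuses the orders list and the 18% GST rate from the example above.

```python
orders = [1200, 980, 1450, 760, 2100]

# Comprehension: add 18% GST, but only to orders that pass the filter
big_with_tax = [round(a * 1.18, 2) for a in orders if a > 1000]

# The same logic written as an explicit loop
loop_version = []
for a in orders:
    if a > 1000:
        loop_version.append(round(a * 1.18, 2))

print(big_with_tax)
print(big_with_tax == loop_version)
```

Read the comprehension from the middle outwards: for each a in orders, if it passes the test, keep the transformed value.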

Key points

Indentation (4 spaces by convention) defines blocks in Python, not braces.
Use for loops to walk through data; track results in an accumulator variable.
List comprehensions filter and transform a list in one line — the foundation of column-wise thinking.

4. Functions, modules & error handling

Once you have written logic, wrap it in a function so you can reuse it without copy-pasting. Functions keep analysis clean, testable and shareable.

Define a reusable metric

def summary(values):
    return {
        'count': len(values),
        'total': sum(values),
        'average': round(sum(values) / len(values), 2),
    }

week = [1200, 980, 1450, 760, 2100]
print(summary(week))

▶ Output

{'count': 5, 'total': 6490, 'average': 1298.0}
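One thing to watch: summary() divides by len(values), so an empty list would raise ZeroDivisionError. A possible variant is sketched below; the name safe_summary and the decimals parameter are assumptions for illustration, not part of the module's code.

```python
def safe_summary(values, decimals=2):
    """Like summary(), but reports average=None for an empty list."""
    count = len(values)
    total = sum(values)
    # Only divide when there is at least one value
    average = round(total / count, decimals) if count else None
    return {'count': count, 'total': total, 'average': average}

print(safe_summary([1200, 980, 1450, 760, 2100]))
print(safe_summary([]))
```

The default value decimals=2 means callers can ignore the parameter until they need a different precision.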

Borrow power from modules

A module is a library of ready-made functions. import brings it in.

import statistics as stats

week = [1200, 980, 1450, 760, 2100]
print('Mean  :', stats.mean(week))
print('Median:', stats.median(week))
print('Stdev :', round(stats.stdev(week), 2))

▶ Output

Mean  : 1298
Median: 1200
Stdev : 516.35
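There is more than one way to import. This sketch shows two common styles side by side, using the standard-library math and statistics modules; the numbers passed to math are arbitrary examples.

```python
import math                           # import the whole module
from statistics import mean, median   # import only the names you need

week = [1200, 980, 1450, 760, 2100]

print('Mean        :', mean(week))               # no stats. prefix needed
print('Median      :', median(week))
print('Square root :', math.sqrt(1444))          # 38.0
print('Rounded up  :', math.ceil(1298 / 7))      # ceil rounds up to a whole number
```

Importing the whole module keeps it obvious where each function comes from; importing names directly is shorter but can hide their origin in a long script.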

Handle messy reality with try / except

Real data is dirty — a blank cell or stray text will crash a naive script. try / except lets your code recover gracefully.

raw = ['1200', '980', 'N/A', '760', '']

clean = []
for value in raw:
    try:
        clean.append(float(value))
    except ValueError:
        clean.append(None)   # mark unreadable values

print(clean)

▶ Output

[1200.0, 980.0, None, 760.0, None]

Do not hide every error. Catch specific errors (like ValueError), not a bare except: that swallows everything — that is how silent, wrong numbers reach a report.
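One way to keep bad values visible instead of silently dropping them is to record where they occurred. This is a sketch, not the only approach; enumerate() is a standard built-in that yields each position alongside its value, and the skipped list is an illustrative name.

```python
raw = ['1200', '980', 'N/A', '760', '']

clean = []
skipped = []
for position, value in enumerate(raw):
    try:
        clean.append(float(value))
    except ValueError:                      # only the error we expect
        skipped.append((position, value))   # remember where and what

print('Clean values:', clean)
print('Skipped     :', skipped)
```

Because only ValueError is caught, any other problem (a typo in a variable name, say) still stops the script and gets noticed.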

Key points

Functions (def) make analysis reusable and testable — write once, run anywhere.
import a module (e.g. statistics) to reuse battle-tested code.
Wrap risky parsing in try / except, and catch the specific error, not everything.

5. Reading & writing data files: CSV, JSON, text

Data lives in files. The three formats an analyst meets daily are CSV (spreadsheets), JSON (data from APIs and the web) and plain text (logs and reports). Python reads them all.

Read a CSV

Imagine a file sales.csv:

date,region,amount
2024-01-01,North,1200
2024-01-01,South,980
2024-01-02,North,1450

import csv

total = 0
with open('sales.csv', newline='') as f:   # newline='' is what the csv docs recommend
    reader = csv.DictReader(f)
    for row in reader:
        total = total + float(row['amount'])

print('Total sales:', total)

▶ Output

Total sales: 3630.0

csv.DictReader turns each row into a dict keyed by the header, so row['amount'] reads that column. The with open(...) pattern safely opens and closes the file for you.
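To look at what DictReader actually yields, the sketch below feeds it the same three rows from an in-memory string. io.StringIO is a standard-library stand-in for a real file, used here only so the example runs without sales.csv on disk.

```python
import csv
import io

# Stand-in for sales.csv so the example runs without a file on disk
sample = io.StringIO(
    'date,region,amount\n'
    '2024-01-01,North,1200\n'
    '2024-01-01,South,980\n'
    '2024-01-02,North,1450\n'
)

rows = list(csv.DictReader(sample))

print(rows[0])                    # one row as a dict keyed by the header
print(type(rows[0]['amount']))    # values arrive as text (str)

# Convert to float before doing any arithmetic
north_total = sum(float(r['amount']) for r in rows if r['region'] == 'North')
print('North total:', north_total)
```

Every value comes back as a string, which is why the earlier loop wraps row['amount'] in float().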

Write a report (plain text)

report = f'Sales report\nTotal: {total}\nRows read: 3\n'

with open('report.txt', 'w') as f:
    f.write(report)

print('Saved report.txt')

▶ Output

Saved report.txt
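Writing a CSV works much like writing text. The sketch below uses csv.DictWriter from the standard library; the file name region_totals.csv and the summary rows are made up for illustration, and the file goes into a temporary folder so nothing in your project is overwritten.

```python
import csv
import os
import tempfile

# Illustrative summary rows (one dict per output row)
summary_rows = [
    {'region': 'North', 'total': 2650.0},
    {'region': 'South', 'total': 980.0},
]

# Write into a temporary folder so nothing in your project is overwritten
out_path = os.path.join(tempfile.mkdtemp(), 'region_totals.csv')

with open(out_path, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['region', 'total'])
    writer.writeheader()             # first line: region,total
    writer.writerows(summary_rows)   # one line per dict

with open(out_path) as f:
    print(f.read())
```

In a real script you would pass a path inside your project folder instead of the temporary one.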

Read & write JSON

import json

record = {'region': 'North', 'amount': 2650, 'currency': 'INR'}

# Python object  →  JSON text
text = json.dumps(record)
print(text)

# JSON text  →  Python object
back = json.loads(text)
print(back['amount'])

▶ Output

{"region": "North", "amount": 2650, "currency": "INR"}
2650

This is the manual way — on purpose. Understanding files at this level makes you a better analyst. From Module 3, Pandas reads a CSV into a full table with one line: pd.read_csv('sales.csv').
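json.dumps() and json.loads() work on text in memory. Their file-based counterparts are json.dump() and json.load(), sketched below. The temporary folder and the file name record.json are assumptions chosen so the example leaves your project untouched.

```python
import json
import os
import tempfile

record = {'region': 'North', 'amount': 2650, 'currency': 'INR'}

# Temporary location, used here only to keep the example self-contained
path = os.path.join(tempfile.mkdtemp(), 'record.json')

# Python object -> JSON file
with open(path, 'w') as f:
    json.dump(record, f, indent=2)   # indent=2 makes the file human-readable

# JSON file -> Python object
with open(path) as f:
    loaded = json.load(f)

print(loaded == record)
```

The names are easy to mix up: the versions ending in s work with strings, the ones without work with open files.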

Key points

Use with open(path) as f: — it opens and closes files safely.
csv.DictReader gives each row as a dict keyed by the column headers.
json.dumps() turns a Python object into text; json.loads() turns text back into an object.

6. Reproducible, version-controlled analysis

Professional analysts do not just get an answer — they make it reproducible (anyone can re-run it) and version-controlled (every change is saved and reversible). Two habits make this happen.

Notebooks: Jupyter & Google Colab

A notebook is a series of cells. Run a cell with Shift+Enter and its output appears right below.
Mix code, charts and written explanation in one document — ideal for sharing analysis with a team.
Colab needs no install and runs in your browser; Jupyter runs locally via Anaconda.

Reproducibility habit: restart the kernel and “Run all” before sharing a notebook. If it runs top-to-bottom without errors, it is reproducible. Pin your library versions in a requirements.txt so others get the same results.
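One possible way to find the versions to pin is to ask Python what is installed. This sketch uses importlib.metadata from the standard library (Python 3.8+) and prints lines in the name==version form that requirements.txt accepts. The package list is an assumption based on the libraries named earlier in this module.

```python
import sys
from importlib.metadata import version, PackageNotFoundError

# Libraries mentioned in this module; edit the list to match your project
packages = ['pandas', 'numpy', 'matplotlib']

print('# Python', sys.version.split()[0])
for name in packages:
    try:
        print(f'{name}=={version(name)}')
    except PackageNotFoundError:
        print(f'# {name} is not installed in this environment')
```

Copy the name==version lines into requirements.txt; the lines starting with # are comments and can stay or go.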

Git & GitHub: a safety net + your portfolio

Git records snapshots of your project; GitHub stores them online so your work is backed up and shareable. The basic workflow is four commands, the first of which you run only once per project:

git init                      # start tracking this folder (once)
git add analysis.py           # stage the changes you want to save
git commit -m 'First analysis'  # save a snapshot with a message
git push                      # upload to GitHub (needs a GitHub remote set up first)

Git moves your work from your computer to GitHub in clear, reversible steps.

Why employers love this: a GitHub profile full of clean, documented analyses is your data-analyst portfolio. We build yours across this course, finishing with the capstone.

Key points

Notebooks (Jupyter/Colab) mix code, output and explanation — restart & run-all to prove reproducibility.
Git saves reversible snapshots; the core loop is add → commit → push.
A documented GitHub repo is your portfolio — start committing from day one.

★ Hands-on Project — Automate a Sales Data Report

Put the whole module together. You will write one Python script that reads raw sales data and produces a clean, formatted report automatically — no spreadsheet clicking.

Create a file sales.csv with columns date, region, amount and at least 10 rows (invent realistic numbers, include one blank/invalid amount on purpose).
Write sales_report.py that opens the CSV with csv.DictReader and loops over the rows.
Use try / except to skip or flag any row whose amount is missing or not a number.
Compute the total sales, the average per row, the best region, and the count of valid vs skipped rows.
Write the results to report.txt in a tidy, human-readable format using an f-string.
Also save the summary as report.json with json.dump so another program could read it.
Initialise Git in the folder and make your first commit: git add . && git commit -m 'Sales report automation'.
Bonus: push the project to a new public GitHub repository — this is the first piece of your analytics portfolio.
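Finding the best region is the one step this module has not shown directly. Below is a minimal sketch of one approach, not a full solution: the rows are hypothetical and only mimic what csv.DictReader yields, and dict.get(), continue and max(..., key=...) are standard Python features that go slightly beyond the topics above.

```python
# Hypothetical rows, shaped like what csv.DictReader yields
rows = [
    {'date': '2024-01-01', 'region': 'North', 'amount': '1200'},
    {'date': '2024-01-01', 'region': 'South', 'amount': '980'},
    {'date': '2024-01-02', 'region': 'North', 'amount': '1450'},
    {'date': '2024-01-02', 'region': 'South', 'amount': ''},   # invalid on purpose
]

region_totals = {}   # region name -> running total
skipped = 0
for row in rows:
    try:
        amount = float(row['amount'])
    except ValueError:
        skipped += 1
        continue     # move on to the next row
    # .get(key, 0) returns 0 the first time a region appears
    region_totals[row['region']] = region_totals.get(row['region'], 0) + amount

# The key with the largest total
best_region = max(region_totals, key=region_totals.get)

print('Totals by region:', region_totals)
print('Best region     :', best_region)
print('Skipped rows    :', skipped)
```

Your own script will read the rows from sales.csv rather than a hard-coded list, and still needs the overall total, the average, and the two report files.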

Ready to test yourself?

Take the module quiz. Score 70% or more to mark this module complete.
