
Capstone Project & Professional Assessment

Take a real dataset through the full CRISP-DM lifecycle — frame, wrangle, model, evaluate, deploy and document — and pass the final assessment to earn your Professional Certificate.

Your project

This is where the whole course converges. You will run a complete, end-to-end data-science project through the CRISP-DM lifecycle and present it like a professional — the flagship piece of your portfolio. Choose a domain you genuinely care about: finance, healthcare, retail, sports, climate or social good.

Make it portfolio-grade: start from a real question, build a model that beats a simple baseline, evaluate it honestly, deploy it, and document it so anyone can follow and reproduce it. This is the project you will talk about in every interview.

  1. Frame the problem. Pick a domain, write one clear question, and define how success will be measured: the metric and a baseline to beat (Modules 1 & 5).
  2. Acquire real data. Source a genuine dataset — a public CSV, an API, a Kaggle dataset, or a database — and document where it came from (Module 2).
  3. Wrangle & engineer. Clean types, handle missing values, merge tables, and engineer meaningful features into a tidy, leak-free table, in which no feature uses information that would not be available at prediction time (Modules 2 & 6).
  4. Explore (EDA). Profile distributions, outliers and relationships, and surface at least five evidence-backed insights with clear visualisations (Module 3).
  5. Reason statistically. Where it strengthens the work, add a hypothesis test, confidence interval or A/B analysis — and interpret it correctly (Module 4).
  6. Model. Build at least two models in scikit-learn Pipelines, validate with cross-validation, tune hyper-parameters, and beat your baseline (Modules 5 & 6).
  7. Go deep where it fits. Apply deep learning, NLP or time-series forecasting (Modules 7–9) only if your problem suits it and it genuinely improves on simpler models.
  8. Deploy. Save the pipeline, wrap it in a FastAPI endpoint, and containerise it with Docker so it runs anywhere (Module 10).
  9. Be responsible. Audit fairness, explain predictions (SHAP values or feature importance), handle any personally identifiable information (PII) appropriately, and write a short model card (Module 11).
  10. Communicate. Write a report (Question → Finding → Evidence → Recommendation) and record a 10-minute walkthrough for a non-technical audience.
  11. Publish. Push the notebook, data dictionary, model card, API code and report to a well-documented public GitHub repository.
  12. Pass the final assessment below to complete the course and earn your Professional Certificate.
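The modelling workflow in steps 1, 3 and 6 can be sketched as follows. This is a minimal sketch, assuming scikit-learn is installed, with its bundled breast-cancer dataset standing in for your own cleaned table; the ROC AUC metric, the two candidate models and the parameter grid are illustrative assumptions, not requirements of the brief. Because the imputer and scaler sit inside each Pipeline, they are re-fitted on every training fold, so the cross-validation scores stay free of leakage from the validation folds.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): replace with your own feature table and target.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
metric = "roc_auc"  # assumed success metric, chosen when framing the problem

candidates = {
    # Baseline: always predicts the majority class, so its ROC AUC is 0.5.
    "baseline": DummyClassifier(strategy="most_frequent"),
    "logreg": Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ]),
    "forest": Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("model", RandomForestClassifier(random_state=42)),
    ]),
}

# Compare every candidate with the same folds and the same metric.
cv_scores = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_train, y_train, cv=cv, scoring=metric)
    cv_scores[name] = scores.mean()
    print(f"{name:>8}: {scores.mean():.3f} +/- {scores.std():.3f}")

# Tune the logistic-regression pipeline's regularisation strength.
search = GridSearchCV(
    candidates["logreg"],
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    cv=cv,
    scoring=metric,
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)

# Score the tuned pipeline once on the held-out test set.
test_score = search.score(X_test, y_test)
print(f"held-out {metric}: {test_score:.3f}")
```

The held-out test set is scored only once, after model selection and tuning are finished, so its score is an honest estimate of performance on unseen data.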
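For step 5, one way to attach a confidence interval to an A/B result is a percentile bootstrap, sketched below in plain Python using only the standard library. The function name `bootstrap_diff_ci` and the conversion counts are hypothetical, made up purely for illustration; a two-proportion z-test would be an equally valid choice for data like this.

```python
import random


def bootstrap_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(sum(resampled_b) / len(b) - sum(resampled_a) / len(a))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled differences.
    k = int(round(alpha / 2 * n_resamples))
    return diffs[k], diffs[n_resamples - k - 1]


# Hypothetical A/B data: 1 = converted, 0 = did not (made-up counts).
control = [1] * 100 + [0] * 900  # 10% conversion
variant = [1] * 130 + [0] * 870  # 13% conversion

observed = sum(variant) / len(variant) - sum(control) / len(control)
low, high = bootstrap_diff_ci(control, variant)
print(f"observed lift: {observed:.3f}, 95% CI: ({low:.3f}, {high:.3f})")
```

Report the interval, not just the point estimate: if it excludes zero, a true difference of zero is not plausible at the 95% level; if it includes zero, the data are consistent with no difference between variant and control.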
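The first part of step 8, saving the pipeline, might look like the sketch below. It assumes scikit-learn and joblib (which is installed alongside scikit-learn); the file name `model.joblib` and the temporary directory are illustrative choices. Saving the whole Pipeline rather than the bare model keeps preprocessing and model together, so the API applies exactly the same transformations that training did. The FastAPI endpoint and the Dockerfile are not shown.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): use your own final training table here.
X, y = load_breast_cancer(return_X_y=True)

# Fit the final pipeline (preprocessing + model) on the training data.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X, y)

# Persist the whole pipeline as a single artefact (illustrative path).
model_path = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(pipeline, model_path)

# Later, e.g. once at start-up inside the API process, load it and predict.
loaded = joblib.load(model_path)
new_rows = X[:3]
print(loaded.predict(new_rows))
print(loaded.predict_proba(new_rows).round(3))
```

joblib files are pickle-based, so only load files you trust, and load them with the same scikit-learn version that saved them.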
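For the explanation part of step 9, the sketch below uses scikit-learn's `permutation_importance` as a stand-in for SHAP, which lives in the separate `shap` package and is not shown here. It assumes scikit-learn is installed and again uses the bundled breast-cancer data purely for illustration. Each feature is shuffled on the held-out set and the resulting drop in ROC AUC is recorded; a larger drop means the model relies more on that feature.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): replace with your own features and target.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, stratify=data.target, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the drop in ROC AUC.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)

# Rank features by mean importance, largest drop first.
ranked = sorted(
    zip(data.feature_names, result.importances_mean, result.importances_std),
    key=lambda item: item[1],
    reverse=True,
)
for name, mean, std in ranked[:5]:
    print(f"{name:<25} {mean:.4f} +/- {std:.4f}")
```

With strongly correlated features, as in this dataset, permutation importance can understate any single feature because the model can fall back on its correlated partners; treat the ranking as a diagnostic, not a causal claim.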

Final assessment

The final assessment covers the whole course. Score 70% or higher and complete every module to earn your Professional Certificate in Data Science using Python.


💡 Log in first so your result counts toward the certificate.