Foundations
Introduction to Python for Data Science – syntax, Jupyter, libraries.
Data Structures in Python – lists, dictionaries, sets, and tuples for data handling.
NumPy for Numerical Computing – arrays, broadcasting, linear algebra.
Pandas for Data Analysis – dataframes, cleaning, aggregation.
Data Visualization Basics – Matplotlib, Seaborn, Plotly.
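As a minimal sketch of two items in this section (NumPy broadcasting and pandas aggregation), the example below uses made-up values. The column names `city` and `sales` are illustrative assumptions, not part of the outline.

```python
import numpy as np
import pandas as pd

# Broadcasting: subtract each column's mean from a (3, 2) array.
# The (2,) vector of column means is stretched across all 3 rows.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])
centered = data - data.mean(axis=0)

# Aggregation: total of a hypothetical "sales" column per "city".
df = pd.DataFrame({
    "city": ["A", "B", "A", "B"],
    "sales": [100, 200, 150, 50],
})
totals = df.groupby("city")["sales"].sum()
```

After centering, each column of `centered` has mean zero, and `totals` holds one summed value per city.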
Data Preparation
Data Cleaning & Preprocessing – handling missing values, duplicates, and outliers.
Feature Engineering – transformations, encoding categorical variables, scaling.
Exploratory Data Analysis (EDA) – statistical summaries, visual exploration.
Working with Time Series Data – datetime objects, resampling, rolling windows.
Text Data Processing (NLP Basics) – tokenisation, stemming, word embeddings.
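The time series item in this section (datetime objects, resampling, rolling windows) could be sketched with pandas roughly as follows. The dates and values are invented for illustration.

```python
import pandas as pd

# Six daily observations (made-up values) on a DatetimeIndex.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Resampling: group into 2-day bins and take the mean of each bin.
two_day_mean = series.resample("2D").mean()

# Rolling window: mean over the current and the two previous observations.
# The first two entries are NaN because the window is not yet full.
rolling_mean = series.rolling(window=3).mean()
```

Resampling changes the frequency of the index (six rows become three), while a rolling window keeps the original index and smooths the values.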
Statistics & Math for Data Science
Descriptive & Inferential Statistics – mean, variance, hypothesis testing.
Probability Distributions in Python – normal, binomial, Poisson (using scipy.stats).
Linear Algebra & Calculus Applications – vectors, derivatives in ML context.
Statistical Modelling – regression analysis, ANOVA, chi-square.
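For the probability distributions named in this section, a short sketch with scipy.stats might look like this. The parameter values are arbitrary choices for illustration.

```python
from scipy import stats

# Normal: probability that a standard normal variable falls below 1.96.
p_normal = stats.norm.cdf(1.96, loc=0, scale=1)

# Binomial: probability of exactly 3 successes in 10 trials with p = 0.5.
p_binom = stats.binom.pmf(3, n=10, p=0.5)

# Poisson: probability of exactly 2 events when the mean rate is 4.
p_poisson = stats.poisson.pmf(2, mu=4)
```

Continuous distributions such as `norm` expose `pdf` and `cdf`, while discrete ones such as `binom` and `poisson` expose `pmf` and `cdf`.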
Machine Learning
Supervised Learning with Scikit-learn – regression, classification.
Unsupervised Learning – clustering (K-means, DBSCAN), dimensionality reduction (PCA, t-SNE).
Model Evaluation & Validation – cross-validation, confusion matrix, ROC-AUC.
Hyperparameter Tuning – GridSearchCV, RandomizedSearchCV, Bayesian optimization.
Ensemble Learning – Random Forests, Gradient Boosting, XGBoost, LightGBM.
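Several items in this section (supervised classification, cross-validation, grid search, random forests) can be combined in one small scikit-learn sketch. The built-in iris dataset and the parameter grid are assumptions chosen only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation of a random forest with a fixed seed.
model = RandomForestClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(model, X, y, cv=5)

# Grid search over a small, arbitrary grid of tree depths.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 4, None]},
    cv=5,
)
search.fit(X, y)
best_depth = search.best_params_["max_depth"]
```

`cv_scores` holds one accuracy value per fold, and `search.best_score_` reports the mean cross-validated accuracy of the best depth found.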
Advanced Topics
Deep Learning with TensorFlow & PyTorch – neural networks, CNNs, RNNs.
Natural Language Processing (NLP) Advanced – transformers, BERT, GPT-based models.
Time Series Forecasting Models – ARIMA, Prophet, LSTM.
Big Data with PySpark – distributed data analysis in Python.
MLOps & Model Deployment – Flask, FastAPI, Docker for serving ML models.
Data Science Project Lifecycle & Case Studies – from problem definition to deployment.