LEARN DATA SCIENCE IN 2023
COMPLETE ROADMAP
5 MONTHS
ZERO MONEY NEEDED
Python (4 Weeks)
• Introduction to Python for Data Science
o Python Introduction
o Python, Jupyter and Anaconda Installation
o Walkthrough of Jupyter and Anaconda
o Python Syntax and Semantics
o Python Data Structures
o Python Functions and Lambdas
o Python List comprehension and generators
o Python regular expressions
o Python File IO and Resource Operations
• Advanced Python for Data Science
o NumPy Deep Dive
o Pandas Deep Dive
o Matplotlib & Seaborn – Visualization
o Common Modules and Packages
Corey Schafer – https://www.youtube.com/watch?v=YYXdXT2l-Gg&list=PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU
MLWithAP – https://www.youtube.com/watch?v=Kk-ob0bQKhg&list=PL47eid5T-F5S1U_xiTfFR3f8myu7YGF2l
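A quick taste of several Python topics from the list above (list comprehensions, generators, lambdas and regular expressions), using only the standard library. The word list and log string are made-up sample data, and the `list[int]` type hint assumes Python 3.9 or newer; treat this as a practice sketch, not material from the courses linked above.

```python
import re
from typing import Iterator

# List comprehension: squares of the even numbers from 0 to 9
squares = [n * n for n in range(10) if n % 2 == 0]


# Generator function: yields a running total lazily, one value at a time
def running_total(values: list[int]) -> Iterator[int]:
    total = 0
    for v in values:
        total += v
        yield total


# Lambda as a sort key: order words by length, then alphabetically
words = ["pandas", "numpy", "seaborn", "scipy"]
by_length = sorted(words, key=lambda w: (len(w), w))

# Regular expression: extract YYYY-MM-DD dates from free text
log = "Trained on 2023-01-15, evaluated on 2023-02-01."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", log)

print(squares)                         # [0, 4, 16, 36, 64]
print(list(running_total([1, 2, 3])))  # [1, 3, 6]
print(by_length)                       # ['numpy', 'scipy', 'pandas', 'seaborn']
print(dates)                           # ['2023-01-15', '2023-02-01']
```

The generator never builds the full list of totals in memory, which is why generators are handy when streaming large files in data work.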
Maths for Data Science (2 Weeks)
• Mathematics for Machine Learning (2 weeks)
o Statistics – Descriptive and Inferential
o Linear Algebra
o Calculus & Probability
Khan Academy - https://www.khanacademy.org/math/statistics-probability
Khan Academy - https://www.khanacademy.org/math/calculus-1
Organic Chemistry Tutor -
https://www.youtube.com/@TheOrganicChemistryTutor/playlists
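To see why descriptive statistics matter before any modelling, here is a small sketch using Python's built-in statistics module. The visit counts are invented for illustration. One extreme value drags the mean well above the median, and a z-score threshold of 2 (an arbitrary choice for this example) flags it.

```python
import statistics

# Made-up sample: daily website visits over one week, with one spike
visits = [120, 135, 128, 150, 142, 138, 400]

mean = statistics.mean(visits)
median = statistics.median(visits)
stdev = statistics.stdev(visits)  # sample standard deviation (n - 1 denominator)

# z-score: distance of each point from the mean, in standard deviations
z_scores = [(x - mean) / stdev for x in visits]

# Flag points more than 2 standard deviations away (threshold is a judgment call)
outliers = [x for x, z in zip(visits, z_scores) if abs(z) > 2]

print(round(mean, 2), median)  # 173.29 138
print(outliers)                # [400]
```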
Machine Learning 1 (5 Weeks)
• Exploratory Data Analysis
o Data Gathering and Extraction
§ SQL (MySQL), NoSQL (MongoDB)
§ CSV and JSON files
§ Web Crawling / Scraping / APIs / Beautiful Soup
§ Cloud Storage – AWS S3
§ Unstructured data (Video, Speech, Text)
o Data Preprocessing & Analysis
§ Univariate, Bivariate and Multivariate Analysis
§ Outlier and Anomaly Detection
§ Data Cleansing – Null Values, Imputation, Duplicate Treatment
§ Dispersion and Distribution
o Data Split
§ Train, Test, Validation set
§ Sampling Techniques and Stratification Strategy
§ Bias-Variance Trade-off
§ Handling Bias and Imbalance
Coursera / IBM - https://www.coursera.org/learn/ibm-exploratory-data-analysis-for-machine-learning
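Stratified splitting keeps the class proportions the same in the train and test sets, which matters for imbalanced data. The sketch below does it by hand in plain Python so the mechanics are visible; the function name stratified_split is hypothetical. In practice, scikit-learn's train_test_split offers the same behaviour through its stratify argument.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return sorted (train_idx, test_idx) lists that preserve class proportions."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                          # shuffle within each class
        n_test = round(len(indices) * test_fraction)  # same fraction from every class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Imbalanced toy labels: 90 negatives, 10 positives
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

print(len(train_idx), len(test_idx))     # 80 20
print(sum(labels[i] for i in test_idx))  # 2
```

A purely random 20% split of this data could easily end up with zero or four positives in the test set; stratifying guarantees the 10% positive rate carries over.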
• Introduction to Machine Learning
o Machine learning Landscape
o Machine Learning End-to-End Project Lifecycle
o Regression Vs Classification
o Supervised Vs Unsupervised
o Batch Vs Online
o Linear Regression
o Multivariable Linear Regression
o Logistic Regression
o Scikit-learn and Statsmodels
• Feature Engineering and Feature Selection
o Feature Engineering
§ New feature creation
§ Variable transformation
§ Feature Encoding
§ Handling Categorical and Numerical features
§ Binning
o Feature Selection
§ PCA
§ Dimensionality Reduction techniques
§ Multicollinearity
§ Forward/Backward/Stepwise Selection
§ Lasso
§ Filter/Wrapper/Embedded
o Feature Scaling
§ Standardization
§ Normalization
Coursera – Andrew Ng’s Machine Learning Specialization –
https://www.coursera.org/specializations/machine-learning-introduction
Josh Starmer
https://www.youtube.com/@statquest
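Standardization rescales a feature to zero mean and unit variance; min-max normalization maps it onto [0, 1] using the observed minimum and maximum. A key habit is to learn the scaling parameters from the training data only and reuse them on the test data. A plain-Python sketch with hypothetical helper names (fit_standardizer, standardize, fit_minmax, normalize) and made-up numbers:

```python
def fit_standardizer(train):
    """Learn mean and population standard deviation from training data only."""
    n = len(train)
    mean = sum(train) / n
    std = (sum((x - mean) ** 2 for x in train) / n) ** 0.5
    return mean, std


def standardize(values, mean, std):
    """Z-score: subtract the training mean, divide by the training std."""
    return [(x - mean) / std for x in values]


def fit_minmax(train):
    """Learn the minimum and maximum from training data only."""
    return min(train), max(train)


def normalize(values, lo, hi):
    """Min-max scaling: maps the training range onto [0, 1]."""
    return [(x - lo) / (hi - lo) for x in values]


train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]

mean, std = fit_standardizer(train)  # parameters come from train only...
lo, hi = fit_minmax(train)

scaled_test = standardize(test, mean, std)  # ...and are reused on test
normed_test = normalize(test, lo, hi)

print([round(v, 3) for v in scaled_test])  # [0.0, 2.236]
print([round(v, 3) for v in normed_test])  # [0.5, 1.333]
```

The test value 50.0 lands above 1 after min-max scaling because it lies outside the training range. The sketch uses the population standard deviation (the same choice as scikit-learn's StandardScaler) and assumes the feature is not constant; otherwise std and hi - lo would be zero.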
Machine Learning 2 (5 Weeks)
• Advanced Supervised Learning
o Naïve Bayes Classifier
o k-NN Classifier
o Support Vector Machines (Regressor and Classifier)
o Ensemble Techniques
§ Decision Trees
§ Bagging
§ Random Forest
§ Boosting
• Model Selection and Tuning
o Hyperparameter Tuning
o Model Performance measures
o Bias and Variance
o Overfitting vs Underfitting
o Cross validation
o GridSearchCV Vs RandomizedSearchCV
o Regularization – L1 and L2
o Pipelining
• Unsupervised Learning
o K-means clustering
o Nearest Neighbors (unsupervised neighbor search)
o Hierarchical Clustering
o Anomaly detection
o Dimensionality Reduction Techniques / PCA
o SVD
o DBSCAN
• Production Deployment
o Deployment scenario and strategies
o ML Pipeline
o Flask and Heroku
o Introduction to FastAPI
o Deployment to AWS ECS
o Monitoring and Continuous Performance Measurement
o MLOps
Stanford CS229 –
https://www.youtube.com/watch?v=jGwO_UgTS7I&list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU
Kaggle:
https://www.kaggle.com/learn
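Cross-validation estimates how well a model generalizes by rotating which slice of the data is held out. The sketch below builds k contiguous folds by hand (sized like scikit-learn's KFold with shuffle=False, where the first n_samples % k folds get one extra sample) and scores a deliberately trivial "predict the training mean" model. Both function names are hypothetical, and the baseline stands in for a real estimator.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Fold sizes differ by at most one sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


def cross_val_mse_of_mean_baseline(y, k=5):
    """Average validation MSE of a model that always predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train_idx) / len(train_idx)  # fit on training folds only
        mse = sum((y[i] - prediction) ** 2 for i in val_idx) / len(val_idx)
        scores.append(mse)
    return sum(scores) / len(scores)


print([len(val) for _, val in k_fold_indices(10, 3)])  # [4, 3, 3]

y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]  # made-up targets
print(cross_val_mse_of_mean_baseline(y, k=5))
```

Every sample is used for validation exactly once, which is what lets GridSearchCV and RandomizedSearchCV compare hyperparameter settings on the same footing.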
Deep Learning (5 Weeks)
• Neural Networks and Deep Learning Fundamentals
o Perceptron
o Activation and Loss Functions
o Gradient Descent
o Batch Normalization
o Introduction to TensorFlow and Keras
o Transfer learning and regularization
• Computer Vision (2 Weeks)
o CNN
o Convolution, Pooling and Padding
o CNN architectures and ImageNet Challenge
o Object Detection
o Semantic Segmentation
• Natural Language Processing
o RNNs
o Tokenization, Stemming and Lemmatization
o LSTMs and GRUs
o Time Series analysis
o Advanced Language Models – BERT, GPT-3
o Attention Is All You Need (Transformers)
• Autoencoders and GANs
o Generative Adversarial Networks – Generator and Discriminator
o Variational Autoencoders
o Deep Convolutional GANs (DCGAN)
o Applications of GANs
• Reinforcement Learning
o RL framework
o Markov Chain
o Policy Gradient Methods
o Types of RL Systems
o Q Learning
Andrew Ng’s Stanford CS230 –
https://www.youtube.com/watch?v=PySo_6S4ZAg&list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb
Coursera / Deeplearning.AI
https://www.coursera.org/specializations/deep-learning
MIT 6.S191
https://www.youtube.com/watch?v=7sB052Pz0sQ&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI
Stanford CS231n
https://www.youtube.com/watch?v=vT1JzLTH4G4&list=PLC1qU-LWwrF64f4QKQT-Vg5Wr4qEE1Zxk&index=2
Google’s Machine Learning Course –
https://developers.google.com/machine-learning/crash-course
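Gradient descent is the optimizer behind most of the deep learning material in this section. A minimal sketch, assuming a single-feature linear model and mean squared error loss, fitted to toy data that lies exactly on y = 2x + 1. The learning rate and epoch count are illustrative choices, not tuned values.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every training point
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

TensorFlow and Keras compute these gradients automatically, but each training step applies the same update rule: parameter = parameter - learning_rate * gradient. Too large a learning rate makes the updates overshoot and diverge.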
Books:
Podcasts:
Lex Fridman
https://www.youtube.com/c/lexfridman
Andrej Karpathy – not a podcast, but a YouTube channel
https://www.youtube.com/@AndrejKarpathy
Other Things:
1. Build a portfolio of at least three end-to-end projects and showcase it on LinkedIn.
a. Avoid the Titanic / Iris datasets. Pick a new, novel problem so your work stands out.
b. Take each project end to end, including deployment to the cloud.
2. Build an online presence on LinkedIn by posting and engaging with other members of the ML community.
a. LinkedIn is the new resume. Spend time polishing it.
3. Practice, Practice and Practice.
4. Learn from others, especially from Kaggle notebooks and how their authors approach a problem.