Machine Learning Applications - Summary Notes
1. Linear & Multivariate Linear Regression
Linear Regression models the relationship between a dependent variable and one or more independent variables as a
linear function: a straight line for one feature, a hyperplane for several. Multivariate Linear Regression is the case
with multiple features (X1, X2, ..., Xn).
Formula: Y = β0 + β1X1 + β2X2 + ... + βnXn + ε
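As a minimal sketch of fitting the formula above, the following uses ordinary least squares in NumPy. The data is synthetic and noise-free, and the true values (intercept 1, coefficients 2 and 3) are assumptions chosen purely for illustration.

```python
import numpy as np

# Synthetic, noise-free data generated from Y = 1 + 2*X1 + 3*X2 (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first fitted coefficient acts as the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: find beta minimizing ||X_design @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(beta)  # approximately [1. 2. 3.]
```

With real, noisy data the recovered coefficients would only approximate the underlying ones.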
2. Logistic Regression and Regularization
Logistic Regression is used for classification problems. It uses the sigmoid function to output probabilities.
Regularization (L1/L2) helps prevent overfitting by penalizing large coefficients.
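To illustrate the sigmoid and the effect of a penalty on the coefficients, here is a small from-scratch sketch using gradient descent with an L2 penalty only. The function name fit_logistic, the learning rate, the iteration count, and the synthetic two-cluster data are all assumptions for the example; an L1 penalty would replace the squared term with the sum of absolute weights.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, l2=0.0, lr=0.1, n_iter=500):
    """Gradient descent on mean log-loss + (l2 / 2) * ||w||^2; the bias is not penalized."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)                     # predicted probabilities
        grad_w = X.T @ (p - y) / len(y) + l2 * w   # log-loss gradient plus penalty gradient
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two Gaussian clusters, one per class (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w_plain, b_plain = fit_logistic(X, y, l2=0.0)
w_reg, b_reg = fit_logistic(X, y, l2=1.0)

probs = sigmoid(X @ w_reg + b_reg)
print("unregularized weight norm: ", np.linalg.norm(w_plain))
print("L2-regularized weight norm:", np.linalg.norm(w_reg))
```

The regularized fit ends with a smaller weight norm, which is the "penalizing large coefficients" effect described above.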
3. Practical Implementation Aspects
Steps: Data cleaning → Feature scaling → Model selection → Training → Evaluation → Tuning.
Tools: scikit-learn, pandas, matplotlib.
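The steps above can be sketched as a scikit-learn pipeline. This is one possible arrangement, not a prescribed workflow: the Iris dataset, the median imputer standing in for data cleaning, and the choice of logistic regression are assumptions for illustration. Tuning is omitted here.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out a test set so evaluation uses data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = make_pipeline(
    SimpleImputer(strategy="median"),    # cleaning stand-in: fills missing values, if any
    StandardScaler(),                    # feature scaling
    LogisticRegression(max_iter=1000),   # selected model
)
model.fit(X_train, y_train)              # training

accuracy = accuracy_score(y_test, model.predict(X_test))  # evaluation
print(f"test accuracy: {accuracy:.3f}")
```

Putting the preprocessing inside the pipeline means the scaler is fitted on the training split only, which avoids leaking test-set statistics into training.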
4. Decision Trees and Pruning
Decision Trees recursively split the data on feature values, choosing each split so that the resulting groups are as
pure as possible.
Pruning removes branches to avoid overfitting. Two types: Pre-pruning (early stopping) and Post-pruning (removing
after full growth).
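Both pruning styles can be tried in scikit-learn, as in the hedged sketch below. Pre-pruning is approximated with max_depth and min_samples_leaf, and post-pruning with cost-complexity pruning via ccp_alpha. The dataset and the specific parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No pruning: the tree grows until its leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning (early stopping): limit depth and leaf size while growing.
pre_pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

# Post-pruning: grow fully, then remove branches by cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, tree in [("full", full), ("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(name, "nodes:", tree.tree_.node_count, "depth:", tree.get_depth(),
          "test accuracy:", round(tree.score(X_test, y_test), 3))
```

A suitable ccp_alpha is usually picked by cross-validation rather than fixed by hand as it is here.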
5. Support Vector Machines (SVMs)
SVMs find the hyperplane that separates the classes with the maximum margin. The kernel trick allows non-linear
decision boundaries. Important parameters: C (penalty for margin violations), kernel, gamma (kernel width).
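As a rough illustration of the kernel trick, the sketch below compares a linear and an RBF kernel on a dataset that is not linearly separable. The make_moons data and the parameter values are assumptions for the example.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: no straight line separates them well.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C sets the penalty for margin violations; gamma sets the RBF kernel width.
linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_svm.score(X_test, y_test)
print(f"linear kernel accuracy: {linear_acc:.3f}")
print(f"RBF kernel accuracy:    {rbf_acc:.3f}")
```

Larger C or gamma values fit the training data more tightly and raise the risk of overfitting.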
6. Boosting with Decision Trees
Boosting combines many weak learners (typically shallow trees), trained one after another, into a strong one.
Examples: AdaBoost, Gradient Boosting, XGBoost. Each new tree corrects errors made by the previous ones.
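The idea that each new tree corrects earlier errors can be shown with a hand-rolled gradient boosting loop for squared error, where every tree is fitted to the current residuals. This is a simplified sketch, not how AdaBoost or XGBoost are implemented internally. The data, learning rate, tree depth, and number of trees are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine curve as a toy regression target (illustrative data).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_trees = 50

# Start from a constant prediction (the mean of the targets).
prediction = np.full_like(y, y.mean())
mse_history = [np.mean((y - prediction) ** 2)]
trees = []

for _ in range(n_trees):
    residuals = y - prediction                       # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)    # small corrective step
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE before boosting: {mse_history[0]:.4f}")
print(f"training MSE after {n_trees} trees: {mse_history[-1]:.4f}")
```

In practice a library class such as scikit-learn's GradientBoostingRegressor would be used instead of a manual loop.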
7. Setting Up & Debugging ML Tasks
Define the problem clearly. Clean and preprocess data. Try different models and tune hyperparameters. Use metrics
and visualizations to debug.
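One common debugging check is to compare training and test accuracy and inspect the confusion matrix. The sketch below is an illustration only: the dataset and the deliberately unconstrained decision tree are assumptions chosen to make a gap between the two scores likely.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# An unconstrained tree tends to memorize the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))

print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
print(cm)
```

A training score far above the test score suggests overfitting, while two low scores suggest underfitting or a data problem.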
8. Unsupervised Learning: K-Means, PCA, Hierarchical Clustering
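K-Means: Assigns each point to its nearest centroid, then recomputes the centroids, repeating until assignments stabilize.
PCA: Reduces dimensionality by projecting the data onto the directions of maximum variance.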
Hierarchical Clustering: Builds a tree (dendrogram) of clusters.
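To make the K-Means and PCA descriptions above concrete, here is a from-scratch NumPy sketch: PCA through the singular value decomposition, followed by a basic K-Means loop on the projected data. The kmeans helper, its farthest-point initialization, and the synthetic two-blob data are assumptions for illustration. Library implementations use more robust initialization.

```python
import numpy as np

# Two well-separated blobs in 5 dimensions (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 5)),
               rng.normal(8.0, 1.0, size=(50, 5))])

# PCA: center the data, then project onto the top right-singular vectors.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)   # share of variance per component
X_2d = X_centered @ Vt[:2].T            # keep the 2 highest-variance directions

def kmeans(points, k, n_iter=100):
    # Assumed init: first point, then repeatedly the point farthest from chosen centroids.
    centroids = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(dists)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Update step: each centroid moves to the mean of its points.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(X_2d, k=2)
print("variance explained by first 2 components:", explained_ratio[:2].sum())
print("cluster sizes:", np.bincount(labels))
```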
9. Implementing Clustering Algorithms
Libraries: scikit-learn, scipy, matplotlib.
Fit the model, predict clusters, visualize using scatter plots or dendrograms.
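A minimal sketch of that workflow with scikit-learn and SciPy follows. The blob data, the choice of three clusters, and Ward linkage are assumptions for illustration. Plotting is left as comments so the example runs without a display.

```python
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)

# K-Means: fit the model and predict a cluster label per point.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical clustering: build the merge tree, then cut it into 3 clusters.
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True)

# To visualize (requires matplotlib):
#   import matplotlib.pyplot as plt
#   plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels); plt.show()
#   dendrogram(Z); plt.show()

print("K-Means cluster labels:", sorted(set(kmeans_labels.tolist())))
print("hierarchical cluster labels:", sorted(set(hier_labels.tolist())))
```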
10. Parallelizing Learning Algorithms
Use parallel processing to speed up training. Tools: Joblib, Dask, Spark, XGBoost (n_jobs=-1 for all cores).
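As one hedged example, Joblib can evaluate several model configurations in parallel worker processes. The helper score_depth, the dataset, and the use of two workers are assumptions for illustration. Many scikit-learn estimators and utilities also accept n_jobs directly.

```python
from joblib import Parallel, delayed
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

def score_depth(max_depth, X, y):
    # Hypothetical helper: mean 3-fold cross-validated accuracy for one tree depth.
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

depths = [1, 2, 4, 8]

# Each depth is evaluated as an independent task; n_jobs=-1 would use all cores.
scores = Parallel(n_jobs=2)(delayed(score_depth)(d, X, y) for d in depths)

for depth, score in zip(depths, scores):
    print(f"max_depth={depth}: mean CV accuracy {score:.3f}")
```

Parallelism pays off when each task is expensive. For very small tasks the process start-up overhead can outweigh the gain.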
11. Applications of Machine Learning
Fields: Healthcare, Finance, Retail, Security, Transportation.
Tasks: Classification, Regression, Clustering, Recommendation.
12. Choosing Algorithms: What Will Work?
Base the choice on problem type, data size, feature types, training time, and accuracy needs.
Use trial-and-error, GridSearchCV, and metrics for evaluation.
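A systematic version of trial-and-error is a cross-validated grid search. The sketch below tunes an SVM with GridSearchCV. The Iris dataset, the pipeline step names, and the parameter grid are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Keys follow the "<step name>__<parameter>" convention for pipeline steps.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__kernel": ["linear", "rbf"],
    "svm__gamma": ["scale", 0.1],
}

# Every combination is scored with 5-fold cross-validation on the training set.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_score = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}, test accuracy: {test_score:.3f}")
```

The held-out test set is scored only once, after the search, so the reported test accuracy is not biased by the tuning.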