1. What is Machine Learning?
Answer: Machine Learning is a field of artificial intelligence that focuses on developing
algorithms and statistical models that let computer systems perform a task without being
explicitly programmed for it. Instead, the models learn patterns from data and use them to
make predictions.
2. Explain the Three Types of Machine Learning:
Answer:
Supervised Learning: The algorithm is trained on a labeled dataset, and it learns to map
input data to the corresponding output.
Unsupervised Learning: The algorithm is given unlabeled data and must find patterns or
relationships within it.
Reinforcement Learning: The algorithm learns by interacting with its environment and
receiving feedback in the form of rewards or penalties.
3. What is Overfitting? How Can It Be Prevented?
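Answer: Overfitting occurs when a model is too complex for the available data and learns noise
in the training data rather than the underlying patterns, so it performs well on the training set
but poorly on unseen data. It can be detected with cross-validation and reduced with techniques
such as regularization, a simpler model, early stopping, or more training data.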
4. Differentiate Between Bias and Variance:
Answer:
Bias: Error introduced by approximating a real-world problem, which may be extremely
complex, by a much simpler model.
Variance: Error introduced by the model's sensitivity to the particular training set. An
overly complex model fits the training data too closely, so its predictions change
substantially when the training data changes.
5. What is Cross-Validation?
Answer: Cross-validation is a technique used to assess the performance of a machine learning
model. It involves dividing the dataset into multiple subsets, training the model on some
subsets, and testing it on the remaining subsets to evaluate its generalization performance.
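The splitting step can be sketched in a few lines. This is a minimal illustration of k-fold cross-validation, assuming contiguous folds with no shuffling; the function name k_fold_indices is made up for this example and is not from any library.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Each sample lands in the test set of exactly one fold.
for train_idx, test_idx in k_fold_indices(6, 3):
    print("train:", train_idx, "test:", test_idx)
```

In practice the model is trained once per fold on the train indices and scored on the test indices, and the k scores are averaged. Shuffling the indices first is usually advisable when the data is ordered.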
6. Explain the ROC Curve:
Answer: The Receiver Operating Characteristic (ROC) curve is a graphical representation of the
trade-off between true positive rate (sensitivity) and false positive rate (1-specificity) at various
thresholds. It is commonly used to assess the performance of binary classification models.
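As a rough sketch of how the curve's points are obtained, the snippet below sweeps a few thresholds over predicted scores and records one (false positive rate, true positive rate) pair per threshold. The function name roc_points and the sample data are illustrative assumptions, and the code assumes both classes are present in y_true.

```python
def roc_points(y_true, scores, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = []
    for t in thresholds:
        # Predict positive when the score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_points(y_true, scores, thresholds=[0.9, 0.5, 0.3, 0.0]))
```

Lowering the threshold moves the point from (0, 0) toward (1, 1); a curve that bows toward the top-left corner indicates a better classifier.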
7. What is Feature Engineering?
Answer: Feature engineering involves selecting, transforming, or creating new features from the
raw data to improve the performance of machine learning models. It plays a crucial role in
enhancing a model's ability to capture relevant patterns.
8. What is the Curse of Dimensionality?
Answer: The curse of dimensionality refers to the challenges that arise when working with
high-dimensional data. As the number of features (dimensions) increases, the data becomes
sparse, and the amount of data needed to generalize accurately grows roughly exponentially.
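One way to see the effect, assuming data spread uniformly over a unit hypercube: to capture a fixed fraction of the data, a sub-cube needs a side length of fraction^(1/d), which approaches 1 as d grows. The helper name below is hypothetical.

```python
def edge_length_for_fraction(fraction, n_dims):
    """Side of a sub-cube that holds `fraction` of a unit hypercube's volume."""
    return fraction ** (1.0 / n_dims)


# Side length needed to cover 10% of the volume as dimensions increase.
for d in (1, 2, 10, 100):
    print(d, round(edge_length_for_fraction(0.1, d), 3))
```

At 10 dimensions the side is already about 0.79, so a "local" neighborhood holding 10% of the data spans most of each feature's range.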
9. Explain Bagging and Boosting:
Answer:
Bagging (Bootstrap Aggregating): A technique where multiple models are trained independently
on bootstrap samples (random subsets of the training data drawn with replacement), and their
predictions are aggregated by voting or averaging to reduce variance and overfitting.
Boosting: A technique where weak learners are combined sequentially, with each
learner correcting the errors of its predecessor, leading to a stronger overall model.
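The bagging half of this answer can be sketched as follows. This is a toy illustration, not a production implementation: the base learner is a one-feature threshold rule ("stump") chosen for brevity, and all function names are made up for this example. Boosting is not shown.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Pick the threshold t that best fits the rule 'predict 1 when x > t'."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def bagging_fit(xs, ys, n_models, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def bagging_predict(thresholds, x):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging_fit(xs, ys, n_models=25)
print(bagging_predict(model, 1.5), bagging_predict(model, 7.5))
```

Using an odd number of models avoids ties in a two-class vote. Each stump sees a slightly different sample, so their individual thresholds vary, and the vote smooths that variation out.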
10. What is the Difference Between Classification and Regression?
Answer:
Classification: Involves predicting a category or label (e.g., spam or not spam).
Regression: Involves predicting a continuous value (e.g., predicting house prices).
11. What is the Bias-Variance Tradeoff?
Answer: The bias-variance tradeoff is the balance between underfitting (high bias) and
overfitting (high variance) in a machine learning model. Reducing one typically increases the
other, so the goal is a level of model complexity that minimizes their combined contribution to
the error on unseen data.
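For squared-error loss the tradeoff can be written out explicitly. The decomposition below is a standard result, stated under the assumption that y = f(x) + ε with zero-mean noise of variance σ², and with the expectation taken over training sets and noise. The symbols f, \hat{f}, and σ² are notation introduced here, not taken from the text above.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The noise term does not depend on the model, so only the sum of the first two terms can be reduced by the choice of model complexity.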
12. Explain the K-Nearest Neighbors (KNN) Algorithm:
Answer: KNN is a simple, non-parametric algorithm used for classification and regression. To
predict for a new data point, it finds the k training points closest to it under a distance
metric and returns their majority class (for classification) or the average of their values
(for regression).
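A minimal sketch of the procedure, assuming Euclidean distance and a brute-force search over all training points; the function name knn_predict and the sample data are illustrative only.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k, task="classification"):
    """Predict for `query` from its k nearest training points (Euclidean distance)."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = [label for _, label in by_distance[:k]]
    if task == "classification":
        return Counter(nearest).most_common(1)[0][0]  # majority vote
    return sum(nearest) / len(nearest)  # regression: mean of neighbor values


points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
print(knn_predict(points, ["a", "a", "b", "b"], query=(1.2, 1.4), k=3))
print(knn_predict(points, [1.0, 2.0, 10.0, 12.0], query=(5.5, 5.0), k=2, task="regression"))
```

Because distances drive the prediction, features are usually scaled to comparable ranges before applying KNN.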
13. What is the Purpose of Regularization in Machine Learning?
Answer: Regularization is used to prevent overfitting in machine learning models by adding a
penalty term to the objective function. It discourages overly complex models by penalizing large
coefficients, promoting smoother and more generalized models.
14. What are Precision and Recall in the Context of Classification?
Answer:
Precision: The ratio of correctly predicted positive observations to the total predicted
positives. It measures the accuracy of the positive predictions.
Recall (Sensitivity): The ratio of correctly predicted positive observations to the total
actual positives. It measures the ability of the model to capture all the positives.
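Both ratios follow directly from the counts of true positives, false positives, and false negatives. The sketch below assumes binary labels encoded as 0 and 1, and returns 0.0 when a denominator is zero, which is a convention chosen here rather than a universal rule.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
print(precision_recall(y_true, y_pred))
```

Here two of the three predicted positives are correct (precision 2/3), while only two of the four actual positives are found (recall 1/2).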
15. What is Cross-Entropy Loss?
Answer: Cross-entropy loss, also known as log loss, is a measure of the difference between the
predicted probabilities and the actual class labels. It is commonly used in classification
problems; minimizing it pushes the predicted probability distribution toward the true one, and
it penalizes confident wrong predictions heavily.
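For the binary case, the loss averages -[y·log(p) + (1 - y)·log(1 - p)] over the samples. The following is a small sketch under that assumption; the clipping constant eps is an implementation choice made here to avoid log(0).

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean log loss for labels in {0, 1} and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # confident and correct: small loss
print(binary_cross_entropy([1, 0], [0.1, 0.9]))  # confident and wrong: large loss
```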
16. Differentiate Between Batch Gradient Descent and Stochastic Gradient Descent (SGD):
Answer:
Batch Gradient Descent: Updates the model's parameters using the entire training
dataset in each iteration. Each update is computationally expensive, but convergence
is stable.
Stochastic Gradient Descent (SGD): Updates the model's parameters using only one
randomly selected training sample in each iteration. Each update is cheap, but the
updates are noisier.
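The difference shows up clearly in code. The sketch below fits a single weight w in the model y = w·x with squared error; the model, learning rate, and data are assumptions chosen to keep the example small. The data is noise-free, so both variants reach the same answer; with noisy data the SGD estimate would fluctuate around it.

```python
import random


def gradient(w, x, y):
    """Derivative of the squared error (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x


def batch_gd(xs, ys, lr=0.01, epochs=200):
    w = 0.0
    for _ in range(epochs):
        # One update per pass, using the average gradient over all samples.
        g = sum(gradient(w, x, y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * g
    return w


def sgd(xs, ys, lr=0.01, steps=2000, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        # One update per randomly chosen sample.
        i = rng.randrange(len(xs))
        w -= lr * gradient(w, xs[i], ys[i])
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x
print(batch_gd(xs, ys), sgd(xs, ys))
```

A common middle ground, mini-batch gradient descent, averages the gradient over a small random subset per update.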
17. What is the Purpose of Activation Functions in Neural Networks?
Answer: Activation functions introduce non-linearity to the neural network, allowing it to learn
complex relationships in the data. Common activation functions include ReLU (Rectified Linear
Unit), Sigmoid, and Tanh.
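The three functions named above are short enough to write out. This is a scalar sketch using only the standard library; real frameworks apply them element-wise to arrays. The two-branch sigmoid is a choice made here to avoid overflow for large negative inputs.

```python
import math


def relu(x):
    """Rectified Linear Unit: passes positives through, zeroes out negatives."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids computing exp of a large positive number
    return z / (1.0 + z)


def tanh(x):
    """Squashes any real number into (-1, 1), centered at 0."""
    return math.tanh(x)


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(sigmoid(x), 4), round(tanh(x), 4))
```

Without such a non-linearity between layers, a stack of linear layers collapses into a single linear transformation.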
18. Explain the Concept of Ensemble Learning:
Answer: Ensemble learning involves combining multiple individual models (learners) to create a
more robust and accurate model. Common ensemble methods include Bagging, Boosting, and
Stacking.
19. What is the Difference Between L1 and L2 Regularization?
Answer:
L1 Regularization (Lasso): Adds the sum of the absolute values of the coefficients, scaled
by a strength parameter, to the objective function. This leads to sparsity by encouraging
some coefficients to become exactly zero.
L2 Regularization (Ridge): Adds the sum of the squared coefficients, scaled by a strength
parameter, to the objective function. This penalizes large coefficients and shrinks all of
them toward zero, generally without setting any exactly to zero.
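The contrast can be made concrete with a one-coefficient problem. The sketch below, under the assumption of a simple quadratic loss 0.5·(w - a)², shows the two penalty terms and the closed-form minimizers of that loss with each penalty added. All names, including lam for the penalty strength, are illustrative.

```python
def l1_penalty(weights, lam):
    """lam times the sum of absolute values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """lam times the sum of squares."""
    return lam * sum(w * w for w in weights)


def l1_shrink(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*abs(w), known as soft-thresholding."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # small coefficients are set exactly to zero


def l2_shrink(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*w**2."""
    return a / (1.0 + 2.0 * lam)  # scaled down, but never exactly zero for a != 0


for a in (0.3, 2.0):
    print(a, l1_shrink(a, 0.5), l2_shrink(a, 0.5))
```

With lam = 0.5, the small coefficient 0.3 is zeroed by the L1 penalty but only halved by the L2 penalty, which is the sparsity difference described above.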
20. Explain the Concept of Hyperparameter Tuning:
Answer: Hyperparameter tuning involves finding the best values for a model's hyperparameters,
the settings chosen before training (such as learning rate or tree depth) rather than learned
from the data. This is typically done through techniques such as grid search or random search,
scoring each candidate on validation data.
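Grid search itself is just an exhaustive loop over combinations. In the sketch below, toy_validation_score is a hypothetical stand-in for "train a model with these settings and return its validation score"; the parameter names and values are made up for illustration.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Evaluate every combination in `param_grid`; return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical scoring function: peaks at learning_rate=0.1, depth=3.
def toy_validation_score(learning_rate, depth):
    return -(learning_rate - 0.1) ** 2 - (depth - 3) ** 2


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}
print(grid_search(toy_validation_score, grid))
```

The number of combinations grows multiplicatively with each added hyperparameter, which is why random search, sampling a fixed number of combinations, is often preferred for larger grids.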