CS3352 FDS Solved 2024

Foundations of Data Science, Nov/Dec 2024

Uploaded by

dharni

CS3352 – Foundations of Data Science

(Nov/Dec 2024)
Solved Question Paper with Answers

Part A (10 × 2 = 20 marks)


• Q: Define Data Science.

A: Data Science is an interdisciplinary field that uses scientific methods, statistics,
algorithms, and systems to extract knowledge and insights from structured and
unstructured data.

• Q: What is the difference between Data, Information, and Knowledge?

A: Data: Raw facts (e.g., 25, 30).
Information: Processed data with context (e.g., Average age = 27.5).
Knowledge: Actionable insight (e.g., Most students are in their 20s → target group for a
course).

• Q: Define Overfitting.

A: Overfitting occurs when a model fits the training data, including its noise, too closely:
it performs very well on the training set but poorly on unseen data.
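
The definition can be illustrated with a minimal sketch. The data, the noisy labels, and both models below are hypothetical, and the test points are deliberately placed near the noisy training labels so the effect is easy to see. A nearest-neighbour model memorizes every training label, while a single-threshold rule cannot.

```python
# Hypothetical 1-D data: the true rule is "label 1 when x >= 5".
# Two training labels (x=3 and x=7) are deliberately flipped to act as noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
test = [(1.5, 0), (2.8, 0), (3.2, 0), (6.8, 1), (7.3, 1), (8.5, 1)]


def nearest_neighbour(x):
    """Memorizing model: copy the label of the closest training point."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label


def threshold_rule(x):
    """Simple model: a single threshold that cannot memorize the noise."""
    return 1 if x >= 5 else 0


def accuracy(model, data):
    """Fraction of points in data that the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


nn_train = accuracy(nearest_neighbour, train)
nn_test = accuracy(nearest_neighbour, test)
rule_train = accuracy(threshold_rule, train)
rule_test = accuracy(threshold_rule, test)

print("nearest neighbour: train", nn_train, "test", round(nn_test, 2))
print("threshold rule:    train", rule_train, "test", rule_test)
```

The memorizing model scores perfectly on the training set but worse than the simpler rule on the test set, which is the pattern the definition describes.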

• Q: List any two Python libraries used in Data Science.

A: NumPy, Pandas, Matplotlib, Scikit-learn (any two).

• Q: State Bayes’ Theorem.

A: P(H|E) = [P(E|H) * P(H)] / P(E), where H is the hypothesis, E is the observed evidence,
and P(E) > 0. It is used for probabilistic inference and classification (e.g., Naïve Bayes).
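
As a worked illustration of the formula, the sketch below applies it to a hypothetical screening test. The prior, the true-positive rate, and the false-positive rate are assumed numbers, not values from the question paper. P(E) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H|E) using Bayes' theorem.

    prior               = P(H)
    likelihood          = P(E|H)
    false_positive_rate = P(E|not H)
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p_h_given_e = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p_h_given_e, 3))
```

With these assumed inputs the posterior is roughly 0.16, far below the 0.95 sensitivity, because the hypothesis is rare to begin with.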

• Q: What is Normalization in Data Preprocessing?

A: A technique that rescales numerical values into a standard range (e.g., [0, 1]) so that
features with larger magnitudes do not dominate the model.
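
One common way to do this is min-max scaling, x' = (x - min) / (max - min). The sketch below is a minimal pure-Python version; the function name and the sample values are illustrative assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 30, 40, 50])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```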

• Q: Differentiate between Supervised and Unsupervised learning.

A: Supervised: Uses labeled data (e.g., classification).
Unsupervised: Uses unlabeled data (e.g., clustering).

• Q: Define Feature Engineering.

A: The process of selecting, creating, or transforming input variables (features) to improve
model performance.
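
A small sketch of creating and transforming features follows. The applicant records, field names, and category list are hypothetical. It derives a ratio feature from two raw columns and one-hot encodes a categorical column.

```python
# Hypothetical raw records; the field names are assumptions for illustration.
applicants = [
    {"income": 50000, "loan_amount": 10000, "employment": "salaried"},
    {"income": 30000, "loan_amount": 15000, "employment": "self_employed"},
]
employment_types = ["salaried", "self_employed"]


def engineer(record, categories):
    """Build model-ready features from one raw record."""
    # Created feature: loan size relative to income.
    features = {"loan_to_income": record["loan_amount"] / record["income"]}
    # Transformed feature: one-hot encoding of the employment category.
    for category in categories:
        features[f"employment_{category}"] = int(record["employment"] == category)
    return features


engineered = [engineer(a, employment_types) for a in applicants]
for row in engineered:
    print(row)
```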

• Q: What is Data Visualization? Give an example.

A: The graphical representation of data for easier interpretation. Example: Bar charts for
sales data, Heatmaps for correlations.

• Q: What is the significance of p-value in hypothesis testing?

A: The probability of observing data at least as extreme as the data actually observed,
assuming the null hypothesis (H0) is true. A p-value below the chosen significance level
(commonly 0.05) → reject H0.
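
To make the decision rule concrete, here is a minimal sketch of a two-sided one-sample z-test using only the standard library. The sample mean, hypothesized mean, known standard deviation, and sample size are hypothetical numbers chosen for illustration.

```python
import math


def two_sided_z_test(sample_mean, mu0, sigma, n):
    """Return (z, p_value) for H0: population mean == mu0, with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical study: n = 100, sample mean 52, H0 mean 50, sigma 10.
z, p_value = two_sided_z_test(sample_mean=52, mu0=50, sigma=10, n=100)
alpha = 0.05
print("z =", z, "p =", round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here z = 2.0 and the p-value is about 0.0455, so H0 is rejected at the 0.05 level but would not be rejected at 0.01.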

Part B (5 × 13 = 65 marks)

11(a). Explain the Data Science Lifecycle with neat diagram.


Steps: Business Understanding → Data Collection → Data Cleaning & Preprocessing →
Exploratory Data Analysis → Modeling → Evaluation → Deployment → Feedback.

Diagram: Represented as a circular flow, emphasizing iteration.

11(b). Explain the different types of Data with examples.


• Structured (tables in DB).
• Semi-structured (JSON, XML).
• Unstructured (text, images, video).
Examples provided for each.
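
The difference between semi-structured and structured data can be sketched in a few lines. The JSON string and its field names below are hypothetical. Records in the JSON may omit fields or nest lists, and flattening them gives fixed-column rows like those in a database table.

```python
import json

# Semi-structured input: the second record has no "tags" field at all.
raw = '[{"id": 1, "name": "Asha", "tags": ["loan", "new"]}, {"id": 2, "name": "Ravi"}]'
records = json.loads(raw)

# Structured output: every row has the same three columns.
columns = ("id", "name", "tag_count")
rows = [(r["id"], r["name"], len(r.get("tags", []))) for r in records]

print(columns)
for row in rows:
    print(row)
```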

12(a). What are the steps involved in Data Preprocessing? Explain.


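• Data cleaning: handle missing values and outliers.
• Integration: combine data from multiple sources.
• Transformation: scale, normalize, or encode values.
• Reduction: reduce the number of features or records (e.g., dimensionality reduction, sampling).
• Discretization: convert continuous values into intervals or categories.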
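
Three of these steps (cleaning, transformation, discretization) can be sketched on a single column. The age values, the median imputation strategy, and the bin boundaries are assumptions chosen for illustration; other choices are equally valid.

```python
from statistics import median

ages = [25, None, 35, 45, None, 55]  # hypothetical column; None marks a missing value

# Cleaning: impute missing values with the median of the observed values.
fill_value = median(a for a in ages if a is not None)
cleaned = [fill_value if a is None else a for a in ages]

# Transformation: min-max normalization to the range [0, 1].
lo, hi = min(cleaned), max(cleaned)
normalized = [(a - lo) / (hi - lo) for a in cleaned]


# Discretization: map each age to a labelled interval (assumed boundaries).
def age_group(age):
    if age < 35:
        return "young"
    if age < 50:
        return "middle"
    return "senior"


groups = [age_group(a) for a in cleaned]

print(cleaned)
print([round(v, 2) for v in normalized])
print(groups)
```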

12(b). Compare Descriptive, Predictive, and Prescriptive Analytics.


• Descriptive: What happened? (reports, dashboards).
• Predictive: What will happen? (forecasting, ML).
• Prescriptive: What should we do? (optimization, recommendations).

13(a). Explain different statistical measures used in Data Science.


• Central tendency: mean, median, mode.
• Dispersion: variance, std dev, IQR.
• Correlation & covariance.
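
Covariance and correlation can be computed directly from their definitions. The sketch below uses the sample formulas (dividing by n - 1) on two short hypothetical series; the helper names are illustrative.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def pearson_correlation(xs, ys):
    """Covariance rescaled by both standard deviations; lies in [-1, 1]."""
    std_x = math.sqrt(sample_covariance(xs, xs))  # cov(x, x) is the variance of x
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
corr_xy = pearson_correlation(x, y)
print("covariance =", cov_xy)
print("correlation =", round(corr_xy, 4))
```

Covariance depends on the units of both variables, while correlation is unit-free, which is why correlation is usually the one reported.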

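13(b). Compute Mean, Median, Mode, Variance for data [2, 4, 4, 4, 5, 5, 7, 9].

Mean = (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5
Median = (4 + 5) / 2 = 4.5 (average of the 4th and 5th sorted values)
Mode = 4 (occurs three times)
Variance = 32 / 8 = 4 (population variance; the squared deviations from the mean sum to 32)
Std Dev = √4 = 2
Note: the sample variance, which divides by n − 1 = 7, is 32 / 7 ≈ 4.57.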
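
These values can be checked with Python's standard `statistics` module. The variance of 4 corresponds to the population formula (dividing by n = 8), so the sketch uses `pvariance` and `pstdev`; the sample variance is shown alongside for comparison.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
population_variance = statistics.pvariance(data)  # divides by n
population_std = statistics.pstdev(data)
sample_variance = statistics.variance(data)       # divides by n - 1

print("mean =", mean)
print("median =", median)
print("mode =", mode)
print("population variance =", population_variance)
print("population std dev =", population_std)
print("sample variance =", round(sample_variance, 3))
```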
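
14(a). Explain various data visualization techniques with examples.

• Bar chart: compare values across categories.
• Pie chart: show parts of a whole.
• Histogram: show the distribution of a numeric variable.
• Scatter plot: show the relationship between two numeric variables.
• Box plot: summarize spread and highlight outliers.
• Heatmap: show magnitudes in a matrix, such as correlations.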

14(b). With Python code, demonstrate data visualization.

import pandas as pd
import matplotlib.pyplot as plt

# Monthly sales figures held in a small DataFrame
data = {'Month': ['Jan', 'Feb', 'Mar', 'Apr'], 'Sales': [220, 330, 150, 400]}
df = pd.DataFrame(data)

# Line plot of sales per month, with a marker at each data point
plt.plot(df['Month'], df['Sales'], marker='o')
plt.title('Monthly Sales')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()

15(a). Explain different Machine Learning algorithms used in Data Science.


Supervised: Regression, Decision Trees, SVM.
Unsupervised: K-means, Hierarchical clustering.
Reinforcement: Q-learning.
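
As an example of one of the unsupervised algorithms listed, the sketch below implements K-means on one-dimensional data in pure Python. The data points and the starting centroids are hypothetical, and this is a teaching sketch rather than a production implementation (a library such as scikit-learn would normally be used).

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data with two visible groups and assumed starting centroids.
points = [1, 2, 3, 10, 11, 12]
final_centroids, final_clusters = kmeans_1d(points, centroids=[1.0, 12.0])
print(final_centroids)  # [2.0, 11.0]
print(final_clusters)   # [[1, 2, 3], [10, 11, 12]]
```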

15(b). Compare Linear Regression and Logistic Regression.


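Linear Regression → Predicts continuous values (e.g., a price) as a linear function of the features.
Logistic Regression → Predicts class probabilities by passing a linear function of the features
through the sigmoid function; used for classification.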
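
The contrast can be sketched with a single feature. The weight and bias values below are arbitrary assumptions, not fitted parameters. Both models compute w*x + b, but logistic regression passes the result through the sigmoid so the output lies between 0 and 1.

```python
import math


def sigmoid(z):
    """Map any real number to (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def linear_predict(x, w, b):
    """Linear regression output: an unbounded continuous value."""
    return w * x + b


def logistic_predict(x, w, b):
    """Logistic regression output: estimated probability of the positive class."""
    return sigmoid(w * x + b)


# Arbitrary illustrative parameters.
w, b = 2.0, 1.0
for x in (-3.0, 0.0, 3.0):
    print(x, linear_predict(x, w, b), round(logistic_predict(x, w, b), 4))
```

A class label is then obtained by thresholding the probability, commonly at 0.5.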

Part C (1 × 15 = 15 marks)

16. Case Study: Bank Loan Default Prediction


Business understanding: Predict whether a loan applicant will default (binary classification problem).
Data collection: Past loan records.
Preprocessing: Handle missing values, normalize, encode.
Feature selection: Age, income, credit score.
Apply ML algorithms: Logistic Regression, Decision Tree.
Evaluation: Accuracy, precision, recall, F1, ROC.
Deployment: Integrated into loan approval system.
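
The evaluation step can be sketched by computing the listed metrics from a confusion matrix. The true and predicted labels below are hypothetical (1 = default, 0 = no default); ROC analysis is omitted because it needs predicted probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight loans.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For loan default, recall on the default class often matters more than accuracy, since a missed defaulter is usually costlier than a wrongly flagged applicant.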
