
Solution

This document discusses questions about Python concepts for data science. It contains 10 multiple choice questions covering topics like data types, NumPy arrays, Pandas operations, data wrangling and machine learning models. Correct answers are provided for self-assessment.

Uploaded by

S Akash
Copyright
© All Rights Reserved

Python for Data Science

Week 1

1. What is the output of the following code? [1 mark]

(a) 36
(b) 121212
(c) 123
(d) Error: Invalid operation, unsupported operator ‘*’ used between ‘int’ and ‘str’

Answer: (b)
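
The code for this question is not reproduced in this document. The options suggest it contrasts string repetition with integer multiplication, so the following is only a sketch of that idea; the values "12" and 3 are assumptions chosen to match the options, not the original snippet.

```python
# Hypothetical values; the original snippet is not shown in this document.
text = "12"

# Multiplying a str by an int repeats the string; no arithmetic is performed.
repeated = text * 3

# Converting to int first gives ordinary multiplication instead.
product = int(text) * 3

print(repeated)  # 121212
print(product)   # 36
```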

2. What is the output of the following code? [1 mark]

(a) -1
(b) -2
(c) -1.28
(d) 1.28

Answer: (b)
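
The code for this question is also missing. The options (-1, -2, -1.28, 1.28) are consistent with floor division of a negative quotient, so the sketch below illustrates that behaviour. The operands -32 and 25 are assumptions picked so that the true quotient is -1.28.

```python
# Hypothetical operands; the original snippet is not shown in this document.
numerator, denominator = -32, 25

true_quotient = numerator / denominator    # ordinary division
floor_quotient = numerator // denominator  # rounds towards negative infinity
truncated = int(true_quotient)             # rounds towards zero

print(true_quotient)   # -1.28
print(floor_quotient)  # -2
print(truncated)       # -1
```

Floor division rounds down, not towards zero, which is why a quotient of -1.28 becomes -2 and not -1.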

3. Consider the following code snippet. What is the data type of y? [1 mark]

(a) int
(b) float
(c) str
(d) Code will throw an error.

Answer: (c)

4. Which of the following variable names are INVALID in Python? [1 mark]

(a) 1_variable
(b) variable_1
(c) variable1
(d) variable#

Answer: a, d
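
One way to check the answer is with the built-in str.isidentifier() method. This sketch assumes options (a) and (b) were originally written with underscores (1_variable and variable_1), which appear to have been lost in extraction.

```python
# Candidate names from question 4 (underscores in the first two are assumed).
candidates = ["1_variable", "variable_1", "variable1", "variable#"]

# str.isidentifier() reports whether a string is a syntactically valid name.
validity = {name: name.isidentifier() for name in candidates}

for name, ok in validity.items():
    print(f"{name!r}: {'valid' if ok else 'INVALID'}")
```

A name may contain letters, digits and underscores, but it may not start with a digit.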

5. When naming a variable, using any special character other than the underscore (_)
throws which type of error? [1 mark]

(a) Syntax error


(b) Key error
(c) Value error
(d) Index error

Answer: a
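
A minimal sketch of how this can be observed: compile() parses source text without running it, so a SyntaxError shows up immediately. The helper name raises_syntax_error and the sample names total$ and total_ are made up for illustration.

```python
def raises_syntax_error(source):
    """Return True if Python refuses to parse the given source text."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return True
    return False

# '$' is not allowed in a name, so this assignment cannot be parsed.
with_dollar = raises_syntax_error("total$ = 1")

# The underscore is allowed, so this one parses normally.
with_underscore = raises_syntax_error("total_ = 1")

print(with_dollar, with_underscore)  # True False
```

Note that '#' behaves differently from most special characters because it starts a comment: the line variable# = 1 parses as the bare expression variable followed by a comment.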

6. Let x = “Mayur”. Which of the following commands converts x to the float data type?
[1 mark]

(a) str(float,x)
(b) x.float()
(c) float(x)
(d) Cannot convert a string to float data type

Answer: d
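
The reasoning can be checked directly: float() accepts strings that spell a number, but a name such as "Mayur" does not. The variable names outcome and numeric below are illustrative only.

```python
x = "Mayur"

try:
    float(x)
    outcome = "converted"
except ValueError as err:
    # float() raises ValueError when the text is not a valid number.
    outcome = f"ValueError: {err}"

# By contrast, a string that looks like a number converts without trouble.
numeric = float("3.5")

print(outcome)
print(numeric)  # 3.5
```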

7. Which Python library is commonly used for data wrangling and manipulation? [1
mark]

(a) Numpy
(b) Pandas
(c) scikit
(d) Math

Answer: b

8. Predict the output of the following code. [1 mark]

(a) 12.0
(b) 12
(c) 11.667
(d) 11

Answer: b

9. Given two variables, j = 6 and g = 3.3. If both the normal division and floor division
operators were used to divide j by g, what would be the data types of the values obtained
from the two operations? [1 mark]

(a) int, int


(b) float, float
(c) float, int
(d) int, float

Answer: b
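
A quick check of this answer, using the values given in the question (the names normal and floored are illustrative):

```python
j, g = 6, 3.3

normal = j / g    # true division always returns a float
floored = j // g  # floor division returns a float when either operand is a float

print(type(normal).__name__, type(floored).__name__)  # float float
print(floored)                                        # 1.0
```

Floor division only returns an int when both operands are ints.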

10. Let a = 5 (101 in binary) and b = 3 (011 in binary). What is the result of the following
operation? [1 mark]

(a) 3
(b) 7
(c) 5
(d) 1

Answer: d
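
The operation itself is not shown in this document. As a sketch, the common bitwise operators can be applied to the given values to see which one matches the stated answer; of these, only bitwise AND yields 1.

```python
a, b = 5, 3  # 0b101 and 0b011

results = {
    "a & b": a & b,  # AND: bits set in both      -> 0b001
    "a | b": a | b,  # OR:  bits set in either    -> 0b111
    "a ^ b": a ^ b,  # XOR: bits set in just one  -> 0b110
}

for expression, value in results.items():
    print(f"{expression} = {value} ({value:03b})")
```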

Python for Data Science
Week 2

1. Which of the following objects does not support indexing? [1 mark]

(a) tuple
(b) list
(c) dictionary
(d) set

Answer: d
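
A small sketch that tries the square-bracket syntax on each option. The helper supports_subscript and the sample values are made up for illustration.

```python
samples = {
    "tuple": (10, 20, 30),
    "list": [10, 20, 30],
    "dictionary": {0: "ten"},
    "set": {10, 20, 30},
}

def supports_subscript(obj, key=0):
    """Return True if obj[key] is allowed for this type of object."""
    try:
        obj[key]
    except TypeError:
        return False
    return True

support = {name: supports_subscript(obj) for name, obj in samples.items()}
print(support)
```

Tuples and lists are indexed by position, while a dictionary is looked up by key and not by position. A set is unordered and offers neither.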

2. Given a NumPy array, arr = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]), what is the
output of the command, print(arr[0][1])?

(a) [[1 2 3]
[4 5 6]
[7 8 9]]
(b) [1 2 3]
(c) [4 5 6]
(d) [7 8 9]

Answer: c
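
The indexing can be traced step by step, assuming NumPy is installed. The array has shape (1, 3, 3), so arr[0] selects the single 3 x 3 block and [1] then selects its second row.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])

print(arr.shape)  # (1, 3, 3)

block = arr[0]    # the only 3 x 3 block
row = arr[0][1]   # second row of that block

print(row)        # [4 5 6]

# arr[0, 1] is the equivalent single-step form.
same = np.array_equal(row, arr[0, 1])
```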

3. What is the output of the following code?

(a) [2, 3, 4, 5]
(b) [0 1 2 3]

(c) [1, 2, 3, 4]
(d) Will throw an error: Set objects are not iterable.

Answer: c

4. What is the output of the following code? [1 mark]

(a)

(b)

(c)

(d)

Answer: c

5. Which of the following code snippets give the output My friend’s house is in Chennai?
[1 mark]

(a)

(b)

(c)

(d)

Answer: a, d

6. Let t1 = (1, 2, “tuple”, 4) and t2 = (5, 6, 7). Which of the following will execute
without raising an error? [1 mark]

(a) t1.append(5)
(b) x = t2[t1[1]]
(c) t3 = t1 + t2
(d) t3 = (t1, t2)
(e) t3 = (list(t1), list(t2))

Answer: (b, c, d, e)
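
Each option can be run in turn to confirm the answer. The result names (append_error, t3_concat and so on) are illustrative; the question reuses t3 for every option.

```python
t1 = (1, 2, "tuple", 4)
t2 = (5, 6, 7)

# (a) Tuples are immutable and have no append method.
try:
    t1.append(5)
    append_error = None
except AttributeError as err:
    append_error = err

# (b) t1[1] is 2, so this reads t2[2].
x = t2[t1[1]]

# (c) Concatenation builds a new tuple of length 7.
t3_concat = t1 + t2

# (d) A tuple that holds the two tuples as its elements.
t3_nested = (t1, t2)

# (e) A tuple that holds two lists.
t3_lists = (list(t1), list(t2))

print(x)          # 7
print(t3_concat)  # (1, 2, 'tuple', 4, 5, 6, 7)
```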

7. Let d = {1 : “Python”, 2 : [1, 2, 3]}. Which among the following will execute without
raising an error? [1 mark]

(a) d[2].append(4)

(b) x = d[0]
(c) d[“one”] = 1
(d) d.update({‘one’ : 2})
Answer: (a, c, d)
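
The four options can be run in sequence to confirm this. The name lookup_error is illustrative.

```python
d = {1: "Python", 2: [1, 2, 3]}

# (a) The list stored under key 2 is mutable, so it can be appended to.
d[2].append(4)

# (b) There is no key 0, so the lookup fails with KeyError.
try:
    x = d[0]
    lookup_error = None
except KeyError as err:
    lookup_error = err

# (c) Assigning to a new key adds it; keys may be of mixed types.
d["one"] = 1

# (d) update() overwrites the value of an existing key.
d.update({"one": 2})

print(d)  # {1: 'Python', 2: [1, 2, 3, 4], 'one': 2}
```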
8. Which of the following data types is immutable? [1 mark]
(a) list
(b) set
(c) tuple
(d) dictionary
Answer: (c)
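
A short illustration of the difference, using made-up values: item assignment succeeds on a list but raises TypeError on a tuple.

```python
numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

# Lists are mutable: an element can be replaced in place.
numbers_list[0] = 99

# Tuples are immutable: the same assignment is rejected.
try:
    numbers_tuple[0] = 99
    tuple_error = None
except TypeError as err:
    tuple_error = err

print(numbers_list)   # [99, 2, 3]
print(numbers_tuple)  # (1, 2, 3)
```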
9. Let student = {‘name’: ‘Jane’, ‘age’: 25, ‘courses’: [‘Math’, ‘Statistics’]}.
Which among the following will change student to
{‘name’: ‘Jane’, ‘age’: 26, ‘courses’: [‘Math’, ‘Statistics’], ‘phone’: ‘123-456’}?
(a) student.update({‘age’ : 26})
(b) student.update({‘age’ : 26, ‘phone’: ‘123-456’})
(c) student[‘phone’] = ‘123-456’
student.update({‘age’ : 26})
(d) None of the above
Answer: (b, c)
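
Options (a) to (c) can be compared against the target dictionary as follows. Each option works on its own copy so that the results do not interfere; the names original, target and option_a to option_c are illustrative.

```python
import copy

original = {"name": "Jane", "age": 25, "courses": ["Math", "Statistics"]}
target = {"name": "Jane", "age": 26,
          "courses": ["Math", "Statistics"], "phone": "123-456"}

# (a) Only the age changes; the phone key is never added.
option_a = copy.deepcopy(original)
option_a.update({"age": 26})

# (b) update() changes the existing key and adds the new one in a single call.
option_b = copy.deepcopy(original)
option_b.update({"age": 26, "phone": "123-456"})

# (c) Key assignment adds the phone, then update() changes the age.
option_c = copy.deepcopy(original)
option_c["phone"] = "123-456"
option_c.update({"age": 26})

print(option_a == target, option_b == target, option_c == target)  # False True True
```

Note that update() modifies the dictionary in place and itself returns None, so the comparison is made against the dictionary afterwards. Dictionary equality also ignores key order.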
10. What is the output of the following code? [1 mark]

(a) [‘M’, ‘A’, ‘H’, ‘E’, ‘S’, ‘H’]


(b) [‘m’, ‘a’, ‘h’, ‘e’, ‘s’, ‘h’]
(c) [‘M’, ‘a’, ‘h’, ‘e’, ‘s’, ‘h’]
(d) [‘m’, ‘A’, ‘H’, ‘E’, ‘S’, ‘H’]
Answer: (a)

Python for Data Science

Week 3 assignment

1. Which of the following is the correct approach to fill missing values for a categorical
variable? [1 mark]

(a) Mean
(b) Median
(c) Mode
(d) None of the above

Answer: c
Assume a pandas dataframe df_cars which, when printed, is as shown below. Based on
this information, answer questions 2 and 3.

2. Of the following set of statements, which of them can be used to extract the column
Type as a separate dataframe? [1 mark]

(a) df_cars[[‘Type’]]
(b) df_cars.iloc[[:, 1]
(c) df_cars.loc[:, [‘Type’]]
(d) None of the above

Answer: a, c

3. The method df_cars.describe() describes which of the following columns?
[1 mark]

(a) Car name


(b) Brand
(c) Price (in lakhs)
(d) All of the above

Answer: c
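
The printed dataframe is not reproduced in this document, so the sketch below builds a small stand-in to illustrate questions 2 and 3. The column names come from the answer options; the rows and the column order are invented, and pandas is assumed to be installed.

```python
import pandas as pd

# Hypothetical stand-in for df_cars; the real data is not shown here.
df_cars = pd.DataFrame({
    "Car name": ["Car A", "Car B", "Car C"],
    "Type": ["Hatchback", "Sedan", "SUV"],
    "Brand": ["Brand X", "Brand Y", "Brand Z"],
    "Price (in lakhs)": [5.5, 9.2, 14.0],
})

# A list of labels inside the brackets keeps the result two-dimensional.
by_brackets = df_cars[["Type"]]
by_loc = df_cars.loc[:, ["Type"]]

# A single integer position returns a one-dimensional Series instead.
by_iloc = df_cars.iloc[:, 1]

print(type(by_brackets).__name__, type(by_loc).__name__, type(by_iloc).__name__)

# With default arguments, describe() summarises only the numeric columns.
described_columns = list(df_cars.describe().columns)
print(described_columns)  # ['Price (in lakhs)']
```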

4. Which pandas function is used to stack the dataframes vertically? [1 mark]

(a) pd.merge()
(b) pd.concat()
(c) join()
(d) None of the above

Answer: b

5. Which of the following are libraries in Python? [1 mark]

(a) Pandas
(b) Matplotlib
(c) NumPy
(d) All of the above

Answer: d
Read the ‘flavors of cocoa.csv’ file as a dataframe ‘df_cocoa’ and answer questions 6-9.
The description of features/variables is given below:

Variable            Description
ID                  Serial no.
Company             Name of a manufacturing company
Bean Origin         Place of origin of cocoa bean
Review Date         Year in which chocolates were rated
Cocoa percent       Percentage of cocoa in chocolate
Company Location    Location of a manufacturing company
Rating              Rating of chocolates

6. Which of the following variables has null values? [1 mark]

(a) ID

(b) Company
(c) Review Date
(d) Rating

Answer: c

7. Which of the following countries has the maximum number of locations of cocoa
manufacturing companies? [1 mark]

(a) U.K.
(b) U.S.A.
(c) Canada
(d) France

Answer: b

8. After checking the data summary, which feature requires a data conversion considering
the data values held? [1 mark]

(a) Rating
(b) Review date
(c) Company
(d) Bean origin

Answer: b

9. What is the maximum rating of chocolates? [1 mark]

(a) 1.00
(b) 5.00
(c) 3.18
(d) 4.00

Answer: b

10. What will be the output of the following code? [1 mark]

(a) [bool, int, float, float, str]
(b) [str, int, float, float, str]
(c) [bool, int, float, int, str]
(d) [bool, int, int, float, str]

Answer: a

Python for Data Science

Week 4 assignment

1. Which of the following are regression problems? Assume that appropriate data is given.
[1 mark]

(a) Predicting the house price.


(b) Predicting whether it will rain or not on a given day.
(c) Predicting the maximum temperature on a given day.
(d) Predicting the sales of the ice-creams.

Answer: a, c, d

2. Which of the following are binary classification problems? [1 mark]

(a) Predicting whether a patient is diagnosed with cancer or not.


(b) Predicting whether a team will win a tournament or not.
(c) Predicting the price of a second-hand car.
(d) Classify web text into one of the following categories: Sports, Entertainment, or
Technology.

Answer: a, b

3. If a linear regression model achieves zero training error, can we say that all the data
points lie on a hyperplane in the (d+1)-dimensional space? Here, d is the number of
features. [1 mark]

(a) Yes
(b) No

Answer: a
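
A numerical sketch of this statement for d = 2, assuming NumPy is installed. The coefficients 1.5, 2.0 and -3.0 and the random inputs are made up; the targets are generated exactly on a plane in 3-dimensional space, and an ordinary least squares fit then recovers that plane with zero training error up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 points with d = 2 features (hypothetical data).
X = rng.normal(size=(20, 2))

# Targets lie exactly on the plane y = 1.5 + 2.0*x1 - 3.0*x2.
y = 1.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Add a column of ones for the intercept and solve least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Mean squared error on the training data.
training_error = float(np.mean((A @ coef - y) ** 2))

print(np.round(coef, 6))
print(training_error)
```

The converse is the point of the question: zero training error means every target equals the model's prediction, so each point (x, y) satisfies the fitted linear equation and therefore lies on that hyperplane.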
Read the information given below and answer the questions from 4 to 6:

Data Description:

An automotive service chain is launching its new grand service station this weekend.
They offer to service a wide variety of cars. The current capacity of the station is to
check 315 cars thoroughly per day. As an inaugural offer, they claim to freely check all
cars that arrive on their launch day, and report whether they need servicing or not!

Unexpectedly, they get 450 cars. The servicemen will not work longer than the working
hours, but the data analysts have to!

Can you save the day for the new service station?

How can a data scientist save the day for them?

He has been given a data set, ‘ServiceTrain.csv’, that contains some attributes of the
car that can be easily measured and a conclusion on whether a service is needed or not.

Now for the cars they cannot check in detail, they measure those attributes and store
them in ‘ServiceTest.csv’.

Problem Statement:

Use machine learning techniques to identify whether the cars require service or not.

Read the given datasets ‘ServiceTrain.csv’ and ‘ServiceTest.csv’ as train data
and test data respectively and import all the required packages for analysis.

4. Which of the following machine learning techniques would NOT be appropriate to solve
the problem given in the problem statement? [1 mark]

(a) kNN
(b) Random Forest
(c) Logistic Regression
(d) Linear regression

Answer: d
Prepare the data by following the steps given below, and answer questions
5 and 6.

• Encode the categorical variable Service (Yes as 1 and No as 0) for both the train and
test datasets.
• Split the set of independent features and the dependent feature on both the train
and test datasets.
• Set random state for the instance of the logistic regression class as 0.

5. After applying logistic regression, what is/are the correct observations from the
resultant confusion matrix? [1 mark]

(a) True Positive = 29, True Negative = 94


(b) True Positive = 94, True Negative = 29
(c) False Positive = 5, True Negative = 94
(d) None of the above

Answer: a, c

6. The logistic regression model built between the input and output variables is checked
for its prediction accuracy of the test data. What is the accuracy range (in %) of the
predictions made over test data? [1 mark]

(a) 60 - 79
(b) 90 - 95
(c) 30 - 59
(d) 80 - 89

Answer: b

7. How are categorical variables preprocessed before model building? [1 mark]

(a) Standardization
(b) Dummy variables
(c) Correlation
(d) None of the above

Answer: b
The Global Happiness Index report contains the Happiness Score data with
multiple features (namely the Economy, Family, Health, and Freedom) that
could affect the target variable value.

Prepare the data by following the steps given below, and answer question 8.

• Split the set of independent features and the dependent feature on the given
dataset.
• Create training and testing data from the set of independent features and dependent
feature by splitting the original data in the ratio 3:1 respectively, and set the
value for random state of the training/test split method’s instance as 1.

8. A multiple linear regression model is built on the Global Happiness Index dataset
‘GHI Report.csv’. What is the RMSE of the baseline model? [1 mark]

(a) 2.00
(b) 0.50
(c) 1.06
(d) 0.75

Answer: c

9. A regression model with the following function y = 60 + 5.2x was built to understand
the impact of humidity (x) on rainfall (y). The humidity this week is 30 more than
the previous week. What is the predicted difference in rainfall? [1 mark]

(a) 156 mm
(b) 15.6 mm
(c) -156 mm
(d) None of the above

Answer: a
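
The arithmetic behind this answer: the intercept cancels when two predictions are subtracted, so the difference depends only on the slope and the change in humidity, 5.2 × 30 = 156. The sketch below checks this; the function name predict_rainfall and the starting humidity of 40 are made up for illustration.

```python
import math

def predict_rainfall(humidity):
    """Rainfall (y) predicted by the model y = 60 + 5.2x from the question."""
    return 60 + 5.2 * humidity

last_week = 40.0            # hypothetical humidity; any value gives the same difference
this_week = last_week + 30  # humidity is 30 more than the previous week

difference = predict_rainfall(this_week) - predict_rainfall(last_week)
slope_times_change = 5.2 * 30

print(round(difference, 2))          # 156.0
print(round(slope_times_change, 2))  # 156.0
```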

10. X and Y are two variables that have a strong linear relationship. Which of the following
statements are incorrect? [1 mark]

(a) There cannot be a negative relationship between the two variables.


(b) The relationship between the two variables is purely causal.
(c) One variable may or may not cause a change in the other variable.
(d) The variables can be positively or negatively correlated with each other.

Answer: a, b
