Practice Problem Set 1 DS Course

The document contains statistical analyses and calculations related to various datasets, including mean, variance, standard deviation, and probabilities. It covers topics such as Chebyshev's theorem, binomial distributions, and Poisson distributions, providing insights into data distributions and expected values. Additionally, it discusses the implications of different statistical measures and their applications in real-world scenarios.

Q1 Range

SD
IQR

Q2 Mean 50
Variance 16
SD 4

a) Between 40 and 60
z 2.5
From Chebyshev's theorem 1 - 1/z^2 = 0.84
At least 84% of the data lies between 40 and 60, i.e. at least 84000 students

b) Greater than 55
Between 45 and 55
z 1.25
From Chebyshev's theorem 1 - 1/z^2 = 0.36
At least 36% of the data lies between 45 and 55
So at most 64% of the data can lie beyond 55
i.e. at most 64000 students can lie beyond 55
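The two Chebyshev bounds in Q2 can be checked with a short sketch. The helper name `chebyshev_within` is hypothetical and only illustrates the 1 - 1/z^2 rule for an interval centred on the mean.

```python
def chebyshev_within(mean, sd, lower, upper):
    """Chebyshev lower bound on the share of data inside [lower, upper].

    Uses the distance from the mean to the nearer edge, measured in SDs.
    """
    z = min(mean - lower, upper - mean) / sd
    if z <= 1:
        return 0.0  # the bound carries no information for z <= 1
    return 1.0 - 1.0 / z**2


within_40_60 = chebyshev_within(50, 4, 40, 60)  # z = 2.5
within_45_55 = chebyshev_within(50, 4, 45, 55)  # z = 1.25
beyond_55_at_most = 1.0 - within_45_55          # upper bound used in part b)

print(within_40_60, within_45_55, beyond_55_at_most)
```

The bound in part b) is deliberately loose: Chebyshev only limits the total share outside 45 to 55, so that whole share is used as the ceiling for values above 55.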

Q3 a) Total rice production 114318.7


No. of districts with positive production 281
Mean rice production in districts with positive production 406.82811

b) SD 546.11015
Range 4131.43

c) Mean 2249.72
SD 1149.2802

z 2.3930456
From Chebyshev's theorem 1 - 1/z^2 = 0.8253784
So at most 17.46% of the data can lie beyond 5000
i.e. at most 54 districts can lie beyond 5000

d) Average of individual yields ≠ Total crop production / Total crop area

Q4 0

Q5 meanC 25
varC 5

meanF 77
varF 16.2
Q6 X Runs in a ball

Runs (per ball) Probability
0 0.3
1 0.3
2 0.2
3 0.05
4 0.1
5 0.005
6 0.045

a) P(X <= 2) 0.8

b) E[X] 1.545
6*E[X] 9.27

c) Distribution under new conditions
X X runs in a ball
W batsman getting out
P(X) = P(X|W)*P(W) + P(X|W')*P(W')
X=0 1*0.05+0.3*0.95
X=1 0*0.05+0.3*0.95

Runs (per ball) Probability
0 0.335
1 0.285
2 0.19
3 0.0475
4 0.095
5 0.00475
6 0.04275

P(X <= 2) 0.81
E[X] 1.46775
6*E[X] 8.8065
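A minimal sketch of the Q6 calculation, using the probabilities listed above and the 0.05 chance of getting out on a ball. The helper names `expected` and `cdf` are illustrative, not part of the original sheet.

```python
# Runs-per-ball distribution from part a) and b)
base = {0: 0.3, 1: 0.3, 2: 0.2, 3: 0.05, 4: 0.1, 5: 0.005, 6: 0.045}
p_out = 0.05  # P(W): batsman gets out on the ball, which yields 0 runs


def expected(dist):
    return sum(x * p for x, p in dist.items())


def cdf(dist, k):
    return sum(p for x, p in dist.items() if x <= k)


# Law of total probability: P(X) = P(X|W)*P(W) + P(X|W')*P(W')
new = {x: (1 - p_out) * p for x, p in base.items()}
new[0] += p_out  # P(X=0 | W) = 1

print(cdf(base, 2), expected(base), 6 * expected(base))
print(cdf(new, 2), expected(new), 6 * expected(new))
```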

Q7 A Cycle from A
B Cycle from B
C Cycle from C

D Defective cycle

P(A) 0.75
P(B) 0.15
P(C) 0.1

P(D|A) 0.04
P(D|B) 0.06
P(D|C) 0.08

P(A|D) = P(D|A)*P(A) / P(D)

P(D) = P(D|A)*P(A) + P(D|B)*P(B) + P(D|C)*P(C)


P(D) 0.047

P(A|D) 0.6382979
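The Bayes step in Q7 can be reproduced as follows. This is a sketch with the figures given above; the dictionary names are illustrative.

```python
# Share of cycles from each supplier and their defect rates (from Q7)
prior = {"A": 0.75, "B": 0.15, "C": 0.10}
defect_rate = {"A": 0.04, "B": 0.06, "C": 0.08}

# Total probability of a defective cycle
p_defect = sum(prior[s] * defect_rate[s] for s in prior)

# Bayes' theorem: P(supplier | defective)
posterior = {s: prior[s] * defect_rate[s] / p_defect for s in prior}

print(p_defect, posterior["A"])
```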

Q8 p 0.05
n 10
P(x>=2, n = 10)
Binomial distribution
P(x>=2, n = 10) = 1 - P(x<=1,n=10)
0.0861384
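The complement rule used in Q8 can be checked with the standard library. `binom_pmf` is a hypothetical helper name.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.05
# P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_at_least_2 = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)
print(p_at_least_2)
```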

Q9 P result positive
N result negative

C with CAD
C' without CAD

P(P|C) 0.95
P(P|C') 0.1

P(N|C) 0.05
P(N|C') 0.9

p P(C)

P(C|P)= P(P|C)*P(C)/P(P)

P(P) = P(P|C)*P(C) + P(P|C')*P(C')

0.75 = 0.95*p/(0.95*p + 0.1*(1-p) )

p 0.24

Q10 It will cancel the shipment if the two defects occur at items (1st,5th), (2nd,5th), … (4th,5th)


p defect 0.05
p not defect 0.95
0.0428688
P(one defect in first four and one defect in fifth item) 0.0085738 0.0085738

Q11 A Project received go-ahead from compliance


B Project got approved by Data science

P(A) 0.5
P(B|A) 0.6666667
P(B|A') 0.3333333

a) P(B) = P(B|A)*P(A) + P(B|A')*P(A')


P(B) 0.5

b) 0.5

c) P(A|B) = P(B|A)*P(A)/P(B)
P(A|B) 0.6666667
P(x=1, n = 2, p = 2/3) 0.4444444

Q12 a) Option 1 Invested Mean Var


A 600 630 720 Mean of total worth of investment
B 400 424 480 Variance

Option 2 Invested Mean Var


A 400 420 320 Mean of total worth of investment
B 600 636 1080 Variance

b) P(1100<X<1500)=P(X<1500)-P(X<1100)
P(1100<X<1500) 0.1301805

c) P(1100<X<1500)=P(X<1500)-P(X<1100)
P(1100<X<1500) 0.1544788

d) Y = Option 1 - Option 2
Mean -2
Variance 3540.6041

P(Y>0) = 1 - P(Y<0)
P(Y>0) 0.4865934

Q13 Weight Mean SD


A 0.4 59 10
B 0.4 67 13
C 0.2
overall 65 7

overall mean = weighted average of all means


65 = 0.4*59 + 0.4*67 + 0.2*mu(C)
mu(C) 73

overall variance = 0.16*var(A) + 0.16*var(B) + 0.04*var(C)


var(C) 149
std(C) 12.206556
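The two equations in Q13 can be solved for the mean and SD of C as sketched below. It follows the sheet's own variance formula (squared weights), which treats the overall score as a weighted sum of independent components; the variable names are illustrative.

```python
weights = {"A": 0.4, "B": 0.4, "C": 0.2}
mean = {"A": 59, "B": 67}
sd = {"A": 10, "B": 13}
overall_mean, overall_sd = 65, 7

# overall mean = sum(w_i * mu_i)  ->  solve for mu(C)
mu_c = (overall_mean - sum(weights[k] * mean[k] for k in mean)) / weights["C"]

# overall variance = sum(w_i^2 * var_i)  ->  solve for var(C)
var_c = (overall_sd**2 - sum(weights[k] ** 2 * sd[k] ** 2 for k in sd)) / weights["C"] ** 2
std_c = var_c**0.5

print(mu_c, var_c, std_c)
```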

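Q14 a The plot cannot appropriately show the minor change from year to year. A similar plot of the yearly average from year to year would have shown the increase in global temperature.
b It shows that the 2015 temperature is higher than the average, but it is not enough to conclude that the global temperature is rising.
c No. This plot does not talk about the shape of the distribution. It rather shows that there is a seasonal pattern across different months.
d Mean: Assuming that for each state the mean is calculated using the same number of observations, the overall mean can be computed and it is 24.23.
Median: the overall median cannot be computed without knowing the distribution of the temperature.
Standard deviation: It can be computed if we assume that the number of observations per state is equal and if the observations in the three states at a given time point are independent. In that case, the overall standard deviation will be 2.52. The independence assumption is not valid since temperatures in neighbouring states are usually dependent on each other.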
57.1227
2.5193187

Q15 L Person has low viral load


P Person tests Positive
H Person has high viral load
D Person doesn't have covid

P(P|L) 0.6
P(P|H) 0.85
P(P'|D) 0.995 P(P|D) 0.005
P(D') 0.2 P(D) 0.8
P(P) 0.144

a) P(L|D') = P(L∩D')/P(D') = P(L)/P(D')

P(P) = P(P|D)*P(D) + P(P|L)*P(L) + P(P|H)*P(H)


P(L)+P(H)+P(D) = 1

P(L) 0.12
P(H) 0.08

P(L|D') 0.6

b) p P(P|L) 0.6
n 50

np 30 > 5
n(1-p) 20 > 5

Binomial approximated as normal


mean 30
std 3.4641016 z = (30.5 - mean)/std 0.1443376
P(X<30) P(X<=29.5)
P(X<=30.5) 0.557383

c) p P(P|H) 0.85
n 50

np 42.5 > 5
n(1-p) 7.5 > 5

Binomial approximated as normal


mean 42.5
std 2.5248762
P(X<=40.5) 0.2141459
P(X>40.5) 0.7858541
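The normal approximations in Q15 b) and c) can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+). The helper name `normal_approx_cdf` is hypothetical; it applies the same continuity correction (k + 0.5) that the sheet uses.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) for X ~ Binomial(n, p) with a continuity correction."""
    mean = n * p
    std = sqrt(n * p * (1 - p))
    return NormalDist(mean, std).cdf(k + 0.5)


# b) low viral load: p = 0.6, n = 50, P(X <= 30.5)
p_b = normal_approx_cdf(30, 50, 0.6)

# c) high viral load: p = 0.85, n = 50, P(X <= 40.5) and its complement
p_c_le = normal_approx_cdf(40, 50, 0.85)
p_c_gt = 1 - p_c_le

print(p_b, p_c_le, p_c_gt)
```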

d) 72

Q16 Mean 45000


SD 5000

a) P(S>55000) = 1 - P(S<=55000)
P(S>55000) 0.0227501

b) p P(S>60000) 0.0013499
n 3
P(W=1, S>60000) 0.0040388

c) p P(40000<S<50000) 0.6826895
n 200

np 136.5379 > 5
n(1-p) 63.462102 > 5

Binomial approximated as normal


mean 136.5379
std 6.582166

P(X>156.5)=1-P(X<=156.5) 0.0012117

d) Mean 90000
SD 7071.0678

P(S>82929)=1-P(S<=82929) 0.8413424

Q17 Mean 0.4

Poisson distribution

a) P(x<=10) 0.5830398
P(x>10) 0.4169602

b) P(x<3) = Pr(x<=2) 0.2381033

c) P(x=0) 0.0608101

Q18 Working hours per day 18


Average packages processed per day 360

Category Weight (lbs) Largest Diagonal (inches) Transport
Small All except Droner and Bulky All except Droner and Bulky Bicycle
Droner 2-4 12-24 Drones
Bulky Bigger and/or heavier than drone Bigger and/or heavier than drone Vans

Weight: exponential distribution with mean = 6 lbs

Largest Diagonal: uniform distribution between 6 and 36 inches

a) Average packages processed per hour 20


Poisson distribution
P(X>=15) = 1-P(X<15) 0.8951357

b) P(2<W<4) 0.2031142
P(12<D<24) 0.4

p P(Package = Droner) 0.0812457


n 15

P(T<=3) 0.9712726

c) P(W>4) 0.5134171 P(W<4) 0.4865829


P(D>24) 0.4 P(D<24) 0.6

p P(Package=Bulky) 0.7080503
n 15

P(T>=11)=1-P(T<11) 0.5432345
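The Q18 figures can be rebuilt from the stated distributions. The sketch below assumes, as the sheet's multiplication implies, that weight and largest diagonal are independent; the helper names are illustrative.

```python
from math import comb, exp, factorial


def poisson_cdf(k, lam):
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# a) 20 packages per hour on average, P(X >= 15) = 1 - P(X <= 14)
p_at_least_15 = 1 - poisson_cdf(14, 20)

# b) Droner: weight in (2, 4) lbs AND diagonal in (12, 24) inches
p_w_2_4 = exp(-2 / 6) - exp(-4 / 6)       # exponential, mean 6 lbs
p_d_12_24 = (24 - 12) / (36 - 6)          # uniform on (6, 36)
p_droner = p_w_2_4 * p_d_12_24
p_at_most_3_droner = binom_cdf(3, 15, p_droner)

# c) Bulky: weight > 4 lbs OR diagonal > 24 inches
p_w_lt_4 = 1 - exp(-4 / 6)
p_d_lt_24 = (24 - 6) / (36 - 6)
p_bulky = 1 - p_w_lt_4 * p_d_lt_24
p_at_least_11_bulky = 1 - binom_cdf(10, 15, p_bulky)

print(p_at_least_15, p_droner, p_at_most_3_droner, p_bulky, p_at_least_11_bulky)
```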
Q12 a) Option 1: mean of total worth of investment 1054, variance 1670.302
Q12 a) Option 2: mean of total worth of investment 1056, variance 1870.302

Q14 notes: a plot of the yearly average from year to year would have shown the increase in global temperature; the monthly plot shows a seasonal pattern across different months. With the same number of observations per state, the overall mean is 24.23. If the observations in the three states were independent, the overall standard deviation would be 2.52, but that assumption is not valid since temperatures in neighbouring states are usually dependent on each other.
x f x-u sqr f*sqr
0 0 -3.5104 12.32290816 0
1 0 -2.5104 6.30210816 0
2 0.2 -1.5104 2.28130816 0.456261632
3 0.32 -0.5104 0.26050816 0.083362611
4 0.288 0.4896 0.23970816 0.06903595
5 0.1536 1.4896 2.21890816 0.340824293
6 0.0384 2.4896 6.19810816 0.238007353

sum(f*sqr) 1.18749184
expected 3.5104
std dev sqrt(sum) 1.089720992
X Prob (X-u) sqr Prob * sqr
1 0.1 -7.5 56.25 5.625
2 0.2 -6.5 42.25 8.45
8 0.4 -0.5 0.25 0.1
16 0.3 7.5 56.25 16.875

part b) expected value 8.5


std dev 5.572252686
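The expected value and standard deviation in the table above follow directly from the definition. A minimal sketch, with `mean_and_sd` as a hypothetical helper name:

```python
def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(p * (x - mu) ** 2 for x, p in dist.items())
    return mu, var**0.5


mu, sd = mean_and_sd({1: 0.1, 2: 0.2, 8: 0.4, 16: 0.3})
print(mu, sd)
```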

part c) Here we are talking about the time it takes to complete a project
Therefore a random variable denoting the time of completion will be exponential in nature

so let Xi be the random variable denoting the time it takes to develop the i-th algorithm

since n = 100 > 30, we can use the CLT here

let Y be the sum of all Xi

Y will be approximately normal, by the CLT

n 100
mean of Y 850
std dev of Y 55.72252686

P(sum < 1000) => P(Y < 1000)

here we have not used any continuity correction since we are transforming a continuous distribution to a continuous one

part b) working: sum(Prob * sqr) 31.05
std dev 5.572252686


mean 2.5 mins/cust

a) mean for 40 customers 100 mins

so we need to find P(X>120) where X is a random variable denoting the time it takes for zoobi to reach

P(X>120) = e ^ ( - x / u ) 0.3011942119

b) p P(X>2.5) 0.367879441
n 40

P(X<=20) where X is a random variable denoting the number of people who take more than 2.5 mins
this is a binomial
Check np and n(1-p)
approx to normal
find value

P(X<=20) => P(Y<=20.5)


Lalbagh School of Business

Background Share Placement Share


Engineering 60% Consulting 36%
Non-engineer 40% GenMan 64%

Conventions >>> Event Probability


Engineering E P(E) 0.6
Non-engineer NE P(NE) 0.4
Consulting C P(C) 0.36
GenMan G P(G) 0.64
WorkEx W P(G|NE & NW) 0.9
No WorkEx NW P(C|NE & NW) 0.1
P(C|E & W) 0.5
P(G|E & W) 0.5
P(G|E & NW) 4*P(C|E & NW)
P(G|E & NW) 0.8
P(C|E & NW) 0.2
P(C|NE & W) 1.5*P(G|NE & W)
P(C|NE & W) 0.6
P(G|NE & W) 0.4
P(NW|NE & C) 0.1

A) P(W) (x+y)/100 0.5
B) P(NW & G) (0.8*(60-x)+0.9*(40-y))/100 0.415
P(G|NW) 0.83
P(G|W) 0.45
not having work experience increases chances of getting into genman role

Total 100

Engineering (60) Non-engineering (40)
W NW W NW
C 0.5*x 0.2*(60-x) 0.6*y 0.1*(40-y)
G 0.5*x 0.8*(60-x) 0.4*y 0.9*(40-y)
Total x 60-x y 40-y

P(C) 0.5*x + 0.2*(60-x) + 0.6*y + 0.1*(40-y) = 36

P(NW|NE & C) 0.1*(40-y)/(0.6*y + 0.1*(40-y)) = 1/11

Solving these 2 equations

x 25
y 25

Engineering (60) Non-engineering (40)
W NW Total W NW Total
C 12.5 7 19.5 15 1.5 16.5
G 12.5 28 40.5 10 13.5 23.5
Total 25 35 25 15
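The two equations for x (engineers with work experience) and y (non-engineers with work experience) are linear once the fraction is cleared, so they can be solved by hand or with the short sketch below. Variable names are illustrative.

```python
# P(NW | NE & C) = 1/11:
#   11 * 0.1*(40 - y) = 0.6*y + 0.1*(40 - y)  ->  (40 - y) = 0.6*y  ->  y = 40 / 1.6
y = 40 / 1.6

# P(C) = 36%:
#   0.5*x + 0.2*(60 - x) + 0.6*y + 0.1*(40 - y) = 36  ->  solve for x
x = (36 - 0.2 * 60 - 0.6 * y - 0.1 * (40 - y)) / (0.5 - 0.2)

p_w = (x + y) / 100
p_nw_and_g = (0.8 * (60 - x) + 0.9 * (40 - y)) / 100
p_g_given_nw = p_nw_and_g / (1 - p_w)
p_g_given_w = (0.5 * x + 0.4 * y) / (x + y)

print(x, y, p_w, p_nw_and_g, p_g_given_nw, p_g_given_w)
```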
lambda 2

part a) at least 2
P(X>=2) = 1 - P(X=1) - P(X=0) 0.59399415

part b) p 0.59399415
n 31
x 18

since np and n(1-p) both are greater than 5, we can approximate as normal

new mean 18.41381866


new std dev 2.734249091

P(X=18) => P(17.5<=Y<=18.5)


find Z, and calc probability

part c)
P(X=0) 0.135335283 p

p^4 * (1-p) 0.000290063


x prob x-u sqr f(x) * sqr
0 0.82 -0.36 0.1296 0.106272
1 0.05 (a) 0.64 0.4096 0.02048
2 0.1 (b) 1.64 2.6896 0.26896
3 0.01 (c) 2.64 6.9696 0.069696
4 0.02 (d) 3.64 13.2496 0.264992
sum (variance) 0.7304

b + d = 2(a + c)
b = 2a
a + b + c + d = 0.18

part a) expected value 0.36
part b) std dev 0.854634425

part c) n 7
p 0.1

np 0.7
(np(1-p)) ^ 0.5 0.793725393
lambda 1 iphone/6 hours

part a) P(not selling in first three hours) * P(selling 2 in last six hours)

lambda 0.5 lambda 1


x 0 x 2
prob 0.60653066 prob 0.183939721

answer 0.11156508
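Part a) multiplies two Poisson probabilities over non-overlapping intervals, which is valid because counts in disjoint intervals of a Poisson process are independent. A minimal sketch, with `poisson_pmf` as a hypothetical helper:

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)


rate_per_hour = 1 / 6  # 1 iphone per 6 hours

p_none_first_3h = poisson_pmf(0, rate_per_hour * 3)  # lambda = 0.5
p_two_last_6h = poisson_pmf(2, rate_per_hour * 6)    # lambda = 1
answer = p_none_first_3h * p_two_last_6h

print(p_none_first_3h, p_two_last_6h, answer)
```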

part b) let Xi be the random variable for the number of iphones sold on day i

n = 31, as n > 30 we can apply CLT
lambda 1 iphone/6 hours
1.5 iphone/day

sum of Xi needs to be greater than 45

let Y be a random variable denoting the sum of Xi; Y is approximately normal because of the CLT

mean of Y n * mean 46.5

std dev Y sqrt(n) * std dev(poisson) sqrt(46.5)

P(sum>45) => P(Y>=45.5) => 1 - P(Y <= 45.5)

part c) p 0.333333333 the probability of an iphone being sold is uniform over the day because the distribution for all hours remains the same
n 200

this is a binomial distribution

approximate this as normal by checking np and n(1-p)
calculate new stats
use them to find probability

P(X<60) => P(Y<=59.5)
100000 price

part b) 2 log(price)

4.4 mean
0.49 variance
0.7 std dev

P(X>=100000) 0.999696617

part c) apply Chebyshev's theorem


hardware total 80 software total
mean 48 mean
std dev 4 std dev

Part a) Total T 6/8 H + 4/6 S


mean 64
variance 7

Given: H 52
Given: S 45
T= 69
z 1.889822365
prob 0.970609139

Part b) Var(T) = (6/8)^2 Var(H) + (4/6)^2 Var(S) + 2 (6/8)(4/6) Cov(H, S)

Cov(H, S) -6

Y H-S

u 6
Var 37
std dev 6.08276253

x 0
P(Y<=0) 0.161969923
answer 0.838030077
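The hardware/software calculation can be retraced as below. The sketch treats Var(T) = 7 as given (as the sheet does), assumes H and S are jointly normal, and uses the software figures mean 42 and std dev 3 from the sheet; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

a, b = 6 / 8, 4 / 6            # weights of H and S in T
mean_h, sd_h = 48, 4
mean_s, sd_s = 42, 3
var_t = 7                      # taken as given

# Part a) P(T <= t) for H = 52, S = 45
mean_t = a * mean_h + b * mean_s
t_obs = a * 52 + b * 45
z = (t_obs - mean_t) / sqrt(var_t)
p_below = NormalDist().cdf(z)

# Part b) back out Cov(H, S) from Var(T), then P(H > S) via Y = H - S
cov_hs = (var_t - a**2 * sd_h**2 - b**2 * sd_s**2) / (2 * a * b)
var_y = sd_h**2 + sd_s**2 - 2 * cov_hs
p_h_gt_s = 1 - NormalDist(mean_h - mean_s, sqrt(var_y)).cdf(0)

print(mean_t, t_obs, z, p_below, cov_hs, var_y, p_h_gt_s)
```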
software total 60
software mean 42
software std dev 3
mean 2 (as 50% people do within this)

Part A) it is given that within 1.5 weeks 15.87% people book

z -0.99981509
x 1.5
u 2

sigma 0.50009247
this is sigma of population

Part B) n 45
s 0.50009247
x-bar 2
x 3
z 1.999630187
prob 0.977229894

answer 0.190644939

Here it is important to note that even though we are given the sample size, we are not talking about the probability on average. If that was the case, we would have taken sigma as sigma / sqrt(n). Since we are not talking about average time, we continue with the population number.
In short, only when talking about the average of a sample taken from the population, use sigma = sigma / sqrt(n).
Year Count Average Funding Std Dev Total Funding
2015 640 9.72 25.9 6220.8
2016 586 6.53 19.6 3826.58
2017 453 11.3 34 5118.9
2018 263 17.2 39.9 4523.6
2019 101 49.1 87.5 4959.1
2020 7 55.7 82.9 389.9
Total 2050 12.21 25038.88

A)
We can see total funding has decreased

B) 12.21 Estimated funding


12.22

C)
X mean SD X-mean (X-mean)/SD
70 13 37 57 1.54

P[(X-mean)/SD > c] <= 1/c^2

1/1.54^2 0.421360419
Therefore, definitely less than 42%

City Count Average Funding Std Dev Total Funding
Bengaluru 575 13 37 7475
NCR region 559 12 36.9 6708
Mumbai 397 10.9 27.5 4327.3
Other cities 519 12.6 37.7 6539.4
Total 2050 12.22 25049.7
accepted papers N(300,100)
rejected papers Normal
P(review time<200 | rejected) = 2*P(review time<200 | accepted)
P(review time>250 | rejected) = 0.05
P(review time<=250 | rejected) = 0.95
P(review time > 250) = 0.25

A)
P(review time<=250 | rejected) = 0.95

P(X<=t) z-value
0.95 1.645 P(Z <= 1.645) = 0.95

P(review time<200 | rejected) = 2*P(review time<200 | accepted)

P(review time<200 | accepted)


x mean SD z P(Z<=z)
200 300 100 -1 0.159

P(T<200 | A) 0.159
P(T<200 | R) 0.317

B)
P(T>250) = P(T>250|A)*P(A) + P(T>250|R)*P(R) = 0.25
P(A) p
P(R) 1-p
Prob x
P(T>250|A) 250
P(T>250|R) 250

p*0.691 + (1-p)*0.05 = 0.25


p 0.312
1-p 0.688

C) P(A|T>250)

P(A|T>250) = P(T>250|A)*P(A)/P(T>250)
0.862

(x-mean)/SD = z
x mean SD z
250 M S 1.645
200 M S -0.475

(250-M)/S = 1.645 250-M = 1.645S


(200-M)/S = -0.475 200-M = -0.475S
Solving these equation gives
M 211.21
S 23.58

mean SD z probability
300 100 -0.500 0.691
211.21 23.58 1.645 0.050
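The review-time problem can be retraced end to end with `statistics.NormalDist`. The sketch reads N(300,100) as mean 300 and SD 100, as the sheet's z-values do. Because it uses unrounded z-values, the results agree with the sheet's M = 211.21 and S = 23.58 only to about two decimals.

```python
from statistics import NormalDist

std_normal = NormalDist()
accepted = NormalDist(300, 100)

# A) Two quantile conditions on the rejected-paper distribution N(M, S)
p_lt200_rejected = 2 * accepted.cdf(200)        # about 0.317
z_250 = std_normal.inv_cdf(0.95)                # about 1.645
z_200 = std_normal.inv_cdf(p_lt200_rejected)    # about -0.475

# (250 - M)/S = z_250 and (200 - M)/S = z_200  ->  subtract to get S, then M
S = (250 - 200) / (z_250 - z_200)
M = 250 - z_250 * S

# B) Total probability: p*P(T>250|A) + (1-p)*0.05 = 0.25  ->  solve for p = P(A)
p_gt250_accepted = 1 - accepted.cdf(250)
p_accept = (0.25 - 0.05) / (p_gt250_accepted - 0.05)

# C) Bayes: P(A | T > 250)
p_accept_given_gt250 = p_gt250_accepted * p_accept / 0.25

print(M, S, p_accept, p_accept_given_gt250)
```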
seconds >>> prob of correct counting
n mean p
1000 6 0.99

T ~ Exp(lambda)
mean 1/lambda 6.0000
lambda 1/mean 0.1667

A) P(T>t) = e^(-lambda*t)

P(T>9) 0.2231

B) 36 random variables following exponential distribution

n 36 RV
mean 6.0000
SD 1

CLT: Sample Average is normally distributed

C) (1000C10) * (p^(1000-10)) * ((1-p)^10)

0.1257
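Parts A) and C) can be checked directly: the exponential tail for P(T>9) and the exact binomial term for exactly 10 incorrect counts out of 1000. Variable names are illustrative.

```python
from math import comb, exp

mean_time = 6.0            # seconds
lam = 1 / mean_time        # rate of the exponential distribution

# A) P(T > t) = e^(-lambda * t)
p_longer_than_9 = exp(-lam * 9)

# C) exactly 10 of 1000 counts wrong, each correct with probability 0.99
n, p_correct = 1000, 0.99
p_exactly_10_wrong = comb(n, 10) * p_correct ** (n - 10) * (1 - p_correct) ** 10

print(p_longer_than_9, p_exactly_10_wrong)
```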
mean 1/lambda 6.0000
SD = mean (exponential distribution) 6.0000
Units >>>
Time Taken 50 mins
Visitor Rate 48 per hour
Capacity 10 people

A)
P(X<10)

lambda 0.8 visitors per min


lambda_10 min 8 visitors per 10 min
X ~ Poisson(8)

P(X<10) = P(X<=9) Probability of not being full

p 0.717

B) Probability of first 2 being not full = 0.717^2


Probability of being full = 1-0.717
Ans 0.146

C) At least 6 10C6 * p^6 * (1-p)^4


10C7 * p^7 * (1-p)^3
10C8 * p^8 * (1-p)^2
10C9 * p^9 * (1-p)^1
10C10 * p^10 * (1-p)^0
Sum of all

D) P(4th & 5th being full |first 3 were not full) = P(4th & 5th being full) = (1-p)^2
As both are independent events

Ans 0.080
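The answers above can be reproduced from the Poisson arrival rate. The sketch assumes, as the sheet does, that trips are independent and that part C) refers to 10 trips; variable names are illustrative.

```python
from math import comb, exp, factorial

lam = 48 / 60 * 10  # 48 visitors per hour -> 8 visitors per 10-minute wait

# A) trip is not full when fewer than 10 visitors arrive: P(X <= 9)
p = sum(exp(-lam) * lam**k / factorial(k) for k in range(10))

# B) first two trips not full, third one full
p_first_full_is_third = p**2 * (1 - p)

# C) at least 6 of 10 trips not full
p_at_least_6_not_full = sum(comb(10, k) * p**k * (1 - p) ** (10 - k) for k in range(6, 11))

# D) 4th and 5th full, independent of the first three
p_4th_5th_full = (1 - p) ** 2

print(p, p_first_full_is_third, p_at_least_6_not_full, p_4th_5th_full)
```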
Arrival Leave Main Gate Waiting Period
7:50 AM 8:00 AM 10 mins
8:50 AM 9:00 AM 10 mins
9:50 AM 10:00 AM 10 mins
10:50 AM 11:00 AM 10 mins
11:50 AM 12:00 PM 10 mins
12:50 PM 1:00 PM 10 mins
1:50 PM 2:00 PM 10 mins
2:50 PM 3:00 PM 10 mins
3:50 PM 4:00 PM 10 mins
Total Trips 9

Good explanation of Poisson process: https://www.probabilitycourse.com/chapter11/11_1_2_basic_concepts_of_the_poisson_process.php
P(bet) 0.8
P(C win) p
P(C win 2 bets) p^2 0.09

A)
p 0.3

B)
P(F win 2) (1-p)^2 0.49 p2

C)
p1 P(Chatur win 2 bets) 0.09

P(C win 2 bets on at least 2 days) = 1 - P(C win 2 bets on 1 day or no days)
0 day 7C0 * p1^0 * (1-p1)^7 0.52
1 day 7C1 * p1^1 * (1-p1)^6 0.36

Ans 0.13

D)
(7C2)*(p1^2) * (5C4)*(p2^4) * (1C1)*(p3)
C win 2 on 2 days 0.1701
F win 2 on 4 days 0.2882
Draw on 1 day 0.42
0.021
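Parts C) to E) can be recomputed from p = 0.3. The sketch assumes two bets on each betting day, which is how the sheet arrives at 48 bets in June (30 days * 0.8 * 2); variable names are illustrative.

```python
from math import comb

p = 0.3                    # P(Chatur wins a single bet), from p^2 = 0.09
p1 = p**2                  # Chatur wins both bets in a day
p2 = (1 - p) ** 2          # F wins both bets in a day
p3 = 1 - p1 - p2           # one win each (draw)

# C) Chatur wins both bets on at least 2 of 7 days
p_at_least_2_days = 1 - sum(comb(7, k) * p1**k * (1 - p1) ** (7 - k) for k in range(2))

# D) multinomial: 2 days Chatur, 4 days F, 1 day draw
p_2_4_1 = comb(7, 2) * p1**2 * comb(5, 4) * p2**4 * comb(1, 1) * p3

# E) expected bets won by Chatur in June
expected_wins_june = 30 * 0.8 * 2 * p

print(p_at_least_2_days, p_2_4_1, expected_wins_june)
```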

E)
Expected number of bets to be won by Chatur in month of June

# days 30
# bets 48
# C wins 14.4
p1 C win 2 0.09
p2 F win 2 0.49
p3 Draw 0.42
Total 1
n 287

A)
Variable Name Variable Type
Sector Categorical (Nominal)
Full Time Employees Quantitative (Discrete)
Total ESG Risk Score Quantitative (Continuous)
ESG Risk Level Categorical (Ordinal) (Negligible < Low < Medium < High)
A)
Jedi Jabba
Mean M1 M2 M1>M2
SD S1 S2 S1>S2

Greater mean return: Jedi Ltd.


More susceptible to large losses (riskier): Jedi Ltd.
If Anakin is risk-averse, Darth should recommend Jabba Co., as it has lower variance and fewer chances of large losses, even though expected return is slightly lower.

B) Jedi Jabba
Mean 2 0.5
Variance 4 0.25
SD 2 0.5
X -1 -1
z -1.5 -3
P(Z<=z) 0.067 0.001 P(Loss > 1%)

C)
Jedi Jabba Portfolio
RV X1 X2 0.5X1 + 0.5X2
Mean 2 0.5 1.250
Variance 4 0.25 1.113
SD 2 0.5 1.055
X -1 -1 -1
z -1.5 -3 -2.133
P(Z<=z) 0.067 0.001 0.016
Jabba Co. has lower variance and fewer chances of large losses, even though its expected return is slightly lower.
customers per day customers per day
Demand Poisson (6.5) Demand Criterion
Low <=4
Moderate [5-10]
High >10

A)
x P(X=x) xP(X=x)
5 0.145 0.727
6 0.157 0.945
7 0.146 1.024
8 0.119 0.951
9 0.086 0.772
10 0.056 0.558
Total 0.709 4.976
Expected demand given moderate demand = 4.976 / 0.709 = 7.013

B) Criterion in customers per day; pay per day, per person

Demand Criterion Permanent Staff Temporary Staff Permanent Pay Temporary Pay Total Cost Probability Cost*Probability
Low <=4 2 0 100 80 200 0.22 44.73
Moderate [5-10] 2 1 100 80 280 0.71 198.66
High >10 2 3 100 80 440 0.07 29.41
Total 1.00 272.80

expected daily labor cost 272.80

C) What is the probability that the store will need to hire temporary helpers on four or more days per week?

P(X<5) 0.224
p P(X>=5) 0.776 probability of hiring temporary helpers

4 or more days a week
i term 7Ci p^i (1-p)^(7-i) prob
4 7C4 * p^4 * (1-p)^3 35 0.3632 0.0112 0.1423
5 7C5 * p^5 * (1-p)^2 21 0.2820 0.0500 0.2963
6 7C6 * p^6 * (1-p)^1 7 0.2189 0.2237 0.3428
7 7C7 * p^7 * (1-p)^0 1 0.1699 1.0000 0.1699
Total 0.9512

D) What is the average number of days Poulomi would go at a stretch without hiring temporary helpers?

P(X<=4) = P(X<5) 0.224 p

E[# days] 1/p 4.471
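The staffing figures can be recomputed from the Poisson(6.5) demand. This is a sketch using the demand bands and pay rates stated above; the variable names are illustrative.

```python
from math import comb, exp, factorial

lam = 6.5  # average customers per day


def pmf(k):
    return exp(-lam) * lam**k / factorial(k)


p_low = sum(pmf(k) for k in range(5))          # X <= 4
p_moderate = sum(pmf(k) for k in range(5, 11))  # 5 <= X <= 10
p_high = 1 - p_low - p_moderate                 # X > 10

# A) expected demand given that demand is moderate
mean_given_moderate = sum(k * pmf(k) for k in range(5, 11)) / p_moderate

# B) expected daily labor cost: 2 permanent staff at 100, temporary staff at 80 each
cost = {"low": 2 * 100, "moderate": 2 * 100 + 1 * 80, "high": 2 * 100 + 3 * 80}
expected_cost = cost["low"] * p_low + cost["moderate"] * p_moderate + cost["high"] * p_high

# C) temporary helpers needed on 4 or more of 7 days
p_temp = 1 - p_low
p_4_or_more_days = sum(comb(7, i) * p_temp**i * (1 - p_temp) ** (7 - i) for i in range(4, 8))

print(p_low, p_moderate, p_high, mean_given_moderate, expected_cost, p_4_or_more_days)
```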
