Practice Problem Set 1 DS Course
SD
IQR
Q2 Mean 50
Variance 16
SD 4
a) Between 40 and 60
z 2.5
From Chebyshev's theorem: 1 - 1/z^2 = 0.84
At least 84% of the data lies between 40 and 60, i.e. 84000 students
b) Greater than 55
Between 45 to 55
z 1.25
From Chebyshev's theorem: 1 - 1/z^2 = 0.36
At least 36% of the data lies between 45 and 55
So at most 64% of the data can lie beyond 55
i.e. at most 64000 students can lie beyond 55
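The two Chebyshev bounds above can be reproduced with a short sketch. The function name `chebyshev_min_fraction` is a hypothetical helper, not part of the worksheet, and it assumes the interval is symmetric about the mean.

```python
def chebyshev_min_fraction(mean, sd, lower, upper):
    """Chebyshev lower bound on the fraction of data inside [lower, upper].

    Assumes the interval is symmetric about the mean (hypothetical helper).
    """
    k = (upper - mean) / sd  # half-width in standard deviations
    if abs((mean - lower) / sd - k) > 1e-12:
        raise ValueError("interval must be symmetric about the mean")
    if k <= 1:
        return 0.0  # the bound says nothing useful for k <= 1
    return 1 - 1 / k**2


# Q2 values from the sheet: mean 50, SD 4
inside_40_60 = chebyshev_min_fraction(50, 4, 40, 60)  # part a), z = 2.5
inside_45_55 = chebyshev_min_fraction(50, 4, 45, 55)  # part b), z = 1.25
at_most_beyond_55 = 1 - inside_45_55                  # upper bound used in part b)
```

Part b) is deliberately conservative: Chebyshev only bounds the mass outside 45 to 55 on both sides together, so the bound on the upper tail alone cannot be tighter than that.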
b) SD 546.11015
Range 4131.43
c) Mean 2249.72
SD 1149.2802
z 2.3930456
From Chebyshev's theorem: 1 - 1/z^2 = 0.8253784
So at most 17.46% of the data can lie beyond 5000
i.e. at most 54 students can lie beyond 5000
Q4 0
Q5 meanC 25
varC 5
meanF 77
varF 16.2
Q6 X: runs in a ball; W: batsman getting out
Runs (per ball) Probability
0 0.3
1 0.3
2 0.2
3 0.05
4 0.1
5 0.005
6 0.045
a) P(X <= 2) 0.8
b) E[X] 1.545
6*E[X] 9.27
c) Distribution under new conditions
P(X) = P(X|W)*P(W) + P(X|W')*P(W')
X=0 1*0.05+0.3*0.95
X=1 0*0.05+0.3*0.95
Runs (per ball) Probability
0 0.335
1 0.285
2 0.19
3 0.0475
4 0.095
5 0.00475
6 0.04275
P(X <= 2) 0.81
E[X] 1.46775
6*E[X] 8.8065
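The Q6 numbers can be re-derived with the law of total probability stated in the sheet. This is a minimal sketch: the variable names are illustrative, and it assumes, as the X=0 and X=1 rows suggest, that a dismissal (W) always scores 0 runs and that P(W) = 0.05.

```python
runs = [0, 1, 2, 3, 4, 5, 6]
p_original = [0.3, 0.3, 0.2, 0.05, 0.1, 0.005, 0.045]  # original table
p_out = 0.05                                            # P(W), assumed from the X=0 row


def expected_value(values, probs):
    """E[X] for a discrete distribution."""
    return sum(v * p for v, p in zip(values, probs))


# P(X) = P(X|W)*P(W) + P(X|W')*P(W'); given W the ball scores 0 with certainty
p_given_out = [1, 0, 0, 0, 0, 0, 0]
p_new = [po * p_out + p * (1 - p_out) for po, p in zip(p_given_out, p_original)]

e_original = expected_value(runs, p_original)  # runs per ball, original conditions
e_new = expected_value(runs, p_new)            # runs per ball, new conditions
p_at_most_2_new = sum(p_new[:3])               # P(X <= 2) under new conditions
runs_per_over_new = 6 * e_new                  # six balls per over
```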
Q7 A Cycle from A
B Cycle from B
C Cycle from C
D Defective cycle
P(A) 0.75
P(B) 0.15
P(C) 0.1
P(D|A) 0.04
P(D|B) 0.06
P(D|C) 0.08
P(A|D) 0.6382979
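The Q7 posterior follows from Bayes' rule with the total probability of a defect in the denominator. A small sketch, using dictionary names that are illustrative only:

```python
# Priors and defect rates as listed in Q7
prior = {"A": 0.75, "B": 0.15, "C": 0.10}
p_defective_given = {"A": 0.04, "B": 0.06, "C": 0.08}

# Total probability of a defective cycle: sum over sources of P(D|s) * P(s)
p_defective = sum(prior[s] * p_defective_given[s] for s in prior)

# Bayes' rule: P(A|D) = P(D|A) * P(A) / P(D)
posterior_a = prior["A"] * p_defective_given["A"] / p_defective
```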
Q8 p 0.05
n 10
P(x>=2, n = 10)
Binomial distribution
P(x>=2, n = 10) = 1 - P(x<=1,n=10)
0.0861384
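The complement step in Q8 can be checked directly from the binomial probability mass function. The helper `binom_pmf` is a hypothetical name written here with the standard library only.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.05
# P(x >= 2) = 1 - P(x <= 1) = 1 - P(x = 0) - P(x = 1)
p_at_least_2 = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)
```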
Q9 P result positive
N result negative
C with CAD
C' without CAD
P(P|C) 0.95
P(P|C') 0.1
P(N|C) 0.05
P(N|C') 0.9
p P(C)
P(C|P)= P(P|C)*P(C)/P(P)
p 0.24
P(A) 0.5
P(B|A) 0.6666667
P(B|A') 0.3333333
b) 0.5
c) P(A|B) = P(B|A)*P(A)/P(B)
P(A|B) 0.6666667
P(x=1, n = 2, p = 2/3) 0.4444444
b) P(1100<X<1500)=P(X<1500)-P(X<1100)
P(1100<X<1500) 0.1301805
c) P(1100<X<1500)=P(X<1500)-P(X<1100)
P(1100<X<1500) 0.1544788
d) Y = Option 1 - Option 2
Mean -2
Variance 3540.6041
P(Y>0) = 1 - P(Y<0)
P(Y>0) 0.4865934
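Q14 a The plot cannot appropriately show the minor change from year to year. A similar plot of the yearly average from year to year would have shown the increase in global temperature.
b It shows that the 2015 temperature is higher than the average, but that is not enough to conclude that the global temperature is rising.
c No. This plot does not talk about the shape of the distribution. It rather shows that there is a seasonal pattern across different months.
d Mean: Assuming that for each state the mean is calculated using the same number of observations, the overall mean can be computed and it is 24.23.
Median: The overall median cannot be computed without knowing the distribution of the temperature.
Standard deviation: It can be computed if we assume that the number of observations per state is equal and that the observations in the three states at a given time point are independent. In that case, the overall standard deviation will be 2.52. The independence assumption is not valid, since temperatures in neighbouring states are usually dependent on each other.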
57.1227
2.5193187
P(P|L) 0.6
P(P|H) 0.85
P(P'|D) 0.995 P(P|D) 0.005
P(D') 0.2 P(D) 0.8
P(P) 0.144
P(L) 0.12
P(H) 0.08
P(L|D') 0.6
b) p P(P|L) 0.6
n 50
np 30 > 5
n(1-p) 20 > 5
c) p P(P|H) 0.85
n 50
np 42.5 > 5
n(1-p) 7.5 > 5
d) 72
a) P(S>55000) = 1 - P(S<=55000)
P(S>55000) 0.0227501
b) p P(S>60000) 0.0013499
n 3
P(W=1, S>60000) 0.0040388
c) p P(40000<S<50000) 0.6826895
n 200
np 136.5379 > 5
n(1-p) 63.462102 > 5
P(X>156.5)=1-P(X<=156.5) 0.0012117
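Part c) checks np and n(1-p) and then uses a normal approximation with a continuity correction at 156.5. A sketch of that calculation, taking p = 0.6826895 and n = 200 from the sheet and using `statistics.NormalDist` from the standard library:

```python
from math import sqrt
from statistics import NormalDist

n = 200
p = 0.6826895  # P(40000 < S < 50000) from the sheet

mean = n * p                  # np
sd = sqrt(n * p * (1 - p))    # sqrt(np(1-p))

# Rule of thumb used in the sheet: both np and n(1-p) should exceed 5
approximation_ok = mean > 5 and n * (1 - p) > 5

# P(X > 156.5) = 1 - P(X <= 156.5) under the approximating normal
p_upper = 1 - NormalDist(mean, sd).cdf(156.5)
```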
d) Mean 90000
SD 7071.0678
P(S>82929)=1-P(S<=82929) 0.8413424
Poisson distribution
a) P(x<=10) 0.5830398
P(x>10) 0.4169602
c) P(x=0) 0.0608101
b) P(2<W<4) 0.2031142
P(12<D<24) 0.4
P(T<=3) 0.9712726
p P(Package=Bulky) 0.7080503
n 15
P(T>=11)=1-P(T<11) 0.5432345
Total crop area
al worth of investment 1054
1670.302
x f x-u sqr f*sqr
0 0 -3.5104 12.32290816 0
1 0 -2.5104 6.30210816 0
2 0.2 -1.5104 2.28130816 0.456261632
3 0.32 -0.5104 0.26050816 0.083362611
4 0.288 0.4896 0.23970816 0.06903595
5 0.1536 1.4896 2.21890816 0.340824293
6 0.0384 2.4896 6.19810816 0.238007353
1.18749184 sum(f*sqr)
expected 3.5104 1.089720992 sqrt(sum)
std dev 1.089720992
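The table above computes the mean and standard deviation of a discrete distribution column by column. The same steps as a sketch, with the x and f values copied from the table:

```python
from math import sqrt

x = [0, 1, 2, 3, 4, 5, 6]
f = [0, 0, 0.2, 0.32, 0.288, 0.1536, 0.0384]  # probabilities, sum to 1

# Expected value: sum of x * f
mean = sum(xi * fi for xi, fi in zip(x, f))

# Variance: sum of f * (x - mean)^2, i.e. the "f*sqr" column total
variance = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f))

std_dev = sqrt(variance)
```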
X Prob (X-u) sqr Prob * sqr
1 0.1 -7.5 56.25 5.625
2 0.2 -6.5 42.25 8.45
8 0.4 -0.5 0.25 0.1
16 0.3 7.5 56.25 16.875
part c) Here we are talking about the time it takes to complete a project
Therefore a random variable denoting the time of completion will be exponential in nature
so let Xi be the random variable denoting the time it takes to develop i algorithm
100
so we need to find P(X>120) where X is a random variable denoting the time it takes for
P(X>120) = e ^ ( - x / u ) 0.3011942119
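The survival formula P(X > x) = e^(-x/u) can be evaluated as below. This sketch assumes the mean u is the 100 shown just above in the sheet; `exp_survival` is a hypothetical helper name.

```python
from math import exp


def exp_survival(x, mean):
    """P(X > x) for an exponential random variable with the given mean."""
    return exp(-x / mean)


# Assumption: mean completion time u = 100, threshold x = 120
p_longer_than_120 = exp_survival(120, 100)
```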
b) P(X>2.5) 0.367879441
n 40
P(X<=20) where X is a random variable denoting the number of people who take more th
this is a binomial
Check np and n(1-p)
approx to normal
find value
A) P(W) (x+y)/100
B) P(NW & G) (0.8*(60-x)+0.9*(40-y))/100
P(G|NW)
P(G|W)
not having work experience increases chances of getting into genman role
Total 100
Engineering Non-engineering
60 W NW 40
C 0.5*x 0.2*(60-x) C
G 0.5*x 0.8*(60-x) G
x 60-x
1/11
0.5
0.415
0.83
0.45
Non-engineering
W NW
0.6*y 0.1*(40-y)
0.4*y 0.9*(40-y)
y 40-y
Engineering Non-engineering
60 W NW 40
C 12.5 7 19.5 C
G 12.5 28 40.5 G
25 35
Non-engineering
W NW
15 1.5 16.5
10 13.5 23.5
25 15
lambda 2
part a) at least 2
0.59399415 P(X>=2) = 1 - P(X=1) - P(X=0)
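Part a) is the complement of the first two Poisson terms. A sketch with lambda = 2 as given, where `poisson_pmf` is a hypothetical helper built on the standard library:

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 2
p_zero = poisson_pmf(0, lam)                     # also used in part c)
p_at_least_2 = 1 - poisson_pmf(1, lam) - p_zero  # P(X >= 2)
```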
part b) p 0.59399415
n 31
x 18
since np and n(1-p) both are greater than 5, we can approximate as normal
part c)
P(X=0) 0.135335283 p
b + d = 2(a * c) 0.7304
b = 2a part b) 0.854634425
a + b + c + d = 0.18
part a) 0.36
part b)
part c) n 7
p 0.1
np 0.7
(np(1-p))^0.5 0.793725393
lambda 1 iphone/6 hours
part a) P(not selling in first three hours) * P(selling 2 in last six hours)
answer 0.11156508
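Part a) multiplies two Poisson probabilities over disjoint time windows, with the rate rescaled to each window's length. A sketch under the sheet's rate of 1 iPhone per 6 hours; the helper name is illustrative:

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


rate_per_hour = 1 / 6  # 1 iPhone per 6 hours

# Counts in disjoint intervals of a Poisson process are independent,
# so the joint probability is the product of the two window probabilities.
p_none_first_3h = poisson_pmf(0, rate_per_hour * 3)  # lambda = 0.5
p_two_last_6h = poisson_pmf(2, rate_per_hour * 6)    # lambda = 1
answer = p_none_first_3h * p_two_last_6h
```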
part b) let Xi be the random variable for the number of iphones sold on day i
al because of CLT
n 31
lambda 1 iphone/6 hours
1.5 iphone/day
part b) 2 log(price)
4.4 mean
0.49 variance
0.7 std dev
P(X>=100000)0.999696617
Given: H 52
Given: S 45
T= 69
z 1.889822365
prob 0.970609139
Cov(H, S) -6
Y H-S
u 6
Var 37
std dev 6.08276253
x 0
P(Y<=0) 0.161969923
answer 0.838030077
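For Y = H - S the variance is Var(H) + Var(S) - 2*Cov(H, S), so a negative covariance increases the spread of the difference. The individual variances are not visible in this extract, so the sketch below starts from the sheet's u = 6 and Var = 37 and assumes Y is normal.

```python
from math import sqrt
from statistics import NormalDist

mean_y = 6  # u, from the sheet
var_y = 37  # Var(H) + Var(S) - 2*Cov(H, S), from the sheet

y = NormalDist(mean_y, sqrt(var_y))

p_y_le_0 = y.cdf(0)      # P(Y <= 0)
p_y_gt_0 = 1 - p_y_le_0  # reported as the answer
```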
60
42
3
mean 2 (as 50% people do within this)
z -0.99981509
x 1.5
u 2
sigma 0.50009247
this is sigma of population
Part B) n 45
s 0.50009247
x-bar 2
x 3
z 1.999630187
prob 0.977229894
Note: Here it is important to note that even though we are given the sample size, we are not talking about the probability on average. If that were the case, we would have taken sigma as sigma / sqrt(n). Since we are not talking about average time, we continue with the population number. In short, only when talking about the average of a sample taken from the population, use sigma = sigma / sqrt(n).
answer 0.190644939
Year Count Average Funding Std Dev Total Funding
2015 640 9.72 25.9 6220.8
2016 586 6.53 19.6 3826.58
2017 453 11.3 34 5118.9
2018 263 17.2 39.9 4523.6
2019 101 49.1 87.5 4959.1
2020 7 55.7 82.9 389.9
Total 2050 12.21 25038.88
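The totals row is a count-weighted average: each year's total is count times average funding, and the overall average is the grand total divided by the total count. A sketch with the rows copied from the table:

```python
# (year, count, average funding) copied from the table
rows = [
    (2015, 640, 9.72),
    (2016, 586, 6.53),
    (2017, 453, 11.3),
    (2018, 263, 17.2),
    (2019, 101, 49.1),
    (2020, 7, 55.7),
]

total_count = sum(count for _, count, _ in rows)
total_funding = sum(count * avg for _, count, avg in rows)

# Weighted mean, not the simple mean of the six yearly averages
overall_average = total_funding / total_count
```

Averaging the six yearly averages directly would overweight 2019 and 2020, which have very few deals.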
A)
We can see total funding has decreased
C)
X mean SD X-mean
70 13 37 57
(X-mean)/SD
1.54
accepted papers N(300,100)
rejected papers Normal P(review time<200 | rejected) = 2*P(review time<200 | accepted)
P(review time>250 | rejected) = 0.05
P(review time<=250 | rejected) = 0.95
P(review time > 250) = 0.25
A)
P(review time<=250 | rejected) = 0.95
P(X<=t) z-value
0.95 1.645 P(Z <= 1.645) = 0.95
P(T<200 | A) 0.159
P(T<200 | R) 0.317
B)
P(T>250) = P(T>250|A)*P(A) + P(T>250|R)*P(R) = 0.25
P(A) p
P(R) 1-p Prob x
P(T>250|A) 250
P(T>250|R) 250
C) P(A|T>250)
P(A|T>250) = P(T>250|A)*P(A)/P(T>250)
0.862
(x-mean)/SD = z
x mean SD z
250 M S 1.645
200 M S -0.475
mean SD z probability
300 100 -0.500 0.691
211.21 23.58 1.645 0.050
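The three parts chain together: two quantile conditions fix the mean and SD of the rejected-paper review time, the total-probability equation gives P(A), and Bayes' rule gives P(A | T > 250). A sketch of that chain, reading N(300,100) as mean 300 and SD 100 as in the table above; variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()
accepted = NormalDist(300, 100)  # mean 300, SD 100

# A) Two conditions on the rejected distribution N(M, S):
#    P(T <= 250 | R) = 0.95 and P(T < 200 | R) = 2 * P(T < 200 | A)
p_a_lt_200 = accepted.cdf(200)
p_r_lt_200 = 2 * p_a_lt_200
z_250 = std_normal.inv_cdf(0.95)        # (250 - M) / S
z_200 = std_normal.inv_cdf(p_r_lt_200)  # (200 - M) / S
sd_r = (250 - 200) / (z_250 - z_200)
mean_r = 250 - z_250 * sd_r

# B) P(T>250) = P(T>250|A)*p + P(T>250|R)*(1-p) = 0.25, solved for p = P(A)
p_a_gt_250 = 1 - accepted.cdf(250)
p_r_gt_250 = 0.05
p_accept = (0.25 - p_r_gt_250) / (p_a_gt_250 - p_r_gt_250)

# C) Bayes' rule
posterior_accept = p_a_gt_250 * p_accept / 0.25
```

The results differ from the sheet only in the later decimals, because the sheet works with z-values rounded to three places.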
seconds >>> prob of correct counting
n mean p
1000 6 0.99
T ~ Exp(lambda)
mean 1/lambda 6.0000
lambda 1/mean 0.1667
A) P(T>t) = e^(-lambda*t)
P(T>9) 0.2231
n 36 RV
mean 6.0000
SD 1
0.1257
mean 1/lambda 6.0000
SD mean 6.0000
Units >>>
Time Taken 50 mins
Visitor Rate 48 per hour
Capacity 10 people
A)
P(X<10)
D) P(4th & 5th being full |first 3 were not full) = P(4th & 5th being full) = (1-p)^2
As both are independent events
Ans 0.080
Arrival Leave Main Gate Waiting Period
7:50 AM 8:00 AM 10 mins
8:50 AM 9:00 AM 10 mins
9:50 AM 10:00 AM 10 mins
10:50 AM 11:00 AM 10 mins
11:50 AM 12:00 PM 10 mins
12:50 PM 1:00 PM 10 mins
1:50 PM 2:00 PM 10 mins
2:50 PM 3:00 PM 10 mins
3:50 PM 4:00 PM 10 mins
Total Trips 9
0.717
Good explanation of Poisson process: https://www.probabilitycourse.com/chapter11/11_1_2_basic_concepts_of_the_poisson_process.php
P(bet) 0.8
P(C win) p
P(C win 2 bets) p^2 0.09
A)
p 0.3
B)
P(F win 2) (1-p)^2 0.49 p2
C)
p1 P(Chatur win 2 bets) 0.09
P(C win 2 bets on at least 2 days) = 1 - P(C win 2 bets on 1 day or no days)
0 day 7C0 * p1^0 * (1-p1)^7 0.52
1 day 7C1 * p1^1 * (1-p1)^6 0.36
Ans 0.13
D)
(7C2)*(p1^2) * (5C4)*(p2^4) * (1C1)*(p3)
C win 2 on 2 days 0.1701
F win 2 on 4 days 0.2882
Draw on 1 day 0.42
0.021
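Parts C) and D) are a binomial complement and a multinomial probability over 7 days. The sketch below follows the sheet's daily outcome probabilities p1, p2 and p3 and only uses `math.comb`; variable names are illustrative.

```python
from math import comb

p = 0.3                 # P(C win) from part A)
p1 = p**2               # Chatur wins both bets on a day
p2 = (1 - p) ** 2       # F wins both bets on a day
p3 = 1 - p1 - p2        # draw (one win each)
days = 7

# C) At least 2 days out of 7 on which Chatur wins both bets
p_zero_or_one_day = sum(
    comb(days, k) * p1**k * (1 - p1) ** (days - k) for k in (0, 1)
)
p_at_least_2_days = 1 - p_zero_or_one_day

# D) Exactly 2 Chatur days, 4 F days and 1 draw day (multinomial)
p_split_2_4_1 = (
    comb(7, 2) * p1**2      # choose the 2 Chatur days
    * comb(5, 4) * p2**4    # choose 4 of the remaining 5 days for F
    * comb(1, 1) * p3       # the last day is a draw
)
```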
E)
Expected number of bets to be won by Chatur in month of June
# days 30
# bets 48
# C wins 14.4
p1 C win 2 0.09
p2 F win 2 0.49
p3 Draw 0.42
1
n 287
A)
Variable Name Variable Type
Sector Categorical (Nominal)
Full Time Employees Quantitative (Discrete)
Total ESG Risk Score Quantitative (Continuous)
ESG Risk Level Categorical (Ordinal) (Negligible < Low < Medium < High)
A)
Jedi Jabba
Mean M1 M2 M1>M2
SD S1 S2 S1>S2
B) Jedi Jabba
Mean 2 0.5
Variance 4 0.25
SD 2 0.5
X -1 -1
z -1.5 -3
P(Z<=z) 0.067 0.001 P(Loss > 1%)
C)
Jedi Jabba Portfolio
RV X1 X2 0.5X1 + 0.5X2
Mean 2 0.5 1.250
Variance 4 0.25 1.113
SD 2 0.5 1.055
X -1 -1 -1
z -1.5 -3 -2.133
P(Z<=z) 0.067 0.001 0.016
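The last row of the table is P(return <= -1) for each option under a normal model. A sketch using the means and variances from the table; the portfolio variance 1.113 is taken from the sheet as given, since the covariance behind it is not shown in this extract. `p_loss_worse_than` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def p_loss_worse_than(x, mean, variance):
    """P(return <= x) for a normally distributed return."""
    return NormalDist(mean, sqrt(variance)).cdf(x)


threshold = -1  # a loss of more than 1%

p_jedi = p_loss_worse_than(threshold, 2, 4)
p_jabba = p_loss_worse_than(threshold, 0.5, 0.25)
# Assumption: portfolio mean 1.25 and variance 1.113 as reported in the table
p_portfolio = p_loss_worse_than(threshold, 1.25, 1.113)
```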
Recommend Jabba Co., as it has lower variance and fewer chances of large losses, even though expected return is slightly lower.
A)
x P(X=x) xP(X=x)
5 0.145 0.727
6 0.157 0.945
7 0.146 1.024
7 0.146 1.024
8 0.119 0.951
9 0.086 0.772
10 0.056 0.558
Total 0.856 6.000 7.011
C) What is the probability that the store will need to hire temporary helpers on four or mor
P(X<5) 0.224
p P(X>=5) 0.776 probability of hiring temporary helpers
D) What is the average number of days Poulomi would go at a stretch without hiring tempo
expected demand