Quiz 1: CS 726, Spring 2025
Prof. Sunita Sarawagi
15th January, 2025
The quiz will last for 20 minutes.
1. Let 𝑋1 , 𝑋2 be Uniform random variables on [0,1]. Let Y = min(𝑋1 , 𝑋2 ). Let 𝐹𝑌 denote the CDF of 𝑌 .
(a) The value of 𝐹𝑌 (0.2) is?
Answer: F_Y(y) = ℙ(Y ≤ y) = 1 − (1 − y)², so F_Y(0.2) = 1 − (0.8)² = 0.36. [1 mark]
(b) Find 𝔼[Y³].
Answer: f_Y(y) = 2(1 − y), so 𝔼[Y³] = ∫₀¹ y³ · 2(1 − y) dy = 2(1/4 − 1/5) = 0.1. [1 mark]
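A quick Monte Carlo sanity check of both parts (a sketch, assuming numpy is available; the sample size and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=(2, 1_000_000))   # two independent Uniform[0,1] samples per draw
y = x.min(axis=0)                      # Y = min(X1, X2)
print((y <= 0.2).mean())               # ≈ 0.36, matches F_Y(0.2)
print((y ** 3).mean())                 # ≈ 0.10, matches E[Y^3]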
2. Let 𝑌 be a random variable following the distribution:
p(y | λ) = λ^y e^(−λ) / y!,   for y = 0, 1, ⋯
Suppose we are given 5 independent and identically distributed (i.i.d.) observations of Y: 3, 4, 4, 5, 5. Find λ̂_MLE (the MLE estimate of λ).
Answer:
Likelihood = ∏_{i=1}^{5} p(y_i | λ) = C · λ^(Σ_{i=1}^{5} y_i) · e^(−5λ), where C is a constant.
Maximizing the likelihood with respect to λ (differentiating with respect to λ and setting the derivative to zero), we get λ̂_MLE = (y_1 + y_2 + y_3 + y_4 + y_5)/5 = (3 + 4 + 4 + 5 + 5)/5 = 4.2 [1 mark]
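As a numerical cross-check (a sketch, assuming numpy and scipy are available), the Poisson log-likelihood of the observations is maximised at the sample mean:

import numpy as np
from scipy.stats import poisson

obs = np.array([3, 4, 4, 5, 5])
grid = np.linspace(0.1, 10.0, 2000)                        # candidate values of lambda
loglik = np.array([poisson.logpmf(obs, lam).sum() for lam in grid])
print(grid[loglik.argmax()], obs.mean())                   # both ≈ 4.2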
3. In a multiple-choice question (MCQ) test with 4 options:
• An examinee either knows the answer with probability 3/4, or guesses randomly with probability 1/4.
• If the examinee knows the answer, the probability of answering correctly is 1.
• If the examinee guesses, the probability of answering correctly is 1/4.
Find the probability that an examinee knew the answer to a question, given that he/she has answered it correctly. The
answer is expressed as p/q, where p and q are coprime integers. Compute p + q.
Answer: 25
Define the following events:
• 𝐾 : The examinee knows the answer.
• 𝐶: The examinee answers the question correctly.
We want to find 𝑃(𝐾 ∣ 𝐶)
Using Bayes’ theorem:
P(K ∣ C) = P(C ∣ K) P(K) / P(C).
P(C) = P(C ∣ K) P(K) + P(C ∣ K^c) P(K^c),
where K^c represents the event that the examinee does not know the answer.
We are given:
• P(K) = 3/4 and P(K^c) = 1/4.
• P(C ∣ K) = 1.
• P(C ∣ K^c) = 1/4 (random guessing).
Thus:
P(C) = 1 · (3/4) + (1/4) · (1/4) = 3/4 + 1/16 = 13/16.
Compute P(K ∣ C) by substituting the values into Bayes' theorem:
P(K ∣ C) = P(C ∣ K) P(K) / P(C) = (1 · 3/4) / (13/16) = (3/4) · (16/13) = 12/13.
Here, 𝑝 = 12 and 𝑞 = 13 are coprime integers. Thus, 𝑝 + 𝑞 = 12 + 13 = 25.
[1 mark]
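The arithmetic can be verified exactly with Python's fractions module (a small sketch; the variable names are illustrative):

from fractions import Fraction

p_K = Fraction(3, 4)                  # P(K)
p_C_given_K = Fraction(1)             # P(C | K)
p_C_given_Kc = Fraction(1, 4)         # P(C | K^c)
p_C = p_C_given_K * p_K + p_C_given_Kc * (1 - p_K)    # law of total probability, = 13/16
p_K_given_C = p_C_given_K * p_K / p_C                 # Bayes' theorem, = 12/13
print(p_K_given_C, p_K_given_C.numerator + p_K_given_C.denominator)   # 12/13 25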
4. Let 𝑋 and 𝑌 be two random variables with the following joint density function:
f(x, y) = k(x² + xy)  if 0 < x < 1 and 0 < y < 3,   and   f(x, y) = 0 otherwise.
Answer the following questions:
(a) Find the value of 𝑘 such that 𝑓 (𝑥, 𝑦) is a valid joint probability density function.
Solution:
∫₀¹ ∫₀³ k(x² + xy) dy dx = 1.
First, integrate with respect to 𝑦:
∫₀³ (x² + xy) dy = [x²y + xy²/2]₀³ = 3x² + 9x/2.
Now, integrate with respect to 𝑥:
⟹ ∫₀¹ (3x² + 9x/2) dx = [x³ + 9x²/4]₀¹ = 1 + 9/4.
⟹ k (1 + 9/4) = k · 13/4 = 1.
⟹ k = 4/13.
[1 mark]
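A numerical check of the normalising constant (a sketch, assuming scipy is available):

from scipy import integrate

# dblquad integrates the first argument (y) over (0, 3) and the second (x) over (0, 1)
total, _ = integrate.dblquad(lambda y, x: x**2 + x*y, 0, 1, lambda x: 0, lambda x: 3)
print(total, 1 / total)   # 3.25 = 13/4, so k = 4/13 ≈ 0.3077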
(b) Let f_Y(y) be the marginal probability density function of Y. Calculate the ratio f_Y(2) / f_Y(1).
Solution:
The marginal density function 𝑓𝑌 (𝑦) is obtained by integrating the joint density 𝑓 (𝑥, 𝑦) over 𝑥:
f_Y(y) = ∫₀¹ f(x, y) dx = ∫₀¹ k(x² + xy) dx.
Substitute k = 4/13 and integrate with respect to x:
f_Y(y) = (4/13) ∫₀¹ (x² + xy) dx,
∫₀¹ (x² + xy) dx = [x³/3 + x²y/2]₀¹ = 1/3 + y/2.
Thus, the marginal density is:
f_Y(y) = (4/13) (1/3 + y/2).
Now, calculate the ratio f_Y(2) / f_Y(1):
f_Y(2) = (4/13)(1/3 + 2/2) = (4/13) · (4/3) = 16/39,
f_Y(1) = (4/13)(1/3 + 1/2) = (4/13) · (5/6) = 20/78 = 10/39.
The ratio is:
f_Y(2) / f_Y(1) = (16/39) / (10/39) = 1.6.
[1 mark]
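A quick check of the marginal values and the ratio (a minimal sketch in plain Python):

k = 4 / 13

def f_Y(y):
    return k * (1/3 + y/2)     # marginal density derived above

print(f_Y(2), f_Y(1))          # ≈ 0.4103 (= 16/39), ≈ 0.2564 (= 10/39)
print(f_Y(2) / f_Y(1))         # 1.6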
5. Consider a one-dimensional linear regression problem. The dataset corresponding to this problem consists of 𝑛 exam-
ples, D = {(x_i, y_i)}_{i=1}^{n}, where x_i, y_i ∈ ℝ, ∀i. Let w* = [w_0*, w_1*]^T represent the unique solution that minimizes the cost
function J(w), defined as:
J(w) = (1/n) Σ_{i=1}^{n} (y_i − w_0 − w_1 x_i)².
Define 𝑥̄ and 𝑦̄ as the means of 𝑥𝑖 and 𝑦𝑖 , respectively:
x̄ = (1/n) Σ_{i=1}^{n} x_i,     ȳ = (1/n) Σ_{i=1}^{n} y_i.
Which of the following statements are true? (Mark all correct options for full credit. No credit for partially correct answers.)
(a) (1/n) Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i) y_i = 0
(b) (1/n) Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i)(y_i − ȳ) = 0
(c) (1/n) Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i)(x_i − x̄) = 0
(d) (1/n) Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i)(w_0* + w_1* x_i) = 0
Answer: (c), (d)
(a), (b) can be disproved by simple counterexamples.
(c) To find w_0*, take the derivative of J with respect to w_0 and set it to 0:
∂J/∂w_0 = (−2/n) Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i) = 0
⟹ Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i) = 0     (1)
To find w_1*, take the partial derivative of J(w) with respect to w_1 and set it to zero:
∂J/∂w_1 = (−2/n) Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i) x_i = 0
Rearranging:
Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i) x_i = 0     (2)
Writing x_i = (x_i − x̄) + x̄ in (2):
Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i)(x_i − x̄ + x̄) = 0
Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i)(x_i − x̄) + x̄ Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i) = 0
Using (1):
Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i)(x_i − x̄) + 0 = 0     (3)
(d) Using (1),
w_0* Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i) = 0
Using (2),
w_1* Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i) x_i = 0
Adding,
Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i)(w_0* + w_1* x_i) = 0
Dividing by n,
(1/n) Σ_{i=1}^{n} (y_i − w_0* − w_1* x_i)(w_0* + w_1* x_i) = 0
[2 marks]
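These identities can also be checked numerically on a random dataset (a sketch, assuming numpy; the data-generating model below is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)               # arbitrary linear data with noise
X = np.column_stack([np.ones(n), x])
w0, w1 = np.linalg.lstsq(X, y, rcond=None)[0]        # least-squares fit of [w0, w1]
r = y - w0 - w1 * x                                  # residuals
print(np.mean(r * (x - x.mean())))                   # (c): ~0
print(np.mean(r * (w0 + w1 * x)))                    # (d): ~0
print(np.mean(r * y), np.mean(r * (y - y.mean())))   # (a), (b): nonzero in general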
6. Consider two independent random variables X ∼ 𝒩(1, 2) and Y ∼ 𝒩(3, 4), where 𝒩(a, b) represents a normal distribution with mean a and variance b. Define the new random variables as follows:
𝑍 = 𝑋 + 𝑌, 𝑊 = 𝑌 − 𝑋.
Calculate 𝐶𝑜𝑣(𝑍, 𝑊 )?
Answer: 2
Cov(𝑍, 𝑊 ) = 𝔼[𝑍𝑊 ] − 𝔼[𝑍]𝔼[𝑊 ].
Compute 𝔼[𝑍] and 𝔼[𝑊 ]
𝑍 = 𝑋 + 𝑌 ⟹ 𝔼[𝑍] = 𝔼[𝑋 ] + 𝔼[𝑌 ] = 1 + 3 = 4.
𝑊 = 𝑌 − 𝑋 ⟹ 𝔼[𝑊 ] = 𝔼[𝑌 ] − 𝔼[𝑋 ] = 3 − 1 = 2.
Compute 𝔼[𝑍𝑊 ]
ZW = (X + Y)(Y − X) = XY − X² + Y² − XY = −X² + Y².
Taking the expectation:
𝔼[ZW] = 𝔼[−X² + Y²] = −𝔼[X²] + 𝔼[Y²].
The variance formula gives:
𝔼[X²] = Var(X) + (𝔼[X])² = 2 + 1² = 3,
𝔼[Y²] = Var(Y) + (𝔼[Y])² = 4 + 3² = 13.
Thus:
𝔼[𝑍𝑊 ] = −(3) + 13 = 10.
Compute Cov(𝑍, 𝑊 )
Cov(𝑍, 𝑊 ) = 𝔼[𝑍𝑊 ] − 𝔼[𝑍]𝔼[𝑊 ].
Substitute the values:
Cov(𝑍, 𝑊 ) = 10 − (4)(2) = 10 − 8 = 2.
[2 marks]
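A Monte Carlo sanity check (a sketch, assuming numpy; note the normal sampler takes the standard deviation, not the variance):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(1.0, np.sqrt(2.0), size=n)   # X ~ N(mean 1, variance 2)
y = rng.normal(3.0, np.sqrt(4.0), size=n)   # Y ~ N(mean 3, variance 4)
z, w = x + y, y - x
print(np.cov(z, w)[0, 1])                   # ≈ 2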