
2025

Homework T5: Chapters 12, 13, 21


1. [15 pts.] 3 points each for the calculation of a4 to a6. 2 points each for the calculation of a7 to
a9. For the last two layers, -1 total if the only errors are due to incorrect input from the
previous layer.

a4 = ReLU(w0,4 + a1*w1,4 + a2*w2,4)
= ReLU(-1 + 1*4 + 1*2) = ReLU(5)
= 5
a5 = ReLU(w0,5 + a1*w1,5 + a2*w2,5 + a3*w3,5)
= ReLU(-4 + 1*3 + 1*2 + 0*(-3)) = ReLU(1)
= 1
a6 = ReLU(w0,6 + a2*w2,6 + a3*w3,6)
= ReLU(-2 + 1*2 + 0*3) = ReLU(0)
= 0
a7 = ReLU(w0,7 + a4*w4,7 + a5*w5,7 + a6*w6,7)
= ReLU(-4 + 5*(-3) + 1*3 + 0*2) = ReLU(-16)
= 0
a8 = ReLU(w0,8 + a4*w4,8 + a5*w5,8 + a6*w6,8)
= ReLU(-2 + 5*2 + 1*(-3) + 0*(-5)) = ReLU(5)
= 5
a9 = ReLU(w0,9 + a7*w7,9 + a8*w8,9)
= ReLU(1 + 0*(-4) + 5*2) = ReLU(11)
= 11
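The hand calculations above can be checked with a short script. The biases (w0,j), weights, and input activations (a1 = 1, a2 = 1, a3 = 0) are read off the worked answers; the dictionary encoding is just one way to organize them.

```python
def relu(x):
    return max(0, x)

# Input activations from the problem: a1 = 1, a2 = 1, a3 = 0.
a = {1: 1, 2: 1, 3: 0}

# (bias, {input unit: weight}) for each unit, read off the answers above.
units = {
    4: (-1, {1: 4, 2: 2}),
    5: (-4, {1: 3, 2: 2, 3: -3}),
    6: (-2, {2: 2, 3: 3}),
    7: (-4, {4: -3, 5: 3, 6: 2}),
    8: (-2, {4: 2, 5: -3, 6: -5}),
    9: (1,  {7: -4, 8: 2}),
}

# Units 4-9 are listed in topological order, so one forward pass suffices.
for j, (bias, weights) in units.items():
    a[j] = relu(bias + sum(a[i] * w for i, w in weights.items()))

print(a[4], a[5], a[6], a[7], a[8], a[9])  # 5 1 0 0 5 11
```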

2. [20 points, 5 pts. each] -1 each if simply added wrong


a) P(A) = <P(A=true), P(A=false)> = <P(a), 1 - P(a)>
= <0.10 + 0.01 + 0.05 + 0.20, 1 - P(a)> = <0.36, 0.64>
b) P(a ∧ b) = 0.05 + 0.20 = 0.25
c) P(c ∨ ¬a) = 0.10 + 0.20 + 0.05 + 0.15 + 0.04 + 0.25 = 0.79
d) P(a | b ∧ c) = P(a ∧ b ∧ c) / P(b ∧ c) = 0.05 / (0.05 + 0.15) = 0.25

3. [15 pts.] Let Test be a boolean variable where true means you tested positive, and Disease be
a boolean variable where true means you have the disease. We are given the following
probabilities:
P(test | disease) = 0.99 (accuracy of the test if you have the disease)
P(¬test | ¬disease) = 0.99 (accuracy of the test if you do not have the disease)
P(disease) = 0.0001 (rarity of the disease)
We know via Bayes' rule that P(disease | test) = P(test | disease) P(disease) / P(test). However, we
have not been given a value for P(test). The crucial observation here is that if we use
marginalization / summing out, P(test) = P(test ∧ disease) + P(test ∧ ¬disease). Using the
product rule, this can be further equated to:

P(test) = P(test | disease) P(disease) + P(test | ¬disease) P(¬disease)
= P(test | disease) P(disease) + [1 - P(¬test | ¬disease)] P(¬disease)
= (0.99)(0.0001) + (0.01)(0.9999)
= 0.000099 + 0.009999
= 0.010098
Therefore,
P(disease | test) = (0.99)(0.0001) / (0.010098) = (0.000099) / (0.010098) ~= 0.009804
The reason the rarity of the disease is good news is that according to Bayes rule, the less
likely the disease is, the less likely a positive test means you really have the disease (since
P(disease) appears as a product in the numerator). To think about it intuitively, consider that
if 10,000 people take the test, only one should have the disease. But, if the test incorrectly
comes out positive 1 out of 100 times (i.e., 99% accurate), then about 100 people
(10,000/100) will incorrectly test positive for the disease. Therefore, the vast majority of the
“positive test results” will be false positives.

-3 if Bayes rule is used, but P(test) is incorrectly calculated (or not calculated at all).
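The computation is short enough to sanity-check numerically, using only the three given probabilities:

```python
# Given values from the problem.
p_disease = 0.0001        # P(disease)
p_pos_given_d = 0.99      # P(test | disease)
p_neg_given_nd = 0.99     # P(¬test | ¬disease)

# Marginalization: P(test) = P(test|disease)P(disease) + P(test|¬disease)P(¬disease).
p_pos_given_nd = 1 - p_neg_given_nd
p_test = p_pos_given_d * p_disease + p_pos_given_nd * (1 - p_disease)

# Bayes' rule.
p_d_given_test = p_pos_given_d * p_disease / p_test
print(round(p_test, 6), round(p_d_given_test, 6))  # 0.010098 0.009804
```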

4. [25 pts.]
a) [10 pts]

[Network diagram: T → FG, T → G, FG → G, G → A, FA → A]

b) [5 pts] No, the network is not a polytree. There are two paths from T to G: one direct
path and one via FG.

c) [5 pts] Note, it is okay if the last column (G=high) is omitted. -1 if they flip columns and
rows
FG   T        G = normal   G = high
F    normal       x          1-x
F    high        1-x          x
T    normal       y          1-y
T    high        1-y          y
d) [5 pts] Note, it is okay if the last column (A=false) is omitted. -1 if they flip columns and
rows
FA   G        A = true   A = false
F    normal      0           1
F    high        1           0
T    normal      0           1
T    high        0           1
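The CPT in part d is deterministic: the alarm sounds iff it is working (FA = false) and the gauge reads high. A minimal sketch of that rule (the function name and boolean encoding are illustrative, not from the problem):

```python
# Deterministic alarm CPT from part d: A = true iff FA = false and G = high.
def p_alarm_true(fa_broken, g_high):
    """P(A = true | FA, G), matching the table above."""
    return 1.0 if (not fa_broken) and g_high else 0.0

# Reproduce the four table rows.
for fa in (False, True):
    for g in (False, True):
        print(fa, g, p_alarm_true(fa, g))
```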

5. [25 pts.] For each problem ~half the credit is for setting the problem up correctly, and the
other half is for filling in the probabilities and solving the math correctly. Typically a single
math error or an incorrect value for one probability is just a single point off. Different
choices in rounding can lead to different results. We insist that your answers be accurate up
to at least two decimal places.
a) [5 pts]
P(a ∧ ¬b ∧ c ∧ d)
= P(a) * P(¬b | a) * P(c | a) * P(d | ¬b ∧ c)
= 0.4 * 0.4 * 0.5 * 0.5
= 0.04
b) [8 pts] In this problem, I use vectors as a shorthand to represent the probabilities for
different values of A. It is okay if students solve for each value of A separately.
P(A | b ∧ c ∧ d)
= α P(A, b, c, d)
= α (P(A) P(b | A) P(c | A) P(d | b ∧ c))
= α (<0.4, 0.6> * <0.6, 0.3> * <0.5, 0.2> * 0.6)
= α <0.072, 0.0216>
=> α = 10.68
So, P(A | b ∧ c ∧ d) = <0.769, 0.231>
Note, B and C form a Markov blanket around A, so A is conditionally independent of D
given B and C. Thus, this also works:
P(A | b ∧ c ∧ d) = P(A | b ∧ c)
= α (P(A) P(b | A) P(c | A) Σâ' P(d' | b ∧ c))
= α (<0.4, 0.6> * <0.6, 0.3> * <0.5, 0.2> * 1)
= α <0.12, 0.036>

=> α = 6.41
This gives the same answer as above: < 0.769, 0.231>
Students only need to solve the problem one way.
c) [12 pts]
P(B | c  d) = α P(B, ¬c, d)
= <P(b|c ^ d), P(b|c ^ d)>
P(b|c ^ d) = α â P(â) * P(b|â) * P(c|â) * P(d|b^c)

= α * P(d|b^c) (∑â 𝑃(â) ∗ 𝑃(𝑏|â) ∗ 𝑃(𝑐|â))


= α * P(d|b^c) (P(a) * P(b|a) *P(c|a) + P(a) * P(b|a) * P(c|a))
= α * 0.2 * (0.4 * 0.6 *0.5 + 0.6 * 0.3 * 0.8)
= α (0.0528)
P(b|c ^ d) = α â P(â) * P(¬b|â) * P(c|â) * P(d|b^c)

= α * P(d|b^c) (  P(â) * P(b|â) * P(c|â))


â
= α * P(d|b^c) (P(a) * P(b|a) *P(c|a) + P(a) * P(b|a) * P(c|a))
= α * 0.8 * (0.4 * 0.4 *0.5 + 0.6 * 0.7 * 0.8)
= α * 0.8 * (0.08 + 0.336)
= α * 0.3328
So, P(B | c  d) = α <0.0528, 0.3328>
=> α = 2.59

Thus, the answer is <0.137, 0.863>


Technically, I got 0.862 due to rounding errors. Of course, we know the sum of the
distribution has to be 1.
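All three parts of problem 5 can be checked by enumeration. The network structure (A → B, A → C; B and C parents of D) and the CPT values below are inferred from the factorizations in the answers above; in particular, P(d | ¬b ∧ c) = 0.5 is the value implied by part (a).

```python
# CPT values read off the factorizations in the worked answers.
P_a = 0.4                                       # P(a)
P_b = {True: 0.6, False: 0.3}                   # P(b | A)
P_c = {True: 0.5, False: 0.2}                   # P(c | A)
P_d = {(True, True): 0.6, (True, False): 0.2,   # P(d | B, C)
       (False, True): 0.5, (False, False): 0.8}

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) via the chain rule for this network."""
    p = P_a if a else 1 - P_a
    p *= P_b[a] if b else 1 - P_b[a]
    p *= P_c[a] if c else 1 - P_c[a]
    p *= P_d[(b, c)] if d else 1 - P_d[(b, c)]
    return p

# a) P(a, ¬b, c, d)
print(round(joint(True, False, True, True), 6))  # 0.04

# b) P(A | b, c, d) by enumeration and normalization
num = [joint(a, True, True, True) for a in (True, False)]
post_b = [x / sum(num) for x in num]             # ≈ [0.769, 0.231]

# c) P(B | ¬c, d): sum out A, then normalize over B
num = [sum(joint(a, b, False, True) for a in (True, False)) for b in (True, False)]
post_c = [x / sum(num) for x in num]             # ≈ [0.137, 0.863]
```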
