ANN SAQ
Hey, I am going to ask you about topics in Artificial Neural Networks and Graphical Models
Answer in depth
Do not give any intro like "I am ready to generate", "here are your results", etc.
Do not give any end lines like "please ask me if you need more info"
Directly generate the content in depth, however I ask
Understood? (say yes or no)
Yes
1-mark questions - 2-3 points each / maths equations extra
understood ?
Yes
Multilayer perceptron and backpropagation
1. What is a neural network?
2. How does a human brain inspire neural networks?
3. Define a neuron model.
4. What is meant by error correction learning?
5. List the types of learning processes in neural networks.
6. What is the credit assignment problem?
7. State the purpose of the least mean square algorithm.
8. What is the XOR problem in neural networks?
9. Name a heuristic used to improve the performance of a backpropagation algorithm.
10. What is weight initialization in neural networks?
1. What is a neural network?
A computational model inspired by the structure of biological neural networks.
Composed of interconnected nodes (neurons) arranged in layers.
Each connection has an associated weight, and learning involves adjusting these weights.
2. How does a human brain inspire neural networks?
The brain contains neurons connected via synapses; artificial neurons mimic this by using weighted
connections.
Learning in the brain occurs through synaptic plasticity; neural networks learn by weight adaptation
using algorithms like backpropagation.
Parallel distributed processing in the brain is mirrored in multi-layered structures of neural networks.
3. Define a neuron model.
A basic processing unit in a neural network.
Mathematical form:
$y = \phi\left(\sum_{i=1}^{n} w_i x_i + b\right)$
where $x_i$ are inputs, $w_i$ are weights, $b$ is the bias, and $\phi$ is an activation function.
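As a minimal sketch of this model (NumPy and the tanh activation are illustrative choices, not prescribed by the notes):

```python
import numpy as np

def neuron(x, w, b, phi=np.tanh):
    """Single neuron: y = phi(sum_i w_i * x_i + b)."""
    return phi(np.dot(w, x) + b)

# Example: a 3-input neuron
x = np.array([0.5, -1.0, 2.0])   # inputs x_i
w = np.array([0.4, 0.3, -0.2])   # weights w_i
b = 0.1                          # bias
print(neuron(x, w, b))
```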
4. What is meant by error correction learning?
A learning process where weights are updated to minimize the error between predicted and actual
output.
Based on minimizing a loss function like Mean Squared Error (MSE).
Example update rule:
$\Delta w = -\eta \frac{\partial E}{\partial w}$
where η is the learning rate and E is the error.
5. List the types of learning processes in neural networks.
Supervised Learning
Unsupervised Learning
Reinforcement Learning
6. What is the credit assignment problem?
The problem of determining which neurons or weights are responsible for a specific error or success.
Especially challenging in multi-layer networks where the contribution is indirect.
7. State the purpose of the least mean square algorithm.
To minimize the mean squared error between predicted output and actual output.
Weight update rule:
w(t + 1) = w(t) + η(d(t) − y(t))x(t)
where d(t) is desired output and y(t) is actual output.
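A minimal sketch of the LMS update (the synthetic data and learning rate are assumptions for illustration):

```python
import numpy as np

def lms_step(w, x, d, eta):
    """One LMS update: w(t+1) = w(t) + eta * (d(t) - y(t)) * x(t)."""
    y = np.dot(w, x)                   # actual output y(t)
    return w + eta * (d - y) * x

rng = np.random.default_rng(0)
w = np.zeros(3)
for _ in range(2000):
    x = rng.normal(size=3)
    d = 2 * x[0] - x[1] + 0.5 * x[2]   # hypothetical desired output d(t)
    w = lms_step(w, x, d, eta=0.05)
print(w)                               # approaches [2, -1, 0.5]
```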
8. What is the XOR problem in neural networks?
XOR is not linearly separable; a single-layer perceptron cannot solve it.
Requires at least one hidden layer → solved using a multilayer perceptron (MLP).
9. Name a heuristic used to improve the performance of a backpropagation algorithm.
Momentum term:
Δw(t) = −η∇E + αΔw(t − 1)
where α is the momentum coefficient.
Helps escape local minima and speeds up convergence.
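In code, the momentum heuristic is a one-line change to the plain gradient step (a sketch; the hyperparameter values are typical defaults, not from the notes):

```python
def momentum_update(w, grad, velocity, eta=0.1, alpha=0.9):
    """dw(t) = -eta * grad(E) + alpha * dw(t-1); returns (new w, new dw)."""
    velocity = -eta * grad + alpha * velocity
    return w + velocity, velocity
```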
10. What is weight initialization in neural networks?
The process of assigning initial small random values to weights.
Poor initialization can lead to vanishing/exploding gradients.
Common methods:
Xavier/Glorot initialization
He initialization
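Hedged sketches of both initializers (the layer sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot: Var(w) = 2 / (fan_in + fan_out); suits tanh/sigmoid."""
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                      size=(fan_out, fan_in))

def he_init(fan_in, fan_out):
    """He: Var(w) = 2 / fan_in; suits ReLU units."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

W1 = xavier_init(784, 128)   # hypothetical layer sizes
print(W1.std())              # close to sqrt(2/912) ~ 0.047
```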
Unsupervised learning and sequence learning
1. What is associative memory?
2. What does BAM stand for in neural networks?
3. Define ART1 in unsupervised learning.
4. What is a self-organizing map (SOM)?
5. State one property of SOMs.
6. Name the algorithm used in SOM.
7. What is the primary purpose of learning vector quantization?
8. Define a recurrent network paradigm.
9. What is the BPTT algorithm?
10. Who introduced Hopfield networks?
1. What is associative memory?
A type of memory model that recalls complete patterns when presented with partial or noisy inputs.
It maps input patterns to output patterns by association.
Example: Hopfield network and Bidirectional Associative Memory (BAM).
2. What does BAM stand for in neural networks?
BAM: Bidirectional Associative Memory
A type of recurrent neural network with two layers, enabling two-way pattern association between
input and output.
3. Define ART1 in unsupervised learning.
ART1: Adaptive Resonance Theory 1
A neural architecture for unsupervised pattern recognition and clustering.
Uses vigilance parameter to control category formation, allowing stable learning of new patterns.
4. What is a self-organizing map (SOM)?
An unsupervised neural network that projects high-dimensional data into a lower-dimensional (usually
2D) grid.
Preserves topological relationships among input vectors.
Developed by Teuvo Kohonen.
5. State one property of SOMs.
Topological Preservation: Neurons that are close in the map respond to similar input vectors.
6. Name the algorithm used in SOM.
Kohonen’s Learning Algorithm
Involves competition among neurons, weight adaptation, and neighborhood function.
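A compact sketch of one Kohonen training step (the grid shape, decay schedules, and hyperparameter values are illustrative assumptions):

```python
import numpy as np

def som_step(weights, x, t, eta0=0.5, sigma0=2.0, tau=200.0):
    """One SOM update: competition, cooperation, adaptation.

    weights: (rows, cols, dim) grid of weight vectors; x: input of length dim.
    """
    rows, cols, _ = weights.shape
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), (rows, cols))  # winning neuron
    eta = eta0 * np.exp(-t / tau)        # decaying learning rate
    sigma = sigma0 * np.exp(-t / tau)    # shrinking neighborhood radius
    for i in range(rows):
        for j in range(cols):
            d2 = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
            h = np.exp(-d2 / (2 * sigma ** 2))   # neighborhood function
            weights[i, j] += eta * h * (x - weights[i, j])
    return weights
```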
7. What is the primary purpose of Learning Vector Quantization (LVQ)?
A supervised version of SOM used for classification tasks.
It refines prototypes (codebook vectors) to better match labeled input data.
8. Define a recurrent network paradigm.
A neural network with feedback connections, where outputs are fed back into the network as inputs.
Capable of learning temporal or sequential patterns.
Example: Elman network, Hopfield network.
9. What is the BPTT algorithm?
Backpropagation Through Time
An extension of backpropagation for training recurrent neural networks.
Unrolls the network through time steps and applies gradient descent:
$\frac{\partial E}{\partial w} = \sum_{t=1}^{T} \frac{\partial E_t}{\partial w}$
where $E_t$ is the error at time step $t$.
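A minimal sketch of BPTT for a one-layer tanh RNN with squared-error loss (the architecture and loss are assumptions for illustration):

```python
import numpy as np

def bptt(xs, ds, Wx, Wh, Wy):
    """Gradients of E = sum_t E_t for h_t = tanh(Wx x_t + Wh h_{t-1}), y_t = Wy h_t."""
    T = len(xs)
    hs, ys = [np.zeros(Wh.shape[0])], []
    for t in range(T):                    # forward pass, unrolled in time
        hs.append(np.tanh(Wx @ xs[t] + Wh @ hs[-1]))
        ys.append(Wy @ hs[-1])
    gWx, gWh, gWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
    dh_next = np.zeros(Wh.shape[0])
    for t in reversed(range(T)):          # backward pass through time
        dy = ys[t] - ds[t]                # dE_t/dy_t for squared error
        gWy += np.outer(dy, hs[t + 1])
        dh = Wy.T @ dy + dh_next          # error from output and future steps
        dz = (1 - hs[t + 1] ** 2) * dh    # back through tanh
        gWx += np.outer(dz, xs[t])
        gWh += np.outer(dz, hs[t])
        dh_next = Wh.T @ dz               # propagate one step further back
    return gWx, gWh, gWy
```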
10. Who introduced Hopfield networks?
John J. Hopfield in 1982
Developed a form of recurrent network used as associative memory with binary threshold units.
Graphical Models
1. Define a Bayesian network.
2. What is conditional independence in graphical models?
3. What does d-separation imply in Bayesian networks?
4. Name the algorithm used for inference on a chain in graphical models.
5. What is factorization in Markov random fields?
6. State one application of Markov random fields.
7. What is the purpose of the sum-product algorithm?
8. Define loopy belief propagation.
9. What is the max-sum algorithm used for?
10. Give an example of a directed graphical model.
1. Define a Bayesian network.
A directed acyclic graph (DAG) representing joint probability distributions.
Nodes = random variables, Edges = conditional dependencies.
Joint distribution factorizes as:
$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$
where $\mathrm{Pa}(X_i)$ are the parents of node $X_i$.
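The factorization makes the joint probability a product of local terms; a toy sketch (the sprinkler-style structure and all CPT values are illustrative assumptions, not from the notes):

```python
# Rain -> Sprinkler, and both -> GrassWet; all CPT numbers are made up.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},        # P(S | R)
               False: {True: 0.4, False: 0.6}}
P_wet_true = {(True, True): 0.99, (True, False): 0.9,  # P(W=1 | S, R)
              (False, True): 0.8, (False, False): 0.0}

def joint(r, s, w):
    """P(R, S, W) = P(R) * P(S | R) * P(W | S, R), per the factorization."""
    pw = P_wet_true[(s, r)] if w else 1 - P_wet_true[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * pw

print(joint(True, False, True))   # 0.2 * 0.99 * 0.8 = 0.1584
```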
2. What is conditional independence in graphical models?
Two variables X and Y are conditionally independent given Z if:
P (X, Y ∣ Z) = P (X ∣ Z)P (Y ∣ Z)
In graphs, this is reflected in the absence of direct paths or blocked paths under certain conditions.
3. What does d-separation imply in Bayesian networks?
d-separation is a criterion to determine if a set of variables X is conditionally independent of Y given
Z.
If Z d-separates X and Y , then:
X ⊥⊥ Y ∣ Z
Blocks paths using collider, chain, and fork structures.
4. Name the algorithm used for inference on a chain in graphical models.
Forward-Backward Algorithm
Used in Hidden Markov Models (HMMs) for computing marginal distributions efficiently over
sequences.
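A minimal sketch of the forward recursion (the matrix layout and variable names are assumptions):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward pass for an HMM: alpha[t, k] = P(o_1..o_t, state_t = k).

    pi: (K,) initial probs; A: (K, K) with A[i, j] = P(j | i);
    B: (K, M) with B[k, o] = P(o | k); obs: sequence of observation indices.
    """
    K, T = len(pi), len(obs)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha   # sequence likelihood is alpha[-1].sum()
```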
5. What is factorization in Markov random fields?
The joint distribution is factorized over cliques in the undirected graph:
$P(X) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(X_C)$
where $\psi_C$ is a potential function over clique $C$, and $Z$ is the partition function.
6. State one application of Markov random fields.
Image segmentation
Each pixel is modeled as a random variable, and MRF captures spatial dependencies for smoothing and
structure.
7. What is the purpose of the sum-product algorithm?
Computes marginal probabilities of variables in factor graphs.
Performs message passing on the graph:
$\mu_{f \to x}(x) = \sum_{\mathbf{x} \setminus x} f(\mathbf{x}) \prod_{y \in \mathrm{ne}(f) \setminus x} \mu_{y \to f}(y)$
8. Define loopy belief propagation.
An extension of sum-product algorithm to graphs with cycles.
Approximate inference method using iterative message passing even in non-tree graphs.
9. What is the max-sum algorithm used for?
Computes the most probable assignment (MAP) instead of marginal probabilities.
Replaces sums with maxima (working in the log domain, so products become sums):
$\mu_{f \to x}(x) = \max_{\mathbf{x} \setminus x} \left[ \ln f(\mathbf{x}) + \sum_{y \in \mathrm{ne}(f) \setminus x} \mu_{y \to f}(y) \right]$
10. Give an example of a directed graphical model.
Hidden Markov Model (HMM)
A chain-structured Bayesian network where hidden states influence observed outputs:
$P(X, Y) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1}) \prod_{t=1}^{T} P(Y_t \mid X_t)$
Mixture models and EM
1. What is rejection sampling?
2. Define importance sampling.
3. What is the purpose of the EM algorithm in sampling?
4. Name one standard distribution used in sampling algorithms.
5. What is the role of the Metropolis-Hastings algorithm?
6. Define Gibbs sampling.
7. What is slice sampling?
8. What is the partition function in MCMC?
9. Name one property of Markov chains.
10. What is adaptive rejection sampling?
1. What is rejection sampling?
A technique to sample from a complex distribution p(x) using a simpler proposal distribution q(x).
Accept a sample x with probability:
$\frac{p(x)}{M q(x)}$
where $M$ is a constant such that $p(x) \le M q(x)$ for all $x$.
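A minimal sketch with a Beta(2, 2) target and a Uniform(0, 1) proposal (both are illustrative choices; M = 1.5 works because the Beta(2, 2) density peaks at 1.5):

```python
import numpy as np
from scipy import stats

def rejection_sample(n, M=1.5, seed=0):
    """Accept x ~ q with probability p(x) / (M q(x)); here q(x) = 1 on [0, 1]."""
    rng = np.random.default_rng(seed)
    target = stats.beta(2, 2).pdf
    samples = []
    while len(samples) < n:
        x = rng.uniform()                    # draw from the proposal q
        if rng.uniform() <= target(x) / M:   # accept with prob p(x)/(M q(x))
            samples.append(x)
    return np.array(samples)

print(rejection_sample(5))
```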
2. Define importance sampling.
A method to estimate expectations under a target distribution p(x) using samples from a proposal
distribution q(x):
$\mathbb{E}_p[f(x)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)\, \frac{p(x_i)}{q(x_i)}$
where $x_i \sim q(x)$.
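A sketch estimating $\mathbb{E}_p[x^2]$ for $p = \mathcal{N}(0,1)$ with a wider proposal $q = \mathcal{N}(0,2)$ (all distributional choices are assumptions):

```python
import numpy as np
from scipy import stats

def importance_estimate(f, n=100_000, seed=0):
    """E_p[f(x)] ~ (1/N) * sum_i f(x_i) * p(x_i)/q(x_i), with x_i ~ q."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 2.0, size=n)                        # x_i ~ q = N(0, 2)
    w = stats.norm(0, 1).pdf(x) / stats.norm(0, 2).pdf(x)   # importance weights
    return np.mean(f(x) * w)

print(importance_estimate(lambda x: x ** 2))   # ~1.0, the variance of N(0, 1)
```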
3. What is the purpose of the EM algorithm in sampling?
EM (Expectation-Maximization) is not a sampling method itself, but it helps estimate parameters of
latent variable models (e.g., Gaussian Mixture Models).
It iteratively maximizes the likelihood by alternating:
E-step: Estimate posterior over latent variables.
M-step: Maximize expected log-likelihood w.r.t parameters.
4. Name one standard distribution used in sampling algorithms.
Gaussian (Normal) distribution
Commonly used as a proposal distribution in methods like Metropolis-Hastings.
5. What is the role of the Metropolis-Hastings algorithm?
A Markov Chain Monte Carlo (MCMC) algorithm to sample from a target distribution p(x).
Accepts a proposal $x'$ with probability:
$\alpha = \min\left(1, \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right)$
ensuring the correct stationary distribution.
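A minimal random-walk Metropolis sketch (the symmetric Gaussian proposal makes the $q$ terms cancel; the target and step size are illustrative):

```python
import numpy as np

def metropolis(log_p, x0=0.0, n=10_000, step=1.0, seed=0):
    """Sample from an unnormalized log-density using a symmetric proposal."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n):
        x_new = x + rng.normal(0.0, step)   # propose x' ~ q(. | x)
        if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
            x = x_new                       # accept with probability alpha
        chain.append(x)
    return np.array(chain)

chain = metropolis(lambda x: -0.5 * x ** 2)   # unnormalized N(0, 1)
print(chain.mean(), chain.std())              # ~0, ~1
```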
6. Define Gibbs sampling.
A special case of Metropolis-Hastings where proposals are sampled from conditional distributions:
$x_i^{(t+1)} \sim p\left(x_i \mid x_{-i}^{(t)}\right)$
Each variable is updated sequentially while conditioning on the current values of all others.
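A sketch for a bivariate Gaussian with correlation $\rho$, whose full conditionals are themselves Gaussian (the target is an illustrative assumption):

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n=10_000, seed=0):
    """Alternately sample x1 | x2 ~ N(rho*x2, 1 - rho^2) and x2 | x1."""
    rng = np.random.default_rng(seed)
    x1 = x2 = 0.0
    out = np.empty((n, 2))
    sd = np.sqrt(1 - rho ** 2)
    for t in range(n):
        x1 = rng.normal(rho * x2, sd)   # sample x1 given current x2
        x2 = rng.normal(rho * x1, sd)   # sample x2 given the new x1
        out[t] = (x1, x2)
    return out

samples = gibbs_bivariate_normal()
print(np.corrcoef(samples.T)[0, 1])     # ~0.8
```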
7. What is slice sampling?
Samples from a distribution by introducing an auxiliary variable and sampling from the region under
the curve of the target density.
Samples from the slice defined by:
{x : p(x) ≥ u}
where u ∼ Uniform(0, p(x)).
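A naive sketch using a fixed bracket that is shrunk toward the current point (the bracket width and target are assumptions; practical versions add a stepping-out phase):

```python
import numpy as np

def slice_sample(p, x0=0.0, n=5000, width=10.0, seed=0):
    """Draw u ~ U(0, p(x)), then sample x uniformly from the slice {x: p(x) >= u}."""
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for _ in range(n):
        u = rng.uniform(0.0, p(x))       # auxiliary height variable
        lo, hi = x - width, x + width    # initial bracket around x
        while True:
            x_new = rng.uniform(lo, hi)
            if p(x_new) >= u:            # landed inside the slice: accept
                x = x_new
                break
            if x_new < x:                # otherwise shrink the bracket
                lo = x_new
            else:
                hi = x_new
        out.append(x)
    return np.array(out)

chain = slice_sample(lambda x: np.exp(-0.5 * x ** 2))
print(chain.mean(), chain.std())         # ~0, ~1
```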
8. What is the partition function in MCMC?
A normalization constant Z in probabilistic models:
$Z = \int \tilde{p}(x)\, dx$
where $\tilde{p}(x)$ is the unnormalized probability.
Often intractable, so MCMC methods avoid computing it directly.
9. Name one property of Markov chains.
Ergodicity: The chain eventually forgets its starting state and converges to a stationary distribution,
given it is irreducible and aperiodic.
10. What is adaptive rejection sampling?
A variant of rejection sampling for log-concave distributions.
Builds a piecewise linear upper bound to approximate the target distribution.
Efficiently adapts the proposal envelope using previously rejected samples.
1. What is the main goal of Gaussian mixture models?
2. Define the EM algorithm.
3. What is maximum likelihood in Gaussian mixtures?
4. State one use of EM in Bayesian linear regression.
5. What is meant by variational inference?
6. Name a property of factorized distributions.
7. Define predictive density in variational mixture models.
8. What is the variational lower bound?
9. How is the number of components determined in Gaussian mixtures?
10. What is the role of induced factorization in variational models?
1. What is the main goal of Gaussian mixture models?
To model data as a weighted sum of multiple Gaussian distributions, capturing complex, multimodal
data distributions.
Formally:
$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$
where $\pi_k$ are the mixing coefficients.
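Evaluating the mixture density is a direct transcription of this formula (the 1-D components and their parameters are illustrative):

```python
import numpy as np
from scipy import stats

def gmm_pdf(x, pis, mus, sds):
    """p(x) = sum_k pi_k * N(x | mu_k, sd_k^2) for a 1-D mixture."""
    return sum(pi * stats.norm(mu, sd).pdf(x)
               for pi, mu, sd in zip(pis, mus, sds))

x = np.linspace(-5, 5, 5)
print(gmm_pdf(x, pis=[0.3, 0.7], mus=[-2.0, 1.5], sds=[0.5, 1.0]))
```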
2. Define the EM algorithm.
An iterative optimization algorithm for models with latent variables.
E-step: Compute the expected log-likelihood using current parameter estimates.
M-step: Maximize this expected log-likelihood to update parameters.
3. What is maximum likelihood in Gaussian mixtures?
The process of estimating parameters (πk , μk , Σk ) that maximize the likelihood of the observed
data.
The log-likelihood:
$\log p(X) = \sum_{i=1}^{N} \log\left(\sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\right)$
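One EM iteration for maximizing this likelihood, as a sketch (responsibilities in the E-step, closed-form updates in the M-step; the 1-D restriction is an assumption for brevity):

```python
import numpy as np
from scipy import stats

def em_step(x, pis, mus, sds):
    """One E-step + M-step for a 1-D Gaussian mixture."""
    # E-step: responsibilities r[i, k] = P(component k | x_i)
    r = np.stack([pi * stats.norm(mu, sd).pdf(x)
                  for pi, mu, sd in zip(pis, mus, sds)], axis=1)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, standard deviations
    nk = r.sum(axis=0)
    pis = nk / len(x)
    mus = (r * x[:, None]).sum(axis=0) / nk
    sds = np.sqrt((r * (x[:, None] - mus) ** 2).sum(axis=0) / nk)
    return pis, mus, sds
```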
4. State one use of EM in Bayesian linear regression.
EM is used to estimate hyperparameters (e.g., noise variance, prior variance) in hierarchical Bayesian
linear models where direct marginalization is intractable.
5. What is meant by variational inference?
An approximation method to compute posterior distributions by optimizing over a family of tractable
distributions q(z).
Converts inference into optimization by maximizing a lower bound on the log-likelihood.
6. Name a property of factorized distributions.
Independence assumption: The joint distribution is assumed to factor as:
$q(z) = \prod_i q_i(z_i)$
simplifying computations but possibly reducing accuracy.
7. Define predictive density in variational mixture models.
The probability of a new data point x∗ given the observed data X :
p(x∗ ∣ X) = ∫ p(x∗ ∣ z, θ)q(z, θ) dz dθ
where q(z, θ) is the variational approximation of the posterior.
8. What is the variational lower bound?
Also called the Evidence Lower Bound (ELBO):
$\log p(X) \ge \mathbb{E}_{q(z)}[\log p(X, z)] - \mathbb{E}_{q(z)}[\log q(z)]$
Maximizing this bound improves the approximation of the true posterior.
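The bound follows from Jensen's inequality applied to the log of the marginal likelihood; a standard short derivation:

```latex
\begin{align*}
\log p(X) &= \log \int q(z)\,\frac{p(X,z)}{q(z)}\,dz
  \ge \int q(z)\,\log\frac{p(X,z)}{q(z)}\,dz
  && \text{(Jensen's inequality)} \\
 &= \mathbb{E}_{q(z)}[\log p(X,z)] - \mathbb{E}_{q(z)}[\log q(z)] .
\end{align*}
```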
9. How is the number of components determined in Gaussian mixtures?
Methods include:
Model selection criteria like BIC (Bayesian Information Criterion) or AIC.
Dirichlet Process Gaussian Mixtures for automatic, nonparametric selection.
10. What is the role of induced factorization in variational models?
Simplifies inference by assuming independence between subsets of variables.
Enables tractable optimization of the ELBO and reduces computational cost.
1. What is a neural network?
2. How does a human brain inspire neural networks?
3. Define a neuron model
4. What is meant by error correction learning?
5. State one property of SOMs.
6. Name the algorithm used in SOM.
7. What is the primary purpose of learning vector quantization?
8. Define a recurrent network paradigm.
9. What is conditional independence in graphical models?
10. What does d-separation imply in Bayesian networks?
1. What is a neural network?
A neural network is a computational framework composed of layers of interconnected nodes (neurons)
used to approximate complex functions.
It learns patterns by adjusting weights on connections through training data.
Used for tasks like classification, regression, and pattern recognition.
2. How does a human brain inspire neural networks?
Artificial neurons are inspired by biological neurons that receive and transmit signals via synapses.
The network architecture mimics the brain's distributed parallel processing, where learning is
achieved by modifying synaptic strengths (weights).
3. Define a neuron model.
A mathematical function that simulates the behavior of a biological neuron.
Core equation:
$y = \phi\left(\sum_{i=1}^{n} w_i x_i + b\right)$
where $x_i$ are inputs, $w_i$ are weights, $b$ is the bias, and $\phi$ is the activation function.
4. What is meant by error correction learning?
A learning process where the network updates weights based on the difference between the predicted
and actual output.
Objective: Minimize a loss function like Mean Squared Error (MSE).
Update rule example:
$\Delta w = -\eta \frac{\partial E}{\partial w}$
5. State one property of SOMs.
Topology preservation: Nearby neurons in the map respond to similar input patterns, maintaining the
spatial structure of input data.
6. Name the algorithm used in SOM.
Kohonen’s Learning Algorithm
Involves competition, cooperation (neighborhood), and adaptation steps for training the map.
7. What is the primary purpose of Learning Vector Quantization?
To perform supervised classification by learning prototypes (codebook vectors) that represent classes.
Improves classification accuracy by adapting vector positions based on labeled data.
8. Define a recurrent network paradigm.
A network architecture where connections between neurons form directed cycles, enabling temporal
dynamic behavior.
Suitable for tasks involving sequences like language, time series, and control systems.
9. What is conditional independence in graphical models?
Two variables X and Y are conditionally independent given Z if:
P (X, Y ∣ Z) = P (X ∣ Z)P (Y ∣ Z)
Graphically, this is indicated by the absence of connecting paths when conditioned on Z .
10. What does d-separation imply in Bayesian networks?
A criterion for determining conditional independence using the graph structure.
If a set Z d-separates nodes X and Y , then X ⊥⊥ Y ∣ Z in the probability distribution.
It blocks all active paths between X and Y .
1. What is the purpose of the sum-product algorithm?
2. Define loopy belief propagation.
3. What is meant by variational inference?
4. Name a property of factorized distributions.
5. Define predictive density in variational mixture models.
6. What is the variational lower bound?
7. What is slice sampling?
8. What is the partition function in MCMC?
9. Name one property of Markov chains.
10. What is adaptive rejection sampling?
1. What is the purpose of the sum-product algorithm?
To compute marginal distributions of variables in factor graphs efficiently.
Works via message passing between variable and factor nodes.
Key idea:
$p(x_i) = \sum_{\mathbf{x} \setminus x_i} \prod_j f_j(\mathbf{x}_{\mathrm{scope}(j)})$
is computed without enumerating all joint configurations.
2. Define loopy belief propagation.
An approximate inference method where sum-product algorithm is applied to graphs with cycles
(loops).
Messages are passed iteratively until convergence, yielding approximate marginals.
Not guaranteed to converge, but often performs well in practice.
3. What is meant by variational inference?
A method for approximating intractable posterior distributions by optimizing over a family of
simpler distributions.
Converts inference to an optimization problem by maximizing the Evidence Lower Bound (ELBO).
4. Name a property of factorized distributions.
Independence assumption:
$q(z) = \prod_i q_i(z_i)$
simplifies the joint approximation and reduces computation but may sacrifice accuracy.
5. Define predictive density in variational mixture models.
The probability of a new data point given training data using the variational posterior:
p(x∗ ∣ X) = ∫ p(x∗ ∣ θ, z)q(θ, z) dθ dz
where q(θ, z) approximates the true posterior.
6. What is the variational lower bound?
Also called ELBO (Evidence Lower Bound), defined as:
$\log p(X) \ge \mathbb{E}_{q(z)}[\log p(X, z)] - \mathbb{E}_{q(z)}[\log q(z)]$
Maximizing this bound improves the closeness of the variational distribution to the true posterior.
7. What is slice sampling?
A MCMC sampling method that samples from a target distribution by sampling uniformly from the
region under its probability density curve.
Involves sampling an auxiliary variable u ∼ Uniform(0, p(x)) and then sampling x from the “slice”
{x : p(x) ≥ u}.
8. What is the partition function in MCMC?
A normalization constant Z for unnormalized probability distributions:
$Z = \int \tilde{p}(x)\, dx$
Appears in models like Boltzmann machines and is often intractable; MCMC methods avoid computing
it directly.
9. Name one property of Markov chains.
Stationarity: A Markov chain converges to a stationary distribution π(x), such that:
$\pi(x') = \sum_{x} \pi(x)\, P(x' \mid x)$
and remains unchanged under further transitions.
10. What is adaptive rejection sampling?
A version of rejection sampling designed for log-concave distributions.
Constructs and updates a piecewise linear envelope over the log-density function, improving
sampling efficiency.
Samples are drawn from the envelope and accepted/rejected based on actual density.
1. What is the credit assignment problem?
2. Name a heuristic used to improve the performance of a backpropagation algorithm.
3. What is the BPTT algorithm?
4. Who introduced Hopfield networks?
5. State one application of Markov random fields.
6. What is the purpose of the sum-product algorithm?
7. What is maximum likelihood in Gaussian mixtures? Give its function.
8. State one use of EM in Bayesian linear regression.
9. Define Gibbs sampling.
10. What is the purpose of the EM algorithm in sampling?
1. What is the credit assignment problem?
The difficulty of determining which weights (or neurons) are responsible for the error in a neural
network's output.
Especially important in multilayer networks where internal units don’t directly produce output,
making it hard to assign responsibility.
2. Name a heuristic used to improve the performance of a backpropagation algorithm.
Momentum: Helps accelerate learning and escape local minima.
Update rule:
$\Delta w(t) = -\eta \frac{\partial E}{\partial w} + \alpha\, \Delta w(t-1)$
where α is the momentum term.
3. What is the BPTT algorithm?
Backpropagation Through Time: Extension of backpropagation for recurrent neural networks
(RNNs).
Unrolls the RNN across time steps and applies backpropagation to compute gradients.
Handles temporal dependencies in sequence learning tasks.
4. Who introduced Hopfield networks?
John J. Hopfield, in 1982.
The Hopfield network is a form of recurrent neural network used for associative memory and
optimization.
5. State one application of Markov random fields.
Image Denoising: MRFs model spatial dependencies between pixels and restore images corrupted by
noise using local interactions.
6. What is the purpose of the sum-product algorithm?
To compute marginal probabilities efficiently in factor graphs via message passing.
Avoids brute-force marginalization over high-dimensional joint distributions.
7. What is maximum likelihood in Gaussian mixtures? Give its function.
The goal is to find parameters (πk , μk , Σk ) that maximize the log-likelihood of the data:
$\log p(X \mid \theta) = \sum_{i=1}^{N} \log\left(\sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\right)$
Optimized using the EM algorithm due to latent component assignments.
8. State one use of EM in Bayesian linear regression.
EM is used to estimate hyperparameters (e.g., noise variance, prior variance) when integrating out
latent variables (e.g., weights) is intractable.
9. Define Gibbs sampling.
A Markov Chain Monte Carlo algorithm where each variable is sampled from its conditional
distribution:
$x_i^{(t+1)} \sim p\left(x_i \mid x_{-i}^{(t)}\right)$
Iteratively samples variables while conditioning on the most recent values of the others.
10. What is the purpose of the EM algorithm in sampling?
In sampling contexts (e.g., GMMs), EM helps estimate model parameters when data includes latent
(unobserved) variables.
It avoids direct sampling of complex posteriors by iteratively maximizing expected log-likelihood.