Bayesian Networks (Directed Acyclic Graphical Models)
[Figure: DAG with Coin1 → Bell ← Coin2]
The situation of a bell that rings whenever the outcomes of two coins are equal cannot be well represented by undirected graphical models: conditioning on the bell induces a dependency between the two coins, so an undirected model needs a clique over all three variables and can no longer express the marginal independence of the coins.
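To make the induced-dependency claim concrete, here is a minimal Python sketch (added for illustration, not part of the original slides; fair coins and a deterministic bell are assumed):

import itertools

p_coin = {0: 0.5, 1: 0.5}                       # fair coin: 0 = heads, 1 = tails

def p_bell(bell, c1, c2):
    # Bell rings (bell = 1) exactly when the two outcomes are equal.
    return float(bell == int(c1 == c2))

# Joint distribution P(c1, c2, bell) = P(c1) P(c2) P(bell | c1, c2)
joint = {(c1, c2, b): p_coin[c1] * p_coin[c2] * p_bell(b, c1, c2)
         for c1, c2, b in itertools.product((0, 1), repeat=3)}

# Marginally: P(c1=0, c2=0) = P(c1=0) P(c2=0)  ->  independent
p_00 = sum(joint[0, 0, b] for b in (0, 1))
print(p_00, p_coin[0] * p_coin[0])              # 0.25 0.25

# Given bell = 1: P(c1=0, c2=0 | bell=1) != P(c1=0 | bell=1) P(c2=0 | bell=1)
p_b1 = sum(v for (c1, c2, b), v in joint.items() if b == 1)
p_00_given_b1 = joint[0, 0, 1] / p_b1
p_c1_given_b1 = sum(joint[0, c2, 1] for c2 in (0, 1)) / p_b1
p_c2_given_b1 = sum(joint[c1, 0, 1] for c1 in (0, 1)) / p_b1
print(p_00_given_b1, p_c1_given_b1 * p_c2_given_b1)   # 0.5 0.25 -> dependent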
Bayesian Networks (BNs): Examples of models for diseases, symptoms & risk factors
• One variable for all diseases (values are diseases)
• One variable per disease (values are True/False)
• Naïve Bayesian Networks versus Bipartite BNs
Boundary Basis for Dependency Models
Let M be a dependency model over U = {X1,…,Xn}, and let d be an ordering of these elements. A boundary basis of M with respect to d is a set of independence statements I(Xi, Bi, Ui - Bi) that hold in M, where Bi ⊆ Ui = {X1, X2,…, Xi-1}, for i = 1,…,n. A boundary basis is minimal if every Bi is minimal.
Example I: What is the boundary basis for P(X1,X2,X3,X4) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3)?
Example I
[Figure: the chain X1 → X2 → X3 → X4]
A boundary basis and a boundary DAG for P(X1,X2,X3,X4) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3):
• I( X3, X2, X1 )
• I( X4, X3, {X1, X2} )
The directed acyclic graph (DAG) created by assigning each vertex Xi the parents Bi is called the boundary DAG of M relative to order d.
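As a quick check (added here for clarity), the second statement can be read directly off the factorization:
P(X4 | X1, X2, X3) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3) / [ P(X1)P(X2|X1)P(X3|X2) ] = P(X4|X3),
so X4 is independent of {X1, X2} given X3, i.e. I( X4, X3, {X1, X2} ).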
Example II
[Figure: DAG with Coin1 → Bell ← Coin2]
A boundary basis and a boundary DAG for P(coin1, coin2, bell) = P(coin1) P(coin2) P(bell | coin1, coin2):
• I( coin1, { }, coin2 )
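The single basis statement follows directly from the factorization (a short check added here):
P(coin1, coin2) = Σ_bell P(coin1) P(coin2) P(bell | coin1, coin2) = P(coin1) P(coin2),
so the two coins are marginally independent, i.e. I( coin1, { }, coin2 ).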
Example III
[Figure: the DAG over V, S, T, L, B, A, X, D]
• In the order V, S, T, L, B, A, X, D, we have a boundary basis:
• I( S, { }, V )
• I( T, V, S )
• I( L, S, {T, V} )
• …
• I( X, A, {V, S, T, L, B, D} )
Does I( {X, D}, A, V ) also hold in the dependency model P ?
Definitions
Can we define “independence” ID(X, Z, Y) graphically, so that it answers these probabilistic independence questions?
1. A Directed Acyclic Graph (DAG) D = (U, E) is an I-map of a dependency model M over U if ID(X, Z, Y) ⇒ IM(X, Z, Y) for all disjoint subsets X, Y, Z of U.
2. D is a minimal I-map of M if by removing any edge, D ceases to be an I-map.
3. D is a perfect map of M if ID(X, Z, Y) ⇔ IM(X, Z, Y) for all disjoint subsets X, Y, Z of U.
From Separation in UGs To d-Separation in DAGs
Paths
[Figure: the DAG over V, S, T, L, B, A, X, D]
• Intuition: dependency must “flow” along paths in the graph
• A path is a sequence of neighboring variables
Examples:
• X - A - D - B
• A - L - S - B
Path Blockage
• Every path is classified given the evidence:
• active -- creates a dependency between the end nodes
• blocked -- does not create a dependency between the end nodes
Evidence means the assignment of a value to a subset of nodes.
Path Blockage
Three cases:
• Common cause (L ← S → B): blocked if the common cause S is in the evidence; otherwise active.
• Intermediate cause (S → L → A): blocked if the intermediate node L is in the evidence; otherwise active.
• Common effect (T → A ← L, with descendant X): active if the common effect A or one of its descendants (e.g., X) is in the evidence; otherwise blocked.
Definition of Path Blockage
Definition: A path is active, given evidence Z, if
• Whenever the path contains a head-to-head configuration T → A ← L, either A or one of its descendants is in Z, and
• No other nodes in the path are in Z.
Definition: A path is blocked, given evidence Z, if it is not active.
Definition: X is d-separated from Y, given Z, if every path from a node in X to a node in Y is blocked, given Z.
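The definition translates almost line for line into code. Below is a small Python sketch (written for these notes, not part of the original slides) that enumerates the undirected paths of a DAG and applies the blockage rules above. The DAG is represented as a dict mapping each node to its set of parents (roots map to the empty set), and the sets X, Y, Z are assumed disjoint, as in the definition.

from itertools import product

def descendants(dag, node):
    # All descendants of `node`: its children, their children, and so on.
    children = {v: {u for u, ps in dag.items() if v in ps} for v in dag}
    found, frontier = set(), [node]
    while frontier:
        v = frontier.pop()
        for c in children[v]:
            if c not in found:
                found.add(c)
                frontier.append(c)
    return found

def undirected_paths(dag, a, b, path=None):
    # All simple paths between a and b, ignoring edge directions.
    path = path or [a]
    if a == b:
        yield path
        return
    neighbors = dag[a] | {v for v, ps in dag.items() if a in ps}
    for n in neighbors:
        if n not in path:
            yield from undirected_paths(dag, n, b, path + [n])

def path_active(dag, path, Z):
    # Active given Z iff every head-to-head node on the path (or one of its
    # descendants) is in Z, and no other node on the path is in Z.
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        head_to_head = prev in dag[node] and nxt in dag[node]   # prev -> node <- nxt
        if head_to_head:
            if node not in Z and not (descendants(dag, node) & Z):
                return False
        elif node in Z:
            return False
    return True

def d_separated(dag, X, Y, Z):
    # X is d-separated from Y given Z iff every path between them is blocked.
    return not any(path_active(dag, p, set(Z))
                   for x, y in product(X, Y)
                   for p in undirected_paths(dag, x, y))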
Example
[Figure: the DAG over V, S, T, L, B, A, X, D]
ID(T, S | { }) = yes
ID(T, S | {D}) = no
ID(T, S | {D, L, B}) = yes
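These three queries can be reproduced with the d-separation sketch above, using the parent sets of the example DAG (read off the factorization p(s) p(v) p(t|v) p(l|s) p(b|s) p(a|t,l) p(d|a,b) p(x|a) shown later in these slides):

# Parent sets of the example DAG: V -> T, S -> L, S -> B, T,L -> A, A -> X, A,B -> D
dag = {'V': set(), 'S': set(),
       'T': {'V'}, 'L': {'S'}, 'B': {'S'},
       'A': {'T', 'L'}, 'X': {'A'}, 'D': {'A', 'B'}}

print(d_separated(dag, {'T'}, {'S'}, set()))             # True  -> yes
print(d_separated(dag, {'T'}, {'S'}, {'D'}))             # False -> no
print(d_separated(dag, {'T'}, {'S'}, {'D', 'L', 'B'}))   # True  -> yes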
Example
[Figure: the DAG over V, S, T, L, B, A, X, D]
• In the order V, S, T, L, B, A, X, D, we get from the boundary basis:
• ID( S, { }, V )
• ID( T, V, S )
• ID( L, S, {T, V} )
• …
• ID( X, A, {V, S, T, L, B, D} )
Bayesian Networks (Directed Acyclic Graphical Models)
Definition: Given a probability distribution P on a set of variables U, a DAG D = (U, E) is called a Bayesian Network of P iff D is a minimal I-map of P.
The first claim holds because any probability distribution is a semi-graphoid (it satisfies Symmetry, Decomposition, Contraction, and Weak Union).
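For reference (standard statements of the axioms named above, added here since the proof below uses them; ZW denotes the union of Z and W):
• Symmetry: I(X, Z, Y) iff I(Y, Z, X)
• Decomposition: I(X, Z, YW) implies I(X, Z, Y)
• Weak Union: I(X, Z, YW) implies I(X, ZW, Y)
• Contraction: I(X, Z, Y) and I(X, ZY, W) imply I(X, Z, YW)
• Intersection (valid for strictly positive distributions): I(X, ZW, Y) and I(X, ZY, W) imply I(X, Z, YW)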
The second claim, uniqueness of the parent sets, holds due to:
• I(X, ZW1, YW2) and I(X, ZW2, YW1) imply I(X, Z, YW1W2)
Proof:
(1) I(X, ZW1, YW2). Given.
(2) I(X, ZW2, YW1). Given.
(3) I(X, ZW1W2, Y) by Weak Union from (1).
(4) I(X, ZYW1, W2) by Weak Union from (1).
(5) I(X, ZYW2, W1) by Weak Union from (2).
(6) I(X, ZY, W1W2) by Intersection from (4) and (5).
(7) I(X, Z, YW1W2) by Intersection from (3) and (6).
d-separation
The definition of ID(X, Z, Y) is such that:
Soundness [Theorem 9]: ID(X, Z, Y) = yes implies that IP(X, Z, Y) follows from the boundary Basis(D).
Completeness [Theorem 10]: ID(X, Z, Y) = no implies that IP(X, Z, Y) does not follow from the boundary Basis(D).
Revisiting Example III
[Figure: the DAG over V, S, T, L, B, A, X, D]
So does IP( {X, D}, A, V ) hold? It is enough to check d-separation!
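With the d-separation sketch and the dag dict defined earlier, the check is a single call. Note that observing A opens the head-to-head node A on the path V - T - A - L - S - B - D:

print(d_separated(dag, {'X', 'D'}, {'V'}, {'A'}))   # False: not d-separated given A alone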
Bayesian Networks with Numbers
[Figure: the DAG over V, S, T, L, B, A, X, D, annotated with its local distributions]
p(s), p(v), p(t|v), p(l|s), p(b|s), p(a|t,l), p(d|a,b), p(x|a)
Bayesian Network (cont.)
[Figure: the DAG over V, S, T, L, B, A, X, D, annotated with p(s), p(v), p(t|v), p(l|s), p(b|s), p(a|t,l), p(d|a,b), p(x|a)]
Each directed acyclic graph defines a factorization of the form
P(x1, …, xn) = Πi P(xi | pai),
where pai are the parents of Xi; for this network,
P(v, s, t, l, b, a, x, d) = p(s) p(v) p(t|v) p(l|s) p(b|s) p(a|t,l) p(d|a,b) p(x|a).
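As a small illustration (function name and table layout assumed, not from the slides), the factorization means the joint probability of a full assignment is just a product of local table lookups:

def joint_prob(v, s, t, l, b, a, x, d, cpt):
    # cpt is assumed to map each variable to its local conditional table,
    # indexed by (child value, parent values...) in the order shown.
    return (cpt['s'][s] * cpt['v'][v] *
            cpt['t'][t, v] * cpt['l'][l, s] * cpt['b'][b, s] *
            cpt['a'][a, t, l] * cpt['d'][d, a, b] * cpt['x'][x, a])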
Independence in Bayesian Networks
(*) IP( Xi ; {X1,…,Xi-1} \ Pai | Pai )
This set of independence assertions is denoted Basis(G). All other independence assertions that are entailed by (*) are derivable using the semi-graphoid axioms.
Local Distributions - Asymmetric Independence
[Figure: Tuberculosis (Yes/No) and Lung Cancer (Yes/No) are the parents of Abnormality in Chest (Yes/No), with table p(A|T,L)]
Table:
p(A=y | L=n, T=n) = 0.02
p(A=y | L=n, T=y) = 0.60
p(A=y | L=y, T=n) = 0.99
p(A=y | L=y, T=y) = 0.99
Note that p(A=y | L=y, T) is the same for T=y and T=n: given L=y, A is independent of T. This asymmetric (context-specific) independence is captured by the local table but not by the graph structure alone.
COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents.
Proof (only if): X is d-separated from all its non-descendants, given its parents. Since D is an I-map, by the soundness theorem the claim holds.
Proof (if): If each variable X is conditionally independent of all its non-descendants given its parents, then, using Decomposition, it is also independent of its predecessors in a particular order d.
COROLLARY 5: If D = (U, E) is a boundary DAG of P constructed in some order d, then any topological order d’ of U will yield the same boundary DAG of P. (Hence the construction order can be forgotten.)
Proof: By Corollary 4, each variable X is d-separated from all its non-descendants, given its parents, in the boundary DAG of P. In particular, by Decomposition, X is independent, given its parents, of all the variables that precede it in any topological order d’.
Extension of the Markov Chain Property
I( Xk, Xk-1, {X1, …, Xk-2} ) implies I( Xk, {Xk-1, Xk+1}, {X1, …, Xk-2, Xk+2, …, Xn} ).
This holds due to the soundness theorem: in the chain DAG, {Xk-1, Xk+1} d-separates Xk from all the remaining variables. The converse holds when Intersection is assumed.
Markov Blankets in DAGs
Consequence: There is no improvement to d-separation, and no statement escapes graphical representation.
Reasoning: (1) If there were an independence statement σ not shown by d-separation but nevertheless implied by the basis, then σ would have to be true in all distributions that satisfy the basis. But Theorem 10 states that there exists a distribution that satisfies the basis and violates σ. (2) Same argument. [Note that (2) is a stronger claim.]