Bayesian Networks (Directed Acyclic Graphical Models)
[Figure: DAG with Coin1 → Bell ← Coin2]
The situation of a bell that rings whenever the outcomes of two coins are equal cannot be well represented by undirected graphical models: conditioning on the bell induces a dependency between the two coins, so an undirected model needs a clique over all three variables and can no longer express the marginal independence of the coins.
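To make the induced-dependency claim concrete, here is a minimal Python sketch (added for illustration, not part of the original slides; fair coins and a deterministic bell are assumed):

import itertools

p_coin = {0: 0.5, 1: 0.5}                       # fair coin: 0 = heads, 1 = tails

def p_bell(bell, c1, c2):
    # Bell rings (bell = 1) exactly when the two outcomes are equal.
    return float(bell == int(c1 == c2))

# Joint distribution P(c1, c2, bell) = P(c1) P(c2) P(bell | c1, c2)
joint = {(c1, c2, b): p_coin[c1] * p_coin[c2] * p_bell(b, c1, c2)
         for c1, c2, b in itertools.product((0, 1), repeat=3)}

# Marginally: P(c1=0, c2=0) = P(c1=0) P(c2=0)  ->  independent
p_00 = sum(joint[0, 0, b] for b in (0, 1))
print(p_00, p_coin[0] * p_coin[0])              # 0.25 0.25

# Given bell = 1: P(c1=0, c2=0 | bell=1) != P(c1=0 | bell=1) P(c2=0 | bell=1)
p_b1 = sum(v for (c1, c2, b), v in joint.items() if b == 1)
p_00_given_b1 = joint[0, 0, 1] / p_b1
p_c1_given_b1 = sum(joint[0, c2, 1] for c2 in (0, 1)) / p_b1
p_c2_given_b1 = sum(joint[c1, 0, 1] for c1 in (0, 1)) / p_b1
print(p_00_given_b1, p_c1_given_b1 * p_c2_given_b1)   # 0.5 0.25 -> dependent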
Bayesian Networks (BNs): Examples of models for diseases, symptoms & risk factors
• One variable for all diseases (values are diseases)
• One variable per disease (values are True/False)
• Naïve Bayesian Networks versus Bipartite BNs
Boundary Basis for Dependency Models
Let M be a dependency model over U = {X1,…,Xn}, and let d be an ordering of these elements. A boundary basis of M with respect to d is a set of independence statements I(Xi, Bi, Ui - Bi) that hold in M, where Bi ⊆ Ui = {X1, X2,…, Xi-1}, for i = 1,…,n. A boundary basis is minimal if every Bi is minimal.
Example I: What is the boundary basis for P(X1,X2,X3,X4) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3)?
Example I
[Figure: the chain X1 → X2 → X3 → X4]
A boundary basis and a boundary DAG for P(X1,X2,X3,X4) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3):
• I( X3, X2, X1 )
• I( X4, X3, {X1, X2} )
The directed acyclic graph (DAG) created by assigning each vertex Xi the parents Bi is called the boundary DAG of M relative to order d.
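As a quick check (added here for clarity), the second statement can be read directly off the factorization:
P(X4 | X1, X2, X3) = P(X1)P(X2|X1)P(X3|X2)P(X4|X3) / [ P(X1)P(X2|X1)P(X3|X2) ] = P(X4|X3),
so X4 is independent of {X1, X2} given X3, i.e. I( X4, X3, {X1, X2} ).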
Example II
[Figure: DAG with Coin1 → Bell ← Coin2]
A boundary basis and a boundary DAG for P(coin1, coin2, bell) = P(coin1) P(coin2) P(bell | coin1, coin2):
• I( coin1, { }, coin2 )
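The single basis statement follows directly from the factorization (a short check added here):
P(coin1, coin2) = Σ_bell P(coin1) P(coin2) P(bell | coin1, coin2) = P(coin1) P(coin2),
so the two coins are marginally independent, i.e. I( coin1, { }, coin2 ).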
Example III
[Figure: the DAG over V, S, T, L, B, A, X, D]
• In the order V, S, T, L, B, A, X, D, we have a boundary basis:
• I( S, { }, V )
• I( T, V, S )
• I( L, S, {T, V} )
• …
• I( X, A, {V, S, T, L, B, D} )
Does I( {X, D}, A, V ) also hold in the dependency model P ?
Definitions
Can we define “independence” ID(X, Z, Y) graphically, so that it answers these probabilistic independence questions?
1. A Directed Acyclic Graph (DAG) D = (U, E) is an I-map of a dependency model M over U if ID(X, Z, Y) ⇒ IM(X, Z, Y) for all disjoint subsets X, Y, Z of U.
2. D is a minimal I-map of M if by removing any edge, D ceases to be an I-map.
3. D is a perfect map of M if ID(X, Z, Y) ⇔ IM(X, Z, Y) for all disjoint subsets X, Y, Z of U.
From Separation in UGs To d-Separation in DAGs
Paths
[Figure: the DAG over V, S, T, L, B, A, X, D]
• Intuition: dependency must “flow” along paths in the graph
• A path is a sequence of neighboring variables
Examples:
• X - A - D - B
• A - L - S - B
Path Blockage
• Every path is classified given the evidence:
• active -- creates a dependency between the end nodes
• blocked -- does not create a dependency between the end nodes
Evidence means the assignment of a value to a subset of nodes.
Path Blockage
Three cases:
• Common cause (L ← S → B): blocked if the common cause S is in the evidence; otherwise active.
• Intermediate cause (S → L → A): blocked if the intermediate node L is in the evidence; otherwise active.
• Common effect (T → A ← L, with descendant X): active if the common effect A or one of its descendants (e.g., X) is in the evidence; otherwise blocked.
Definition of Path Blockage
Definition: A path is active, given evidence Z, if
• Whenever the path contains a head-to-head configuration T → A ← L, either A or one of its descendants is in Z, and
• No other nodes in the path are in Z.
Definition: A path is blocked, given evidence Z, if it is not active.
Definition: X is d-separated from Y, given Z, if every path from a node in X to a node in Y is blocked, given Z.
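The definition translates almost line for line into code. Below is a small Python sketch (written for these notes, not part of the original slides) that enumerates the undirected paths of a DAG and applies the blockage rules above. The DAG is represented as a dict mapping each node to its set of parents (roots map to the empty set), and the sets X, Y, Z are assumed disjoint, as in the definition.

from itertools import product

def descendants(dag, node):
    # All descendants of `node`: its children, their children, and so on.
    children = {v: {u for u, ps in dag.items() if v in ps} for v in dag}
    found, frontier = set(), [node]
    while frontier:
        v = frontier.pop()
        for c in children[v]:
            if c not in found:
                found.add(c)
                frontier.append(c)
    return found

def undirected_paths(dag, a, b, path=None):
    # All simple paths between a and b, ignoring edge directions.
    path = path or [a]
    if a == b:
        yield path
        return
    neighbors = dag[a] | {v for v, ps in dag.items() if a in ps}
    for n in neighbors:
        if n not in path:
            yield from undirected_paths(dag, n, b, path + [n])

def path_active(dag, path, Z):
    # Active given Z iff every head-to-head node on the path (or one of its
    # descendants) is in Z, and no other node on the path is in Z.
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        head_to_head = prev in dag[node] and nxt in dag[node]   # prev -> node <- nxt
        if head_to_head:
            if node not in Z and not (descendants(dag, node) & Z):
                return False
        elif node in Z:
            return False
    return True

def d_separated(dag, X, Y, Z):
    # X is d-separated from Y given Z iff every path between them is blocked.
    return not any(path_active(dag, p, set(Z))
                   for x, y in product(X, Y)
                   for p in undirected_paths(dag, x, y))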
Example
[Figure: the DAG over V, S, T, L, B, A, X, D]
ID(T, S | { }) = yes
ID(T, S | {D}) = no
ID(T, S | {D, L, B}) = yes
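These three queries can be reproduced with the d-separation sketch above, using the parent sets of the example DAG (read off the factorization p(s) p(v) p(t|v) p(l|s) p(b|s) p(a|t,l) p(d|a,b) p(x|a) shown later in these slides):

# Parent sets of the example DAG: V -> T, S -> L, S -> B, T,L -> A, A -> X, A,B -> D
dag = {'V': set(), 'S': set(),
       'T': {'V'}, 'L': {'S'}, 'B': {'S'},
       'A': {'T', 'L'}, 'X': {'A'}, 'D': {'A', 'B'}}

print(d_separated(dag, {'T'}, {'S'}, set()))             # True  -> yes
print(d_separated(dag, {'T'}, {'S'}, {'D'}))             # False -> no
print(d_separated(dag, {'T'}, {'S'}, {'D', 'L', 'B'}))   # True  -> yes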
Example
[Figure: the DAG over V, S, T, L, B, A, X, D]
• In the order V, S, T, L, B, A, X, D, we get from the boundary basis:
• ID( S, { }, V )
• ID( T, V, S )
• ID( L, S, {T, V} )
• …
• ID( X, A, {V, S, T, L, B, D} )
Bayesian Networks (Directed Acyclic Graphical Models)
Definition: Given a probability distribution P on a set of variables U, a DAG D = (U, E) is called a Bayesian Network of P iff D is a minimal I-map of P.
The first claim holds because any probability distribution is a semi-graphoid (it satisfies Symmetry, Decomposition, Contraction, and Weak Union).
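For reference (standard statements of the axioms named above, added here since the proof below uses them; ZW denotes the union of Z and W):
• Symmetry: I(X, Z, Y) iff I(Y, Z, X)
• Decomposition: I(X, Z, YW) implies I(X, Z, Y)
• Weak Union: I(X, Z, YW) implies I(X, ZW, Y)
• Contraction: I(X, Z, Y) and I(X, ZY, W) imply I(X, Z, YW)
• Intersection (valid for strictly positive distributions): I(X, ZW, Y) and I(X, ZY, W) imply I(X, Z, YW)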
The second claim, uniqueness of the parent sets, holds due to:
• I(X, ZW1, YW2) and I(X, ZW2, YW1) imply I(X, Z, YW1W2)
Proof:
(1) I(X, ZW1, YW2). Given.
(2) I(X, ZW2, YW1). Given.
(3) I(X, ZW1W2, Y) by Weak Union from (1).
(4) I(X, ZYW1, W2) by Weak Union from (1).
(5) I(X, ZYW2, W1) by Weak Union from (2).
(6) I(X, ZY, W1W2) by Intersection from (4) and (5).
(7) I(X, Z, YW1W2) by Intersection from (3) and (6).
d-separation
The definition of ID(X, Z, Y) is such that:
Soundness [Theorem 9]: ID(X, Z, Y) = yes implies that IP(X, Z, Y) follows from the boundary Basis(D).
Completeness [Theorem 10]: ID(X, Z, Y) = no implies that IP(X, Z, Y) does not follow from the boundary Basis(D).
Revisiting Example III
[Figure: the DAG over V, S, T, L, B, A, X, D]
So does IP( {X, D}, A, V ) hold? It is enough to check d-separation!
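With the d-separation sketch and the dag dict defined earlier, the check is a single call. Note that observing A opens the head-to-head node A on the path V - T - A - L - S - B - D:

print(d_separated(dag, {'X', 'D'}, {'V'}, {'A'}))   # False: not d-separated given A alone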
Bayesian Networks with Numbers
[Figure: the DAG over V, S, T, L, B, A, X, D, annotated with its local distributions]
p(s), p(v), p(t|v), p(l|s), p(b|s), p(a|t,l), p(d|a,b), p(x|a)
Bayesian Network (cont.)
[Figure: the DAG over V, S, T, L, B, A, X, D, annotated with p(s), p(v), p(t|v), p(l|s), p(b|s), p(a|t,l), p(d|a,b), p(x|a)]
Each directed acyclic graph defines a factorization of the form
P(x1, …, xn) = Πi P(xi | pai),
where pai are the parents of Xi; for this network,
P(v, s, t, l, b, a, x, d) = p(s) p(v) p(t|v) p(l|s) p(b|s) p(a|t,l) p(d|a,b) p(x|a).
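As a small illustration (function name and table layout assumed, not from the slides), the factorization means the joint probability of a full assignment is just a product of local table lookups:

def joint_prob(v, s, t, l, b, a, x, d, cpt):
    # cpt is assumed to map each variable to its local conditional table,
    # indexed by (child value, parent values...) in the order shown.
    return (cpt['s'][s] * cpt['v'][v] *
            cpt['t'][t, v] * cpt['l'][l, s] * cpt['b'][b, s] *
            cpt['a'][a, t, l] * cpt['d'][d, a, b] * cpt['x'][x, a])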
Independence in Bayesian Networks
(*) IP( Xi ; {X1,…,Xi-1} \ Pai | Pai )
This set of independence assertions is denoted Basis(G). All other independence assertions that are entailed by (*) are derivable using the semi-graphoid axioms.
Local Distributions - Asymmetric Independence
[Figure: Tuberculosis (Yes/No) and Lung Cancer (Yes/No) are the parents of Abnormality in Chest (Yes/No), with table p(A|T,L)]
Table:
p(A=y | L=n, T=n) = 0.02
p(A=y | L=n, T=y) = 0.60
p(A=y | L=y, T=n) = 0.99
p(A=y | L=y, T=y) = 0.99
Note that p(A=y | L=y, T) is the same for T=y and T=n: given L=y, A is independent of T. This asymmetric (context-specific) independence is captured by the local table but not by the graph structure alone.
COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents.
Proof (only if): X is d-separated from all its non-descendants, given its parents. Since D is an I-map, by the soundness theorem the claim holds.
Proof (if): If each variable X is conditionally independent of all its non-descendants given its parents, then, using Decomposition, it is also independent of its predecessors in a particular order d.
COROLLARY 5: If D = (U, E) is a boundary DAG of P constructed in some order d, then any topological order d’ of U will yield the same boundary DAG of P. (Hence the construction order can be forgotten.)
Proof: By Corollary 4, each variable X is d-separated from all its non-descendants, given its parents, in the boundary DAG of P. In particular, by Decomposition, X is independent, given its parents, of all the variables that precede it in any topological order d’.
Extension of the Markov Chain Property
I( Xk, Xk-1, {X1, …, Xk-2} ) implies I( Xk, {Xk-1, Xk+1}, {X1, …, Xk-2, Xk+2, …, Xn} ).
This holds due to the soundness theorem: in the chain DAG, {Xk-1, Xk+1} d-separates Xk from all the remaining variables. The converse holds when Intersection is assumed.
Markov Blankets in DAGs
Consequence: There is no improvement to d-separation, and no statement escapes graphical representation.
Reasoning: (1) If there were an independence statement σ not shown by d-separation but nevertheless implied by the basis, then σ would have to be true in all distributions that satisfy the basis. But Theorem 10 states that there exists a distribution that satisfies the basis and violates σ. (2) Same argument. [Note that (2) is a stronger claim.]