Graphical Models

Graphical models use a graphical language to represent and reason about random variables and their distributions. Nodes represent random variables, and edges represent the dependence structure among them.

For instance, hair length and height are not independent, because gender is a confounding variable that influences both. Given gender, they are independent. We can draw a graph to represent this.

\[ \text{gender } G \]
\[ \swarrow \qquad \searrow \]
\[ \text{hair length } L \qquad \qquad \text{height } H \]
\[\begin{split} \begin{array}{ll} p(h \mid l) \neq p(h) & H \not \perp L \\ p(h \mid l, g)=p(h \mid g) & H \perp L \mid G \end{array} \end{split}\]
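The two statements above can be checked numerically on a small discrete example. The sketch below assumes all three variables are binary and uses made-up probability tables (`p_g`, `p_l_given_g`, `p_h_given_g` are illustrative names and numbers, not data from the text). It builds the joint as \(p(g)\,p(l \mid g)\,p(h \mid g)\) and compares the relevant conditionals.

```python
from itertools import product

# Hypothetical numbers chosen only for illustration; all variables are binary.
p_g = {0: 0.5, 1: 0.5}                   # p(g)
p_l_given_g = {0: {0: 0.8, 1: 0.2},      # p(l | g), indexed as [g][l]
               1: {0: 0.3, 1: 0.7}}
p_h_given_g = {0: {0: 0.3, 1: 0.7},      # p(h | g), indexed as [g][h]
               1: {0: 0.6, 1: 0.4}}

# Joint p(g, l, h) = p(g) p(l | g) p(h | g): G is the common parent of L and H.
joint = {(g, l, h): p_g[g] * p_l_given_g[g][l] * p_h_given_g[g][h]
         for g, l, h in product((0, 1), repeat=3)}

def prob(**fixed):
    """Sum the joint over all entries consistent with the fixed values."""
    total = 0.0
    for (g, l, h), p in joint.items():
        values = {"g": g, "l": l, "h": h}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total

p_h1 = prob(h=1)                                         # p(h=1)
p_h1_given_l1 = prob(h=1, l=1) / prob(l=1)               # p(h=1 | l=1)
p_h1_given_l1_g0 = prob(h=1, l=1, g=0) / prob(l=1, g=0)  # p(h=1 | l=1, g=0)
p_h1_given_g0 = prob(h=1, g=0) / prob(g=0)               # p(h=1 | g=0)

print(p_h1, p_h1_given_l1)              # differ: H and L are dependent
print(p_h1_given_l1_g0, p_h1_given_g0)  # agree: H and L are independent given G
```

With these tables, knowing \(l\) changes the probability of \(h\), but once \(g\) is fixed, \(l\) carries no further information about \(h\).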

Bayesian Networks

A Bayesian network is a directed acyclic graphical model.

  • Each node is a variable \(X_i\), and its parents are denoted by \(\pi(X_i)\)

  • Each node has a local probability function \(P(X_i \mid \pi(X_i))\)
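One possible way to hold this structure in code is a mapping from each node to its parents \(\pi(X_i)\). The sketch below is an illustration only: the function name `topological_order` and the dictionary layout are assumptions, and every node is assumed to appear as a key. It checks the acyclicity requirement by repeatedly removing nodes whose parents have all been placed.

```python
def topological_order(parents):
    """Return the nodes in an order where every node follows its parents.

    `parents` maps each node to the list of its parents, pi(X_i).
    Raises ValueError if the graph contains a directed cycle.
    """
    remaining = {node: set(ps) for node, ps in parents.items()}
    order = []
    while remaining:
        # Nodes whose parents have all been placed already.
        ready = sorted(n for n, ps in remaining.items() if not ps)
        if not ready:
            raise ValueError("graph has a directed cycle")
        for n in ready:
            order.append(n)
            del remaining[n]
        # Placed nodes no longer block their children.
        for ps in remaining.values():
            ps.difference_update(ready)
    return order

# The gender / hair length / height example: G is the parent of L and H.
parents = {"G": [], "L": ["G"], "H": ["G"]}
order = topological_order(parents)
print(order)
```

An ordering exists exactly when the graph is acyclic, so a successful return confirms that the structure is a valid Bayesian network graph.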

Factorization

In a Bayesian network, the joint probability can be factorized as

\[ P\left(X_{1}, \ldots, X_{n}\right)=\prod_{i=1}^{n} P\left(X_{i} \mid \pi\left(X_{i}\right)\right) \]

Fig. 138 Joint probability as factorization
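As a minimal sketch of the factorization, the code below evaluates the joint probability as a product of local terms \(P(X_i \mid \pi(X_i))\). The network structure (`a -> b`, `a -> c`, `(b, c) -> d`), the binary variables, and all numbers in `cpt` are hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical network a -> b, a -> c, (b, c) -> d over binary variables.
parents = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c")}

# cpt[node][parent_values] = P(node = 1 | parents = parent_values); made-up numbers.
cpt = {
    "a": {(): 0.3},
    "b": {(0,): 0.2, (1,): 0.9},
    "c": {(0,): 0.5, (1,): 0.1},
    "d": {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.8},
}

def joint_probability(assignment):
    """P(x_1, ..., x_n) as the product of the local terms P(x_i | pi(x_i))."""
    p = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[q] for q in node_parents)
        p_one = cpt[node][parent_values]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p

# Summing the product over every assignment should give 1.
nodes = list(parents)
total = sum(joint_probability(dict(zip(nodes, values)))
            for values in product((0, 1), repeat=len(nodes)))
print(total)
```

Because each local table is normalized, the product is automatically a valid joint distribution, which is why the sum over all assignments comes out to 1 up to floating-point rounding.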

How many numbers are needed to encode the probabilities?

Suppose \(a,b,c,d\) are discrete and each takes 10 values

  • representing \(p(a)\) requires 10 numbers (actually 9, since the probabilities must sum to 1)

  • representing \(p(b \vert a)\) requires 100 numbers

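So without factorization, representing \(p(a,b,c,d)\) as a single table requires \(1e4\) numbers. With a factorization such as \(p(a)\,p(b \vert a)\,p(c \vert a)\,p(d \vert b, c)\), we only need \(1e1 + 1e2 + 1e2 + 1e3 = 1210 < 1e4\) numbers. That’s the advantage of factorization. Note that the saving comes from the small parent sets: the unrestricted chain rule \(p(a)\,p(b \vert a)\,p(c \vert a, b)\,p(d \vert a, b, c)\) would need \(1e1 + 1e2 + 1e3 + 1e4 = 11110\) numbers, which is no saving at all.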
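The count can be reproduced for any graph by multiplying the number of values of each node by those of its parents. In the sketch below, the helper `table_sizes` and the two structures `sparse` and `full` are hypothetical, chosen for illustration. `sparse` assumes the graph `a -> b`, `a -> c`, `(b, c) -> d`, while `full` is the chain rule with no independence assumptions.

```python
from math import prod

def table_sizes(parents, cardinality):
    """Number of entries in each table p(x | parents(x))."""
    return {node: cardinality[node] * prod(cardinality[q] for q in ps)
            for node, ps in parents.items()}

# Each of a, b, c, d takes 10 values.
cardinality = {v: 10 for v in "abcd"}

# Hypothetical sparse structure: a -> b, a -> c, (b, c) -> d.
sparse = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
# Chain rule with no independence assumptions.
full = {"a": [], "b": ["a"], "c": ["a", "b"], "d": ["a", "b", "c"]}

sparse_total = sum(table_sizes(sparse, cardinality).values())
full_total = sum(table_sizes(full, cardinality).values())
joint_table = prod(cardinality.values())

print(sparse_total, full_total, joint_table)  # 1210 11110 10000
```

These counts include the redundant entry per conditional distribution that normalization would remove, matching the "10 numbers (actually 9)" convention used in the list above.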