Theorem 2: An ordered Gibbs sampler applied to a consistent dependency network for , where each is finite (and hence discrete) and each local distribution is positive, defines a Markov chain with a unique stationary joint distribution for equal to that can be reached from any initial state of the chain.
Proof: Let be the sample of after the iteration of the ordered Gibbs sampler. The sequence can be viewed as samples drawn from a homogenous Markov chain with transition matrix having elements . The matrix is the product , where is the ``local'' transition matrix describing the resampling of according to the local distribution . The positivity of local distributions guarantees the positivity of , which in turn guarantees the irreducibility of the Markov chain. Consequently, there exists a unique joint distribution that is stationary with respect to . The positivity of also guarantees that this stationary distribution can be reached from any starting point. That this stationary distribution is equal to is proved in the Appendix.
After a sufficient number of iterations, the samples in the chain will be drawn from the stationary distribution for . We use these samples to estimate . There is much written about how long to run the chain before keeping samples (the ``burn in'') and how many samples to use to obtain a good estimate (e.g., Gilks, Richardson, and Spiegelhalter, 1996). We do not discuss the details here. Next, let us consider the use of Gibbs sampling to compute for particular instances and where and are arbitrary disjoint subsets of . A naive approach uses an ordered Gibbs sampler directly. During the iterations, only samples of where are used to compute the conditional probability. An important difficulty with this approach is that if either or is small (a common occurrence when or contain many variables), many iterations are required for an accurate probability estimate. A well-known approach for estimating when is small is to fix during ordered Gibbs sampling. It is not difficult to generalize the proof of Theorem 2 to show that this modified ordered Gibbs sampler has a stationary distribution and yields the correct conditional probability given a consistent dependency network. When is rare because contains many variables, we can use the independencies encoded in a dependency network along with the law of total probability to decompose the inference task into a set of inference tasks on single variables. For example, consider the dependency network . Given the independencies specified by this network, we have
1 (* the unprocessed variables *)
2 (* the processed and conditioning variables *)
3 (* the values for *)
5 Choose such that has no more parents in than any variable in
6 If all the parents of are in
9 Use a modified ordered Gibbs sampler to determine
13 Return the product of the conditionals
The key step in this algorithm is step 7, which bypasses Gibbs sampling when it is not needed. This step is justified by Equation 1.