Ph.D. Dissertation: Repeated Measures with Censored Data

Chapter 6   Sampling in MCEM Algorithm

In this chapter, we detail the sampling schemes needed for the MCEM algorithm implementations outlined in Theorems 4.2 and 4.5. The augmented Gibbs sampler developed in Theorem 6.3 is very efficient and is particularly well suited to sampling from truncated distributions and to EM-style algorithms.

 
6.1   Sampling Scheme

Let us first briefly review some widely used sampling methods and algorithms, which are relevant to our implementation schemes.

 
Inverse Cumulative Distribution Method

This is a well known method for generating a (univariate) variable X with cumulative distribution function F. It is particularly attractive in designing efficient Monte Carlo methods (Johnson, 1987; Olkin et al., 1980). The method is stated simply as

Lemma 6.1.   A variate X under distribution function F can be generated as follows.

(1). Generate a variate U from uniform distribution U(0,1).

(2). Let X = sup{x: F(x) ≤ U}. If F is strictly increasing, this gives X = F⁻¹(U).
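
For illustration, the following short Python sketch applies Lemma 6.1 to the exponential distribution, where F(x) = 1 - exp(-λx) can be inverted in closed form; the rate and sample size are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def draw_exponential(lam, size):
        """Inverse-CDF draws from Exp(lam): solve F(x) = 1 - exp(-lam*x) = u for x."""
        u = rng.uniform(size=size)          # step (1): U ~ U(0, 1)
        return -np.log(1.0 - u) / lam       # step (2): x = F^{-1}(u)

    sample = draw_exponential(lam=2.0, size=10_000)   # arbitrary rate and sample size
    print(sample.mean())                    # should be close to 1/lam = 0.5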

 
Method of Composition

The method of composition (Tanner, 1996) provides a way to generate multivariate random variables using only univariate sampling steps.

Lemma 6.2.   Suppose ƒ(y|x) and g(x) are density functions where x and y may be vectors. Repeat the following two steps m times.

(1). Draw x* ~ g(x).

(2). Draw y*~ ƒ(y|x*).

The pairs (x1, y1), ..., (xm, ym) are an i.i.d. sample from the joint density h(x, y) = ƒ(y|x)g(x), while y1, ..., ym are an i.i.d. sample from the marginal density ∫ƒ(y|x)g(x) dx.

In principle, we can generate random variables of any dimension with this lemma. The method is a key technique underlying data-augmentation algorithms.
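
As a minimal Python sketch of Lemma 6.2, take g(x) to be a gamma density and ƒ(y|x) a Poisson density (an arbitrary choice for this example), so that the y's form an i.i.d. sample from the resulting negative-binomial marginal.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 10_000

    # step (1): x* ~ g(x), here Gamma(shape=3, scale=1)
    x = rng.gamma(shape=3.0, scale=1.0, size=m)
    # step (2): y* ~ f(y | x*), here Poisson with mean x*
    y = rng.poisson(lam=x)

    # (x_j, y_j) are i.i.d. from the joint h(x, y) = f(y|x) g(x);
    # y_1, ..., y_m are i.i.d. from the marginal (a negative binomial).
    print(y.mean(), y.var())   # marginal mean 3, variance 3 + 3 = 6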

 
Gibbs Sampler

The Gibbs sampler is a powerful Markov chain Monte Carlo (MCMC) data augmentation technique (Geman & Geman, 1984; Gelfand & Smith, 1990). It has been widely utilized in image processing and in other large-scale models such as neural networks and expert systems. The general problem is to draw (dependent) samples from a very complex multivariate density. Suppose that y is a k×1 random vector with density ƒ(y) and that the following conditional distributions are known:

yi|(y1, ..., yi-1, yi+1, ..., yk) ~ ƒi(yi|y1, ..., yi-1, yi+1, ..., yk),    (i = 1, ..., k).

Lemma 6.3. (Gibbs Sampler)   Given an arbitrary starting point y(0) = (y1(0), ..., yk(0))' in the support of ƒ(y), iteration i+1 obtains y(i+1) = (y1(i+1), ..., yk(i+1))' as follows.

For j = 1, ..., k, draw yj(i+1) from ƒj(yj|y1(i+1), ..., yj-1(i+1), yj+1(i), ..., yk(i)).

The vectors y(0), y(1), y(2), ... form a realization of a Markov chain whose transition kernel from y(i) to y(i+1) is the product of the full conditional densities used above. Under very mild regularity conditions, the joint distribution of (y1(i), y2(i), ..., yk(i)) converges geometrically to the unique invariant distribution ƒ(y) in the L1 norm (Chan, 1993; Geman & Geman, 1984). More importantly, though these y(i)'s are not independent, the following property still holds. For any integrable function g(y),

(1/m) ∑i=1m g(y(i)) → ∫ g(y) ƒ(y) dy   almost surely as m → ∞.
These results suggest that if the chain is run to equilibrium, one can use the simulated data as a basis for summarizing ƒ(y), and the average of a function of interest over values from a chain yields a consistent estimator of its expectation.

In practice, we need to discard some initial iterations as "burn-in" steps so that the retained draws are close to the target distribution. If an approximately independent sample is desired, multiple independent chains with different starting points, or suitable spacing between retained realizations, can be employed. Also, determining run length and assessing convergence are important practical issues (Brooks & Gelman, 1998; Tanner, 1996). A good exposition of the Gibbs sampler can be found in Casella and George (1992).
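
For illustration, the following Python sketch runs the Gibbs sampler of Lemma 6.3 on a bivariate normal target with correlation ρ, using the standard full conditionals y1|y2 ~ N(ρy2, 1-ρ²) and y2|y1 ~ N(ρy1, 1-ρ²), and discards a burn-in period as discussed above; the value of ρ and the run length are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    rho, n_iter, burn_in = 0.8, 20_000, 1_000   # arbitrary settings for the example
    sd = np.sqrt(1.0 - rho**2)

    y1, y2 = 0.0, 0.0                       # arbitrary starting point
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        y1 = rng.normal(rho * y2, sd)       # draw y1 | y2
        y2 = rng.normal(rho * y1, sd)       # draw y2 | y1
        draws[i] = (y1, y2)

    draws = draws[burn_in:]                 # discard burn-in iterations
    print(draws.mean(axis=0))               # ergodic averages, close to (0, 0)
    print(np.corrcoef(draws.T)[0, 1])       # close to rho = 0.8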

Two other widely used sampling algorithms are data augmentation (Tanner & Wong, 1987) and sampling-importance-resampling (Rubin, 1987). Gelfand and Smith (1990) compared them with the Gibbs sampler for the calculation of marginal densities and reported findings favorable to the Gibbs sampler and the data-augmentation algorithm. As they argued, the two iterative methods are closely related, and the latter can be somewhat faster than the Gibbs sampler when further reduced conditional densities are available.

 
6.2   Sampling from Conditional Densities k1 and k2

In this section, we discuss sampling schemes for general distributions. From the next section on, we deal with the multivariate normal distribution in particular, for which a rather attractive Gibbs sampler is developed; the construction extends to many other distributions.

Sampling from Conditional Distribution k1

When some components of y are not completely known, we need to generate random samples of y from the conditional distribution k1(y|u, θ) given in equation (4.6).

This is a density function over 𝒴*, the observed range of the incomplete components of y. It is easy to rewrite it as a conditional distribution,

k1(y*|y+, u, θ) = h1(y*|y+, u, θ) I(y* ∈ 𝒴*) / ∫𝒴* h1(y*|y+, u, θ) dy*,    (6.1)

which is a truncated version of the density h1(y*|y+, u, θ). Sampling from such a distribution, especially the truncated multivariate normal distribution, has been very well studied in the literature and can usually be handled by the Gibbs sampler or by other Markov chain Monte Carlo (MCMC) methods such as the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970). As an added advantage of these MCMC methods, the integral calculation in the denominator of (6.1) can usually be avoided.

Sampling from Conditional Distribution k2

The key step is to generate random samples of u from the conditional distribution k2(u|𝒴, θ) given in equation (4.7), which can be written as

k2(u|𝒴, θ) ∝ [∏i=1n ∫𝒴i h1(yi|u, θ) dyi] h2(u|θ),    (6.2)

where 𝒴i denotes the observed range of yi and 𝒴 = (𝒴1, ..., 𝒴n).

For simplicity, assume that u is a scalar; it is not difficult to extend the argument to a vector. Denote the cumulative distribution function of k2(u|𝒴, θ) by

K2(v) = ∫u ≤ v k2(u|𝒴, θ) du,

which is continuous in our context.

By the inverse cumulative distribution method in Lemma 6.1, we have

Lemma 6.4.   To generate a sample of size m from k2(u|𝒴, θ), proceed as follows.

(1). Draw a sample a(1), a(2), ..., a(m) from uniform distribution U(0, 1).

(2). Solve the equation K2(u) = a(i) for u to get u(i).

Then u(1), u(2), ..., u(m) is a random sample from k2(u|𝒴, θ).
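
Step (2) of Lemma 6.4 only requires the ability to evaluate K2 at a point. The following Python sketch illustrates the bisection solve with an arbitrary continuous stand-in CDF (a two-component normal mixture); the bracketing interval and tolerance are assumptions of the example.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    def K(u):                                  # stand-in for K2(u): a normal-mixture CDF
        return 0.3 * norm.cdf(u, -2, 1) + 0.7 * norm.cdf(u, 1, 0.5)

    def invert_by_bisection(a, lo=-10.0, hi=10.0, tol=1e-8):
        """Solve K(u) = a for u, assuming [lo, hi] brackets the solution."""
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if K(mid) < a:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    a = rng.uniform(size=5)                    # step (1) of Lemma 6.4
    u = np.array([invert_by_bisection(ai) for ai in a])   # step (2)
    print(np.column_stack([a, K(u)]))          # K(u) reproduces a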

The equations in step (2) can be solved by simple numerical algorithms such as the bisection method or Newton's method. The problem now becomes how to calculate K2(u) for a given u. Denote

h0(u|𝒴, θ) = ∏i=1n ∫𝒴i h1(yi|u, θ) dyi;    (6.3)

then k2(u|𝒴, θ) ∝ h0(u|𝒴, θ) h2(u|θ). Let I(.) be the indicator function. For a given v,

K2(v) = E[h0(u|𝒴, θ) I(u ≤ v)] / E[h0(u|𝒴, θ)].    (6.4)

Now we need to calculate the two expectations above, both taken with respect to h2(u|θ).

Lemma 6.5.   For any given v, the two expectations in (6.4) can be approximated as follows with a sample of size m.

(1). Draw a sample u(1), u(2), ..., u(m) from h2(u|θ).

(2). Calculate the expectations as

E[h0(u|𝒴, θ) I(u ≤ v)] ≈ (1/m) ∑j=1m h0(u(j)|𝒴, θ) I(u(j) ≤ v)   and   E[h0(u|𝒴, θ)] ≈ (1/m) ∑j=1m h0(u(j)|𝒴, θ),

and approximate K2(v) by the ratio of the two averages.

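As a small numerical illustration of Lemma 6.5, the Python sketch below takes h2(u|θ) to be standard normal and uses an arbitrary positive function in place of h0(u|𝒴, θ); K2(v) is then estimated by the ratio of the two Monte Carlo averages.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 100_000

    def h0(u):                      # arbitrary stand-in for h0(u | Y, theta) in (6.3)
        return np.exp(-0.5 * (u - 1.0) ** 2)

    u = rng.normal(size=m)          # step (1): u^(1), ..., u^(m) ~ h2(u | theta) = N(0, 1)
    w = h0(u)

    def K2_hat(v):                  # step (2): ratio of the two Monte Carlo averages
        return np.mean(w * (u <= v)) / np.mean(w)

    print(K2_hat(0.0), K2_hat(0.5), K2_hat(2.0))   # increasing in v, between 0 and 1
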
Now we need to calculate h0(u|𝒴, θ) when u is known, that is, to calculate each factor ∫𝒴i h1(yi|u, θ) dyi in the product (6.3). Based on Definition 3.1, we carry out the calculation according to the three different cases of incompleteness.

Case (a): if all components of yi are complete, the factor is simply

∫𝒴i h1(yi|u, θ) dyi = h1(yi|u, θ).

Case (c): if all components of yi are incomplete, i.e. yi ∈ 𝒴i, we have

∫𝒴i h1(yi|u, θ) dyi = E[I(yi ∈ 𝒴i) | u, θ],

where the expectation is taken under h1(yi|u, θ). The calculation can therefore be done by the following simulation.

(1). Draw a sample yi(1), yi(2), ..., yi(m) from h1(yi|u, θ).

(2). Calculate ∫𝒴i h1(yi|u, θ) dyi as (1/m) ∑j=1m I(yi(j) ∈ 𝒴i).

Case (b): if some components of yi are incomplete, say yi = (yi+, yi*) ∈ {yi+} × 𝒴i*, then

∫𝒴i h1(yi|u, θ) dyi = ∫𝒴i* h1(yi+, yi*|u, θ) dyi*.

That is, we plug the completely observed components yi+ into the integrand and integrate over 𝒴i* for the incomplete components.

Since h1(yi+, yi*|u, θ), viewed as a function of yi*, is in general no longer a density function, it is standardized using the marginal density of yi+|(u, θ),

ci = ∫ h1(yi+, yi*|u, θ) dyi*,

where the integral is over the whole sampling space of yi*, and ci can be calculated by a Monte Carlo method if no closed-form solution is available. This yields the conditional density function of yi* given (yi+, u, θ) as h1(yi*|yi+, u, θ) = h1(yi+, yi*|u, θ)/ci. Then

Lemma 6.6.   The calculation of ∫𝒴i h1(yi|u, θ) dyi, and hence of h0(u|𝒴, θ), can be done by the following simulation.

(1). Draw a sample yi*(1), yi*(2), ..., yi*(m) from h1(yi*|yi+, u, θ).

(2). Calculate ∫𝒴i h1(yi|u, θ) dyi as ci (1/m) ∑j=1m I(yi*(j) ∈ 𝒴i*).

Note that case (c) is a special case where ci = 1, and case (a) is also a special case with ci = h1(yi|u, θ) in which no drawing is necessary. In summary, assuming u is a scalar, we have the following.

Theorem 6.1.   A sample of u from the conditional distribution k2(u|𝒴, θ) can be drawn as follows.

(1) Draw a sample a(1), a(2), ..., a(m) from uniform distribution U(0, 1).

(2) Draw a sample u0(1), u0(2), ..., u0(m) from h2(u|θ).

(3) Calculate h0(u0(j)|𝒴, θ) for j = 1, 2, ..., m as follows.

(3.1) For i = 1, ..., n and j = 1, ..., m, calculate ∫𝒴i h1(yi|u0(j), θ) dyi as below.

(a) If yi is completely known, let ∫𝒴i h1(yi|u0(j), θ) dyi = h1(yi|u0(j), θ).

(b) If some components of yi are incomplete, draw a sample yi*(j,1), yi*(j,2), ..., yi*(j,m) from h1(yi*|yi+, u0(j), θ) and let

∫𝒴i h1(yi|u0(j), θ) dyi = ci (1/m) ∑l=1m I(yi*(j,l) ∈ 𝒴i*),

where ci is the marginal density of yi+ given (u0(j), θ).

(c) If all components of yi are incomplete, draw a sample yi(j,1), yi(j,2), ..., yi(j,m) from h1(yi|u0(j), θ) and let

∫𝒴i h1(yi|u0(j), θ) dyi = (1/m) ∑l=1m I(yi(j,l) ∈ 𝒴i).

(3.2) Calculate h0(u0(j)|𝒴, θ) = ∏i=1n ∫𝒴i h1(yi|u0(j), θ) dyi.

(4) For i = 1, 2, ..., m, solve the equation K2(u) = a(i) for u to get u(i), where K2(u) is approximated by

K2(u) ≈ ∑j=1m h0(u0(j)|𝒴, θ) I(u0(j) ≤ u) / ∑j=1m h0(u0(j)|𝒴, θ).

Then u(1), u(2), ..., u(m) is a random sample from k2(u|𝒴, θ).
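
To illustrate how the pieces of Theorem 6.1 fit together, the following Python sketch works through a deliberately small case: scalar u, scalar right-censored yi, and normal h1 and h2. All data values, variances, and sample sizes are made up for the example, and inverting the estimated K2 reduces to inverting a weighted empirical CDF of the u0 draws.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # --- toy data: scalar u, scalar y_i, right censoring (illustration only) ---
    sigma1, sigma2 = 1.0, 1.5            # play the role of theta
    y_obs  = np.array([0.2, 1.1, -0.4])  # completely observed y_i (case (a))
    c_cens = np.array([1.5, 2.0])        # censored y_i: only y_i > c_i is known (case (c))
    m = 5_000                            # Monte Carlo sample size

    # step (2): u0^(1), ..., u0^(m) ~ h2(u | theta) = N(0, sigma2^2)
    u0 = rng.normal(0.0, sigma2, size=m)

    # step (3): h0(u0^(j)) = product over i of the integral of h1(y_i | u0^(j)) over Y_i
    h0 = np.ones(m)
    for y in y_obs:                      # case (a): the factor is h1(y_i | u, theta) itself
        h0 *= norm.pdf(y, loc=u0, scale=sigma1)
    for c in c_cens:                     # case (c): draw from h1, take the fraction in Y_i
        draws = rng.normal(loc=u0[:, None], scale=sigma1, size=(m, 200))
        h0 *= (draws > c).mean(axis=1)

    # step (4): invert the Monte Carlo estimate of K2, a weighted empirical CDF of the u0's
    order = np.argsort(u0)
    u_sorted = u0[order]
    cdf = np.cumsum(h0[order]) / h0.sum()

    a = rng.uniform(size=m)              # step (1)
    idx = np.minimum(np.searchsorted(cdf, a), m - 1)
    u_sample = u_sorted[idx]             # generalized inverse of the step CDF
    print(u_sample.mean(), u_sample.std())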

 
6.3   Mixed Models with Normal Distribution

Under the multivariate normal distribution, we can design a very attractive Gibbs sampler employing the idea of latent variables (Damien & Walker, 1996). This method is remarkably well suited to truncated densities and EM-style algorithms. It is very easy to implement and bypasses the need for rejection sampling and for algorithms such as Metropolis-Hastings and sampling-importance-resampling. With the introduction of one or more latent variables, most conditional distributions can be sampled via uniform variates, and the approach applies to a very broad range of distributions.

6.3.1   Normal Model

For the mixed model given in equation (4.2), yi = Xi β + Zi u + ei, with u ~ Nq(0, Σ2) and ei ~ Nk(0, Σ1), we have yi|u ~ Nk(μi, Σ1), where μi = Xi β + Zi u is a function of u.

For the multivariate normal distribution yi|(u, θ) ~ Nk(μi, Σ1), both the marginal distribution of yi+|(u, θ) and the conditional distribution h1(yi*|yi+, u, θ) of yi* given (yi+, u, θ) are also normal. Given u and θ, both μi and Σ1 are known. If we partition yi into its complete and incomplete components, yi = (yi+', yi*')', and partition μi and Σ1 conformably as

μi = (μ+', μ*')',    Σ1 = ( Σ+, Σ+* ; Σ*+, Σ* ),    (6.5)

then the marginal distribution of yi+|(u, θ) is multivariate normal with mean μ+ and covariance matrix Σ+, and the conditional distribution of yi*|(yi+, u, θ) is multivariate normal with mean μ* + Σ*+Σ+⁻¹(yi+ - μ+) and covariance matrix Σ* - Σ*+Σ+⁻¹Σ+*. Therefore

yi+|(u, θ) ~ Nk+(μ+, Σ+),    yi*|(yi+, u, θ) ~ Nk*(μ* + Σ*+Σ+⁻¹(yi+ - μ+), Σ* - Σ*+Σ+⁻¹Σ+*),    (6.6)

where k+ and k* are the numbers of complete and incomplete components of yi, respectively.
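
The partition formulas (6.5)-(6.6) translate directly into code. The Python sketch below computes the conditional mean and covariance of the incomplete components given the complete ones; the function and variable names, and the numerical values, are our own illustrative choices.

    import numpy as np

    def conditional_normal(mu, Sigma, obs_idx, y_obs):
        """Mean and covariance of y_* | y_+ for y ~ N(mu, Sigma).

        obs_idx flags the complete ('+') components; the rest are incomplete ('*')."""
        obs = np.asarray(obs_idx)
        mis = ~obs
        S_pp = Sigma[np.ix_(obs, obs)]          # Sigma_+
        S_mp = Sigma[np.ix_(mis, obs)]          # Sigma_*+
        S_mm = Sigma[np.ix_(mis, mis)]          # Sigma_*
        B = S_mp @ np.linalg.inv(S_pp)          # Sigma_*+ Sigma_+^{-1}
        mu_cond = mu[mis] + B @ (y_obs - mu[obs])
        Sigma_cond = S_mm - B @ Sigma[np.ix_(obs, mis)]
        return mu_cond, Sigma_cond

    # arbitrary example values
    mu = np.array([1.0, 0.0, -1.0])
    Sigma = np.array([[2.0, 0.6, 0.3],
                      [0.6, 1.0, 0.4],
                      [0.3, 0.4, 1.5]])
    obs_idx = np.array([True, False, True])     # components 1 and 3 complete
    print(conditional_normal(mu, Sigma, obs_idx, y_obs=np.array([1.2, -0.5])))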

 
6.3.2   Sampling from k1(yi*|yi+, u, θ)

We now discuss sampling from k1(yi*|yi+, u, θ), a truncated multivariate normal distribution, when yi has incomplete components yi*. As in (6.1), given u and yi* ∈ 𝒴i*,

k1(yi*|yi+, u, θ) ∝ h1(yi*|yi+, u, θ) I(yi* ∈ 𝒴i*).

We will construct a general Gibbs sampler for truncated multivariate normal distributions, with sampling from k1(yi*|yi+, u, θ) as a direct application. We first give a simple lemma about the solution set of a quadratic inequality.

Given matrix A = (aij), let vector a(j) be the jth column of A without the jth element ajj, and matrix Ajj be the sub-matrix of A without the jth row and column.

Lemma 6.7   Suppose x = (x1, ..., xk)' and A = (aij)k×k is symmetric. Given xi (i ≠ j) and y, if ajj > 0 and the inequality has a solution, the set of xj satisfying the quadratic inequality x'Ax < y is the interval xj(l) < xj < xj(r), where

xj(l) = [-a(j)'x(j) - √D] / ajj,    xj(r) = [-a(j)'x(j) + √D] / ajj,    D = (a(j)'x(j))² - ajj (x(j)'Ajj x(j) - y),    (6.7)

and x(j) is the vector x without the jth element xj.

If A is diagonal, A = diag{a11, ..., akk}, the roots in equation (6.7) become

xj(l) = -√[(y - ∑i≠j aii xi²)/ajj],    xj(r) = +√[(y - ∑i≠j aii xi²)/ajj].    (6.8)
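
A direct Python transcription of equation (6.7) is given below (the function name and test values are arbitrary); it returns the interval of xj values satisfying x'Ax < y when the other components are held fixed.

    import numpy as np

    def quadratic_interval(A, x, j, y):
        """Interval (x_j(l), x_j(r)) of values of x_j with x' A x < y, per equation (6.7).

        A is assumed symmetric with a_jj > 0; the other components of x are held fixed."""
        a_jj = A[j, j]
        mask = np.arange(len(x)) != j
        a_j = A[mask, j]                          # j-th column of A without a_jj
        x_j = x[mask]                             # x without its j-th element
        b = a_j @ x_j
        c = x_j @ A[np.ix_(mask, mask)] @ x_j - y
        disc = b * b - a_jj * c
        if disc <= 0:                             # empty (or degenerate) solution set
            return None
        root = np.sqrt(disc)
        return ((-b - root) / a_jj, (-b + root) / a_jj)

    # arbitrary test values
    A = np.array([[2.0, 0.5], [0.5, 1.0]])
    print(quadratic_interval(A, x=np.array([0.0, 0.3]), j=0, y=4.0))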

Now we state the following augmented Gibbs sampler for truncated multivariate normal distributions, which is implied in Cumbus et al. (1996).

Theorem 6.2   Suppose x = (x1, ..., xk)' ~ Nk(0, Σ) truncated to x ∈ A = A1 × ⋯ × Ak, say Aj = (aj, bj). Then we have the following augmented Gibbs sampler for drawing x.

(1). Given an arbitrary starting point x(0) = (x1(0), ..., xk(0))' ∈ A, set the counter i = 0.

(2). Draw a ~ U(0, 1) and let y(i) = x(i)'Σ⁻¹x(i) - 2 ln(1 - a).

(3). For each j, in the order from 1 to k, draw xj(i+1) from the uniform distribution over the set {xj: x'Σ⁻¹x < y(i)} ∩ Aj, where x = (x1(i+1), ..., xj-1(i+1), xj, xj+1(i), ..., xk(i))'. At the end of this step, we have x(i+1) = (x1(i+1), x2(i+1), ..., xk(i+1))'.

When i is large enough (after an initial burn-in period), x(i+1) is approximately a draw from Nk(0, Σ) restricted to A. Repeat (2)-(3) to sample more points. Note that the bounding interval in step (3) can be obtained from equation (6.7) with the matrix there taken to be Σ⁻¹.

Proof.   With a latent variate y, it is easy to verify that the joint density

ƒ(x, y) ∝ exp(-y/2) I(x'Σ⁻¹x < y) I(x ∈ A)

has as its marginal density for x the Nk(0, Σ) density truncated by x ∈ A. In order to show that the theorem is a Gibbs sampling of (x, y) from the joint density ƒ(x, y), we only need to show that step (2) in fact samples y(i) from ƒ(y|x(i)), and that step (3) samples xj(i+1) from its full conditional given the other components and y(i).

Based on the joint distribution ƒ(x, y), given x, the variate y has the shifted (truncated) exponential density ƒ(y|x) ∝ exp(-y/2) I(y > x'Σ⁻¹x), which can be sampled by the inverse cumulative distribution method: solving 1 - exp{-(y - x'Σ⁻¹x)/2} = a for y gives y = x'Σ⁻¹x - 2 ln(1 - a). This is done in step (2).

On the other hand, given y, the vector x has a multivariate uniform distribution over the region {x: x'Σ⁻¹x < y} ∩ A. In particular, each component of x has conditional density

ƒj(xj|x1, ..., xj-1, xj+1, ..., xk, y) ∝ I(x'Σ⁻¹x < y, xj ∈ Aj),

that is, a univariate uniform distribution over the set {xj: x'Σ⁻¹x < y} ∩ Aj. This is step (3), and the proof is complete.

Note that the intervals (aj, bj) may have aj = -∞ or bj = +∞. In fact, the bounding set A could be far more complex than a product of intervals.
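
The following Python sketch implements the sampler of Theorem 6.2 for a box-shaped truncation region A = A1 × ⋯ × Ak; the function name, the choice of Σ, and the bounds are illustrative only.

    import numpy as np

    def sample_truncated_mvn(Sigma, bounds, x0, n_draws, burn_in=500, seed=0):
        """Theorem 6.2 sampler for N_k(0, Sigma) restricted to a box of intervals A_j.

        bounds is a (k, 2) array with rows (a_j, b_j); x0 must lie inside the box."""
        rng = np.random.default_rng(seed)
        P = np.linalg.inv(Sigma)                       # the quadratic form uses Sigma^{-1}
        k = Sigma.shape[0]
        x = np.array(x0, dtype=float)
        out = np.empty((n_draws, k))
        for it in range(burn_in + n_draws):
            # step (2): latent y given x is a shifted exponential
            y = x @ P @ x - 2.0 * np.log(1.0 - rng.uniform())
            # step (3): each x_j given the rest is uniform on an interval, equation (6.7)
            for j in range(k):
                mask = np.arange(k) != j
                b = P[mask, j] @ x[mask]
                c = x[mask] @ P[np.ix_(mask, mask)] @ x[mask] - y
                root = np.sqrt(max(b * b - P[j, j] * c, 0.0))
                lo = max((-b - root) / P[j, j], bounds[j, 0])
                hi = min((-b + root) / P[j, j], bounds[j, 1])
                x[j] = rng.uniform(lo, hi)
            if it >= burn_in:
                out[it - burn_in] = x
        return out

    # illustrative covariance and box bounds
    Sigma = np.array([[1.0, 0.7], [0.7, 1.0]])
    bounds = np.array([[0.5, np.inf], [-np.inf, 0.0]])   # x1 > 0.5 and x2 < 0
    draws = sample_truncated_mvn(Sigma, bounds, x0=[1.0, -1.0], n_draws=2000)
    print(draws.mean(axis=0))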

Since k1(yi*|yi+, u, θ) ∝ h1(yi*|yi+, u, θ) I(yi* ∈ 𝒴i*), and h1(yi*|yi+, u, θ) is normal with mean μ* + Σ*+Σ+⁻¹(yi+ - μ+) and covariance matrix Σ* - Σ*+Σ+⁻¹Σ+*, sampling from k1(yi*|yi+, u, θ) is now straightforward.

Corollary 6.1   A variate yi* ~ k1(yi*|yi+, u, θ) can be drawn as follows.

(1). Calculate μ0 = μ* + Σ*+Σ+⁻¹(yi+ - μ+) and Σ0 = Σ* - Σ*+Σ+⁻¹Σ+*.

(2). Draw y0 ~ Nk*(0, Σ0) restricted to the region A = {y0: y0 + μ0 ∈ 𝒴i*} by Theorem 6.2.

(3). Let yi* = y0 + μ0; then yi* is a drawing from k1(yi*|yi+, u, θ) over 𝒴i*.
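
A short illustration of Corollary 6.1, assuming the sample_truncated_mvn function from the sketch after Theorem 6.2 is available; the numerical values of μ0, Σ0, and the observed range are made up.

    import numpy as np

    # step (1) would compute these from (6.5)-(6.6); here they are illustrative values
    mu_0 = np.array([0.4, -0.2])                         # mu_* + Sigma_*+ Sigma_+^{-1} (y_+ - mu_+)
    Sigma_0 = np.array([[1.0, 0.3], [0.3, 0.8]])         # Sigma_* - Sigma_*+ Sigma_+^{-1} Sigma_+*
    region = np.array([[1.0, np.inf], [-np.inf, 0.5]])   # observed range for y_i*

    bounds = region - mu_0[:, None]                      # step (2): shift the region by -mu_0
    y0 = sample_truncated_mvn(Sigma_0, bounds, x0=[1.0, -0.5], n_draws=1000)
    y_star = y0 + mu_0                                   # step (3): y_i* = y_0 + mu_0
    print(y_star.mean(axis=0))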

 
6.3.3   Sampling from k2(u|𝒴, θ) and k(y*, u|𝒴, θ)

Now we employ a similar idea of latent variates to implement a Gibbs sampler for drawing u from the conditional distribution k2(u|𝒴, θ) given in equation (4.7), which under the normal model can be written as

k2(u|𝒴, θ) ∝ [∏i=1n ∫𝒴i* h1(yi+, yi*|u, θ) dyi*] h2(u|θ).    (6.9)

In fact, we will sample from the joint conditional distribution k(y*, u|𝒴, θ) given by (4.8), where y* = (y1*, ..., yn*),

k(y*, u|𝒴, θ) ∝ [∏i=1n h1(yi+, yi*|u, θ) I(yi* ∈ 𝒴i*)] h2(u|θ).

Let the latent variables w1, ..., wn, v, together with y1*, ..., yn*, u, have the joint density

ƒ(y*, u, w1, ..., wn, v) ∝ [∏i=1n exp(-wi/2) I(wi > (yi - μi)'Σ1⁻¹(yi - μi)) I(yi* ∈ 𝒴i*)] exp(-v/2) I(v > u'Σ2⁻¹u).

Note that yi = (yi1, ..., yik)', yi* denotes its incomplete components with yi* ∈ 𝒴i*, and μi is a function of u = (u1, ..., uq)', namely μi(u) = Xi β + Zi u.

It is easy to verify that, based on this joint distribution, the marginal density of u is k2(u|𝒴, θ) and the conditional density of yi* given u is k1(yi*|yi+, u, θ). Thus sampling from this joint distribution yields a sample of u ~ k2(u|𝒴, θ) and a sample of yi* ~ k1(yi*|yi+, u, θ) as well; in fact, we get a sample from the conditional distribution k(y*, u|𝒴, θ). Therefore, this scheme provides all the samples needed in steps (1.1) and (1.2) of the MCEM algorithm in Theorem 4.5.

The following conditional densities are easy to obtain and are essential for the Gibbs sampler. Given all other variables, each incomplete yij and each uj has a uniform distribution, while each wi and v have truncated exponential distributions. All of these distributions are very easy to sample from.

If yij is incomplete, then given all the other incomplete y's as well as u, v, and w1, ..., wn, its conditional density is proportional to I(yij ∈ Aij) I(yi* ∈ 𝒴i*), where Aij = {yij: (yi - μi)'Σ1⁻¹(yi - μi) < wi}, which can be obtained by (6.7). This is a univariate uniform distribution over the set Aij ∩ {yij: yi* ∈ 𝒴i*}.

For each component uj of u, the conditional density given all the other variables is proportional to

I(u'Σ2⁻¹u < v) ∏i=1n I((yi - μi(u))'Σ1⁻¹(yi - μi(u)) < wi).

It is a uniform distribution over the set

{uj: u'Σ2⁻¹u < v} ∩ {uj: (yi - μi(u))'Σ1⁻¹(yi - μi(u)) < wi, i = 1, ..., n},

which can be obtained by equation (6.7).

Given y* = (y1*, ..., yn*) and u, the latent variables wi and v have truncated exponential distributions,

ƒ(wi|y*, u) ∝ exp(-wi/2) I(wi > (yi - μi)'Σ1⁻¹(yi - μi))   and   ƒ(v|y*, u) ∝ exp(-v/2) I(v > u'Σ2⁻¹u),

with support on wi > (yi - μi)'Σ1⁻¹(yi - μi) and v > u'Σ2⁻¹u, respectively. These can be sampled easily by the inverse cumulative distribution method. Thus we have a Gibbs sampler for k(y*, u|𝒴, θ), k2(u|𝒴, θ), and k1(yi*|yi+, u, θ), as below.

Theorem 6.3   Assume θ is known. An augmented Gibbs sampler for k(y*, u|𝒴, θ) and k2(u|𝒴, θ) can be formulated as follows.

(1). Select starting points for u and for yj, j = 1, ..., n, and set the counter i = 0.
(1a). Select an arbitrary starting point u(0), and calculate μj(0) = μj(u(0)).
(1b). Get the starting point yj(0) as follows. For l = 1, ..., k, if yjl is known, set yjl(0) = yjl; otherwise select an arbitrary value of yjl(0) such that yj(0) ∈ 𝒴j.

(2). Draw a ~ U(0, 1). Let

v(i) = u(i)'Σ2⁻¹u(i) - 2 ln(1 - a).

(3). For j = 1, ..., n, draw aj ~ U(0, 1). Let

wj(i) = (yj(i) - μj(i))'Σ1⁻¹(yj(i) - μj(i)) - 2 ln(1 - aj).

(4). In the order of j from 1 to q, draw uj(i+1) from the uniform distribution over the set

{uj: u'Σ2⁻¹u < v(i)} ∩ {uj: (yl(i) - μl(u))'Σ1⁻¹(yl(i) - μl(u)) < wl(i), l = 1, ..., n},

where u = (u1(i+1), ..., uj-1(i+1), uj, uj+1(i), ..., uq(i))' and μl(u) = Xl β + Zl u.

Let u(i+1) = (u1(i+1), ..., uq(i+1))' and obtain μj(i+1) = μj(u(i+1)) for j = 1, ..., n.

(5). For j = 1, ..., n, get yj(i+1) = (yj1(i+1), yj2(i+1), ..., yjk(i+1))' as below. In the order of l from 1 to k, if yjl is completely known, set yjl(i+1) = yjl; otherwise draw yjl(i+1) from the uniform distribution over the set

{yjl: (yj - μj(i+1))'Σ1⁻¹(yj - μj(i+1)) < wj(i)} ∩ {yjl: yj* ∈ 𝒴j*},

where yj = (yj1(i+1), ..., yj,l-1(i+1), yjl, yj,l+1(i), ..., yjk(i))'.
When i is large enough (after an initial burn-in period), u(i+1) is approximately a draw from k2(u|𝒴, θ), yj(i+1) approximately from k1(yj|u(i+1), θ), and (y1(i+1), ..., yn(i+1), u(i+1)) approximately from k(y*, u|𝒴, θ). Repeat steps (2)-(5) with the counter increased to i+1 to sample more points.

Note that the bounding sets in steps (4) and (5) can be obtained by equation (6.7).
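
To make Theorem 6.3 concrete, the following Python sketch runs the sampler on a tiny made-up mixed model (one random intercept, two measurements per subject, one right-censored component), with θ = (β, Σ1, Σ2) held fixed as the theorem assumes; every data value, dimension, and name below is an illustrative choice of ours.

    import numpy as np

    rng = np.random.default_rng(0)

    # --- made-up data: n = 3 subjects, k = 2 measurements, q = 1 random effect ---
    n, k, q = 3, 2, 1
    beta = np.array([0.5])
    X = [np.ones((k, 1)) for _ in range(n)]              # intercept-only fixed design
    Z = [np.ones((k, q)) for _ in range(n)]              # random intercept
    Sigma1, Sigma2 = np.eye(k), np.array([[2.0]])
    P1, P2 = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)

    y = [np.array([0.3, 1.2]), np.array([-0.1, 0.4]), np.array([1.0, 0.8])]
    censored = [np.array([False, True]), np.zeros(k, bool), np.zeros(k, bool)]
    lower = [np.array([-np.inf, 1.0]), np.full(k, -np.inf), np.full(k, -np.inf)]
    upper = [np.full(k, np.inf)] * n                     # y[0][1] only known to exceed 1.0

    def quad_roots(a, b, c):
        """Interval of t with a*t**2 + 2*b*t + c < 0 (a > 0, real roots), as in (6.7)."""
        root = np.sqrt(max(b * b - a * c, 0.0))
        return (-b - root) / a, (-b + root) / a

    def quad_interval(P, vec, j, bound):
        """Values for component j of vec keeping vec' P vec < bound, others held fixed."""
        mask = np.arange(len(vec)) != j
        b = P[mask, j] @ vec[mask]
        c = vec[mask] @ P[np.ix_(mask, mask)] @ vec[mask] - bound
        return quad_roots(P[j, j], b, c)

    u = np.zeros(q)                                      # step (1): starting values
    n_iter, burn_in = 3000, 500
    u_draws = []

    for it in range(n_iter):
        mu = [X[i] @ beta + Z[i] @ u for i in range(n)]
        # step (2): latent v given the rest
        v = u @ P2 @ u - 2.0 * np.log(1.0 - rng.uniform())
        # step (3): latent w_i given the rest
        w = [(y[i] - mu[i]) @ P1 @ (y[i] - mu[i]) - 2.0 * np.log(1.0 - rng.uniform())
             for i in range(n)]
        # step (4): each u_j given the rest is uniform on an intersection of intervals
        for j in range(q):
            lo, hi = quad_interval(P2, u, j, v)
            for i in range(n):
                zj = Z[i][:, j]
                c0 = y[i] - X[i] @ beta - Z[i] @ u + zj * u[j]   # residual with u_j removed
                a_q = zj @ P1 @ zj
                if a_q < 1e-12:
                    continue
                lo2, hi2 = quad_roots(a_q, -(zj @ P1 @ c0), c0 @ P1 @ c0 - w[i])
                lo, hi = max(lo, lo2), min(hi, hi2)
            u[j] = rng.uniform(lo, hi)
        mu = [X[i] @ beta + Z[i] @ u for i in range(n)]
        # step (5): each censored y_il given the rest is uniform on an interval
        for i in range(n):
            for l in range(k):
                if not censored[i][l]:
                    continue
                d = y[i] - mu[i]
                lo, hi = quad_interval(P1, d, l, w[i])
                lo = max(lo + mu[i][l], lower[i][l])
                hi = min(hi + mu[i][l], upper[i][l])
                y[i][l] = rng.uniform(lo, hi)
        if it >= burn_in:
            u_draws.append(u.copy())

    print(np.mean(u_draws, axis=0))   # Monte Carlo estimate of E(u | observed data)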

This Gibbs sampler is very efficient and will be used in our subsequent simulations and data analysis. More importantly, the use of latent variables in constructing the augmented sampler is an ideal match for sampling incomplete data in EM-style algorithms. Its potential in sampling schemes for very complicated distributions, and in other areas as well, could be enormous.

 
