Workshop on Incomplete Data

2. References on the EM algorithm.

3. Image Analysis.

Incomplete data is one of the problems most commonly encountered in empirical research across many fields, and the statistical questions it raises have received wide and sustained attention. Over the past two years, statisticians in central Taiwan have held three workshops on the subject. This workshop series hopes to continue promoting research in this area and to serve as one channel of communication for researchers in Taiwan interested in incomplete data. In view of the growing number of statisticians in Taiwan working on this topic, and to foster closer discussion and interaction between statisticians and empirical researchers in other scientific fields, this workshop was started with the support of the Center for Theoretical Sciences at National Taiwan University and with the encouragement of Dr. Cheng-Der Fuh (Institute of Statistical Science, Academia Sinica), Prof. Yi-Hau Chen (Department of Mathematics, National Taiwan Normal University), and Prof. Inchi Hu (The Hong Kong University of Science and Technology).

This page lists related materials for reference.

If you have any suggestions (topics or speakers), or would like to give a talk, please contact Hung Chen (Department of Mathematics, National Taiwan University) at hchen@math.ntu.edu.tw.

## Missing Covariates Regression Problem

## Workshop

The workshop invited two experts, one from psychometrics and one from nutritional epidemiology, to present data from their ongoing research and the problems they have encountered, and to discuss them with statisticians, so as to clarify the role statistical methods can play in solving such problems. On the methodological side, the workshop will begin with a review of existing methods: two speakers whose Ph.D. dissertations were in this area (incomplete covariate data and measurement error problems) will review two important papers (Robins et al., 1994, JASA; Carroll and Wand, 1991, JRSSB) as a starting point in the search for better solutions. Speakers:

Prof. Mei-Shyuan Lee (National Defense Medical Center):

Validation Study in Nutritional Epidemiology

Dr. Liu, Research Fellow (Institute of Statistical Science, Academia Sinica):

Missing Data in Social Science

Dr. Yi-Hau Chen:

Review of Robins, Rotnitzky and Zhao (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. *JASA*, **89**, 846-865.

Dr. Huey-Miin Hsueh:

Review of Carroll and Wand (1991). Semiparametric Estimation in Logistic Measurement Error Models. *JRSSB*, **53**, 573-587.

**Workshop 2: Dr. Ching-Yun Wang, Fred Hutchinson Cancer Research Center**

Lecture 1: Introduction and Examples to Missing Data and Measurement Error (abstract, chapter 1 of notes)
Lecture 2: Logistic Regression with Covariate Measurement Error (abstract, chapter 4 of notes)
Lecture 3: Expected Estimating Equations to Accommodate Covariate Measurement Error (abstract, notes)
Lecture 4: Regression Analysis when Covariates are Regression Parameters of a Random Effects Model for Observed Longitudinal Measurements (abstract, notes)
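To make the theme of Lecture 2 concrete, the sketch below simulates logistic regression with a mismeasured covariate and applies regression calibration, one standard correction in this literature. Everything here (the simulation design, the assumption that the measurement-error variance is known, the helper `fit_logistic`) is our own illustrative setup, not the lecturer's method.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(0.0, 1.0, n)                      # true covariate (unobserved)
w = x + rng.normal(0.0, 0.8, n)                  # error-prone surrogate W = X + U
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))    # logit P(Y=1|X) = X, true slope 1

def fit_logistic(z, y, n_iter=25):
    """Newton-Raphson for a logistic regression of y on (1, z)."""
    Z = np.column_stack([np.ones_like(z), z])
    b = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-Z @ b))
        b = b + np.linalg.solve((Z * (mu * (1 - mu))[:, None]).T @ Z,
                                Z.T @ (y - mu))
    return b

naive = fit_logistic(w, y)   # ignores measurement error: attenuated slope
# Regression calibration: regress y on E[X|W] = lam * W, where the
# reliability ratio lam = var(X)/var(W) is treated as known here.
lam = 1.0 / (1.0 + 0.8**2)
calib = fit_logistic(lam * w, y)
print(naive[1], calib[1])
```

The naive slope is attenuated toward zero, while the calibrated fit recovers the true slope up to the usual small approximation error of regression calibration in logistic models.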

Regression with Missing Covariates, University of Munich (transparency)

Survival analysis under measurement error

## Seminar on the EM Algorithm Literature

**●** 11/2/99: Prof. Yi-Hau Chen (Department of Mathematics, National Taiwan Normal University), transparency:
**An Introduction to the EM Algorithm**
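The subject of the talk above can be made concrete with a minimal EM iteration for a two-component Gaussian mixture, the standard textbook example in which the component labels play the role of the missing data. The model, simulated data, and starting values below are our own illustration, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated incomplete-data problem: the unobserved component labels are
# the "missing data" in a two-component Gaussian mixture.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Crude starting values.
w, mu1, mu2, s1, s2 = 0.5, x.min(), x.max(), 1.0, 1.0
for _ in range(100):
    # E-step: posterior probability that each point belongs to component 1.
    d1 = w * normal_pdf(x, mu1, s1)
    d2 = (1.0 - w) * normal_pdf(x, mu2, s2)
    r = d1 / (d1 + d2)
    # M-step: weighted complete-data maximum likelihood estimates.
    w = r.mean()
    mu1 = np.sum(r * x) / np.sum(r)
    mu2 = np.sum((1.0 - r) * x) / np.sum(1.0 - r)
    s1 = np.sqrt(np.sum(r * (x - mu1) ** 2) / np.sum(r))
    s2 = np.sqrt(np.sum((1.0 - r) * (x - mu2) ** 2) / np.sum(1.0 - r))

print(w, mu1, mu2)
```

Each pass performs one E-step and one M-step; as the DLR theory guarantees, the observed-data likelihood increases at every iteration, and the estimates settle near the true values (weights 0.3/0.7, means -2 and 2).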
**●** 11/16/99: Prof. Hung Chen (Department of Mathematics, National Taiwan University), transparency:
**Using EM to Obtain Asymptotic Variance Matrices: The SEM Algorithm**
Comments by Dr. Min-Te Chao (Institute of Statistical Science, Academia Sinica): "I liked the paper written by Louis, since one extra step at the end finds the asymptotic variance matrix. But back in the early 1980's, I used that algorithm, only to find a negative variance: there is a small typo in that paper. In the final formula (3.2'), there is a double summation 2 \sum_{i<j} ..., which should read \sum_{i \ne j}, since the use of EM destroyed the symmetry of the covariance structure. I wrote to Tom and he agreed with me."
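For context, the identity of Louis (1982) that this comment concerns expresses the observed information in terms of complete-data quantities; in our notation, with S_X and I_X the complete-data score and information and Y the observed data:

```latex
% Louis (1982): observed information via complete-data score and information.
I_Y(\theta) = \mathbb{E}\{ I_X(\theta) \mid Y \}
            - \mathbb{E}\{ S_X(\theta) S_X(\theta)^{\mathsf T} \mid Y \}
            + \mathbb{E}\{ S_X(\theta) \mid Y \}\,
              \mathbb{E}\{ S_X(\theta) \mid Y \}^{\mathsf T}.
```

At the MLE the last term vanishes, which is the "one extra step at the end": a single additional pass of E-step computations at convergence yields the observed information matrix. For independent observations, expanding the middle (conditional variance) term produces cross-product sums of the kind the comment refers to.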
**●** 11/30/99: Dr. Cheng-Der Fuh (Institute of Statistical Science, Academia Sinica), transparency:
**An Introduction to EM and ECM Algorithms**
Comments by Dr. Fuh: "Theorems 1 and 4 in DLR (JRSSB, 1977) give the rate of convergence of the EM algorithm, and Theorems 2 and 3 give the result on convergence. In the proof of Theorems 2 and 3, one triangle inequality is applied incorrectly. This led to the paper written by Wu in the Annals of Statistics."
**●** 1/11/00: Prof. Inchi Hu (Department of Information & Systems Management, The Hong Kong University of Science & Technology), transparency:
**Some refined versions of the EM algorithm**
**●** Related references:

1. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). *JRSSB*, **39**, 1-38.

2. Smith's discussion of, and the authors' replies in, Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). *JRSSB*, **39**, 1-38.

3. Meng, X. L. and Rubin, D. B. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. *JASA*, **86**, 899-909.

The supplemented EM or SEM algorithm enables users of EM to

calculate the incomplete-data asymptotic variance-covariance matrix

associated with the maximum likelihood estimate obtained by EM,

using only the computer code for EM and for the complete-data

asymptotic variance-covariance matrix.
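The mechanics are easy to see in a scalar case. The sketch below applies the SEM idea to the classic multinomial example from Dempster, Laird and Rubin (1977); the finite-difference estimate of DM is our shortcut standing in for the difference-quotient iterations actually described in the SEM paper.

```python
import numpy as np

# Classic multinomial example from Dempster, Laird and Rubin (1977):
# observed counts y with cell probabilities
# (1/2 + t/4, (1-t)/4, (1-t)/4, t/4).
y = np.array([125.0, 18.0, 20.0, 34.0])

def em_step(t):
    # E-step: split the first cell into its 1/2 and t/4 parts.
    x12 = y[0] * (t / 4.0) / (0.5 + t / 4.0)
    # M-step: complete-data MLE of t.
    return (x12 + y[3]) / (x12 + y[1] + y[2] + y[3])

t = 0.5
for _ in range(200):      # run EM to convergence
    t = em_step(t)

# DM: derivative of the EM map at the MLE, here by central differences.
eps = 1e-6
dm = (em_step(t + eps) - em_step(t - eps)) / (2.0 * eps)

# Complete-data information at the MLE, with x12 at its conditional mean.
x12 = y[0] * (t / 4.0) / (0.5 + t / 4.0)
i_com = (x12 + y[3]) / t**2 + (y[1] + y[2]) / (1.0 - t) ** 2
v_sem = (1.0 / i_com) / (1.0 - dm)   # SEM variance: V_com / (1 - DM)

# Direct check: invert the observed-data information.
i_obs = y[0] / (2.0 + t) ** 2 + (y[1] + y[2]) / (1.0 - t) ** 2 + y[3] / t**2
print(t, dm, v_sem)
```

The SEM variance V_com/(1 - DM) agrees with the variance obtained by inverting the observed-data information directly, illustrating the missing-information identity I_obs = (1 - DM) I_com that SEM exploits.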

Related references:

Meng, X. L. and Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. *Biometrika*, **80**, 267-278.

Summary: "Two major reasons for the popularity of the EM algorithm

are that its maximum step involves only complete-data maximum

likelihood estimation, which is often computationally simple, and that

its convergence is stable, with each iteration increasing the likelihood.

When the associated complete-data maximum likelihood estimation itself

is complicated, EM is less attractive because the M-step is

computationally unattractive. In many cases, however, complete-data

maximum likelihood estimation is relatively simple when conditional on

some function of the parameters being estimated. We introduce a class of

generalized EM algorithms, which we call the ECM algorithm, for

expectation/conditional maximization (CM), that takes advantage of the

simplicity of complete-data conditional maximum likelihood estimation

by replacing a complicated M-step of EM with several computationally

simpler CM-steps. We show that the ECM algorithm shares all the

appealing convergence properties of EM, such as always increasing

the likelihood, and present several illustrative examples."

van Dyk, D. A., Meng, X. L. and Rubin, D. B. (1995). Maximum likelihood estimation via the ECM algorithm: computing the asymptotic variance. *Statist. Sinica*, **5**, 55-75.

Summary: "This paper provides detailed theory, algorithms, and illustrations for computing asymptotic variance-covariance matrices for maximum likelihood estimates using the ECM algorithm."

Meng, X. L. (1994). On the rate of convergence of the ECM algorithm. *Ann. Statist.*, **22**, 326-339.

The EM algorithm is a very useful iterative algorithm converging to

a maximum likelihood estimator under incomplete data. The ECM

and MCECM algorithms are generalizations of it. The author

investigates the convergence rate of the ECM and MCECM algorithms.

He obtains a very beautiful expression for the matrix convergence rate of ECM and MCECM as follows:

DM^{ECM}(\theta^*) = DM^{EM}(\theta^*) + \{I - DM^{EM}(\theta^*)\} \prod_{s=1}^{S} P_s,

where DM^{EM}(\theta^*) is the matrix convergence rate of EM,

P_s = \nabla_s [\nabla_s^{\mathsf T} I_{\rm com}^{-1}(\theta^*) \nabla_s]^{-1} \nabla_s^{\mathsf T} I_{\rm com}^{-1}(\theta^*), \quad s = 1, 2, \ldots, S,

with \nabla_s = \nabla g_s(\theta^*), where \theta^* is the limit point and the g_s are the constraint functions given in ECM.

Its derivation is extremely ingenious. The author points out that this has an appealing interpretation:

speed of ECM = (speed of EM) x (speed of CM).

Moreover, he examines the global rates of ECM and MCECM. Under the situation Y_1, Y_2 i.i.d. N\left[\binom{\theta_1}{\theta_2}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right], he treats the MLE of \theta based on (y_{11}, y_{12}, y_{21}) and, for this incomplete-data problem, he calculates concretely the matrix rates of convergence of EM, ECM and MCECM.

He compares their
global rates of convergence and points out that no dominance result

holds in general.

4. Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. *JRSSB*, **44**, 226-233.

5. Baum, L. E., Petrie, T., Soules, G. and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. *Ann. Math. Statist.*, **41**, 164-171.

6. Meng, X. L. and Rubin, D. B. (1993). Maximum likelihood estimation

via the ECM algorithm: a general framework. *Biometrika*, **80**,
267-278.


7. Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. *SIAM Review*, **26**, 195-239.

8. Meng, X. L. and van Dyk, D. (1997). The EM algorithm: an old folk-song sung to a fast new tune (with discussion). *JRSSB*, **59**, 511-567.

Review: The EM algorithm is analysed with the aim of making it converge

faster while maintaining its simplicity and stability. A brief historical

account is given with many references and comments. The main

methodological contribution of the paper is the introduction of the "working

parameter" approach to searching for efficient data augmentation schemes

for constructing fast EM-type algorithms. Here an optimal EM algorithm

for the multivariate (including univariate) *t-*distribution with
known degrees

of freedom, simulation studies and theoretical derivations (the rate of

convergence, the matrix rate of convergence) are presented. The main

theoretical contribution is given in Section 3, where the formulation of

the alternating expectation-conditional maximization (AECM) algorithm,

which unifies several recent extensions of the EM algorithm that effectively

combine data augmentation with model reduction, is given. As examples,

a fitting of *t-*models with unknown degrees of freedom and an image

reconstruction under the Poisson model are given.

The paper is completed by a discussion. It was read before the Royal

Statistical Society and four contributors were invited to lead the discussion

(Donald B. Rubin, D. M. Titterington, Walter R. Gilks and Jean Diebolt).

Many others (17) participated in it or wrote letters with comments. The
authors

replied to all of them in writing.

The paper, together with the discussion, gives very good information on

the contemporaneous state of the EM and related algorithms.

9. Jamshidian, M. and Jennrich, R. I. (1997). Acceleration of the EM algorithm by using quasi-Newton methods. *JRSSB*, **59**, 569-587.

10. McLachlan, G. and Krishnan, T. (1997). *The EM Algorithm and Extensions*. John Wiley.

11. Besag, J., Green, P., Higdon, D. and Mengersen, K. (1995). Bayesian computation and stochastic systems (with discussion). *Statistical Science*, **10**, 3-66.

## Image Analysis

Created: November 9th, 1999

Last Revised: March 23rd, 2000

© Copyright 1999 Hung Chen