Workshop on Incomplete Data (不完整數據研討會)

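Organizers: Center for Theoretical Sciences, National Taiwan University; Department of Mathematics, National Taiwan University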

Period: January 1, 1999 to June 30, 2000 (ROC years 88 to 89)

Topics:

   1. Missing Covariates Regression Problem.
   2. Literature on the EM algorithm.
   3. Image Analysis.

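Incomplete data is a common problem in empirical research across many fields, and the statistical analysis issues it raises have received, and continue to receive, wide attention. Over the past two years, statisticians in central Taiwan have held three related workshops. This workshop hopes to keep advancing research in this area and to serve as one channel of exchange for researchers in Taiwan who are interested in incomplete data. Because a growing number of statisticians in Taiwan work in this area, and in order to promote closer discussion and interaction between statisticians and empirical researchers in other scientific fields, this workshop was started with the support of the Center for Theoretical Sciences at National Taiwan University and with the encouragement of Dr. Cheng-Der Fuh 傅承德 (Institute of Statistical Science, Academia Sinica), Prof. Yi-Hau Chen 程毅豪 (Department of Mathematics, National Taiwan Normal University), and Prof. In-Chi Hu 胡膺期 (The Hong Kong University of Science & Technology).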

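This page lists related materials for reference.
If you have any suggestions (including topics and speakers) or would like to give a talk, please contact Hung Chen 陳宏, Department of Mathematics, National Taiwan University:
hchen@math.ntu.edu.tw.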

Missing Covariates Regression Problem

Workshop

Workshop 1: 6/22/99-6/23/99, Workshop on Incomplete Data

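The workshop invited two experts, one working in psychometrics and one in nutritional epidemiology, to present the data from their ongoing research and the problems they have encountered, and to discuss them with statisticians in order to understand what role statistical methods might play in solving these problems. On the technical side, this workshop focuses first on a review of existing methodology. Two speakers who recently earned doctorates in statistics with dissertations in this area (incomplete covariate data, measurement error problem) will review two important papers (Robins et al., 1994, JASA; Carroll and Wand, 1991, JRSSB) as a starting point for the search for better solutions.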

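Speakers
   Prof. 李美璇 (National Defense Medical Center):
                            Validation Study in Nutritional Epidemiology
   Research Fellow 劉長萱 (Institute of Statistical Science, Academia Sinica):
                            Missing Data in Social Science
   Dr. Yi-Hau Chen 程毅豪 (Biostatistics Division, Graduate Institute of Epidemiology, National Taiwan University):
                            Review of Robins, Rotnitzky and Zhao (1994), Estimation
                                  of Regression Coefficients When Some Regressors Are
                                  Not Always Observed, JASA, 89, 846-865.
   Dr. 薛慧敏 (Graduate Institute of Statistics, National Central University):
                            Review of Carroll and Wand (1991), Semiparametric
                                  Estimation in Logistic Measurement Error Models,
                                  JRSSB, 53, 573-587.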

Workshop 2: Dr. C.Y. Wang 王清雲, Fred Hutchinson Cancer Research Center

Lecture 1: Introduction and Examples to Missing Data and Measurement Error
         abstract, chapter 1 of notes
Lecture 2: Logistic Regression with Covariate Measurement Error
         abstract, chapter 4 of notes
Lecture 3: Expected Estimating Equations to Accommodate Covariate Measurement Error
         abstract, notes
Lecture 4: Regression Analysis when Covariates are Regression Parameters of a Random Effects Model for Observed Longitudinal Measurements
         abstract, notes

● 3/15/99: Dr. C.Y. Wang 王清雲 (Fred Hutchinson Cancer Research Center)
   Recalibration Based on An Approximate Relative Risk Estimator in Cox Regression with Missing Covariates
● 12/14/99: Dr. Thomas Augustin, Department of Statistics, University of Munich, transparency
   Survival analysis under measurement error

Seminar on the EM Algorithm Literature

● 11/2/99: Prof. Yi-Hau Chen 程毅豪 (Department of Mathematics, National Taiwan Normal University), transparency
   An Introduction to the EM Algorithm
● 11/16/99: Prof. Hung Chen 陳宏 (Department of Mathematics, National Taiwan University), transparency
   Using EM to Obtain Asymptotic Variance Matrices: The SEM Algorithm
      Comments made by Dr. Min-Te Chao 趙民德 (Institute of Statistical Science, Academia Sinica)
      He says that "I liked the paper written by Louis, since one extra step at the end
       finds the asy. variance matrix. But back in the early 1980's, I used that algorithm,
       only to find a negative variance ---- there is a small typo in that paper.
       For the final formula (3.2'), there is a double summation 2 \sum_{i<j} ....
       which should read \sum_{i \ne j} since the use of EM destroyed the symmetricity of
       the covariance structure. I wrote to Tom and he agreed with me."
● 11/30/99: Dr. Cheng-Der Fuh 傅承德 (Institute of Statistical Science, Academia Sinica), transparency
   An Introduction to EM and ECM Algorithms
      Comments made by Dr. Fuh (Institute of Statistical Science, Academia Sinica)
      He says that "Theorems 1 and 4 in DLR (JRSSB, 1977) give the rate of convergence
       of the EM algorithm and Theorems 2 and 3 give the result on convergence. In the proof of
       Theorems 2 and 3, one triangular inequality is applied incorrectly. This leads to the
       paper written by Wu in the Annals of Statistics."
● 1/11/00: Prof. In-Chi Hu 胡膺期 (Department of Information & Systems
      Management, The Hong Kong University of Science & Technology), transparency
    Some refined versions of the EM algorithm
● Related literature
     1. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum
        likelihood from incomplete data via the EM algorithm (with discussion).
        JRSSB, 39, 1-38.
     2. Smith's discussion and replies from Dempster, A. P., Laird, N. M.,
        and Rubin, D. B. (1977). Maximum likelihood from incomplete data
        via the EM algorithm (with discussion). JRSSB, 39, 1-38.
     3. Meng, X.L. and Rubin, D.B., (1991). Using EM to obtain asymptotic
        variance-covariance matrices: The SEM Algorithm. JASA, 86,
        899-909.
        The supplemented EM or SEM algorithm enables users of EM to
        calculate the incomplete-data asymptotic variance-covariance matrix
        associated with the maximum likelihood estimate obtained by EM,
        using only the computer code for EM and for the complete-data
        asymptotic variance-covariance matrix.
        Related references:
        Meng, Xiao-Li and Rubin, Donald B. Maximum likelihood estimation
        via the ECM algorithm: a general framework. Biometrika 80 (1993),
        no. 2, 267--278.
        Summary: "Two major reasons for the popularity of the EM algorithm
        are that its maximum step involves only complete-data maximum
        likelihood estimation, which is often computationally simple, and that
        its convergence is stable, with each iteration increasing the likelihood.
        When the associated complete-data maximum likelihood estimation itself
        is complicated, EM is less attractive because the M-step is
        computationally unattractive. In many cases, however, complete-data
        maximum likelihood estimation is relatively simple when conditional on
        some function of the parameters being estimated. We introduce a class of
        generalized EM algorithms, which we call the ECM algorithm, for
        expectation/conditional maximization (CM), that takes advantage of the
        simplicity of complete-data conditional maximum likelihood estimation
        by replacing a complicated M-step of EM with several computationally
        simpler CM-steps. We show that the ECM algorithm shares all the
        appealing convergence properties of EM, such as always increasing
        the likelihood, and present several illustrative examples."

        van Dyk, David A., Meng, Xiao-Li and Rubin, Donald B. Maximum likelihood
        estimation via the ECM algorithm: computing the asymptotic variance.
        Statist. Sinica 5 (1995), no. 1, 55--75.
        Summary: "This paper provides detailed theory, algorithms,
        and illustrations for computing asymptotic variance-covariance matrices
        for maximum likelihood estimates using the ECM algorithm."

        Meng, Xiao-Li On the rate of convergence of the ECM algorithm.
        Ann. Statist. 22 (1994), no. 1, 326--339.
        The EM algorithm is a very useful iterative algorithm converging to
        a maximum likelihood estimator under incomplete data. The ECM
        and MCECM algorithms are generalizations of it. The author
        investigates the convergence rate of the ECM and MCECM algorithms.
        He obtains a very beautiful expression for the matrix convergence rate of
        ECM and MCECM as follows:
        $$DM^{ECM}(\theta^*) = DM^{EM}(\theta^*) + \{I - DM^{EM}(\theta^*)\} \prod_{s=1}^{S} P_s,$$
        where $DM^{EM}(\theta^*)$ is the convergence rate of EM,
        $$P_s = \nabla_s [\nabla_s^T I^{-1}_{\rm com}(\theta^*) \nabla_s]^{-1} \nabla_s^T I^{-1}_{\rm com}(\theta^*), \quad s = 1, 2, \ldots, S,$$
        with $\nabla_s = \nabla g_s(\theta^*)$; $\theta^*$ is the limit point and
        the $g_s$ are functions given in ECM.

        Its derivation is extremely ingenious. The author points out that this has
        an appealing interpretation:
        speed of ECM = (speed of EM) × (speed of CM).
        Moreover, he examines the global rates of ECM and MCECM. Under
        the situation
        $Y_1, Y_2 \sim$ i.i.d. $N\Big[{\theta_1 \choose \theta_2}, {1\ \rho \choose \rho\ 1}\Big]$,
        he treats the MLE of $\theta$ based on $(y_{11}, y_{12} - y_{21})$ and, for this
        incomplete data problem, he calculates concretely the matrix rates of
        convergence of EM, ECM and MCECM. He compares their global rates of
        convergence and points out that no dominance result holds in general.

     4. Louis, T. A. (1982). Finding the observed information matrix when
        using the EM algorithm. JRSSB, 44, 226-233.
     5. Baum, L. E., Petrie, T., Soules, G. and Weiss, N. (1970).
        A maximization technique occurring in the statistical analysis of
        probabilistic functions of Markov chains. Ann. Math. Statist.,
        41, 164-171.
     6. Meng, X. L. and Rubin, D. B. (1993). Maximum likelihood estimation
        via the ECM algorithm: a general framework. Biometrika, 80, 267-278.
     7. Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum
        likelihood and the EM algorithm. SIAM Review, 26, 195-239.
     8. Meng, X. L. and van Dyk, D. (1997). The EM Algorithm-an Old
        Folk-song Sung to a Fast New Tune. JRSSB, 59, 511-567.
        Review: The EM algorithm is analysed with the aim of making it converge
        faster while maintaining its simplicity and stability. A brief historical
        account is given with many references and comments. The main
        methodological contribution of the paper is the introduction of the "working
        parameter" approach to searching for efficient data augmentation schemes
        for constructing fast EM-type algorithms. Here an optimal EM algorithm
        for the multivariate (including univariate) t-distribution with known degrees
        of freedom, simulation studies and theoretical derivations (the rate of
        convergence, the matrix rate of convergence) are presented. The main
        theoretical contribution is given in Section 3, where the formulation of
        the alternating expectation-conditional maximization (AECM) algorithm,
        which unifies several recent extensions of the EM algorithm that effectively
        combine data augmentation with model reduction, is given. As examples,
        a fitting of t-models with unknown degrees of freedom and an image
        reconstruction under the Poisson model are given.
        The paper is completed by a discussion. It was read before the Royal
        Statistical Society and four contributors were invited to lead the discussion
        (Donald B. Rubin, D. M. Titterington, Walter R. Gilks and Jean Diebolt).
        Many others (17) participated in it or wrote letters with comments. The authors
        replied to all of them in writing.
        The paper, together with the discussion, gives very good information on
         the contemporaneous state of the EM and related algorithms.
     9. Jamshidian, M. and Jennrich, R. (1997). Acceleration of the EM
        Algorithm by using quasi-Newton methods. JRSSB, 59, 569-587.
    10. McLachlan, G. and Krishnan, T. (1997). The EM Algorithm and Extensions.
        John Wiley.
    11. Bayesian Computation and Stochastic Systems. Statistical Science,
        1995, 3-66.
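
Several of the summaries above rest on two properties of EM: the M-step only needs complete-data maximum likelihood estimation, and each iteration increases the likelihood. The following is a minimal sketch of those two properties for a two-component normal mixture (the setting of the Redner and Walker reference). It is an illustration only, not taken from any of the talks or papers listed. The function names (`em_step`, `em_fit`, `simulate`), the starting values, and the simulated data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weight, means, sds):
    """Observed-data log-likelihood of a two-component normal mixture."""
    total = 0.0
    for x in data:
        total += math.log(
            weight * normal_pdf(x, means[0], sds[0])
            + (1.0 - weight) * normal_pdf(x, means[1], sds[1])
        )
    return total


def em_step(data, weight, means, sds):
    """One EM iteration: E-step responsibilities, then closed-form M-step."""
    # E-step: posterior probability that each point came from component 0.
    resp = []
    for x in data:
        p0 = weight * normal_pdf(x, means[0], sds[0])
        p1 = (1.0 - weight) * normal_pdf(x, means[1], sds[1])
        resp.append(p0 / (p0 + p1))

    # M-step: weighted versions of the complete-data ML estimates.
    n0 = sum(resp)
    n1 = len(data) - n0
    mean0 = sum(r * x for r, x in zip(resp, data)) / n0
    mean1 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
    var0 = sum(r * (x - mean0) ** 2 for r, x in zip(resp, data)) / n0
    var1 = sum((1.0 - r) * (x - mean1) ** 2 for r, x in zip(resp, data)) / n1
    return n0 / len(data), (mean0, mean1), (math.sqrt(var0), math.sqrt(var1))


def em_fit(data, n_iter=50):
    """Run EM from a crude start; return estimates and the log-likelihood trace."""
    # Assumed starting values: equal weights, means at the data extremes.
    weight, means, sds = 0.5, (min(data), max(data)), (1.0, 1.0)
    trace = [log_likelihood(data, weight, means, sds)]
    for _ in range(n_iter):
        weight, means, sds = em_step(data, weight, means, sds)
        trace.append(log_likelihood(data, weight, means, sds))
    return weight, means, sds, trace


def simulate(n_per_component=200, seed=0):
    """Hypothetical test data: equal numbers of draws from N(0, 1) and N(4, 1)."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(n_per_component)]
            + [rng.gauss(4.0, 1.0) for _ in range(n_per_component)])


if __name__ == "__main__":
    weight, means, sds, trace = em_fit(simulate())
    print("weight:", round(weight, 3))
    print("means:", [round(m, 3) for m in means])
    print("log-likelihood never decreased:",
          all(b >= a - 1e-8 for a, b in zip(trace, trace[1:])))
```

The monotonicity check allows a small tolerance because the increase per iteration becomes smaller than floating-point rounding near convergence. The sketch has no safeguard against a component variance collapsing to zero, which can happen with mixtures on less well-separated data.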

Image Analysis

Created: November 9th, 1999
Last Revised: March 23rd, 2000
© Copyright 1999 Hung Chen
