# Large Sample Theory

(Est. February 2003, Revised:  11/02/2004)

Prerequisite: A one-year graduate-level mathematical statistics course at the level of Casella and Berger's Statistical Inference.
### References

• Bickel, P.J. and Doksum, K.J. (2001) Mathematical Statistics: Basic Ideas and Selected Topics Volume 1. Prentice Hall.

• Ferguson, T.S. (1996) A Course in Large Sample Theory Chapman & Hall.

• Lehmann, E.L. and G. Casella (1998) Theory of Point Estimation, 2nd ed. New York : Springer.

### Internet Resources

## Course Content

1. Bootstrap Method and its validity (updated 9/13/04)

• An "Introduction to the Bootstrap" course Webpage at Stanford University.

• Machine Learning; Looking Inside the Black Box; Software for the Masses

• R News: Resampling Methods in R
• Practice of Statistics and Examples: Section 1
• Bootstrap method: Section 2
• Validity and Double Array CLT: Section 3
• Inconsistency: Section 4
• Bias Reduction: Section 5
• R.J. Beran's comments on Efron, B. (1979), "Bootstrap Methods: Another Look at the Jackknife," Annals of Statistics, 7, 1-26:
• Sec. 2: It formulated the bootstrap idea as an intellectual object accessible to theoretical study and gave it a name.
• Sec. 2 and Remark G in Sec. 8:  It introduced the interpretation of a bootstrap distribution as a conditional distribution given the sample.
• Sec. 2: It described the natural Monte Carlo approach to approximating such conditional distributions.
• Sec. 3:  It pointed to the validity of the nonparametric bootstrap for the sample median, in contrast to the jackknife.
• Sec. 5:  It indicated that jackknife procedures might often be seen as approximations to conceptually simpler bootstrap methods.
• Sec. 6:  In specific two-sample situations, it illustrated the possible validity of nonparametric bootstrap estimates for bias and variance.
• Sec. 7:  It proposed a bootstrap competitor to the cross-validation estimate of error rate in discriminant analysis.
• Remark G in Sec. 8:  It proved, for finite sample spaces, the first theorem on consistency of the nonparametric bootstrap distribution as an estimate of an unknown sampling distribution.
• Sec. 2 and Remark F in Sec. 8: It suggested, without quite pinning down the idea, that a nearly pivotal quantity would bootstrap more successfully than a less pivotal quantity.
• Remark D in Sec. 8:  It made provocative comments about the construction of bootstrap confidence intervals.
• The most alluring aspect of the bootstrap is its substitution of feasible computer experiments for analytically difficult distribution theory.
• The distribution theory behind the chi-squared test, the t-test, and the F-test, all developed early in the 20th century, created modern statistics and gave probability theory an important role in our discipline. The computational need to stay with statistical procedures that use tabulated distributions surely shaped research for several decades thereafter. Efron's 1979 paper on the bootstrap, the research it has inspired, and recent advances in the study of chaotic systems have revealed intriguing new possibilities for statistical thought in the computer age.
• I would suggest that statistical perceptions in 1979 were influenced by four historical developments.
• First, by the late 1970s, the revolution in computing, and subsequently in data analysis, had put theoretical statistics on the defensive. It was becoming increasingly clear that the classical formulations of statistical theory, whether frequentist or Bayesian, did not provide a realistic paradigm for the analysis of large data sets. One response was growing theoretical interest in the jackknife, cross-validation, and certain other resampling schemes [see references in Efron (1982), The Jackknife, the Bootstrap and Other Resampling Plans, SIAM, Philadelphia, Pa.]. These were all methods that seemed to rely on direct internal examination of the data, rather than on fitting an externally conceived statistical model.
• Second, some data analysts, not all professional statisticians, had been experimenting in the 1960s and 1970s with Monte Carlo simulations from fitted models as a means of generating plausible critical values for confidence statements or tests. Examples include Williams (1970, Discrimination between regression models to determine the pattern of enzyme synthesis in synchronous cell cultures, Biometrics, 28, 23-32) and two astrophysical papers from 1976 cited in Press et al. (1986, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, Sec. 14.5). Such direct simulation approaches were a natural response to the increased availability of inexpensive computing. However, they were supported more by intuition than by logical analysis and appeared mostly in papers published outside the mainstream statistical journals.
• Third, another response to the problems of data analysis was the theory of robust estimation. By the late 1970s, theoretical workers in robust statistics and nonparametrics were familiar with estimates as statistical functionals, that is, as functions of the empirical cdf. The first derivatives of statistical functionals were being calculated to assess their robustness and to find their asymptotic distributions [see Huber (1981, Chap. 1)]. This background prepared the way for subsequent interpretation and analysis of a bootstrap distribution as a statistical functional.
• Fourth, the probability theory of weak convergence had continued to develop through the 1970s. Studying optimality or robustness in large samples of various parametric and nonparametric procedures had required statisticians to use weak convergence results for triangular arrays [see the references in Ibragimov and Has'minskii (1981) and LeCam (1986)]. Studying second-order asymptotic efficiency had brought new life and insight to the theory of Edgeworth expansions [refer to Bickel (1974) and Bhattacharya and Ghosh (1978)]. These statistically motivated developments in probability theory became crucial tools in the analysis of bootstrap procedures that started immediately after the publication of Efron's paper.
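As a concrete illustration of the idea running through the comments above (resampling from the empirical cdf and approximating the resulting conditional distribution by Monte Carlo), here is a minimal sketch of the nonparametric bootstrap standard error of the sample median. The data, the number of resamples B, and the seed are illustrative choices, not taken from Efron's paper.

```python
# Minimal sketch of Efron's nonparametric bootstrap for the standard
# error of the sample median; data, B, and seed are illustrative.
import random
import statistics

def bootstrap_se_median(sample, B=2000, seed=0):
    """Monte Carlo approximation of the bootstrap standard error.

    Each resample is drawn with replacement from the observed data,
    i.e. from the empirical cdf F_n, which is the conditional
    distribution of one bootstrap observation given the sample.
    """
    rng = random.Random(seed)
    n = len(sample)
    medians = []
    for _ in range(B):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        medians.append(statistics.median(resample))
    return statistics.stdev(medians)

sample = [1.2, 3.4, 0.7, 2.9, 5.1, 2.2, 4.0, 1.8, 3.3, 2.6]
se = bootstrap_se_median(sample)
```

The point of the sketch is the substitution Beran highlights: no formula for the sampling distribution of the median is needed, only repeated resampling from the observed data.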

2. Order Statistics: finite sample property and large sample approximation  (updated 9/30/04)

3. Asymptotic analysis on linear statistics with independent components : (updated 10/18/04)

• Modes of Convergence: Sections 1 and 3

• in distribution, almost surely, in probability, in mean

• Remarks on measure and integration: Section 2

• Moment Condition: Section 4

• Convergence in Distribution: Sections 5 and 6

• Delta Method and Variance-Stabilizing Transformations: Section 7

4. Maximum Likelihood and Related Estimation Methods: (updated 11/02/04)

• Information Bound: Section 2

• Maximum Likelihood: Section 3

• Asymptotic Analysis

• Generalized Method of Moments: Section 4; see Prof. Kuan, Chung-Ming's lecture note on GMM (which discusses more of the connection with econometrics).

• EM Algorithm: Section 5

5. Large Sample Tests and Confidence Regions: (updated ?/?/03)

## Grading

• Homework (70%; assigned once every three weeks, about 10 problems each time)
• Midterm exam (20%)
• Class discussion (10%)

## Homework Assignment

• Homework 0: Due date:  9/28/03

Please send me an email message, at hchen@math.ntu.edu.tw, with the words "Large Sample Theory" in the subject line. I'd like to get two things from your email:

1. I'd like to make up an email mailing list so I can get in touch with you easily, and
2. I'd like to find out something about you, such as your major, what year you are in, any special interests you might have, what you hope to get out of the class, and that sort of thing.
• Homework 1: Due date: 10/13/04   Solution
• Homework 2: Due date: 10/27/04   Solution
• Homework 3: Due date: 11/10/04   Solution
• Homework 4: Due date: 12/19/03  Solution
• Homework 5: Due date: 1/09/04   Solution

## Programming Exercises

• Homework must be turned in on time. No late homework will be accepted without a legitimate reason.
• There will be one programming assignment. You are expected to write your own code and turn in your source code. Do not copy, and never ask your friends to write programs for you.
1. Project 1. Bootstrap method (read the note; there are five assignments in total). This is an individual project, but you may discuss it with your classmates. Due date: 11/28/02 Answer

### Software

• R programming language
• Available online at

Look for "Precompiled Binary Distributions", Windows (95 and later).

#### Tutorial by Venables and Ripley

• An Introduction to R: It gives an introduction to the language and how to use R for doing statistical analysis and graphics.

• A draft of the R language definition: Documents the language per se; useful to know when programming R functions.

• Writing R Extensions : It covers how to create your own packages, write R help files, and the foreign language (C, C++, Fortran, ...) interfaces.

• R Data Import/Export: Describes the import and export facilities available either in R itself or via packages from CRAN.

• The R Reference Index : It contains all help files of the R standard packages in printable form.

• Quick look up: Rcard, Rguide.

This course emphasizes large sample theory and does not demand much background, though students are assumed to know, or to have used, the law of large numbers, the central limit theorem, and the various modes of probabilistic convergence and approximation. To keep the material from becoming too dry, a problem-oriented approach is attempted.

For example, consulting a probability table is an almost indispensable step in statistical analysis, and which table to consult, or which table to construct, is an important question. Traditionally, the tables related to the normal distribution, such as the chi-square and t tables, have mostly been constructed. When the distribution generating the data is not normal, whether one may still use tables built from the normal distribution is a question in urgent need of an answer.

In answering this question there are basically two approaches. The first is to investigate whether, when the distribution is not normal, results derived under normality have become unacceptably inaccurate; the second is the asymptotic, or approximation, approach of large sample theory.

For example, applying the central limit theorem is a large-sample approach. The theorem tells us that even when the distribution is not normal, in a good many situations one may still use the t table to construct a confidence interval for the mean; using the chi-square table to construct a confidence interval for the variance, however, is quite unreliable. For details see Chapter 3 of the lecture notes.
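A small simulation can check the claim that t-intervals for the mean remain roughly valid under non-normal data while chi-square intervals for the variance do not. The sketch below is my own illustration: exponential(1) data (true mean and variance both 1), n = 20, 2000 replications, and standard table quantiles for t and chi-square with 19 degrees of freedom.

```python
# Compare actual coverage of nominal 95% intervals under exponential data:
# the t-interval for the mean vs. the chi-square interval for the variance.
import random
import statistics

random.seed(1)
n, reps = 20, 2000
t_q = 2.093                       # t quantile, 0.975, 19 df
chi_lo, chi_hi = 8.907, 32.852    # chi-square quantiles, 0.025 and 0.975, 19 df

cover_mean = cover_var = 0
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]  # true mean 1, variance 1
    xbar = statistics.fmean(x)
    s2 = statistics.variance(x)                      # unbiased sample variance
    # nominal 95% t-interval for the mean
    if abs(xbar - 1.0) <= t_q * (s2 ** 0.5) / n ** 0.5:
        cover_mean += 1
    # nominal 95% chi-square interval for the variance
    if (n - 1) * s2 / chi_hi <= 1.0 <= (n - 1) * s2 / chi_lo:
        cover_var += 1

cover_mean /= reps
cover_var /= reps
```

With these settings the t-interval's coverage stays reasonably near 0.95 while the variance interval's coverage falls well below it, illustrating the asymmetry discussed above.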

Of course, both approaches have their limitations. Large sample theory can indeed often provide a quick preliminary understanding of a problem, but being a crude tool it has its shortcomings and frequently cannot produce a refined answer. Computer simulation, on the other hand, often leaves one unsure of exactly what should be simulated.

Blind faith in a single approach often leads its adherents into mutual attack, but attacking the limitations of a method solves nothing; the scientific problem still awaits a solution. The most celebrated alternative of the past twenty years is the bootstrap method proposed by Efron (1979). Chapter 1 of the lecture notes therefore first describes this method and then tries to analyze and understand it. Once one decides to use the bootstrap, the first question faced is how to carry it out. The most common device is to produce an approximate answer by the Monte Carlo method; for details see Chapter 1. Of course this immediately raises new questions: how is the approximation error related to the amount of Monte Carlo computation, and what are the computational limits of the method?
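The question of how the Monte Carlo error depends on the number of bootstrap resamples B can be explored numerically. The sketch below is my own illustration, not from the notes: for one fixed sample, the Monte Carlo step is repeated many times with a small B and with a large B, and the spread of the resulting estimates is compared; the error shrinks at the usual 1/sqrt(B) Monte Carlo rate.

```python
# Monte Carlo error of a bootstrap quantity as a function of B:
# repeat the Monte Carlo step 30 times for B = 50 and B = 800 and
# compare the spread of the resulting estimates.
import random
import statistics

base = random.Random(7)
data = [base.gauss(0, 1) for _ in range(25)]   # one fixed sample of size 25

def mc_boot_se_mean(sample, B, rng):
    """One Monte Carlo run: B bootstrap resamples, SE of the sample mean."""
    n = len(sample)
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(B)]
    return statistics.stdev(means)

rng = random.Random(42)
spread_small = statistics.stdev(mc_boot_se_mean(data, 50, rng) for _ in range(30))
spread_large = statistics.stdev(mc_boot_se_mean(data, 800, rng) for _ in range(30))
```

Note that the bootstrap distribution itself is fixed once the data are fixed; only the Monte Carlo approximation to it improves as B grows.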

Having decided to use the bootstrap, the second question faced is whether the bootstrap can really supply a good approximation to the unknown distribution of interest. The bootstrap is certainly no panacea, so the chapter first takes a special case and explains, or proves, why the bootstrap can give a reasonable answer, and then uses examples to show why it can also give a wrong one.

To prove that the bootstrap can give a reasonable answer, the central limit theorem and the double-array central limit theorem are first stated. The first method of proof uses the asymptotic side of large sample theory. Under the bootstrap, the population behind the resampled data changes as the sample size grows, whereas the basic assumption of the ordinary central limit theorem is that the population generating the data does not change with the sample size; the ordinary theorem therefore cannot be used to justify the bootstrap. The more elaborate double-array central limit theorem is introduced to handle a population that changes with the sample size. The second method of proof uses the approximation side of large sample theory, which requires the Berry-Esseen theorem. For each sample size, that theorem bounds the approximation error between the normal distribution appearing in the central limit theorem and the unknown true distribution to be approximated, and this sidesteps the difficulty that the population behind the bootstrap changes with the sample size.

After showing that in some situations the bootstrap can give a reasonable answer, Section 3 of Chapter 1 explains why it is not a panacea that resolves every difficulty, and exhibits situations in which the bootstrap cannot possibly give a reasonable answer; the material is taken from Shao (1994).

Chapter 2 studies order statistics, with three main topics: the distribution of the sample median, the empirical distribution function and density estimation, and the distribution of extremes. For the sample median we use a large-sample approximation to obtain the asymptotic distribution and use it as an approximation to the unknown distribution. There are several reasons for considering the distribution of the median: it is of course an extremely important statistic, but the main reason is that it is not a linear statistic.

The large sample theory of linear statistics is usually not too hard to handle. When the random variables are independent, computing expectations and variances and applying Chebyshev's inequality links probabilities to moments, which already gives a rough picture of how the linear statistic behaves. The sample mean is a linear statistic; the law of large numbers says that the mean is not too far from the unknown expectation, and its proof is exactly this argument. To describe the difference between the mean and the unknown expectation more precisely, we can call on the central limit theorem and quantify the difference with a normal distribution. For those familiar with characteristic functions or moment generating functions, together with Taylor expansions and their error terms from calculus, the asymptotic distribution of a linear statistic is usually not hard to obtain, and the central limit theorem itself can be proved this way.

Chapter 3 employs both of these devices. They explain why, when the distribution is not normal, using the chi-square table to construct a confidence interval for the variance runs considerable risk, while using the t table to construct a confidence interval for the mean is fairly safe. The second device is also used to derive the asymptotic distribution of the Pearson goodness-of-fit statistic for contingency tables.

The median is not a linear statistic, so the preceding devices fail, and two others are adopted. The first is to discuss the behavior of the median under the uniform distribution, where the median follows a beta distribution and its asymptotic distribution is not hard to obtain; to pass to a general distribution, the delta method is introduced. The second is to transform the problem, after which the central limit theorem, together with the approximation error supplied by the Berry-Esseen theorem, yields the asymptotic distribution of the median. The Q-Q plot is a common graphical device for judging whether the distributional assumptions of a probability model are reasonable.
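The asymptotic normality of the sample median treated in Chapter 2 can be checked numerically. The sketch below is my own illustration: for Uniform(0,1) data the density at the median m = 1/2 is f(m) = 1, so the classical result says the median of a sample of size n is approximately N(1/2, 1/(4 n f(m)^2)); the simulation compares the empirical standard deviation of many sample medians with this asymptotic value.

```python
# Numerical check: empirical sd of the sample median of Uniform(0,1)
# data versus its asymptotic value 1 / (2 sqrt(n) f(m)) with f(m) = 1.
import random
import statistics

rng = random.Random(3)
n, reps = 201, 2000            # odd n, so the median is a single order statistic
medians = []
for _ in range(reps):
    x = sorted(rng.random() for _ in range(n))
    medians.append(x[n // 2])

emp_sd = statistics.stdev(medians)
asy_sd = 1.0 / (2.0 * n ** 0.5)   # asymptotic standard deviation
```

Under the uniform distribution the exact distribution of the median is a beta distribution, so this example can also be checked against the first device described above.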
Once the asymptotic distribution of the sample median is clear, the asymptotic distributions of quantile estimates become clear as well. Estimates of extreme quantiles, however, behave rather differently; for reasons of space the chapter discusses only the distribution of extremes and some of its applications. A device that plays a role similar to the Q-Q plot is the empirical distribution function; because it takes values between 0 and 1, it avoids the difficulty of estimating extreme quantiles that afflicts the Q-Q plot. The chapter does not examine its asymptotics in detail and instead treats density estimation at greater length: nonparametric regression and density estimation have received much attention over the past decade and play an important role in modern data-analysis tools, so the chapter studies the large-sample behavior of the kernel density estimate.

Chapter 3 mainly discusses the probabilistic language and tools commonly used in large sample theory, including the different modes of convergence.

Maximum likelihood is regarded as a good estimation method, and Chapter 4 mainly asks why. A criterion must first be fixed, so the chapter studies the information inequality, which gives a lower bound on the variance of a parameter estimate. The main question is then how far the maximum likelihood estimate is from this bound. Unfortunately this is a hard problem, and to avoid the difficulty we resort to large-sample methods.

In general the maximum likelihood estimate is not a linear statistic, so a new device is needed. The usual way to handle a nonlinear problem is to convert it into a linear one. For instance, general equations are not easy to solve, but linear equations are, so Newton's method was designed for solving general equations; its spirit is that a smooth function is locally well approximated by a linear one. If the nonlinear equation is well approximated by a linear equation near its solution, then, with a continuity requirement, Newton's method produces a good approximate solution.

The large sample theory of the maximum likelihood estimate follows exactly this line of thought. The proof, taken from Lehmann (1983), only shows that near the true parameter value there must be a root of the likelihood equation. When the maximum likelihood estimate is unique, this shows that it is close to the true parameter value; when it is not unique, one faces the difficulty of deciding which root to take, and how to escape that difficulty will be sketched in class.

In a nonlinear regression model, whether the least squares estimate is close to the unknown true parameter raises the same difficulty as maximum likelihood, since the estimate is not a linear statistic. In a linear regression model the least squares estimate is a linear statistic, and its properties are quite well understood. The device used above for maximum likelihood can therefore also be used to decide whether the least squares estimate in a nonlinear regression model is close to the unknown true value.

Because of the various limitations in collecting data, and to allow for the effect of model misspecification, the usual practice is to specify a larger or more flexible model, so the parameters divide roughly into two groups: the parameters of interest and the parameters added in specifying the model. From the standpoint of estimation we care more about the former, which leads to the following questions. First, since the data are finite, what effect does the addition of many extra parameters have on estimating the parameters of interest? Second, is it possible to pay no price? Third, if a price must be paid, how can it be quantified? Fourth, once the second and third questions are settled, we understand better how flexible a model to build, and we can quantify the price paid when data collection is limited. The last part of the chapter tries to discuss the first three questions, which brings us back to the notions of adaptiveness and efficiency.

An important and extremely difficult problem in statistical analysis is how to find a reasonable "approximate" model; in general there is no systematic way to proceed. Once a model is specified, however, there are methods for testing whether the specification is correct, such as residual analysis in linear regression. Chapter 4 does not pursue this problem, but Section 1 briefly discusses some views on model robustness. Chapter 5 mainly discusses large sample tests such as the likelihood ratio test, the score test, and the Wald test.
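Chapter 4's treatment of the MLE rests on Newton's method applied to the likelihood (score) equation. The sketch below is my own numerical illustration, not from the notes: the location parameter of a logistic sample has no closed-form MLE, so the score equation is solved by Newton-Raphson starting from the sample median (a consistent starting point). The data, sample size, and true value are hypothetical choices.

```python
# Newton's method on the score equation for the location parameter of
# a logistic sample; start at the sample median, a consistent estimate.
import math
import random
import statistics

rng = random.Random(11)
theta_true = 2.0
# logistic(theta_true, scale 1) variates via inverse-cdf sampling
data = [theta_true + math.log(u / (1 - u))
        for u in (rng.random() for _ in range(400))]

def score(theta):
    """Derivative of the log-likelihood in theta (the score function)."""
    return sum(1.0 - 2.0 / (1.0 + math.exp(x - theta)) for x in data)

def score_prime(theta):
    """Derivative of the score; always negative, so the root is unique."""
    total = 0.0
    for x in data:
        e = math.exp(x - theta)
        total -= 2.0 * e / (1.0 + e) ** 2
    return total

theta = statistics.median(data)     # starting point for the linearization
for _ in range(25):                 # Newton-Raphson iteration
    step = score(theta) / score_prime(theta)
    theta -= step
    if abs(step) < 1e-10:
        break
```

Because the logistic score is strictly decreasing, the likelihood equation has a unique root here; the multiple-root difficulty mentioned above does not arise in this example, which is exactly why it is convenient for illustration.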