Chen, Hung
Research Topic 1:

Locating Maximum of a Nonlinear Regression Surface

Chu, Shu-Jane; Huang, Wen-Jang; Chen, Hung
A study of asymptotic distributions of concomitants of certain order statistics.
Statist. Sinica 9 (1999), no. 3, 811--830.              (download pdf file)


Chen, Hung; Huang, Mong-Na Lo; Huang, Wen-Jang
Estimation of the location of the maximum of a regression function using extreme order statistics.
J. Multivariate Anal. 57 (1996), no. 2, 191--214.   (download pdf file)

Summary: "We consider the problem of approximating the location, x0 ∈ C, of a maximum of
a regression function q(x) under certain weak assumptions on q.  Here C is a bounded
interval in R.  A specific algorithm considered in this paper is as follows.  Taking a
random sample X1,...,Xn from a distribution over C, we have (Xi,Yi), where Yi is the
outcome of a noisy measurement of q(Xi).  Arrange the Yi's in nondecreasing order and take
the average of the r Xi's which are associated with the r largest order statistics of Yi.
This average, \hat{x0}, is then used as an estimate of x0.  The utility of such an
algorithm with fixed r is evaluated in this paper.  To be specific, the convergence rates
of \hat{x0} to x0 are derived.  Those rates will depend on the right tail of the noise
distribution and the shape of q(·) near x0."
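The algorithm described in the summary is simple enough to sketch in code. The following is an illustrative Python implementation, not the authors' code; the test function q(x) = -(x - 0.3)^2, the noise level, and the choice r = 10 are hypothetical.

```python
import numpy as np

def argmax_estimate(x, y, r=5):
    """Estimate the location of the maximum of a regression function:
    average the r covariate values x_i whose responses y_i are the
    r largest order statistics of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    top = np.argsort(y)[-r:]          # indices of the r largest y_i
    return x[top].mean()

# Toy example: q(x) = -(x - 0.3)**2 with maximum at x0 = 0.3
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=2000)
y = -(x - 0.3) ** 2 + rng.normal(scale=0.01, size=x.size)
x0_hat = argmax_estimate(x, y, r=10)
```

As the summary notes, r is held fixed here; how well \hat{x0} concentrates around x0 depends on the right tail of the noise and the shape of q near its maximum.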


Chen, Hung
Lower rate of convergence for locating a maximum of a function.
Ann. Statist. 16 (1988), no. 3, 1330--1334.  (download pdf file)

p > 1 is an odd number, F is a class of functions on [-1,1] containing a sufficiently rich
subclass of functions with |f^(p)| ≤ 1; for f ∈ F, let c(f) be a point of global maximum
of f.  {g(·,x,t): x ∈ [-1,1], t ∈ R} is a family of probability densities whose second-order
derivatives with respect to t satisfy some boundedness conditions.  A design associates
with each f in F two sequences {Xn} and {Yn} of random variables such that the conditional
distribution of Xn given the past does not depend on f, and that of Yn is given by
g(·,Xn,f(Xn)).  As estimates of c(f), sequences {Tn} are considered, with each Tn a
function of Xn and Yn.  The following result is proved: For every h ∈ (0,1) there is a
c > 0 such that, for all n and every design and estimate {Tn},
                           inf_f P(|Tn - c(f)| ≥ c n^{-(p-1)/(2p)}) ≥ h.

Research Topic 2: current research interest
Incomplete Covariate Regression


Chen, Yi-Hau; Chen, Hung
Incomplete covariates data in generalized linear models.
J. Statist. Plann. Inference 79 (1999), no. 2, 247--258.  (download pdf file)


Chen, Yi-Hau; Chen, Hung
A unified approach to regression analysis under double sampling design. 
J. Roy. Statist. Soc. Ser. B 62 (2000), 449--460.  (download PostScript file)


Chen, Hung; Tseng, Chien-Cheng
A study on conditional mean imputation method for missing covariate in linear regression models.
manuscript (2000).   ( download PostScript file)

Research Topic 3:
Semiparametric Regression Models

Chen, Hung and Co-authors
Term Structure of Continuous-Time Interest Rates.
A Very Preliminary Manuscript   (download PostScript file for co-authors)


Chen, Hung
Asymptotically efficient estimation in semiparametric generalized linear models.
Ann. Statist. 23 (1995), no. 4, 1102--1129.   (download pdf file)

Summary: "We use the method of maximum likelihood and regression splines to derive
estimates of the parametric and nonparametric components of semiparametric generalized
linear models.  The resulting estimators of both components are shown to be consistent.
Also, the asymptotic theory for the estimator of the parametric component is derived,
indicating that the parametric component can be estimated efficiently without
undersmoothing the nonparametric component."

Chen, Hung; Shiau, Jyh Jen Horng
Data-driven efficient estimators for a partially linear model.
Ann. Statist. 22 (1994), no. 1, 211--237.  (download pdf file)

The authors showed [J. Statist. Plann. Inference 27 (1991), no. 2, 187--201] that a two-stage
spline smoothing method and the partial regression method lead to efficient estimators for
the parametric component of a partially linear model when the smoothing parameter tends to
zero at an appropriate rate.  In this paper, they study the asymptotic behavior of these
estimators when the smoothing parameter is chosen either by the generalized cross validation
(GCV) method or by the Mallows CL criterion.  Under some regularity conditions, the estimated
parametric component is asymptotically normal with the usual parametric rate of convergence
for both spline estimation methods.


Chen, Hung; Chen, Keh-Wei
Selection of the splined variables and convergence rates in a partial spline model.
Canad. J. Statist. 19 (1991), no. 3, 323--339.  (download pdf file)

This paper belongs to a relatively new stream of works concerning inference in semiparametric
models, i.e., observations are made according to the scheme
(i) Y = m(X) + e, where Y ∈ R is the dependent variable, X ∈ R^d is the independent (vector)
     variable, and e is an unobservable noise with mean 0 and finite variance.  A data set
     {(yi, x1i,...,xdi), 1 ≤ i ≤ n} is then used to determine the unknown regression function m(·).
     The authors assume the following semiparametric model for m(·):
(ii) m(X) = W^T·b + q(Z), where X = (X1,...,Xd)^T is partitioned as X = (W^T, Z^T)^T.
In (ii), q(·) denotes an unknown smooth function, while b is the vector of unknown
constant parameters.  In the general framework of (ii), the authors consider the problem of
estimating not only q and b; the statistician also aims to decide how to partition the vector X
into subvectors W and Z.  More precisely, let A be a subset of {1,2,...,d} and let WA and ZA
denote column vectors with those Xi, i ∈ A, and Xi, i ∉ A, respectively.  Now, the problem
is to "recover" the proper subset A, the vector bA and the function q(ZA) in the model
(iii) Y = WA^T bA + q(ZA) + e.  The estimation procedure proposed in the paper can be summarized
as follows:
(1) For a given index set A, a tensor product polynomial spline of degree n with Kn^|A| knots
     is used to approximate q(ZA), where |A| denotes the cardinality of A.
(2) The method of least squares (LSM) is used to fit the function WA^T·bA + q(ZA) to the data.
     Note that, when q(ZA) is approximated as in Step 1, LSM involves d - |A| + (Kn + n)^|A|
     unknown coefficients.
(3) The structural parameters (n, Kn) are determined through an adjusted sum of squares,
     which is derived from the principle of unbiased risk estimate (the FPE criterion proposed
     by Akaike is used at this step).
(4) The index set is found as the set A for which the adjusted residual sum of squares
     is a minimum.
Under suitable regularity conditions, which are too technical to be described here in detail,
the following results are proved in the paper:
(a) The estimator obtained by applying the above steps attains the optimal convergence rate
     in the sense of Stone.
(b) If A0 denotes the correct index set, which selects independent variables in (iii), while
     \hat{A} is the corresponding set obtained by the proposed method, then
     P{\hat{A} = A0} → 1 as n tends to infinity.
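The core of the procedure, a spline least-squares fit whose complexity is chosen by Akaike's FPE criterion, can be illustrated in a univariate Python sketch. This is not the authors' algorithm: the truncated power basis, the quantile-spaced knots, and the FPE form (RSS/n)·(n+p)/(n-p) are all assumptions made for the example.

```python
import numpy as np

def spline_design(x, degree, knots):
    """Truncated-power-basis design matrix for a univariate polynomial spline."""
    cols = [x ** j for j in range(degree + 1)]
    cols += [np.clip(x - t, 0.0, None) ** degree for t in knots]
    return np.column_stack(cols)

def fpe_select(x, y, degree=3, max_knots=10):
    """Fit splines with 1..max_knots interior knots by least squares and
    return the knot count minimizing the FPE criterion."""
    n = len(y)
    best = None
    for k in range(1, max_knots + 1):
        knots = np.quantile(x, np.linspace(0.0, 1.0, k + 2)[1:-1])
        X = spline_design(x, degree, knots)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        p = X.shape[1]
        fpe = (rss / n) * (n + p) / (n - p)   # unbiased-risk-type penalty
        if best is None or fpe < best[0]:
            best = (fpe, k)
    return best[1]

rng = np.random.default_rng(3)
x = rng.uniform(size=300)
y = np.sin(4 * np.pi * x) + rng.normal(scale=0.2, size=300)
k_hat = fpe_select(x, y)
```

In the paper the same idea is applied per candidate index set A, with a tensor-product basis in the nonparametric coordinates.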

Chen, Hung; Shiau, Jyh-Jen Horng
A two-stage spline smoothing method for partially linear models.
J. Statist. Plann. Inference 27 (1991), no. 2, 187--201.  (download pdf file)

Summary: "Rice (1986) showed that the partial spline estimate of the parametric component
in a semiparametric regression model is generally biased and it is necessary to undersmooth
the nonparametric component to force the bias to be negligible with respect to the standard
error.  We propose a two-stage spline smoothing method for estimating the parametric and
nonparametric components in a semiparametric model.  By appropriately choosing rates for
the smoothing parameters, we show that the parametric component can be estimated at the
parametric rate with the new estimate without undersmoothing the nonparametric component.
We also show that the same result holds for the partial regression estimate proposed
independently by Denby (1986) and Speckman (1988).  Asymptotic normality results for the
parametric component are also shown for both estimates.  Furthermore, we associate these
estimates with Wellner's (1986) efficient scores methods."

Chen, Hung
Convergence rates for parametric components in a partly linear model.
Ann. Statist. 16 (1988), no. 1, 136--146.   (download pdf file)

A regression model with random explanatory variables which is partly nonlinear is
considered.  Let y be the real response variable, x be a k-dimensional random variable, and
t be a one-dimensional random variable.  Then y = x^T b + g(t) + e, where b is an unknown
k-dimensional parameter vector, g is an unknown function from a given class of real
smooth functions, and e is an unobservable random error having mean zero and variance
σ^2.  The aim is to estimate b and g on the basis of data (yi, xi, ti), i = 1,...,n.
Least squares estimation is considered, where piecewise polynomials \hat{g} are used to
estimate g.  Under some assumptions on the degree of smoothness of g, on the distribution
of t, and on the conditional distribution of x given t, the author studies the asymptotic
behaviour of the least squares estimators.  One of the results establishes convergence in
distribution of n^{1/2}(\hat{b} - b) to the normal distribution with mean zero and covariance
matrix σ^2 Σ^{-1}, where Σ is the difference of the covariance matrix of x and the
covariance matrix of E(x | t).
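The least-squares estimator for the partly linear model can be sketched as follows: approximate g by a spline basis in t and regress y on x and that basis jointly. This is an illustrative Python sketch only; the truncated power basis, the knot placement, and the simulated model with b = (1, -2) are assumptions for the example.

```python
import numpy as np

def fit_partly_linear(y, x, t, degree=3, n_knots=5):
    """Least-squares fit of y = x'b + g(t) + e, with g approximated by a
    truncated-power-basis polynomial spline in t; returns the estimate of b."""
    knots = np.quantile(t, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    G = np.column_stack(
        [t ** j for j in range(degree + 1)]
        + [np.clip(t - k, 0.0, None) ** degree for k in knots]
    )
    Z = np.column_stack([x, G])          # parametric block first
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[: x.shape[1]]            # \hat{b}

# Simulated example with b = (1, -2) and g(t) = sin(2*pi*t)
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 2))
t = rng.uniform(size=n)
y = x @ np.array([1.0, -2.0]) + np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=n)
b_hat = fit_partly_linear(y, x, t)
```

Here x is independent of t, so Cov(E(x|t)) = 0 and the limiting covariance σ^2 Σ^{-1} reduces to σ^2 Cov(x)^{-1}.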

Research Topic 4:

Curve Fitting

Chen, Hung
Polynomial splines and nonparametric regression.
J. Nonparametr. Statist. 1 (1991), no. 1-2, 143--156.   (download pdf file)

Summary: "Let (X,Y) ∈ [0,1]^d × R be a random vector and let the conditional
distribution of Y given X = x have mean q(x) and satisfy a suitable moment condition.
It is assumed that the density function of X is bounded away from zero and infinity on
[0,1]^d.  Suppose that q(x) is known to be a general d-dimensional smooth function of x only.
Consider an estimator of q having the form of a polynomial spline with simple knots at
equally spaced grids over [0,1]^d, where the coefficients are determined by the method of
least squares based on a random sample of size n from the distribution of (X,Y).  It is
shown that this estimator achieves the optimal rates of convergence for nonparametric
regression estimation as defined by C. J. Stone [Ann. Statist. 10 (1982), no. 4,
1040--1053] under the L2 norm and sup norm, respectively."


Chen, Hung
Estimation of a projection-pursuit type regression model.
Ann. Statist. 19 (1991), no. 1, 142--157.   (download pdf file)

Summary: "Since the pioneering work of Friedman and Stuetzle in 1981, projection-pursuit
algorithms have attracted increasing attention.  This is mainly due to their potential for
overcoming or reducing difficulties arising in nonparametric regression models associated
with the so-called curse of dimensionality---that is, the amount of data required to avoid
an unacceptably large variance increasing rapidly with dimensionality.  Subsequent work
has, however, uncovered a dependence on dimensionality for projection-pursuit regression
models.  Here we propose a projection-pursuit-type estimation scheme, with two additional
constraints imposed, for which the rate of convergence of the estimator is shown to be
independent of the dimensionality.  Let (X,Y) be a random vector such that X = (X1,...,Xd)^T
ranges over R^d.  The conditional mean of Y given X = x is assumed to be the sum of no more
than d general smooth functions of bi^T x, where bi ∈ S^{d-1}, the unit sphere in R^d
centered at the origin.  A least-squares polynomial spline and the final prediction error
criterion are used to fit the model to a random sample of size n from the distribution of
(X,Y).  Under appropriate conditions, the rate of convergence of the proposed estimator is
independent of d."