Nonparametric Identification and Estimation of Multivariate Mixtures

QED Working Paper Number 1153

We study nonparametric identifiability of finite mixture models of k-variate data with M subpopulations, in which the components of the data vector are independent conditional on belonging to a subpopulation. We provide a sufficient condition for nonparametrically identifying M subpopulations when k ≥ 3. Our focus is on the relationship between the number of values the components of the data vector can take on and the number of identifiable subpopulations. Intuition suggests that if the data vector can take many different values, then combining information from these different values helps identification. Hall and Zhou (2003) show, however, that when k = 2, two-component finite mixture models are not nonparametrically identifiable regardless of the number of values the data vector can take. When k ≥ 3, a link emerges between the variation in the data vector and the number of identifiable subpopulations: the number of identifiable subpopulations increases as the data vector takes on additional (different) values. This points to the possibility of identifying many components even when k = 3, if the data vector has a continuously distributed element. Our identification method is constructive and leads to an estimation strategy. The resulting estimator is not as efficient as the MLE, but it can be used as the initial value of the optimization algorithm in computing the MLE. We also provide a sufficient condition for identifying the number of nonparametrically identifiable components, and develop a method for statistically testing and consistently estimating this number. We extend these procedures to develop a test for the number of components in binomial mixtures.
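The link between the number of values and the number of identifiable subpopulations, and the constructive flavor of the identification argument, can be illustrated with a small numerical sketch. It assumes k = 3 discrete variables X1, X2, X3, that X1 and X2 each take exactly M values, and that Pr(X3 = x | m) differs across subpopulations for the chosen x. It uses an eigendecomposition device that is common in this literature; it is an illustration under those assumptions, not necessarily the paper's exact procedure, and the function names (population_tables, recover_components) are hypothetical.

```python
import numpy as np


def population_tables(weights, p1, p2, p3):
    """Joint pmfs implied by an M-component mixture of three
    conditionally independent discrete variables (hypothetical helper).

    weights : (M,) mixing proportions pi_m
    p1, p2  : (M, J) arrays; row m is the pmf of X1 (resp. X2) in subpopulation m
    p3      : (M, L) array; row m is the pmf of X3 in subpopulation m
    """
    # P12[i, j] = sum_m pi_m * p1[m, i] * p2[m, j]
    P12 = np.einsum("m,mi,mj->ij", weights, p1, p2)
    # P123[i, j, l] = sum_m pi_m * p1[m, i] * p2[m, j] * p3[m, l]
    P123 = np.einsum("m,mi,mj,ml->ijl", weights, p1, p2, p3)
    return P12, P123


def recover_components(P12, P123, x):
    """Eigendecomposition sketch (hypothetical helper). Assumes J == M,
    P12 invertible, and distinct Pr(X3 = x | m) across subpopulations."""
    # P_x P12^{-1} = p1^T diag(Pr(X3 = x | m)) (p1^T)^{-1}
    A = P123[:, :, x] @ np.linalg.inv(P12)
    eigvals, eigvecs = np.linalg.eig(A)
    eigvals, eigvecs = eigvals.real, eigvecs.real  # eigenvalues are real here
    order = np.argsort(eigvals)[::-1]  # fix an (arbitrary) labeling of components
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each eigenvector is proportional to a pmf of X1; rescale columns to sum to one.
    p1_hat = (eigvecs / eigvecs.sum(axis=0)).T
    # (p1^T)^{-1} P12 = diag(pi) p2, and each row of p2 sums to one.
    weights_hat = np.linalg.solve(p1_hat.T, P12).sum(axis=1)
    return eigvals, p1_hat, weights_hat


# Illustrative parameters (made up for this sketch): M = 3, J = 3, L = 2.
weights = np.array([0.5, 0.3, 0.2])
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7]])
p2 = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
p3 = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])

P12, P123 = population_tables(weights, p1, p2, p3)
print(np.linalg.matrix_rank(P12))  # 3: the rank of P12 equals M here
eigvals, p1_hat, weights_hat = recover_components(P12, P123, x=0)
print(np.round(eigvals, 6), np.round(weights_hat, 6))
```

Here the eigenvalues recover Pr(X3 = 0 | m), the eigenvectors recover the pmfs of X1, and the mixing weights follow from one linear solve. Because P12 is a J × J matrix, its rank can be at most J, so the number of values X1 and X2 take bounds how many components this particular device can reveal. With sample frequencies in place of population probabilities, P12 is generically full rank because of sampling noise, which is why the number of components has to be tested or estimated statistically rather than read off `matrix_rank`; plug-in estimates of this kind could also serve as starting values for an MLE routine.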

Author(s)

Hiroyuki Kasahara
Katsumi Shimotsu

Keywords

finite mixture
binomial mixture
model selection
number of components
rank estimation

Working Paper