Inference With Large Clustered Datasets

QED Working Paper Number 1365

Inference using large datasets is not nearly as straightforward as conventional econometric theory suggests when the disturbances are clustered, even with very small intra-cluster correlations. The information contained in such a dataset grows much more slowly with the sample size than it would if the observations were independent. Moreover, inferences become increasingly unreliable as the dataset gets larger. These assertions are based on an extensive series of estimations undertaken using a large dataset taken from the U.S. Current Population Survey.
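The claim that information "grows much more slowly with the sample size" can be seen in the textbook equicorrelated (random-effects) error model. This is a minimal sketch under stated assumptions, not the paper's own model or estimator. It assumes equal cluster sizes m and a common intra-cluster correlation rho, and it estimates a simple mean. Under these assumptions the variance is inflated by the design effect 1 + (m - 1)rho. With the number of clusters G held fixed, the effective sample size G·m / (1 + (m - 1)rho) therefore approaches the ceiling G/rho no matter how large N = G·m becomes. The values G = 50 and rho = 0.01 are hypothetical and are not taken from the CPS data.

```python
import math


def design_effect(m: float, rho: float) -> float:
    """Variance inflation for a sample mean when each cluster has m
    observations and every within-cluster pair has correlation rho
    (assumed equicorrelated model)."""
    return 1.0 + (m - 1.0) * rho


def effective_sample_size(n_clusters: int, m: int, rho: float) -> float:
    """Number of independent observations that would carry the same
    information about the mean as n_clusters * m clustered observations."""
    return n_clusters * m / design_effect(m, rho)


if __name__ == "__main__":
    G, rho = 50, 0.01  # hypothetical: 50 clusters, very small correlation
    for m in (100, 1_000, 10_000, 100_000):
        n = G * m
        n_eff = effective_sample_size(G, m, rho)
        # Factor by which an i.i.d. standard error understates the true one.
        se_ratio = math.sqrt(design_effect(m, rho))
        print(f"N={n:>9,}  N_eff={n_eff:8.1f}  iid SE too small by x{se_ratio:.2f}")
    print(f"ceiling as m grows: G/rho = {G / rho:.0f}")
```

In this sketch, N grows a thousandfold (from 5,000 to 5,000,000), but N_eff only roughly doubles, from about 2,513 toward the ceiling of 5,000. Over the same range, a standard error that ignores clustering becomes more misleading. Cluster-robust and wild cluster bootstrap methods address that second problem, at least in part. They cannot address the first.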

Keywords

placebo laws
cluster-robust inference
earnings equation
wild cluster bootstrap
CPS data
sample size

Working Paper
