Variance components estimation for continuous and discrete data, with emphasis on cross-classified sampling designs

Gray BR
In: Gitzen, R.A., J.J. Millspaugh, A.B. Cooper, D.S. Licht, editors. Design and analysis of long-term ecological monitoring studies. Cambridge University Press. ISBN: 9780521139298, pp. 200-227

SUMMARY. Variance components may be estimated for scientific, management and study-planning purposes. Scientific purposes include, for example, determining whether nitrate concentrations in lakes vary more among than within lakes, and whether either source of variation might be associated with putatively causal agents (e.g., agricultural runoff). Variance component estimates are used for study planning when, for example, an investigator wishes to select the number of groups (e.g., lakes) and the number of observations within each group for a future study. A major concern is that variance component estimators may be biased and yield imprecise estimates when the number of groups is small. This concern appears especially relevant for ecologists who, for logistical or cost reasons, may design studies with few sites and/or few years.
This chapter reviews the estimation of variance components and variance partition coefficients (VPCs) for clustered continuous, categorical and count data, with emphasis on studies with small sample sizes and crossed random effects. Variance components estimated from few groups using linear models of continuous outcomes may exhibit only modest bias when estimated using ANOVA or restricted maximum likelihood (REML) but may be substantially biased when estimated using full maximum likelihood (FML). For all three estimation methods, however, precision is expected to be poor unless the number of groups is modest to large (e.g., more than 10, and possibly as many as 100).
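As an illustration of the ANOVA approach mentioned above, the following is a minimal Python sketch for the simplest case: a balanced one-way random effects design (e.g., lakes as groups, repeated samples within each lake). It is not taken from the chapter. The function name `anova_variance_components` and the input layout (rows are groups, columns are observations within a group) are assumptions made for this example, and truncating a negative among-group estimate at zero before computing the VPC is one common convention rather than the only option.

```python
import numpy as np

def anova_variance_components(y):
    """Method-of-moments (ANOVA) estimates for a balanced one-way design.

    y: 2-D array-like; rows are groups, columns are observations within a group.
    (Hypothetical layout chosen for this sketch.)
    """
    y = np.asarray(y, dtype=float)
    k, n = y.shape                      # k groups, n observations per group
    group_means = y.mean(axis=1)
    grand_mean = y.mean()

    # Mean squares from the one-way ANOVA table
    ms_between = n * np.sum((group_means - grand_mean) ** 2) / (k - 1)
    ms_within = np.sum((y - group_means[:, None]) ** 2) / (k * (n - 1))

    # Equate observed mean squares to their expectations:
    # E[MS_within] = var_within; E[MS_between] = var_within + n * var_between
    var_within = ms_within
    var_between = (ms_between - ms_within) / n   # can be negative with few groups

    # VPC: share of total variance attributable to groups.
    # A negative among-group estimate is truncated at zero here (an assumption).
    var_between_trunc = max(var_between, 0.0)
    total = var_between_trunc + var_within
    vpc = var_between_trunc / total if total > 0 else float("nan")

    return {"between": var_between, "within": var_within, "vpc": vpc}
```

Calling the function on simulated data with only a handful of groups, and repeating the simulation many times, is one way to see the imprecision described above: the among-group estimate varies widely from run to run even when its average is close to the true value.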
Variance components estimated from generalized linear mixed models (GLMMs) of categorical data may be expected to be both biased and imprecise when the number of groups is small (e.g., <20 to as high as <100, depending on estimation method). The performance of Bayesian estimators of variance components from categorical data appears promising but has received relatively little attention in the literature.
GLMM estimators of variance components from count data have received less attention than have their categorical counterparts. Information supplied in this chapter suggests that, for cross-classified random effects models of count data, the Laplace estimator should be preferred over first-order quasilikelihood (QL) variance component estimators; the QL estimators suffered from poor convergence rates (PQL and RPQL) or substantial bias (MQL). Readers interested in variance component estimation from count data should also consider Markov chain Monte Carlo and, for fully nested models, adaptive Gaussian quadrature.
The estimation of VPCs has received relatively little attention in the ecological literature. This is particularly the case for VPCs from categorical and count data (for which methods appear to have first been published in 2002 and 2008, respectively; Goldstein et al. 2002, Stryhn et al. 2008). For these discrete outcomes, VPCs may be estimated on both the measurement scale and the modeling (link) scale. A method for estimating VPCs for binary outcomes on the measurement scale from two-way cross-classified random effects designs is proposed in Appendix 2.
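To make the link-scale idea concrete, here is a small Python sketch of the widely used latent-variable formulation for a logit-link model of binary outcomes, in which the level-1 residual variance on the link scale is taken to be that of the standard logistic distribution, pi^2/3. This is an illustration only and is not the measurement-scale method proposed in Appendix 2. The function name `vpc_logit_link_scale` and the component labels are hypothetical, and the variance values passed in would in practice come from a fitted model.

```python
import math

def vpc_logit_link_scale(variances):
    """Link-scale VPCs for a logit model under the latent-variable formulation.

    variances: dict mapping a random-effect name (e.g., "site", "year")
               to its estimated variance on the logit scale.
    Returns a dict with the share of total link-scale variance per component.
    """
    # Variance of the standard logistic distribution, used as the
    # level-1 residual variance on the link scale (latent-variable assumption).
    residual = math.pi ** 2 / 3
    total = sum(variances.values()) + residual
    return {name: v / total for name, v in variances.items()}
```

For a two-way cross-classified design, a call such as `vpc_logit_link_scale({"site": 0.8, "year": 0.3})` returns one VPC per random effect; the variance values here are placeholders, not estimates from the chapter.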

Paper submitted by Brian Gray, Upper Midwest Environmental Sciences Center, US Geological Survey