Confirmatory Factor Analysis (CFA) in Biostatistics: A Complete Guide

What is Confirmatory Factor Analysis (CFA)?

Confirmatory Factor Analysis (CFA) is a statistical technique used to test the hypothesis that the relationships between observed variables and their underlying latent constructs are consistent with an a priori theoretical model. Unlike Exploratory Factor Analysis (EFA), which is data-driven, CFA is theory-driven and is often used to validate psychometric instruments in the biomedical and health sciences.

In biostatistics, CFA helps assess measurement models, for example by validating whether a set of physiological or psychological tests accurately measures constructs such as stress, quality of life, or metabolic syndrome.
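To make "consistent with a theoretical model" concrete, the block below uses the standard LISREL-style notation for a CFA measurement model. It assumes mean-centered indicators x (p of them), factor loadings Λ, latent factors ξ with covariance matrix Φ, and unique (residual) terms δ with covariance matrix Θ_δ that are uncorrelated with the factors. Theory enters through the pattern of fixed zeros in Λ, since each indicator loads only on the factors the model says it measures. Estimation chooses the free parameters θ so that the model-implied covariance matrix Σ(θ) is as close as possible to the sample covariance matrix S, here using the maximum likelihood discrepancy F_ML. The model chi-square T follows from the minimized discrepancy; some programs multiply by N rather than N − 1.

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\Lambda}\boldsymbol{\xi} + \boldsymbol{\delta}, \\
\boldsymbol{\Sigma}(\boldsymbol{\theta}) &= \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}_{\delta}, \\
F_{\mathrm{ML}}(\boldsymbol{\theta}) &= \ln\lvert\boldsymbol{\Sigma}(\boldsymbol{\theta})\rvert + \operatorname{tr}\!\bigl(\mathbf{S}\,\boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1}\bigr) - \ln\lvert\mathbf{S}\rvert - p, \\
T &= (N - 1)\,F_{\mathrm{ML}}(\hat{\boldsymbol{\theta}}).
\end{aligned}
```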


Difference Between EFA and CFA

Criterion | Exploratory Factor Analysis (EFA) | Confirmatory Factor Analysis (CFA)
Purpose | Explore latent structure | Confirm (test) a pre-specified model
Hypothesis | Not required | Required
Factor structure | Not known in advance | Specified in advance
Rotation | Often applied | Not applicable (the loading pattern is fixed by the model)
Common use | Early stages of scale development | Instrument validation

Applications of CFA in Biostatistics

CFA is extensively used in the following areas:

  • Validating survey instruments: e.g., SF-36 for health-related quality of life.
  • Behavioral health studies: Testing latent constructs like anxiety, depression, or fatigue.
  • Nutritional research: Modeling dietary behavior factors.
  • Genomics and systems biology: Assessing clusters of co-expressed genes.
  • Epidemiological studies: Validating risk factor scores or composite indices.

Assumptions of CFA

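Before running a CFA, check the following assumptions:

  • Multivariate normality: With maximum likelihood (ML) estimation, the observed variables should jointly follow a multivariate normal distribution; normality of each variable on its own is necessary but not sufficient. Robust estimators (e.g., MLR, or WLSMV for ordinal items) relax this assumption.
  • Adequate sample size: A common rule of thumb is at least 10 cases per estimated parameter; the sample actually needed also depends on the size of the loadings, the number of indicators per factor, and the estimator.
  • Linearity: Relationships between latent and observed variables should be linear.
  • No extreme multicollinearity: Indicators of the same factor are expected to correlate, but near-perfect correlations (close to 1) can make the covariance matrix singular and destabilize the estimates.
  • Model identification: Each latent factor needs a scale (e.g., one loading or the factor variance fixed to 1), and the model should be over-identified, with more knowns (the p(p + 1)/2 unique variances and covariances of p indicators) than free parameters, leaving positive degrees of freedom.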
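The identification and sample-size bullets both come down to counting parameters. The sketch below does that count for a simple CFA under stated assumptions: every indicator loads on exactly one factor, there are no correlated residuals or mean structure, and each factor is scaled by fixing one loading to 1. The function name cfa_identification() and its arguments are hypothetical, not part of any package. Note that the counting rule (df ≥ 0) is necessary but not sufficient for identification.

```r
# Hypothetical helper: count "knowns" vs free parameters for a simple CFA
# (one loading per indicator, no cross-loadings, no correlated residuals,
# each factor scaled by fixing its first loading to 1).
cfa_identification <- function(indicators_per_factor, correlated_factors = TRUE) {
  k <- length(indicators_per_factor)     # number of latent factors
  p <- sum(indicators_per_factor)        # number of observed indicators
  knowns <- p * (p + 1) / 2              # unique variances and covariances

  free_loadings <- p - k                 # one loading per factor fixed to 1
  residual_vars <- p                     # one unique variance per indicator
  factor_vars   <- k                     # one variance per factor
  factor_covs   <- if (correlated_factors) k * (k - 1) / 2 else 0
  n_params <- free_loadings + residual_vars + factor_vars + factor_covs

  df <- knowns - n_params
  status <- if (df > 0) "over-identified" else if (df == 0) "just-identified" else "under-identified"

  list(
    knowns = knowns,
    free_parameters = n_params,
    df = df,
    status = status,
    n_rule_of_thumb = 10 * n_params      # "10 cases per parameter" heuristic
  )
}

# Three correlated factors with three indicators each, as in the lavaan example
res <- cfa_identification(c(3, 3, 3))
str(res)
```

For the three-factor model this gives 45 knowns, 21 free parameters, and 24 degrees of freedom, which should match the degrees of freedom summary(fit) reports for the example model; the 10:1 heuristic would then suggest roughly 210 cases.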

Steps to Perform CFA

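  1. Model Specification: Based on theory, define the number of factors, which indicators load on each factor, and whether the factors may correlate.
  2. Model Identification: Ensure that the model has a unique solution (each factor scaled, positive degrees of freedom).
  3. Model Estimation: Use software (e.g., R, Mplus, LISREL) to estimate parameters, most often by maximum likelihood.
  4. Model Evaluation: Check model fit using indices such as the chi-square test, CFI, RMSEA, and SRMR.
  5. Model Modification (if needed): Use modification indices to locate sources of misfit, but free additional parameters only when theory justifies them, and ideally confirm the revised model in a new sample, because data-driven changes can capitalize on chance.
  6. Interpretation: Review factor loadings, factor correlations, and fit indices.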
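When a model is modified in step 5, the original and revised models are usually nested, so they can be compared with a chi-square difference (likelihood-ratio) test. The base-R sketch below assumes ML estimation with the ordinary, unscaled chi-square; the function name chisq_diff_test() and the input values are hypothetical. In lavaan the same comparison is available through lavTestLRT() (also called by anova() on two fitted models), which is designed to handle scaled statistics from robust estimators such as MLR, where the simple subtraction below is not valid. A modification index is essentially an approximation to this chi-square drop for freeing one fixed parameter (1 df).

```r
# Hypothetical helper: chi-square difference test for two nested CFA models,
# assuming ML estimation and unscaled (normal-theory) chi-square statistics.
chisq_diff_test <- function(chisq_restricted, df_restricted,
                            chisq_full, df_full) {
  d_chisq <- chisq_restricted - chisq_full  # fit lost by the extra constraints
  d_df <- df_restricted - df_full           # number of extra constraints
  stopifnot(d_df > 0)                       # restricted model must have more df
  p_value <- pchisq(d_chisq, df = d_df, lower.tail = FALSE)
  c(delta_chisq = d_chisq, delta_df = d_df, p_value = p_value)
}

# Hypothetical values: original model vs. a revision that frees one parameter
chisq_diff_test(chisq_restricted = 85.3, df_restricted = 24,
                chisq_full = 75.2, df_full = 23)
```

A small p-value says the freed parameter improves fit significantly; it does not by itself justify keeping the change without a theoretical rationale.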

CFA Example with R Code

Here’s how to perform CFA in R using the lavaan package:

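# Install once (uncomment if needed), then load the package
# install.packages("lavaan")
library(lavaan)

# Real (not simulated) example data shipped with lavaan: Holzinger and
# Swineford's (1939) mental ability test scores (x1-x9) for 301 children
hs_data <- lavaan::HolzingerSwineford1939

# Specify the CFA model: each latent factor (left of =~) is measured by
# the observed indicators listed on the right
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Fit the model (ML estimation by default)
fit <- cfa(model, data = hs_data)

# Summary output with fit measures and standardized estimates
summary(fit, fit.measures = TRUE, standardized = TRUE)

This code models three correlated latent variables: visual, textual, and speed. Each is measured by three observed test scores. By default, cfa() sets the scale of each factor by fixing its first loading to 1 and estimates the covariances among the factors.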

CFA Model Fit Indices Explained

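Fit index | Common threshold | Interpretation
Chi-square (χ²) | p > 0.05 | Non-significant result indicates good fit; very sensitive to sample size, so large samples often give p < 0.05 even for well-fitting models
CFI | ≥ 0.90 acceptable; ≥ 0.95 good | Comparative Fit Index; compares the model with a baseline (independence) model; closer to 1 is better
TLI | ≥ 0.90 acceptable; ≥ 0.95 good | Tucker-Lewis Index; similar to CFI but penalizes model complexity
RMSEA | < 0.06 good; up to 0.08 often acceptable | Root Mean Square Error of Approximation; lower is better
SRMR | < 0.08 | Standardized Root Mean Square Residual; average discrepancy between observed and model-implied correlations; lower is better

These thresholds are rules of thumb, not strict pass/fail tests. Always report several indices together and also inspect local fit (residuals and loadings) for a comprehensive evaluation of the model.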
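Three of the indices in the table can be computed directly from chi-square statistics, which helps show what they measure. The base-R sketch below uses the common textbook formulas; the function names are hypothetical, and the RMSEA uses the N − 1 convention, whereas some programs use N, so results can differ slightly. chisq_b and df_b refer to the baseline (independence) model. SRMR is omitted because it needs the residual correlation matrix, not just chi-square values.

```r
# Hypothetical helpers: fit indices from chi-square statistics.
# chisq_m, df_m: fitted model; chisq_b, df_b: baseline model; n: sample size.

# RMSEA: misfit per degree of freedom, adjusted for sample size (N - 1 convention)
rmsea <- function(chisq_m, df_m, n) {
  sqrt(max(chisq_m - df_m, 0) / (df_m * (n - 1)))
}

# CFI: relative reduction in noncentrality compared with the baseline model
cfi <- function(chisq_m, df_m, chisq_b, df_b) {
  num <- max(chisq_m - df_m, 0)
  den <- max(chisq_b - df_b, chisq_m - df_m, 0)
  if (den == 0) return(1)               # both models fit perfectly
  1 - num / den
}

# TLI: compares chi-square/df ratios; not bounded to [0, 1]
tli <- function(chisq_m, df_m, chisq_b, df_b) {
  ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1)
}

# Hypothetical chi-square values, for illustration only
rmsea(chisq_m = 85, df_m = 24, n = 300)
cfi(chisq_m = 85, df_m = 24, chisq_b = 900, df_b = 36)
tli(chisq_m = 85, df_m = 24, chisq_b = 900, df_b = 36)
```

Because TLI divides by degrees of freedom, adding parameters that barely reduce the chi-square can lower it, which is the complexity penalty mentioned in the table.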

Common Software Used in CFA

Biostatisticians use various tools for CFA:

  • R (lavaan, semPlot, psych) – Free, flexible, powerful.
  • SPSS AMOS – User-friendly, visual modeling.
  • Mplus – Advanced modeling for complex designs.
  • LISREL – Traditional, widely cited in SEM literature.
  • Stata – Built-in sem and gsem commands for SEM and CFA.

Each program has its strengths, depending on the complexity of your model and dataset.

Conclusion

Confirmatory Factor Analysis is an essential statistical tool in biostatistics for validating theoretical constructs and measurement instruments. Whether you are working with survey instruments, clinical scales, or gene expression profiles, CFA allows you to test whether your observed variables reliably reflect the underlying latent structures.

As biostatistical research grows in complexity, CFA, often used as the measurement component of Structural Equation Modeling (SEM), gives researchers a rigorous way to evaluate the validity and reliability of their measures. With accessible software like R and Stata, even researchers with modest statistical backgrounds can apply CFA to strengthen the scientific rigor of their studies.
