Confirmatory Factor Analysis (CFA) in Biostatistics: A Complete Guide

by Dr. Mohan Arthanari • May 03, 2025


What is Confirmatory Factor Analysis (CFA)?

Confirmatory Factor Analysis (CFA) is a statistical technique used to test whether the relationships between observed variables and their underlying latent constructs are consistent with a theoretical model specified in advance. Unlike Exploratory Factor Analysis (EFA), CFA is theory-driven: the researcher states beforehand how many factors there are and which observed variables load on each one. It is widely used to validate psychometric instruments in the biomedical and health sciences.

In biostatistics, CFA is used to assess measurement models, for example to check whether a set of physiological or psychological tests actually measures constructs such as stress, quality of life, or metabolic syndrome.
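Formally, a CFA measurement model writes the observed variables as x = Λξ + δ, where Λ holds the factor loadings, ξ the latent factors (with covariance matrix Φ), and δ the residuals (with covariance matrix Θ). The model therefore implies a covariance matrix Σ = ΛΦΛ' + Θ, and CFA asks whether this implied matrix reproduces the sample covariance matrix. The base-R sketch below builds Σ for a hypothetical two-factor, six-indicator model; the loadings, the factor correlation of 0.3, and the indicator names y1 to y6 are made-up illustration values, and unit-variance (standardized) indicators are assumed.

```r
# Hypothetical standardized loadings (illustration values, not estimates):
# y1-y3 load only on F1, y4-y6 only on F2; all cross-loadings fixed to zero
Lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.9, 0.5, 0.7),
                 nrow = 6, ncol = 2,
                 dimnames = list(paste0("y", 1:6), c("F1", "F2")))

# Factor covariance matrix: unit variances and an assumed correlation of 0.3
Phi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2, ncol = 2)

# Common (shared) part of the covariance matrix: Lambda Phi Lambda'
common <- Lambda %*% Phi %*% t(Lambda)

# Residual variances chosen so that every indicator has variance 1
Theta <- diag(1 - diag(common))

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta
Sigma <- common + Theta
round(Sigma, 3)
```

Two indicators of the same factor covary through their loadings alone (y1 and y2: 0.8 × 0.7 = 0.56), while indicators of different factors covary only through the factor correlation (y1 and y4: 0.8 × 0.3 × 0.9 = 0.216). Fixing cross-loadings to zero is exactly what makes these patterns testable.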


Difference Between EFA and CFA

Criteria	Exploratory Factor Analysis (EFA)	Confirmatory Factor Analysis (CFA)
Purpose	Explore latent structures	Confirm pre-specified models
Hypothesis	Not required	Required
Factor structure	Not known	Predefined
Rotation	Often applied	Not applicable (non-specified loadings are fixed to zero)
Common use	Early stages of scale development	Instrument validation
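To make the contrast concrete, the sketch below runs an EFA with base R's factanal() on the nine HolzingerSwineford1939 test scores that also appear in the lavaan example later in this article. In the EFA every variable receives a loading on every factor and the solution is rotated (promax here, an arbitrary choice for illustration), whereas the CFA syntax states in advance which tests load on which factor and fixes all other loadings to zero.

```r
library(lavaan)  # provides cfa() and the HolzingerSwineford1939 data

hs_items <- HolzingerSwineford1939[, paste0("x", 1:9)]

# EFA: three factors, every loading estimated, oblique (promax) rotation
efa_fit <- factanal(hs_items, factors = 3, rotation = "promax")
print(efa_fit$loadings, cutoff = 0.3)  # hide small loadings for readability

# CFA: the same three factors, but the structure is fixed in advance;
# any loading not written here (e.g. x1 on textual) is constrained to 0
cfa_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
cfa_fit <- cfa(cfa_model, data = HolzingerSwineford1939)
```

The EFA estimates 27 loadings (9 variables × 3 factors) before rotation, while the CFA contains only the 9 loadings written in the model string.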

Applications of CFA in Biostatistics

CFA is extensively used in the following areas:

Validating survey instruments: e.g., SF-36 for health-related quality of life.
Behavioral health studies: Testing latent constructs like anxiety, depression, or fatigue.
Nutritional research: Modeling dietary behavior factors.
Genomics and systems biology: Assessing clusters of co-expressed genes.
Epidemiological studies: Validating risk factor scores or composite indices.

Assumptions of CFA

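Standard CFA, estimated by maximum likelihood (ML), rests on the following assumptions:

Multivariate normality: The observed variables should jointly follow a multivariate normal distribution; normality of each variable on its own is necessary but not sufficient. When this fails, robust estimators (e.g., MLR) or, for ordinal items, WLSMV are commonly used.
Adequate sample size: A common rule of thumb is at least 10 cases per estimated parameter, although the sample actually needed depends on the size of the loadings, the number of indicators per factor, and model complexity.
Linearity: Relationships between latent and observed variables should be linear.
No severe multicollinearity: Indicators of the same factor are expected to correlate, but near-perfect correlations signal redundant items and can make estimation unstable.
Model identification: Each factor needs a defined scale (e.g., one loading fixed to 1), and the model should be over-identified, with more knowns (unique variances and covariances of the observed variables) than free parameters, so that its fit can be tested.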
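The identification and sample-size checks are simple counting exercises. The base-R sketch below assumes a simple-structure CFA: each indicator loads on exactly one factor, one loading per factor is fixed to 1 (marker-variable scaling), and all factors are allowed to correlate. The helper name cfa_counts() is hypothetical. Applied to the three-factor, nine-indicator HolzingerSwineford1939 model used in the R example in this article (N = 301), the degrees of freedom should match what lavaan reports for that model. Note that df ≥ 0 is necessary but not sufficient for identification.

```r
# Hypothetical helper: count knowns, free parameters, and degrees of freedom
# for a simple-structure CFA with marker-variable scaling and correlated factors
cfa_counts <- function(n_indicators, n_factors) {
  knowns   <- n_indicators * (n_indicators + 1) / 2  # unique variances + covariances
  loadings <- n_indicators - n_factors               # one loading per factor fixed to 1
  resid    <- n_indicators                           # residual variances
  fvar     <- n_factors                              # factor variances
  fcov     <- n_factors * (n_factors - 1) / 2        # factor covariances
  free     <- loadings + resid + fvar + fcov
  c(knowns = knowns, free = free, df = knowns - free)
}

counts <- cfa_counts(n_indicators = 9, n_factors = 3)
counts                        # knowns = 45, free = 21, df = 24 (over-identified)

# Rule-of-thumb sample-size check: 10 cases per free parameter
n_obs <- 301
n_obs / counts[["free"]]      # about 14.3 cases per free parameter
n_obs >= 10 * counts[["free"]]
```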

Steps to Perform CFA

Model Specification: Define the number of factors and loadings based on theory.
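Model Identification: Ensure that the model has a unique solution: every factor has a defined scale and the degrees of freedom are at least zero (positive if fit is to be tested).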
Model Estimation: Use software (e.g., R, Mplus, LISREL) to estimate parameters.
Model Evaluation: Check model fit using indices like RMSEA, CFI, and Chi-square.
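Model Modification (if needed): Use modification indices to locate sources of misfit, but free additional parameters only when theory justifies them; purely data-driven changes capitalize on chance and should be confirmed in a new sample.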
Interpretation: Review factor loadings, correlations, and fit indices.
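Steps 4 to 6 map onto a handful of lavaan functions. The sketch below refits the three-factor HolzingerSwineford1939 model used in this article and shows one way to pull out a focused set of fit indices, the largest modification indices, and the standardized loadings; the choice of indices and the limit of five modification indices are arbitrary illustration settings.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Step 4, evaluation: a focused set of fit indices
fit_idx <- fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
round(fit_idx, 3)

# Step 5, modification: the five largest modification indices.
# Treat these as hints only; free a parameter only if theory supports it.
mi <- modindices(fit, sort. = TRUE, maximum.number = 5)
mi[, c("lhs", "op", "rhs", "mi", "sepc.all")]

# Step 6, interpretation: standardized loadings with p-values
std <- standardizedSolution(fit)
std[std$op == "=~", c("lhs", "rhs", "est.std", "pvalue")]
```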

CFA Example with R Code

Here’s how to perform CFA in R using the lavaan package:

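# Install once (skip if lavaan is already installed), then load the package
install.packages("lavaan")
library(lavaan)

# Example data: HolzingerSwineford1939 ships with lavaan
# (mental ability test scores of 301 seventh- and eighth-grade children)
hs_data <- HolzingerSwineford1939

# Specify the CFA model: "=~" reads "is measured by"
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Fit the model (maximum likelihood by default)
fit <- cfa(model, data = hs_data)

# Summary output with fit measures and standardized estimates
summary(fit, fit.measures = TRUE, standardized = TRUE)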

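This code models three correlated latent variables (visual, textual, and speed), each measured by three observed test scores (x1 to x9). By default, cfa() sets each factor's scale by fixing its first loading to 1 and allows the factors to covary; standardized = TRUE adds standardized estimates to the summary.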
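Fixing the first loading is only one way to give each factor a scale. The sketch below refits the same model with std.lv = TRUE, which instead fixes each factor's variance to 1 and estimates all nine loadings freely. The two parameterizations are statistically equivalent, so the chi-square and degrees of freedom should agree up to numerical precision; only the unstandardized loadings and factor variances are reported on a different scale.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Default: first loading of each factor fixed to 1 (marker variable)
fit_marker <- cfa(model, data = HolzingerSwineford1939)

# Alternative: factor variances fixed to 1, all loadings estimated
fit_std <- cfa(model, data = HolzingerSwineford1939, std.lv = TRUE)

# Same fit, different scaling of the latent variables
fitMeasures(fit_marker, c("chisq", "df"))
fitMeasures(fit_std, c("chisq", "df"))

# Loadings under std.lv = TRUE (factors on a unit-variance scale)
pe <- parameterEstimates(fit_std)
pe[pe$op == "=~", c("lhs", "rhs", "est", "se")]
```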

CFA Model Fit Indices Explained

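Fit Index	Conventional Cutoff	Interpretation
Chi-square (χ²)	p > 0.05	Non-significant result indicates good fit; rejects even trivial misfit in large samples
CFI	> 0.95 (> 0.90 acceptable)	Comparative Fit Index; compares the model with a baseline (independence) model, closer to 1 is better
TLI	> 0.95 (> 0.90 acceptable)	Tucker-Lewis Index; similar to CFI but penalizes model complexity
RMSEA	< 0.06 (< 0.08 acceptable)	Root Mean Square Error of Approximation; misfit per degree of freedom, lower is better
SRMR	< 0.08	Standardized Root Mean Square Residual; average discrepancy between observed and model-implied correlations, lower is better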

No single index is decisive, and the chi-square test rejects even trivial misfit in large samples, so always report and weigh several indices together.
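RMSEA, CFI, and TLI are all simple functions of the model chi-square, the baseline (independence-model) chi-square, their degrees of freedom, and the sample size. The base-R sketch below uses common textbook formulas; the helper name fit_indices() is hypothetical, and the input values are the chi-square statistics that lavaan's summary prints for the three-factor HolzingerSwineford1939 model in this article. Software packages can differ slightly, for example in whether RMSEA uses N or N − 1.

```r
# Hypothetical helper: RMSEA, CFI and TLI from model and baseline chi-squares.
# chisq_m/df_m: tested model; chisq_b/df_b: baseline model; n: sample size.
fit_indices <- function(chisq_m, df_m, chisq_b, df_b, n) {
  excess_m <- max(chisq_m - df_m, 0)         # model misfit beyond its df
  excess_b <- max(chisq_b - df_b, excess_m)  # CFI denominator (baseline misfit)
  rmsea <- sqrt(excess_m / (df_m * (n - 1)))
  cfi   <- 1 - excess_m / excess_b
  tli   <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)
  c(rmsea = rmsea, cfi = cfi, tli = tli)
}

# Chi-square values from the lavaan summary of the three-factor model
idx <- fit_indices(chisq_m = 85.306, df_m = 24, chisq_b = 918.852, df_b = 36, n = 301)
round(idx, 3)

# Compare against cutoffs from the fit-index table (RMSEA < 0.06, CFI > 0.95, TLI > 0.90)
c(rmsea_ok = idx[["rmsea"]] < 0.06,
  cfi_ok   = idx[["cfi"]] > 0.95,
  tli_ok   = idx[["tli"]] > 0.90)
```

Working through the formulas shows why the indices can disagree: RMSEA and TLI reward parsimony through the degrees of freedom, while CFI measures improvement over the baseline model.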

Common Software Used in CFA

Biostatisticians use various tools for CFA:

R (lavaan for model fitting, semPlot for path diagrams, psych for EFA and reliability) – Free, flexible, powerful.
IBM SPSS Amos – User-friendly, graphical path-diagram modeling.
Mplus – Advanced modeling for complex designs.
LISREL – Traditional, widely cited in SEM literature.
Stata – Built-in sem and gsem commands for SEM and CFA.

Each package has its own strengths; the best choice depends on the complexity of your model and dataset.

Conclusion

Confirmatory Factor Analysis is an essential statistical tool in biostatistics for validating theoretical constructs and measurement instruments. Whether you are working with survey instruments, clinical scales, or gene expression profiles, CFA allows you to test whether your observed variables reliably reflect the underlying latent structures.

As biostatistical research grows in complexity, CFA, usually fitted within the Structural Equation Modeling (SEM) framework, gives researchers a rigorous way to evaluate the validity and reliability of their measurement instruments. With free software such as R and commercial packages such as Stata, even researchers with modest statistical backgrounds can apply CFA to strengthen the scientific rigor of their studies.

Tags: Bio Statistics
