Ordinal Logistic Regression (OLR) is a statistical technique used when the dependent variable is ordinal, meaning it has categories with a meaningful order but the intervals between the categories are not necessarily equal. OLR models the relationship between one or more independent variables (predictors) and an ordinal outcome variable.
Key Characteristics of Ordinal Logistic Regression
- Ordinal Dependent Variable: Examples include Blood Pressure responses (e.g., Normal, Pre-Hypertension, and Hypertension).
- Independent Variables: These can be continuous, categorical, or a mix of both.
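For reference, the model behind the polr() function used below can be written as a cumulative logit model. The notation here is an illustrative choice ($J$ ordered categories, thresholds $\zeta_j$, coefficient vector $\beta$) and follows the parameterization described in the MASS documentation:

```latex
\operatorname{logit} P(Y \le j \mid x)
  = \log \frac{P(Y \le j \mid x)}{P(Y > j \mid x)}
  = \zeta_j - \beta^{\top} x,
  \qquad j = 1, \dots, J - 1,
  \qquad \zeta_1 < \zeta_2 < \dots < \zeta_{J-1}
```

The same $\beta$ appears for every threshold $j$, which is the proportional odds assumption. Because of the minus sign, a positive coefficient shifts probability toward the higher categories.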
Ordinal Logistic Regression in R
1. Load the necessary libraries:
# MASS ships with standard R installations; install it only if it is missing
# install.packages("MASS")
library(MASS)
2. Prepare your data:
For this example, let's assume the dataset is named patient_data. You'll need to ensure the 'Blood_Pressure' column is treated as an ordered factor.
# Sample dataset
patient_data <- data.frame(
Patient_ID = 1:30,
Age = c(60, 45, 50, 70, 65, 40, 37, 55, 30, 43, 58, 63, 35, 43, 53, 67, 41, 72, 54, 48, 51, 32, 62, 49, 51, 41, 68, 59, 45, 65),
Gender = factor(c('F', 'F', 'M', 'F', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M', 'M', 'M', 'F', 'F', 'M', 'M', 'M', 'F', 'M', 'F')),
Blood_Pressure = factor(c('Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension', 'Normal', 'Pre-Hypertension', 'Hypertension'),
levels = c('Normal', 'Pre-Hypertension', 'Hypertension'), ordered = TRUE)
)
# Convert 'Blood_Pressure' to an ordered factor
# (already done in the sample data above; needed when the column is read in as plain text)
patient_data$Blood_Pressure <- factor(patient_data$Blood_Pressure,
levels = c('Normal', 'Pre-Hypertension', 'Hypertension'),
ordered = TRUE)
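A quick check that the ordering is what you intend can save trouble later, because factor() sorts levels alphabetically unless told otherwise. The sketch below uses a small hypothetical vector (raw_bp is invented for illustration) rather than the tutorial data:

```r
# Hypothetical raw labels, as they might arrive from a CSV file
raw_bp <- c("Normal", "Hypertension", "Pre-Hypertension", "Normal")

# Without explicit levels, the order is alphabetical and therefore wrong here
levels(factor(raw_bp))

# With explicit levels and ordered = TRUE, the clinical order is preserved
bp <- factor(raw_bp,
             levels = c("Normal", "Pre-Hypertension", "Hypertension"),
             ordered = TRUE)

is.ordered(bp)   # confirms the factor is ordered
levels(bp)       # lowest to highest category
bp[2] > bp[3]    # comparisons respect the order: Hypertension > Pre-Hypertension
table(bp)        # counts per category, in level order
```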
3. Fit the Ordinal Logistic Regression Model:
Now you can fit the model using the polr() function from the MASS package. Setting Hess = TRUE stores the Hessian so that summary() and vcov() can compute standard errors.
# Fit the ordinal logistic regression model
model <- polr(Blood_Pressure ~ Age + Gender, data = patient_data, Hess = TRUE)
# Display the summary of the model
summary(model)
4. Interpret the results:
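To interpret the coefficients, check the output of summary(model). It lists the coefficient for each predictor along with the intercepts, which are the thresholds between adjacent categories (Normal|Pre-Hypertension and Pre-Hypertension|Hypertension). The coefficients are on the log-odds scale: a positive coefficient means that higher values of the predictor are associated with higher odds of being in a higher blood pressure category, and exponentiating a coefficient gives the corresponding odds ratio.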
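Since the coefficients are log-odds, exponentiating them gives odds ratios, which are usually easier to read. The sketch below is self-contained and uses simulated data (the sim data frame, its sample size, and the effect sizes are invented for illustration); the same two calls can be applied to the model fitted above. The intervals are Wald intervals from confint.default(), an approximation chosen here for simplicity.

```r
library(MASS)

# Simulated data for illustration only; all values are assumptions
set.seed(42)
n <- 200
sim <- data.frame(Age = round(runif(n, 30, 75)),
                  Gender = factor(sample(c("F", "M"), n, replace = TRUE)))
latent <- 0.08 * sim$Age + 0.5 * (sim$Gender == "M") + rlogis(n)
sim$Blood_Pressure <- cut(latent, breaks = c(-Inf, 4, 5.5, Inf),
                          labels = c("Normal", "Pre-Hypertension", "Hypertension"),
                          ordered_result = TRUE)

fit <- polr(Blood_Pressure ~ Age + Gender, data = sim, Hess = TRUE)

# Odds ratios: multiplicative change in the odds of a higher category
odds_ratios <- exp(coef(fit))

# Approximate 95% Wald confidence intervals, also on the odds-ratio scale
wald_ci <- exp(confint.default(fit))

round(cbind(OR = odds_ratios, wald_ci), 3)
```

An odds ratio above 1 for Age would mean that each additional year multiplies the odds of being in a higher category by that factor, holding Gender fixed.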
5. Check model significance:
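The summary() output for a polr model reports t values but no p-values. You can compute approximate (Wald) p-values from the coefficient table returned by coef() and summary():
# Extract the coefficient table (estimates, standard errors, t values)
ctable <- coef(summary(model))
# Two-sided p-values from the normal approximation
p_values <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
# Show the table with the p-values appended
cbind(ctable, "p value" = p_values)
This gives a p-value for each predictor and for each threshold (intercept). A p-value below 0.05 is conventionally treated as statistically significant; with only 30 observations, the normal approximation should be read with caution.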
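To test the model as a whole rather than one predictor at a time, a common approach is a likelihood-ratio test against an intercept-only model. The sketch below uses simulated data (the sim data frame and its effect sizes are invented for illustration) and computes the test by hand from logLik():

```r
library(MASS)

# Simulated data for illustration only; all values are assumptions
set.seed(42)
n <- 200
sim <- data.frame(Age = round(runif(n, 30, 75)),
                  Gender = factor(sample(c("F", "M"), n, replace = TRUE)))
latent <- 0.08 * sim$Age + 0.5 * (sim$Gender == "M") + rlogis(n)
sim$Blood_Pressure <- cut(latent, breaks = c(-Inf, 4, 5.5, Inf),
                          labels = c("Normal", "Pre-Hypertension", "Hypertension"),
                          ordered_result = TRUE)

# Full model and intercept-only (null) model
fit      <- polr(Blood_Pressure ~ Age + Gender, data = sim, Hess = TRUE)
null_fit <- polr(Blood_Pressure ~ 1, data = sim)

# Likelihood-ratio statistic: twice the gain in log-likelihood
lr_stat <- as.numeric(2 * (logLik(fit) - logLik(null_fit)))

# Degrees of freedom: number of extra parameters in the full model
df_diff <- attr(logLik(fit), "df") - attr(logLik(null_fit), "df")

# p-value from the chi-squared reference distribution
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(LR = lr_stat, df = df_diff, p = p_value)
```

A small p-value suggests that Age and Gender together improve the fit over a model with thresholds only.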
There are several ways to visualize the results of an ordinal logistic regression in R. Common options include:
1. Predicted Probabilities Plot
You can plot the predicted probability of each level of the outcome variable (here, "Blood_Pressure") across values of a predictor (e.g., Age or Gender). Here's how to create such a plot:
# Create new data for prediction
new_data <- expand.grid(Age = seq(30, 80, by = 1),
Gender = factor(c("F", "M"), levels = c("F", "M")))
# Predict the probabilities
pred_probs <- predict(model, new_data, type = "probs")
# Plot the probabilities for each blood pressure category
library(ggplot2)
# Combine the prediction grid with the predicted probabilities
# (pred_probs has one row per row of new_data, so nothing needs to be repeated)
plot_data <- data.frame(Age = new_data$Age,
Gender = new_data$Gender,
Normal = pred_probs[, 1],
Pre_Hypertension = pred_probs[, 2],
Hypertension = pred_probs[, 3])
# Plot (linewidth replaces size for lines in ggplot2 >= 3.4.0)
ggplot(plot_data, aes(x = Age)) +
geom_line(aes(y = Normal, color = "Normal"), linewidth = 1) +
geom_line(aes(y = Pre_Hypertension, color = "Pre-Hypertension"), linewidth = 1) +
geom_line(aes(y = Hypertension, color = "Hypertension"), linewidth = 1) +
facet_wrap(~ Gender) +
labs(title = "Predicted Probabilities of Blood Pressure Categories",
x = "Age", y = "Predicted Probability") +
scale_color_manual(values = c("Normal" = "blue", "Pre-Hypertension" = "orange", "Hypertension" = "red")) +
theme_minimal()
This plot shows the predicted probability of each blood pressure category (Normal, Pre-Hypertension, and Hypertension) across values of Age, with a separate panel for each gender.
2. Coefficient Plot
To visualize the relationship between the predictors and the outcome, you can create a plot of the model coefficients.
# Build a data frame of coefficient estimates
coef_data <- data.frame(Variable = names(coef(model)),
Estimate = unname(coef(model)))
# Plot the coefficients
ggplot(coef_data, aes(x = Variable, y = Estimate)) +
geom_bar(stat = "identity", fill = "lightblue") +
coord_flip() +
labs(title = "Model Coefficients for Ordinal Logistic Regression",
x = "Variable", y = "Coefficient Estimate") +
theme_minimal()
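This bar chart shows the estimated coefficient (on the log-odds scale) for each predictor in the ordinal logistic regression model. Because Age is measured in years and Gender is a binary indicator, the bar lengths are not directly comparable.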
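A coefficient plot is often more informative when it also shows uncertainty. The sketch below draws each estimate with an approximate 95% Wald interval (estimate ± 1.96 standard errors). It uses simulated data (the sim data frame and its effect sizes are invented for illustration); the plotting code can be pointed at the model fitted above instead.

```r
library(MASS)
library(ggplot2)

# Simulated data for illustration only; all values are assumptions
set.seed(42)
n <- 200
sim <- data.frame(Age = round(runif(n, 30, 75)),
                  Gender = factor(sample(c("F", "M"), n, replace = TRUE)))
latent <- 0.08 * sim$Age + 0.5 * (sim$Gender == "M") + rlogis(n)
sim$Blood_Pressure <- cut(latent, breaks = c(-Inf, 4, 5.5, Inf),
                          labels = c("Normal", "Pre-Hypertension", "Hypertension"),
                          ordered_result = TRUE)
fit <- polr(Blood_Pressure ~ Age + Gender, data = sim, Hess = TRUE)

# Keep only the predictor rows of the coefficient table (drop the thresholds)
ctable <- coef(summary(fit))
slopes <- ctable[names(coef(fit)), , drop = FALSE]

coef_df <- data.frame(
  Variable = rownames(slopes),
  Estimate = slopes[, "Value"],
  Lower    = slopes[, "Value"] - 1.96 * slopes[, "Std. Error"],
  Upper    = slopes[, "Value"] + 1.96 * slopes[, "Std. Error"]
)

# Point estimates with interval bars; the dashed line marks "no effect"
p <- ggplot(coef_df, aes(x = Variable, y = Estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2) +
  geom_point(size = 3) +
  coord_flip() +
  labs(title = "Coefficients with Approximate 95% Intervals",
       x = "Variable", y = "Coefficient Estimate (log-odds)") +
  theme_minimal()
print(p)
```

An interval that crosses the dashed line at zero indicates that the data are compatible with no effect for that predictor.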
3. Predicted vs. Observed Comparison
This comparison sets the observed categories against the categories the model predicts, so you can check how well the model recovers the actual categories.
# Predict the categories
predicted_classes <- predict(model, type = "class")
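# Create confusion matrix (caret expects predictions in rows, observed values in columns)
conf_matrix <- table(Predicted = predicted_classes, Observed = patient_data$Blood_Pressure)
# Summarize the confusion matrix (requires the caret package)
library(caret)
confusionMatrix(conf_matrix)
This provides a table of predicted vs. observed values, and confusionMatrix() adds details such as accuracy and per-class sensitivity and specificity. Because the predictions are made on the same data used to fit the model, these figures tend to be optimistic.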
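Overall accuracy can also be read directly off the confusion table without caret, and it is worth comparing against the share of the most common category. A minimal sketch on simulated data (the sim data frame and its effect sizes are invented for illustration):

```r
library(MASS)

# Simulated data for illustration only; all values are assumptions
set.seed(42)
n <- 200
sim <- data.frame(Age = round(runif(n, 30, 75)),
                  Gender = factor(sample(c("F", "M"), n, replace = TRUE)))
latent <- 0.08 * sim$Age + 0.5 * (sim$Gender == "M") + rlogis(n)
sim$Blood_Pressure <- cut(latent, breaks = c(-Inf, 4, 5.5, Inf),
                          labels = c("Normal", "Pre-Hypertension", "Hypertension"),
                          ordered_result = TRUE)
fit <- polr(Blood_Pressure ~ Age + Gender, data = sim, Hess = TRUE)

# Most likely category for each observation in the fitting data
pred <- predict(fit, type = "class")

# Rows are predictions, columns are observed categories
conf <- table(Predicted = pred, Observed = sim$Blood_Pressure)

# Accuracy: share of observations on the diagonal
accuracy <- sum(diag(conf)) / sum(conf)

# Baseline: accuracy of always predicting the most common category
baseline <- max(table(sim$Blood_Pressure)) / nrow(sim)

conf
c(accuracy = accuracy, baseline = baseline)
```

If the accuracy is not clearly above the baseline, the predictors add little to classification even when their coefficients are statistically significant.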
4. Plot of Predicted Probabilities for Specific Groups
You can also plot the predicted probabilities for specific groups in your data (e.g., females vs males or different age groups).
# Predict probabilities for females only
female_data <- subset(patient_data, Gender == "F")
pred_female <- predict(model, female_data, type = "probs")
# Combine predicted probabilities with the original data
female_data$Normal_Prob <- pred_female[,1]
female_data$Pre_Hypertension_Prob <- pred_female[,2]
female_data$Hypertension_Prob <- pred_female[,3]
# Plot probabilities (linewidth replaces size for lines in ggplot2 >= 3.4.0)
ggplot(female_data, aes(x = Age)) +
geom_line(aes(y = Normal_Prob, color = "Normal"), linewidth = 1) +
geom_line(aes(y = Pre_Hypertension_Prob, color = "Pre-Hypertension"), linewidth = 1) +
geom_line(aes(y = Hypertension_Prob, color = "Hypertension"), linewidth = 1) +
labs(title = "Predicted Probabilities for Females",
x = "Age", y = "Predicted Probability") +
scale_color_manual(values = c("Normal" = "blue", "Pre-Hypertension" = "orange", "Hypertension" = "red")) +
theme_minimal()
5. Effect of a Predictor (Age) on Blood Pressure Categories
To visualize how age affects the probability of being in each blood pressure category, you can plot the effect of age using the effects package.
# Install the 'effects' package if it is not already installed
# install.packages("effects")
library(effects)
# Plot the effect of Age on Blood Pressure
effect_plot <- effect("Age", model)
plot(effect_plot, main = "Effect of Age on Blood Pressure Categories")
This plots how the predicted probability of each blood pressure category changes with age.
Conclusion:
Ordinal Logistic Regression is suited to outcomes whose categories are ordered but not necessarily evenly spaced. This guide showed how to fit the model in R with polr(), interpret the coefficients, compute p-values, and visualize the results with predicted probability plots, coefficient charts, and observed vs. predicted comparisons. The same workflow applies to blood pressure levels or to any other ordinal outcome.