Introduction to Point-Biserial Correlation
In biological and environmental sciences, we often need to measure the relationship between a binary categorical variable (e.g., presence/absence of contamination) and a continuous variable (e.g., fish growth rate).
Point-biserial correlation is a special case of the Pearson correlation used when one variable is dichotomous (binary: 0 or 1) and the other is continuous.
When to Use Point-Biserial Correlation?
- Measuring the association between pollution level (High/Low) and plant growth.
- Analyzing whether disease status (Present/Absent) is related to blood pressure levels.
- Checking if gene mutation (Yes/No) correlates with enzyme activity.
In this tutorial, we will simulate a dataset in R, perform point-biserial correlation, visualize the results with graphs, and interpret the findings.
Step 1: Understanding the Point-Biserial Correlation Formula
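The point-biserial correlation coefficient ($r_{pb}$) is calculated as:
$$r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}}$$
Where:
- $M_1$, $M_0$ = Mean of the continuous variable for each group (coded 1 and 0).
- $s_n$ = Standard deviation of the continuous variable (computed over all observations, with $n$ in the denominator).
- $n_1$, $n_0$ = Sample size for each group.
- $n$ = Total sample size.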
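To make the formula concrete, here is a minimal sketch in R that computes the coefficient by hand and compares it with the built-in Pearson correlation. It assumes the common form r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the standard deviation with n (not n - 1) in the denominator. The toy values and the variable names `group` and `growth` are made up for illustration.

```r
# Toy data (hypothetical values, for illustration only)
group  <- c(0, 0, 0, 0, 1, 1, 1, 1, 1)                      # binary variable
growth <- c(5.1, 4.8, 5.6, 5.0, 4.1, 3.9, 4.4, 4.0, 4.6)    # continuous variable

n  <- length(growth)             # total sample size
n1 <- sum(group == 1)            # size of group 1
n0 <- sum(group == 0)            # size of group 0
M1 <- mean(growth[group == 1])   # mean of group 1
M0 <- mean(growth[group == 0])   # mean of group 0

# Standard deviation with n in the denominator
# (note that sd() divides by n - 1, so it is not used here)
s_n <- sqrt(mean((growth - mean(growth))^2))

# Point-biserial correlation from the formula
r_pb_manual <- (M1 - M0) / s_n * sqrt(n1 * n0 / n^2)

# Pearson correlation on the same data
r_pb_pearson <- cor(growth, group)

print(c(manual = r_pb_manual, pearson = r_pb_pearson))
```

The two values should agree up to floating-point error, which is what "special case of the Pearson correlation" means in practice.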
Now, let’s generate a dataset and compute point-biserial correlation in R.
Step 2: Generate a Simulated Dataset in R
We'll create a dataset where:
- Contamination (binary: Low = 0, High = 1) is the grouping variable.
- Fish Growth Rate (continuous: cm/month) is the outcome that contamination affects.
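One possible way to simulate such a dataset is sketched below. The column names `Contamination` and `GrowthRate` match the code used later in this tutorial, but the seed, group size, means, and standard deviation are assumptions chosen for illustration, not the values behind the original figures.

```r
set.seed(123)  # assumed seed, only so the sketch is reproducible

n_per_group <- 50  # assumed number of fish per contamination level

# Contamination coded 0 (Low) and 1 (High)
fish_data <- data.frame(
  Contamination = rep(c(0, 1), each = n_per_group)
)

# Assumed effect: mean growth of 5 cm/month at low contamination
# and 4 cm/month at high contamination, with SD 0.8 in both groups
fish_data$GrowthRate <- rnorm(
  n    = nrow(fish_data),
  mean = ifelse(fish_data$Contamination == 1, 4, 5),
  sd   = 0.8
)

print(head(fish_data))
print(table(fish_data$Contamination))
```

Storing the binary variable as numeric 0/1 keeps it directly usable in `cor.test()`, while the formula interface of `t.test()` still treats it as a two-level grouping variable.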
Step 3: Compute Point-Biserial Correlation in R
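Because point-biserial correlation is a special case of Pearson correlation, we can calculate it with cor.test() using method = "pearson" and the binary variable coded numerically.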
# Compute Point-Biserial Correlation
cor_test <- cor.test(fish_data$GrowthRate, as.numeric(fish_data$Contamination), method = "pearson")
# Print correlation results
print(cor_test)
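The object returned by `cor.test()` is a list, so the individual results can be pulled out for reporting. The sketch below shows one way to do that. The data are regenerated with assumed simulation parameters (seed, means, SD) so the block runs on its own, and the summary line format is only a suggestion.

```r
# Stand-in for fish_data, simulated with assumed parameters
set.seed(123)
fish_data <- data.frame(Contamination = rep(c(0, 1), each = 50))
fish_data$GrowthRate <- rnorm(
  100,
  mean = ifelse(fish_data$Contamination == 1, 4, 5),
  sd   = 0.8
)

cor_test <- cor.test(fish_data$GrowthRate, fish_data$Contamination,
                     method = "pearson")

r_pb    <- unname(cor_test$estimate)  # point-biserial correlation coefficient
p_value <- cor_test$p.value           # two-sided p-value for H0: correlation = 0
ci      <- cor_test$conf.int          # 95% confidence interval for the correlation
r_sq    <- r_pb^2                     # share of growth-rate variance associated with group

cat(sprintf("r_pb = %.3f, 95%% CI [%.3f, %.3f], p = %.3g, r^2 = %.3f\n",
            r_pb, ci[1], ci[2], p_value, r_sq))
```

With High coded as 1, a negative coefficient means growth rate is lower in the high-contamination group.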
Step 4: Visualizing the Results
Boxplot: Fish Growth vs. Contamination Level
Figure: Fish Growth vs. Contamination Level
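A boxplot of this kind could be drawn with base R graphics as sketched below. This is not necessarily how the original figure was produced: the colours, the axis labels, and the helper column `Level` are assumptions, and the data are regenerated with assumed simulation parameters so the block runs on its own.

```r
# Stand-in for fish_data, simulated with assumed parameters
set.seed(123)
fish_data <- data.frame(Contamination = rep(c(0, 1), each = 50))
fish_data$GrowthRate <- rnorm(
  100,
  mean = ifelse(fish_data$Contamination == 1, 4, 5),
  sd   = 0.8
)

# Label the 0/1 codes so the x-axis is readable (hypothetical helper column)
fish_data$Level <- factor(fish_data$Contamination,
                          levels = c(0, 1),
                          labels = c("Low", "High"))

# boxplot() returns the plotted statistics invisibly; keep them for inspection
bp <- boxplot(GrowthRate ~ Level, data = fish_data,
              col  = c("lightblue", "salmon"),
              xlab = "Contamination Level",
              ylab = "Growth Rate (cm/month)",
              main = "Fish Growth vs. Contamination Level")
```

The gap between the two medians gives a quick visual check on the sign of the correlation before looking at any test output.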
Scatter Plot with Regression Line
Figure: Scatter Plot with Regression Line
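A scatter plot with a fitted line could be sketched in base R as follows. The jitter amount, colours, and labels are assumptions, as are the simulation parameters used to regenerate the data. With a 0/1 predictor, the slope of the least-squares line equals the difference between the two group means, which the last lines check.

```r
# Stand-in for fish_data, simulated with assumed parameters
set.seed(123)
fish_data <- data.frame(Contamination = rep(c(0, 1), each = 50))
fish_data$GrowthRate <- rnorm(
  100,
  mean = ifelse(fish_data$Contamination == 1, 4, 5),
  sd   = 0.8
)

# Jitter the binary variable slightly so overlapping points are visible
plot(jitter(fish_data$Contamination, amount = 0.05), fish_data$GrowthRate,
     xaxt = "n", pch = 16, col = "steelblue",
     xlab = "Contamination Level",
     ylab = "Growth Rate (cm/month)",
     main = "Scatter Plot with Regression Line")
axis(1, at = c(0, 1), labels = c("Low", "High"))

# Least-squares line through the two groups
fit <- lm(GrowthRate ~ Contamination, data = fish_data)
abline(fit, col = "red", lwd = 2)

# The slope is the High-minus-Low difference in mean growth rate
slope     <- unname(coef(fit)["Contamination"])
mean_diff <- mean(fish_data$GrowthRate[fish_data$Contamination == 1]) -
             mean(fish_data$GrowthRate[fish_data$Contamination == 0])
print(c(slope = slope, mean_diff = mean_diff))
```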
Step 5: Additional Statistical Analysis (T-Test)
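We can also perform an independent t-test to check whether growth rate differs significantly between contamination levels. Note that t.test() runs Welch's unequal-variance test by default, so its p-value can differ slightly from the cor.test() result; with var.equal = TRUE it performs the pooled-variance test, which is equivalent to the significance test of the point-biserial correlation.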
# Perform Independent T-Test
t_test <- t.test(GrowthRate ~ Contamination, data = fish_data)
# Print t-test results
print(t_test)
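The link between the t-test and the point-biserial correlation can be checked numerically. The sketch below relies on the standard identity t = r * sqrt(n - 2) / sqrt(1 - r^2) and compares `cor.test()` with the pooled-variance t-test (`var.equal = TRUE`). The data are regenerated with assumed simulation parameters so the block runs on its own.

```r
# Stand-in for fish_data, simulated with assumed parameters
set.seed(123)
fish_data <- data.frame(Contamination = rep(c(0, 1), each = 50))
fish_data$GrowthRate <- rnorm(
  100,
  mean = ifelse(fish_data$Contamination == 1, 4, 5),
  sd   = 0.8
)

ct        <- cor.test(fish_data$GrowthRate, fish_data$Contamination)
tt_pooled <- t.test(GrowthRate ~ Contamination, data = fish_data,
                    var.equal = TRUE)   # Student's (pooled-variance) t-test
tt_welch  <- t.test(GrowthRate ~ Contamination, data = fish_data)  # Welch (default)

# Convert the correlation into a t statistic with n - 2 degrees of freedom
r        <- unname(ct$estimate)
n        <- nrow(fish_data)
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# The signs differ because t.test() computes mean(group 0) - mean(group 1)
print(c(t_from_r = t_from_r,
        t_pooled = unname(tt_pooled$statistic),
        t_welch  = unname(tt_welch$statistic)))
print(c(p_cor    = ct$p.value,
        p_pooled = tt_pooled$p.value,
        p_welch  = tt_welch$p.value))
```

The correlation test and the pooled t-test give the same p-value, while the Welch version may differ slightly because it does not assume equal variances.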
Conclusion
In this article, we explored point-biserial correlation in R, a statistical method for measuring the strength and direction of the relationship between a binary categorical variable (e.g., contamination level) and a continuous variable (e.g., fish growth rate).
Through data simulation, correlation analysis, visualization, and hypothesis testing, we saw how to quantify and test the association between contamination level and growth rate in a simulated dataset. Key takeaways include:
✅ Point-biserial correlation quantifies the strength and direction of the association between a binary and a continuous variable.
✅ Graphical representations (boxplots and scatter plots) help visualize the trend.
✅ T-tests assess whether the difference in group means is statistically significant.
This method is widely applicable in biostatistics, ecology, health sciences, and genetics, making it a valuable tool for researchers analyzing binary vs. continuous relationships.