Introduction: Understanding Correlation and Its Importance in Data Analysis
In statistics, understanding relationships between variables is crucial for data interpretation, decision-making, and predictive modeling. One of the most common and powerful tools for this is the correlation coefficient (r), which measures the strength and direction of a linear relationship between two continuous variables.
This tutorial walks through calculating the Pearson correlation coefficient in RStudio, using a practical example of height and weight data, and visualizing the relationship with a scatter plot and a regression line.
By the end of this post, you’ll be able to:
- Calculate correlation in R
- Interpret correlation results
- Create a scatter plot in R
- Add a regression line and annotate your plot
- Understand how to use visualizations in exploratory data analysis (EDA)
 
What Is the Correlation Coefficient (r)?
Definition of Pearson’s Correlation Coefficient
The Pearson correlation coefficient (r) is a measure of the linear association between two continuous variables. It ranges from -1 to +1:
- r = +1: Perfect positive linear relationship
- r = -1: Perfect negative linear relationship
- r = 0: No linear relationship
 
Formula for Pearson’s r
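For n paired observations (x_i, y_i) with sample means x̄ and ȳ, the standard textbook definition of Pearson's r can be written as follows. In this tutorial x would be height and y would be weight; the symbols are the usual convention, not notation taken from the post.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two variables vary together, and the denominator rescales it by each variable's own spread, which is why r always stays between -1 and +1.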
Dataset Used for Correlation Analysis
Here, we’ll use a sample dataset representing the height (cm) and weight (kg) of 15 individuals.
Step-by-Step R Code to Calculate Correlation and Create Scatter Plot
Step 1 – Define the Data in R
height <- c(150, 152, 155, 158, 160, 162, 165, 168, 170, 172, 175, 178, 180, 183, 185)
weight <- c(48, 50, 52, 54, 56, 58, 60, 62, 64, 65, 68, 70, 72, 75, 78)
Here we are creating two numeric vectors: height and weight.
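Before computing anything, it can help to confirm that the two vectors really are paired. The following is a minimal sketch using only base R; the data frame name `hw` is an illustrative choice and does not appear elsewhere in this tutorial.

```r
height <- c(150, 152, 155, 158, 160, 162, 165, 168, 170, 172, 175, 178, 180, 183, 185)
weight <- c(48, 50, 52, 54, 56, 58, 60, 62, 64, 65, 68, 70, 72, 75, 78)

# Pearson's r needs paired observations, so both vectors must have the same length
stopifnot(length(height) == length(weight))

# Bundle the pairs into a data frame (hypothetical name `hw`) for a quick look
hw <- data.frame(height = height, weight = weight)
str(hw)      # column types and number of rows
summary(hw)  # min, quartiles, mean, and max of each column
```

Keeping the pairs in one data frame makes it harder to reorder one variable without the other by accident.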
Step 2 – Calculate Pearson’s Correlation Coefficient
cor.test(height, weight, method = "pearson")
This line performs a Pearson correlation test, which not only calculates the value of r but also gives statistical significance (p-value and confidence interval).
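You can extract the r-value and round it for display. Three decimal places are used here because a coefficient this close to 1 would otherwise be rounded to exactly 1:
cor_result <- cor.test(height, weight, method = "pearson")
r_value <- round(cor_result$estimate, 3)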
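The object returned by cor.test() is a list, so the other quantities in the printout can be pulled out the same way as the estimate. A short sketch follows; the variable names on the left-hand side are illustrative.

```r
height <- c(150, 152, 155, 158, 160, 162, 165, 168, 170, 172, 175, 178, 180, 183, 185)
weight <- c(48, 50, 52, 54, 56, 58, 60, 62, 64, 65, 68, 70, 72, 75, 78)

cor_result <- cor.test(height, weight, method = "pearson")

r_est    <- unname(cor_result$estimate)    # correlation coefficient r
t_stat   <- unname(cor_result$statistic)   # t statistic of the test
deg_free <- unname(cor_result$parameter)   # degrees of freedom (n - 2)
p_val    <- cor_result$p.value             # two-sided p-value
ci       <- as.numeric(cor_result$conf.int) # lower and upper confidence bounds

cat("r =", r_est, " t =", t_stat, " df =", deg_free, "\n")
cat("p-value =", p_val, "\n")
cat("95% CI: [", ci[1], ",", ci[2], "]\n")
```

Working with these components directly is handy when the values need to go into a report or a plot label rather than being read off the console.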
Sample Output
Pearson's product-moment correlation
data:  height and weight
t = 35.67, df = 13, p-value = 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9871 0.9988
sample estimates:
      cor 
0.99588
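In this sample output the correlation is very strong and positive (r ≈ 0.996). Treat the listing as illustrative: computed directly from the two vectors in Step 1, r comes out nearer 0.998, so the figures in your console will differ somewhat from those shown.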
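As a cross-check on the estimate, r can also be computed by hand from deviations about the mean and compared with the built-in cor() function. This is a sketch of the textbook formula, not a replacement for cor.test(); the names `dx`, `dy`, `r_manual`, and `r_builtin` are illustrative.

```r
height <- c(150, 152, 155, 158, 160, 162, 165, 168, 170, 172, 175, 178, 180, 183, 185)
weight <- c(48, 50, 52, 54, 56, 58, 60, 62, 64, 65, 68, 70, 72, 75, 78)

# Deviations of each observation from its variable's mean
dx <- height - mean(height)
dy <- weight - mean(weight)

# Sum of cross-products divided by the product of the root sums of squares
r_manual  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_builtin <- cor(height, weight)  # Pearson is the default method

print(r_manual)
print(all.equal(r_manual, r_builtin))  # TRUE when both agree within tolerance
```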
Step 3 – Create a Basic Scatter Plot
plot(height, weight, 
     main = "Height vs Weight", 
     xlab = "Height (cm)", 
     ylab = "Weight (kg)", 
     pch = 19, 
     col = "blue")
This code generates a simple scatter plot with blue circular points.
Step 4 – Add Regression Line
abline(lm(weight ~ height), col = "red")
This line adds a linear regression line in red, showing the trend of the data.
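The line drawn by abline() comes from a fitted model, and that model can be inspected on its own. The sketch below stores the fit in a variable (the name `fit` is an assumption for illustration) and checks a useful identity: in a simple regression with one predictor, R-squared equals the square of Pearson's r.

```r
height <- c(150, 152, 155, 158, 160, 162, 165, 168, 170, 172, 175, 178, 180, 183, 185)
weight <- c(48, 50, 52, 54, 56, 58, 60, 62, 64, 65, 68, 70, 72, 75, 78)

fit <- lm(weight ~ height)

# Named vector with the intercept and the slope for height
print(coef(fit))
slope <- unname(coef(fit)["height"])  # estimated kg change per extra cm

# With a single predictor, R-squared is the square of the correlation
r_squared <- summary(fit)$r.squared
print(all.equal(r_squared, cor(height, weight)^2))
```

Passing `fit` to abline() draws the same line as the one-liner in Step 4.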
Step 5 – Annotate the Plot with the r Value
text(x = min(height) + 5, y = max(weight) - 2,
     labels = paste("r =", r_value), 
     col = "darkgreen", cex = 1.2, font = 2)
This line places the correlation coefficient (r) on the plot.
Figure: scatter plot of height vs weight with the fitted regression line and the r annotation (correlation-coefficient-scatter-plot-r-studio)
Complete Code Block in RStudio
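The script below is a sketch assembled from Steps 1 to 5 so that it can be run in one go. The r value is rounded to three decimals so that a coefficient this close to 1 is not printed on the plot as "r = 1"; everything else follows the individual steps.

```r
# Step 1: define the data
height <- c(150, 152, 155, 158, 160, 162, 165, 168, 170, 172, 175, 178, 180, 183, 185)
weight <- c(48, 50, 52, 54, 56, 58, 60, 62, 64, 65, 68, 70, 72, 75, 78)

# Step 2: run the Pearson correlation test and keep r for the label
cor_result <- cor.test(height, weight, method = "pearson")
print(cor_result)
r_value <- round(unname(cor_result$estimate), 3)

# Step 3: basic scatter plot with solid blue points
plot(height, weight,
     main = "Height vs Weight",
     xlab = "Height (cm)",
     ylab = "Weight (kg)",
     pch = 19,
     col = "blue")

# Step 4: add the least-squares regression line in red
abline(lm(weight ~ height), col = "red")

# Step 5: annotate the plot with the r value near the top-left corner
text(x = min(height) + 5, y = max(weight) - 2,
     labels = paste("r =", r_value),
     col = "darkgreen", cex = 1.2, font = 2)
```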
Interpretation of Results
Strength and Direction
An r this close to +1 (≈ 0.996 in the sample output) shows a very strong, positive linear relationship between height and weight: taller individuals in this sample are consistently heavier.
Statistical Significance
The p-value is far below the conventional 0.05 threshold, so the null hypothesis that the true correlation is 0 is rejected and the correlation is statistically significant.
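The p-value reported by cor.test() comes from a t statistic with n - 2 degrees of freedom, t = r·sqrt(n - 2) / sqrt(1 - r²). The sketch below recomputes it by hand to show where the significance result comes from; `t_manual` and `p_manual` are illustrative names.

```r
height <- c(150, 152, 155, 158, 160, 162, 165, 168, 170, 172, 175, 178, 180, 183, 185)
weight <- c(48, 50, 52, 54, 56, 58, 60, 62, 64, 65, 68, 70, 72, 75, 78)

cor_result <- cor.test(height, weight, method = "pearson")
r <- unname(cor_result$estimate)
n <- length(height)

# t statistic for testing H0: true correlation equals 0
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = n - 2, lower.tail = FALSE)

print(all.equal(t_manual, unname(cor_result$statistic)))
print(p_manual < 0.05)  # significant at the 5% level when TRUE
```

Because t grows with both r and n, a strong correlation can be significant even in a sample of only 15 pairs.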
Applications of Correlation in Biological Sciences
| Domain | Use Case |
|---|---|
| Epidemiology | Height vs BMI, blood pressure vs cholesterol |
| Psychology | Stress level vs sleep quality |
| Environmental Science | Temperature vs species diversity |
| Agriculture | Rainfall vs crop yield |
Conclusion
The correlation coefficient (r) is a fundamental statistical tool that quantifies linear relationships between continuous variables. Using RStudio, you can quickly compute this value, test its significance, and visually display it with a scatter plot and regression line.
In our example, height and weight showed a strong positive correlation (r ≈ 0.996), illustrating how R can effectively explore real-world relationships with just a few lines of code.