Introduction: Understanding Correlation and Its Importance in Data Analysis
In statistics, understanding relationships between variables is crucial for data interpretation, decision-making, and predictive modeling. One of the most common and powerful tools for this is the correlation coefficient (r), which measures the strength and direction of a linear relationship between two continuous variables.
This tutorial will guide you through calculating the Pearson correlation coefficient in RStudio, using a practical example of height and weight data, and visualizing the relationship with a scatter plot and a regression line.
By the end of this post, you’ll be able to:
- Calculate correlation in R
- Interpret correlation results
- Create a scatter plot in R
- Add a regression line and annotate your plot
- Understand how to use visualizations in exploratory data analysis (EDA)
What Is the Correlation Coefficient (r)?
Definition of Pearson’s Correlation Coefficient
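The Pearson correlation coefficient (r) is a measure of the linear association between two continuous variables. Its value always lies between -1 and +1: the sign gives the direction of the relationship and the magnitude gives its strength.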
- r = +1: Perfect positive linear relationship
- r = -1: Perfect negative linear relationship
- r = 0: No linear relationship (the variables may still be related in a nonlinear way)
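To make the three reference values concrete, here is a minimal sketch using base R's `cor()` function, which computes Pearson's r by default. The vectors `x`, `y_pos`, `y_neg`, and `y_sym` are made-up illustrative data, not the height and weight data used in this tutorial.

```r
# Illustrative data only (assumed for this sketch, not the tutorial's dataset)
x <- 1:10

y_pos <- 2 * x + 3       # exact increasing straight line in x
y_neg <- -0.5 * x + 8    # exact decreasing straight line in x
y_sym <- (x - mean(x))^2 # U-shaped curve, symmetric around the mean of x

# cor() uses method = "pearson" unless told otherwise
r_pos <- cor(x, y_pos)   # perfect positive linear relationship
r_neg <- cor(x, y_neg)   # perfect negative linear relationship
r_sym <- cor(x, y_sym)   # clear relationship, but not a linear one

print(round(c(positive = r_pos, negative = r_neg, symmetric = r_sym), 3))
```

The first two values come out as +1 and -1 (up to floating-point rounding) because the points lie exactly on a straight line. The third is essentially 0 even though `y_sym` is completely determined by `x`, which is why r should be read as a measure of linear association only.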
Formula for Pearson’s r