How to Calculate the Correlation Coefficient (r) and Create a Scatter Plot in RStudio

Introduction: Understanding Correlation and Its Importance in Data Analysis

In statistics, understanding relationships between variables is crucial for data interpretation, decision-making, and predictive modeling. One of the most common and powerful tools for this is the correlation coefficient (r), which measures the strength and direction of a linear relationship between two continuous variables.

This tutorial will guide you through calculating the Pearson correlation coefficient in RStudio, using a practical example of height and weight data, and visualizing the relationship with a scatter plot and a regression line.

By the end of this post, you’ll be able to:

  • Calculate correlation in R
  • Interpret correlation results
  • Create a scatter plot in R
  • Add a regression line and annotate your plot
  • Understand how to use visualizations in exploratory data analysis (EDA)

What Is the Correlation Coefficient (r)?

Definition of Pearson’s Correlation Coefficient

The Pearson correlation coefficient (r) is a measure of the linear association between two continuous variables. It ranges from -1 to +1:

  • r = +1: Perfect positive linear relationship
  • r = -1: Perfect negative linear relationship
  • r = 0: No linear relationship
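
The three reference values above can be reproduced with small toy vectors. This is a minimal sketch: the numbers are arbitrary and chosen only for illustration, and the variable names (x, r_pos, r_neg, r_zero) are not from the original post. It relies on base R's cor(), which computes Pearson's r by default.

```r
# Toy data, chosen only to illustrate the boundary cases
x <- c(1, 2, 3, 4, 5)

# y rises exactly in a straight line with x: perfect positive relationship
r_pos <- cor(x, 2 * x + 1)

# y falls exactly in a straight line with x: perfect negative relationship
r_neg <- cor(x, -3 * x + 10)

# y depends on x through a symmetric curve: clearly related, but not linearly
r_zero <- cor(x, (x - 3)^2)

print(c(positive = r_pos, negative = r_neg, none = r_zero))
```

The last case is worth noting: r close to 0 rules out a linear relationship only, which is one reason to look at a scatter plot alongside the number.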

Formula for Pearson’s r

r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}
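
To see how the formula maps onto code, it can be evaluated term by term and compared with R's built-in cor(). This is a sketch under stated assumptions: the vectors x and y hold arbitrary illustrative values, and the names dx, dy, r_manual and r_builtin are introduced here for the example.

```r
# Arbitrary illustrative values (not real measurements)
x <- c(2, 4, 5, 7, 9)
y <- c(1, 3, 4, 8, 9)

# Deviations from the means: (x_i - x_bar) and (y_i - y_bar)
dx <- x - mean(x)
dy <- y - mean(y)

# Numerator: sum of the products of the deviations
# Denominator: square root of the product of the sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in Pearson correlation, for comparison
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should agree up to floating-point rounding, so in practice cor() is all you need. The manual version is only there to make each term of the formula visible.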

Step-by-Step R Code to Calculate Correlation and Create Scatter Plot
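
The workflow named in this heading could look roughly like the sketch below, which uses only base R. The height and weight values are made up for illustration and are not the dataset from the original post. The names df, height_cm, weight_kg, r, test and fit are likewise assumptions introduced for this example.

```r
# Illustrative data only: these height/weight values are invented for the sketch
df <- data.frame(
  height_cm = c(150, 160, 165, 170, 175, 180, 185),
  weight_kg = c(52, 58, 63, 68, 72, 79, 85)
)

# Step 1: Pearson correlation coefficient
r <- cor(df$height_cm, df$weight_kg, method = "pearson")

# Step 2: significance test, which also reports a confidence interval for r
test <- cor.test(df$height_cm, df$weight_kg, method = "pearson")
print(test)

# Step 3: scatter plot of weight against height
plot(df$height_cm, df$weight_kg,
     main = "Height vs. Weight",
     xlab = "Height (cm)", ylab = "Weight (kg)",
     pch = 19, col = "steelblue")

# Step 4: fit a simple linear regression and draw its line on the plot
fit <- lm(weight_kg ~ height_cm, data = df)
abline(fit, col = "red", lwd = 2)

# Step 5: annotate the plot with the correlation coefficient
legend("topleft", legend = paste("r =", round(r, 3)), bty = "n")
```

When reading the output, the sign of r gives the direction of the relationship and its absolute value gives the strength. The p-value and confidence interval from cor.test() indicate how much of that could be sampling noise. The scatter plot remains the quickest check that a straight line is a sensible summary of the data at all.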
