Introduction to Species Distribution Models (SDM)
Species Distribution Models (SDMs) predict where a species is likely to occur by relating presence-absence records to environmental variables. These models are critical for ecological research, conservation planning, and understanding the impacts of environmental changes on biodiversity.
In this guide, we’ll demonstrate the first steps of an SDM workflow in R using simulated data. The code generates environmental data, assigns species presence-absence from fixed environmental thresholds, and visualizes the resulting distribution.
Key Steps in Building an SDM in R
Step 1: Setting Up the Environment
Before diving into modeling, load the required R libraries. Here’s what we’ll use:
library(ggplot2) # For visualization
library(dplyr) # For data manipulation
set.seed(42) # For reproducibility
ggplot2 draws the plots and dplyr provides data-manipulation functions. set.seed(42) fixes the random number stream so that the simulated data are identical on every run.
Step 2: Simulating Data for SDM
To model species distribution, we simulate a dataset representing latitude, longitude, and environmental variables (e.g., temperature, precipitation, and elevation). The following code generates 100 random points with these variables:
n <- 100
latitude <- runif(n, min = -90, max = 90)
longitude <- runif(n, min = -180, max = 180)
temperature <- runif(n, min = -10, max = 40) # Celsius
precipitation <- runif(n, min = 0, max = 2000) # mm/year
elevation <- runif(n, min = 0, max = 4000) # meters
presence <- ifelse(
  temperature > 10 & temperature < 30 & precipitation > 500 & precipitation < 1500,
  1,
  0
)
sdm_data <- data.frame(
  Latitude = latitude,
  Longitude = longitude,
  Temperature = temperature,
  Precipitation = precipitation,
  Elevation = elevation,
  Presence = presence
)
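The variable presence is set to 1 where temperature is between 10 and 30 °C and precipitation is between 500 and 1500 mm/year, and to 0 everywhere else. This simulates where the species is likely to occur. Elevation is included in the dataset but does not enter the rule.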
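As a quick sanity check on the threshold rule, the share of presence points can be compared with the share the rule implies. The sketch below repeats the simulation in the same draw order as the guide so it runs on its own; the names observed_prevalence and expected_prevalence are illustrative and not part of the original script.

```r
# Repeat the guide's simulation (same seed, same draw order).
set.seed(42)
n <- 100
latitude      <- runif(n, min = -90, max = 90)
longitude     <- runif(n, min = -180, max = 180)
temperature   <- runif(n, min = -10, max = 40)   # Celsius
precipitation <- runif(n, min = 0, max = 2000)   # mm/year
elevation     <- runif(n, min = 0, max = 4000)   # meters

presence <- ifelse(
  temperature > 10 & temperature < 30 & precipitation > 500 & precipitation < 1500,
  1,
  0
)

# Counts of absences (0) and presences (1).
print(table(presence))

# Share of points where the species is present.
observed_prevalence <- mean(presence)

# Share implied by the rule for independent uniform draws:
# (fraction of the temperature range) * (fraction of the precipitation range).
expected_prevalence <- ((30 - 10) / (40 - (-10))) * ((1500 - 500) / (2000 - 0))

print(c(observed = observed_prevalence, expected = expected_prevalence))
```

With only 100 points, the observed share will fluctuate around the expected one rather than match it exactly.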
Step 3: Exploring the Dataset
Before building the model, inspect the dataset:
print(head(sdm_data))
summary(sdm_data)
This step shows whether each variable falls in its expected range and how it is distributed. In the summary() output, the mean of Presence is the proportion of points where the species occurs.
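Beyond head() and summary(), a grouped summary is one way to compare environmental conditions at presence and absence points. The following is a minimal sketch using dplyr; it rebuilds the simulated data so it runs on its own, and the names n_obs and presence_summary are illustrative assumptions rather than part of the original script.

```r
library(dplyr)

# Rebuild the simulated dataset (same seed and draw order as the guide).
set.seed(42)
n_obs <- 100
latitude      <- runif(n_obs, min = -90, max = 90)
longitude     <- runif(n_obs, min = -180, max = 180)
temperature   <- runif(n_obs, min = -10, max = 40)
precipitation <- runif(n_obs, min = 0, max = 2000)
elevation     <- runif(n_obs, min = 0, max = 4000)
presence <- ifelse(
  temperature > 10 & temperature < 30 & precipitation > 500 & precipitation < 1500,
  1, 0
)
sdm_data <- data.frame(
  Latitude = latitude, Longitude = longitude,
  Temperature = temperature, Precipitation = precipitation,
  Elevation = elevation, Presence = presence
)

# One row per presence class: point count and mean of each variable.
presence_summary <- sdm_data %>%
  group_by(Presence) %>%
  summarise(
    n_points           = n(),
    mean_temperature   = mean(Temperature),
    mean_precipitation = mean(Precipitation),
    mean_elevation     = mean(Elevation),
    .groups = "drop"
  )

print(presence_summary)
```

Because elevation plays no part in the presence rule, its group means should differ only by sampling noise.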
Step 4: Visualizing Species Distribution
1. Map of Species Presence
The first plot maps species presence-absence across geographical coordinates:
ggplot(data = sdm_data, aes(x = Longitude, y = Latitude)) +
  geom_point(aes(color = factor(Presence)), size = 3) +
  scale_color_manual(values = c("red", "blue"),
                     labels = c("Absent", "Present"),
                     name = "Species Presence") +
  labs(title = "Species Distribution Model (SDM)",
       x = "Longitude",
       y = "Latitude") +
  theme_minimal()
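This plot shows species presence (blue) and absence (red) across longitudes and latitudes. Because the simulated coordinates are drawn independently of the environmental variables, the presence points show no spatial clustering here. With real data, the same plot would outline the species’ geographical range.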
2. Environmental Variables vs. Species Presence
The second plot explores how environmental factors influence species presence:
ggplot(data = sdm_data, aes(x = Temperature, y = Precipitation)) +
  geom_point(aes(color = factor(Presence)), size = 3) +
  scale_color_manual(values = c("red", "blue"),
                     labels = c("Absent", "Present"),
                     name = "Species Presence") +
  labs(title = "Environmental Variables vs Species Presence",
       x = "Temperature (°C)",
       y = "Precipitation (mm/year)") +
  theme_minimal()
In this scatter plot, presence points occupy only the region between 10 and 30 °C and between 500 and 1500 mm/year, which are the thresholds used in the simulation. Every point outside that region is an absence.
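To make the link between the plot and the simulation rule explicit, the same scatter plot can be drawn with the threshold region shaded. This is a sketch, assuming the thresholds from Step 2; the plot object name p and the shading style are illustrative choices.

```r
library(ggplot2)

# Rebuild the simulated dataset (same seed and draw order as the guide).
set.seed(42)
n <- 100
latitude      <- runif(n, min = -90, max = 90)
longitude     <- runif(n, min = -180, max = 180)
temperature   <- runif(n, min = -10, max = 40)
precipitation <- runif(n, min = 0, max = 2000)
elevation     <- runif(n, min = 0, max = 4000)
presence <- ifelse(
  temperature > 10 & temperature < 30 & precipitation > 500 & precipitation < 1500,
  1, 0
)
sdm_data <- data.frame(
  Temperature = temperature, Precipitation = precipitation, Presence = presence
)

p <- ggplot(data = sdm_data, aes(x = Temperature, y = Precipitation)) +
  # Shaded rectangle marking the presence thresholds used in the simulation.
  annotate("rect", xmin = 10, xmax = 30, ymin = 500, ymax = 1500,
           fill = "blue", alpha = 0.1, color = "blue", linetype = "dashed") +
  geom_point(aes(color = factor(Presence)), size = 3) +
  # Named values tie each color to a factor level explicitly.
  scale_color_manual(values = c("0" = "red", "1" = "blue"),
                     breaks = c("0", "1"),
                     labels = c("Absent", "Present"),
                     name = "Species Presence") +
  labs(title = "Environmental Variables vs Species Presence",
       x = "Temperature (°C)",
       y = "Precipitation (mm/year)") +
  theme_minimal()

print(p)
```

Naming the color values by factor level keeps the legend correct even if a dataset happens to contain only one of the two classes.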
Step 5: Preparing for Real-World Applications
For real-world SDMs, data is often imported from external sources like Excel files or databases. R provides robust tools to handle such data:
library(readxl)
SDM_dataset <- read_excel("SDM_dataset.xlsx")
View(SDM_dataset)
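Here read_excel() loads the spreadsheet into R as a data frame, and View() opens it in the data viewer of an interactive session such as RStudio. Replace SDM_dataset.xlsx with the path to your own file.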
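Before modeling imported data, it can help to confirm that the file contains the columns the rest of the workflow expects. The sketch below is an assumption-laden illustration: check_sdm_columns() is a hypothetical helper, the required column names simply mirror the simulated dataset in this guide, and a small hand-built data frame stands in for the result of read_excel().

```r
# Column names assumed to match the simulated dataset in this guide.
required_columns <- c("Latitude", "Longitude", "Temperature",
                      "Precipitation", "Elevation", "Presence")

# Hypothetical helper: stops with a message if the data are not usable.
check_sdm_columns <- function(data, required = required_columns) {
  missing_columns <- setdiff(required, names(data))
  if (length(missing_columns) > 0) {
    stop("Missing required columns: ", paste(missing_columns, collapse = ", "))
  }
  # Presence must be coded 0/1; NA values also fail this check.
  if (!all(data$Presence %in% c(0, 1))) {
    stop("Presence must be coded as 0 or 1")
  }
  invisible(data)
}

# Stand-in for an imported dataset (illustrative values only).
imported <- data.frame(
  Latitude = c(10.5, -3.2), Longitude = c(100.1, 45.8),
  Temperature = c(22, 5), Precipitation = c(900, 300),
  Elevation = c(150, 2200), Presence = c(1, 0)
)

check_sdm_columns(imported)  # passes silently

# Dropping the Presence column triggers the first check.
result <- tryCatch(
  check_sdm_columns(imported[, -6]),
  error = function(e) conditionMessage(e)
)
print(result)
```

Real datasets will often use different column names or units, so the required list would need adjusting to match them.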
Benefits of Using R for SDMs
- Flexibility: R supports diverse modeling techniques (e.g., logistic regression, MaxEnt).
- Visualization: ggplot2 enables customized, publication-ready visualizations.
- Reproducibility: R scripts ensure consistent results and transparency in research.
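The logistic regression mentioned in the list above can be sketched with base R's glm(). This example is not part of the original script: it simulates 500 points instead of 100 to give the fit more data, uses quadratic terms (via poly()) because the simulated species prefers intermediate temperature and precipitation, and leaves out elevation because it does not affect presence in the simulation. All object names are illustrative.

```r
# Simulate environmental data and presence with the guide's thresholds.
set.seed(42)
n_obs <- 500  # assumption: larger than the guide's 100 points
temperature   <- runif(n_obs, min = -10, max = 40)
precipitation <- runif(n_obs, min = 0, max = 2000)
presence <- ifelse(
  temperature > 10 & temperature < 30 & precipitation > 500 & precipitation < 1500,
  1, 0
)
sdm_data <- data.frame(
  Temperature = temperature, Precipitation = precipitation, Presence = presence
)

# Logistic regression with second-degree polynomial terms, so the predicted
# probability can peak at intermediate values of each variable.
sdm_fit <- glm(
  Presence ~ poly(Temperature, 2) + poly(Precipitation, 2),
  data = sdm_data,
  family = binomial
)

# Fitted probability of presence for each simulated point.
sdm_data$Predicted <- predict(sdm_fit, type = "response")

# Predicted probability at two hypothetical sites: one in the middle of
# the suitable range and one that is cold and dry.
new_sites <- data.frame(Temperature = c(20, -5), Precipitation = c(1000, 100))
new_sites$Predicted <- predict(sdm_fit, newdata = new_sites, type = "response")

print(summary(sdm_fit)$coefficients)
print(new_sites)
```

A smooth logistic surface can only approximate the sharp rectangular rule used in the simulation, so predictions near the threshold edges will be less clear-cut than the simulated presence values.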
Conclusion
Building a Species Distribution Model in R is an insightful way to explore species-environment relationships. By simulating data, visualizing patterns, and interpreting results, you can uncover ecological insights that are essential for conservation and environmental management.
The provided R script offers a foundation for understanding SDMs and can be easily adapted for real-world datasets. Whether you’re a beginner or an experienced ecologist, this guide is a stepping stone toward mastering SDMs in R.