Case-control analysis assesses the association between an exposure and an outcome using odds ratios, confidence intervals, and hypothesis tests. Below is a guide to conducting a basic case-control analysis in R.
Assume you have two groups (cases and controls) and a binary exposure variable (e.g., "exposed" vs. "not exposed").
For example, define four counts:
exposed_cases: Number of cases with exposure.
not_exposed_cases: Number of cases without exposure.
exposed_controls: Number of controls with exposure.
not_exposed_controls: Number of controls without exposure.
# Example data
exposed_cases <- 50
not_exposed_cases <- 30
exposed_controls <- 20
not_exposed_controls <- 100
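If your data are stored one row per subject rather than as pre-tabulated counts, base R's table() can produce the four counts. The sketch below assumes a hypothetical data frame named subjects with columns status and exposure; its rows are simulated only so that they reproduce the example counts, and are not real data.

```r
# Hypothetical individual-level data: one row per subject.
# The data frame name and its column names are illustrative assumptions.
subjects <- data.frame(
  status   = rep(c("Case", "Control"), times = c(80, 120)),
  exposure = c(rep(c("Exposed", "Not Exposed"), times = c(50, 30)),   # the 80 cases
               rep(c("Exposed", "Not Exposed"), times = c(20, 100)))  # the 120 controls
)

# Fix the level order so rows are Exposed / Not Exposed and
# columns are Case / Control
subjects$exposure <- factor(subjects$exposure, levels = c("Exposed", "Not Exposed"))
subjects$status   <- factor(subjects$status, levels = c("Case", "Control"))

# Cross-tabulate exposure against case/control status
counts <- table(Exposure = subjects$exposure, Outcome = subjects$status)
print(counts)

# Pull out the four counts by their row and column labels
exposed_cases        <- counts["Exposed", "Case"]
not_exposed_cases    <- counts["Not Exposed", "Case"]
exposed_controls     <- counts["Exposed", "Control"]
not_exposed_controls <- counts["Not Exposed", "Control"]
```

Setting the factor levels explicitly matters because table() otherwise orders levels alphabetically, which may not match the layout you intend.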
A 2x2 contingency table, with exposure status in the rows and case/control status in the columns, organizes the counts for further analysis.
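# Create a 2x2 matrix: rows = exposure status, columns = case/control status.
# With byrow = TRUE the values fill row 1 (Exposed) first, then row 2 (Not Exposed).
data_matrix <- matrix(c(exposed_cases, exposed_controls,
                        not_exposed_cases, not_exposed_controls),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                                      Outcome = c("Cases", "Controls")))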
# Display the contingency table
print(data_matrix)
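Because matrix() fills cells strictly in the order given, it is easy to put a count in the wrong cell. A minimal sketch for checking the layout, using base R's addmargins() to display row and column totals and stopifnot() to confirm that each cell holds the count its labels imply:

```r
exposed_cases <- 50
not_exposed_cases <- 30
exposed_controls <- 20
not_exposed_controls <- 100

data_matrix <- matrix(c(exposed_cases, exposed_controls,
                        not_exposed_cases, not_exposed_controls),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                                      Outcome = c("Cases", "Controls")))

# Show the table with row and column totals ("Sum")
print(addmargins(data_matrix))

# Each labelled cell should hold the matching count, and the
# "Cases" column should add up to the total number of cases
stopifnot(data_matrix["Exposed", "Cases"] == exposed_cases,
          data_matrix["Exposed", "Controls"] == exposed_controls,
          data_matrix["Not Exposed", "Cases"] == not_exposed_cases,
          sum(data_matrix[, "Cases"]) == exposed_cases + not_exposed_cases)
```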
The odds ratio (OR) measures the association between exposure and outcome. In R, you can calculate it manually or use a package such as epitools.
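Writing a = exposed_cases, b = not_exposed_cases, c = exposed_controls and d = not_exposed_controls, the OR compares the odds of exposure among cases with the odds of exposure among controls. With the example data:

$$
\mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc} = \frac{50 \times 100}{30 \times 20} = \frac{5000}{600} \approx 8.33
$$

The odds of being a case among the exposed versus the unexposed, (a/c)/(b/d), reduce to the same cross-product ad/bc, which is why the OR can be estimated from case-control data.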
Manual Calculation:
# Calculate the odds ratio
odds_ratio <- (exposed_cases * not_exposed_controls) / (not_exposed_cases * exposed_controls)
print(paste("Odds Ratio:", round(odds_ratio, 2)))
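The same cross-product can be computed directly from the 2x2 table, which avoids retyping counts. odds_ratio_2x2() below is a hypothetical helper written for illustration, not a function from any package; it assumes exposure in the rows (Exposed first) and outcome in the columns (Cases first).

```r
# Hypothetical helper (not from any package): sample odds ratio of a 2x2 table
# with rows = Exposed / Not Exposed and columns = Cases / Controls
odds_ratio_2x2 <- function(tab) {
  stopifnot(all(dim(tab) == c(2, 2)))
  # (exposed cases * unexposed controls) / (exposed controls * unexposed cases)
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

data_matrix <- matrix(c(50, 20, 30, 100), nrow = 2, byrow = TRUE,
                      dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                                      Outcome = c("Cases", "Controls")))

or_hat <- odds_ratio_2x2(data_matrix)
print(round(or_hat, 2))  # [1] 8.33
```

Transposing the table leaves the result unchanged, but swapping only the rows or only the columns returns 1/OR, so the cell layout has to be right before this formula is applied.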
Using the epitools Package:
# Install epitools if you haven't already
# install.packages("epitools")
library(epitools)
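# Calculate the odds ratio and a Wald confidence interval.
# oddsratio() treats the first row as the reference exposure level and the
# second column as the outcome of interest, so rev = "both" reverses rows and
# columns to report the OR for Exposed vs. Not Exposed, Cases vs. Controls.
odds_ratio_result <- oddsratio(data_matrix, method = "wald", rev = "both")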
print(odds_ratio_result)
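As a cross-check that needs no extra package, base R's fisher.test() accepts the same 2x2 matrix and returns a p-value for the null hypothesis of no association, an odds ratio estimate and a confidence interval. Its estimate is the conditional maximum likelihood estimate rather than the sample cross-product ratio, and its interval is an exact conditional interval rather than a Wald interval, so both will differ somewhat from the values above.

```r
data_matrix <- matrix(c(50, 20, 30, 100), nrow = 2, byrow = TRUE,
                      dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                                      Outcome = c("Cases", "Controls")))

# Fisher's exact test of independence between exposure and case status
ft <- fisher.test(data_matrix)

print(ft$estimate)  # conditional MLE of the odds ratio
print(ft$conf.int)  # exact 95% confidence interval for the odds ratio
print(ft$p.value)   # two-sided p-value
```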
To compute the confidence interval manually: