Case-control analysis with R

A case-control analysis assesses the association between an exposure and an outcome, typically by estimating an odds ratio and its confidence interval and by testing the hypothesis of no association. Below is a guide to conducting a basic case-control analysis in R.

Step 1: Set Up Your Data

Assume you have two groups (cases and controls) and a binary exposure variable (e.g., "exposed" vs. "not exposed").

The data then reduce to four counts:

exposed_cases: Number of cases with exposure.
not_exposed_cases: Number of cases without exposure.
exposed_controls: Number of controls with exposure.
not_exposed_controls: Number of controls without exposure.

 # Example data
 exposed_cases <- 50
 not_exposed_cases <- 30
 exposed_controls <- 20
 not_exposed_controls <- 100
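If your data are stored as one row per participant rather than as pre-computed counts, base R's `table()` can produce the four counts. The sketch below uses a small hypothetical data frame, `subjects`, with made-up `status` and `exposure` columns and toy values; substitute your own variable names and level labels.

```r
# Hypothetical individual-level data (toy values): one row per participant
subjects <- data.frame(
  status   = c("case", "case", "case", "control", "control", "control", "control"),
  exposure = c("exposed", "exposed", "not exposed",
               "exposed", "not exposed", "not exposed", "not exposed")
)

# Set factor levels explicitly so rows and columns come out in a known order
subjects$status   <- factor(subjects$status, levels = c("case", "control"))
subjects$exposure <- factor(subjects$exposure, levels = c("exposed", "not exposed"))

# Cross-tabulate exposure (rows) against case/control status (columns)
counts <- table(exposure = subjects$exposure, status = subjects$status)
print(counts)

# Pull out the four counts by name (as.integer drops the table attributes)
four_counts <- c(
  exposed_cases        = as.integer(counts["exposed", "case"]),
  not_exposed_cases    = as.integer(counts["not exposed", "case"]),
  exposed_controls     = as.integer(counts["exposed", "control"]),
  not_exposed_controls = as.integer(counts["not exposed", "control"])
)
print(four_counts)
```

Setting the factor levels up front avoids relying on alphabetical ordering, which would otherwise decide which level ends up in the first row or column.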

Step 2: Create a Contingency Table

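A 2x2 contingency table arranges the four counts by exposure status (rows) and case/control status (columns), the layout used for the odds ratio calculations that follow.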

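 # Create a 2x2 matrix: rows = exposure status, columns = case/control status
 # (byrow = TRUE fills row by row, so the Exposed row's counts come first)
 data_matrix <- matrix(c(exposed_cases, exposed_controls,
                         not_exposed_cases, not_exposed_controls),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                                       Outcome = c("Cases", "Controls")))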
 
 # Display the contingency table
 print(data_matrix)
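A quick way to confirm that the matrix is oriented as intended is to add row and column totals with base R's `addmargins()`: the column totals should equal the number of cases and controls. The sketch rebuilds the table from the example counts so that it runs on its own.

```r
# Rebuild the example table: rows = exposure status, columns = case/control status
data_matrix <- matrix(c(50, 20,    # Exposed: cases, controls
                        30, 100),  # Not Exposed: cases, controls
                      nrow = 2, byrow = TRUE,
                      dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                                      Outcome = c("Cases", "Controls")))

# Add a "Sum" row and a "Sum" column holding the marginal totals
with_totals <- addmargins(as.table(data_matrix))
print(with_totals)
```

In this example the "Sum" row should show 80 cases and 120 controls; if those totals look like exposure counts instead, the rows and columns have been swapped.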

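Step 3: Calculate the Odds Ratio

The odds ratio (OR) measures the association between exposure and outcome: it compares the odds of exposure among cases with the odds of exposure among controls. An OR greater than 1 suggests that exposure is associated with higher odds of being a case, and an OR of 1 suggests no association. In R, you can calculate it manually or use a package such as epitools.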
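Using the common labeling of the 2x2 cells as a (exposed cases), b (exposed controls), c (unexposed cases), and d (unexposed controls), the odds ratio and its value for the example counts work out as follows:

```latex
\mathrm{OR} = \frac{a / c}{b / d} = \frac{a \cdot d}{b \cdot c}
            = \frac{50 \times 100}{20 \times 30} = \frac{5000}{600} \approx 8.33
```

Here a/c is the odds of exposure among cases (50/30) and b/d is the odds of exposure among controls (20/100).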

Manual Calculation:

 # Odds ratio = (exposed cases * unexposed controls) / (unexposed cases * exposed controls)
 odds_ratio <- (exposed_cases * not_exposed_controls) / (not_exposed_cases * exposed_controls)
 print(paste("Odds Ratio:", round(odds_ratio, 2)))
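Because the manual formula depends on which count goes where, it can help to wrap it in a small function that reads the cells by their row and column names rather than by position. `odds_ratio_2x2()` below is a hypothetical helper, not part of any package, and it assumes the table uses the same row and column names as `data_matrix`.

```r
# Hypothetical helper: odds ratio from a 2x2 table with named rows and columns.
# Assumes rows "Exposed"/"Not Exposed" and columns "Cases"/"Controls".
odds_ratio_2x2 <- function(tab) {
  exp_case   <- tab["Exposed", "Cases"]         # exposed cases
  exp_ctrl   <- tab["Exposed", "Controls"]      # exposed controls
  unexp_case <- tab["Not Exposed", "Cases"]     # unexposed cases
  unexp_ctrl <- tab["Not Exposed", "Controls"]  # unexposed controls
  (exp_case * unexp_ctrl) / (exp_ctrl * unexp_case)
}

# Example table built from the counts in Step 1
example_table <- matrix(c(50, 20, 30, 100), nrow = 2, byrow = TRUE,
                        dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                                        Outcome = c("Cases", "Controls")))

or_value <- odds_ratio_2x2(example_table)
print(round(or_value, 2))  # prints [1] 8.33
```

Reading cells by name means a table whose rows or columns are stored in a different order still gives the same odds ratio, provided the labels match.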

Using the epitools Package:

 # Install epitools if you haven't already
 # install.packages("epitools")
 library(epitools)
 
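 # epitools treats the first row as the reference (unexposed) level and the
 # second column as the outcome (cases); rev = "both" flips data_matrix to match
 # Calculate the odds ratio with a Wald confidence interval
 odds_ratio_result <- oddsratio(data_matrix, method = "wald", rev = "both")
 print(odds_ratio_result)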
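If you prefer not to install a package, base R's `fisher.test()` also reports an odds ratio for a 2x2 table, together with an exact confidence interval and a p-value. Its estimate is the conditional maximum likelihood estimate, so it will be close to, but not identical to, the sample odds ratio of about 8.33 from the manual calculation. The sketch rebuilds the example table so it is self-contained.

```r
# Example table: rows = exposure status, columns = case/control status
data_matrix <- matrix(c(50, 20, 30, 100), nrow = 2, byrow = TRUE,
                      dimnames = list(Exposure = c("Exposed", "Not Exposed"),
                                      Outcome = c("Cases", "Controls")))

# Fisher's exact test; for a 2x2 table it also estimates the odds ratio
ft <- fisher.test(data_matrix)

print(ft$estimate)  # conditional MLE of the odds ratio
print(ft$conf.int)  # exact 95% confidence interval
print(ft$p.value)   # two-sided p-value for H0: OR = 1
```

Because the first row is Exposed and the first column is Cases, the estimate here describes the odds of exposure in cases relative to controls; transposing or reordering the table changes the direction of the ratio.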

Step 4: Calculate Confidence Interval for the Odds Ratio

To compute the confidence interval manually: