To perform a case-control analysis in Python, we typically calculate the odds ratio and its confidence interval, then run a hypothesis test such as a chi-square test, or fit a logistic regression to adjust for confounders. Here’s a step-by-step guide to conducting a case-control analysis in Python.

Step 1: Set Up Your Data

We assume two groups, "cases" (with the condition) and "controls" (without the condition), and a binary exposure variable (e.g., "exposed" vs. "not exposed").

For example:

 # Example data
 exposed_cases = 50
 not_exposed_cases = 30
 exposed_controls = 20
 not_exposed_controls = 100
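
In practice the four counts are often derived from individual-level records rather than typed in by hand. The following is a minimal sketch of that step, assuming a hypothetical DataFrame with one row per subject and illustrative column names `status` and `exposure` (neither appears in the original example). It reproduces the same counts as above.

```python
import pandas as pd

# Hypothetical individual-level records: one row per subject.
# The column names "status" and "exposure" are illustrative assumptions.
records = pd.DataFrame(
    {
        "status": ["case"] * 80 + ["control"] * 120,
        "exposure": (
            ["exposed"] * 50 + ["not_exposed"] * 30      # the 80 cases
            + ["exposed"] * 20 + ["not_exposed"] * 100   # the 120 controls
        ),
    }
)

# Cross-tabulate exposure (rows) against case/control status (columns).
counts = pd.crosstab(records["exposure"], records["status"])

# Pull out the four cell counts used in the rest of the guide.
exposed_cases = int(counts.loc["exposed", "case"])
not_exposed_cases = int(counts.loc["not_exposed", "case"])
exposed_controls = int(counts.loc["exposed", "control"])
not_exposed_controls = int(counts.loc["not_exposed", "control"])

print(counts)
```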

Step 2: Create a Contingency Table

A contingency table organizes the data for further analysis.

 import pandas as pd
 
 # Creating a 2x2 contingency table
 data = pd.DataFrame(
     {
         "Cases": [exposed_cases, not_exposed_cases],
         "Controls": [exposed_controls, not_exposed_controls]
     },
     index=["Exposed", "Not Exposed"]
 )
 
 print(data)
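
Before computing any statistics, it can help to sanity-check the table. The sketch below rebuilds the same table from the example counts, then computes the group totals and the share of exposed subjects within each group. The variable names `column_totals`, `row_totals`, and `exposure_share` are illustrative.

```python
import pandas as pd

# Same 2x2 table as above, built from the example counts.
data = pd.DataFrame(
    {"Cases": [50, 30], "Controls": [20, 100]},
    index=["Exposed", "Not Exposed"],
)

# Total number of cases and of controls (column sums).
column_totals = data.sum(axis=0)

# Total number of exposed and unexposed subjects (row sums).
row_totals = data.sum(axis=1)

# Proportion of each group that was exposed.
exposure_share = data.loc["Exposed"] / column_totals

print(column_totals)
print(row_totals)
print(exposure_share)
```

In a case-control study the group sizes are fixed by design, so the useful comparison is the exposure proportion among cases against the exposure proportion among controls.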

Step 3: Calculate Odds Ratio

The odds ratio (OR) measures the association between exposure and outcome. Here’s how to calculate it manually.

$$ \text{OR} = \frac{(\text{exposed\_cases} \times \text{not\_exposed\_controls})}{(\text{not\_exposed\_cases} \times \text{exposed\_controls})} $$

 # Calculating the odds ratio
 odds_ratio = (exposed_cases * not_exposed_controls) / (not_exposed_cases * exposed_controls)
 print(f"Odds Ratio: {odds_ratio}")
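
The manual formula divides by zero when either `not_exposed_cases` or `exposed_controls` is zero. One common workaround, shown here as a sketch and not as part of the original guide, is the Haldane-Anscombe correction, which adds 0.5 to every cell when any cell is zero. The function name `odds_ratio_2x2` and its `correction` parameter are hypothetical.

```python
def odds_ratio_2x2(exposed_cases, exposed_controls,
                   not_exposed_cases, not_exposed_controls,
                   correction=0.5):
    """Odds ratio for a 2x2 case-control table.

    If any cell is zero, `correction` is added to all four cells
    (Haldane-Anscombe correction) so the ratio stays finite.
    """
    cells = (exposed_cases, exposed_controls,
             not_exposed_cases, not_exposed_controls)
    if min(cells) < 0:
        raise ValueError("cell counts must be non-negative")
    if 0 in cells:
        cells = tuple(count + correction for count in cells)
    a, b, c, d = cells
    # (exposed cases x unexposed controls) / (unexposed cases x exposed controls)
    return (a * d) / (c * b)


# With the example data: (50 * 100) / (30 * 20) = 5000 / 600, about 8.33
print(f"Odds Ratio: {odds_ratio_2x2(50, 20, 30, 100):.2f}")

# A table with an empty cell still yields a finite estimate.
print(f"Corrected Odds Ratio: {odds_ratio_2x2(10, 0, 5, 20):.2f}")
```

The corrected value is biased toward 1 in small samples, so it is best treated as a fallback rather than a replacement for the plain formula.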

Alternatively, you can use the statsmodels library to calculate the odds ratio and confidence intervals:

 import numpy as np
 import statsmodels.api as sm
 
 # Rows: cases, controls; columns: exposed, not exposed
 table = np.array([[exposed_cases, not_exposed_cases], [exposed_controls, not_exposed_controls]])
 
 # Table2x2 exposes the odds ratio and its confidence interval (95% by default)
 result = sm.stats.Table2x2(table)
 odds_ratio = result.oddsratio
 ci_lower, ci_upper = result.oddsratio_confint()
 
 print(f"Odds Ratio: {odds_ratio}")
 print(f"95% Confidence Interval: ({ci_lower}, {ci_upper})")

Step 4: Calculate Confidence Interval for the Odds Ratio (Manually)

For a 95% confidence interval, use the formula: