Markov Chain Monte Carlo (MCMC) is a powerful computational technique used in Bayesian statistics to draw samples from a posterior distribution when direct sampling is difficult. It is particularly useful when the posterior distribution is complex and does not have a closed-form solution.
Key Concepts of MCMC
Bayesian Inference & Posterior Distribution
In Bayesian statistics, we estimate parameters using Bayes' Theorem:
$$
P(\theta \mid D)=\frac{P(D \mid \theta) P(\theta)}{P(D)}
$$
where:
- $P(\theta \mid D) \rightarrow$ Posterior (distribution of parameters given data)
- $P(D \mid \theta) \rightarrow$ Likelihood (how well the data fits a parameter)
- $P(\theta) \rightarrow$ Prior (belief about the parameter before data)
- $P(D) \rightarrow$ Evidence (normalization constant)
- Since the denominator $P(D)$ is often intractable, MCMC helps approximate the posterior without needing to compute it explicitly.
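Because $P(D)$ is constant in $\theta$, everything MCMC needs is the unnormalized numerator $P(D \mid \theta)\,P(\theta)$, usually handled on the log scale. A minimal sketch of that idea (the normal likelihood, vague normal prior, and the `data` values here are illustrative assumptions, not from the original):

```python
import numpy as np
from scipy.stats import norm

data = np.array([4.8, 5.1, 5.3, 4.9])  # hypothetical observations

def log_unnormalized_posterior(theta):
    # log P(D | theta): normal likelihood with known scale 1.0 (an assumption)
    log_likelihood = norm.logpdf(data, loc=theta, scale=1.0).sum()
    # log P(theta): vague normal prior centered at 0
    log_prior = norm.logpdf(theta, loc=0.0, scale=10.0)
    # The evidence P(D) is a constant in theta, so it is dropped entirely.
    return log_likelihood + log_prior
```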
Markov Chains
- A Markov Chain is a stochastic process where the next state depends only on the current state.
- MCMC constructs a Markov chain whose stationary distribution is the target posterior.
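To make the stationary-distribution idea concrete, here is a toy two-state chain (the transition probabilities are invented for illustration): iterating the transition matrix drives any starting distribution to the same fixed point, which is exactly the property MCMC engineers into its chains.

```python
import numpy as np

# Toy 2-state Markov chain: row i gives the next-state probabilities from state i.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

dist = np.array([1.0, 0.0])  # start fully in state 0
for _ in range(50):
    dist = dist @ P          # take one step of the chain

print(dist)  # ≈ [0.833, 0.167]: the stationary distribution, regardless of the start
```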
Monte Carlo Sampling
- The Monte Carlo part refers to approximating quantities of interest (means, intervals, probabilities) with averages over random samples.
- Over many iterations, the collected samples approximate the true posterior.
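A quick illustration of the Monte Carlo step (the standard normal below stands in for a posterior; in practice the draws come from the chain itself): any posterior summary becomes a simple average over samples.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=100_000)  # stand-in for posterior draws

print("mean     :", samples.mean())           # approximates E[theta]
print("P(th > 1):", (samples > 1.0).mean())   # approximates P(theta > 1) ≈ 0.159
```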
Popular MCMC Algorithms
- Metropolis-Hastings Algorithm
- A general-purpose MCMC method that requires only the posterior up to its normalizing constant.
- Steps:
  1. Start with an initial guess $\theta_0$.
  2. Propose a new sample $\theta^*$ from a proposal distribution $q\left(\theta^* \mid \theta\right)$.
  3. Compute the acceptance probability (for a symmetric proposal, the $q$ terms cancel and only the unnormalized posterior ratio remains):
$$
A=\min \left(1, \frac{P\left(D \mid \theta^*\right) P\left(\theta^*\right)}{P(D \mid \theta) P(\theta)}\right)
$$
  4. Accept $\theta^*$ with probability $A$; otherwise stay at $\theta$.
  5. Repeat until convergence.
- Gibbs Sampling
- A special case of Metropolis-Hastings in which each parameter is drawn from its full conditional distribution, so every proposal is accepted.
- Efficient when the full conditionals are easy to sample from, as in many conjugate multi-parameter models; see the sketch after this list.
- Hamiltonian Monte Carlo (HMC)
- Uses gradient information about the log-posterior to propose distant states that are still likely to be accepted.
- Commonly used in modern Bayesian computing (e.g., Stan, PyMC3); a usage sketch follows the Gibbs example below.
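As referenced in the Gibbs bullet above, a minimal sketch for a toy target where both full conditionals are known in closed form (the standard bivariate normal with correlation $\rho = 0.8$ is an invented example):

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                          # correlation of the toy bivariate normal target
n_iter = 5_000
xs, ys = np.empty(n_iter), np.empty(n_iter)
x, y = 0.0, 0.0                    # arbitrary starting point

for i in range(n_iter):
    # Full conditionals of a standard bivariate normal:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    xs[i], ys[i] = x, y

print("empirical correlation:", np.corrcoef(xs, ys)[0, 1])  # ≈ 0.8
```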
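And the promised PyMC3 usage sketch (model, priors, and synthetic data are all illustrative assumptions; for continuous models, `pm.sample` defaults to NUTS, an adaptive HMC variant):

```python
import numpy as np
import pymc3 as pm

data = np.random.normal(5.0, 2.0, size=200)    # synthetic observations

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)   # prior on the mean
    sigma = pm.HalfNormal("sigma", sigma=5.0)  # prior on the standard deviation
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)
    trace = pm.sample(1000, tune=1000)         # gradient-based NUTS/HMC sampling
```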
🧠 Python Example: MCMC with Metropolis-Hastings
Here's an implementation to estimate the mean of a normal distribution using MCMC:
https://gist.github.com/viadean/8b70b20cd524951303047564eec8ba5e
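In case the gist is unavailable, a minimal self-contained sketch of the same task (the data-generating values, proposal width, and burn-in length are illustrative choices, not necessarily those of the gist):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(5.0, 1.0, size=100)  # synthetic data: true mean is 5.0

def log_posterior(mu):
    # Normal likelihood with known scale 1.0 and a flat prior (constant, omitted).
    return -0.5 * np.sum((data - mu) ** 2)

n_iter = 10_000
samples = np.empty(n_iter)
mu = 0.0                                          # initial guess theta_0
for i in range(n_iter):
    proposal = mu + rng.normal(0.0, 0.5)          # symmetric random-walk proposal
    log_A = log_posterior(proposal) - log_posterior(mu)
    if np.log(rng.uniform()) < log_A:             # accept with probability min(1, A)
        mu = proposal
    samples[i] = mu                               # on rejection, keep current state

burn_in = 1_000                                   # discard pre-convergence draws
print("posterior mean estimate:", samples[burn_in:].mean())
```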
Why Use MCMC?
- Works for Complex Posteriors → No need for analytical solutions
- Extends to Large Datasets → Stochastic-gradient variants are used in Bayesian deep learning
- Efficient for High-Dimensional Problems → HMC is widely used in probabilistic programming