Rejection Sampling and Importance Sampling

Statistical Computing for Data Analysis

Where we are in the course

So far, we have mostly relied on direct simulation: built-in random generators and inverse-CDF (probability integral transform) sampling.

But in practice, you often face one (or more) of these problems:

  1. You can evaluate a density \(f(x)\), but you cannot invert the CDF.
  2. The event you care about is rare, so naive simulation wastes most samples.

Today: two practical tools for these situations.

Outline

  1. Rejection sampling (to generate data when no inverse CDF is available)
  2. Importance sampling (to estimate expectations/probabilities efficiently, especially rare events)

A quick decision guide (preview)

Goal: Generate independent draws from a distribution → use rejection sampling.

Goal: Estimate \(E[h(X)]\) or \(P(X \in A)\), especially for rare events → use importance sampling.

Rejection sampling: when and why

When to use it

You want to simulate from a target density \(f(x)\), but:

  - the CDF has no closed-form inverse (so inverse-transform sampling is out), and/or
  - you can only evaluate \(f\) up to a normalizing constant, \(\tilde f(x) \propto f(x)\).

Key idea

Sample from an easier proposal distribution \(g(x)\), then accept a draw with some probability so that accepted values follow the target.

Rejection sampling: the algorithm

Assume:

  - you can sample from a proposal density \(g\),
  - you can evaluate the (possibly unnormalized) target \(\tilde f\), and
  - you know a constant \(M\) with \(\tilde f(x) \le M\,g(x)\) for all \(x\).

Algorithm:

  1. Draw \(Y \sim g\)
  2. Draw \(U \sim \text{Uniform}(0,1)\)
  3. Accept \(Y\) if \[ U \le \frac{\tilde f(Y)}{M g(Y)}. \]
  4. Repeat until you have \(n\) accepted draws

Practical note: The expected acceptance rate is \(\int \tilde f(x)\,dx \,/\, M\), which equals \(1/M\) when \(\tilde f\) is a normalized density. Either way, a smaller \(M\) (a tighter envelope) means a higher acceptance rate.

Rejection sampling example: a “weird” density with no inverse CDF

Target density on \([-2,2]\): \[ f(x) \propto \exp(-x^4), \quad -2 \le x \le 2. \]

Because \(\tilde f(x) = \exp(-x^4) \le 1\) and \(g(x)=1/4\), we can use \(M=4\).
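As a quick check of the envelope condition (a sketch, assuming the same \(\tilde f\) and uniform proposal), we can verify \(\tilde f(x) \le M\,g(x)\) numerically on a grid:

```r
f_tilde <- function(x) exp(-x^4)          # unnormalized target
g_dens  <- function(x) dunif(x, -2, 2)    # proposal density, 1/4 on [-2, 2]
M <- 4

x_grid <- seq(-2, 2, length.out = 1e4)
all(f_tilde(x_grid) <= M * g_dens(x_grid))   # TRUE: the envelope holds everywhere
```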

Rejection sampler in base R

set.seed(4279)

f_tilde <- function(x) exp(-x^4)                # unnormalized target
g_samp  <- function(n) runif(n, min=-2, max=2)  # proposal sampler
g_dens  <- function(x) dunif(x, min=-2, max=2)  # proposal density

rej_samp <- function(n, M = 4) {
  out <- numeric(0)
  n_try <- 0

  while (length(out) < n) {
    y <- g_samp(1)
    u <- runif(1)
    n_try <- n_try + 1

    # acceptance probability = f_tilde(y) / (M * g(y))
    # Here it simplifies to exp(-y^4).
    if (u <= f_tilde(y) / (M * g_dens(y))) out <- c(out, y)
  }

  list(draws = out, acceptance_rate = n / n_try)
}

sim <- rej_samp(5000)
sim$acceptance_rate
## [1] 0.4593477

Visual check: histogram vs (scaled) density

x <- seq(-2, 2, length.out = 400)

hist(sim$draws, breaks = 40, freq = FALSE, col = "lightgray",
     main = "Rejection Sampling: draws from f(x) ∝ exp(-x^4)",
     xlab = "x")

# Overlay the normalized density f(x) = f_tilde(x) / Z, where Z is the
# normalizing constant computed by numerical integration
Z <- integrate(f_tilde, -2, 2)$value
curve(f_tilde(x) / Z, add = TRUE, lwd = 2, col = "blue")
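Beyond the eyeball test, we can compare a Monte Carlo probability against its numerically integrated value (a self-contained sketch that regenerates draws; since \(M g(y) = 1\) here, the acceptance probability is just \(\exp(-y^4)\)):

```r
set.seed(1)
f_tilde <- function(x) exp(-x^4)
Z <- integrate(f_tilde, -2, 2)$value            # normalizing constant

# Vectorized rejection step: accept with probability f_tilde(y) / (M * g(y)) = exp(-y^4)
y <- runif(2e5, -2, 2)
u <- runif(2e5)
draws <- y[u <= f_tilde(y)]

p_true <- integrate(f_tilde, -1, 1)$value / Z   # P(|X| <= 1) by numerical integration
p_hat  <- mean(abs(draws) <= 1)                 # same probability from the draws
c(true = p_true, estimate = p_hat)
```

The two numbers should agree to about two decimal places given the sample size.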

Rejection sampling: practical tips and pitfalls

Choosing the proposal \(g\)

  - \(g\) must be positive everywhere \(f\) is (its support must cover the target's).
  - \(g\) should be easy to sample from and roughly match the shape of \(f\), so the envelope constant \(M\) can be small.

Red flags in practice

  - The acceptance rate is tiny (equivalently, \(M\) is large): most proposals are wasted.
  - No finite \(M\) exists (e.g., \(f\) has heavier tails than \(g\)): the method is invalid, not just slow.

Takeaway: Rejection sampling is great for generating data, but can become inefficient in higher dimensions or with poor proposals.
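The dimensionality warning can be made concrete with a toy example (a sketch, not from the lecture): drawing uniformly from the unit ball by proposing uniformly from the enclosing cube. The acceptance rate is the ball-to-cube volume ratio, which collapses as the dimension grows.

```r
set.seed(1)
# Acceptance rate when proposing from the cube [-1, 1]^d to hit the unit ball
accept_rate <- function(d, n = 1e5) {
  x <- matrix(runif(n * d, -1, 1), nrow = n)   # n proposals in d dimensions
  mean(rowSums(x^2) <= 1)                      # fraction landing inside the ball
}
rates <- sapply(c(2, 5, 10), accept_rate)
rates   # roughly 0.79, 0.16, 0.0025: by d = 10, over 99% of proposals are wasted
```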

Importance sampling: when and why

When to use it

You want to estimate an expectation or probability under a target distribution, e.g. \(\mu = \mathbb{E}_f[h(X)]\) or \(p = P_f(X \in A)\),

and naive simulation is inefficient, especially if:

  - the event \(A\) is rare under \(f\), or
  - \(h(x)\) is large only where \(f\) puts very little mass.

Key idea

Simulate from a different distribution that visits “important” regions more often, and then reweight.

Importance sampling: the rare-event problem

Suppose we care about a very rare event, like the probability that a system fails, which might be on the order of 1 in 100,000.

If we simulate normally (from the usual model), almost every simulated run is a non-failure.

That means:

  - almost all of our samples contribute nothing to the estimate, and
  - with moderate \(n\), the estimated probability can come out exactly zero, with huge relative error.
Importance sampling: the basic idea

Instead of sampling under “typical conditions,” we:

  1. Intentionally over-sample high-risk conditions where failures are more likely

  2. Down-weight those high-risk samples so we don’t exaggerate the overall risk

So we get: more informative samples per run, an estimate that is still unbiased, and (with a good choice of sampler) far lower variance.

A helpful mental picture

Think of looking for needles in a haystack: instead of combing through the whole haystack, we search a smaller pile where needles are much more common. But after we collect needles from the smaller pile, we correct for the fact that we looked in a biased place by applying weights.

Importance sampling: what is a weight?

A weight is a correction factor that says:

“How much should this sample count when estimating the original probability?”

You can think of weights as undoing the bias introduced by “oversampling risk.”

Our Goal: estimate an average or probability under the original model \(f(x)\): \[ \mu = \mathbb{E}_{f}[h(X)]. \]

We simulate from a different model \(g(x)\) and correct with weights: \[ \mu = \int h(x)\,f(x)\,dx = \int h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx = \mathbb{E}_{g}\left[h(X)\,\underbrace{\frac{f(X)}{g(X)}}_{\text{weight}}\right]. \]

That’s all: sample from \(g\), then reweight to estimate under \(f\).
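Before the rare-event application, here is a minimal sanity check of the identity (a sketch with arbitrary choices): take \(f = N(0,1)\), \(h(x) = x^2\), and proposal \(g = N(0, 2^2)\), so the true answer is \(\mathbb{E}_f[X^2] = 1\).

```r
set.seed(1)
n_demo <- 1e5
x <- rnorm(n_demo, mean = 0, sd = 2)    # sample from g = N(0, 4)
w <- dnorm(x, 0, 1) / dnorm(x, 0, 2)    # weights f(x) / g(x)
est <- mean(x^2 * w)                    # importance-sampling estimate of E_f[X^2]
est                                     # close to 1
```

Note that no draw here comes from \(f\); the weights alone carry the correction.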

Practical guidance: choosing the “high-risk sampler”

Your “high-risk” sampling distribution should:

  - cover every region the original model can reach (the support of \(g\) must contain that of \(f\)),
  - put noticeably more mass on the region that drives the quantity of interest (e.g., the failure region), and
  - stay easy to sample from and easy to evaluate.

Rule of thumb for tail events: shift the proposal so the threshold becomes a typical value; for example, to estimate \(P(Z > 4)\) under \(N(0,1)\), sample from \(N(4, 1)\).

Common pitfall (what can go wrong)

Bad choice: your high-risk sampler avoids some region where the original model can go.

Result: your samples never land in that region, its contribution is silently dropped, and the estimate is biased, with no warning or error.

In practice, you should always check:

  - that \(g(x) > 0\) wherever \(h(x)\,f(x) \neq 0\), and
  - the distribution of the weights: a handful of huge weights dominating the sum is a bad sign.
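To make the failure mode concrete, here is a deliberately bad proposal (an illustrative sketch, not from the lecture): we estimate \(\mathbb{E}[Z^2] = 1\) for \(Z \sim N(0,1)\) using a \(\text{Uniform}(-1,1)\) proposal, whose support misses the tails entirely.

```r
set.seed(1)
n_bad <- 1e5
x_bad <- runif(n_bad, -1, 1)                  # proposal puts zero mass on |x| > 1
w_bad <- dnorm(x_bad) / dunif(x_bad, -1, 1)   # weights f / g (only defined on (-1, 1))
est_bad <- mean(x_bad^2 * w_bad)
est_bad                                       # about 0.20, far below the truth of 1
```

The code runs without any warning: nothing flags that the tail contribution to \(\mathbb{E}[Z^2]\) was simply never sampled.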

Importance sampling example: estimating a rare event probability

Target: \(Z \sim N(0,1)\). We want \[ p = P(Z > 4). \]

This is rare (about 3e-5), so with naive sampling you might see zero events unless \(n\) is huge.

Proposal: \(Z \sim N(\mu, 1)\) with \(\mu=4\), so we generate more tail values near 4.

Naive Monte Carlo vs importance sampling

set.seed(4279)
n <- 50000
threshold <- 4

# True value (for checking only)
p_true <- 1 - pnorm(threshold)

# 1) Naive MC: Z ~ N(0,1)
z_naive <- rnorm(n)
p_hat_naive <- mean(z_naive > threshold)

# 2) Importance sampling: sample from N(mu,1)
mu <- 4
z_is <- rnorm(n, mean = mu, sd = 1)

# weights: f/g = dnorm(z;0,1) / dnorm(z;mu,1)
w <- dnorm(z_is, mean = 0, sd = 1) / dnorm(z_is, mean = mu, sd = 1)

p_hat_is <- mean((z_is > threshold) * w)

c(p_true = p_true, naive = p_hat_naive, importance = p_hat_is)
##       p_true        naive   importance 
## 3.167124e-05 2.000000e-05 3.158224e-05
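To quantify the efficiency gain (a self-contained sketch reusing the same seed and settings), we can compare the estimated standard errors of the two estimators:

```r
set.seed(4279)
n <- 50000
threshold <- 4
mu <- 4

# Naive MC: the estimator averages a rare 0/1 indicator
ind_naive <- rnorm(n) > threshold
se_naive  <- sd(ind_naive) / sqrt(n)

# Importance sampling: the estimator averages the weighted indicator
z  <- rnorm(n, mean = mu)
wt <- (z > threshold) * dnorm(z) / dnorm(z, mean = mu)
se_is <- sd(wt) / sqrt(n)

c(se_naive = se_naive, se_is = se_is, var_ratio = (se_naive / se_is)^2)
```

The variance ratio is the factor by which importance sampling shrinks the Monte Carlo variance for this problem; here it is orders of magnitude.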

A quick diagnostic: effective sample size (ESS)

Importance sampling can fail if weights are extremely variable.

A simple diagnostic is the effective sample size: \[ \text{ESS} = \frac{\left(\sum_i w_i\right)^2}{\sum_i w_i^2}. \]

ess <- (sum(w)^2) / sum(w^2)
c(n = n, ESS = ess, ESS_fraction = ess/n)
##            n          ESS ESS_fraction 
## 5.000000e+04 6.514954e+00 1.302991e-04
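One caveat (an added note, not from the lecture): the tiny ESS above is driven by a few enormous weights attached to draws below the threshold, and those draws contribute nothing to the indicator estimator. For draws with \(z > 4\), the weight \(e^{8 - 4z}\) is bounded above by \(e^{-8}\), so recomputing the ESS over only the tail draws that actually enter the sum paints a much healthier picture. The sketch below regenerates comparable draws from the same \(N(4,1)\) proposal:

```r
set.seed(4279)
n2 <- 50000
z2 <- rnorm(n2, mean = 4)               # fresh draws from the N(4, 1) proposal
w2 <- dnorm(z2) / dnorm(z2, mean = 4)   # importance weights f / g

w_tail <- w2[z2 > 4]                    # only these enter the indicator estimator
ess_tail <- sum(w_tail)^2 / sum(w_tail^2)
c(n_tail = length(w_tail), ESS_tail = ess_tail)
```

The lesson: compute weight diagnostics on the weights that matter for your estimand; the raw-weight ESS can be far too pessimistic for tail indicators.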