Keywords: variance reduction | stratification


%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
print("Finished Setup")
Finished Setup

The key idea in stratification is to split the domain on which we wish to calculate an expectation or integral into strata. Then, on each of these strata, we calculate the sub-integral as an expectation separately, using whichever method is appropriate for that stratum and gives us the lowest variance. These expectations are then combined to get the final value.

In other words, we can achieve better sampling in the regions that need it by moving away from a one-size-fits-all sampling scheme. One way to think about it: regions where the integrand varies a lot may need more samples, while smoother regions can make do with fewer.

The diagram below illustrates this a bit. One could stratify by staying on the given grid, but the blue and yellow strata we have chosen might be better.

Now, instead of taking $N$ samples, we break the domain into $M$ strata and take $n_j$ samples in each stratum $j$, such that $N=\sum_j n_j$.

Partition the support $D$ into disjoint strata $D_j$, with $p_j = \int_{D_j} f(x)\,dx$ the probability of landing in stratum $j$ and $f_j(x) = f(x)/p_j$ the conditional pdf on $D_j$. Defining the per-stratum expectation

$$\mu_j = E_{f_j}[h(x)] = \int_{D_j} h(x)\, f_j(x)\, dx,$$

we can then define the stratified estimator of the overall expectation

$$\hat{\mu}_s = \sum_{j=1}^{M} p_j \hat{\mu}_j, \qquad \hat{\mu}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} h(x_{ij}),$$

which is an unbiased estimator of $\mu$. Its variance is

$$\mathrm{Var}[\hat{\mu}_s] = \sum_{j=1}^{M} p_j^2 \frac{\sigma_j^2}{n_j},$$

where

$$\sigma_j^2 = \int_{D_j} \left(h(x) - \mu_j\right)^2 f_j(x)\, dx$$

is the "population variance" of $h(x)$ with respect to pdf $f_j(x)$ in the region of support $D_j$.
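To make the estimator concrete, here is a minimal sketch on a toy problem (not from the text): estimating $E[h(X)]$ for $X \sim \mathrm{Uniform}(0,1)$ with $h(x)=x^2$, using $M$ equal-width strata and equal allocation.

```python
import numpy as np

rng = np.random.default_rng(0)

h = lambda x: x**2              # toy integrand; true mean is 1/3
M, n_j = 4, 250                 # M strata, n_j samples in each
edges = np.linspace(0.0, 1.0, M + 1)
p = np.diff(edges)              # p_j = P(X in D_j); all equal to 1/M here

mu_hat = 0.0
for j in range(M):
    # sampling uniformly within stratum j draws from the conditional pdf f_j
    x = rng.uniform(edges[j], edges[j + 1], size=n_j)
    mu_hat += p[j] * h(x).mean()   # p_j * per-stratum sample mean

print(mu_hat)   # close to the true value 1/3
```

Each stratum contributes $p_j \hat{\mu}_j$, exactly as in the formula above; the combined estimate is unbiased for any allocation of the $n_j$.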

Example

For a one-dimensional application we take $h(x) = x/(x^2+1)$ and integrate it over $[0,10]$. We break $[0,10]$ into $M$ strata, and for each stratum take $N/M$ samples with a uniform probability distribution. We compute the average within each stratum, and then combine them into the overall estimate.
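The integrand has a simple closed-form antiderivative, which gives an exact value to check the estimators against:

$$\int_0^{10} \frac{x}{x^2+1}\, dx \;=\; \frac{1}{2}\ln(x^2+1)\,\Big|_0^{10} \;=\; \frac{1}{2}\ln(101) \;\approx\; 2.3076.$$

This is what `intY` computes in the code below.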

See http://www.public.iastate.edu/~mervyn/stat580/Notes/s09mc.pdf

plt.figure(figsize=[14,8])
Y = lambda x: x/(x**2 + 1.0)             # integrand
intY = lambda x: np.log(x**2 + 1.0)/2.0  # antiderivative, for the exact answer

N = 10000     # total number of samples
Nrep = 1000   # replications, to estimate the variance of each method
Ntry = 1000   # pilot samples used to estimate per-stratum sigmas
Ns = 10       # number of strata

xmin = 0
xmax = 10
step = (xmax - xmin)/Ns
# analytic solution
Ic = intY(xmax) - intY(xmin)

Imc = np.zeros(Nrep)
Is = np.zeros(Nrep)
Is2 = np.zeros(Nrep)

## Plotting the original function, with stratum boundaries
plt.subplot(1,2,1)
x = np.linspace(0, 10, 100)
plt.plot(x, Y(x), label=u'$x/(x^2+1)$')
for j in range(Ns+1):
    plt.axvline(xmin + j*step, 0, 1, color='r', alpha=0.2)

# pilot run: estimate the standard deviation within each stratum
sigmas = np.zeros(Ns)
Utry = np.random.uniform(low=xmin, high=xmax, size=Ntry)
Ytry = Y(Utry)
Umin = 0
Umax = step
for reg in np.arange(0, Ns):
    localmask = (Utry >= Umin) & (Utry < Umax)
    sigmas[reg] = np.std(Ytry[localmask])
    Umin = Umin + step
    Umax = Umin + step
# allocate samples to strata in proportion to their sigmas
nums = np.ceil(N*sigmas/np.sum(sigmas)).astype(int)
print(sigmas, nums, np.sum(nums))
    
for k in np.arange(0, Nrep):
    # first, the plain Monte Carlo estimate
    U = np.random.uniform(low=xmin, high=xmax, size=N)
    Imc[k] = (xmax - xmin)*np.mean(Y(U))

    # stratified estimates over the Ns regions
    Umin = 0
    Umax = step
    Ii = 0    # equal allocation: N//Ns samples per stratum
    I2i = 0   # sigma-proportional allocation: nums[reg] samples per stratum
    for reg in np.arange(0, Ns):
        x = np.random.uniform(low=Umin, high=Umax, size=N//Ns)
        Ii = Ii + (Umax - Umin)*np.mean(Y(x))
        x2 = np.random.uniform(low=Umin, high=Umax, size=nums[reg])
        I2i = I2i + (Umax - Umin)*np.mean(Y(x2))
        Umin = Umin + step
        Umax = Umin + step

    Is[k] = Ii
    Is2[k] = I2i

plt.subplot(1,2,2)
plt.hist(Imc, 30, histtype='stepfilled', label=u'Normal MC', alpha=0.1)
plt.hist(Is, 30, histtype='stepfilled', label=u'Stratified', alpha=0.3)
plt.hist(Is2, 30, histtype='stepfilled', label=u'Stratified (sigmaprop)', alpha=0.5)

plt.legend()

print(np.std(Imc), np.std(Is), np.std(Is2))
[ 0.16531214  0.03284132  0.0292761   0.01743354  0.01195081  0.00838309
  0.00645184  0.00485085  0.0036279   0.00303405] [5839 1160 1034  616  423  297  228  172  129  108] 10006
0.012575540594 0.00507214579121 0.002644516884
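These empirical standard deviations can be checked against the variance formula derived earlier. The sketch below (assuming the same setup as the code above) estimates each $\sigma_j$ from a large pilot sample and plugs it into $\mathrm{Var}[\hat{\mu}_s] = \sum_j p_j^2 \sigma_j^2 / n_j$; since the estimators above carry a factor of $(x_{\max}-x_{\min})$, the variances are scaled accordingly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Plug pilot estimates of sigma_j into the variance formula
#   Var[mu_hat_s] = sum_j p_j^2 sigma_j^2 / n_j
# and compare against the plain-MC variance sigma^2 / N.
Y = lambda x: x/(x**2 + 1.0)
xmin, xmax, Ns, N = 0.0, 10.0, 10, 10000
edges = np.linspace(xmin, xmax, Ns + 1)
p = np.diff(edges)/(xmax - xmin)            # equal-width strata: p_j = 1/Ns

pilot = rng.uniform(xmin, xmax, size=200000)
sigma_all = np.std(Y(pilot))                # overall sigma, for plain MC
sigma_j = np.array([np.std(Y(pilot[(pilot >= edges[j]) & (pilot < edges[j+1])]))
                    for j in range(Ns)])

# both estimators include a factor of (xmax - xmin), so scale the variances
var_mc = (xmax - xmin)**2 * sigma_all**2 / N
var_strat = (xmax - xmin)**2 * np.sum(p**2 * sigma_j**2 / (N//Ns))
print(np.sqrt(var_mc), np.sqrt(var_strat))
```

The predicted standard deviations should land in the same ballpark as the empirical ones printed above, with the stratified one clearly smaller.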

(Figure: the integrand with stratum boundaries on the left; histograms of the three estimators on the right.)