
Summary

**Due Date:** Friday, February 9th, 2017 at 10am

Instructions:

Problem 1: Monte Carlo Integration

Let $X$ be a random variable with distribution described by the following pdf:

Let $h$ be the following function of $X$:

Compute $\mathbb{E}[h(X)]$ via Monte Carlo simulation using the following sampling methods:
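The pdf, the definition of $h$, and the list of sampling methods are not reproduced here, but every variant shares the same basic estimator: draw $x_1, \ldots, x_N$ from the distribution of $X$ and average the $h(x_i)$. A minimal sketch, with `sample_X` and `h` as hypothetical placeholders for the quantities defined in the problem:

import numpy as np

def mc_expectation(sample_X, h, N=100000, rng=None):
    """Plain Monte Carlo estimate of E[h(X)].

    sample_X : callable (rng, size) -> draws from the distribution of X
    h        : the function whose expectation we want
    """
    rng = rng or np.random.default_rng(0)
    x = sample_X(rng, N)       # x_1, ..., x_N ~ X
    return np.mean(h(x))       # (1/N) * sum_i h(x_i)

# Toy stand-in: X ~ Uniform(0, 1), h(x) = x^2, so E[h(X)] = 1/3
print(mc_expectation(lambda rng, n: rng.uniform(0, 1, n), lambda x: x**2))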

Problem 2: Variance Reduction

Part A

Compute the variance of each estimate of $\mathbb{E}[h(X)]$ obtained in Problem 1. What do you see?
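If the assignment does not prescribe a method, one straightforward way to estimate the variance of a Monte Carlo estimator is to repeat the whole simulation across independent replications and take the sample variance of the resulting estimates. A sketch, reusing the hypothetical `mc_expectation` from Problem 1:

import numpy as np

def estimator_variance(estimator, n_reps=500):
    """Sample variance of an estimator across independent replications."""
    estimates = np.array([estimator(np.random.default_rng(seed))
                          for seed in range(n_reps)])
    return estimates.var(ddof=1)

# e.g. variance of the plain MC estimate with N = 1000 samples per run:
# estimator_variance(lambda rng: mc_expectation(sample_X, h, N=1000, rng=rng))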

Part B (Stratified Sampling)

Often, a complex integral is easier to compute if you can break the domain of integration into pieces, with the integral simplifying on each piece; a sketch of the idea follows the reading list below.

Read more about Stratified Sampling:

  1. Monte Carlo Methods

  2. Variance Reduction Techniques Chapter
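To make the idea concrete, here is a sketch of stratified sampling for a toy case (assuming, purely for illustration, $X \sim \mathrm{Uniform}(0, 1)$ and $K$ equal-probability strata rather than whatever strata the problem calls for): estimate $\mathbb{E}[h(X)]$ separately on each piece of the domain, then combine the per-stratum means with weights equal to each stratum's probability.

import numpy as np

def stratified_mean(h, K=10, n_per_stratum=1000, rng=None):
    """Stratified Monte Carlo estimate of E[h(X)] for X ~ Uniform(0, 1).

    The unit interval is cut into K equal strata, each of probability 1/K,
    so the combined estimate is just the average of the per-stratum means.
    """
    rng = rng or np.random.default_rng(0)
    edges = np.linspace(0, 1, K + 1)
    means = [h(rng.uniform(lo, hi, n_per_stratum)).mean()
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.mean(means)

# Same toy integrand as before: E[X^2] = 1/3; the stratified estimate
# typically has noticeably lower variance than plain Monte Carlo.
print(stratified_mean(lambda x: x**2))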

Problem 3: Linear Regression

Consider the following base Regression class, which roughly follows the API of the Python package scikit-learn.

Our model is the multivariate linear model whose MLE solution (equivalently, minimization of the least-squares cost) was discussed in lecture:

$$y = X\beta + \epsilon$$

where $y$ is a length-$n$ vector, $X$ is an $n \times p$ matrix created by stacking the features for each data point, and $\beta$ is a length-$p$ vector of coefficients.

The class showcases the API:

`fit(X, y)`: Fits the linear model to `X` and `y`.

`get_params()`: Returns $\hat{\beta}$ for the fitted model. The parameters should be stored in a dictionary with keys “intercept” and “coef” that give us $\hat{\beta}_0$ and $\hat{\beta}_{1:}$. (The second value here is thus a numpy array of coefficient values.)

`predict(X)`: Predicts new values with the fitted model given `X`.

`score(X, y)`: Returns the $R^2$ value of the fitted model.

`set_params()`: Manually sets the parameters of the linear model.

class Regression(object):
    """Base class, roughly following the scikit-learn estimator API."""

    def __init__(self):
        self.params = dict()   # fitted parameters, keyed by name

    def get_params(self, k):
        return self.params[k]

    def set_params(self, **kwargs):
        for k, v in kwargs.items():   # dict.iteritems() is Python 2 only
            self.params[k] = v

    def fit(self, X, y):
        raise NotImplementedError()

    def predict(self, X):
        raise NotImplementedError()

    def score(self, X, y):
        raise NotImplementedError()

Part A: a class for Ordinary Least Squares

Inherit from this class to create an Ordinary Least Squares (OLS) linear regression class.

Its signature will look like this:

class OLS(Regression):

Implement `fit`, `predict`, and `score`. This will involve some linear algebra. (You might want to read up on pseudo-inverses before you directly implement the linear algebra from the lecture slides.)
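For instance, with an intercept column of ones prepended to $X$ (one common convention; the assignment may stack features differently), the least-squares solution $\hat{\beta} = (X^T X)^{-1} X^T y$ can be computed stably via the Moore-Penrose pseudo-inverse:

import numpy as np

def ols_solve(X, y):
    """Least-squares coefficients via the pseudo-inverse: beta_hat = pinv(X) y."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend intercept column
    return np.linalg.pinv(Xa) @ y                   # [intercept, coef_1, ..., coef_p]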

$R^2$ score

To implement `score`, use the definition below:

The $R^2$ score is defined as:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

where:

$$SS_{res} = \sum_{i} (y_i - \hat{y}_i)^2 \qquad \text{and} \qquad SS_{tot} = \sum_{i} (y_i - \bar{y})^2,$$

$y_i$ are the original data values, $\hat{y}_i$ are the predicted values, and $\bar{y}$ is the mean of the original data values.
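Putting the pieces together, one possible shape for the subclass (a sketch under the intercept-column convention above, not a reference solution; the key names follow the API description):

import numpy as np

class OLS(Regression):

    def fit(self, X, y):
        Xa = np.hstack([np.ones((X.shape[0], 1)), X])   # intercept column
        beta = np.linalg.pinv(Xa) @ y                   # least-squares fit
        self.set_params(intercept=beta[0], coef=beta[1:])
        return self

    def predict(self, X):
        return self.get_params('intercept') + X @ self.get_params('coef')

    def score(self, X, y):
        y_hat = self.predict(X)
        ss_res = np.sum((y - y_hat) ** 2)               # residual sum of squares
        ss_tot = np.sum((y - np.mean(y)) ** 2)          # total sum of squares
        return 1 - ss_res / ss_tot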

Part B: test your code

We’ll create a synthetic data set using the code below. (Read the documentation for `make_regression` to see what is going on.)

from sklearn.datasets import make_regression
import numpy as np

np.random.seed(99)
# 30 samples, 10 features (all informative), intercept (bias) 1, noise sd 2
X, y, coef = make_regression(n_samples=30, n_features=10, n_informative=10,
                             bias=1, noise=2, coef=True)
coef
array([ 76.6568183 ,  77.67682678,  63.78807738,  19.3299907 ,
        59.01638708,  53.13633737,  28.77629958,  10.01888939,
         9.25346811,  59.55220395])

Verify that your code approximately recovers these coefficients when you fit the model. Plot the predicted $y$ against the actual $y$. Also calculate the score using the same `X` and `y`. The usage will look something like:

lr = OLS()
lr.fit(X, y)
lr.get_params('coef')   # get_params is a method taking the key name
lr.predict(X)
lr.score(X, y)
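For the predicted-versus-actual plot, something along these lines works (a matplotlib sketch; the dashed reference line is an addition to make a perfect fit easy to read off):

import matplotlib.pyplot as plt

y_hat = lr.predict(X)
plt.scatter(y, y_hat)
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'k--')  # perfect-prediction line
plt.xlabel('actual y')
plt.ylabel('predicted y')
plt.show()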