Summary

Harvard University
Spring 2018
Instructors: Rahul Dave
**Due Date:** Friday, February 16th, 2018 at 10:00am

Instructions:

Problem 1: Optimization via Descent

Suppose you are building a pricing model for laying down telecom cables over a geographical region. Your model takes as input a pair of coordinates, $(x, y)$, and contains two parameters, $\lambda_1, \lambda_2$. Given a coordinate $(x, y)$ and the model parameters, the loss in revenue corresponding to the pricing model at location $(x, y)$ is described by a loss function $L(x, y; \lambda_1, \lambda_2)$.

Read the data contained in HW3_data.csv. This is a set of coordinates configured on the curve $y^2 - x^2 = -0.1$. Given the data, find parameters $\lambda_1, \lambda_2$ that minimize the net loss over the entire dataset.
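To fix ideas, here is a minimal gradient-descent sketch for minimizing the net loss over the dataset. The per-point loss below is a hypothetical stand-in (substitute the actual formula from the problem statement), and the gradient is approximated by central differences so the sketch works for any differentiable loss.

```python
import numpy as np

def point_loss(lam, x, y):
    """Hypothetical stand-in for the revenue-loss formula at one
    coordinate; replace with the expression from the problem statement."""
    return (lam[0] * x) ** 2 + (lam[1] * y) ** 2

def net_loss(lam, xs, ys):
    """Net loss summed over the entire dataset."""
    return sum(point_loss(lam, x, y) for x, y in zip(xs, ys))

def numerical_grad(f, lam, eps=1e-6):
    """Central-difference approximation to the gradient of f at lam."""
    grad = np.zeros_like(lam)
    for i in range(len(lam)):
        step = np.zeros_like(lam)
        step[i] = eps
        grad[i] = (f(lam + step) - f(lam - step)) / (2 * eps)
    return grad

def gradient_descent(xs, ys, lam0, lr=0.01, n_iter=500):
    """Full-batch gradient descent on the net loss."""
    lam = np.array(lam0, dtype=float)
    for _ in range(n_iter):
        lam = lam - lr * numerical_grad(lambda l: net_loss(l, xs, ys), lam)
    return lam
```

With the placeholder quadratic loss, the minimizer is $\lambda_1 = \lambda_2 = 0$; with the real loss the same loop applies unchanged.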

Part A

Part B

Part C

Compare the performance of stochastic gradient descent for the following learning rates: 1, 0.1, 0.001, 0.0001. Based on your observations, briefly describe the effect of the choice of learning rate on the performance of the algorithm.
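One way to structure the comparison is sketched below: plain stochastic gradient descent that updates on one randomly chosen data point at a time, run once per learning rate. The per-point gradient is again a hypothetical stand-in; swap in the gradient of the actual loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def point_grad(lam, x, y):
    """Gradient of a hypothetical per-point loss
    (lam_1 - x)^2 + (lam_2 - y)^2; replace with the real gradient."""
    return np.array([2 * (lam[0] - x), 2 * (lam[1] - y)])

def sgd(xs, ys, lam0, lr, n_epochs=50):
    """SGD: one parameter update per data point, shuffled each epoch."""
    lam = np.array(lam0, dtype=float)
    idx = np.arange(len(xs))
    for _ in range(n_epochs):
        rng.shuffle(idx)
        for i in idx:
            lam = lam - lr * point_grad(lam, xs[i], ys[i])
    return lam

# Run the requested learning rates on synthetic data to see the effect:
# too large a rate oscillates or diverges, too small converges slowly.
xs = rng.normal(size=100)
ys = rng.normal(size=100)
for lr in [1.0, 0.1, 0.001, 0.0001]:
    print(f"lr={lr}: lambda = {sgd(xs, ys, [0.0, 0.0], lr)}")
```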

Problem 2: SGD for Multinomial Logistic Regression on MNIST

The MNIST dataset is one of the classic datasets in Machine Learning and is often one of the first datasets against which new classification algorithms test themselves. It consists of 70,000 images of handwritten digits, each of which is 28x28 pixels. You will be using PyTorch to build a handwritten digit classifier that you will train and test with MNIST.

**The MNIST dataset (including a train/test split which you must use) is part of PyTorch in the torchvision module. The Lab will have details of how to load it.**

Your classifier must implement a multinomial logistic regression model (using softmax). It will take as input an array of pixel values from an image and output the image's most likely digit label (i.e. 0-9). You should think of the pixel values as the features of the input vector.
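As a sketch of what this model looks like in PyTorch: multinomial logistic regression is a single linear map from the 784 pixel features to 10 class scores. Note that `nn.CrossEntropyLoss` applies the softmax (as log-softmax) internally, so the module returns raw logits; the class name below is just an illustrative choice.

```python
import torch
import torch.nn as nn

class SoftmaxRegression(nn.Module):
    """Multinomial logistic regression: one linear layer, no hidden units."""
    def __init__(self, n_features=28 * 28, n_classes=10):
        super().__init__()
        self.linear = nn.Linear(n_features, n_classes)

    def forward(self, x):
        # Flatten [batch, 1, 28, 28] images into [batch, 784] feature vectors.
        return self.linear(x.view(x.size(0), -1))

model = SoftmaxRegression()
# argmax over the 10 logits gives the most likely digit label.
predicted = model(torch.zeros(1, 1, 28, 28)).argmax(dim=1)
```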

  1. Plot 10 sample images from the MNIST dataset (to develop intuition for the feature space).
  2. Construct a softmax formulation in PyTorch of multinomial logistic regression with Cross Entropy Loss.
  3. Train your model using SGD to minimize the cost function. Use a batch size of 64, a learning rate $\eta = 0.01$, and 10 epochs.
  4. Plot the cross-entropy loss on the training set as a function of iteration.
  5. What are the training and test set accuracies?
  6. Plot some (around 5) examples of misclassifications.
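A hedged sketch tying steps 2-5 together is below. The synthetic data at the bottom is only a smoke test with MNIST-shaped batches; for the actual assignment, pass the torchvision train/test DataLoaders instead, with the required batch size 64, $\eta = 0.01$, and 10 epochs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, lr=0.01, n_epochs=10):
    """Minimize cross-entropy with SGD; returns the per-iteration losses
    so they can be plotted against iteration number (step 4)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # log-softmax + negative log-likelihood
    losses = []
    for _ in range(n_epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
            losses.append(loss.item())
    return losses

def accuracy(model, loader):
    """Fraction of examples whose argmax logit matches the label (step 5)."""
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Smoke test on synthetic MNIST-shaped data with a learnable labeling rule.
torch.manual_seed(0)
X = torch.randn(256, 1, 28, 28)
y = (X.view(256, -1).sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
losses = train(model, loader, lr=0.01, n_epochs=10)
print(f"final loss {losses[-1]:.3f}, accuracy {accuracy(model, loader):.2f}")
```

For step 6, misclassified examples can be collected by comparing `model(images).argmax(dim=1)` to the labels on the test loader and plotting the offending images.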