{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mixtures and MCMC\n", "\n", "##### Keywords: supervised learning, semi-supervised learning, unsupervised learning, mixture model, gaussian mixture model, pymc3, label-switching, identifiability, normal distribution, pymc3 potentials\n", "\n", "We now study how to learn mixture models with MCMC. We have already done this for the Zero-Inflated Poisson model; for now we will stick to Gaussian mixture models." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import scipy as sp\n", "import matplotlib as mpl\n", "import matplotlib.cm as cm\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "pd.set_option('display.width', 500)\n", "pd.set_option('display.max_columns', 100)\n", "pd.set_option('display.notebook_repr_html', True)\n", "import seaborn as sns\n", "sns.set_style(\"whitegrid\")\n", "sns.set_context(\"poster\")\n", "import pymc3 as pm\n", "import theano.tensor as tt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mixture of 2 Gaussians: the Old Faithful data\n", "\n", "We start by considering waiting times from the Old Faithful geyser in Yellowstone National Park." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n",
"|   | eruptions | waiting |\n",
"|---|---|---|\n",
"| 0 | 3.600 | 79 |\n",
"| 1 | 1.800 | 54 |\n",
"| 2 | 3.333 | 74 |\n",
"| 3 | 2.283 | 62 |\n",
"| 4 | 4.533 | 85 |\n",
"