TensorFlow 101
| { | |
| "cells": [ | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "import matplotlib.cm as cm\n", | |
| "import matplotlib.pyplot as plt\n", | |
| "import seaborn as sns\n", | |
| "import tensorflow as tf\n", | |
| "%matplotlib inline" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "TensorFlow 101\n", | |
| "==============\n", | |
| "\n", | |
| "Some of the main objects in TensorFlow are \"placeholders\", which accept input values at run time; \"Variables\", which hold the intermediate results of calculations; and \"sessions\", in which predefined calculations are carried out." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "a = tf.placeholder('int64') # a will be an integer\n", | |
| "b = tf.placeholder('int64') # b will also be an integer\n", | |
| "c = tf.add(a, b) # c = a + b\n", | |
| "\n", | |
| "with tf.Session() as sess: # Start a TensorFlow \"session\"\n", | |
| " # Feed in the values of a and b and evaluate c\n", | |
| " result = sess.run(c, feed_dict={a: 2, b: 2})\n", | |
| " print(result)" | |
| ] | |
| }, | |
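The deferred-execution model above, where a graph is built first and evaluated later against a feed dictionary, can be mimicked in plain Python. This is only a toy analogy to illustrate the idea, not TensorFlow's actual machinery: each "node" is a function of a feed dictionary, and "running" a node evaluates it.

```python
# Toy sketch of TensorFlow-1.x-style deferred execution (an analogy only):
# building a node computes nothing; run() evaluates it against a feed dict.

def placeholder(name):
    # Looks its value up in the feed dict at run time
    return lambda feed: feed[name]

def add(x, y):
    # Builds a new node out of two existing nodes
    return lambda feed: x(feed) + y(feed)

def run(node, feed_dict):
    # Analogous to sess.run(node, feed_dict=...)
    return node(feed_dict)

a = placeholder('a')
b = placeholder('b')
c = add(a, b)

print(run(c, {'a': 2, 'b': 2}))  # -> 4
```

Nothing is computed when `c` is defined; only the `run` call with concrete values for `a` and `b` produces a result, just as `sess.run` does above.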
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Linear regression in TensorFlow\n", | |
| "===============================\n", | |
| "\n", | |
| "Let's try a slightly more interesting example: OLS linear regression. First, we set up some data outside of TensorFlow." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "train_X = np.array([7.33892413, 6.09065249, 3.6782235 , 1.5484478 , 5.2024495 ,\n", | |
| " 1.55973193, 1.90577005, 3.52333413, 1.89705273, 7.16303144,\n", | |
| " 6.69186969, 6.35116542, 8.40910037, 6.81104552, 9.0276311])\n", | |
| "train_Y = np.array([6.01218497, 7.15272823, 4.39711002, 0.78859037, 4.66163788,\n", | |
| " 2.1366047 , 0.75108498, 2.4262601 , 2.01357347, 5.1216737 ,\n", | |
| " 8.01817765, 7.07644285, 9.11215858, 8.64146407, 8.13432732])\n", | |
| "\n", | |
| "n_samples = train_X.shape[0]\n", | |
| "\n", | |
| "print(\"Number of samples: \", n_samples)\n", | |
| "plt.plot(train_X, train_Y, 'o');" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "At the end of the day, our model will be $y = Wx + b$, where we feed in the training values for $x$ and $y$, and the model will have to determine $W$ and $b$." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "X = tf.placeholder(\"float\")\n", | |
| "Y = tf.placeholder(\"float\")\n", | |
| "\n", | |
| "# Set model weights; initialize all weights as zero\n", | |
| "W = tf.Variable(tf.zeros([1], \"float\"), name=\"weight\")\n", | |
| "b = tf.Variable(tf.zeros([1], \"float\"), name=\"bias\")\n", | |
| "\n", | |
| "# Construct a linear model\n", | |
| "pred = tf.add(tf.multiply(X, W), b)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Recall that OLS works by minimizing the mean squared error, $\\frac{1}{2n} \\sum_{i=1}^n (W x_i + b - y_i)^2$; in TensorFlow, we explicitly specify this \"cost\" and how to go about minimizing it." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Mean squared error\n", | |
| "cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)\n", | |
| "\n", | |
| "# Gradient descent\n", | |
| "learning_rate = 0.001\n", | |
| "optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)\n", | |
| "\n", | |
| "# Initializing the variables\n", | |
| "init = tf.global_variables_initializer()" | |
| ] | |
| }, | |
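Each `GradientDescentOptimizer` step nudges `W` and `b` against the gradient of the cost. The update rule can be sketched by hand in NumPy; the synthetic data below (drawn from the line $y = 2x + 1$) is chosen here purely for illustration and is not the notebook's training data.

```python
import numpy as np

# Illustrative synthetic data from the line y = 2x + 1
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0

W, b = 0.0, 0.0
learning_rate = 0.01
n = len(x)

for _ in range(5000):
    err = W * x + b - y                # residuals of the current line
    # Gradients of the cost (1/2n) * sum(err^2) w.r.t. W and b
    dW = (err * x).sum() / n
    db = err.sum() / n
    W -= learning_rate * dW
    b -= learning_rate * db

print(W, b)  # approaches 2 and 1
```

This is exactly what the TensorFlow optimizer does for the MSE cost, except that TensorFlow derives the gradients automatically from the graph.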
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "With the variables in place, we can run the TensorFlow session and examine what it learns." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "with tf.Session() as sess:\n", | |
| " sess.run(init)\n", | |
| "\n", | |
| " # Run the optimization algorithm 1000 times\n", | |
| " for i in range(1000):\n", | |
| " sess.run(optimizer, feed_dict={X: train_X, Y: train_Y})\n", | |
| " \n", | |
| " # Visualize the results\n", | |
| " print('W = ', sess.run(W))\n", | |
| " print('b = ', sess.run(b))\n", | |
| " plt.plot(train_X, train_Y, 'o')\n", | |
| "\n", | |
| " # Make predictions for new values of x\n", | |
| " x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n", | |
| " predictions = sess.run(pred, feed_dict={X: x})\n", | |
| " plt.plot(x, predictions)\n", | |
| " plt.show()" | |
| ] | |
| }, | |
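Since OLS has a closed-form solution, the values that gradient descent converges toward can be sanity-checked outside TensorFlow with `np.linalg.lstsq` on the same training data:

```python
import numpy as np

train_X = np.array([7.33892413, 6.09065249, 3.6782235 , 1.5484478 , 5.2024495 ,
                    1.55973193, 1.90577005, 3.52333413, 1.89705273, 7.16303144,
                    6.69186969, 6.35116542, 8.40910037, 6.81104552, 9.0276311 ])
train_Y = np.array([6.01218497, 7.15272823, 4.39711002, 0.78859037, 4.66163788,
                    2.1366047 , 0.75108498, 2.4262601 , 2.01357347, 5.1216737 ,
                    8.01817765, 7.07644285, 9.11215858, 8.64146407, 8.13432732])

# Design matrix [x, 1]; solve min ||A @ [W, b] - y||^2 exactly
A = np.column_stack([train_X, np.ones_like(train_X)])
(W, b), *_ = np.linalg.lstsq(A, train_Y, rcond=None)
print('W =', W, 'b =', b)
```

With enough iterations and a small enough learning rate, the TensorFlow session above should report values close to this exact solution.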
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Logistic regression in TensorFlow\n", | |
| "=================================" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Next, let's look at a classification problem instead." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "group1 = np.random.multivariate_normal([-4, -4], 20*np.identity(2), size=40)\n", | |
| "group2 = np.random.multivariate_normal([4, 4], 20*np.identity(2), size=40)\n", | |
| "plt.plot(*group1.T, 'o')\n", | |
| "plt.plot(*group2.T, 'o')\n", | |
| "plt.show()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The plan is to find a line separating the two groups, so that when a new point in the plane arrives, we can say whether it should be labelled green or blue.\n", | |
| "\n", | |
| "We proceed as with linear regression with a few notable differences:\n", | |
| "- Inputs $X$ are now 2-dimensional, which is reflected in our placeholders and variables: we now have two weights, $w_1$ and $w_2$.\n", | |
| "- Our prediction function is now the *logistic* (or *sigmoid*) function $p(X) = 1/(1 + \\exp(-(w^T X + b)))$, which takes values between $0$ and $1$.\n", | |
| "\n", | |
| "At the end of the day, $p(X)$ represents the probability that the point $X$ should be labelled \"green\". Here's what its logarithm looks like:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "x = np.arange(-10, 11)\n", | |
| "plt.title('The logarithm of the logistic function')\n", | |
| "plt.xlabel('$w^T X + b$')\n", | |
| "plt.ylabel('$\\log p(X)$')\n", | |
| "plt.plot(x, np.log(1/(1+np.exp(-x))));" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Inputs are now two-dimensional and come with labels \"blue\" or \"green\" (represented by 0 or 1)\n", | |
| "X = tf.placeholder(\"float\", shape=[None, 2])\n", | |
| "labels = tf.placeholder(\"float\", shape=[None])\n", | |
| "\n", | |
| "# Set model weights and bias as before\n", | |
| "W = tf.Variable(tf.zeros([2, 1], \"float\"), name=\"weight\")\n", | |
| "b = tf.Variable(tf.zeros([1], \"float\"), name=\"bias\")\n", | |
| "\n", | |
| "# Predictor is now the logistic function\n", | |
| "pred = tf.sigmoid(tf.to_double(tf.reduce_sum(tf.matmul(X, W), axis=[1]) + b))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Just as we replaced our prediction function, we replace our cost function with the *cross-entropy*, $$-\\sum_{i=1}^n \\left[ l(X_i) \\log(p(X_i)) + (1-l(X_i))\\log(1-p(X_i)) \\right],$$ where $l(X_i)$ is the label of $X_i$ (which is $0$ or $1$)." | |
| ] | |
| }, | |
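The cross-entropy formula above can be written directly in NumPy. This is a small sketch of the same quantity; the `eps` clipping constant is added here only to keep `log` away from zero, and is not part of the notebook's TensorFlow code.

```python
import numpy as np

def cross_entropy(labels, p, eps=1e-12):
    # -sum( l*log(p) + (1-l)*log(1-p) ), with p clipped away from 0 and 1
    p = np.clip(p, eps, 1 - eps)
    return -np.sum(labels * np.log(p) + (1 - labels) * np.log(1 - p))

labels = np.array([0.0, 1.0, 1.0])
confident = np.array([0.01, 0.99, 0.99])   # nearly correct predictions
wrong     = np.array([0.99, 0.01, 0.01])   # nearly opposite predictions

print(cross_entropy(labels, confident))  # small loss
print(cross_entropy(labels, wrong))      # large loss
```

Confident correct predictions drive the loss toward zero, while confident wrong ones are punished heavily, which is exactly the behaviour gradient descent exploits.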
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Similarly, the OLS cost function from before is now replaced by cross-entropy\n", | |
| "cost = -tf.reduce_sum(tf.to_double(labels) * tf.log(pred) + (1-tf.to_double(labels)) * tf.log(1-pred))\n", | |
| "\n", | |
| "# Gradient descent\n", | |
| "learning_rate = 0.001\n", | |
| "optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)\n", | |
| "\n", | |
| "# Initializing the variables\n", | |
| "init = tf.global_variables_initializer()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We are now in a position to run our optimization and plot the resulting values of $p$." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "with tf.Session() as sess:\n", | |
| " # We stack our two groups of 2-dimensional points and label them 0 and 1 respectively\n", | |
| " train_X = np.vstack((group1, group2))\n", | |
| " train_labels = np.array([0.0] * 40 + [1.0] * 40)\n", | |
| "\n", | |
| " sess.run(init)\n", | |
| "\n", | |
| " # Run the optimization algorithm 1000 times\n", | |
| " for i in range(1000):\n", | |
| " sess.run(optimizer, feed_dict={X: train_X, labels: train_labels})\n", | |
| " \n", | |
| " # Plot the predictions: the values of p\n", | |
| " Xmin = np.min(train_X)-1\n", | |
| " Xmax = np.max(train_X)+1\n", | |
| " x = np.arange(Xmin, Xmax, 0.1)\n", | |
| " y = np.arange(Xmin, Xmax, 0.1)\n", | |
| " \n", | |
| " plt.plot(*group1.T, 'o')\n", | |
| " plt.plot(*group2.T, 'o')\n", | |
| " plt.xlim(Xmin, Xmax)\n", | |
| " plt.ylim(Xmin, Xmax)\n", | |
| " print('W = ', sess.run(W))\n", | |
| " print('b = ', sess.run(b))\n", | |
| " \n", | |
| " xx, yy = np.meshgrid(x, y)\n", | |
| " predictions = sess.run(pred, feed_dict={X: np.array((xx.ravel(), yy.ravel())).T})\n", | |
| " \n", | |
| " plt.title('Probability that model will label a given point \"green\"')\n", | |
| " plt.contour(x, y, predictions.reshape(len(x), len(y)), cmap=cm.BuGn, levels=np.arange(0.0, 1.1, 0.1))\n", | |
| " plt.colorbar()" | |
| ] | |
| } | |
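The whole pipeline can be cross-checked with a compact NumPy re-implementation of the same gradient-descent loop. This is a sketch of the same model, not the notebook's TensorFlow code; the fixed seed and the tighter covariance (4 instead of 20) are choices made here so the two groups are cleanly separable.

```python
import numpy as np

rng = np.random.default_rng(0)
group1 = rng.multivariate_normal([-4, -4], 4 * np.identity(2), size=40)
group2 = rng.multivariate_normal([4, 4], 4 * np.identity(2), size=40)
X = np.vstack((group1, group2))
labels = np.array([0.0] * 40 + [1.0] * 40)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
b = 0.0
learning_rate = 0.001

for _ in range(1000):
    p = sigmoid(X @ w + b)
    # Gradient of the cross-entropy w.r.t. (w, b)
    grad = p - labels
    w -= learning_rate * (X.T @ grad)
    b -= learning_rate * grad.sum()

accuracy = ((sigmoid(X @ w + b) > 0.5) == (labels > 0.5)).mean()
print('training accuracy:', accuracy)
```

The gradient `p - labels` is what TensorFlow derives automatically from the cross-entropy graph; writing it out by hand shows there is nothing mysterious in the session loop above.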
| ], | |
| "metadata": { | |
| "anaconda-cloud": {}, | |
| "kernelspec": { | |
| "display_name": "Python [Root]", | |
| "language": "python", | |
| "name": "Python [Root]" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.5.2" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |