Last active
July 21, 2017 12:26
MMMOG Exercise 5
| { | |
| "cells": [ | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import numpy as np\n", | |
| "from scipy.stats import chisquare" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Bot Detection with Bayes" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Observed data\n", | |
| "O = np.array([3,2,1,4,1,2,2,3,1])\n", | |
| "Mplayer = np.array([0.1, 0.2, 0.3, 0.4])\n", | |
| "Mbot = np.array([0.25, 0.25, 0.25, 0.25]) # uniform distribution\n", | |
| "\n", | |
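| "# count how often each class 1..4 occurs; explicit bin edges keep the\n", | |
| "# counts aligned with the model even if a class never appears in O\n", | |
| "Ohist, _ = np.histogram(O, bins=np.arange(1, len(Mplayer) + 2))" | |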
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Bayesian Approach\n", | |
| "\n", | |
| "Probability of the observation sequence under a model with class probabilities $p_i$: $P(O|B) = \\prod_{i=1}^4 p_i^{n_i}$, where $n_i$ denotes the number of occurrences of class $i$." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "P(O|B) = 3.814697265625e-06\n", | |
| "P(O|^B) = 2.880000000000001e-07\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "Pbot = np.prod(Mbot ** Ohist)\n", | |
| "Pplayer = np.prod(Mplayer ** Ohist)\n", | |
| "\n", | |
| "print(\"P(O|B) = {}\".format(Pbot))\n", | |
| "print(\"P(O|^B) = {}\".format(Pplayer))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Use Bayes' theorem to calculate $P(B|O)$ with the prior $P(B) = 0.01$:\n", | |
| "\n", | |
| "$P(B|O) = \\frac{P(O|B)P(B)}{P(O)}$\n", | |
| "\n", | |
| "$P(O) = P(O|B)P(B) + P(O|\\overline{B})P(\\overline{B})$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "P(O) = 3.232669726562501e-07\n", | |
| "P(B|O) = 0.11800454696253197\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "Pb = 0.01\n", | |
| "Po = Pbot * Pb + Pplayer * (1-Pb)\n", | |
| "\n", | |
| "PBo = (Pbot * Pb) / Po\n", | |
| "\n", | |
| "print(\"P(O) = {}\".format(Po))\n", | |
| "print(\"P(B|O) = {}\".format(PBo))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Statistical Test\n", | |
| "\n", | |
| "It is also possible (and some would say more useful) to use a frequentist approach to analyze the data. Here I decided to use the [Chi-squared test](https://en.wikipedia.org/wiki/Chi-squared_test). The p-value of the test is the probability of obtaining this result (or a more extreme one) under the given model.\n", | |
| "\n", | |
| "Hence, calculate the $\\chi^2$ statistic and the p-value for each model." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "p-bot = 0.748\n", | |
| "p-player = 0.051\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# perform chi-square test\n", | |
| "Xplayer, pplayer = chisquare(Ohist, Mplayer * O.size)\n", | |
| "Xbot, pbot = chisquare(Ohist, Mbot * O.size)\n", | |
| "\n", | |
| "# output p-value\n", | |
| "print(\"p-bot = {0:.3f}\".format(pbot))\n", | |
| "print(\"p-player = {0:.3f}\".format(pplayer))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
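| "With the calculated p-values we conclude that the null hypothesis $H_0$: \"data is generated by a player\" cannot be rejected at the common significance level of $\\alpha = 0.05$, since $p_{player} = 0.051 > \\alpha$. The margin is narrow, and because all expected counts are below 5 the chi-squared approximation is only rough for this sample." | |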
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.5.3" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |