fmoessbauer · July 21, 2017 12:26
diff --git a/solution5-1.ipynb b/solution5-1.ipynb
 {
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.stats import chisquare"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bot Detection with Bayes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Observed data\n",
    "O = np.array([3,2,1,4,1,2,2,3,1])\n",
    "Mplayer = np.array([0.1, 0.2, 0.3, 0.4])\n",
    "Mbot    = np.array([0.25, 0.25, 0.25, 0.25]) # uniform distribution\n",
    "\n",
    "# count occurrences of each class 1..4 (robust even if a class is never observed)\n",
    "Ohist = np.bincount(O, minlength=len(Mplayer) + 1)[1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bayesian Approach\n",
    "\n",
    "Probability of the observations under a model with class probabilities $p_i$: $P(O|B) = \\prod_{i=1}^4 p_i^{n_i}$ where $n_i$ denotes the number of occurrences of class $i$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "P(O|B)  = 3.814697265625e-06\n",
      "P(O|^B) = 2.880000000000001e-07\n"
     ]
    }
   ],
   "source": [
    "Pbot    = np.prod(Mbot    ** Ohist)\n",
    "Pplayer = np.prod(Mplayer ** Ohist)\n",
    "\n",
    "print(\"P(O|B)  = {}\".format(Pbot))\n",
    "print(\"P(O|^B) = {}\".format(Pplayer))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use Bayes' theorem to calculate $P(B|O)$ with the prior $P(B) = 0.01$:\n",
    "\n",
    "$P(B|O) = \\frac{P(O|B)P(B)}{P(O)}$\n",
    "\n",
    "$P(O) = P(O|B)P(B) + P(O|\\overline{B})P(\\overline{B})$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "P(O)   = 3.232669726562501e-07\n",
      "P(B|O) = 0.11800454696253197\n"
     ]
    }
   ],
   "source": [
    "Pb = 0.01\n",
    "Po = Pbot * Pb + Pplayer * (1-Pb)\n",
    "\n",
    "PBo = (Pbot * Pb) / Po\n",
    "\n",
    "print(\"P(O)   = {}\".format(Po))\n",
    "print(\"P(B|O) = {}\".format(PBo))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Statistical Test\n",
    "\n",
    "It is also possible (and some would say more useful) to use a frequentist approach for analyzing the data. Here I decided to use the [Chi-squared Test](https://en.wikipedia.org/wiki/Chi-squared_test). The p-value of the test denotes how likely it is to get this result (or a more extreme one) under the given model.\n",
    "\n",
    "Hence, calculate the $\\chi^2$ statistic and p-value for each model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p-bot    = 0.748\n",
      "p-player = 0.051\n"
     ]
    }
   ],
   "source": [
    "# perform chi-square test\n",
    "Xplayer, pplayer = chisquare(Ohist, Mplayer * O.size)\n",
    "Xbot,    pbot    = chisquare(Ohist, Mbot    * O.size)\n",
    "\n",
    "# output p-value\n",
    "print(\"p-bot    = {0:.3f}\".format(pbot))\n",
    "print(\"p-player = {0:.3f}\".format(pplayer))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the calculated p-values we can conclude that the null hypothesis $H_0:$ \"data is generated by a player\" cannot be rejected at the (common) significance level of $\\alpha = 0.05$, as $p_{player} = 0.051 > \\alpha$, although only by a narrow margin. The bot model cannot be rejected either, since $p_{bot} = 0.748 > \\alpha$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
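The Bayes step in `solution5-1.ipynb` multiplies the nine class probabilities directly. That is fine for this sample, but the product would underflow for long observation sequences. Below is a minimal sketch of the same posterior computed in log space, using only the standard library. The function names `log_likelihood` and `posterior_bot` are hypothetical and not part of the notebook; the counts `[3, 3, 2, 1]` are the histogram of `O`, and the sketch assumes every model probability is strictly positive.

```python
import math


def log_likelihood(counts, model):
    """log P(O|M) = sum_i n_i * log(p_i); assumes all p_i > 0."""
    return sum(n * math.log(p) for n, p in zip(counts, model))


def posterior_bot(counts, model_bot, model_player, prior_bot):
    """P(B|O) via Bayes' theorem, evaluated in log space (hypothetical helper)."""
    log_bot = log_likelihood(counts, model_bot) + math.log(prior_bot)
    log_player = log_likelihood(counts, model_player) + math.log(1.0 - prior_bot)
    # Subtract the larger term before exponentiating to avoid underflow.
    shift = max(log_bot, log_player)
    w_bot = math.exp(log_bot - shift)
    w_player = math.exp(log_player - shift)
    return w_bot / (w_bot + w_player)


counts = [3, 3, 2, 1]  # occurrences of classes 1..4 in O
post = posterior_bot(counts, [0.25] * 4, [0.1, 0.2, 0.3, 0.4], prior_bot=0.01)
print("P(B|O) = {:.6f}".format(post))
```

For the notebook's data this should agree with the posterior of about 0.118 printed by the fourth cell.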
diff --git a/solution5-2.ipynb b/solution5-2.ipynb
	{
	"cells": [
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"import numpy as np\n",
	"from scipy.stats import chisquare"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# Bot Detection with Bayes"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"# Observed data\n",
	"O = np.array([3,2,1,4,1,2,2,3,1])\n",
	"Mplayer = np.array([0.1, 0.2, 0.3, 0.4])\n",
	"Mbot = np.array([0.25, 0.25, 0.25, 0.25]) # uniform distribution\n",
	"\n",
	"# count occurrences of each class 1..4 (robust even if a class is never observed)\n",
	"Ohist = np.bincount(O, minlength=len(Mplayer) + 1)[1:]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Bayesian Approach\n",
	"\n",
	"Probability of the observations under a model with class probabilities $p_i$: $P(O|B) = \\prod_{i=1}^4 p_i^{n_i}$ where $n_i$ denotes the number of occurrences of class $i$."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"P(O|B) = 3.814697265625e-06\n",
	"P(O|^B) = 2.880000000000001e-07\n"
	]
	}
	],
	"source": [
	"Pbot = np.prod(Mbot ** Ohist)\n",
	"Pplayer = np.prod(Mplayer ** Ohist)\n",
	"\n",
	"print(\"P(O|B) = {}\".format(Pbot))\n",
	"print(\"P(O|^B) = {}\".format(Pplayer))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Use Bayes' theorem to calculate $P(B|O)$ with the prior $P(B) = 0.01$:\n",
	"\n",
	"$P(B|O) = \\frac{P(O|B)P(B)}{P(O)}$\n",
	"\n",
	"$P(O) = P(O|B)P(B) + P(O|\\overline{B})P(\\overline{B})$"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"P(O) = 3.232669726562501e-07\n",
	"P(B|O) = 0.11800454696253197\n"
	]
	}
	],
	"source": [
	"Pb = 0.01\n",
	"Po = Pbot * Pb + Pplayer * (1-Pb)\n",
	"\n",
	"PBo = (Pbot * Pb) / Po\n",
	"\n",
	"print(\"P(O) = {}\".format(Po))\n",
	"print(\"P(B|O) = {}\".format(PBo))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Statistical Test\n",
	"\n",
	"It is also possible (and some would say more useful) to use a frequentist approach for analyzing the data. Here I decided to use the [Chi-squared Test](https://en.wikipedia.org/wiki/Chi-squared_test). The p-value of the test denotes how likely it is to get this result (or a more extreme one) under the given model.\n",
	"\n",
	"Hence, calculate the $\\chi^2$ statistic and p-value for each model."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"p-bot = 0.748\n",
	"p-player = 0.051\n"
	]
	}
	],
	"source": [
	"# perform chi-square test\n",
	"Xplayer, pplayer = chisquare(Ohist, Mplayer * O.size)\n",
	"Xbot, pbot = chisquare(Ohist, Mbot * O.size)\n",
	"\n",
	"# output p-value\n",
	"print(\"p-bot = {0:.3f}\".format(pbot))\n",
	"print(\"p-player = {0:.3f}\".format(pplayer))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"With the calculated p-values we can conclude that the null hypothesis $H_0:$ \"data is generated by a player\" cannot be rejected at the (common) significance level of $\\alpha = 0.05$, as $p_{player} = 0.051 > \\alpha$, although only by a narrow margin. The bot model cannot be rejected either, since $p_{bot} = 0.748 > \\alpha$."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.5.3"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}
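As a cross-check of the `chisquare` cell, the test statistic $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$ can be computed by hand, with expected counts $E_i = p_i \cdot N$. With four classes there are three degrees of freedom, and for that case the chi-squared survival function has a closed form in terms of `math.erfc`. The sketch below uses only the standard library; `chi2_statistic` and `chi2_sf_3dof` are hypothetical helper names, not SciPy functions, and the closed form is valid only for three degrees of freedom.

```python
import math


def chi2_statistic(observed, expected):
    """Pearson statistic: sum over classes of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_3dof(x):
    """P(X >= x) for a chi-squared variable with exactly 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


counts = [3, 3, 2, 1]              # occurrences of classes 1..4 in O
n = sum(counts)                    # 9 observations
model_player = [0.1, 0.2, 0.3, 0.4]
model_bot = [0.25, 0.25, 0.25, 0.25]

stat_player = chi2_statistic(counts, [p * n for p in model_player])
stat_bot = chi2_statistic(counts, [p * n for p in model_bot])
p_player = chi2_sf_3dof(stat_player)
p_bot = chi2_sf_3dof(stat_bot)

print("p-bot    = {0:.3f}".format(p_bot))
print("p-player = {0:.3f}".format(p_player))
```

The printed p-values should match the notebook output to three decimals. Note that the expected counts here range from 0.9 to 3.6, so the chi-squared approximation is rough for a sample this small.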