@fmoessbauer
Last active July 21, 2017 12:26
MMMOG Exercise 5
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import chisquare"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bot Detection with Bayes"
]
},
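{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Observed data: a sequence of class labels in 1..4\n",
"O = np.array([3,2,1,4,1,2,2,3,1])\n",
"Mplayer = np.array([0.1, 0.2, 0.3, 0.4])  # class probabilities under the player model\n",
"Mbot = np.array([0.25, 0.25, 0.25, 0.25])  # uniform distribution under the bot model\n",
"\n",
"# count how often each class 1..4 occurs (index 0 of bincount is unused);\n",
"# unlike np.histogram, this stays correct if class 1 or 4 never occurs\n",
"Ohist = np.bincount(O, minlength=len(Mplayer) + 1)[1:]"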
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bayesian Approach\n",
"\n",
"Probability of the observation under a model with class probabilities $p_i$: $P(O|B) = \\prod_{i=1}^4 p_i^{n_i}$, where $n_i$ denotes the number of occurrences of class $i$."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"P(O|B) = 3.814697265625e-06\n",
"P(O|^B) = 2.880000000000001e-07\n"
]
}
],
"source": [
"Pbot = np.prod(Mbot ** Ohist)\n",
"Pplayer = np.prod(Mplayer ** Ohist)\n",
"\n",
"print(\"P(O|B) = {}\".format(Pbot))\n",
"print(\"P(O|^B) = {}\".format(Pplayer))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use Bayes' theorem to calculate $P(B|O)$ with the prior $P(B) = 0.01$:\n",
"\n",
"$P(B|O) = \\frac{P(O|B)P(B)}{P(O)}$\n",
"\n",
"$P(O) = P(O|B)P(B) + P(O|\\overline{B})P(\\overline{B})$"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"P(O) = 3.232669726562501e-07\n",
"P(B|O) = 0.11800454696253197\n"
]
}
],
"source": [
"Pb = 0.01\n",
"Po = Pbot * Pb + Pplayer * (1-Pb)\n",
"\n",
"PBo = (Pbot * Pb) / Po\n",
"\n",
"print(\"P(O) = {}\".format(Po))\n",
"print(\"P(B|O) = {}\".format(PBo))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Statistical Test\n",
"\n",
"It is also possible (and some would say more useful) to use a frequentist approach to analyze the data. Here I decided to use the [Chi-squared Test](https://en.wikipedia.org/wiki/Chi-squared_test). The p-value of the test denotes how likely it is to get this result (or a more extreme one) under the given model.\n",
"\n",
"Hence, calculate the $\\chi^2$ statistic and the p-value for each model."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p-bot = 0.748\n",
"p-player = 0.051\n"
]
}
],
"source": [
"# perform chi-square test\n",
"Xplayer, pplayer = chisquare(Ohist, Mplayer * O.size)\n",
"Xbot, pbot = chisquare(Ohist, Mbot * O.size)\n",
"\n",
"# output p-value\n",
"print(\"p-bot = {0:.3f}\".format(pbot))\n",
"print(\"p-player = {0:.3f}\".format(pplayer))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the calculated p-values we can conclude that the null hypothesis $H_0$: \"data is generated by a player\" cannot be rejected at the (common) significance level of $\\alpha = 0.05$, as $p_{player} = 0.051 > \\alpha$. The margin is narrow, however, and the bot model fits the observations far better ($p_{bot} = 0.748$)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
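
The Bayesian part of the notebook multiplies raw probabilities, which is fine for nine observations but would underflow for long observation sequences. The following is a minimal sketch of the same posterior computed in log space. The helper name `posterior_bot` and its parameters are hypothetical and not part of the notebook, and the sketch assumes class labels 1..k and strictly positive model probabilities.

```python
import numpy as np


def posterior_bot(observations, model_bot, model_player, prior_bot=0.01):
    """Return P(B|O) from log-likelihoods (hypothetical helper, not from the notebook)."""
    observations = np.asarray(observations)
    n_classes = len(model_bot)
    # Count occurrences n_i of each class 1..n_classes; index 0 is unused.
    counts = np.bincount(observations, minlength=n_classes + 1)[1:]
    # log P(O|model) + log prior = sum_i n_i * log(p_i) + log prior
    log_bot = np.sum(counts * np.log(model_bot)) + np.log(prior_bot)
    log_player = np.sum(counts * np.log(model_player)) + np.log(1.0 - prior_bot)
    # log P(O) via log-sum-exp, which avoids underflow of the raw products.
    log_evidence = np.logaddexp(log_bot, log_player)
    return float(np.exp(log_bot - log_evidence))


# Same data and models as in the notebook.
O = np.array([3, 2, 1, 4, 1, 2, 2, 3, 1])
Mplayer = np.array([0.1, 0.2, 0.3, 0.4])
Mbot = np.array([0.25, 0.25, 0.25, 0.25])

posterior = posterior_bot(O, Mbot, Mplayer)
print("P(B|O) = {:.4f}".format(posterior))
```

For the notebook's data this should reproduce the value printed above, P(B|O) ≈ 0.118. Working with sums of logarithms keeps the result stable when the number of observations grows.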