Skip to content

Instantly share code, notes, and snippets.

@KatsuyaITO
Last active November 20, 2016 06:22
Show Gist options
  • Select an option

  • Save KatsuyaITO/e45a75526e7cd4ea59eaad3ecb79c2b1 to your computer and use it in GitHub Desktop.

Select an option

Save KatsuyaITO/e45a75526e7cd4ea59eaad3ecb79c2b1 to your computer and use it in GitHub Desktop.
import pandas as pd
import numpy as np
from scipy import stats
from collections import namedtuple
BrunnerMunzelResult = namedtuple('BrunnerMunzelResult', ('statistic','pvalue'))
def brunner_munzel_test(x,y,alternative="two_sided",alpha=0.5):
"""
Computes the Brunner Munzel statistic
Missing values in `x` and/or `y` are discarded.
Parameters
----------
x : sequence
Input
y : sequence
Input
alternative : {greater, less, two_sided }
Returns
-------
statistic : float
The Brunner Munzel statistics
pvalue : float
Approximate p-value assuming a t distribution.
"""
x = np.ma.asarray(x).compressed().view(np.ndarray)
y = np.ma.asarray(y).compressed().view(np.ndarray)
ranks = stats.rankdata(np.concatenate([x,y]))
(nx, ny) = (len(x), len(y))
rankx = stats.rankdata(x)
ranky = stats.rankdata(y)
rank_mean1 = np.mean(ranks[0:nx])
rank_mean2 = np.mean(ranks[nx :nx+ny])
pst = (rank_mean2 - (ny + 1)/2)/nx
v1_set = [(i - j - rank_mean1 + (nx + 1)/2)**2 for (i,j) in zip(ranks[0:nx],rankx)]
v2_set = [(i - j - rank_mean2 + (ny + 1)/2)**2 for (i,j) in zip(ranks[nx :nx+ny] ,ranky)]
v1 = np.sum(v1_set)/(nx - 1)
v2 = np.sum(v2_set)/(ny - 1)
statistic = nx * ny * (rank_mean2 - rank_mean1)/(nx + ny)/np.sqrt(nx * v1 + ny * v2)
dfbm = ((nx * v1 + ny * v2)**2)/(((nx * v1)**2)/(nx - 1) + ((ny * v2)**2)/(ny - 1))
if ((alternative == "greater") | (alternative == "g")) :
prob = stats.t.cdf(statistic,dfbm)
elif ((alternative == "less") | (alternative == "l")) :
prob = 1-stats.t.cdf(statistic,dfbm)
else:
abst = np.abs(statistic)
prob = stats.t.cdf(abst,dfbm)
prob= 2 * min(prob, 1-prob)
return BrunnerMunzelResult(statistic,prob)
@KatsuyaITO
Copy link
Author

This function performs the Brunner-Munzel test for stochastic equality of two samples, which is also known as the Generalized Wilcoxon Test. NAs from the data are omitted.
Although R package "lawstat" provides functions for Brunner-Munzel test , scipy is not ready for this test.

References

  • Brunner, E. and Munzel, U. (2000) The Nonparametric Behrens-Fisher Problem: Asymptotic Theory and a Small-Sample Approximation, Biometrical Journal 42, 17-25. Neubert, K., Brunner, E. (2007) A Studentized Permutation Test for the Non-parametric

  • Behrens-Fisher Problem, Computational Statistics and Data Analysis 51, 5192-5204. Reiczigel, J., Zakarias, I. and Rozsa, L. (2005) A Bootstrap Test of Stochastic Equality of Two Populations, The American Statistician 59, 1-6.

  • lawstat: Tools for Biostatistics, Public Policy, and Law https://cran.r-project.org/web/packages/lawstat/index.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment