sunfoxy2k / target_encode
Created February 8, 2020 14:10 — forked from lmassaron/target_encode
Preprocessing scheme for high-cardinality categorical attributes
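As a rough illustration of the scheme named above, the sketch below blends each category's target mean with the global target mean, using the sigmoid weight described in the Micci-Barreca (2001) paper. The helper name `smoothed_category_means` and the toy data are hypothetical and not part of this gist. It assumes `k` acts as the sample-count threshold and `f` controls the steepness of the transition.

```python
import numpy as np
import pandas as pd


def smoothed_category_means(categories, target, k=1, f=1):
    """Hypothetical helper: shrink per-category target means toward the prior."""
    prior = target.mean()  # global target mean, used as the fallback estimate
    stats = target.groupby(categories).agg(["mean", "count"])
    # Sigmoid weight: close to 1 when a category has many more than k rows,
    # close to 0 when it has far fewer.
    weight = 1 / (1 + np.exp(-(stats["count"] - k) / f))
    return prior * (1 - weight) + stats["mean"] * weight


# Toy data (assumed for illustration): "a" is frequent, "b" appears once.
cats = pd.Series(["a", "a", "a", "a", "b"])
y = pd.Series([1, 1, 1, 0, 0])
encoded = smoothed_category_means(cats, y, k=2, f=1)
print(encoded)
```

In this toy case the frequent category `a` stays near its own mean, while the single-row category `b` is pulled most of the way toward the global mean.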
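import numpy as np
import pandas as pd


def add_noise(series, noise_level):
    # Scale each value by (1 + noise_level * z), where z is standard normal noise
    return series * (1 + noise_level * np.random.randn(len(series)))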


def target_encode(trn_series=None, tst_series=None, target=None, k=1, f=1, noise_level=0):
    """
    Encoding is computed as described in the following paper:
    Micci-Barreca, Daniele. "A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems." ACM SIGKDD Explorations Newsletter 3.1 (2001): 27-32.
    trn_series (pd.Series) : categorical feature in-sample