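import pandas as pd
from pyspark.sql import functions as f

# Cutoff: 60 days before the most recent transaction date.
heartbeat = transaction.select(f.max('created_date')).collect()[0][0] - pd.Timedelta('60 days')
# Users active since the cutoff, each tagged with a stratum (birth year + device).
all_users = (transaction.filter(f.col('created_date') > f.lit(heartbeat))
    .select('user_id', f.concat_ws('|', 'birth_year', 'device').alias('stratum'), 'home_city')
    .distinct()
)
# sampleBy expects a {stratum: fraction} dict; sample 50% of every stratum.
fractions = all_users.select('stratum').distinct().withColumn('frac', f.lit(0.5)).rdd.collectAsMap()
# 555 is the random seed; the split is approximately, not exactly, 50/50 per stratum.
group_A = all_users.sampleBy('stratum', fractions, 555)
# Every row not sampled into group A goes to group B.
group_B = all_users.subtract(group_A)
print('All users:', all_users.count(), 'Group A:', group_A.count(), 'Group B:', group_B.count())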