@grahamdaley
Last active August 29, 2015 14:17
Data Science Lesson 3 – Classwork
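#!/usr/bin/env python3
import pandas as pd

df = pd.read_csv('nytimes_in.csv')
group_cols = ['Age', 'Gender', 'Signed_In']
all_cols = group_cols + ['Clicks', 'Impressions']

# Mean clicks and mean impressions per group (flat, single-row header)
dfg = df[all_cols].groupby(group_cols).mean()

# Mean clicks / mean impressions, i.e. total clicks / total impressions per group
dfg['Click_Thru_Mean'] = dfg['Clicks'] / dfg['Impressions']
dfg = dfg.drop(['Clicks', 'Impressions'], axis=1)

# The group keys (the index) are written as the first three columns
dfg.to_csv('nytimes_aggregation.csv')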
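To see what the aggregation step produces, here is a minimal sketch that runs the same groupby on a small hand-made DataFrame instead of nytimes_in.csv. The rows and the helper name `click_thru_by_group` are hypothetical, and the sketch assumes numeric `Clicks` and `Impressions` columns. It uses `.mean()` so the result has flat column names. Because each group divides mean clicks by mean impressions over the same rows, the ratio equals total clicks over total impressions for that group.

```python
import pandas as pd


def click_thru_by_group(df, group_cols):
    """Hypothetical helper: mean Clicks / mean Impressions for each group."""
    dfg = df[group_cols + ['Clicks', 'Impressions']].groupby(group_cols).mean()
    dfg['Click_Thru_Mean'] = dfg['Clicks'] / dfg['Impressions']
    # reset_index() turns the group keys back into ordinary columns
    return dfg.drop(['Clicks', 'Impressions'], axis=1).reset_index()


# Made-up rows for illustration only (not taken from the NYT data)
sample = pd.DataFrame({
    'Age':         [30, 30, 30, 45],
    'Gender':      [0, 0, 1, 1],
    'Signed_In':   [1, 1, 1, 1],
    'Clicks':      [1, 0, 2, 1],
    'Impressions': [4, 6, 5, 10],
})

result = click_thru_by_group(sample, ['Age', 'Gender', 'Signed_In'])
print(result)
```

For the (30, 0, 1) group the two rows give (1 + 0) / (4 + 6) = 0.1, the same value as 0.5 / 5.0 from the means.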
#!/usr/bin/env python3
import pandas as pd
import matplotlib.pyplot as plt

# This file is the output of nytimes_aggregate.py, with its header corrected by hand
df = pd.read_csv('nytimes_agg_clean.csv')

# One scatter plot of click-through rate against each grouping factor;
# each point is one (Age, Gender, Signed_In) group
for factor in ['Signed_In', 'Age', 'Gender']:
    df.plot(kind='scatter', figsize=(18, 10), x=factor, y='Click_Thrus')
plt.show()
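The aggregated file has one row per (Age, Gender, Signed_In) combination, so plotting `Click_Thrus` against a single factor mixes in the other two. One way to look at a single factor on its own is to collapse the others first. The sketch below does that on a small in-memory table shaped like nytimes_agg_clean.csv; the rows and the helper `marginal_ctr` are hypothetical. It takes an unweighted mean of the per-group rates (the cleaned file carries no impression counts to weight by), so it is a rough view rather than the true overall rate for each level.

```python
import pandas as pd


def marginal_ctr(df, factor):
    """Hypothetical helper: unweighted mean of Click_Thrus for each level of `factor`."""
    return df.groupby(factor)['Click_Thrus'].mean()


# Made-up aggregated rows in the shape of nytimes_agg_clean.csv
agg = pd.DataFrame({
    'Age':         [30, 30, 45, 45],
    'Gender':      [0, 1, 0, 1],
    'Signed_In':   [0, 1, 1, 1],
    'Click_Thrus': [0.02, 0.01, 0.01, 0.03],
})

by_signed_in = marginal_ctr(agg, 'Signed_In')
print(by_signed_in)

# With matplotlib available, by_signed_in.plot(kind='bar') draws this as a bar chart
```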
#!/usr/bin/env python3
import requests

# Concatenate nyt1.csv to nyt30.csv into one CSV with a single header row.
# newline='' writes each line's own ending (kept by splitlines(True)) untranslated.
with open('nytimes_in.csv', 'w', newline='') as f:
    for file in range(1, 31):
        url = "http://stat.columbia.edu/~rachel/datasets/nyt{0}.csv".format(file)
        print("Retrieving:", url)
        response = requests.get(url)
        if response.ok:
            lines = response.text.splitlines(True)
            for i, line in enumerate(lines):
                # Only copy the header once
                if file == 1 or i > 0:
                    f.write(line)
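The download loop keeps the header row only when it comes from the first file, so if that first request fails the combined CSV has no header at all. A small variant, sketched here as a hypothetical pure function `concat_csv_texts` that works on already-downloaded text rather than on HTTP responses, keeps the header from whichever file arrives first. Keeping it free of network calls makes the header logic easy to check in isolation.

```python
def concat_csv_texts(texts):
    """Hypothetical helper: join CSV texts, keeping only the first header row seen.

    `texts` is an iterable of CSV file contents; None marks a failed download.
    """
    out = []
    header_written = False
    for text in texts:
        if text is None:  # skip files that could not be retrieved
            continue
        lines = text.splitlines(True)  # keep line endings, as the loop above does
        if not lines:
            continue
        if not header_written:
            out.append(lines[0])
            header_written = True
        out.extend(lines[1:])
        # Guard against a file with no trailing newline running into the next file
        if not out[-1].endswith(('\n', '\r')):
            out.append('\n')
    return ''.join(out)


# Made-up file contents; the first "download" failed
parts = [None, "Age,Clicks\n30,1\n", "Age,Clicks\n45,0\n"]
combined = concat_csv_texts(parts)
print(combined, end='')
```

In the download script this would amount to collecting `response.text` (or None when `response.ok` is false) for each URL and writing the joined result once.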