@RayOnFire
Created September 20, 2018 03:25
[Pandas with json fields] #Python #Pandas
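import json
import os

import pandas as pd


def load_df(csv_path='../input/train.csv', nrows=None):
    """Read the CSV and flatten each JSON-string column into '<column>.<field>' columns."""
    JSON_COLUMNS = ['device', 'geoNetwork', 'totals', 'trafficSource']
    df = pd.read_csv(csv_path,
                     converters={column: json.loads for column in JSON_COLUMNS},  # parse JSON text into dicts
                     dtype={'fullVisitorId': 'str'},  # Important!! Keep IDs as strings so they are not parsed as numbers
                     nrows=nrows)
    for column in JSON_COLUMNS:
        # Unlike other tutorials, this passes tolist(): passing the Series directly would not parse here,
        # possibly due to a version difference. pd.json_normalize requires pandas >= 1.0.
        column_as_df = pd.json_normalize(df[column].tolist())
        column_as_df.columns = [f"{column}.{subcolumn}" for subcolumn in column_as_df.columns]
        # Both frames share the default RangeIndex, so joining on the index lines the rows up
        df = df.drop(column, axis=1).merge(column_as_df, right_index=True, left_index=True)
    print(f"Loaded {os.path.basename(csv_path)}. Shape: {df.shape}")
    return df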
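To see what the read-then-flatten steps produce without the original dataset, here is a minimal, self-contained sketch run on a made-up two-row CSV. The sample data, the `flatten_json_columns` helper name and the reduced set of two JSON columns are hypothetical illustrations, not part of the original gist. It assumes pandas >= 1.0 for `pd.json_normalize`. It also shows why `fullVisitorId` is forced to a string: without that `dtype`, values such as `0001` are parsed as integers and the leading zeros are lost.

```python
import io
import json

import pandas as pd

# Hypothetical sample: one ID column plus two JSON-string columns.
# Inside a quoted CSV field, "" is an escaped double quote.
SAMPLE_CSV = '''fullVisitorId,device,totals
0001,"{""browser"": ""Chrome"", ""isMobile"": false}","{""hits"": ""3""}"
0002,"{""browser"": ""Safari"", ""isMobile"": true}","{""hits"": ""1"", ""bounces"": ""1""}"
'''


def flatten_json_columns(df, json_columns):
    """Hypothetical helper: replace each JSON column with '<column>.<field>' columns."""
    for column in json_columns:
        flat = pd.json_normalize(df[column].tolist())
        flat.columns = [f"{column}.{sub}" for sub in flat.columns]
        df = df.drop(column, axis=1).merge(flat, left_index=True, right_index=True)
    return df


json_columns = ['device', 'totals']
df = pd.read_csv(io.StringIO(SAMPLE_CSV),
                 converters={c: json.loads for c in json_columns},  # JSON text -> dict
                 dtype={'fullVisitorId': 'str'})                    # keep leading zeros
flat = flatten_json_columns(df, json_columns)
print(flat.columns.tolist())
print(flat['fullVisitorId'].tolist())

# Same IDs read without the dtype override: parsed as integers
ids_as_numbers = pd.read_csv(io.StringIO(SAMPLE_CSV), usecols=['fullVisitorId'])['fullVisitorId'].tolist()
print(ids_as_numbers)
```

Keys missing from a row (here `bounces` in the first row) become `NaN` in the flattened column, so every row ends up with the same set of columns.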