Ryan Paul Gozum (rdgozum)

rdgozum / faster_toPandas.py
Created July 8, 2021 14:27, forked from joshlk/faster_toPandas.py
PySpark faster toPandas using mapPartitions
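A minimal sketch of the idea in the title, assuming only pandas is installed. Spark is replaced by plain Python lists standing in for partitions, and the helper names `rows_to_frames` and `combine_partitions` are hypothetical. Each partition becomes its own pandas DataFrame (as a `mapPartitions` callback would produce on the executors), and the "driver" side concatenates the pieces and restores the column names.

```python
import pandas as pd


def rows_to_frames(rows):
    """Hypothetical stand-in for the mapPartitions callback: turn one partition's
    rows into a one-element list holding a single pandas DataFrame."""
    return [pd.DataFrame(list(rows))]


def combine_partitions(partitions, columns):
    """Simulate collecting one DataFrame per partition and concatenating them locally."""
    # Each callback result is a list, so flatten it the way collect() flattens
    # the per-partition iterators into one list of DataFrames.
    frames = [frame for part in partitions for frame in rows_to_frames(iter(part))]
    # ignore_index=True gives a fresh 0..n-1 index instead of one restarting per partition.
    result = pd.concat(frames, ignore_index=True)
    # The rows here are plain tuples, so pandas numbers the columns 0, 1, ...;
    # assign the real column names afterwards.
    result.columns = columns
    return result


# Two simulated partitions of (id, label) rows.
partitions = [[(1, "a"), (2, "b")], [(3, "c")]]
local = combine_partitions(partitions, ["id", "label"])
print(local)
```

Passing `ignore_index=True` to `pd.concat` is a choice made in this sketch; without it, each partition's index would restart at 0 and the combined DataFrame would contain duplicate index labels.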
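import pandas as pd


def _map_to_pandas(rdds):
    """Convert the rows of one partition into a single pandas DataFrame.

    The DataFrame is returned in a one-element list so that `mapPartitions`
    yields exactly one DataFrame per partition. This function is kept at
    module level because of pickling issues when Spark ships it to the executors.
    """
    return [pd.DataFrame(list(rdds))]


def toPandas(df, n_partitions=None):
    """
    Returns the contents of `df` as a local `pandas.DataFrame`, converting each partition to
    pandas in parallel with `mapPartitions`, which can be faster than calling `df.toPandas()`
    directly. The DataFrame is repartitioned first if `n_partitions` is passed.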