ClaudioGM / gist:ac1f31113b80ea3e1c567b648508db97
Created August 12, 2021 11:19 — forked from bortzmeyer/gist:1284249
The only simple way to do SSH in Python today is to use subprocess + OpenSSH...
#!/usr/bin/python
# All SSH libraries for Python are junk (2011-10-13).
# Too low-level (libssh2), too buggy (paramiko), too complicated
# (both), too poor in features (no use of the agent, for instance)
# Here is the right solution today:
import subprocess
import sys
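The preview above stops after the imports; a minimal sketch of the subprocess + OpenSSH approach the gist advocates (the helper names here are illustrative, not the gist's own — authentication is left to OpenSSH, so keys, the agent, and ~/.ssh/config all work for free):

```python
import subprocess

def build_ssh_argv(host, command):
    # Argv for the local OpenSSH client; OpenSSH itself handles
    # authentication (keys, agent, ~/.ssh/config).
    return ["ssh", host, command]

def run_remote(host, command):
    # Run `command` on `host` via OpenSSH and return its stdout.
    result = subprocess.run(build_ssh_argv(host, command),
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

Usage would look like `run_remote("www.example.org", "uname -a")`.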
import airflow.hooks.S3_hook

def upload_file_to_S3_with_hook(filename, key, bucket_name):
    hook = airflow.hooks.S3_hook.S3Hook('my_S3_conn')
    hook.load_file(filename, key, bucket_name)
ClaudioGM / airflow-s3-hook.py
Created May 29, 2021 19:07 — forked from bdnf/airflow-s3-hook.py
Creating an S3 hook in Apache Airflow
import datetime
import logging
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator
from airflow.hooks.S3_hook import S3Hook
def list_keys():
    # Body truncated in the gist preview; a plausible completion
    # (the connection id and Variable name are assumptions):
    hook = S3Hook(aws_conn_id='aws_credentials')
    bucket = Variable.get('s3_bucket')
    logging.info(f"Listing keys from s3://{bucket}")
    for key in hook.list_keys(bucket):
        logging.info(f"- s3://{bucket}/{key}")
ClaudioGM / pandas_s3_streaming.py
Created May 29, 2021 18:51
Streaming pandas DataFrame to/from S3 with on-the-fly processing and GZIP compression
import gzip
import io

import pandas as pd

def s3_to_pandas(client, bucket, key, header=None):
    # get key using boto3 client
    obj = client.get_object(Bucket=bucket, Key=key)
    gz = gzip.GzipFile(fileobj=obj['Body'])
    # load stream directly to DF
    return pd.read_csv(gz, header=header, dtype=str)

def s3_to_pandas_with_processing(client, bucket, key, header=None):
    obj = client.get_object(Bucket=bucket, Key=key)
    gz = gzip.GzipFile(fileobj=obj['Body'])
    # truncated in the preview; plausible completion: clean the
    # decompressed text (the replacement rule is an assumption), then parse
    lines = gz.read().decode('utf-8').replace('?', ' ')
    return pd.read_csv(io.StringIO(lines), header=header, dtype=str)
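The on-the-fly GZIP streaming idea above can be exercised locally without S3 or pandas; a stdlib-only sketch (function names are mine), where `stream_rows` decompresses from a file-like object exactly as `GzipFile(fileobj=obj['Body'])` does for an S3 response body:

```python
import csv
import gzip
import io

def compress_csv(rows):
    # Write rows as CSV and gzip the result entirely in memory.
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    return gzip.compress(text.getvalue().encode('utf-8'))

def stream_rows(gz_bytes):
    # Decompress on the fly from a file-like object and parse the
    # resulting text stream back into rows, without a temp file.
    gz = gzip.GzipFile(fileobj=io.BytesIO(gz_bytes))
    return list(csv.reader(io.TextIOWrapper(gz, encoding='utf-8')))
```

Round-tripping `stream_rows(compress_csv(rows))` returns the original rows, which is the same shape of pipeline the S3 functions implement.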