Ilya Kreymer ikreymer

@evertheylen
evertheylen / multi-gzip.ts
Created October 4, 2024 19:53
Drop-in replacement for `new DecompressionStream("gzip")`, which can handle multiple concatenated streams
// Drop-in replacement for `new DecompressionStream("gzip")`, which can handle multiple concatenated streams
// --> see https://github.com/whatwg/compression/issues/39
// Example usage:
// (await fetch("foobar.gz")).body.pipeThrough(multiGzipDecompressor()).pipeTo(...)
// Copyright 2024 Evert Heylen.
// License: MIT
// ------------------------------------------------
@anarchivist
anarchivist / pcdm-profile-template.md
Last active October 24, 2017 15:46
Profile template for PCDM-based models

title: PCDM Profile Template
author:

  • Mark A. Matienzo
  • Christina Harlow

date: 2016-10-20
profile:
  project: PCDM Profiles
  namespaces:
    pcdm: http://pcdm.org/models#
# -*- coding: utf-8 -*-
"""
common-crawl-cdx.py
A simple example program to analyze the Common Crawl index.
This is implemented as a single stream job which accesses S3 via HTTP,
so that it can easily be run from any laptop, but it could easily be
converted to an EMR job that processes the 300 index files in parallel.
@atomotic
atomotic / Readme.md
Last active September 9, 2022 09:39
Internet Archive Save Page Now