title: PCDM Profile Template
author:
- Mark A. Matienzo
- Christina Harlow
date: 2016-10-20
profile:
  project: PCDM Profiles
  namespaces:
    pcdm: http://pcdm.org/models#
// Drop-in replacement for `new DecompressionStream("gzip")`, which can handle multiple concatenated streams
// --> see https://github.com/whatwg/compression/issues/39
// Example usage:
// (await fetch("foobar.gz")).body.pipeThrough(multiGzipDecompressor()).pipeTo(...)
// Copyright 2024 Evert Heylen.
// License: MIT
// ------------------------------------------------
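The idea behind the header above, decoding a file made of several gzip streams concatenated back to back, can be sketched in Python. This is not the original JavaScript implementation, only a minimal illustration of the same technique using the standard `zlib` module; the function name `multi_gzip_decompress` is hypothetical.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def multi_gzip_decompress(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed bytes from chunks holding concatenated gzip members."""
    # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(31)
    for chunk in chunks:
        data = chunk
        while data:
            out = decomp.decompress(data)
            if out:
                yield out
            if decomp.eof:
                # This member is finished; any leftover bytes start the next one.
                data = decomp.unused_data
                decomp = zlib.decompressobj(31)
            else:
                # All input was consumed; wait for the next chunk.
                data = b""


if __name__ == "__main__":
    # Two independent gzip members written one after the other.
    payload = gzip.compress(b"hello ") + gzip.compress(b"world")
    pieces = [payload[i:i + 7] for i in range(0, len(payload), 7)]
    print(b"".join(multi_gzip_decompress(pieces)))
```

The sketch starts a fresh decompressor each time one member ends, which is the step a single-stream decoder skips. It does not handle trailing padding after the last member.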
# -*- coding: utf-8 -*-
"""
common-crawl-cdx.py
A simple example program to analyze the Common Crawl index.
This is implemented as a single stream job which accesses S3 via HTTP,
so that it can easily be run from any laptop, but it could easily be
converted to an EMR job which processed the 300 index files in parallel.
Put the function in your .zshrc or .bashrc, then call it with the URL to archive; it prints the snapshot URL:
~ ia-save http://twitter.com/atomotic
https://web.archive.org/web/20140702123925/http://twitter.com/atomotic
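The shell function itself is not shown here, so the following is only a rough Python sketch of what such a helper might do, not the original code. It assumes the Wayback Machine's `https://web.archive.org/save/<url>` endpoint and assumes the request ends up at the snapshot URL after redirects; the names `save_request_url`, `snapshot_url`, and `ia_save` are hypothetical.

```python
import urllib.request

# Assumed endpoints; the snapshot prefix matches the URL shape shown above.
SAVE_ENDPOINT = "https://web.archive.org/save/"
SNAPSHOT_PREFIX = "https://web.archive.org/web/"


def save_request_url(target: str) -> str:
    """Build the URL that asks the Wayback Machine to capture `target`."""
    return SAVE_ENDPOINT + target


def snapshot_url(timestamp: str, target: str) -> str:
    """Build a snapshot URL from a 14-digit timestamp and the original URL."""
    return f"{SNAPSHOT_PREFIX}{timestamp}/{target}"


def ia_save(target: str) -> str:
    """Request a capture and return the URL the request finally resolved to."""
    with urllib.request.urlopen(save_request_url(target)) as response:
        # Assumption: after redirects, the final URL is the new snapshot.
        return response.geturl()


if __name__ == "__main__":
    # Offline demonstration of the URL shapes only; no network request is made.
    print(save_request_url("http://twitter.com/atomotic"))
    print(snapshot_url("20140702123925", "http://twitter.com/atomotic"))
```

Calling `ia_save("http://twitter.com/atomotic")` would perform the actual request. The service's current behavior may differ from what is assumed here.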