Created
May 6, 2025 00:46
Create Reproducible ZIP Archives
import zipfile
import os
import sys


def create_reproducible_zip(source_dir, output_path, exclude_prefix=None):
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(source_dir):
            # os.walk yields subdirectories in filesystem order; sort in place for a stable traversal
            dirs.sort()
            for file in sorted(files):
                full_path = os.path.join(root, file)
                rel_path = os.path.relpath(full_path, source_dir)
                # Handle excluded paths (skip entries whose relative path starts with the prefix)
                if exclude_prefix and rel_path.startswith(exclude_prefix):
                    continue
                info = zipfile.ZipInfo(rel_path)
                info.date_time = (1980, 1, 1, 0, 0, 0)  # fixed timestamp
                info.create_system = 0  # avoid Unix metadata
                # writestr() takes the compression method from the ZipInfo,
                # which defaults to ZIP_STORED, so set it explicitly
                info.compress_type = zipfile.ZIP_DEFLATED
                with open(full_path, "rb") as f:
                    zf.writestr(info, f.read())


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python create_reproducible_zip.py <source_dir> <output_zip> [exclude_prefix]")
        sys.exit(1)
    source_dir = sys.argv[1]
    output_zip = sys.argv[2]
    exclude_prefix = sys.argv[3] if len(sys.argv) >= 4 else None
    create_reproducible_zip(source_dir, output_zip, exclude_prefix)
To ensure consistency and integrity when publishing artifacts such as source code or Lambda layers, we generate reproducible ZIP archives—ZIP files that produce the exact same binary output every time they are built from the same content.
This approach guarantees that:
• File timestamps inside the ZIP are fixed (e.g., to 1980-01-01 00:00:00)
• File order is consistent and sorted
• Extra metadata (such as OS-specific attributes or permissions) is excluded
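• The resulting .zip has a deterministic SHA256 hash (compressed bytes can still differ between zlib versions, so compare builds made with the same Python and zlib)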
Standard zip commands (like those from Info-ZIP 3.0) do not support SOURCE_DATE_EPOCH, so we use Python’s zipfile.ZipInfo to take full control over archive contents and metadata.
This is especially important when:
• Comparing ZIP artifacts in CI (e.g., GitHub Actions)
• Sharing ZIPs in open repositories or Gists where hash stability matters
• Verifying reproducibility across environments
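One way to verify reproducibility in practice is to build the archive twice and compare digests. The sketch below assumes a condensed stand-in for the builder (`build_zip`) so that it runs on its own; `sha256_of` and `hashes_match` are hypothetical names, and the temporary files are placeholders. It also changes a source file's modification time between builds, which should not affect the result.

```python
import hashlib
import os
import tempfile
import zipfile


def build_zip(source_dir, output_path):
    """Condensed stand-in for the builder: sorted walk, fixed metadata."""
    with zipfile.ZipFile(output_path, "w") as zf:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # visit subdirectories in a stable order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                info = zipfile.ZipInfo(os.path.relpath(full_path, source_dir))
                info.date_time = (1980, 1, 1, 0, 0, 0)
                info.create_system = 0
                info.compress_type = zipfile.ZIP_DEFLATED
                with open(full_path, "rb") as f:
                    zf.writestr(info, f.read())


def sha256_of(path):
    """Hex SHA256 digest of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def hashes_match():
    """Build the same tree twice and report whether the digests agree."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(os.path.join(src, "pkg"))
        for rel in ("b.txt", "a.txt", os.path.join("pkg", "mod.py")):
            with open(os.path.join(src, rel), "wb") as f:
                f.write(b"placeholder content\n")
        first = os.path.join(tmp, "first.zip")
        second = os.path.join(tmp, "second.zip")
        build_zip(src, first)
        # Touch a file between builds; the archive ignores on-disk mtimes.
        os.utime(os.path.join(src, "a.txt"), (86400, 86400))
        build_zip(src, second)
        return sha256_of(first) == sha256_of(second)


if __name__ == "__main__":
    print("reproducible:", hashes_match())
```

In CI, the same comparison could run as a check step: build the artifact, compute its digest, and fail the job if it differs from the digest of a second build or of a previously published artifact.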