@satorunooshie
Created May 6, 2025 00:46
Create Reproducible ZIP Archives
import os
import sys
import zipfile


def create_reproducible_zip(source_dir, output_path, exclude_prefix=None):
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(source_dir):
            # Sort in place so os.walk descends into directories in a stable order.
            dirs.sort()
            for file in sorted(files):
                full_path = os.path.join(root, file)
                rel_path = os.path.relpath(full_path, source_dir)

                # Skip files whose relative path starts with the excluded prefix.
                if exclude_prefix and rel_path.startswith(exclude_prefix):
                    continue

                info = zipfile.ZipInfo(rel_path)
                info.date_time = (1980, 1, 1, 0, 0, 0)  # fixed timestamp
                info.create_system = 0  # avoid Unix metadata
                # writestr() takes the compression method from the ZipInfo,
                # which defaults to ZIP_STORED, so set it explicitly.
                info.compress_type = zipfile.ZIP_DEFLATED

                with open(full_path, "rb") as f:
                    zf.writestr(info, f.read())


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python create_reproducible_zip.py <source_dir> <output_zip> [exclude_prefix]")
        sys.exit(1)

    source_dir = sys.argv[1]
    output_zip = sys.argv[2]
    exclude_prefix = sys.argv[3] if len(sys.argv) >= 4 else None

    create_reproducible_zip(source_dir, output_zip, exclude_prefix)
@satorunooshie
Author

To keep published artifacts such as source code or Lambda layers consistent and verifiable, we generate reproducible ZIP archives: ZIP files that are byte-for-byte identical every time they are built from the same content.

This approach guarantees that:
• File timestamps inside the ZIP are fixed (e.g., to 1980-01-01 00:00:00)
• File order is consistent and sorted
• Extra metadata (such as OS-specific attributes or permissions) is excluded
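• The resulting .zip has a deterministic SHA256 hash, as long as the compressed bytes also match (DEFLATE output can differ between zlib versions or implementations)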

Standard zip commands (such as Info-ZIP 3.0) do not honor SOURCE_DATE_EPOCH, so we use Python’s zipfile.ZipInfo to set each entry's timestamp and metadata explicitly.
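
If a build does want to honor SOURCE_DATE_EPOCH rather than always writing 1980-01-01, the value can be converted into the tuple that `ZipInfo.date_time` expects. This is a sketch of one possible variant, not what the script above does: `zip_date_time` is a hypothetical helper, and falling back to 1980-01-01 when the variable is unset is an assumption.

```python
import os
import time

# Earliest timestamp the ZIP format can represent.
ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def zip_date_time(environ=os.environ):
    """Return a ZipInfo.date_time tuple derived from SOURCE_DATE_EPOCH.

    Falls back to 1980-01-01 when the variable is unset, and clamps
    earlier dates because ZIP timestamps cannot represent them.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None:
        return ZIP_EPOCH
    stamp = time.gmtime(int(raw))[:6]  # interpret the epoch value as UTC
    return max(stamp, ZIP_EPOCH)
```

The result could be assigned to `info.date_time` in place of the hard-coded tuple. Using UTC keeps the output independent of the build machine's time zone.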

This is especially important when:
• Comparing ZIP artifacts in CI (e.g., GitHub Actions)
• Sharing ZIPs in open repositories or Gists where hash stability matters
• Verifying reproducibility across environments
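
For the CI comparison case, one way to check two builds is to hash both archives and compare the digests. A minimal sketch follows; the helper names `sha256_of_file` and `archives_match` are hypothetical, and the paths are whatever your pipeline produces.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archives_match(path_a, path_b):
    """Return True when both archives are byte-for-byte identical by hash."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

A CI job could build the archive twice, or once per environment, and fail when `archives_match` returns False.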
