With GitHub Actions, a workflow can publish artifacts, typically logs or binaries. As of early 2020, the lifetime of an artifact is hard-coded to 90 days (this may change in the future). After 90 days, an artifact is automatically deleted. In the meantime, however, the artifacts of a repository may accumulate and amount to megabytes or even gigabytes of data files.
It is unclear whether there is a limit on the total size of accumulated artifacts for a public repository. But GitHub can hardly let multiple gigabytes of artifact data pile up forever without doing anything. So, if your workflows regularly produce large artifacts (in "nightly build" procedures, for instance), it is wise to clean up and delete older artifacts without waiting for the 90-day limit.
Using the "Actions" web page of a repository, it is possible to browse old workflow runs and manually delete artifacts. But the procedure is slow and tedious: it is fine for deleting one selected artifact, not for a regular cleanup. We need automation.
The GitHub Actions API makes it possible to browse through the history of workflows and delete selected artifacts. The shell script below is an example of artifact cleanup which can be run on a regular basis (from your crontab, for instance).
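For reference, the two REST endpoints involved are GET /repos/OWNER/REPO/actions/artifacts (list artifacts, one page at a time) and DELETE /repos/OWNER/REPO/actions/artifacts/ID (delete one artifact). The sketch below wraps them in hypothetical helpers (artifact_request, list_artifacts, delete_artifact) with an assumed DRY_RUN switch, so that requests can be printed and checked before anything is actually deleted. DRY_RUN is a convention of this sketch, not a GitHub feature.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the two artifact endpoints of the GitHub REST API.
# REPO, GITHUB_USER and GITHUB_TOKEN are assumed to be set as in the cleanup script.
# With DRY_RUN=1 (a convention of this sketch), the request is only printed.
REPO=${REPO:-https://api.github.com/repos/OWNER/REPO}

artifact_request() {
    local method=$1 path=$2
    local url="$REPO/actions/artifacts$path"
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        # Show what would be sent, without contacting GitHub.
        printf '%s %s\n' "$method" "$url"
    else
        curl --silent --fail --location --user "$GITHUB_USER:$GITHUB_TOKEN" -X "$method" "$url"
    fi
}

# List one page of artifacts (up to 100 per page); the page number defaults to 1.
list_artifacts() { artifact_request GET "?per_page=100&page=${1:-1}"; }

# Delete one artifact, given its numeric id.
delete_artifact() { artifact_request DELETE "/$1"; }
```

For example, DRY_RUN=1 delete_artifact 12345 prints DELETE https://api.github.com/repos/OWNER/REPO/actions/artifacts/12345 and sends nothing.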
Each artifact is identified by a "name" (the name parameter of the actions/upload-artifact step). The shell script browses through the artifacts produced by all runs of all workflows in the selected repository. For each artifact name, the five most recent instances of this artifact are kept (the KEEP variable in the script) and all older ones are deleted.
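The retention rule can be tried in isolation, without calling the API. The hypothetical select_for_deletion function below reads "id name" lines ordered from most recent to oldest (the order in which the API returns artifacts), keeps the first N occurrences of each name and prints the ids of all older ones. It is a sketch of the rule only; the input format is an assumption of this example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the retention rule (requires bash 4 for associative arrays).
# Argument: number of instances to keep per name.
# Input lines: "<id> <name>", most recent first. Output: ids to delete, one per line.
select_for_deletion() {
    local keep=$1 id name n
    local -A seen=()
    while read -r id name; do
        # Count how many instances of this name we have seen so far.
        n=$(( ${seen["$name"]:-0} + 1 ))
        seen["$name"]=$n
        # Everything beyond the first $keep instances of a name is deleted.
        if (( n > keep )); then
            printf '%s\n' "$id"
        fi
    done
}
```

With the input lines 9 nightly, 8 logs, 7 nightly, 6 nightly, 5 logs and a keep count of 1, the function prints 7, 6 and 5; with a keep count of 2, it prints only 6.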
Implementation notes:
- You need to customize your repository URL, your GitHub user name and a GitHub personal access token. This token must have sufficient rights on the repository (the repo scope for a classic personal access token) to be allowed to delete artifacts.
- Since the list of existing artifacts can be very long, it is "paged": each API invocation returns only one "page" of artifacts. The script loops over all pages, starting with the main URL of the request. The URL of the next page is found in the Link response header, in the entry marked rel="next".
- The artifacts are always returned from most recent to oldest. So, we skip the first $KEEP ones for each artifact name and delete all others.
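The next-page lookup described in the notes above can be checked on its own. Assuming a Link header made of comma-separated entries of the form <url>; rel="...", the hypothetical next_page_url function below prints the URL of the rel="next" entry, or nothing on the last page. It does the same job as the grep/tr/sed pipeline in the script, using only bash string handling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the rel="next" URL from a full "link: ..." header line.
# Prints nothing when there is no next page (i.e. on the last page).
next_page_url() {
    local header=$1 entry
    local re='<([^>]*)>'
    local -a entries
    # Strip the header name ("link:" or "Link:") and a trailing carriage return.
    header=${header#*:}
    header=${header%$'\r'}
    # Split the header value on commas, without pathname expansion.
    IFS=',' read -r -a entries <<< "$header"
    for entry in "${entries[@]}"; do
        if [[ $entry == *'rel="next"'* && $entry =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done
}
```

On a header such as link: <https://api.github.com/repositories/1/actions/artifacts?page=2>; rel="next", <https://api.github.com/repositories/1/actions/artifacts?page=5>; rel="last", it prints the page=2 URL.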
```bash
#!/usr/bin/env bash
set -e

# Customize those three lines with your repository and credentials:
REPO=https://api.github.com/repos/OWNER/REPO
GITHUB_USER=your-github-user-name
GITHUB_TOKEN=token-with-workflow-rights-on-repo

# Number of most recent versions to keep for each artifact (0 for a total purge).
KEEP=5

# A shortcut to call the GitHub API. --fail turns HTTP errors into a non-zero exit status.
ghapi() { curl --silent --fail --location --user "$GITHUB_USER:$GITHUB_TOKEN" "$@"; }

# A temporary file which receives HTTP response headers, removed when the script exits.
TMPFILE=$(mktemp)
trap 'rm -f "$TMPFILE"' EXIT

# An associative array, key: artifact name, value: number of artifacts of that name.
declare -A ARTCOUNT
# Ids of the artifacts to delete. They are deleted only after all pages are read,
# because deleting while paging would shift later artifacts to earlier pages.
declare -a TO_DELETE=()
declare -i TOTAL_SIZE=0
declare -i TOTAL_DELETED=0

# Process all artifacts on this repository, loop on returned "pages" (100 per page).
URL="$REPO/actions/artifacts?per_page=100"
while [[ -n "$URL" ]]; do
    # Get current page, get response headers in a temporary file.
    JSON=$(ghapi --dump-header "$TMPFILE" "$URL")
    # Get URL of next page from the Link header. Will be empty on the last page.
    URL=$(grep -i '^link:' "$TMPFILE" | tr ',' '\n' | grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//')
    # Loop on all artifacts on this page, one "id<TAB>name<TAB>size" line each.
    while IFS=$'\t' read -r id name size; do
        TOTAL_SIZE+=$size
        # Count instances of this name (artifacts come most recent first).
        n=$(( ${ARTCOUNT["$name"]:-0} + 1 ))
        ARTCOUNT["$name"]=$n
        printf "%s - %d\n" "$name" "$n"
        # Mark for deletion everything beyond the $KEEP most recent instances.
        if (( n > KEEP )); then
            TO_DELETE+=("$id")
            TOTAL_DELETED+=$size
            printf "Will delete %s #%d, %d bytes\n" "$name" "$n" "$size"
        fi
    done < <(jq -r '.artifacts[] | [.id, .name, .size_in_bytes] | @tsv' <<<"$JSON")
done

# Delete the selected artifacts, now that the whole list has been read.
for id in "${TO_DELETE[@]}"; do
    ghapi -X DELETE "$REPO/actions/artifacts/$id"
done

printf "#total_size of artifacts in bytes: %d\n" $TOTAL_SIZE
printf "#total_size of deleted artifacts in bytes: %d\n" $TOTAL_DELETED
printf "#difference in bytes: %d\n" $(( TOTAL_SIZE - TOTAL_DELETED ))
```
This script has been successfully tested on macOS (with the latest bash from Homebrew, since the bash 3.2 shipped with macOS is too old).
Prerequisites:
- bash version 4 or higher, because the script uses an associative array.
- curl, to perform HTTP requests on the GitHub API.
- jq, to parse and query the JSON responses from this API.
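Since a missing tool otherwise shows up as an obscure error in the middle of a run, it can help to check these prerequisites up front. A minimal sketch, with hypothetical function names (require_bash4, require_commands), which could be placed at the top of the script:

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite checks for the cleanup script.

# Succeed only when running under bash 4 or newer (needed for associative arrays).
require_bash4() {
    if (( BASH_VERSINFO[0] < 4 )); then
        echo "error: bash >= 4 required, this is $BASH_VERSION" >&2
        return 1
    fi
}

# Succeed only when every command given as argument is found by "command -v".
# Reports every missing command, not just the first one.
require_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "error: required command not found: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use at the top of the script:
#   require_bash4 && require_commands curl jq || exit 1
```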