@apuignav
Forked from rcoup/rsync_parallel.sh
Last active December 30, 2015 21:29
#!/bin/bash
set -e
# Usage:
# rsync_parallel.sh [--parallel=N] [rsync args...]
#
# Options:
# --parallel=N Use N parallel processes for transfer. Defaults to 10.
#
# Notes:
# * Requires GNU Parallel
# * Use with ssh-keys. Lots of password prompts will get very annoying.
# * Does a dry run first to list the files to transfer and their sizes, then splits that list
# into N size-balanced chunks and launches up to N parallel rsyncs, one per chunk.
# * Be a little careful with the options you pass through to rsync. Normal ones will work; you
# might want to test unusual options upfront.
#
if [[ "$1" == --parallel=* ]]; then
    PARALLEL="${1##*=}"
    shift
else
    PARALLEL=10
fi
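# A defensive sketch, not part of the original script: fail early when the
# --parallel value is not a positive integer or when no "parallel" command is
# on the PATH. The helper name is_positive_int is hypothetical. Note that
# "command -v" only confirms that some program called parallel exists; it does
# not verify that it is GNU Parallel.
is_positive_int() {
    [[ "$1" =~ ^[1-9][0-9]*$ ]]
}
if ! is_positive_int "$PARALLEL"; then
    echo "Invalid --parallel value: $PARALLEL" >&2
    exit 1
fi
if ! command -v parallel >/dev/null 2>&1; then
    echo "GNU Parallel is required but no 'parallel' command was found in PATH." >&2
    exit 1
fi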
echo "Using up to $PARALLEL processes for transfer..."
TMPDIR=$(mktemp -d 2>/dev/null || mktemp -d -t 'rsync_parallel_tmp')
# single quotes defer expansion until exit; inner quotes protect paths with spaces
trap 'rm -rf "$TMPDIR"' EXIT
echo "Using $TMPDIR temp dir..."
echo "Figuring out file list..."
# sorted by size (descending)
rsync "$@" --out-format="%l %n" --no-v --dry-run | sort -n -r > "$TMPDIR/files.all"
# check for nothing-to-do
TOTAL_FILES=$(wc -l < "$TMPDIR/files.all")
if [ "$TOTAL_FILES" -eq "0" ]; then
    echo "Nothing to transfer :)"
    exit 0
fi
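function array_min {
    # Print "index value" for the minimum element of the arguments.
    # cat -n numbers the values from 1; the descending numeric sort puts the
    # smallest value last, and the index is converted back to 0-based.
    local IC
    IC=($(tr ' ' '\n' <<<"$*" | cat -n | sort -k2,2nr | tail -n1))
    echo $((${IC[0]} - 1)) ${IC[1]}
}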
echo "Calculating chunks..."
# declare chunk-size array
for ((I = 0 ; I < PARALLEL ; I++ )); do
    CHUNKS["$I"]=0
done
# add each file to the emptiest chunk, so they're as balanced by size as possible
#cat "$TMPDIR/files.all"
while read -r FSIZE FPATH; do
    MIN=($(array_min "${CHUNKS[@]}"))
    # strip any carriage return from the size field
    FSIZE=$(echo "$FSIZE" | tr -d '\r')
    #echo "Smallest chunk is $MIN"
    #echo "Adding $FSIZE to ${CHUNKS["${MIN[0]}"]}" | sed -n l
    #echo ""
    CHUNKS["${MIN[0]}"]=$((${CHUNKS["${MIN[0]}"]} + $FSIZE))
    #echo "Added"
    echo "$FPATH"
    echo "$FPATH" >> "$TMPDIR/chunk.${MIN[0]}"
done < "$TMPDIR/files.all"
# The chunk summary below is commented out because -printf is a GNU find extension that
# the find shipped with macOS lacks; see
# http://stackoverflow.com/questions/752818/why-does-macs-find-not-have-the-option-printf
#find "$TMPDIR" -type f -name "chunk.*" -printf "\n*** %s ***\n" -exec cat {} \;
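# A portable sketch of the chunk summary (an addition, not the original
# author's code): a plain shell loop avoids GNU find's -printf. It prints each
# chunk file's path and entry count, then the list itself, so it approximates
# rather than reproduces the commented-out output. CHUNK_FILE is a
# hypothetical variable name.
for CHUNK_FILE in "$TMPDIR"/chunk.*; do
    # the arithmetic expansion strips the padding some wc implementations add
    printf '\n*** %s (%s entries) ***\n' "$CHUNK_FILE" "$(( $(wc -l < "$CHUNK_FILE") ))"
    cat "$CHUNK_FILE"
done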
echo "Starting transfers..."
find "$TMPDIR" -type f -name "chunk.*" | parallel --delay=2 -j "$PARALLEL" -t --verbose --progress rsync --files-from={} "$@"