@Jazznight
Forked from glts/README
Created May 25, 2016 07:30
Benchmark for Vim regexp engine performance
Regular expressions and data from
http://lh3lh3.users.sourceforge.net/reb.shtml
Regular expressions benchmarked:
  URI        ([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?
  Email      ([^ @]+)@([^ @]+)
  Date       ([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)
  URI|Email  ([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)
  Word       .*SCSI-
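To make the five patterns concrete, here is a minimal Python sketch that reports which of them match a given line. The patterns are copied from the list above; the sample lines and the helper name `matching_patterns` are made up for illustration and are not part of the benchmark.

```python
import re

# Patterns copied verbatim from the list above.
PATTERNS = {
    "URI": r"([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?",
    "Email": r"([^ @]+)@([^ @]+)",
    "Date": r"([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)",
    "URI|Email": r"([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)",
    "Word": r".*SCSI-",
}

# Hypothetical sample lines, not taken from the "howto" data file.
SAMPLES = [
    "see http://example.org/howto for details",
    "mail user@example.org with questions",
    "last updated 12/31/1999",
    "the SCSI-2 bus",
    "nothing to match here",
]


def matching_patterns(line):
    """Return the names of all benchmark patterns that match somewhere in line."""
    return [name for name, pattern in PATTERNS.items() if re.search(pattern, line)]


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, "->", matching_patterns(sample))
```

All five patterns are unanchored, so a line counts as a match if the pattern occurs anywhere in it, which is also how the grep-style scripts in this benchmark select lines.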
Results (in seconds):
                  URI  Email  Date    Sum3  URI|Email  Word
re=1            16.34  13.65  4.07   34.06      29.46  0.49
re=2            92.03   9.75  4.47  106.25     105.39  5.22
Python 2.7.3     2.69   5.17  1.01    8.87       7.72  3.40
Perl 5.14.2      0.35   0.33  0.32    1.00       8.12  0.31
GNU egrep 2.10   0.21   0.16  0.56    0.93      10.86  0.03
(Mean of five runs each; Sum3 is the sum of the URI, Email, and Date columns.
Vim 7.3.1010, 64-bit i7-2700K CPU @ 3.50GHz x 8.)
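As a quick sanity check on the table, the following Python sketch confirms that each Sum3 value equals URI + Email + Date and computes the ratio between two rows. The numbers are copied from the results above; the function names are illustrative only.

```python
# Timings in seconds, copied from the results table: (URI, Email, Date, Sum3).
RESULTS = {
    "re=1": (16.34, 13.65, 4.07, 34.06),
    "re=2": (92.03, 9.75, 4.47, 106.25),
    "Python 2.7.3": (2.69, 5.17, 1.01, 8.87),
    "Perl 5.14.2": (0.35, 0.33, 0.32, 1.00),
    "GNU egrep 2.10": (0.21, 0.16, 0.56, 0.93),
}


def sum3_consistent(row, tolerance=0.005):
    """Check that Sum3 equals URI + Email + Date, up to rounding."""
    uri, email, date, sum3 = row
    return abs(uri + email + date - sum3) < tolerance


def sum3_ratio(engine, baseline):
    """Return how many times slower engine is than baseline on Sum3."""
    return RESULTS[engine][3] / RESULTS[baseline][3]


if __name__ == "__main__":
    for name, row in RESULTS.items():
        print(name, "Sum3 consistent:", sum3_consistent(row))
    print("re=2 vs re=1 on Sum3: %.2fx" % sum3_ratio("re=2", "re=1"))
```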
The Vim results were obtained with the bench.sh script.
Python, Perl, and egrep were timed in similar fashion using these invocations:
  perl script.pl 'pattern' </path/to/data/howto >/dev/null
  python script.py 'pattern' </path/to/data/howto >/dev/null
  egrep 'pattern' /path/to/data/howto >/dev/null
The data file "howto" (~38M) is available at
http://people.unipmn.it/manzini/lightweight/corpus/howto.bz2
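The README does not say exactly how the Perl, Python, and egrep runs were timed. As one possible approach, this Python sketch runs a command several times with its output discarded and returns the mean wall-clock time. The function name `average_runtime` and its parameters are assumptions for illustration, not the author's tooling.

```python
import subprocess
import sys
import time


def average_runtime(cmd, runs=5, stdin_path=None):
    """Run cmd `runs` times, discarding output, and return mean elapsed seconds.

    If stdin_path is given, that file is fed to the command on standard input,
    mirroring the `</path/to/data/howto` redirection in the invocations above.
    """
    total = 0.0
    for _ in range(runs):
        stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
        try:
            start = time.perf_counter()
            subprocess.run(
                cmd,
                stdin=stdin,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                check=False,
            )
            total += time.perf_counter() - start
        finally:
            if stdin_path:
                stdin.close()
    return total / runs


if __name__ == "__main__":
    # Placeholder command; substitute e.g. ["perl", "script.pl", "pattern"].
    print("%.2f" % average_runtime([sys.executable, "-c", "pass"], runs=3))
```

Measuring elapsed time from the parent process includes process start-up, which is comparable to what `time -f '%e'` reports for the Vim runs.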
#!/bin/bash
# Usage: ./bench.sh <engine> <script>
#   where engine is (1|2)
#         script is (uri|email|date|uriemail|word)
VIM="/path/to/vim/src/vim"
DATA="/path/to/data/howto"
vimrc="vimrc-${1:-1}"
rescript="re-${2:-word}.vim"
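# -N: nocompatible; -u: use only the given vimrc; -i NONE: no viminfo; -n: no swap file;
# -e -s: silent Ex (batch) mode; -S: source the benchmark script, then quit.
cmd=( "${VIM}" -N -u "${vimrc}" -i NONE -n -e -s -S "${rescript}" +quit "${DATA}" )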
echo "${cmd[@]}" >&2
tmpfile="/tmp/,,tmp.$$"
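for i in {1..5}; do
    # External time command (GNU options): append (-a) the elapsed
    # wall-clock seconds (%e) of each run to the temp file (-o).
    \time -f '%e' -ao "${tmpfile}" "${cmd[@]}" &>/dev/null
    echo -n . >&2
done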
echo >&2
result=$( awk '{ sum += $1 } END { printf "%.2f", sum / 5 }' "${tmpfile}" )
rm -f "${tmpfile}"
echo "${result}"
" re-date.vim
g/\%([0-9][0-9]\=\)\/\%([0-9][0-9]\=\)\/\%([0-9][0-9]\%([0-9][0-9]\)\=\)/p
" re-email.vim
g/\%([^ @]\+\)@\%([^ @]\+\)/p
" re-uri.vim
g/\%([a-zA-Z][a-zA-Z0-9]*\):\/\/\%([^ /]\+\)\%(\/[^ ]*\)\=/p
" re-uriemail.vim
g/\%([a-zA-Z][a-zA-Z0-9]*\):\/\/\%([^ /]\+\)\%(\/[^ ]*\)\=\|\%([^ @]\+\)@\%([^ @]\+\)/p
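The Vim commands above are the benchmarked extended regular expressions rewritten in Vim's default "magic" syntax, with non-capturing groups (`\%(...\)`) and the `/` delimiter escaped outside bracket expressions. The following Python sketch shows that mapping. The function `ere_to_vim` is a hypothetical helper that covers only the constructs used in this benchmark; it is not a general converter and does not handle, for example, a literal `]` at the start of a bracket expression.

```python
def ere_to_vim(ere, delimiter="/"):
    """Translate a simple ERE into Vim 'magic' syntax with non-capturing groups."""
    out = []
    in_bracket = False
    for ch in ere:
        if in_bracket:
            # Inside [...], characters are copied unchanged.
            out.append(ch)
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
            out.append(ch)
        elif ch == "(":
            out.append(r"\%(")  # non-capturing group open
        elif ch == ")":
            out.append(r"\)")
        elif ch == "?":
            out.append(r"\=")  # zero or one
        elif ch in "+|":
            out.append("\\" + ch)  # \+ and \| in magic mode
        elif ch == delimiter:
            out.append("\\" + ch)  # escape the :g delimiter
        else:
            out.append(ch)  # literals, '.', and '*' are unchanged
    return "".join(out)


if __name__ == "__main__":
    email = r"([^ @]+)@([^ @]+)"
    print("g/" + ere_to_vim(email) + "/p")
```

Applied to the Date, Email, URI, and URI|Email patterns, this reproduces the four commands shown above.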
#!/usr/bin/env perl
use strict;
use warnings;
my $reobj = qr/$ARGV[0]/;
while (<STDIN>) {
    print $_ if /$reobj/;
}
#!/usr/bin/env python
import re
import sys

reobj = re.compile(sys.argv[1])
for line in sys.stdin:
    if reobj.search(line):
        sys.stdout.write(line)
" vimrc-1: select the old backtracking regexp engine
if exists('&regexpengine')
  set regexpengine=1
endif
" vimrc-2: select the NFA regexp engine
if exists('&regexpengine')
  set regexpengine=2
endif