Last active
October 23, 2024 07:54
Revisions
-
folkertdev revised this gist
Oct 22, 2024 . 1 changed file with 92 additions and 90 deletions.
@@ -19,119 +19,121 @@ this runs 4 benchmarks

- labeled-match-len: the `len` function and friends now use labeled match
- labeled-match-fast: `len` and friends, and the `inflate_fast_help` function, now use labeled match

The benchmark is run for various chunk sizes (2^4, 2^7 and 2^16), which varies what logic is run: a chunk size of 2^16 spends most of its time in an inner loop, while 2^4 spends much more time in the state machine logic.

## results

Mostly what we see is that labeled match gives significant speedups for small chunk sizes. For larger chunk sizes, the results are less clear. I believe the result is really net-zero, but we clearly need to perform some further tuning.

Note in particular how `loop-plus-match` works well for small _and_ big chunk sizes, but terribly for the medium one (about 39% slower wall time at 2^7).

The `labeled-match-fast` change barely seems to do anything versus just `labeled-match-len`.
```
Benchmark 1 (69 runs): /tmp/uncompress-baseline rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          72.7ms ± 4.55ms    71.1ms … 108ms           3 ( 4%)    0%
  peak_rss           24.1MB ± 56.3KB    23.9MB … 24.1MB         12 (17%)    0%
  cpu_cycles          294M  ± 17.7M      290M  … 434M            6 ( 9%)    0%
  instructions        914M  ± 449        914M  … 914M            1 ( 1%)    0%
  cache_references   2.99M  ± 407K      2.68M  … 6.10M           4 ( 6%)    0%
  cache_misses        134K  ± 23.8K      101K  … 302K            2 ( 3%)    0%
  branch_misses      4.09M  ± 8.56K     4.08M  … 4.14M           2 ( 3%)    0%
Benchmark 2 (71 runs): /tmp/loop-plus-match rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          70.6ms ± 499us     69.9ms … 72.9ms          3 ( 4%)    ⚡- 2.8% ± 1.5%
  peak_rss           24.1MB ± 60.2KB    24.0MB … 24.1MB          0 ( 0%)      - 0.1% ± 0.1%
  cpu_cycles          287M  ± 1.53M      285M  … 294M            5 ( 7%)    ⚡- 2.7% ± 1.4%
  instructions        792M  ± 336        792M  … 792M            0 ( 0%)    ⚡- 13.4% ± 0.0%
  cache_references   2.92M  ± 81.8K     2.77M  … 3.12M           0 ( 0%)      - 2.2% ± 3.2%
  cache_misses        113K  ± 12.4K     89.9K  … 169K            3 ( 4%)    ⚡- 16.0% ± 4.7%
  branch_misses      4.10M  ± 4.76K     4.09M  … 4.13M           3 ( 4%)      + 0.2% ± 0.1%
Benchmark 3 (80 runs): /tmp/labeled-match-len rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          62.6ms ± 555us     61.7ms … 66.1ms          2 ( 3%)    ⚡- 14.0% ± 1.4%
  peak_rss           24.1MB ± 77.9KB    23.9MB … 24.1MB          0 ( 0%)      - 0.1% ± 0.1%
  cpu_cycles          249M  ± 1.87M      248M  … 263M            5 ( 6%)    ⚡- 15.4% ± 1.3%
  instructions        686M  ± 267        686M  … 686M            0 ( 0%)    ⚡- 24.9% ± 0.0%
  cache_references   3.01M  ± 480K      2.76M  … 7.16M           2 ( 3%)      + 0.5% ± 4.8%
  cache_misses        123K  ± 6.92K      100K  … 137K            2 ( 3%)    ⚡- 8.4% ± 4.1%
  branch_misses      4.08M  ± 2.78K     4.08M  … 4.09M           4 ( 5%)      - 0.1% ± 0.0%
Benchmark 4 (81 runs): /tmp/labeled-match-fast rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          61.9ms ± 468us     61.3ms … 64.0ms          6 ( 7%)    ⚡- 14.8% ± 1.4%
  peak_rss           24.1MB ± 56.9KB    24.0MB … 24.1MB         20 (25%)      - 0.0% ± 0.1%
  cpu_cycles          246M  ± 1.53M      245M  … 255M           12 (15%)    ⚡- 16.4% ± 1.3%
  instructions        689M  ± 365        689M  … 689M            1 ( 1%)    ⚡- 24.6% ± 0.0%
  cache_references   2.97M  ± 211K      2.79M  … 4.37M           3 ( 4%)      - 0.8% ± 3.4%
  cache_misses       91.4K  ± 9.22K     72.0K  … 114K            0 ( 0%)    ⚡- 31.8% ± 4.2%
  branch_misses      4.08M  ± 4.06K     4.08M  … 4.10M           1 ( 1%)      - 0.1% ± 0.1%

Benchmark 1 (108 runs): /tmp/uncompress-baseline rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          46.3ms ± 1.37ms    45.0ms … 56.5ms         10 ( 9%)    0%
  peak_rss           24.1MB ± 56.3KB    24.0MB … 24.1MB         26 (24%)    0%
  cpu_cycles          174M  ± 4.67M      173M  … 214M           10 ( 9%)    0%
  instructions        516M  ± 443        516M  … 516M            3 ( 3%)    0%
  cache_references   3.16M  ± 219K      2.88M  … 4.43M           6 ( 6%)    0%
  cache_misses       83.6K  ± 11.6K     62.4K  … 161K            5 ( 5%)    0%
  branch_misses      2.00M  ± 5.17K     2.00M  … 2.04M           9 ( 8%)    0%
Benchmark 2 (78 runs): /tmp/loop-plus-match rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          64.5ms ± 1.29ms    63.6ms … 71.3ms          6 ( 8%)    💩+ 39.3% ± 0.8%
  peak_rss           24.1MB ± 72.6KB    23.9MB … 24.1MB          0 ( 0%)      - 0.1% ± 0.1%
  cpu_cycles          257M  ± 3.65M      255M  … 279M           12 (15%)    💩+ 47.4% ± 0.7%
  instructions        720M  ± 384        720M  … 720M            1 ( 1%)    💩+ 39.7% ± 0.0%
  cache_references   3.21M  ± 182K      2.97M  … 4.32M           3 ( 4%)      + 1.8% ± 1.9%
  cache_misses       57.9K  ± 8.34K     47.0K  … 98.8K           4 ( 5%)    ⚡- 30.7% ± 3.6%
  branch_misses      2.00M  ± 2.44K     2.00M  … 2.01M           5 ( 6%)      - 0.2% ± 0.1%
Benchmark 3 (111 runs): /tmp/labeled-match-len rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          45.2ms ± 781us     44.2ms … 51.3ms          5 ( 5%)    ⚡- 2.5% ± 0.6%
  peak_rss           24.1MB ± 66.5KB    23.9MB … 24.1MB          0 ( 0%)      - 0.0% ± 0.1%
  cpu_cycles          170M  ± 3.34M      168M  … 199M            9 ( 8%)    ⚡- 2.8% ± 0.6%
  instructions        510M  ± 359        510M  … 510M            1 ( 1%)    ⚡- 1.0% ± 0.0%
  cache_references   3.21M  ± 178K      2.97M  … 4.55M           6 ( 5%)      + 1.7% ± 1.7%
  cache_misses       31.2K  ± 4.33K     25.0K  … 52.0K           6 ( 5%)    ⚡- 62.7% ± 2.8%
  branch_misses      1.99M  ± 1.35K     1.99M  … 2.00M           5 ( 5%)      - 0.4% ± 0.0%
Benchmark 4 (111 runs): /tmp/labeled-match-fast rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          45.1ms ± 454us     44.3ms … 46.8ms          5 ( 5%)    ⚡- 2.5% ± 0.6%
  peak_rss           24.1MB ± 73.0KB    23.9MB … 24.1MB          0 ( 0%)      - 0.1% ± 0.1%
  cpu_cycles          169M  ± 1.27M      168M  … 176M           10 ( 9%)    ⚡- 3.0% ± 0.5%
  instructions        515M  ± 295        515M  … 515M            0 ( 0%)      - 0.1% ± 0.0%
  cache_references   3.20M  ± 81.1K     2.97M  … 3.38M           0 ( 0%)      + 1.5% ± 1.4%
  cache_misses       47.5K  ± 8.09K     36.6K  … 72.1K           1 ( 1%)    ⚡- 43.2% ± 3.2%
  branch_misses      2.00M  ± 1.86K     1.99M  … 2.00M           2 ( 2%)      - 0.2% ± 0.1%

Benchmark 1 (182 runs): /tmp/uncompress-baseline rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          27.5ms ± 474us     26.7ms … 31.4ms          7 ( 4%)    0%
  peak_rss           24.1MB ± 48.1KB    24.0MB … 24.1MB         29 (16%)    0%
  cpu_cycles         90.0M  ± 1.27M     89.4M  … 102M           17 ( 9%)    0%
  instructions        239M  ± 253        239M  … 239M            3 ( 2%)    0%
  cache_references   2.28M  ± 57.3K     2.20M  … 2.78M           5 ( 3%)    0%
  cache_misses       48.4K  ± 2.85K     43.3K  … 68.4K           5 ( 3%)    0%
  branch_misses      1.05M  ± 1.62K     1.05M  … 1.06M           2 ( 1%)    0%
Benchmark 2 (186 runs): /tmp/loop-plus-match rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          26.9ms ± 335us     26.3ms … 29.5ms          4 ( 2%)    ⚡- 2.4% ± 0.3%
  peak_rss           24.1MB ± 63.3KB    23.9MB … 24.1MB          0 ( 0%)      - 0.1% ± 0.0%
  cpu_cycles         87.2M  ± 732K      86.8M  … 94.7M          20 (11%)    ⚡- 3.1% ± 0.2%
  instructions        248M  ± 262        248M  … 248M            0 ( 0%)    💩+ 3.8% ± 0.0%
  cache_references   2.26M  ± 85.8K     2.18M  … 2.97M           5 ( 3%)      - 0.8% ± 0.7%
  cache_misses       52.0K  ± 2.13K     47.6K  … 67.0K           5 ( 3%)    💩+ 7.4% ± 1.1%
  branch_misses      1.05M  ± 1.58K     1.04M  … 1.05M           2 ( 1%)      - 0.5% ± 0.0%
Benchmark 3 (182 runs): /tmp/labeled-match-len rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          27.4ms ± 407us     26.7ms … 31.6ms          3 ( 2%)      - 0.4% ± 0.3%
  peak_rss           24.1MB ± 64.3KB    23.9MB … 24.1MB          0 ( 0%)      - 0.1% ± 0.0%
  cpu_cycles         89.3M  ± 985K      88.8M  … 101M           13 ( 7%)      - 0.8% ± 0.3%
  instructions        254M  ± 326        254M  … 254M            2 ( 1%)    💩+ 6.1% ± 0.0%
  cache_references   2.26M  ± 68.6K     2.18M  … 2.75M           3 ( 2%)      - 0.9% ± 0.6%
  cache_misses       55.1K  ± 2.58K     50.5K  … 68.0K           4 ( 2%)    💩+ 13.8% ± 1.2%
  branch_misses      1.05M  ± 2.09K     1.04M  … 1.05M           0 ( 0%)      - 0.4% ± 0.0%
Benchmark 4 (182 runs): /tmp/labeled-match-fast rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          27.5ms ± 703us     26.8ms … 33.7ms          6 ( 3%)      + 0.0% ± 0.4%
  peak_rss           24.1MB ± 62.8KB    23.9MB … 24.1MB          0 ( 0%)      - 0.1% ± 0.0%
  cpu_cycles         89.8M  ± 1.93M     89.1M  … 108M           20 (11%)      - 0.3% ± 0.4%
  instructions        253M  ± 257        253M  … 253M            1 ( 1%)    💩+ 5.7% ± 0.0%
  cache_references   2.26M  ± 101K      2.18M  … 3.05M           7 ( 4%)      - 1.1% ± 0.7%
  cache_misses       50.2K  ± 2.61K     45.6K  … 66.8K           5 ( 3%)    💩+ 3.8% ± 1.2%
  branch_misses      1.05M  ± 1.14K     1.05M  … 1.06M           2 ( 1%)      - 0.2% ± 0.0%
```
-
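For readers unfamiliar with the `loop-plus-match` shape that the benchmarks compare against, here is a minimal sketch in stable Rust. Everything in it is hypothetical: the `State` enum, the `Decoder` type, and the toy length-prefixed format are made up for illustration and are not zlib-rs's real inflate states. It only shows the general pattern, where every state transition goes back through one shared `match` at the top of the loop, and where the state machine has to suspend and resume whenever a chunk runs out.

```rust
/// Hypothetical states of a toy length-prefixed decoder (not zlib-rs's states).
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    /// Expecting a length byte; a length of 0 ends the stream.
    Len,
    /// Expecting this many more payload bytes (always non-zero).
    Copy(u8),
    /// End marker seen; further input is ignored.
    Done,
}

struct Decoder {
    state: State,
    out: Vec<u8>,
}

impl Decoder {
    fn new() -> Self {
        Decoder { state: State::Len, out: Vec::new() }
    }

    /// Consume one chunk. Returns when the chunk is exhausted or the
    /// stream is done; the current state is kept so a later call resumes.
    fn feed(&mut self, mut input: &[u8]) {
        loop {
            // Single dispatch point: every transition comes back here.
            match self.state {
                State::Len => {
                    let Some((&n, rest)) = input.split_first() else { return };
                    input = rest;
                    self.state = if n == 0 { State::Done } else { State::Copy(n) };
                }
                State::Copy(n) => {
                    let take = (n as usize).min(input.len());
                    if take == 0 {
                        return; // chunk exhausted mid-payload; resume later
                    }
                    self.out.extend_from_slice(&input[..take]);
                    input = &input[take..];
                    let left = n - take as u8;
                    self.state = if left == 0 { State::Len } else { State::Copy(left) };
                }
                State::Done => return,
            }
        }
    }
}

/// Drive the decoder with fixed-size chunks (`chunk_size` must be non-zero).
fn decode_chunked(data: &[u8], chunk_size: usize) -> Vec<u8> {
    let mut decoder = Decoder::new();
    for chunk in data.chunks(chunk_size) {
        decoder.feed(chunk);
    }
    decoder.out
}
```

Calling `decode_chunked` on the same data with a small and a large `chunk_size` yields the same output, but the small size re-enters `feed` far more often, which is loosely analogous to why the 2^4 runs above are dominated by state machine logic.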
folkertdev renamed this gist
Oct 22, 2024 . 1 changed file with 0 additions and 0 deletions.
File renamed without changes. -
folkertdev created this gist
Oct 22, 2024 .
@@ -0,0 +1,137 @@

# zlib-rs labeled match benchmarks

## build the toolchain

A proof of concept implementation can be found at https://github.com/trifectatechfoundation/rust/tree/labeled-match. Build it with `./x build`, and then set up the [toolchain](https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html#creating-a-rustup-toolchain). Now `cargo +stage1 build` should use a compiler with `labeled-match` available.

## run the benchmark

```
git clone https://github.com/trifectatechfoundation/zlib-rs.git
cd zlib-rs
git checkout len-as-match
sh replicate-labeled-match-benchmarks.sh
```

This runs 4 benchmarks:

- baseline: the current zlib-rs main branch approach using tail calls
- loop-plus-match: standard approach using a loop and match; suffers from branch misprediction
- labeled-match-len: the `len` function and friends now use labeled match
- labeled-match-fast: `len` and friends, and the `inflate_fast_help` function, now use labeled match

## results

Mostly what we see is that labeled match gives significant speedups for small chunk sizes. For larger chunk sizes, the results are less clear. I believe the result is really net-zero, but we clearly need to perform some further tuning.

Note in particular how `loop-plus-match` works well for small _and_ big chunk sizes, but terribly for the medium one.
```
Benchmark 1 (69 runs): /tmp/uncompress-baseline rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          72.5ms ± 2.46ms    71.0ms … 91.0ms          7 (10%)    0%
  peak_rss           24.1MB ± 77.8KB    23.9MB … 24.1MB          0 ( 0%)    0%
  cpu_cycles          294M  ± 9.90M      291M  … 371M            7 (10%)    0%
  instructions        914M  ± 274        914M  … 914M            0 ( 0%)    0%
  cache_references   3.04M  ± 519K      2.69M  … 6.09M           4 ( 6%)    0%
  cache_misses        156K  ± 39.1K      126K  … 463K            1 ( 1%)    0%
  branch_misses      4.09M  ± 10.8K     4.08M  … 4.17M           5 ( 7%)    0%
Benchmark 2 (71 runs): /tmp/loop-plus-match rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          70.9ms ± 817us     69.9ms … 76.2ms          2 ( 3%)    ⚡- 2.1% ± 0.8%
  peak_rss           24.1MB ± 58.4KB    24.0MB … 24.1MB          0 ( 0%)      + 0.1% ± 0.1%
  cpu_cycles          287M  ± 2.33M      285M  … 305M            3 ( 4%)    ⚡- 2.5% ± 0.8%
  instructions        792M  ± 300        792M  … 792M            1 ( 1%)    ⚡- 13.4% ± 0.0%
  cache_references   2.98M  ± 89.8K     2.78M  … 3.33M           1 ( 1%)      - 1.9% ± 4.0%
  cache_misses        154K  ± 15.5K      115K  … 207K            1 ( 1%)      - 1.6% ± 6.3%
  branch_misses      4.10M  ± 3.85K     4.09M  … 4.11M           1 ( 1%)      + 0.2% ± 0.1%
Benchmark 3 (79 runs): /tmp/labeled-match-len rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          63.9ms ± 1.66ms    62.7ms … 74.9ms          5 ( 6%)    ⚡- 11.8% ± 0.9%
  peak_rss           24.1MB ± 64.6KB    23.9MB … 24.1MB         18 (23%)      + 0.1% ± 0.1%
  cpu_cycles          254M  ± 6.97M      252M  … 307M            9 (11%)    ⚡- 13.6% ± 0.9%
  instructions        710M  ± 259        710M  … 710M            0 ( 0%)    ⚡- 22.3% ± 0.0%
  cache_references   3.05M  ± 736K      2.79M  … 9.38M           5 ( 6%)      + 0.4% ± 6.8%
  cache_misses        134K  ± 14.6K      111K  … 176K            1 ( 1%)    ⚡- 14.0% ± 5.9%
  branch_misses      4.08M  ± 5.27K     4.08M  … 4.11M           3 ( 4%)      - 0.1% ± 0.1%
Benchmark 4 (81 runs): /tmp/labeled-match-fast rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          62.1ms ± 554us     61.4ms … 64.4ms          3 ( 4%)    ⚡- 14.3% ± 0.8%
  peak_rss           24.1MB ± 63.0KB    23.9MB … 24.1MB          0 ( 0%)      + 0.0% ± 0.1%
  cpu_cycles          246M  ± 1.75M      245M  … 257M            6 ( 7%)    ⚡- 16.3% ± 0.7%
  instructions        689M  ± 258        689M  … 689M            0 ( 0%)    ⚡- 24.6% ± 0.0%
  cache_references   3.02M  ± 336K      2.77M  … 5.78M           4 ( 5%)      - 0.6% ± 4.5%
  cache_misses        128K  ± 16.9K      101K  … 186K            5 ( 6%)    ⚡- 17.7% ± 6.0%
  branch_misses      4.08M  ± 5.14K     4.08M  … 4.10M           2 ( 2%)      - 0.1% ± 0.1%

Benchmark 1 (108 runs): /tmp/uncompress-baseline rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          46.5ms ± 649us     45.3ms … 50.1ms          2 ( 2%)    0%
  peak_rss           24.1MB ± 57.0KB    24.0MB … 24.1MB         27 (25%)    0%
  cpu_cycles          174M  ± 1.71M      173M  … 187M            9 ( 8%)    0%
  instructions        516M  ± 277        516M  … 516M            1 ( 1%)    0%
  cache_references   3.14M  ± 181K      2.93M  … 4.33M           3 ( 3%)    0%
  cache_misses       45.8K  ± 17.2K     29.9K  … 93.9K           7 ( 6%)    0%
  branch_misses      2.00M  ± 2.42K     2.00M  … 2.02M           2 ( 2%)    0%
Benchmark 2 (78 runs): /tmp/loop-plus-match rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          64.8ms ± 705us     63.8ms … 69.2ms          1 ( 1%)    💩+ 39.3% ± 0.4%
  peak_rss           24.1MB ± 76.1KB    23.9MB … 24.1MB          0 ( 0%)      - 0.1% ± 0.1%
  cpu_cycles          257M  ± 2.22M      256M  … 273M            9 (12%)    💩+ 47.5% ± 0.3%
  instructions        720M  ± 385        720M  … 720M            1 ( 1%)    💩+ 39.7% ± 0.0%
  cache_references   3.18M  ± 92.4K     2.99M  … 3.55M           2 ( 3%)      + 1.5% ± 1.4%
  cache_misses       37.4K  ± 17.6K     26.5K  … 174K            7 ( 9%)    ⚡- 18.4% ± 11.1%
  branch_misses      2.00M  ± 2.47K     2.00M  … 2.01M           3 ( 4%)      - 0.2% ± 0.0%
Benchmark 3 (104 runs): /tmp/labeled-match-len rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          48.0ms ± 500us     47.2ms … 50.3ms          1 ( 1%)    💩+ 3.2% ± 0.3%
  peak_rss           24.1MB ± 59.7KB    24.0MB … 24.1MB          0 ( 0%)      - 0.0% ± 0.1%
  cpu_cycles          181M  ± 1.18M      180M  … 187M            6 ( 6%)    💩+ 3.8% ± 0.2%
  instructions        586M  ± 354        586M  … 586M            4 ( 4%)    💩+ 13.6% ± 0.0%
  cache_references   3.25M  ± 202K      3.02M  … 4.80M           6 ( 6%)    💩+ 3.5% ± 1.6%
  cache_misses       48.6K  ± 8.60K     31.5K  … 83.5K           5 ( 5%)      + 6.0% ± 8.0%
  branch_misses      2.00M  ± 1.77K     2.00M  … 2.01M           5 ( 5%)      - 0.2% ± 0.0%
Benchmark 4 (110 runs): /tmp/labeled-match-fast rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          45.5ms ± 927us     44.5ms … 51.4ms          4 ( 4%)    ⚡- 2.2% ± 0.5%
  peak_rss           24.1MB ± 73.1KB    23.9MB … 24.1MB          0 ( 0%)      - 0.0% ± 0.1%
  cpu_cycles          170M  ± 2.85M      168M  … 190M           12 (11%)    ⚡- 2.7% ± 0.4%
  instructions        515M  ± 249        515M  … 515M            0 ( 0%)      - 0.1% ± 0.0%
  cache_references   3.27M  ± 107K      3.07M  … 3.94M           2 ( 2%)    💩+ 4.2% ± 1.3%
  cache_misses        109K  ± 5.80K     98.7K  … 139K            4 ( 4%)    💩+139.0% ± 7.4%
  branch_misses      2.00M  ± 4.10K     1.99M  … 2.03M           4 ( 4%)      - 0.2% ± 0.0%

Benchmark 1 (181 runs): /tmp/uncompress-baseline rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          27.6ms ± 433us     26.8ms … 31.4ms          6 ( 3%)    0%
  peak_rss           24.1MB ± 58.9KB    23.9MB … 24.1MB         45 (25%)    0%
  cpu_cycles         90.0M  ± 919K      89.5M  … 99.7M          15 ( 8%)    0%
  instructions        239M  ± 322        239M  … 239M            3 ( 2%)    0%
  cache_references   2.28M  ± 53.6K     2.19M  … 2.70M           4 ( 2%)    0%
  cache_misses       43.5K  ± 2.44K     40.3K  … 64.5K           5 ( 3%)    0%
  branch_misses      1.05M  ± 1.70K     1.05M  … 1.06M           4 ( 2%)    0%
Benchmark 2 (186 runs): /tmp/loop-plus-match rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          27.0ms ± 304us     26.3ms … 28.6ms          3 ( 2%)    ⚡- 2.5% ± 0.3%
  peak_rss           24.1MB ± 64.1KB    23.9MB … 24.1MB          0 ( 0%)      - 0.0% ± 0.1%
  cpu_cycles         87.3M  ± 654K      86.8M  … 91.7M          19 (10%)    ⚡- 3.0% ± 0.2%
  instructions        248M  ± 266        248M  … 248M            2 ( 1%)    💩+ 3.8% ± 0.0%
  cache_references   2.25M  ± 55.5K     2.17M  … 2.75M           5 ( 3%)      - 1.2% ± 0.5%
  cache_misses       45.4K  ± 2.26K     40.6K  … 52.2K           1 ( 1%)    💩+ 4.2% ± 1.1%
  branch_misses      1.05M  ± 1.67K     1.04M  … 1.05M           4 ( 2%)      - 0.5% ± 0.0%
Benchmark 3 (184 runs): /tmp/labeled-match-len rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          27.2ms ± 817us     26.3ms … 33.7ms         16 ( 9%)    ⚡- 1.7% ± 0.5%
  peak_rss           24.1MB ± 55.9KB    23.9MB … 24.1MB         39 (21%)      + 0.0% ± 0.0%
  cpu_cycles         87.4M  ± 1.95M     86.4M  … 103M           23 (13%)    ⚡- 2.9% ± 0.3%
  instructions        248M  ± 278        248M  … 248M            2 ( 1%)    💩+ 3.6% ± 0.0%
  cache_references   2.28M  ± 112K      2.18M  … 2.93M          16 ( 9%)      + 0.0% ± 0.8%
  cache_misses       47.9K  ± 11.1K     40.5K  … 168K           15 ( 8%)    💩+ 10.0% ± 3.8%
  branch_misses      1.05M  ± 2.32K     1.04M  … 1.06M           4 ( 2%)      - 0.4% ± 0.0%
Benchmark 4 (182 runs): /tmp/labeled-match-fast rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers     delta
  wall_time          27.5ms ± 252us     26.9ms … 28.5ms          2 ( 1%)      - 0.7% ± 0.3%
  peak_rss           24.1MB ± 64.1KB    23.9MB … 24.1MB         44 (24%)      - 0.0% ± 0.1%
  cpu_cycles         89.5M  ± 477K      89.1M  … 93.9M          18 (10%)      - 0.6% ± 0.2%
  instructions        253M  ± 306        253M  … 253M            1 ( 1%)    💩+ 5.7% ± 0.0%
  cache_references   2.27M  ± 67.3K     2.19M  … 3.00M           3 ( 2%)      - 0.5% ± 0.5%
  cache_misses       66.5K  ± 2.10K     59.8K  … 73.6K           2 ( 1%)    💩+ 52.7% ± 1.1%
  branch_misses      1.05M  ± 1.22K     1.05M  … 1.06M           1 ( 1%)      - 0.2% ± 0.0%
```
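As a reading aid for the `delta` column, here is a small sketch of the arithmetic, assuming the column reports the relative change of each benchmark's mean against Benchmark 1 of the same chunk size (the benchmarking tool is not named in this gist, so that interpretation is an assumption). The function name `relative_delta_percent` is hypothetical.

```rust
/// Relative change of `mean` against `baseline_mean`, in percent.
/// Negative values mean the candidate was faster (or used less) than the
/// baseline. Assumes `baseline_mean` is non-zero.
fn relative_delta_percent(baseline_mean: f64, mean: f64) -> f64 {
    (mean - baseline_mean) / baseline_mean * 100.0
}
```

Plugging in the rounded wall-time means printed above for chunk size 4 (72.5 ms baseline, 63.9 ms for `labeled-match-len`) gives a value close to the reported delta; small differences are expected because the table prints rounded means.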