Final tokenizer benchmark results
Pinned to CPU cores 0–7 for both the Python and Rust runs, with OMP_NUM_THREADS=1 and 8 rayon/tiktoken threads. Each reported figure is the best observed across all warmup + measured iterations for that combination.
- Host: AMD EPYC 7R13 Processor (nproc: 64)
- Python run timestamp: 2026-04-23T06:10:10.228895+00:00 (load avg at start: 2.58 / 2.16 / 2.54)
- Rust run timestamp: 2026-04-23T06:19:48.992276Z (load avg post-bench: 6.11 / 5.61 / 4.01)
- Python config: batches=[1, 32, 128], lengths=[128, 2048, 8192], threads=8, iters=4, warmup=2
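The pinning and thread settings can be applied from inside the harness itself, before the tokenizer libraries spin up their pools. A minimal Python sketch of that setup (illustrative, not the actual harness; os.sched_setaffinity is Linux-only, and RAYON_NUM_THREADS is the environment variable rayon-based libraries such as HF tokenizers size their global pool from):

```python
import os

# Pin this process (and any workers it spawns) to CPU cores 0-7,
# matching the affinity used for both benchmark runs. Linux-only.
os.sched_setaffinity(0, range(8))

# rayon sizes its global pool once, on first use, so set these
# before the first encode/decode call.
os.environ["OMP_NUM_THREADS"] = "1"    # keep OpenMP from oversubscribing the pinned cores
os.environ["RAYON_NUM_THREADS"] = "8"  # 8 threads for the rayon pool (HF tokenizers' Rust core)
```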
Python world — 10-model leaderboard
encode peak throughput (MB/s)

| model | tokenizers | iree | tiktoken | best vs tokenizers |
|---|---:|---:|---:|---|
| meta-llama/Llama-3.2-1B | 27.1 | 26.7 | 36.1 | 1.33× tiktoken |
| Qwen/Qwen2.5-7B | 21.0 | 18.6 | – | 0.89× iree |
| Qwen/Qwen3-8B | 20.9 | 22.3 | – | 1.07× iree |
| deepseek-ai/DeepSeek-V3 | 19.6 | 10.6 | – | 0.54× iree |
| zai-org/GLM-4.5-Air | 26.4 | 25.5 | – | 0.97× iree |
| mistralai/Mistral-Nemo-Instruct-2407 | 26.0 | 26.1 | – | 1.00× iree |
| 01-ai/Yi-1.5-9B | 27.6 | 28.2 | – | 1.02× iree |
| bigcode/starcoder2-7b | 21.0 | 15.7 | – | 0.75× iree |
| EleutherAI/gpt-neox-20b | 21.0 | 21.1 | – | 1.01× iree |
| tiiuae/falcon-7b | 14.1 | 9.9 | – | 0.70× iree |
decode peak throughput (Mtok/s)

| model | tokenizers | iree | tiktoken | best vs tokenizers |
|---|---:|---:|---:|---|
| meta-llama/Llama-3.2-1B | 9.9 | 72.0 | 45.3 | 7.24× iree |
| Qwen/Qwen2.5-7B | 10.5 | 57.4 | – | 5.46× iree |
| Qwen/Qwen3-8B | 10.5 | 76.6 | – | 7.28× iree |
| deepseek-ai/DeepSeek-V3 | 10.2 | 60.5 | – | 5.94× iree |
| zai-org/GLM-4.5-Air | 10.1 | 74.4 | – | 7.35× iree |
| mistralai/Mistral-Nemo-Instruct-2407 | 10.2 | 77.2 | – | 7.54× iree |
| 01-ai/Yi-1.5-9B | 10.3 | 67.4 | – | 6.51× iree |
| bigcode/starcoder2-7b | 10.3 | 73.1 | – | 7.13× iree |
| EleutherAI/gpt-neox-20b | 10.5 | 71.6 | – | 6.80× iree |
| tiiuae/falcon-7b | 10.2 | 74.8 | – | 7.36× iree |
Rust world — Criterion matrix (llama-3 tokenizer)
encode

| batch | input_len | mean (ms) | throughput |
|---:|---:|---:|---|
| 1 | 128 | 0.15 | 4.2 MB/s |
| 1 | 1024 | 1.37 | 3.3 MB/s |
| 1 | 8192 | 9.87 | 3.6 MB/s |
| 8 | 128 | 0.27 | 18.7 MB/s |
| 8 | 1024 | 1.55 | 23.5 MB/s |
| 8 | 8192 | 11.64 | 24.4 MB/s |
| 32 | 128 | 0.78 | 25.8 MB/s |
| 32 | 1024 | 4.95 | 29.4 MB/s |
| 32 | 8192 | 43.74 | 26.0 MB/s |
| 128 | 128 | 2.56 | 31.4 MB/s |
| 128 | 1024 | 18.38 | 31.7 MB/s |
| 128 | 8192 | 167.15 | 27.2 MB/s |
encode (fast)

| batch | input_len | mean (ms) | throughput |
|---:|---:|---:|---|
| 1 | 128 | 0.15 | 4.3 MB/s |
| 1 | 1024 | 1.19 | 3.8 MB/s |
| 1 | 8192 | 8.78 | 4.0 MB/s |
| 8 | 128 | 0.21 | 23.8 MB/s |
| 8 | 1024 | 1.29 | 28.3 MB/s |
| 8 | 8192 | 9.88 | 28.8 MB/s |
| 32 | 128 | 0.65 | 30.9 MB/s |
| 32 | 1024 | 4.30 | 33.9 MB/s |
| 32 | 8192 | 36.26 | 31.4 MB/s |
| 128 | 128 | 2.16 | 37.2 MB/s |
| 128 | 1024 | 15.21 | 38.3 MB/s |
| 128 | 8192 | 126.52 | 36.0 MB/s |
decode

| batch | input_len | mean (ms) | throughput |
|---:|---:|---:|---|
| 1 | 128 | 0.01 | 9.5 Mtok/s |
| 1 | 1024 | 0.16 | 6.6 Mtok/s |
| 1 | 8192 | 1.26 | 6.5 Mtok/s |
| 8 | 128 | 0.05 | 20.7 Mtok/s |
| 8 | 1024 | 0.23 | 36.4 Mtok/s |
| 8 | 8192 | 1.53 | 42.8 Mtok/s |
| 32 | 128 | 0.12 | 35.2 Mtok/s |
| 32 | 1024 | 0.70 | 46.8 Mtok/s |
| 32 | 8192 | 5.14 | 51.0 Mtok/s |
| 128 | 128 | 0.35 | 46.6 Mtok/s |
| 128 | 1024 | 2.30 | 56.9 Mtok/s |
| 128 | 8192 | 17.93 | 58.5 Mtok/s |
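The throughput column in each matrix follows from the mean estimate: for decode at batch=128, input_len=8192, that is 128 × 8192 = 1,048,576 tokens in 17.93 ms ≈ 58.5 Mtok/s. A sketch of that conversion, assuming Criterion's standard estimates.json layout (point estimates in nanoseconds); the function name and path handling are illustrative:

```python
import json

def criterion_decode_mtoks(estimates_path: str, batch: int, input_len: int) -> float:
    """Turn a Criterion mean estimate into Mtok/s for a decode benchmark."""
    with open(estimates_path) as f:
        est = json.load(f)
    mean_s = est["mean"]["point_estimate"] / 1e9  # Criterion reports times in nanoseconds
    tokens = batch * input_len                    # decode inputs are token IDs
    return tokens / mean_s / 1e6

# batch=128, input_len=8192, mean 17.93 ms:
# 1_048_576 tokens / 0.01793 s ≈ 58.5 Mtok/s, matching the decode matrix above.
```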
Python vs Rust overhead (llama-3, batch=128, len=8192)
| phase | Python | Rust | Python / Rust |
|---|---:|---:|---:|
| encode (fast) | 27.1 MB/s | 36.0 MB/s | 75% |
| decode | 9.9 Mtok/s | 58.5 Mtok/s | 17% |

The binding overhead is mild on fast encode (27.1 / 36.0 ≈ 75% of Rust throughput) but dominant on decode (9.9 / 58.5 ≈ 17%).
Artifacts

- python_leaderboard.json — full row-level results per model/backend/combo
- python_leaderboard.md — model × backend peak table
- python_leaderboard.log — full interactive log, including per-model tables
- rust_matrix.json — Criterion estimates parsed into a compact form
- rust_matrix.md — encode / encode-fast / decode matrices
- rust_matrix.log — raw Criterion output