abhishekmishragithub/llm_benchmarking_glossary.md

Last active September 25, 2025 05:16

llm benchmarking terms / glossary

RPS (Requests per Second): how many requests the system completed per second. RPS = successful_requests / benchmark_duration_sec

TPS (Tokens per Second): ambiguous on its own, so state which variant you mean:

Output Token Throughput (tok/s) = total_output_tokens / benchmark_duration_sec
Total Token Throughput (tok/s) = (total_input_tokens + total_output_tokens) / benchmark_duration_sec
Per-request TPS (a single request, e.g. in sequential runs) = output_tokens / request_latency_sec
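A minimal sketch of the RPS and throughput formulas above, assuming each finished request is logged with its token counts and latency. The `RequestRecord` shape and the `throughput_summary` helper are hypothetical names chosen for illustration, not part of any particular benchmarking tool.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical per-request log entry (field names are assumptions).
    input_tokens: int
    output_tokens: int
    latency_sec: float
    success: bool = True


def throughput_summary(records, benchmark_duration_sec):
    """Compute RPS and the explicit TPS variants over successful requests."""
    ok = [r for r in records if r.success]
    total_in = sum(r.input_tokens for r in ok)
    total_out = sum(r.output_tokens for r in ok)
    return {
        "rps": len(ok) / benchmark_duration_sec,
        "output_tok_per_sec": total_out / benchmark_duration_sec,
        "total_tok_per_sec": (total_in + total_out) / benchmark_duration_sec,
        # Per-request TPS: output tokens divided by that request's own latency.
        "per_request_tps": [r.output_tokens / r.latency_sec for r in ok],
    }
```

Note that the system-level numbers divide by the wall-clock benchmark duration, while per-request TPS divides by one request's latency, so the two are not directly comparable under concurrency.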

TTFT (Time To First Token): time from request send → first token received. Proxy for perceived snappiness. Lower is better.

TPOT (Time Per Output Token, excluding the first token): average cadence after streaming begins. TPOT ≈ 1 / (tokens_per_second_per_stream), so a stream running at 50 tok/s has a TPOT of about 20 ms.

ITL (Inter-Token Latency): measured gap between consecutive tokens when streaming. Similar to TPOT; sometimes includes small transport overheads.

E2E (End-to-End latency): request send → last token received. Depends heavily on output length.
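The four latency terms above can be derived from one streamed request if you record the send time and the arrival time of each token. The sketch below assumes timestamps in seconds on a monotonic clock; `latency_metrics` is a hypothetical helper, and computing TPOT as (E2E − TTFT) / (n − 1) is one common convention rather than the only one.

```python
def latency_metrics(send_time, token_times):
    """Derive TTFT, E2E, ITL and TPOT from per-token arrival timestamps.

    send_time   -- when the request was sent (seconds)
    token_times -- arrival time of each output token, in order (seconds)
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    ttft = token_times[0] - send_time
    e2e = token_times[-1] - send_time
    # ITL: measured gap between each pair of consecutive tokens.
    itl = [b - a for a, b in zip(token_times, token_times[1:])]
    n = len(token_times)
    # TPOT: average time per token after the first (assumed convention).
    tpot = (e2e - ttft) / (n - 1) if n > 1 else 0.0
    return {"ttft": ttft, "e2e": e2e, "itl": itl, "tpot": tpot}
```

Under this convention TPOT equals the mean of the ITL samples for a single request; the ITL list is still worth keeping because it exposes stalls that an average hides.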

Concurrency (in-flight): how many requests are simultaneously being processed.
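As a rough illustration, in-flight concurrency can be reconstructed after a run from each request's start and end time. The helpers below are hypothetical and assume `(start, end)` pairs in seconds; a request that ends at the same instant another starts is not counted as overlapping.

```python
def peak_concurrency(intervals):
    """Maximum number of requests in flight at any instant."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # request enters the system
        events.append((end, -1))    # request leaves the system
    # Tuples sort by time, then delta, so an end (-1) at time t is
    # processed before a start (+1) at the same t.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def mean_concurrency(intervals, benchmark_duration_sec):
    """Time-averaged in-flight count: total busy time / wall-clock time."""
    busy = sum(end - start for start, end in intervals)
    return busy / benchmark_duration_sec
```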

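Burstiness factor: shapes the arrival process when requests are sent at a finite rate. In the gamma-interval convention used by vLLM's serving benchmark, 1.0 (the default) gives classic Poisson arrivals, values below 1 make arrivals burstier, and values above 1 make them more uniform.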
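To see what a burstiness factor does to arrivals, here is a small sketch that draws inter-arrival gaps from a gamma distribution with shape equal to the burstiness and a scale chosen so the mean gap stays at 1 / request_rate. This parameterization is an assumption about how a load generator might implement the knob; `arrival_gaps` and `coefficient_of_variation` are hypothetical helper names. With shape 1.0 the gamma distribution reduces to the exponential gaps of a Poisson process.

```python
import random


def arrival_gaps(request_rate, burstiness, n, seed=0):
    """Sample n inter-arrival gaps (seconds) with mean 1 / request_rate."""
    rng = random.Random(seed)
    shape = burstiness
    scale = 1.0 / (request_rate * burstiness)  # keeps mean = shape * scale
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def coefficient_of_variation(xs):
    """Std / mean of the gaps: about 1.0 for Poisson, higher means spikier."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return var ** 0.5 / mean
```

Comparing `coefficient_of_variation(arrival_gaps(10.0, b, 20000))` for a few values of `b` shows the effect: under this parameterization the spread of the gaps shrinks as the factor grows, while the average request rate stays the same.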

rule of thumb: TTFT = “how fast does it start talking?”, ITL/TPOT = “how fast does it keep talking?”, E2E = “when does it finish?”, tok/s = “how much work per second can it do?”
