@nenkoru
Last active April 1, 2026 18:37
TurboQuant llama-bench on NVIDIA V100
nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo1.5 -ctk turbo1.5 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf
| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 | 842.63 ± 0.58 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 | 28.71 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d10000 | 669.50 ± 1.76 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d10000 | 24.24 ± 0.01 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d20000 | 552.62 ± 1.70 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d20000 | 21.17 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d30000 | 469.62 ± 2.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d30000 | 18.96 ± 0.01 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d40000 | 408.63 ± 0.56 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d40000 | 17.41 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d50000 | 361.31 ± 1.10 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d50000 | 16.04 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d60000 | 323.84 ± 0.25 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d60000 | 15.07 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d70000 | 292.74 ± 1.33 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d70000 | 13.99 ± 0.09 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d80000 | 267.18 ± 0.65 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d80000 | 13.06 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d90000 | 245.72 ± 0.39 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d90000 | 12.17 ± 0.05 |
nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo2 -ctk turbo2 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf
| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 | 850.45 ± 0.68 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 | 28.82 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d10000 | 674.29 ± 1.33 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d10000 | 25.97 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d20000 | 556.24 ± 0.84 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d20000 | 23.53 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d30000 | 472.06 ± 1.37 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d30000 | 21.76 ± 0.04 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d40000 | 409.79 ± 0.47 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d40000 | 20.36 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d50000 | 363.37 ± 0.53 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d50000 | 19.09 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d60000 | 324.09 ± 0.71 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d60000 | 18.24 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d70000 | 292.87 ± 0.55 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d70000 | 17.24 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d80000 | 267.23 ± 0.59 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d80000 | 16.33 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d90000 | 246.42 ± 0.50 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d90000 | 15.34 ± 0.07 |
nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo3 -ctk turbo3 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf
| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 | 850.97 ± 1.13 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 | 28.52 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d10000 | 672.08 ± 3.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d10000 | 23.34 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d20000 | 554.60 ± 2.16 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d20000 | 19.60 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d30000 | 472.52 ± 0.57 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d30000 | 17.19 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d40000 | 409.64 ± 0.53 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d40000 | 15.53 ± 0.10 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d50000 | 362.80 ± 0.57 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d50000 | 14.06 ± 0.04 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d60000 | 322.75 ± 0.50 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d60000 | 13.19 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d70000 | 292.25 ± 0.55 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d70000 | 12.16 ± 0.11 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d80000 | 267.41 ± 0.61 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d80000 | 11.35 ± 0.07 |
nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo4 -ctk turbo3 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf
| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 | 851.85 ± 0.50 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 | 28.62 ± 0.04 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d10000 | 672.62 ± 0.96 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d10000 | 24.22 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d20000 | 555.97 ± 0.59 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d20000 | 20.70 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d30000 | 470.88 ± 0.41 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d30000 | 18.35 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d40000 | 409.48 ± 1.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d40000 | 16.62 ± 0.10 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d50000 | 361.04 ± 1.08 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d50000 | 15.13 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d60000 | 323.77 ± 0.90 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d60000 | 14.16 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d70000 | 293.00 ± 0.85 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d70000 | 13.09 ± 0.10 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d80000 | 267.51 ± 1.15 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d80000 | 12.23 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d90000 | 245.90 ± 0.12 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d90000 | 11.25 ± 0.06 |
nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo4 -ctk turbo4 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf
| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 | 842.76 ± 1.35 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 | 28.74 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d10000 | 670.43 ± 1.09 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d10000 | 25.11 ± 0.02 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d20000 | 551.45 ± 1.52 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d20000 | 22.15 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d30000 | 470.57 ± 0.75 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d30000 | 20.01 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d40000 | 408.70 ± 0.73 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d40000 | 18.36 ± 0.09 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d50000 | 361.86 ± 0.87 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d50000 | 16.97 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d60000 | 323.68 ± 0.62 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d60000 | 15.97 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d70000 | 292.89 ± 0.54 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d70000 | 15.04 ± 0.08 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d80000 | 267.72 ± 0.40 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d80000 | 14.16 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d90000 | 245.43 ± 0.44 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d90000 | 13.17 ± 0.06 |
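
The runs above are easier to compare once the tables are parsed. The following is a minimal sketch, not part of the original runs, assuming the nine-column layout printed above (model, size, params, backend, ngl, type_k, type_v, test, t/s). The names `parse_bench_rows` and `retention` are hypothetical helpers introduced for illustration; the two sample rows are copied from the turbo2 run.

```python
import re

# Rows copied from the turbo2 run above; paste the full llama-bench output here instead.
SAMPLE = """\
| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 | 28.82 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d90000 | 15.34 ± 0.07 |
"""

# Matches the "test" column, e.g. "pp512" or "tg128 @ d90000".
TEST_RE = re.compile(r"(pp|tg)(\d+)(?: @ d(\d+))?")


def parse_bench_rows(text):
    """Return one dict per data row; header, separator and log lines are skipped."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # shell prompt, ggml_cuda_init output, etc.
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 9:
            continue  # assumption: the nine-column layout shown above
        m = TEST_RE.fullmatch(cells[7])
        if m is None:
            continue  # header or separator row
        mean, _, stddev = cells[8].partition("±")
        rows.append({
            "type_k": cells[5],
            "type_v": cells[6],
            "kind": m.group(1),              # "pp" or "tg"
            "depth": int(m.group(3) or 0),   # no "@ d..." suffix means depth 0
            "tps": float(mean),
            "stddev": float(stddev),
        })
    return rows


def retention(rows, type_k, type_v, kind, depth):
    """Throughput at `depth` divided by throughput at depth 0 for one cache config."""
    table = {
        r["depth"]: r["tps"]
        for r in rows
        if (r["type_k"], r["type_v"], r["kind"]) == (type_k, type_v, kind)
    }
    return table[depth] / table[0]


rows = parse_bench_rows(SAMPLE)
ratio = retention(rows, "turbo2", "turbo2", "tg", 90000)
print(f"turbo2 tg128 retention at d90000: {ratio:.3f}")
```

Feeding all five runs into `parse_bench_rows` and calling `retention` per cache type gives a single number per configuration for how much generation speed survives at a given depth.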