TurboQuant KV-cache llama-bench results on NVIDIA V100
Gist nenkoru/890ccb088d62b4f59cc9c95b297fe69b, last active April 1, 2026 18:37

nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo1.5 -ctk turbo1.5 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf

| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 | 842.63 ± 0.58 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 | 28.71 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d10000 | 669.50 ± 1.76 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d10000 | 24.24 ± 0.01 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d20000 | 552.62 ± 1.70 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d20000 | 21.17 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d30000 | 469.62 ± 2.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d30000 | 18.96 ± 0.01 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d40000 | 408.63 ± 0.56 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d40000 | 17.41 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d50000 | 361.31 ± 1.10 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d50000 | 16.04 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d60000 | 323.84 ± 0.25 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d60000 | 15.07 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d70000 | 292.74 ± 1.33 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d70000 | 13.99 ± 0.09 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d80000 | 267.18 ± 0.65 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d80000 | 13.06 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | pp512 @ d90000 | 245.72 ± 0.39 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo1.5 | turbo1.5 | tg128 @ d90000 | 12.17 ± 0.05 |
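
In llama-bench's naming, `pp512` is processing a 512-token prompt and `tg128` is generating 128 tokens, each measured with the `-d` number of context tokens already in the KV cache. The depth sweep above is easier to read as a fraction of the depth-0 rate. The following is a minimal Python sketch, assuming the rows are pasted in (trimmed here to the `test` and `t/s` columns; full llama-bench rows also parse because only the last two cells are read). The names `TABLE`, `parse`, and `retention` are hypothetical helpers, not part of llama.cpp.

```python
import re

# turbo1.5 K / turbo1.5 V rows from the table above, trimmed to test and t/s.
TABLE = """\
| pp512 | 842.63 ± 0.58 |
| tg128 | 28.71 ± 0.05 |
| pp512 @ d10000 | 669.50 ± 1.76 |
| tg128 @ d10000 | 24.24 ± 0.01 |
| pp512 @ d20000 | 552.62 ± 1.70 |
| tg128 @ d20000 | 21.17 ± 0.07 |
| pp512 @ d30000 | 469.62 ± 2.03 |
| tg128 @ d30000 | 18.96 ± 0.01 |
| pp512 @ d40000 | 408.63 ± 0.56 |
| tg128 @ d40000 | 17.41 ± 0.05 |
| pp512 @ d50000 | 361.31 ± 1.10 |
| tg128 @ d50000 | 16.04 ± 0.03 |
| pp512 @ d60000 | 323.84 ± 0.25 |
| tg128 @ d60000 | 15.07 ± 0.05 |
| pp512 @ d70000 | 292.74 ± 1.33 |
| tg128 @ d70000 | 13.99 ± 0.09 |
| pp512 @ d80000 | 267.18 ± 0.65 |
| tg128 @ d80000 | 13.06 ± 0.05 |
| pp512 @ d90000 | 245.72 ± 0.39 |
| tg128 @ d90000 | 12.17 ± 0.05 |
"""

# "pp512", "tg128 @ d10000", ... -> kind ("pp"/"tg"), size, optional depth.
TEST_RE = re.compile(r"^(pp|tg)(\d+)(?: @ d(\d+))?$")
# "842.63 ± 0.58" -> mean, standard deviation.
TS_RE = re.compile(r"^([\d.]+) ± ([\d.]+)$")


def parse(table):
    """Map (kind, depth) -> mean t/s; header and separator rows are skipped."""
    rates = {}
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue
        test_match, ts_match = TEST_RE.match(cells[-2]), TS_RE.match(cells[-1])
        if not (test_match and ts_match):
            continue
        kind, depth = test_match.group(1), int(test_match.group(3) or 0)
        rates[(kind, depth)] = float(ts_match.group(1))
    return rates


def retention(rates, kind):
    """Throughput at each depth divided by the depth-0 throughput."""
    base = rates[(kind, 0)]
    return {d: rates[(k, d)] / base for (k, d) in sorted(rates) if k == kind}


if __name__ == "__main__":
    rates = parse(TABLE)
    for kind in ("pp", "tg"):
        for depth, r in retention(rates, kind).items():
            print(f"{kind} @ d{depth}: {rates[(kind, depth)]:.2f} t/s, {r:.0%} of depth 0")
```

On these numbers, prompt processing at d90000 keeps about 29% of its depth-0 rate, while token generation keeps about 42%.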

nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo2 -ctk turbo2 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf

| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 | 850.45 ± 0.68 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 | 28.82 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d10000 | 674.29 ± 1.33 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d10000 | 25.97 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d20000 | 556.24 ± 0.84 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d20000 | 23.53 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d30000 | 472.06 ± 1.37 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d30000 | 21.76 ± 0.04 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d40000 | 409.79 ± 0.47 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d40000 | 20.36 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d50000 | 363.37 ± 0.53 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d50000 | 19.09 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d60000 | 324.09 ± 0.71 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d60000 | 18.24 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d70000 | 292.87 ± 0.55 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d70000 | 17.24 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d80000 | 267.23 ± 0.59 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d80000 | 16.33 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | pp512 @ d90000 | 246.42 ± 0.50 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo2 | turbo2 | tg128 @ d90000 | 15.34 ± 0.07 |

nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo3 -ctk turbo3 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf

| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 | 850.97 ± 1.13 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 | 28.52 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d10000 | 672.08 ± 3.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d10000 | 23.34 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d20000 | 554.60 ± 2.16 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d20000 | 19.60 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d30000 | 472.52 ± 0.57 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d30000 | 17.19 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d40000 | 409.64 ± 0.53 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d40000 | 15.53 ± 0.10 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d50000 | 362.80 ± 0.57 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d50000 | 14.06 ± 0.04 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d60000 | 322.75 ± 0.50 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d60000 | 13.19 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d70000 | 292.25 ± 0.55 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d70000 | 12.16 ± 0.11 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | pp512 @ d80000 | 267.41 ± 0.61 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo3 | tg128 @ d80000 | 11.35 ± 0.07 |

nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo4 -ctk turbo3 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf

| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 | 851.85 ± 0.50 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 | 28.62 ± 0.04 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d10000 | 672.62 ± 0.96 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d10000 | 24.22 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d20000 | 555.97 ± 0.59 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d20000 | 20.70 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d30000 | 470.88 ± 0.41 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d30000 | 18.35 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d40000 | 409.48 ± 1.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d40000 | 16.62 ± 0.10 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d50000 | 361.04 ± 1.08 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d50000 | 15.13 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d60000 | 323.77 ± 0.90 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d60000 | 14.16 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d70000 | 293.00 ± 0.85 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d70000 | 13.09 ± 0.10 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d80000 | 267.51 ± 1.15 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d80000 | 12.23 ± 0.07 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | pp512 @ d90000 | 245.90 ± 0.12 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo3 | turbo4 | tg128 @ d90000 | 11.25 ± 0.06 |
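
This run and the turbo3/turbo3 run share the same K cache type and differ only in the V cache type, so comparing their `tg128` columns depth by depth isolates the effect of the V type on decode speed in this setup. A minimal Python sketch, with the means transcribed from the two tables; `relative_speed` and the dictionary names are hypothetical. Only depths present in both runs are compared, since the turbo3/turbo3 table ends at d80000.

```python
# tg128 mean t/s by context depth, from the two runs that both use K = turbo3
# and differ only in the V cache type.
TG128_V_TURBO3 = {
    0: 28.52, 10000: 23.34, 20000: 19.60, 30000: 17.19, 40000: 15.53,
    50000: 14.06, 60000: 13.19, 70000: 12.16, 80000: 11.35,
}
TG128_V_TURBO4 = {
    0: 28.62, 10000: 24.22, 20000: 20.70, 30000: 18.35, 40000: 16.62,
    50000: 15.13, 60000: 14.16, 70000: 13.09, 80000: 12.23, 90000: 11.25,
}


def relative_speed(new, old):
    """new/old throughput ratio at every depth measured in both runs."""
    return {d: new[d] / old[d] for d in sorted(new.keys() & old.keys())}


if __name__ == "__main__":
    for depth, ratio in relative_speed(TG128_V_TURBO4, TG128_V_TURBO3).items():
        print(f"d{depth}: {ratio - 1:+.1%}")
```

On these numbers the turbo4 V cache decodes at essentially the same speed at depth 0 (+0.4%) and roughly 7 to 8% faster than turbo3 V from d40000 through d80000.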

nenkoru@bayfut-ubuntu-v100-vgpu:~/turbo3-cuda$ ./build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -fa on -ctv turbo4 -ctk turbo4 -d 0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000,200000,250000
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 40960 MiB):
Device 0: GRID V100DX-32Q, compute capability 7.0, VMM: no, VRAM: 32768 MiB
Device 1: GRID V100DX-8Q, compute capability 7.0, VMM: no, VRAM: 8192 MiB
common_download_file_single_online: using cached file: /home/nenkoru/.cache/huggingface/hub/models--unsloth--Qwen3.5-27B-GGUF/snapshots/3221f178a6b842d04f1fb42f1c413534adcc0a6a/Qwen3.5-27B-UD-Q4_K_XL.gguf

| model | size | params | backend | ngl | type_k | type_v | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 | 842.76 ± 1.35 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 | 28.74 ± 0.03 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d10000 | 670.43 ± 1.09 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d10000 | 25.11 ± 0.02 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d20000 | 551.45 ± 1.52 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d20000 | 22.15 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d30000 | 470.57 ± 0.75 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d30000 | 20.01 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d40000 | 408.70 ± 0.73 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d40000 | 18.36 ± 0.09 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d50000 | 361.86 ± 0.87 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d50000 | 16.97 ± 0.05 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d60000 | 323.68 ± 0.62 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d60000 | 15.97 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d70000 | 292.89 ± 0.54 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d70000 | 15.04 ± 0.08 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d80000 | 267.72 ± 0.40 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d80000 | 14.16 ± 0.06 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | pp512 @ d90000 | 245.43 ± 0.44 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | CUDA | 99 | turbo4 | turbo4 | tg128 @ d90000 | 13.17 ± 0.06 |
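
Although every command requested depths up to 250000, each table ends at d90000 or d80000, so d80000 is the deepest depth measured for all five cache-type pairs. A minimal Python sketch that lines the five runs up at that depth, with the means transcribed from the tables above; `D80000`, `spread`, and `ranking` are hypothetical names.

```python
# Mean t/s at depth 80000, keyed by (type_k, type_v), from the five tables.
D80000 = {
    ("turbo1.5", "turbo1.5"): {"pp512": 267.18, "tg128": 13.06},
    ("turbo2", "turbo2"): {"pp512": 267.23, "tg128": 16.33},
    ("turbo3", "turbo3"): {"pp512": 267.41, "tg128": 11.35},
    ("turbo3", "turbo4"): {"pp512": 267.51, "tg128": 12.23},
    ("turbo4", "turbo4"): {"pp512": 267.72, "tg128": 14.16},
}


def spread(metric):
    """Fastest/slowest ratio minus one across all cache-type pairs."""
    values = [row[metric] for row in D80000.values()]
    return max(values) / min(values) - 1


def ranking(metric):
    """Cache-type pairs sorted fastest first for one metric."""
    return sorted(D80000, key=lambda pair: D80000[pair][metric], reverse=True)


if __name__ == "__main__":
    for metric in ("pp512", "tg128"):
        k, v = ranking(metric)[0]
        print(f"{metric}: spread {spread(metric):.1%}, fastest K={k} V={v}")
```

On these numbers, prompt processing at d80000 is nearly independent of the cache type (about 0.2% between fastest and slowest), while token generation differs by about 44%, with turbo2/turbo2 fastest and turbo3/turbo3 slowest.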