qwen3-coder-next % ./run --log-file log.txt
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
build: 7939 (b536eb023) with GNU 13.3.0 for Linux x86_64
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16

system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | BLACKWELL_NATIVE_FP4 = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |

Running without SSL
init: using 15 threads for HTTP server
start: binding port with default address family
main: loading model
srv    load_model: loading model './Qwen3-Coder-Next-Q3_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 37528 MiB of device memory vs. 15082 MiB of free device memory
llama_params_fit_impl: cannot meet free memory target of 1024 MiB, need to reduce device memory by 23470 MiB
llama_params_fit_impl: context size set by user to 32768 -> no change
llama_params_fit_impl: with only dense weights in device memory there is a total surplus of 11494 MiB
llama_params_fit_impl: filling dense-only layers back-to-front:
llama_params_fit_impl:   - CUDA0 (NVIDIA GeForce RTX 5060 Ti): 49 layers,   3355 MiB used,  11726 MiB free
llama_params_fit_impl: converting dense-only layers to full layers and filling them front-to-back with overflow to next device/system memory:
llama_params_fit_impl:   - CUDA0 (NVIDIA GeForce RTX 5060 Ti): 49 layers (34 overflowing),  13834 MiB used,   1247 MiB free
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 3.91 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5060 Ti) (0000:01:00.0) - 15158 MiB free
llama_model_loader: loaded meta data with 52 key-value pairs and 843 tensors from ./Qwen3-Coder-Next-Q3_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen3next
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                     general.sampling.top_k i32              = 40
llama_model_loader: - kv   3:                     general.sampling.top_p f32              = 0.950000
llama_model_loader: - kv   4:                      general.sampling.temp f32              = 1.000000
llama_model_loader: - kv   5:                               general.name str              = Qwen3-Coder-Next
llama_model_loader: - kv   6:                           general.basename str              = Qwen3-Coder-Next
llama_model_loader: - kv   7:                       general.quantized_by str              = Unsloth
llama_model_loader: - kv   8:                         general.size_label str              = 512x2.5B
llama_model_loader: - kv   9:                            general.license str              = apache-2.0
llama_model_loader: - kv  10:                       general.license.link str              = https://huggingface.co/Qwen/Qwen3-Cod...
llama_model_loader: - kv  11:                           general.repo_url str              = https://huggingface.co/unsloth
llama_model_loader: - kv  12:                   general.base_model.count u32              = 1
llama_model_loader: - kv  13:                  general.base_model.0.name str              = Qwen3 Coder Next
llama_model_loader: - kv  14:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  15:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen3-Cod...
llama_model_loader: - kv  16:                               general.tags arr[str,2]       = ["unsloth", "text-generation"]
llama_model_loader: - kv  17:                      qwen3next.block_count u32              = 48
llama_model_loader: - kv  18:                   qwen3next.context_length u32              = 262144
llama_model_loader: - kv  19:                 qwen3next.embedding_length u32              = 2048
llama_model_loader: - kv  20:              qwen3next.feed_forward_length u32              = 5120
llama_model_loader: - kv  21:             qwen3next.attention.head_count u32              = 16
llama_model_loader: - kv  22:          qwen3next.attention.head_count_kv u32              = 2
llama_model_loader: - kv  23:                   qwen3next.rope.freq_base f32              = 5000000.000000
llama_model_loader: - kv  24: qwen3next.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  25:                qwen3next.expert_used_count u32              = 10
llama_model_loader: - kv  26:             qwen3next.attention.key_length u32              = 256
llama_model_loader: - kv  27:           qwen3next.attention.value_length u32              = 256
llama_model_loader: - kv  28:                     qwen3next.expert_count u32              = 512
llama_model_loader: - kv  29:       qwen3next.expert_feed_forward_length u32              = 512
llama_model_loader: - kv  30: qwen3next.expert_shared_feed_forward_length u32              = 512
llama_model_loader: - kv  31:                  qwen3next.ssm.conv_kernel u32              = 4
llama_model_loader: - kv  32:                   qwen3next.ssm.state_size u32              = 128
llama_model_loader: - kv  33:                  qwen3next.ssm.group_count u32              = 16
llama_model_loader: - kv  34:               qwen3next.ssm.time_step_rank u32              = 32
llama_model_loader: - kv  35:                   qwen3next.ssm.inner_size u32              = 4096
llama_model_loader: - kv  36:             qwen3next.rope.dimension_count u32              = 64
llama_model_loader: - kv  37:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  38:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  39:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  40:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  41:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  42:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  43:            tokenizer.ggml.padding_token_id u32              = 151654
llama_model_loader: - kv  44:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  45:                    tokenizer.chat_template str              = {% macro render_extra_keys(json_dict,...
llama_model_loader: - kv  46:               general.quantization_version u32              = 2
llama_model_loader: - kv  47:                          general.file_type u32              = 12
llama_model_loader: - kv  48:                      quantize.imatrix.file str              = Qwen3-Coder-Next-GGUF/imatrix_unsloth...
llama_model_loader: - kv  49:                   quantize.imatrix.dataset str              = unsloth_calibration_Qwen3-Coder-Next.txt
llama_model_loader: - kv  50:             quantize.imatrix.entries_count u32              = 576
llama_model_loader: - kv  51:              quantize.imatrix.chunks_count u32              = 154
llama_model_loader: - type  f32:  313 tensors
llama_model_loader: - type q3_K:  229 tensors
llama_model_loader: - type q4_K:  245 tensors
llama_model_loader: - type q5_K:    7 tensors
llama_model_loader: - type q6_K:    1 tensors
llama_model_loader: - type bf16:   48 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q3_K - Medium
print_info: file size   = 35.65 GiB (3.84 BPW)
load: 0 unused tokens
load: printing all EOG tokens:
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 26
load: token to piece cache size = 0.9311 MB
print_info: arch                  = qwen3next
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 262144
print_info: n_embd                = 2048
print_info: n_embd_inp            = 2048
print_info: n_layer               = 48
print_info: n_head                = 16
print_info: n_head_kv             = 2
print_info: n_rot                 = 64
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 256
print_info: n_embd_head_v         = 256
print_info: n_gqa                 = 8
print_info: n_embd_k_gqa          = 512
print_info: n_embd_v_gqa          = 512
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-06
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: n_ff                  = 5120
print_info: n_expert              = 512
print_info: n_expert_used         = 10
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 1
print_info: pooling type          = 0
print_info: rope type             = 2
print_info: rope scaling          = linear
print_info: freq_base_train       = 5000000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 262144
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: ssm_d_conv            = 4
print_info: ssm_d_inner           = 4096
print_info: ssm_d_state           = 128
print_info: ssm_dt_rank           = 32
print_info: ssm_n_group           = 16
print_info: ssm_dt_b_c_rms        = 0
print_info: model type            = 80B.A3B
print_info: model params          = 79.67 B
print_info: general.name          = Qwen3-Coder-Next
print_info: vocab type            = BPE
print_info: n_vocab               = 151936
print_info: n_merges              = 151387
print_info: BOS token             = 11 ','
print_info: EOS token             = 151645 '<|im_end|>'
print_info: EOT token             = 151645 '<|im_end|>'
print_info: PAD token             = 151654 '<|vision_pad|>'
print_info: LF token              = 198 'Ċ'
print_info: FIM PRE token         = 151659 '<|fim_prefix|>'
print_info: FIM SUF token         = 151661 '<|fim_suffix|>'
print_info: FIM MID token         = 151660 '<|fim_middle|>'
print_info: FIM PAD token         = 151662 '<|fim_pad|>'
print_info: FIM REP token         = 151663 '<|repo_name|>'
print_info: FIM SEP token         = 151664 '<|file_sep|>'
print_info: EOG token             = 151643 '<|endoftext|>'
print_info: EOG token             = 151645 '<|im_end|>'
print_info: EOG token             = 151662 '<|fim_pad|>'
print_info: EOG token             = 151663 '<|repo_name|>'
print_info: EOG token             = 151664 '<|file_sep|>'
print_info: max token length      = 256
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloaded 49/49 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 36263.77 MiB
load_tensors:        CUDA0 model buffer size = 12571.13 MiB
....................................................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
common_init_result: added <|fim_pad|> logit bias = -inf
common_init_result: added <|repo_name|> logit bias = -inf
common_init_result: added <|file_sep|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 32768
llama_context: n_ctx_seq     = 32768
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = enabled
llama_context: kv_unified    = false
llama_context: freq_base     = 5000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (32768) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     0.58 MiB
llama_kv_cache:      CUDA0 KV buffer size =   768.00 MiB
llama_kv_cache: size =  768.00 MiB ( 32768 cells,  12 layers,  1/1 seqs), K (f16):  384.00 MiB, V (f16):  384.00 MiB
llama_memory_recurrent:      CUDA0 RS buffer size =    75.38 MiB
llama_memory_recurrent: size =   75.38 MiB (     1 cells,  48 layers,  1 seqs), R (f32):    3.38 MiB, S (f32):   72.00 MiB
sched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =   420.06 MiB
sched_reserve:  CUDA_Host compute buffer size =    72.01 MiB
sched_reserve: graph nodes  = 14666 (with bs=512), 5918 (with bs=1)
sched_reserve: graph splits = 104 (with bs=512), 74 (with bs=1)
sched_reserve: reserve took 587.99 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv    load_model: initializing slots, n_slots = 1
no implementations specified for speculative decoding
slot   load_model: id  0 | task -1 | speculative decoding context not initialized
slot   load_model: id  0 | task -1 | new slot, n_ctx = 32768
srv    load_model: prompt cache is enabled, size limit: 8192 MiB
srv    load_model: use `--cache-ram 0` to disable the prompt cache
srv    load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
init: chat template, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
'
srv          init: init: chat template, thinking = 0
main: model loaded
main: server is listening on http://127.0.0.1:8080
main: starting the main loop...
srv  update_slots: all slots are idle
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 0 | processing task, is_child = 0
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 158
slot update_slots: id  0 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_tokens = 94, batch.n_tokens = 94, progress = 0.594937
srv  params_from_: Chat format: Qwen3 Coder
slot update_slots: id  0 | task 0 | n_tokens = 94, memory_seq_rm [94, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_tokens = 158, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 0 | prompt done, n_tokens = 158, batch.n_tokens = 64
slot init_sampler: id  0 | task 0 | init sampler, took 0.02 ms, tokens: text = 158, total = 158
slot update_slots: id  0 | task 0 | created context checkpoint 1 of 8 (pos_min = 93, pos_max = 93, size = 75.376 MiB)
slot print_timing: id  0 | task 0 |
prompt eval time =   20427.03 ms /   158 tokens (  129.29 ms per token,     7.73 tokens per second)
       eval time =     852.28 ms /    11 tokens (   77.48 ms per token,    12.91 tokens per second)
      total time =   21279.31 ms /   169 tokens
slot      release: id  0 | task 0 | stop processing: n_tokens = 168, truncated = 0
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 10688094175
srv  get_availabl: updating prompt cache
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv   prompt_save:  - saving prompt with length 168, total state size = 79.316 MiB
srv          load:  - looking for better prompt, base f_keep = 0.125, sim = 0.001
srv        update:  - cache state: 1 prompts, 154.691 MiB (limits: 8192.000 MiB, 32768 tokens, 32768 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv  get_availabl: prompt cache update took 422.29 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 2 | processing task, is_child = 0
slot update_slots: id  0 | task 2 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 16962
slot update_slots: id  0 | task 2 | n_past = 21, slot.prompt.tokens.size() = 168, seq_id = 0, pos_min = 167, n_swa = 1
slot update_slots: id  0 | task 2 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 2 | erased invalidated context checkpoint (pos_min = 93, pos_max = 93, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 2 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.120740
slot update_slots: id  0 | task 2 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.241481
slot update_slots: id  0 | task 2 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.362221
slot update_slots: id  0 | task 2 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.482962
slot update_slots: id  0 | task 2 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.603702
slot update_slots: id  0 | task 2 | n_tokens = 10240, memory_seq_rm [10240, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.724443
slot update_slots: id  0 | task 2 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.845183
slot update_slots: id  0 | task 2 | n_tokens = 14336, memory_seq_rm [14336, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 2048, progress = 0.965924
slot update_slots: id  0 | task 2 | n_tokens = 16384, memory_seq_rm [16384, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 16898, batch.n_tokens = 514, progress = 0.996227
slot update_slots: id  0 | task 2 | n_tokens = 16898, memory_seq_rm [16898, end)
slot update_slots: id  0 | task 2 | prompt processing progress, n_tokens = 16962, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 2 | prompt done, n_tokens = 16962, batch.n_tokens = 64
slot init_sampler: id  0 | task 2 | init sampler, took 1.31 ms, tokens: text = 16962, total = 16962
slot update_slots: id  0 | task 2 | created context checkpoint 1 of 8 (pos_min = 16897, pos_max = 16897, size = 75.376 MiB)
slot print_timing: id  0 | task 2 |
prompt eval time =  110084.95 ms / 16962 tokens (    6.49 ms per token,   154.08 tokens per second)
       eval time =    9879.50 ms /   167 tokens (   59.16 ms per token,    16.90 tokens per second)
      total time =  119964.45 ms / 17129 tokens
slot      release: id  0 | task 2 | stop processing: n_tokens = 17128, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 10808634380
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 17128, total state size = 477.010 MiB
srv          load:  - looking for better prompt, base f_keep = 0.002, sim = 0.004
srv        update:  - cache state: 2 prompts, 707.077 MiB (limits: 8192.000 MiB, 32768 tokens, 200386 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e42624d83a0:   17128 tokens, checkpoints:  1,   552.385 MiB
srv  get_availabl: prompt cache update took 3980.01 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 190 | processing task, is_child = 0
slot update_slots: id  0 | task 190 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 9779
slot update_slots: id  0 | task 190 | n_past = 40, slot.prompt.tokens.size() = 17128, seq_id = 0, pos_min = 17127, n_swa = 1
slot update_slots: id  0 | task 190 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 190 | erased invalidated context checkpoint (pos_min = 16897, pos_max = 16897, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 190 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 190 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.209428
slot update_slots: id  0 | task 190 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id  0 | task 190 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.418857
slot update_slots: id  0 | task 190 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id  0 | task 190 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.628285
slot update_slots: id  0 | task 190 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id  0 | task 190 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.837713
slot update_slots: id  0 | task 190 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id  0 | task 190 | prompt processing progress, n_tokens = 9715, batch.n_tokens = 1523, progress = 0.993455
slot update_slots: id  0 | task 190 | n_tokens = 9715, memory_seq_rm [9715, end)
slot update_slots: id  0 | task 190 | prompt processing progress, n_tokens = 9779, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 190 | prompt done, n_tokens = 9779, batch.n_tokens = 64
slot init_sampler: id  0 | task 190 | init sampler, took 0.80 ms, tokens: text = 9779, total = 9779
slot update_slots: id  0 | task 190 | created context checkpoint 1 of 8 (pos_min = 9714, pos_max = 9714, size = 75.376 MiB)
slot print_timing: id  0 | task 190 |
prompt eval time =   43435.43 ms /  9779 tokens (    4.44 ms per token,   225.14 tokens per second)
       eval time =    2675.64 ms /    45 tokens (   59.46 ms per token,    16.82 tokens per second)
      total time =   46111.07 ms /  9824 tokens
slot      release: id  0 | task 190 | stop processing: n_tokens = 9823, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 10859055595
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 9823, total state size = 305.715 MiB
srv  params_from_: Chat format: Qwen3 Coder
srv          load:  - looking for better prompt, base f_keep = 0.002, sim = 0.031
srv        update:  - cache state: 3 prompts, 1088.168 MiB (limits: 8192.000 MiB, 32768 tokens, 204158 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e42624d83a0:   17128 tokens, checkpoints:  1,   552.385 MiB
srv        update:    - prompt 0x5e42681ad3f0:    9823 tokens, checkpoints:  1,   381.091 MiB
srv  get_availabl: prompt cache update took 1279.15 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 241 | processing task, is_child = 0
slot update_slots: id  0 | task 241 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 639
slot update_slots: id  0 | task 241 | n_past = 20, slot.prompt.tokens.size() = 9823, seq_id = 0, pos_min = 9822, n_swa = 1
slot update_slots: id  0 | task 241 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 241 | erased invalidated context checkpoint (pos_min = 9714, pos_max = 9714, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 241 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 241 | prompt processing progress, n_tokens = 575, batch.n_tokens = 575, progress = 0.899844
slot update_slots: id  0 | task 241 | n_tokens = 575, memory_seq_rm [575, end)
slot update_slots: id  0 | task 241 | prompt processing progress, n_tokens = 639, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 241 | prompt done, n_tokens = 639, batch.n_tokens = 64
slot init_sampler: id  0 | task 241 | init sampler, took 0.06 ms, tokens: text = 639, total = 639
slot update_slots: id  0 | task 241 | created context checkpoint 1 of 8 (pos_min = 574, pos_max = 574, size = 75.376 MiB)
slot print_timing: id  0 | task 241 |
prompt eval time =    5353.33 ms /   639 tokens (    8.38 ms per token,   119.36 tokens per second)
       eval time =    1218.51 ms /    23 tokens (   52.98 ms per token,    18.88 tokens per second)
      total time =    6571.85 ms /   662 tokens
slot      release: id  0 | task 241 | stop processing: n_tokens = 661, truncated = 0
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 10867785632
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 661, total state size = 90.876 MiB
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv          load:  - looking for better prompt, base f_keep = 0.030, sim = 0.002
srv          load:  - found better prompt with f_keep = 1.000, sim = 0.964
srv        update:  - cache state: 3 prompts, 873.329 MiB (limits: 8192.000 MiB, 32768 tokens, 168440 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e42624d83a0:   17128 tokens, checkpoints:  1,   552.385 MiB
srv        update:    - prompt 0x5e4267889560:     661 tokens, checkpoints:  1,   166.252 MiB
srv  get_availabl: prompt cache update took 1476.50 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 242 | processing task, is_child = 0
slot update_slots: id  0 | task 242 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10190
slot update_slots: id  0 | task 242 | n_tokens = 9823, memory_seq_rm [9823, end)
slot update_slots: id  0 | task 242 | prompt processing progress, n_tokens = 10126, batch.n_tokens = 303, progress = 0.993719
slot update_slots: id  0 | task 242 | n_tokens = 10126, memory_seq_rm [10126, end)
slot update_slots: id  0 | task 242 | prompt processing progress, n_tokens = 10190, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 242 | prompt done, n_tokens = 10190, batch.n_tokens = 64
slot init_sampler: id  0 | task 242 | init sampler, took 0.81 ms, tokens: text = 10190, total = 10190
slot update_slots: id  0 | task 242 | created context checkpoint 2 of 8 (pos_min = 10125, pos_max = 10125, size = 75.376 MiB)
slot print_timing: id  0 | task 242 |
prompt eval time =    3482.72 ms /   367 tokens (    9.49 ms per token,   105.38 tokens per second)
       eval time =    3469.33 ms /    59 tokens (   58.80 ms per token,    17.01 tokens per second)
      total time =    6952.05 ms /   426 tokens
slot      release: id  0 | task 242 | stop processing: n_tokens = 10248, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 10876215074
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 10248, total state size = 315.681 MiB
srv  params_from_: Chat format: Qwen3 Coder
srv          load:  - looking for better prompt, base f_keep = 0.002, sim = 0.035
srv        update:  - cache state: 4 prompts, 1339.761 MiB (limits: 8192.000 MiB, 32768 tokens, 172460 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e42624d83a0:   17128 tokens, checkpoints:  1,   552.385 MiB
srv        update:    - prompt 0x5e4267889560:     661 tokens, checkpoints:  1,   166.252 MiB
srv        update:    - prompt 0x5e42699ac270:   10248 tokens, checkpoints:  2,   466.433 MiB
srv  get_availabl: prompt cache update took 1718.98 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 327 | processing task, is_child = 0
slot update_slots: id  0 | task 327 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 573
slot update_slots: id  0 | task 327 | n_past = 20, slot.prompt.tokens.size() = 10248, seq_id = 0, pos_min = 10247, n_swa = 1
slot update_slots: id  0 | task 327 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 327 | erased invalidated context checkpoint (pos_min = 9714, pos_max = 9714, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 327 | erased invalidated context checkpoint (pos_min = 10125, pos_max = 10125, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 327 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 327 | prompt processing progress, n_tokens = 509, batch.n_tokens = 509, progress = 0.888307
slot update_slots: id  0 | task 327 | n_tokens = 509, memory_seq_rm [509, end)
slot update_slots: id  0 | task 327 | prompt processing progress, n_tokens = 573, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 327 | prompt done, n_tokens = 573, batch.n_tokens = 64
slot init_sampler: id  0 | task 327 | init sampler, took 0.05 ms, tokens: text = 573, total = 573
slot update_slots: id  0 | task 327 | created context checkpoint 1 of 8 (pos_min = 508, pos_max = 508, size = 75.376 MiB)
slot print_timing: id  0 | task 327 |
prompt eval time =    4815.77 ms /   573 tokens (    8.40 ms per token,   118.98 tokens per second)
       eval time =    1245.78 ms /    23 tokens (   54.16 ms per token,    18.46 tokens per second)
      total time =    6061.55 ms /   596 tokens
slot      release: id  0 | task 327 | stop processing: n_tokens = 595, truncated = 0
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 10884113885
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 595, total state size = 89.328 MiB
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv          load:  - looking for better prompt, base f_keep = 0.034, sim = 0.002
srv          load:  - found better prompt with f_keep = 1.000, sim = 0.972
srv        update:  - cache state: 4 prompts, 1038.033 MiB (limits: 8192.000 MiB, 32768 tokens, 146409 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e42624d83a0:   17128 tokens, checkpoints:  1,   552.385 MiB
srv        update:    - prompt 0x5e4267889560:     661 tokens, checkpoints:  1,   166.252 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv  get_availabl: prompt cache update took 908.73 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 328 | processing task, is_child = 0
slot update_slots: id  0 | task 328 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10538
slot update_slots: id  0 | task 328 | n_tokens = 10248, memory_seq_rm [10248, end)
slot update_slots: id  0 | task 328 | prompt processing progress, n_tokens = 10474, batch.n_tokens = 226, progress = 0.993927
slot update_slots: id  0 | task 328 | n_tokens = 10474, memory_seq_rm [10474, end)
slot update_slots: id  0 | task 328 | prompt processing progress, n_tokens = 10538, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 328 | prompt done, n_tokens = 10538, batch.n_tokens = 64
slot init_sampler: id  0 | task 328 | init sampler, took 0.82 ms, tokens: text = 10538, total = 10538
slot update_slots: id  0 | task 328 | created context checkpoint 3 of 8 (pos_min = 10473, pos_max = 10473, size = 75.376 MiB)
slot print_timing: id  0 | task 328 |
prompt eval time =    3459.38 ms /   290 tokens (   11.93 ms per token,    83.83 tokens per second)
       eval time =    2195.82 ms /    39 tokens (   56.30 ms per token,    17.76 tokens per second)
      total time =    5655.20 ms /   329 tokens
slot      release: id  0 | task 328 | stop processing: n_tokens = 10576, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 393 | processing task, is_child = 0
slot update_slots: id  0 | task 393 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10631
slot update_slots: id  0 | task 393 | n_tokens = 10576, memory_seq_rm [10576, end)
slot update_slots: id  0 | task 393 | prompt processing progress, n_tokens = 10631, batch.n_tokens = 55, progress = 1.000000
slot update_slots: id  0 | task 393 | prompt done, n_tokens = 10631, batch.n_tokens = 55
slot init_sampler: id  0 | task 393 | init sampler, took 0.86 ms, tokens: text = 10631, total = 10631
slot update_slots: id  0 | task 393 | created context checkpoint 4 of 8 (pos_min = 10575, pos_max = 10575, size = 75.376 MiB)
slot print_timing: id  0 | task 393 |
prompt eval time =    1321.29 ms /    55 tokens (   24.02 ms per token,    41.63 tokens per second)
       eval time =    1632.40 ms /    30 tokens (   54.41 ms per token,    18.38 tokens per second)
      total time =    2953.69 ms /    85 tokens
slot      release: id  0 | task 393 | stop processing: n_tokens = 10660, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.802 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 424 | processing task, is_child = 0
slot update_slots: id  0 | task 424 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13296
slot update_slots: id  0 | task 424 | n_tokens = 10660, memory_seq_rm [10660, end)
slot update_slots: id  0 | task 424 | prompt processing progress, n_tokens = 12708, batch.n_tokens = 2048, progress = 0.955776
slot update_slots: id  0 | task 424 | n_tokens = 12708, memory_seq_rm [12708, end)
slot update_slots: id  0 | task 424 | prompt processing progress, n_tokens = 13232, batch.n_tokens = 524, progress = 0.995187
slot update_slots: id  0 | task 424 | n_tokens = 13232, memory_seq_rm [13232, end)
slot update_slots: id  0 | task 424 | prompt processing progress, n_tokens = 13296, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 424 | prompt done, n_tokens = 13296, batch.n_tokens = 64
slot init_sampler: id  0 | task 424 | init sampler, took 1.08 ms, tokens: text = 13296, total = 13296
slot update_slots: id  0 | task 424 | created context checkpoint 5 of 8 (pos_min = 13231, pos_max = 13231, size = 75.376 MiB)
slot print_timing: id  0 | task 424 |
prompt eval time =   19034.14 ms /  2636 tokens (    7.22 ms per token,   138.49 tokens per second)
       eval time =    1692.73 ms /    32 tokens (   52.90 ms per token,    18.90 tokens per second)
      total time =   20726.87 ms /  2668 tokens
slot      release: id  0 | task 424 | stop processing: n_tokens = 13327, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.985 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 459 | processing task, is_child = 0
slot update_slots: id  0 | task 459 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13529
slot update_slots: id  0 | task 459 | n_tokens = 13327, memory_seq_rm [13327, end)
slot update_slots: id  0 | task 459 | prompt processing progress, n_tokens = 13465, batch.n_tokens = 138, progress = 0.995269
slot update_slots: id  0 | task 459 | n_tokens = 13465, memory_seq_rm [13465, end)
slot update_slots: id  0 | task 459 | prompt processing progress, n_tokens = 13529, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 459 | prompt done, n_tokens = 13529, batch.n_tokens = 64
slot init_sampler: id  0 | task 459 | init sampler, took 1.10 ms, tokens: text = 13529, total = 13529
slot update_slots: id  0 | task 459 | created context checkpoint 6 of 8 (pos_min = 13464, pos_max = 13464, size = 75.376 MiB)
slot print_timing: id  0 | task 459 |
prompt eval time =    3199.59 ms /   202 tokens (   15.84 ms per token,    63.13 tokens per second)
       eval time =    3666.16 ms /    62 tokens (   59.13 ms per token,    16.91 tokens per second)
      total time =    6865.74 ms /   264 tokens
slot      release: id  0 | task 459 | stop processing: n_tokens = 13590, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 523 | processing task, is_child = 0
slot update_slots: id  0 | task 523 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13668
slot update_slots: id  0 | task 523 | n_tokens = 13590, memory_seq_rm [13590, end)
slot update_slots: id  0 | task 523 | prompt processing progress, n_tokens = 13604, batch.n_tokens = 14, progress = 0.995318
slot update_slots: id  0 | task 523 | n_tokens = 13604, memory_seq_rm [13604, end)
slot update_slots: id  0 | task 523 | prompt processing progress, n_tokens = 13668, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 523 | prompt done, n_tokens = 13668, batch.n_tokens = 64
slot init_sampler: id  0 | task 523 | init sampler, took 1.31 ms, tokens: text = 13668, total = 13668
slot update_slots: id  0 | task 523 | created context checkpoint 7 of 8 (pos_min = 13603, pos_max = 13603, size = 75.376 MiB)
slot print_timing: id  0 | task 523 |
prompt eval time =    1302.44 ms /    78 tokens (   16.70 ms per token,    59.89 tokens per second)
       eval time =    3741.01 ms /    67 tokens (   55.84 ms per token,    17.91 tokens per second)
      total time =    5043.44 ms /   145 tokens
slot      release: id  0 | task 523 | stop processing: n_tokens = 13734, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 592 | processing task, is_child = 0
slot update_slots: id  0 | task 592 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13823
slot update_slots: id  0 | task 592 | n_tokens = 13734, memory_seq_rm [13734, end)
slot update_slots: id  0 | task 592 | prompt processing progress, n_tokens = 13759, batch.n_tokens = 25, progress = 0.995370
slot update_slots: id  0 | task 592 | n_tokens = 13759, memory_seq_rm [13759, end)
slot update_slots: id  0 | task 592 | prompt processing progress, n_tokens = 13823, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 592 | prompt done, n_tokens = 13823, batch.n_tokens = 64
slot init_sampler: id  0 | task 592 | init sampler, took 1.36 ms, tokens: text = 13823, total = 13823
slot update_slots: id  0 | task 592 | created context checkpoint 8 of 8 (pos_min = 13758, pos_max = 13758, size = 75.376 MiB)
slot print_timing: id  0 | task 592 |
prompt eval time =    2094.38 ms /    89 tokens (   23.53 ms per token,    42.49 tokens per second)
       eval time =    4097.58 ms /    73 tokens (   56.13 ms per token,    17.82 tokens per second)
      total time =    6191.97 ms /   162 tokens
slot      release: id  0 | task 592 | stop processing: n_tokens = 13895, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 667 | processing task, is_child = 0
slot update_slots: id  0 | task 667 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13976
slot update_slots: id  0 | task 667 | n_tokens = 13895, memory_seq_rm [13895, end)
slot update_slots: id  0 | task 667 | prompt processing progress, n_tokens = 13912, batch.n_tokens = 17, progress = 0.995421
slot update_slots: id  0 | task 667 | n_tokens = 13912, memory_seq_rm [13912, end)
slot update_slots: id  0 | task 667 | prompt processing progress, n_tokens = 13976, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 667 | prompt done, n_tokens = 13976, batch.n_tokens = 64
slot init_sampler: id  0 | task 667 | init sampler, took 1.12 ms, tokens: text = 13976, total = 13976
slot update_slots: id  0 | task 667 | erasing old context checkpoint (pos_min = 9714, pos_max = 9714, size = 75.376 MiB)
slot update_slots: id  0 | task 667 | created context checkpoint 8 of 8 (pos_min = 13911, pos_max = 13911, size = 75.376 MiB)
slot print_timing: id  0 | task 667 |
prompt eval time =    1541.54 ms /    81 tokens (   19.03 ms per token,    52.54 tokens per second)
       eval time =    2121.52 ms /    37 tokens (   57.34 ms per token,    17.44 tokens per second)
      total time =    3663.06 ms /   118 tokens
slot      release: id  0 | task 667 | stop processing: n_tokens = 14012, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.991 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 706 | processing task, is_child = 0
slot update_slots: id  0 | task 706 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14137
slot update_slots: id  0 | task 706 | n_tokens = 14012, memory_seq_rm [14012, end)
slot update_slots: id  0 | task 706 | prompt processing progress, n_tokens = 14073, batch.n_tokens = 61, progress = 0.995473
slot update_slots: id  0 | task 706 | n_tokens = 14073, memory_seq_rm [14073, end)
slot update_slots: id  0 | task 706 | prompt processing progress, n_tokens = 14137, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 706 | prompt done, n_tokens = 14137, batch.n_tokens = 64
slot init_sampler: id  0 | task 706 | init sampler, took 1.06 ms, tokens: text = 14137, total = 14137
slot update_slots: id  0 | task 706 | erasing old context checkpoint (pos_min = 10125, pos_max = 10125, size = 75.376 MiB)
slot update_slots: id  0 | task 706 | created context checkpoint 8 of 8 (pos_min = 14072, pos_max = 14072, size = 75.376 MiB)
slot print_timing: id  0 | task 706 |
prompt eval time =    2541.72 ms /   125 tokens (   20.33 ms per token,    49.18 tokens per second)
       eval time =    1790.02 ms /    31 tokens (   57.74 ms per token,    17.32 tokens per second)
      total time =    4331.74 ms /   156 tokens
slot      release: id  0 | task 706 | stop processing: n_tokens = 14167, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.987 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 739 | processing task, is_child = 0
slot update_slots: id  0 | task 739 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14349
slot update_slots: id  0 | task 739 | n_tokens = 14167, memory_seq_rm [14167, end)
slot update_slots: id  0 | task 739 | prompt processing progress, n_tokens = 14285, batch.n_tokens = 118, progress = 0.995540
slot update_slots: id  0 | task 739 | n_tokens = 14285, memory_seq_rm [14285, end)
slot update_slots: id  0 | task 739 | prompt processing progress, n_tokens = 14349, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 739 | prompt done, n_tokens = 14349, batch.n_tokens = 64
slot init_sampler: id  0 | task 739 | init sampler, took 1.27 ms, tokens: text = 14349, total = 14349
slot update_slots: id  0 | task 739 | erasing old context checkpoint (pos_min = 10473, pos_max = 10473, size = 75.376 MiB)
slot update_slots: id  0 | task 739 | created context checkpoint 8 of 8 (pos_min = 14284, pos_max = 14284, size = 75.376 MiB)
slot print_timing: id  0 | task 739 |
prompt eval time =    2861.86 ms /   182 tokens (   15.72 ms per token,    63.59 tokens per second)
       eval time =    1670.05 ms /    31 tokens (   53.87 ms per token,    18.56 tokens per second)
      total time =    4531.91 ms /   213 tokens
slot      release: id  0 | task 739 | stop processing: n_tokens = 14379, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.987 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 772 | processing task, is_child = 0
slot update_slots: id  0 | task 772 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14573
slot update_slots: id  0 | task 772 | n_tokens = 14379, memory_seq_rm [14379, end)
slot update_slots: id  0 | task 772 | prompt processing progress, n_tokens = 14509, batch.n_tokens = 130, progress = 0.995608
slot update_slots: id  0 | task 772 | n_tokens = 14509, memory_seq_rm [14509, end)
slot update_slots: id  0 | task 772 | prompt processing progress, n_tokens = 14573, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 772 | prompt done, n_tokens = 14573, batch.n_tokens = 64
slot init_sampler: id  0 | task 772 | init sampler, took 1.18 ms, tokens: text = 14573, total = 14573
slot update_slots: id  0 | task 772 | erasing old context checkpoint (pos_min = 10575, pos_max = 10575, size = 75.376 MiB)
slot update_slots: id  0 | task 772 | created context checkpoint 8 of 8 (pos_min = 14508, pos_max = 14508, size = 75.376 MiB)
slot print_timing: id  0 | task 772 |
prompt eval time =    2477.22 ms /   194 tokens (   12.77 ms per token,    78.31 tokens per second)
       eval time =    2595.48 ms /    48 tokens (   54.07 ms per token,    18.49 tokens per second)
      total time =    5072.70 ms /   242 tokens
slot      release: id  0 | task 772 | stop processing: n_tokens = 14620, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 10957331969
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 14620, total state size = 418.200 MiB
srv  params_from_: Chat format: Qwen3 Coder
srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.061
srv          load:  - found better prompt with f_keep = 0.451, sim = 0.903
srv        update:  - cache state: 4 prompts, 1892.987 MiB (limits: 8192.000 MiB, 32768 tokens, 140693 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e42624d83a0:   17128 tokens, checkpoints:  1,   552.385 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e424eb36820:   14620 tokens, checkpoints:  8,  1021.206 MiB
srv  get_availabl: prompt cache update took 7377.53 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 822 | processing task, is_child = 0
slot update_slots: id  0 | task 822 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 330
slot update_slots: id  0 | task 822 | n_past = 298, slot.prompt.tokens.size() = 661, seq_id = 0, pos_min = 660, n_swa = 1
slot update_slots: id  0 | task 822 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 822 | erased invalidated context checkpoint (pos_min = 574, pos_max = 574, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 822 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 822 | prompt processing progress, n_tokens = 266, batch.n_tokens = 266, progress = 0.806061
slot update_slots: id  0 | task 822 | n_tokens = 266, memory_seq_rm [266, end)
slot update_slots: id  0 | task 822 | prompt processing progress, n_tokens = 330, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 822 | prompt done, n_tokens = 330, batch.n_tokens = 64
slot init_sampler: id  0 | task 822 | init sampler, took 0.03 ms, tokens: text = 330, total = 330
slot update_slots: id  0 | task 822 | created context checkpoint 1 of 8 (pos_min = 265, pos_max = 265, size = 75.376 MiB)
slot print_timing: id  0 | task 822 |
prompt eval time =    4005.07 ms /   330 tokens (   12.14 ms per token,    82.40 tokens per second)
       eval time =    1264.52 ms /    23 tokens (   54.98 ms per token,    18.19 tokens per second)
      total time =    5269.59 ms /   353 tokens
slot      release: id  0 | task 822 | stop processing: n_tokens = 352, truncated = 0
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 10970068296
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 352, total state size = 83.630 MiB
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv          load:  - looking for better prompt, base f_keep = 0.057, sim = 0.001
srv          load:  - found better prompt with f_keep = 1.000, sim = 0.996
srv        update:  - cache state: 4 prompts, 1030.787 MiB (limits: 8192.000 MiB, 32768 tokens, 144983 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e42624d83a0:   17128 tokens, checkpoints:  1,   552.385 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv  get_availabl: prompt cache update took 283.02 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 823 | processing task, is_child = 0
slot update_slots: id  0 | task 823 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14674
slot update_slots: id  0 | task 823 | n_tokens = 14620, memory_seq_rm [14620, end)
slot update_slots: id  0 | task 823 | prompt processing progress, n_tokens = 14674, batch.n_tokens = 54, progress = 1.000000
slot update_slots: id  0 | task 823 | prompt done, n_tokens = 14674, batch.n_tokens = 54
slot init_sampler: id  0 | task 823 | init sampler, took 1.45 ms, tokens: text = 14674, total = 14674
slot update_slots: id  0 | task 823 | erasing old context checkpoint (pos_min = 13231, pos_max = 13231, size = 75.376 MiB)
slot update_slots: id  0 | task 823 | created context checkpoint 8 of 8 (pos_min = 14619, pos_max = 14619, size = 75.376 MiB)
slot print_timing: id  0 | task 823 |
prompt eval time =    1103.75 ms /    54 tokens (   20.44 ms per token,    48.92 tokens per second)
       eval time =   37767.49 ms /   662 tokens (   57.05 ms per token,    17.53 tokens per second)
      total time =   38871.24 ms /   716 tokens
slot      release: id  0 | task 823 | stop processing: n_tokens = 15335, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 11009224197
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 15335, total state size = 434.966 MiB
srv          load:  - looking for better prompt, base f_keep = 0.003, sim = 0.002
srv          load:  - found better prompt with f_keep = 1.000, sim = 0.961
srv        update:  - cache state: 4 prompts, 1516.374 MiB (limits: 8192.000 MiB, 32768 tokens, 88868 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426a0950c0:   15335 tokens, checkpoints:  8,  1037.972 MiB
srv  get_availabl: prompt cache update took 6211.05 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 1510 | processing task, is_child = 0
slot update_slots: id  0 | task 1510 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 17825
slot update_slots: id  0 | task 1510 | n_tokens = 17128, memory_seq_rm [17128, end)
slot update_slots: id  0 | task 1510 | prompt processing progress, n_tokens = 17761, batch.n_tokens = 633, progress = 0.996410
slot update_slots: id  0 | task 1510 | n_tokens = 17761, memory_seq_rm [17761, end)
slot update_slots: id  0 | task 1510 | prompt processing progress, n_tokens = 17825, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 1510 | prompt done, n_tokens = 17825, batch.n_tokens = 64
slot init_sampler: id  0 | task 1510 | init sampler, took 1.38 ms, tokens: text = 17825, total = 17825
slot update_slots: id  0 | task 1510 | created context checkpoint 2 of 8 (pos_min = 17760, pos_max = 17760, size = 75.376 MiB)
slot print_timing: id  0 | task 1510 |
prompt eval time =    8561.37 ms /   697 tokens (   12.28 ms per token,    81.41 tokens per second)
       eval time =    2397.15 ms /    44 tokens (   54.48 ms per token,    18.36 tokens per second)
      total time =   10958.52 ms /   741 tokens
slot      release: id  0 | task 1510 | stop processing: n_tokens = 17868, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.873 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 1556 | processing task, is_child = 0
slot update_slots: id  0 | task 1556 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 20474
slot update_slots: id  0 | task 1556 | n_tokens = 17868, memory_seq_rm [17868, end)
slot update_slots: id  0 | task 1556 | prompt processing progress, n_tokens = 19916, batch.n_tokens = 2048, progress = 0.972746
slot update_slots: id  0 | task 1556 | n_tokens = 19916, memory_seq_rm [19916, end)
slot update_slots: id  0 | task 1556 | prompt processing progress, n_tokens = 20410, batch.n_tokens = 494, progress = 0.996874
slot update_slots: id  0 | task 1556 | n_tokens = 20410, memory_seq_rm [20410, end)
slot update_slots: id  0 | task 1556 | prompt processing progress, n_tokens = 20474, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 1556 | prompt done, n_tokens = 20474, batch.n_tokens = 64
slot init_sampler: id  0 | task 1556 | init sampler, took 1.64 ms, tokens: text = 20474, total = 20474
slot update_slots: id  0 | task 1556 | created context checkpoint 3 of 8 (pos_min = 20409, pos_max = 20409, size = 75.376 MiB)
slot print_timing: id  0 | task 1556 |
prompt eval time =   22336.34 ms /  2606 tokens (    8.57 ms per token,   116.67 tokens per second)
       eval time =    1963.70 ms /    34 tokens (   57.76 ms per token,    17.31 tokens per second)
      total time =   24300.04 ms /  2640 tokens
slot      release: id  0 | task 1556 | stop processing: n_tokens = 20507, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.951 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 1593 | processing task, is_child = 0
slot update_slots: id  0 | task 1593 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21552
slot update_slots: id  0 | task 1593 | n_past = 20503, slot.prompt.tokens.size() = 20507, seq_id = 0, pos_min = 20506, n_swa = 1
slot update_slots: id  0 | task 1593 | restored context checkpoint (pos_min = 20409, pos_max = 20409, size = 75.376 MiB)
slot update_slots: id  0 | task 1593 | n_tokens = 20410, memory_seq_rm [20410, end)
slot update_slots: id  0 | task 1593 | prompt processing progress, n_tokens = 21488, batch.n_tokens = 1078, progress = 0.997030
slot update_slots: id  0 | task 1593 | n_tokens = 21488, memory_seq_rm [21488, end)
slot update_slots: id  0 | task 1593 | prompt processing progress, n_tokens = 21552, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 1593 | prompt done, n_tokens = 21552, batch.n_tokens = 64
slot init_sampler: id  0 | task 1593 | init sampler, took 1.66 ms, tokens: text = 21552, total = 21552
slot update_slots: id  0 | task 1593 | created context checkpoint 4 of 8 (pos_min = 21487, pos_max = 21487, size = 75.376 MiB)
slot print_timing: id  0 | task 1593 |
prompt eval time =    6580.21 ms /  1142 tokens (    5.76 ms per token,   173.55 tokens per second)
       eval time =   20194.20 ms /   380 tokens (   53.14 ms per token,    18.82 tokens per second)
      total time =   26774.40 ms /  1522 tokens
slot      release: id  0 | task 1593 | stop processing: n_tokens = 21931, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 11078044457
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 21931, total state size = 589.635 MiB
srv          load:  - looking for better prompt, base f_keep = 0.002, sim = 0.004
srv        update:  - cache state: 5 prompts, 2407.512 MiB (limits: 8192.000 MiB, 32768 tokens, 130598 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426a0950c0:   15335 tokens, checkpoints:  8,  1037.972 MiB
srv        update:    - prompt 0x5e426bba13c0:   21931 tokens, checkpoints:  4,   891.138 MiB
srv  get_availabl: prompt cache update took 3427.35 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 1975 | processing task, is_child = 0
slot update_slots: id  0 | task 1975 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10107
slot update_slots: id  0 | task 1975 | n_past = 40, slot.prompt.tokens.size() = 21931, seq_id = 0, pos_min = 21930, n_swa = 1
slot update_slots: id  0 | task 1975 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 1975 | erased invalidated context checkpoint (pos_min = 16897, pos_max = 16897, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 1975 | erased invalidated context checkpoint (pos_min = 17760, pos_max = 17760, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 1975 | erased invalidated context checkpoint (pos_min = 20409, pos_max = 20409, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 1975 | erased invalidated context checkpoint (pos_min = 21487, pos_max = 21487, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 1975 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 1975 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.202632
slot update_slots: id  0 | task 1975 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id  0 | task 1975 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.405264
slot update_slots: id  0 | task 1975 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id  0 | task 1975 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.607895
slot update_slots: id  0 | task 1975 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id  0 | task 1975 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.810527
slot update_slots: id  0 | task 1975 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id  0 | task 1975 | prompt processing progress, n_tokens = 10043, batch.n_tokens = 1851, progress = 0.993668
slot update_slots: id  0 | task 1975 | n_tokens = 10043, memory_seq_rm [10043, end)
slot update_slots: id  0 | task 1975 | prompt processing progress, n_tokens = 10107, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 1975 | prompt done, n_tokens = 10107, batch.n_tokens = 64
slot init_sampler: id  0 | task 1975 | init sampler, took 0.81 ms, tokens: text = 10107, total = 10107
slot update_slots: id  0 | task 1975 | created context checkpoint 1 of 8 (pos_min = 10042, pos_max = 10042, size = 75.376 MiB)
slot print_timing: id  0 | task 1975 |
prompt eval time =   48887.82 ms / 10107 tokens (    4.84 ms per token,   206.74 tokens per second)
       eval time =    2005.96 ms /    37 tokens (   54.22 ms per token,    18.45 tokens per second)
      total time =   50893.78 ms / 10144 tokens
slot      release: id  0 | task 1975 | stop processing: n_tokens = 10143, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.967 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 2018 | processing task, is_child = 0
slot update_slots: id  0 | task 2018 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 10493
slot update_slots: id  0 | task 2018 | n_tokens = 10143, memory_seq_rm [10143, end)
slot update_slots: id  0 | task 2018 | prompt processing progress, n_tokens = 10429, batch.n_tokens = 286, progress = 0.993901
slot update_slots: id  0 | task 2018 | n_tokens = 10429, memory_seq_rm [10429, end)
slot update_slots: id  0 | task 2018 | prompt processing progress, n_tokens = 10493, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 2018 | prompt done, n_tokens = 10493, batch.n_tokens = 64
slot init_sampler: id  0 | task 2018 | init sampler, took 0.82 ms, tokens: text = 10493, total = 10493
slot update_slots: id  0 | task 2018 | created context checkpoint 2 of 8 (pos_min = 10428, pos_max = 10428, size = 75.376 MiB)
slot print_timing: id  0 | task 2018 |
prompt eval time =    4169.57 ms /   350 tokens (   11.91 ms per token,    83.94 tokens per second)
       eval time =    2379.56 ms /    43 tokens (   55.34 ms per token,    18.07 tokens per second)
      total time =    6549.12 ms /   393 tokens
slot      release: id  0 | task 2018 | stop processing: n_tokens = 10535, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.800 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 2063 | processing task, is_child = 0
slot update_slots: id  0 | task 2063 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13171
slot update_slots: id  0 | task 2063 | n_tokens = 10535, memory_seq_rm [10535, end)
slot update_slots: id  0 | task 2063 | prompt processing progress, n_tokens = 12583, batch.n_tokens = 2048, progress = 0.955356
slot update_slots: id  0 | task 2063 | n_tokens = 12583, memory_seq_rm [12583, end)
slot update_slots: id  0 | task 2063 | prompt processing progress, n_tokens = 13107, batch.n_tokens = 524, progress = 0.995141
slot update_slots: id  0 | task 2063 | n_tokens = 13107, memory_seq_rm [13107, end)
slot update_slots: id  0 | task 2063 | prompt processing progress, n_tokens = 13171, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 2063 | prompt done, n_tokens = 13171, batch.n_tokens = 64
slot init_sampler: id  0 | task 2063 | init sampler, took 1.07 ms, tokens: text = 13171, total = 13171
slot update_slots: id  0 | task 2063 | created context checkpoint 3 of 8 (pos_min = 13106, pos_max = 13106, size = 75.376 MiB)
slot print_timing: id  0 | task 2063 |
prompt eval time =   15695.50 ms /  2636 tokens (    5.95 ms per token,   167.95 tokens per second)
       eval time =    1647.42 ms /    32 tokens (   51.48 ms per token,    19.42 tokens per second)
      total time =   17342.92 ms /  2668 tokens
slot      release: id  0 | task 2063 | stop processing: n_tokens = 13202, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.985 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 2098 | processing task, is_child = 0
slot update_slots: id  0 | task 2098 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13404
slot update_slots: id  0 | task 2098 | n_tokens = 13202, memory_seq_rm [13202, end)
slot update_slots: id  0 | task 2098 | prompt processing progress, n_tokens = 13340, batch.n_tokens = 138, progress = 0.995225
slot update_slots: id  0 | task 2098 | n_tokens = 13340, memory_seq_rm [13340, end)
slot update_slots: id  0 | task 2098 | prompt processing progress, n_tokens = 13404, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 2098 | prompt done, n_tokens = 13404, batch.n_tokens = 64
slot init_sampler: id  0 | task 2098 | init sampler, took 1.07 ms, tokens: text = 13404, total = 13404
slot update_slots: id  0 | task 2098 | created context checkpoint 4 of 8 (pos_min = 13339, pos_max = 13339, size = 75.376 MiB)
slot print_timing: id  0 | task 2098 |
prompt eval time =    2317.17 ms /   202 tokens (   11.47 ms per token,    87.18 tokens per second)
       eval time =    2776.01 ms /    49 tokens (   56.65 ms per token,    17.65 tokens per second)
      total time =    5093.19 ms /   251 tokens
slot      release: id  0 | task 2098 | stop processing: n_tokens = 13452, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.978 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 2149 | processing task, is_child = 0
slot update_slots: id  0 | task 2149 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13753
slot update_slots: id  0 | task 2149 | n_tokens = 13452, memory_seq_rm [13452, end)
slot update_slots: id  0 | task 2149 | prompt processing progress, n_tokens = 13689, batch.n_tokens = 237, progress = 0.995346
slot update_slots: id  0 | task 2149 | n_tokens = 13689, memory_seq_rm [13689, end)
slot update_slots: id  0 | task 2149 | prompt processing progress, n_tokens = 13753, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 2149 | prompt done, n_tokens = 13753, batch.n_tokens = 64
slot init_sampler: id  0 | task 2149 | init sampler, took 1.14 ms, tokens: text = 13753, total = 13753
slot update_slots: id  0 | task 2149 | created context checkpoint 5 of 8 (pos_min = 13688, pos_max = 13688, size = 75.376 MiB)
slot print_timing: id  0 | task 2149 |
prompt eval time =    3427.18 ms /   301 tokens (   11.39 ms per token,    87.83 tokens per second)
       eval time =    2794.28 ms /    46 tokens (   60.75 ms per token,    16.46 tokens per second)
      total time =    6221.46 ms /   347 tokens
slot      release: id  0 | task 2149 | stop processing: n_tokens = 13798, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 11171476651
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 13798, total state size = 398.925 MiB
srv  params_from_: Chat format: Qwen3 Coder
srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.063
srv        update:  - cache state: 6 prompts, 3183.316 MiB (limits: 8192.000 MiB, 32768 tokens, 134278 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426a0950c0:   15335 tokens, checkpoints:  8,  1037.972 MiB
srv        update:    - prompt 0x5e426bba13c0:   21931 tokens, checkpoints:  4,   891.138 MiB
srv        update:    - prompt 0x5e42692b4330:   13798 tokens, checkpoints:  5,   775.804 MiB
srv  get_availabl: prompt cache update took 2569.31 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 2197 | processing task, is_child = 0
slot update_slots: id  0 | task 2197 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 319
slot update_slots: id  0 | task 2197 | n_past = 20, slot.prompt.tokens.size() = 13798, seq_id = 0, pos_min = 13797, n_swa = 1
slot update_slots: id  0 | task 2197 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 2197 | erased invalidated context checkpoint (pos_min = 10042, pos_max = 10042, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 2197 | erased invalidated context checkpoint (pos_min = 10428, pos_max = 10428, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 2197 | erased invalidated context checkpoint (pos_min = 13106, pos_max = 13106, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 2197 | erased invalidated context checkpoint (pos_min = 13339, pos_max = 13339, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 2197 | erased invalidated context checkpoint (pos_min = 13688, pos_max = 13688, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 2197 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 2197 | prompt processing progress, n_tokens = 255, batch.n_tokens = 255, progress = 0.799373
slot update_slots: id  0 | task 2197 | n_tokens = 255, memory_seq_rm [255, end)
slot update_slots: id  0 | task 2197 | prompt processing progress, n_tokens = 319, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 2197 | prompt done, n_tokens = 319, batch.n_tokens = 64
slot init_sampler: id  0 | task 2197 | init sampler, took 0.03 ms, tokens: text = 319, total = 319
slot update_slots: id  0 | task 2197 | created context checkpoint 1 of 8 (pos_min = 254, pos_max = 254, size = 75.376 MiB)
slot print_timing: id  0 | task 2197 |
prompt eval time =    6916.62 ms /   319 tokens (   21.68 ms per token,    46.12 tokens per second)
       eval time =    2074.63 ms /    34 tokens (   61.02 ms per token,    16.39 tokens per second)
      total time =    8991.25 ms /   353 tokens
slot      release: id  0 | task 2197 | stop processing: n_tokens = 352, truncated = 0
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 11183297025
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 352, total state size = 83.630 MiB
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv          load:  - looking for better prompt, base f_keep = 0.057, sim = 0.001
srv          load:  - found better prompt with f_keep = 1.000, sim = 0.997
srv        update:  - cache state: 6 prompts, 2566.518 MiB (limits: 8192.000 MiB, 32768 tokens, 123630 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426a0950c0:   15335 tokens, checkpoints:  8,  1037.972 MiB
srv        update:    - prompt 0x5e426bba13c0:   21931 tokens, checkpoints:  4,   891.138 MiB
srv        update:    - prompt 0x5e426ad9d890:     352 tokens, checkpoints:  1,   159.006 MiB
srv  get_availabl: prompt cache update took 589.95 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 2198 | processing task, is_child = 0
slot update_slots: id  0 | task 2198 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13843
slot update_slots: id  0 | task 2198 | n_tokens = 13798, memory_seq_rm [13798, end)
slot update_slots: id  0 | task 2198 | prompt processing progress, n_tokens = 13843, batch.n_tokens = 45, progress = 1.000000
slot update_slots: id  0 | task 2198 | prompt done, n_tokens = 13843, batch.n_tokens = 45
slot init_sampler: id  0 | task 2198 | init sampler, took 1.12 ms, tokens: text = 13843, total = 13843
slot update_slots: id  0 | task 2198 | created context checkpoint 6 of 8 (pos_min = 13797, pos_max = 13797, size = 75.376 MiB)
slot print_timing: id  0 | task 2198 |
prompt eval time =    1294.61 ms /    45 tokens (   28.77 ms per token,    34.76 tokens per second)
       eval time =    1716.28 ms /    31 tokens (   55.36 ms per token,    18.06 tokens per second)
      total time =    3010.89 ms /    76 tokens
slot      release: id  0 | task 2198 | stop processing: n_tokens = 13873, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.986 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 2265 | processing task, is_child = 0
slot update_slots: id  0 | task 2265 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 14067
slot update_slots: id  0 | task 2265 | n_tokens = 13873, memory_seq_rm [13873, end)
slot update_slots: id  0 | task 2265 | prompt processing progress, n_tokens = 14003, batch.n_tokens = 130, progress = 0.995450
slot update_slots: id  0 | task 2265 | n_tokens = 14003, memory_seq_rm [14003, end)
slot update_slots: id  0 | task 2265 | prompt processing progress, n_tokens = 14067, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 2265 | prompt done, n_tokens = 14067, batch.n_tokens = 64
slot init_sampler: id  0 | task 2265 | init sampler, took 1.34 ms, tokens: text = 14067, total = 14067
slot update_slots: id  0 | task 2265 | created context checkpoint 7 of 8 (pos_min = 14002, pos_max = 14002, size = 75.376 MiB)
slot print_timing: id  0 | task 2265 |
prompt eval time =    3538.61 ms /   194 tokens (   18.24 ms per token,    54.82 tokens per second)
       eval time =   98183.48 ms /  1680 tokens (   58.44 ms per token,    17.11 tokens per second)
      total time =  101722.09 ms /  1874 tokens
slot      release: id  0 | task 2265 | stop processing: n_tokens = 15746, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 11288798023
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 15746, total state size = 444.603 MiB
srv          load:  - looking for better prompt, base f_keep = 0.003, sim = 0.002
srv          load:  - found better prompt with f_keep = 1.000, sim = 0.928
srv        update:  - cache state: 6 prompts, 2647.614 MiB (limits: 8192.000 MiB, 32768 tokens, 100706 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426a0950c0:   15335 tokens, checkpoints:  8,  1037.972 MiB
srv        update:    - prompt 0x5e426ad9d890:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e424eb6dbb0:   15746 tokens, checkpoints:  7,   972.234 MiB
srv  get_availabl: prompt cache update took 11613.77 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 3947 | processing task, is_child = 0
slot update_slots: id  0 | task 3947 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 23642
slot update_slots: id  0 | task 3947 | n_tokens = 21931, memory_seq_rm [21931, end)
slot update_slots: id  0 | task 3947 | prompt processing progress, n_tokens = 23578, batch.n_tokens = 1647, progress = 0.997293
slot update_slots: id  0 | task 3947 | n_tokens = 23578, memory_seq_rm [23578, end)
slot update_slots: id  0 | task 3947 | prompt processing progress, n_tokens = 23642, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 3947 | prompt done, n_tokens = 23642, batch.n_tokens = 64
slot init_sampler: id  0 | task 3947 | init sampler, took 2.13 ms, tokens: text = 23642, total = 23642
slot update_slots: id  0 | task 3947 | created context checkpoint 5 of 8 (pos_min = 23577, pos_max = 23577, size = 75.376 MiB)
slot print_timing: id  0 | task 3947 |
prompt eval time =   15749.75 ms /  1711 tokens (    9.21 ms per token,   108.64 tokens per second)
       eval time =   11339.72 ms /   196 tokens (   57.86 ms per token,    17.28 tokens per second)
      total time =   27089.48 ms /  1907 tokens
slot      release: id  0 | task 3947 | stop processing: n_tokens = 23837, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 4145 | processing task, is_child = 0
slot update_slots: id  0 | task 4145 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 23894
slot update_slots: id  0 | task 4145 | n_tokens = 23837, memory_seq_rm [23837, end)
slot update_slots: id  0 | task 4145 | prompt processing progress, n_tokens = 23894, batch.n_tokens = 57, progress = 1.000000
slot update_slots: id  0 | task 4145 | prompt done, n_tokens = 23894, batch.n_tokens = 57
slot init_sampler: id  0 | task 4145 | init sampler, took 1.93 ms, tokens: text = 23894, total = 23894
slot update_slots: id  0 | task 4145 | created context checkpoint 6 of 8 (pos_min = 23836, pos_max = 23836, size = 75.376 MiB)
slot print_timing: id  0 | task 4145 |
prompt eval time =    8031.84 ms /    57 tokens (  140.91 ms per token,     7.10 tokens per second)
       eval time =   53588.53 ms /   802 tokens (   66.82 ms per token,    14.97 tokens per second)
      total time =   61620.37 ms /   859 tokens
slot      release: id  0 | task 4145 | stop processing: n_tokens = 24695, truncated = 0
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  update_slots: all slots are idle
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 4948 | processing task, is_child = 0
slot update_slots: id  0 | task 4948 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 24838
slot update_slots: id  0 | task 4948 | n_tokens = 24695, memory_seq_rm [24695, end)
slot update_slots: id  0 | task 4948 | prompt processing progress, n_tokens = 24774, batch.n_tokens = 79, progress = 0.997423
slot update_slots: id  0 | task 4948 | n_tokens = 24774, memory_seq_rm [24774, end)
slot update_slots: id  0 | task 4948 | prompt processing progress, n_tokens = 24838, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 4948 | prompt done, n_tokens = 24838, batch.n_tokens = 64
slot init_sampler: id  0 | task 4948 | init sampler, took 2.04 ms, tokens: text = 24838, total = 24838
slot update_slots: id  0 | task 4948 | created context checkpoint 7 of 8 (pos_min = 24773, pos_max = 24773, size = 75.376 MiB)
slot print_timing: id  0 | task 4948 |
prompt eval time =    5046.80 ms /   143 tokens (   35.29 ms per token,    28.33 tokens per second)
       eval time =     841.27 ms /    14 tokens (   60.09 ms per token,    16.64 tokens per second)
      total time =    5888.07 ms /   157 tokens
slot      release: id  0 | task 4948 | stop processing: n_tokens = 24851, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.952 (> 0.100 thold), f_keep = 0.681
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 4964 | processing task, is_child = 0
slot update_slots: id  0 | task 4964 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 17765
slot update_slots: id  0 | task 4964 | n_past = 16917, slot.prompt.tokens.size() = 24851, seq_id = 0, pos_min = 24850, n_swa = 1
slot update_slots: id  0 | task 4964 | restored context checkpoint (pos_min = 16897, pos_max = 16897, size = 75.376 MiB)
slot update_slots: id  0 | task 4964 | erased invalidated context checkpoint (pos_min = 17760, pos_max = 17760, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 4964 | erased invalidated context checkpoint (pos_min = 20409, pos_max = 20409, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 4964 | erased invalidated context checkpoint (pos_min = 21487, pos_max = 21487, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 4964 | erased invalidated context checkpoint (pos_min = 23577, pos_max = 23577, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 4964 | erased invalidated context checkpoint (pos_min = 23836, pos_max = 23836, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 4964 | erased invalidated context checkpoint (pos_min = 24773, pos_max = 24773, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 4964 | n_tokens = 16898, memory_seq_rm [16898, end)
slot update_slots: id  0 | task 4964 | prompt processing progress, n_tokens = 17701, batch.n_tokens = 803, progress = 0.996397
slot update_slots: id  0 | task 4964 | n_tokens = 17701, memory_seq_rm [17701, end)
slot update_slots: id  0 | task 4964 | prompt processing progress, n_tokens = 17765, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 4964 | prompt done, n_tokens = 17765, batch.n_tokens = 64
slot init_sampler: id  0 | task 4964 | init sampler, took 1.44 ms, tokens: text = 17765, total = 17765
slot update_slots: id  0 | task 4964 | created context checkpoint 2 of 8 (pos_min = 17700, pos_max = 17700, size = 75.376 MiB)
slot print_timing: id  0 | task 4964 |
prompt eval time =   14860.42 ms /   867 tokens (   17.14 ms per token,    58.34 tokens per second)
       eval time =    8429.19 ms /   144 tokens (   58.54 ms per token,    17.08 tokens per second)
      total time =   23289.60 ms /  1011 tokens
slot      release: id  0 | task 4964 | stop processing: n_tokens = 17908, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 12109752620
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 17908, total state size = 495.300 MiB
srv          load:  - looking for better prompt, base f_keep = 0.002, sim = 0.004
srv          load:  - found better prompt with f_keep = 0.631, sim = 0.992
srv        update:  - cache state: 6 prompts, 2255.693 MiB (limits: 8192.000 MiB, 32768 tokens, 127548 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426ad9d890:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e424eb6dbb0:   15746 tokens, checkpoints:  7,   972.234 MiB
srv        update:    - prompt 0x5e426baf03e0:   17908 tokens, checkpoints:  2,   646.052 MiB
srv  get_availabl: prompt cache update took 5872.87 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5110 | processing task, is_child = 0
slot update_slots: id  0 | task 5110 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 9752
slot update_slots: id  0 | task 5110 | n_past = 9678, slot.prompt.tokens.size() = 15335, seq_id = 0, pos_min = 15334, n_swa = 1
slot update_slots: id  0 | task 5110 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 5110 | erased invalidated context checkpoint (pos_min = 13464, pos_max = 13464, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 5110 | erased invalidated context checkpoint (pos_min = 13603, pos_max = 13603, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 5110 | erased invalidated context checkpoint (pos_min = 13758, pos_max = 13758, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 5110 | erased invalidated context checkpoint (pos_min = 13911, pos_max = 13911, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 5110 | erased invalidated context checkpoint (pos_min = 14072, pos_max = 14072, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 5110 | erased invalidated context checkpoint (pos_min = 14284, pos_max = 14284, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 5110 | erased invalidated context checkpoint (pos_min = 14508, pos_max = 14508, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 5110 | erased invalidated context checkpoint (pos_min = 14619, pos_max = 14619, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 5110 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 5110 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.210008
slot update_slots: id  0 | task 5110 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id  0 | task 5110 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.420016
slot update_slots: id  0 | task 5110 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id  0 | task 5110 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.630025
slot update_slots: id  0 | task 5110 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id  0 | task 5110 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.840033
slot update_slots: id  0 | task 5110 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id  0 | task 5110 | prompt processing progress, n_tokens = 9688, batch.n_tokens = 1496, progress = 0.993437
slot update_slots: id  0 | task 5110 | n_tokens = 9688, memory_seq_rm [9688, end)
slot update_slots: id  0 | task 5110 | prompt processing progress, n_tokens = 9752, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5110 | prompt done, n_tokens = 9752, batch.n_tokens = 64
slot init_sampler: id  0 | task 5110 | init sampler, took 0.78 ms, tokens: text = 9752, total = 9752
slot update_slots: id  0 | task 5110 | created context checkpoint 1 of 8 (pos_min = 9687, pos_max = 9687, size = 75.376 MiB)
slot print_timing: id  0 | task 5110 |
prompt eval time =   73465.76 ms /  9752 tokens (    7.53 ms per token,   132.74 tokens per second)
       eval time =    2107.18 ms /    37 tokens (   56.95 ms per token,    17.56 tokens per second)
      total time =   75572.94 ms /  9789 tokens
slot      release: id  0 | task 5110 | stop processing: n_tokens = 9788, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.987 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5153 | processing task, is_child = 0
slot update_slots: id  0 | task 5153 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 9913
slot update_slots: id  0 | task 5153 | n_tokens = 9788, memory_seq_rm [9788, end)
slot update_slots: id  0 | task 5153 | prompt processing progress, n_tokens = 9849, batch.n_tokens = 61, progress = 0.993544
slot update_slots: id  0 | task 5153 | n_tokens = 9849, memory_seq_rm [9849, end)
slot update_slots: id  0 | task 5153 | prompt processing progress, n_tokens = 9913, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5153 | prompt done, n_tokens = 9913, batch.n_tokens = 64
slot init_sampler: id  0 | task 5153 | init sampler, took 0.75 ms, tokens: text = 9913, total = 9913
slot update_slots: id  0 | task 5153 | created context checkpoint 2 of 8 (pos_min = 9848, pos_max = 9848, size = 75.376 MiB)
slot print_timing: id  0 | task 5153 |
prompt eval time =    2714.39 ms /   125 tokens (   21.72 ms per token,    46.05 tokens per second)
       eval time =    1830.79 ms /    30 tokens (   61.03 ms per token,    16.39 tokens per second)
      total time =    4545.19 ms /   155 tokens
slot      release: id  0 | task 5153 | stop processing: n_tokens = 9942, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.790 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5185 | processing task, is_child = 0
slot update_slots: id  0 | task 5185 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 12578
slot update_slots: id  0 | task 5185 | n_tokens = 9942, memory_seq_rm [9942, end)
slot update_slots: id  0 | task 5185 | prompt processing progress, n_tokens = 11990, batch.n_tokens = 2048, progress = 0.953252
slot update_slots: id  0 | task 5185 | n_tokens = 11990, memory_seq_rm [11990, end)
slot update_slots: id  0 | task 5185 | prompt processing progress, n_tokens = 12514, batch.n_tokens = 524, progress = 0.994912
slot update_slots: id  0 | task 5185 | n_tokens = 12514, memory_seq_rm [12514, end)
slot update_slots: id  0 | task 5185 | prompt processing progress, n_tokens = 12578, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5185 | prompt done, n_tokens = 12578, batch.n_tokens = 64
slot init_sampler: id  0 | task 5185 | init sampler, took 1.07 ms, tokens: text = 12578, total = 12578
slot update_slots: id  0 | task 5185 | created context checkpoint 3 of 8 (pos_min = 12513, pos_max = 12513, size = 75.376 MiB)
slot print_timing: id  0 | task 5185 |
prompt eval time =   22554.48 ms /  2636 tokens (    8.56 ms per token,   116.87 tokens per second)
       eval time =    1803.50 ms /    31 tokens (   58.18 ms per token,    17.19 tokens per second)
      total time =   24357.98 ms /  2667 tokens
slot      release: id  0 | task 5185 | stop processing: n_tokens = 12608, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5219 | processing task, is_child = 0
slot update_slots: id  0 | task 5219 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 12684
slot update_slots: id  0 | task 5219 | n_tokens = 12608, memory_seq_rm [12608, end)
slot update_slots: id  0 | task 5219 | prompt processing progress, n_tokens = 12620, batch.n_tokens = 12, progress = 0.994954
slot update_slots: id  0 | task 5219 | n_tokens = 12620, memory_seq_rm [12620, end)
slot update_slots: id  0 | task 5219 | prompt processing progress, n_tokens = 12684, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5219 | prompt done, n_tokens = 12684, batch.n_tokens = 64
slot init_sampler: id  0 | task 5219 | init sampler, took 1.01 ms, tokens: text = 12684, total = 12684
slot update_slots: id  0 | task 5219 | created context checkpoint 4 of 8 (pos_min = 12619, pos_max = 12619, size = 75.376 MiB)
slot print_timing: id  0 | task 5219 |
prompt eval time =    1333.55 ms /    76 tokens (   17.55 ms per token,    56.99 tokens per second)
       eval time =    1785.36 ms /    32 tokens (   55.79 ms per token,    17.92 tokens per second)
      total time =    3118.91 ms /   108 tokens
slot      release: id  0 | task 5219 | stop processing: n_tokens = 12715, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.984 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5253 | processing task, is_child = 0
slot update_slots: id  0 | task 5253 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 12917
slot update_slots: id  0 | task 5253 | n_tokens = 12715, memory_seq_rm [12715, end)
slot update_slots: id  0 | task 5253 | prompt processing progress, n_tokens = 12853, batch.n_tokens = 138, progress = 0.995045
slot update_slots: id  0 | task 5253 | n_tokens = 12853, memory_seq_rm [12853, end)
slot update_slots: id  0 | task 5253 | prompt processing progress, n_tokens = 12917, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5253 | prompt done, n_tokens = 12917, batch.n_tokens = 64
slot init_sampler: id  0 | task 5253 | init sampler, took 1.08 ms, tokens: text = 12917, total = 12917
slot update_slots: id  0 | task 5253 | created context checkpoint 5 of 8 (pos_min = 12852, pos_max = 12852, size = 75.376 MiB)
slot print_timing: id  0 | task 5253 |
prompt eval time =    2505.56 ms /   202 tokens (   12.40 ms per token,    80.62 tokens per second)
       eval time =    2117.64 ms /    37 tokens (   57.23 ms per token,    17.47 tokens per second)
      total time =    4623.20 ms /   239 tokens
slot      release: id  0 | task 5253 | stop processing: n_tokens = 12953, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.990 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5292 | processing task, is_child = 0
slot update_slots: id  0 | task 5292 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 13078
slot update_slots: id  0 | task 5292 | n_tokens = 12953, memory_seq_rm [12953, end)
slot update_slots: id  0 | task 5292 | prompt processing progress, n_tokens = 13014, batch.n_tokens = 61, progress = 0.995106
slot update_slots: id  0 | task 5292 | n_tokens = 13014, memory_seq_rm [13014, end)
slot update_slots: id  0 | task 5292 | prompt processing progress, n_tokens = 13078, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5292 | prompt done, n_tokens = 13078, batch.n_tokens = 64
slot init_sampler: id  0 | task 5292 | init sampler, took 1.23 ms, tokens: text = 13078, total = 13078
slot update_slots: id  0 | task 5292 | created context checkpoint 6 of 8 (pos_min = 13013, pos_max = 13013, size = 75.376 MiB)
slot print_timing: id  0 | task 5292 |
prompt eval time =    1997.29 ms /   125 tokens (   15.98 ms per token,    62.58 tokens per second)
       eval time =   31247.08 ms /   538 tokens (   58.08 ms per token,    17.22 tokens per second)
      total time =   33244.36 ms /   663 tokens
slot      release: id  0 | task 5292 | stop processing: n_tokens = 13615, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 12266760647
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 13615, total state size = 394.634 MiB
srv          load:  - looking for better prompt, base f_keep = 0.003, sim = 0.002
srv          load:  - found better prompt with f_keep = 1.000, sim = 0.969
srv        update:  - cache state: 6 prompts, 2456.530 MiB (limits: 8192.000 MiB, 32768 tokens, 102804 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426ad9d890:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e424eb6dbb0:   15746 tokens, checkpoints:  7,   972.234 MiB
srv        update:    - prompt 0x5e426816e6d0:   13615 tokens, checkpoints:  6,   846.889 MiB
srv  get_availabl: prompt cache update took 7365.40 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5832 | processing task, is_child = 0
slot update_slots: id  0 | task 5832 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 18481
slot update_slots: id  0 | task 5832 | n_tokens = 17908, memory_seq_rm [17908, end)
slot update_slots: id  0 | task 5832 | prompt processing progress, n_tokens = 18417, batch.n_tokens = 509, progress = 0.996537
slot update_slots: id  0 | task 5832 | n_tokens = 18417, memory_seq_rm [18417, end)
slot update_slots: id  0 | task 5832 | prompt processing progress, n_tokens = 18481, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5832 | prompt done, n_tokens = 18481, batch.n_tokens = 64
slot init_sampler: id  0 | task 5832 | init sampler, took 1.45 ms, tokens: text = 18481, total = 18481
slot update_slots: id  0 | task 5832 | created context checkpoint 3 of 8 (pos_min = 18416, pos_max = 18416, size = 75.376 MiB)
slot print_timing: id  0 | task 5832 |
prompt eval time =    7734.73 ms /   573 tokens (   13.50 ms per token,    74.08 tokens per second)
       eval time =    2791.53 ms /    44 tokens (   63.44 ms per token,    15.76 tokens per second)
      total time =   10526.26 ms /   617 tokens
slot      release: id  0 | task 5832 | stop processing: n_tokens = 18524, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.877 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5878 | processing task, is_child = 0
slot update_slots: id  0 | task 5878 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21130
slot update_slots: id  0 | task 5878 | n_tokens = 18524, memory_seq_rm [18524, end)
slot update_slots: id  0 | task 5878 | prompt processing progress, n_tokens = 20572, batch.n_tokens = 2048, progress = 0.973592
slot update_slots: id  0 | task 5878 | n_tokens = 20572, memory_seq_rm [20572, end)
slot update_slots: id  0 | task 5878 | prompt processing progress, n_tokens = 21066, batch.n_tokens = 494, progress = 0.996971
slot update_slots: id  0 | task 5878 | n_tokens = 21066, memory_seq_rm [21066, end)
slot update_slots: id  0 | task 5878 | prompt processing progress, n_tokens = 21130, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5878 | prompt done, n_tokens = 21130, batch.n_tokens = 64
slot init_sampler: id  0 | task 5878 | init sampler, took 1.72 ms, tokens: text = 21130, total = 21130
slot update_slots: id  0 | task 5878 | created context checkpoint 4 of 8 (pos_min = 21065, pos_max = 21065, size = 75.376 MiB)
slot print_timing: id  0 | task 5878 |
prompt eval time =   21446.58 ms /  2606 tokens (    8.23 ms per token,   121.51 tokens per second)
       eval time =    1975.53 ms /    33 tokens (   59.86 ms per token,    16.70 tokens per second)
      total time =   23422.12 ms /  2639 tokens
slot      release: id  0 | task 5878 | stop processing: n_tokens = 21162, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5914 | processing task, is_child = 0
slot update_slots: id  0 | task 5914 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21178
slot update_slots: id  0 | task 5914 | n_past = 21157, slot.prompt.tokens.size() = 21162, seq_id = 0, pos_min = 21161, n_swa = 1
slot update_slots: id  0 | task 5914 | restored context checkpoint (pos_min = 21065, pos_max = 21065, size = 75.376 MiB)
slot update_slots: id  0 | task 5914 | n_tokens = 21066, memory_seq_rm [21066, end)
slot update_slots: id  0 | task 5914 | prompt processing progress, n_tokens = 21114, batch.n_tokens = 48, progress = 0.996978
slot update_slots: id  0 | task 5914 | n_tokens = 21114, memory_seq_rm [21114, end)
slot update_slots: id  0 | task 5914 | prompt processing progress, n_tokens = 21178, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 5914 | prompt done, n_tokens = 21178, batch.n_tokens = 64
slot init_sampler: id  0 | task 5914 | init sampler, took 1.64 ms, tokens: text = 21178, total = 21178
slot print_timing: id  0 | task 5914 |
prompt eval time =    1830.46 ms /   112 tokens (   16.34 ms per token,    61.19 tokens per second)
       eval time =    3386.87 ms /    58 tokens (   58.39 ms per token,    17.12 tokens per second)
      total time =    5217.32 ms /   170 tokens
slot      release: id  0 | task 5914 | stop processing: n_tokens = 21235, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 5974 | processing task, is_child = 0
slot update_slots: id  0 | task 5974 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21259
slot update_slots: id  0 | task 5974 | n_tokens = 21235, memory_seq_rm [21235, end)
slot update_slots: id  0 | task 5974 | prompt processing progress, n_tokens = 21259, batch.n_tokens = 24, progress = 1.000000
slot update_slots: id  0 | task 5974 | prompt done, n_tokens = 21259, batch.n_tokens = 24
slot init_sampler: id  0 | task 5974 | init sampler, took 1.72 ms, tokens: text = 21259, total = 21259
slot update_slots: id  0 | task 5974 | created context checkpoint 5 of 8 (pos_min = 21234, pos_max = 21234, size = 75.376 MiB)
slot print_timing: id  0 | task 5974 |
prompt eval time =     375.50 ms /    24 tokens (   15.65 ms per token,    63.91 tokens per second)
       eval time =    4046.66 ms /    67 tokens (   60.40 ms per token,    16.56 tokens per second)
      total time =    4422.16 ms /    91 tokens
slot      release: id  0 | task 5974 | stop processing: n_tokens = 21325, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6042 | processing task, is_child = 0
slot update_slots: id  0 | task 6042 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21353
slot update_slots: id  0 | task 6042 | n_tokens = 21325, memory_seq_rm [21325, end)
slot update_slots: id  0 | task 6042 | prompt processing progress, n_tokens = 21353, batch.n_tokens = 28, progress = 1.000000
slot update_slots: id  0 | task 6042 | prompt done, n_tokens = 21353, batch.n_tokens = 28
slot init_sampler: id  0 | task 6042 | init sampler, took 1.75 ms, tokens: text = 21353, total = 21353
slot update_slots: id  0 | task 6042 | created context checkpoint 6 of 8 (pos_min = 21324, pos_max = 21324, size = 75.376 MiB)
slot print_timing: id  0 | task 6042 |
prompt eval time =     435.48 ms /    28 tokens (   15.55 ms per token,    64.30 tokens per second)
       eval time =    3561.99 ms /    62 tokens (   57.45 ms per token,    17.41 tokens per second)
      total time =    3997.47 ms /    90 tokens
slot      release: id  0 | task 6042 | stop processing: n_tokens = 21414, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6105 | processing task, is_child = 0
slot update_slots: id  0 | task 6105 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21441
slot update_slots: id  0 | task 6105 | n_tokens = 21414, memory_seq_rm [21414, end)
slot update_slots: id  0 | task 6105 | prompt processing progress, n_tokens = 21441, batch.n_tokens = 27, progress = 1.000000
slot update_slots: id  0 | task 6105 | prompt done, n_tokens = 21441, batch.n_tokens = 27
slot init_sampler: id  0 | task 6105 | init sampler, took 1.71 ms, tokens: text = 21441, total = 21441
slot update_slots: id  0 | task 6105 | created context checkpoint 7 of 8 (pos_min = 21413, pos_max = 21413, size = 75.376 MiB)
slot print_timing: id  0 | task 6105 |
prompt eval time =     409.14 ms /    27 tokens (   15.15 ms per token,    65.99 tokens per second)
       eval time =    3569.16 ms /    61 tokens (   58.51 ms per token,    17.09 tokens per second)
      total time =    3978.29 ms /    88 tokens
slot      release: id  0 | task 6105 | stop processing: n_tokens = 21501, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6167 | processing task, is_child = 0
slot update_slots: id  0 | task 6167 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21527
slot update_slots: id  0 | task 6167 | n_tokens = 21501, memory_seq_rm [21501, end)
slot update_slots: id  0 | task 6167 | prompt processing progress, n_tokens = 21527, batch.n_tokens = 26, progress = 1.000000
slot update_slots: id  0 | task 6167 | prompt done, n_tokens = 21527, batch.n_tokens = 26
slot init_sampler: id  0 | task 6167 | init sampler, took 1.91 ms, tokens: text = 21527, total = 21527
slot update_slots: id  0 | task 6167 | created context checkpoint 8 of 8 (pos_min = 21500, pos_max = 21500, size = 75.376 MiB)
slot print_timing: id  0 | task 6167 |
prompt eval time =     390.64 ms /    26 tokens (   15.02 ms per token,    66.56 tokens per second)
       eval time =    3449.90 ms /    59 tokens (   58.47 ms per token,    17.10 tokens per second)
      total time =    3840.54 ms /    85 tokens
slot      release: id  0 | task 6167 | stop processing: n_tokens = 21585, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6227 | processing task, is_child = 0
slot update_slots: id  0 | task 6227 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21610
slot update_slots: id  0 | task 6227 | n_tokens = 21585, memory_seq_rm [21585, end)
slot update_slots: id  0 | task 6227 | prompt processing progress, n_tokens = 21610, batch.n_tokens = 25, progress = 1.000000
slot update_slots: id  0 | task 6227 | prompt done, n_tokens = 21610, batch.n_tokens = 25
slot init_sampler: id  0 | task 6227 | init sampler, took 1.74 ms, tokens: text = 21610, total = 21610
slot update_slots: id  0 | task 6227 | erasing old context checkpoint (pos_min = 16897, pos_max = 16897, size = 75.376 MiB)
slot update_slots: id  0 | task 6227 | created context checkpoint 8 of 8 (pos_min = 21584, pos_max = 21584, size = 75.376 MiB)
slot print_timing: id  0 | task 6227 |
prompt eval time =     621.28 ms /    25 tokens (   24.85 ms per token,    40.24 tokens per second)
       eval time =    3198.08 ms /    56 tokens (   57.11 ms per token,    17.51 tokens per second)
      total time =    3819.36 ms /    81 tokens
slot      release: id  0 | task 6227 | stop processing: n_tokens = 21665, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6284 | processing task, is_child = 0
slot update_slots: id  0 | task 6284 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21690
slot update_slots: id  0 | task 6284 | n_tokens = 21665, memory_seq_rm [21665, end)
slot update_slots: id  0 | task 6284 | prompt processing progress, n_tokens = 21690, batch.n_tokens = 25, progress = 1.000000
slot update_slots: id  0 | task 6284 | prompt done, n_tokens = 21690, batch.n_tokens = 25
slot init_sampler: id  0 | task 6284 | init sampler, took 1.72 ms, tokens: text = 21690, total = 21690
slot update_slots: id  0 | task 6284 | erasing old context checkpoint (pos_min = 17700, pos_max = 17700, size = 75.376 MiB)
slot update_slots: id  0 | task 6284 | created context checkpoint 8 of 8 (pos_min = 21664, pos_max = 21664, size = 75.376 MiB)
slot print_timing: id  0 | task 6284 |
prompt eval time =     750.75 ms /    25 tokens (   30.03 ms per token,    33.30 tokens per second)
       eval time =    3213.78 ms /    55 tokens (   58.43 ms per token,    17.11 tokens per second)
      total time =    3964.53 ms /    80 tokens
slot      release: id  0 | task 6284 | stop processing: n_tokens = 21744, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6340 | processing task, is_child = 0
slot update_slots: id  0 | task 6340 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21768
slot update_slots: id  0 | task 6340 | n_tokens = 21744, memory_seq_rm [21744, end)
slot update_slots: id  0 | task 6340 | prompt processing progress, n_tokens = 21768, batch.n_tokens = 24, progress = 1.000000
slot update_slots: id  0 | task 6340 | prompt done, n_tokens = 21768, batch.n_tokens = 24
slot init_sampler: id  0 | task 6340 | init sampler, took 1.77 ms, tokens: text = 21768, total = 21768
slot update_slots: id  0 | task 6340 | erasing old context checkpoint (pos_min = 18416, pos_max = 18416, size = 75.376 MiB)
slot update_slots: id  0 | task 6340 | created context checkpoint 8 of 8 (pos_min = 21743, pos_max = 21743, size = 75.376 MiB)
slot print_timing: id  0 | task 6340 |
prompt eval time =     363.60 ms /    24 tokens (   15.15 ms per token,    66.01 tokens per second)
       eval time =    3446.97 ms /    56 tokens (   61.55 ms per token,    16.25 tokens per second)
      total time =    3810.56 ms /    80 tokens
slot      release: id  0 | task 6340 | stop processing: n_tokens = 21823, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6397 | processing task, is_child = 0
slot update_slots: id  0 | task 6397 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21847
slot update_slots: id  0 | task 6397 | n_tokens = 21823, memory_seq_rm [21823, end)
slot update_slots: id  0 | task 6397 | prompt processing progress, n_tokens = 21847, batch.n_tokens = 24, progress = 1.000000
slot update_slots: id  0 | task 6397 | prompt done, n_tokens = 21847, batch.n_tokens = 24
slot init_sampler: id  0 | task 6397 | init sampler, took 1.80 ms, tokens: text = 21847, total = 21847
slot update_slots: id  0 | task 6397 | erasing old context checkpoint (pos_min = 21065, pos_max = 21065, size = 75.376 MiB)
slot update_slots: id  0 | task 6397 | created context checkpoint 8 of 8 (pos_min = 21822, pos_max = 21822, size = 75.376 MiB)
slot print_timing: id  0 | task 6397 |
prompt eval time =     374.63 ms /    24 tokens (   15.61 ms per token,    64.06 tokens per second)
       eval time =    3437.24 ms /    58 tokens (   59.26 ms per token,    16.87 tokens per second)
      total time =    3811.86 ms /    82 tokens
slot      release: id  0 | task 6397 | stop processing: n_tokens = 21904, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6456 | processing task, is_child = 0
slot update_slots: id  0 | task 6456 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 21928
slot update_slots: id  0 | task 6456 | n_tokens = 21904, memory_seq_rm [21904, end)
slot update_slots: id  0 | task 6456 | prompt processing progress, n_tokens = 21928, batch.n_tokens = 24, progress = 1.000000
slot update_slots: id  0 | task 6456 | prompt done, n_tokens = 21928, batch.n_tokens = 24
slot init_sampler: id  0 | task 6456 | init sampler, took 1.68 ms, tokens: text = 21928, total = 21928
slot update_slots: id  0 | task 6456 | erasing old context checkpoint (pos_min = 21234, pos_max = 21234, size = 75.376 MiB)
slot update_slots: id  0 | task 6456 | created context checkpoint 8 of 8 (pos_min = 21903, pos_max = 21903, size = 75.376 MiB)
slot print_timing: id  0 | task 6456 |
prompt eval time =     422.02 ms /    24 tokens (   17.58 ms per token,    56.87 tokens per second)
       eval time =    3360.58 ms /    58 tokens (   57.94 ms per token,    17.26 tokens per second)
      total time =    3782.60 ms /    82 tokens
slot      release: id  0 | task 6456 | stop processing: n_tokens = 21985, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6515 | processing task, is_child = 0
slot update_slots: id  0 | task 6515 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22010
slot update_slots: id  0 | task 6515 | n_tokens = 21985, memory_seq_rm [21985, end)
slot update_slots: id  0 | task 6515 | prompt processing progress, n_tokens = 22010, batch.n_tokens = 25, progress = 1.000000
slot update_slots: id  0 | task 6515 | prompt done, n_tokens = 22010, batch.n_tokens = 25
slot init_sampler: id  0 | task 6515 | init sampler, took 1.79 ms, tokens: text = 22010, total = 22010
slot update_slots: id  0 | task 6515 | erasing old context checkpoint (pos_min = 21324, pos_max = 21324, size = 75.376 MiB)
slot update_slots: id  0 | task 6515 | created context checkpoint 8 of 8 (pos_min = 21984, pos_max = 21984, size = 75.376 MiB)
slot print_timing: id  0 | task 6515 |
prompt eval time =     383.56 ms /    25 tokens (   15.34 ms per token,    65.18 tokens per second)
       eval time =    3391.38 ms /    57 tokens (   59.50 ms per token,    16.81 tokens per second)
      total time =    3774.94 ms /    82 tokens
slot      release: id  0 | task 6515 | stop processing: n_tokens = 22066, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6573 | processing task, is_child = 0
slot update_slots: id  0 | task 6573 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22092
slot update_slots: id  0 | task 6573 | n_tokens = 22066, memory_seq_rm [22066, end)
slot update_slots: id  0 | task 6573 | prompt processing progress, n_tokens = 22092, batch.n_tokens = 26, progress = 1.000000
slot update_slots: id  0 | task 6573 | prompt done, n_tokens = 22092, batch.n_tokens = 26
slot init_sampler: id  0 | task 6573 | init sampler, took 1.82 ms, tokens: text = 22092, total = 22092
slot update_slots: id  0 | task 6573 | erasing old context checkpoint (pos_min = 21413, pos_max = 21413, size = 75.376 MiB)
slot update_slots: id  0 | task 6573 | created context checkpoint 8 of 8 (pos_min = 22065, pos_max = 22065, size = 75.376 MiB)
slot print_timing: id  0 | task 6573 |
prompt eval time =     413.38 ms /    26 tokens (   15.90 ms per token,    62.90 tokens per second)
       eval time =    3271.08 ms /    54 tokens (   60.58 ms per token,    16.51 tokens per second)
      total time =    3684.46 ms /    80 tokens
slot      release: id  0 | task 6573 | stop processing: n_tokens = 22145, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6628 | processing task, is_child = 0
slot update_slots: id  0 | task 6628 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22171
slot update_slots: id  0 | task 6628 | n_tokens = 22145, memory_seq_rm [22145, end)
slot update_slots: id  0 | task 6628 | prompt processing progress, n_tokens = 22171, batch.n_tokens = 26, progress = 1.000000
slot update_slots: id  0 | task 6628 | prompt done, n_tokens = 22171, batch.n_tokens = 26
slot init_sampler: id  0 | task 6628 | init sampler, took 1.83 ms, tokens: text = 22171, total = 22171
slot update_slots: id  0 | task 6628 | erasing old context checkpoint (pos_min = 21500, pos_max = 21500, size = 75.376 MiB)
slot update_slots: id  0 | task 6628 | created context checkpoint 8 of 8 (pos_min = 22144, pos_max = 22144, size = 75.376 MiB)
slot print_timing: id  0 | task 6628 |
prompt eval time =     417.88 ms /    26 tokens (   16.07 ms per token,    62.22 tokens per second)
       eval time =    2850.32 ms /    49 tokens (   58.17 ms per token,    17.19 tokens per second)
      total time =    3268.19 ms /    75 tokens
slot      release: id  0 | task 6628 | stop processing: n_tokens = 22219, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6678 | processing task, is_child = 0
slot update_slots: id  0 | task 6678 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22272
slot update_slots: id  0 | task 6678 | n_past = 22209, slot.prompt.tokens.size() = 22219, seq_id = 0, pos_min = 22218, n_swa = 1
slot update_slots: id  0 | task 6678 | restored context checkpoint (pos_min = 22144, pos_max = 22144, size = 75.376 MiB)
slot update_slots: id  0 | task 6678 | n_tokens = 22145, memory_seq_rm [22145, end)
slot update_slots: id  0 | task 6678 | prompt processing progress, n_tokens = 22208, batch.n_tokens = 63, progress = 0.997126
slot update_slots: id  0 | task 6678 | n_tokens = 22208, memory_seq_rm [22208, end)
slot update_slots: id  0 | task 6678 | prompt processing progress, n_tokens = 22272, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 6678 | prompt done, n_tokens = 22272, batch.n_tokens = 64
slot init_sampler: id  0 | task 6678 | init sampler, took 1.71 ms, tokens: text = 22272, total = 22272
slot print_timing: id  0 | task 6678 |
prompt eval time =    2156.74 ms /   127 tokens (   16.98 ms per token,    58.89 tokens per second)
       eval time =    1729.32 ms /    33 tokens (   52.40 ms per token,    19.08 tokens per second)
      total time =    3886.06 ms /   160 tokens
slot      release: id  0 | task 6678 | stop processing: n_tokens = 22304, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.997 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6713 | processing task, is_child = 0
slot update_slots: id  0 | task 6713 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22357
slot update_slots: id  0 | task 6713 | n_past = 22294, slot.prompt.tokens.size() = 22304, seq_id = 0, pos_min = 22303, n_swa = 1
slot update_slots: id  0 | task 6713 | restored context checkpoint (pos_min = 22144, pos_max = 22144, size = 75.376 MiB)
slot update_slots: id  0 | task 6713 | n_tokens = 22145, memory_seq_rm [22145, end)
slot update_slots: id  0 | task 6713 | prompt processing progress, n_tokens = 22293, batch.n_tokens = 148, progress = 0.997137
slot update_slots: id  0 | task 6713 | n_tokens = 22293, memory_seq_rm [22293, end)
slot update_slots: id  0 | task 6713 | prompt processing progress, n_tokens = 22357, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 6713 | prompt done, n_tokens = 22357, batch.n_tokens = 64
slot init_sampler: id  0 | task 6713 | init sampler, took 1.81 ms, tokens: text = 22357, total = 22357
slot update_slots: id  0 | task 6713 | erasing old context checkpoint (pos_min = 21584, pos_max = 21584, size = 75.376 MiB)
slot update_slots: id  0 | task 6713 | created context checkpoint 8 of 8 (pos_min = 22292, pos_max = 22292, size = 75.376 MiB)
slot print_timing: id  0 | task 6713 |
prompt eval time =    2757.89 ms /   212 tokens (   13.01 ms per token,    76.87 tokens per second)
       eval time =    2032.58 ms /    34 tokens (   59.78 ms per token,    16.73 tokens per second)
      total time =    4790.47 ms /   246 tokens
slot      release: id  0 | task 6713 | stop processing: n_tokens = 22390, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.998 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6749 | processing task, is_child = 0
slot update_slots: id  0 | task 6749 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22443
slot update_slots: id  0 | task 6749 | n_tokens = 22390, memory_seq_rm [22390, end)
slot update_slots: id  0 | task 6749 | prompt processing progress, n_tokens = 22443, batch.n_tokens = 53, progress = 1.000000
slot update_slots: id  0 | task 6749 | prompt done, n_tokens = 22443, batch.n_tokens = 53
slot init_sampler: id  0 | task 6749 | init sampler, took 1.76 ms, tokens: text = 22443, total = 22443
slot update_slots: id  0 | task 6749 | erasing old context checkpoint (pos_min = 21664, pos_max = 21664, size = 75.376 MiB)
slot update_slots: id  0 | task 6749 | created context checkpoint 8 of 8 (pos_min = 22389, pos_max = 22389, size = 75.376 MiB)
slot print_timing: id  0 | task 6749 |
prompt eval time =    1048.08 ms /    53 tokens (   19.78 ms per token,    50.57 tokens per second)
       eval time =    5528.37 ms /    88 tokens (   62.82 ms per token,    15.92 tokens per second)
      total time =    6576.45 ms /   141 tokens
slot      release: id  0 | task 6749 | stop processing: n_tokens = 22530, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 12378895599
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 22530, total state size = 603.681 MiB
srv  params_from_: Chat format: Qwen3 Coder
srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.057
srv        update:  - cache state: 7 prompts, 3663.218 MiB (limits: 8192.000 MiB, 32768 tokens, 119323 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426ad9d890:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e424eb6dbb0:   15746 tokens, checkpoints:  7,   972.234 MiB
srv        update:    - prompt 0x5e426816e6d0:   13615 tokens, checkpoints:  6,   846.889 MiB
srv        update:    - prompt 0x5e426850d0e0:   22530 tokens, checkpoints:  8,  1206.688 MiB
srv  get_availabl: prompt cache update took 6022.52 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6838 | processing task, is_child = 0
slot update_slots: id  0 | task 6838 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 350
slot update_slots: id  0 | task 6838 | n_past = 20, slot.prompt.tokens.size() = 22530, seq_id = 0, pos_min = 22529, n_swa = 1
slot update_slots: id  0 | task 6838 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 6838 | erased invalidated context checkpoint (pos_min = 21743, pos_max = 21743, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 6838 | erased invalidated context checkpoint (pos_min = 21822, pos_max = 21822, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 6838 | erased invalidated context checkpoint (pos_min = 21903, pos_max = 21903, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 6838 | erased invalidated context checkpoint (pos_min = 21984, pos_max = 21984, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 6838 | erased invalidated context checkpoint (pos_min = 22065, pos_max = 22065, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 6838 | erased invalidated context checkpoint (pos_min = 22144, pos_max = 22144, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 6838 | erased invalidated context checkpoint (pos_min = 22292, pos_max = 22292, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 6838 | erased invalidated context checkpoint (pos_min = 22389, pos_max = 22389, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 6838 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 6838 | prompt processing progress, n_tokens = 286, batch.n_tokens = 286, progress = 0.817143
slot update_slots: id  0 | task 6838 | n_tokens = 286, memory_seq_rm [286, end)
slot update_slots: id  0 | task 6838 | prompt processing progress, n_tokens = 350, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 6838 | prompt done, n_tokens = 350, batch.n_tokens = 64
slot init_sampler: id  0 | task 6838 | init sampler, took 0.03 ms, tokens: text = 350, total = 350
slot update_slots: id  0 | task 6838 | created context checkpoint 1 of 8 (pos_min = 285, pos_max = 285, size = 75.376 MiB)
slot print_timing: id  0 | task 6838 |
prompt eval time =   11793.55 ms /   350 tokens (   33.70 ms per token,    29.68 tokens per second)
       eval time =    1458.87 ms /    23 tokens (   63.43 ms per token,    15.77 tokens per second)
      total time =   13252.42 ms /   373 tokens
slot      release: id  0 | task 6838 | stop processing: n_tokens = 372, truncated = 0
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 12398394801
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 372, total state size = 84.099 MiB
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv          load:  - looking for better prompt, base f_keep = 0.054, sim = 0.001
srv          load:  - found better prompt with f_keep = 1.000, sim = 0.999
srv        update:  - cache state: 7 prompts, 2616.005 MiB (limits: 8192.000 MiB, 32768 tokens, 97702 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426ad9d890:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e424eb6dbb0:   15746 tokens, checkpoints:  7,   972.234 MiB
srv        update:    - prompt 0x5e426816e6d0:   13615 tokens, checkpoints:  6,   846.889 MiB
srv        update:    - prompt 0x5e424eb0ef50:     372 tokens, checkpoints:  1,   159.475 MiB
srv  get_availabl: prompt cache update took 846.78 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6839 | processing task, is_child = 0
slot update_slots: id  0 | task 6839 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22543
slot update_slots: id  0 | task 6839 | n_tokens = 22530, memory_seq_rm [22530, end)
slot update_slots: id  0 | task 6839 | prompt processing progress, n_tokens = 22543, batch.n_tokens = 13, progress = 1.000000
slot update_slots: id  0 | task 6839 | prompt done, n_tokens = 22543, batch.n_tokens = 13
slot init_sampler: id  0 | task 6839 | init sampler, took 1.71 ms, tokens: text = 22543, total = 22543
slot update_slots: id  0 | task 6839 | erasing old context checkpoint (pos_min = 21743, pos_max = 21743, size = 75.376 MiB)
slot update_slots: id  0 | task 6839 | created context checkpoint 8 of 8 (pos_min = 22529, pos_max = 22529, size = 75.376 MiB)
slot print_timing: id  0 | task 6839 |
prompt eval time =     480.65 ms /    13 tokens (   36.97 ms per token,    27.05 tokens per second)
       eval time =    5347.05 ms /    81 tokens (   66.01 ms per token,    15.15 tokens per second)
      total time =    5827.69 ms /    94 tokens
slot      release: id  0 | task 6839 | stop processing: n_tokens = 22623, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.995 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6945 | processing task, is_child = 0
slot update_slots: id  0 | task 6945 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22688
slot update_slots: id  0 | task 6945 | n_past = 22580, slot.prompt.tokens.size() = 22623, seq_id = 0, pos_min = 22622, n_swa = 1
slot update_slots: id  0 | task 6945 | restored context checkpoint (pos_min = 22529, pos_max = 22529, size = 75.376 MiB)
slot update_slots: id  0 | task 6945 | n_tokens = 22530, memory_seq_rm [22530, end)
slot update_slots: id  0 | task 6945 | prompt processing progress, n_tokens = 22624, batch.n_tokens = 94, progress = 0.997179
slot update_slots: id  0 | task 6945 | n_tokens = 22624, memory_seq_rm [22624, end)
slot update_slots: id  0 | task 6945 | prompt processing progress, n_tokens = 22688, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 6945 | prompt done, n_tokens = 22688, batch.n_tokens = 64
slot init_sampler: id  0 | task 6945 | init sampler, took 1.76 ms, tokens: text = 22688, total = 22688
slot update_slots: id  0 | task 6945 | erasing old context checkpoint (pos_min = 21822, pos_max = 21822, size = 75.376 MiB)
slot update_slots: id  0 | task 6945 | created context checkpoint 8 of 8 (pos_min = 22623, pos_max = 22623, size = 75.376 MiB)
slot print_timing: id  0 | task 6945 |
prompt eval time =    2796.67 ms /   158 tokens (   17.70 ms per token,    56.50 tokens per second)
       eval time =    2716.46 ms /    46 tokens (   59.05 ms per token,    16.93 tokens per second)
      total time =    5513.13 ms /   204 tokens
slot      release: id  0 | task 6945 | stop processing: n_tokens = 22733, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 6993 | processing task, is_child = 0
slot update_slots: id  0 | task 6993 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 22766
slot update_slots: id  0 | task 6993 | n_tokens = 22733, memory_seq_rm [22733, end)
slot update_slots: id  0 | task 6993 | prompt processing progress, n_tokens = 22766, batch.n_tokens = 33, progress = 1.000000
slot update_slots: id  0 | task 6993 | prompt done, n_tokens = 22766, batch.n_tokens = 33
slot init_sampler: id  0 | task 6993 | init sampler, took 1.88 ms, tokens: text = 22766, total = 22766
slot update_slots: id  0 | task 6993 | erasing old context checkpoint (pos_min = 21903, pos_max = 21903, size = 75.376 MiB)
slot update_slots: id  0 | task 6993 | created context checkpoint 8 of 8 (pos_min = 22732, pos_max = 22732, size = 75.376 MiB)
slot print_timing: id  0 | task 6993 |
prompt eval time =    1262.12 ms /    33 tokens (   38.25 ms per token,    26.15 tokens per second)
       eval time =   55410.56 ms /   902 tokens (   61.43 ms per token,    16.28 tokens per second)
      total time =   56672.67 ms /   935 tokens
slot      release: id  0 | task 6993 | stop processing: n_tokens = 23667, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 0.998
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 7896 | processing task, is_child = 0
slot update_slots: id  0 | task 7896 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 23730
slot update_slots: id  0 | task 7896 | n_past = 23625, slot.prompt.tokens.size() = 23667, seq_id = 0, pos_min = 23666, n_swa = 1
slot update_slots: id  0 | task 7896 | restored context checkpoint (pos_min = 22732, pos_max = 22732, size = 75.376 MiB)
slot update_slots: id  0 | task 7896 | n_tokens = 22733, memory_seq_rm [22733, end)
slot update_slots: id  0 | task 7896 | prompt processing progress, n_tokens = 23666, batch.n_tokens = 933, progress = 0.997303
slot update_slots: id  0 | task 7896 | n_tokens = 23666, memory_seq_rm [23666, end)
slot update_slots: id  0 | task 7896 | prompt processing progress, n_tokens = 23730, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 7896 | prompt done, n_tokens = 23730, batch.n_tokens = 64
slot init_sampler: id  0 | task 7896 | init sampler, took 1.85 ms, tokens: text = 23730, total = 23730
slot update_slots: id  0 | task 7896 | erasing old context checkpoint (pos_min = 21984, pos_max = 21984, size = 75.376 MiB)
slot update_slots: id  0 | task 7896 | created context checkpoint 8 of 8 (pos_min = 23665, pos_max = 23665, size = 75.376 MiB)
slot print_timing: id  0 | task 7896 |
prompt eval time =    5967.17 ms /   997 tokens (    5.99 ms per token,   167.08 tokens per second)
       eval time =   11417.55 ms /   196 tokens (   58.25 ms per token,    17.17 tokens per second)
      total time =   17384.72 ms /  1193 tokens
slot      release: id  0 | task 7896 | stop processing: n_tokens = 23925, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 8094 | processing task, is_child = 0
slot update_slots: id  0 | task 8094 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 23957
slot update_slots: id  0 | task 8094 | n_tokens = 23925, memory_seq_rm [23925, end)
slot update_slots: id  0 | task 8094 | prompt processing progress, n_tokens = 23957, batch.n_tokens = 32, progress = 1.000000
slot update_slots: id  0 | task 8094 | prompt done, n_tokens = 23957, batch.n_tokens = 32
slot init_sampler: id  0 | task 8094 | init sampler, took 1.89 ms, tokens: text = 23957, total = 23957
slot update_slots: id  0 | task 8094 | erasing old context checkpoint (pos_min = 22065, pos_max = 22065, size = 75.376 MiB)
slot update_slots: id  0 | task 8094 | created context checkpoint 8 of 8 (pos_min = 23924, pos_max = 23924, size = 75.376 MiB)
slot print_timing: id  0 | task 8094 |
prompt eval time =     811.30 ms /    32 tokens (   25.35 ms per token,    39.44 tokens per second)
       eval time =   77493.44 ms /  1302 tokens (   59.52 ms per token,    16.80 tokens per second)
      total time =   78304.74 ms /  1334 tokens
slot      release: id  0 | task 8094 | stop processing: n_tokens = 25258, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 9397 | processing task, is_child = 0
slot update_slots: id  0 | task 9397 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 25289
slot update_slots: id  0 | task 9397 | n_tokens = 25258, memory_seq_rm [25258, end)
slot update_slots: id  0 | task 9397 | prompt processing progress, n_tokens = 25289, batch.n_tokens = 31, progress = 1.000000
slot update_slots: id  0 | task 9397 | prompt done, n_tokens = 25289, batch.n_tokens = 31
slot init_sampler: id  0 | task 9397 | init sampler, took 2.03 ms, tokens: text = 25289, total = 25289
slot update_slots: id  0 | task 9397 | erasing old context checkpoint (pos_min = 22144, pos_max = 22144, size = 75.376 MiB)
slot update_slots: id  0 | task 9397 | created context checkpoint 8 of 8 (pos_min = 25257, pos_max = 25257, size = 75.376 MiB)
slot print_timing: id  0 | task 9397 |
prompt eval time =     454.77 ms /    31 tokens (   14.67 ms per token,    68.17 tokens per second)
       eval time =   17713.42 ms /   309 tokens (   57.32 ms per token,    17.44 tokens per second)
      total time =   18168.19 ms /   340 tokens
slot      release: id  0 | task 9397 | stop processing: n_tokens = 25597, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.989 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 9707 | processing task, is_child = 0
slot update_slots: id  0 | task 9707 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 25887
slot update_slots: id  0 | task 9707 | n_tokens = 25597, memory_seq_rm [25597, end)
slot update_slots: id  0 | task 9707 | prompt processing progress, n_tokens = 25823, batch.n_tokens = 226, progress = 0.997528
slot update_slots: id  0 | task 9707 | n_tokens = 25823, memory_seq_rm [25823, end)
slot update_slots: id  0 | task 9707 | prompt processing progress, n_tokens = 25887, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 9707 | prompt done, n_tokens = 25887, batch.n_tokens = 64
slot init_sampler: id  0 | task 9707 | init sampler, took 2.05 ms, tokens: text = 25887, total = 25887
slot update_slots: id  0 | task 9707 | erasing old context checkpoint (pos_min = 22292, pos_max = 22292, size = 75.376 MiB)
slot update_slots: id  0 | task 9707 | created context checkpoint 8 of 8 (pos_min = 25822, pos_max = 25822, size = 75.376 MiB)
slot print_timing: id  0 | task 9707 |
prompt eval time =    3431.64 ms /   290 tokens (   11.83 ms per token,    84.51 tokens per second)
       eval time =    5636.36 ms /    92 tokens (   61.26 ms per token,    16.32 tokens per second)
      total time =    9068.00 ms /   382 tokens
slot      release: id  0 | task 9707 | stop processing: n_tokens = 25978, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 9801 | processing task, is_child = 0
slot update_slots: id  0 | task 9801 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 26009
slot update_slots: id  0 | task 9801 | n_tokens = 25978, memory_seq_rm [25978, end)
slot update_slots: id  0 | task 9801 | prompt processing progress, n_tokens = 26009, batch.n_tokens = 31, progress = 1.000000
slot update_slots: id  0 | task 9801 | prompt done, n_tokens = 26009, batch.n_tokens = 31
slot init_sampler: id  0 | task 9801 | init sampler, took 2.08 ms, tokens: text = 26009, total = 26009
slot update_slots: id  0 | task 9801 | erasing old context checkpoint (pos_min = 22389, pos_max = 22389, size = 75.376 MiB)
slot update_slots: id  0 | task 9801 | created context checkpoint 8 of 8 (pos_min = 25977, pos_max = 25977, size = 75.376 MiB)
slot print_timing: id  0 | task 9801 |
prompt eval time =     374.53 ms /    31 tokens (   12.08 ms per token,    82.77 tokens per second)
       eval time =   42089.66 ms /   709 tokens (   59.36 ms per token,    16.84 tokens per second)
      total time =   42464.19 ms /   740 tokens
slot      release: id  0 | task 9801 | stop processing: n_tokens = 26717, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 10511 | processing task, is_child = 0
slot update_slots: id  0 | task 10511 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 26747
slot update_slots: id  0 | task 10511 | n_tokens = 26717, memory_seq_rm [26717, end)
slot update_slots: id  0 | task 10511 | prompt processing progress, n_tokens = 26747, batch.n_tokens = 30, progress = 1.000000
slot update_slots: id  0 | task 10511 | prompt done, n_tokens = 26747, batch.n_tokens = 30
slot init_sampler: id  0 | task 10511 | init sampler, took 2.25 ms, tokens: text = 26747, total = 26747
slot update_slots: id  0 | task 10511 | erasing old context checkpoint (pos_min = 22529, pos_max = 22529, size = 75.376 MiB)
slot update_slots: id  0 | task 10511 | created context checkpoint 8 of 8 (pos_min = 26716, pos_max = 26716, size = 75.376 MiB)
slot print_timing: id  0 | task 10511 |
prompt eval time =     623.52 ms /    30 tokens (   20.78 ms per token,    48.11 tokens per second)
       eval time =  102561.43 ms /  1737 tokens (   59.05 ms per token,    16.94 tokens per second)
      total time =  103184.96 ms /  1767 tokens
slot      release: id  0 | task 10511 | stop processing: n_tokens = 28483, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 12249 | processing task, is_child = 0
slot update_slots: id  0 | task 12249 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 28514
slot update_slots: id  0 | task 12249 | n_tokens = 28483, memory_seq_rm [28483, end)
slot update_slots: id  0 | task 12249 | prompt processing progress, n_tokens = 28514, batch.n_tokens = 31, progress = 1.000000
slot update_slots: id  0 | task 12249 | prompt done, n_tokens = 28514, batch.n_tokens = 31
slot init_sampler: id  0 | task 12249 | init sampler, took 2.26 ms, tokens: text = 28514, total = 28514
slot update_slots: id  0 | task 12249 | erasing old context checkpoint (pos_min = 22623, pos_max = 22623, size = 75.376 MiB)
slot update_slots: id  0 | task 12249 | created context checkpoint 8 of 8 (pos_min = 28482, pos_max = 28482, size = 75.376 MiB)
slot print_timing: id  0 | task 12249 |
prompt eval time =     566.94 ms /    31 tokens (   18.29 ms per token,    54.68 tokens per second)
       eval time =   74037.91 ms /  1242 tokens (   59.61 ms per token,    16.78 tokens per second)
      total time =   74604.85 ms /  1273 tokens
slot      release: id  0 | task 12249 | stop processing: n_tokens = 29755, truncated = 0
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  update_slots: all slots are idle
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 13492 | processing task, is_child = 0
slot update_slots: id  0 | task 13492 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 29786
slot update_slots: id  0 | task 13492 | n_tokens = 29755, memory_seq_rm [29755, end)
slot update_slots: id  0 | task 13492 | prompt processing progress, n_tokens = 29786, batch.n_tokens = 31, progress = 1.000000
slot update_slots: id  0 | task 13492 | prompt done, n_tokens = 29786, batch.n_tokens = 31
slot init_sampler: id  0 | task 13492 | init sampler, took 2.40 ms, tokens: text = 29786, total = 29786
slot update_slots: id  0 | task 13492 | erasing old context checkpoint (pos_min = 22732, pos_max = 22732, size = 75.376 MiB)
slot update_slots: id  0 | task 13492 | created context checkpoint 8 of 8 (pos_min = 29754, pos_max = 29754, size = 75.376 MiB)
slot print_timing: id  0 | task 13492 |
prompt eval time =     739.31 ms /    31 tokens (   23.85 ms per token,    41.93 tokens per second)
       eval time =   69617.41 ms /  1153 tokens (   60.38 ms per token,    16.56 tokens per second)
      total time =   70356.72 ms /  1184 tokens
slot      release: id  0 | task 13492 | stop processing: n_tokens = 30938, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 14646 | processing task, is_child = 0
slot update_slots: id  0 | task 14646 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 30969
slot update_slots: id  0 | task 14646 | n_tokens = 30938, memory_seq_rm [30938, end)
slot update_slots: id  0 | task 14646 | prompt processing progress, n_tokens = 30969, batch.n_tokens = 31, progress = 1.000000
slot update_slots: id  0 | task 14646 | prompt done, n_tokens = 30969, batch.n_tokens = 31
slot init_sampler: id  0 | task 14646 | init sampler, took 2.55 ms, tokens: text = 30969, total = 30969
slot update_slots: id  0 | task 14646 | erasing old context checkpoint (pos_min = 23665, pos_max = 23665, size = 75.376 MiB)
slot update_slots: id  0 | task 14646 | created context checkpoint 8 of 8 (pos_min = 30937, pos_max = 30937, size = 75.376 MiB)
slot print_timing: id  0 | task 14646 |
prompt eval time =     869.43 ms /    31 tokens (   28.05 ms per token,    35.66 tokens per second)
       eval time =   91165.07 ms /  1465 tokens (   62.23 ms per token,    16.07 tokens per second)
      total time =   92034.51 ms /  1496 tokens
slot      release: id  0 | task 14646 | stop processing: n_tokens = 32433, truncated = 0
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  update_slots: all slots are idle
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 16112 | processing task, is_child = 0
slot update_slots: id  0 | task 16112 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 32464
slot update_slots: id  0 | task 16112 | n_tokens = 32433, memory_seq_rm [32433, end)
slot update_slots: id  0 | task 16112 | prompt processing progress, n_tokens = 32464, batch.n_tokens = 31, progress = 1.000000
slot update_slots: id  0 | task 16112 | prompt done, n_tokens = 32464, batch.n_tokens = 31
slot init_sampler: id  0 | task 16112 | init sampler, took 2.57 ms, tokens: text = 32464, total = 32464
slot update_slots: id  0 | task 16112 | erasing old context checkpoint (pos_min = 23924, pos_max = 23924, size = 75.376 MiB)
slot update_slots: id  0 | task 16112 | created context checkpoint 8 of 8 (pos_min = 32432, pos_max = 32432, size = 75.376 MiB)
slot print_timing: id  0 | task 16112 |
prompt eval time =    1005.12 ms /    31 tokens (   32.42 ms per token,    30.84 tokens per second)
       eval time =    4757.12 ms /    71 tokens (   67.00 ms per token,    14.92 tokens per second)
      total time =    5762.25 ms /   102 tokens
slot      release: id  0 | task 16112 | stop processing: n_tokens = 32534, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 12983208127
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 32534, total state size = 838.264 MiB
srv  params_from_: Chat format: Qwen3 Coder
srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.061
srv        update:  - cache state: 8 prompts, 4057.276 MiB (limits: 8192.000 MiB, 32768 tokens, 128684 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426ad9d890:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e424eb6dbb0:   15746 tokens, checkpoints:  7,   972.234 MiB
srv        update:    - prompt 0x5e426816e6d0:   13615 tokens, checkpoints:  6,   846.889 MiB
srv        update:    - prompt 0x5e424eb0ef50:     372 tokens, checkpoints:  1,   159.475 MiB
srv        update:    - prompt 0x5e426af39220:   32534 tokens, checkpoints:  8,  1441.271 MiB
srv  get_availabl: prompt cache update took 13719.02 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 16184 | processing task, is_child = 0
slot update_slots: id  0 | task 16184 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 329
slot update_slots: id  0 | task 16184 | n_past = 20, slot.prompt.tokens.size() = 32534, seq_id = 0, pos_min = 32533, n_swa = 1
slot update_slots: id  0 | task 16184 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 16184 | erased invalidated context checkpoint (pos_min = 25257, pos_max = 25257, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 16184 | erased invalidated context checkpoint (pos_min = 25822, pos_max = 25822, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 16184 | erased invalidated context checkpoint (pos_min = 25977, pos_max = 25977, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 16184 | erased invalidated context checkpoint (pos_min = 26716, pos_max = 26716, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 16184 | erased invalidated context checkpoint (pos_min = 28482, pos_max = 28482, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 16184 | erased invalidated context checkpoint (pos_min = 29754, pos_max = 29754, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 16184 | erased invalidated context checkpoint (pos_min = 30937, pos_max = 30937, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 16184 | erased invalidated context checkpoint (pos_min = 32432, pos_max = 32432, n_swa = 1, size = 75.376 MiB)
slot update_slots: id  0 | task 16184 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 16184 | prompt processing progress, n_tokens = 265, batch.n_tokens = 265, progress = 0.805471
slot update_slots: id  0 | task 16184 | n_tokens = 265, memory_seq_rm [265, end)
slot update_slots: id  0 | task 16184 | prompt processing progress, n_tokens = 329, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  0 | task 16184 | prompt done, n_tokens = 329, batch.n_tokens = 64
slot init_sampler: id  0 | task 16184 | init sampler, took 0.03 ms, tokens: text = 329, total = 329
slot update_slots: id  0 | task 16184 | created context checkpoint 1 of 8 (pos_min = 264, pos_max = 264, size = 75.376 MiB)
slot print_timing: id  0 | task 16184 |
prompt eval time =   23115.51 ms /   329 tokens (   70.26 ms per token,    14.23 tokens per second)
       eval time =    1521.45 ms /    23 tokens (   66.15 ms per token,    15.12 tokens per second)
      total time =   24636.95 ms /   352 tokens
slot      release: id  0 | task 16184 | stop processing: n_tokens = 351, truncated = 0
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 13021897998
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 351, total state size = 83.607 MiB
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv          load:  - looking for better prompt, base f_keep = 0.057, sim = 0.001
srv          load:  - found better prompt with f_keep = 1.000, sim = 1.000
srv        update:  - cache state: 8 prompts, 2774.988 MiB (limits: 8192.000 MiB, 32768 tokens, 93141 est)
srv        update:    - prompt 0x5e426789a450:     168 tokens, checkpoints:  1,   154.691 MiB
srv        update:    - prompt 0x5e4268394340:     595 tokens, checkpoints:  1,   164.704 MiB
srv        update:    - prompt 0x5e426ad8a3f0:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e426ad9d890:     352 tokens, checkpoints:  1,   159.006 MiB
srv        update:    - prompt 0x5e424eb6dbb0:   15746 tokens, checkpoints:  7,   972.234 MiB
srv        update:    - prompt 0x5e426816e6d0:   13615 tokens, checkpoints:  6,   846.889 MiB
srv        update:    - prompt 0x5e424eb0ef50:     372 tokens, checkpoints:  1,   159.475 MiB
srv        update:    - prompt 0x5e426af2f1c0:     351 tokens, checkpoints:  1,   158.983 MiB
srv  get_availabl: prompt cache update took 1311.45 ms
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 16185 | processing task, is_child = 0
slot update_slots: id  0 | task 16185 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 32547
slot update_slots: id  0 | task 16185 | n_tokens = 32534, memory_seq_rm [32534, end)
slot update_slots: id  0 | task 16185 | prompt processing progress, n_tokens = 32547, batch.n_tokens = 13, progress = 1.000000
slot update_slots: id  0 | task 16185 | prompt done, n_tokens = 32547, batch.n_tokens = 13
slot init_sampler: id  0 | task 16185 | init sampler, took 2.61 ms, tokens: text = 32547, total = 32547
slot update_slots: id  0 | task 16185 | erasing old context checkpoint (pos_min = 25257, pos_max = 25257, size = 75.376 MiB)
slot update_slots: id  0 | task 16185 | created context checkpoint 8 of 8 (pos_min = 32533, pos_max = 32533, size = 75.376 MiB)
slot print_timing: id  0 | task 16185 |
prompt eval time =     519.86 ms /    13 tokens (   39.99 ms per token,    25.01 tokens per second)
       eval time =   14544.86 ms /   221 tokens (   65.81 ms per token,    15.19 tokens per second)
      total time =   15064.72 ms /   234 tokens
slot      release: id  0 | task 16185 | stop processing: n_tokens = 32767, truncated = 1
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv          stop: cancel task, id_task = 16185
srv  update_slots: all slots are idle
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.993
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 16432 | processing task, is_child = 0
slot update_slots: id  0 | task 16432 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 32547
slot update_slots: id  0 | task 16432 | n_past = 32547, slot.prompt.tokens.size() = 32767, seq_id = 0, pos_min = 32766, n_swa = 1
slot update_slots: id  0 | task 16432 | restored context checkpoint (pos_min = 32533, pos_max = 32533, size = 75.376 MiB)
slot update_slots: id  0 | task 16432 | n_tokens = 32534, memory_seq_rm [32534, end)
slot update_slots: id  0 | task 16432 | prompt processing progress, n_tokens = 32547, batch.n_tokens = 13, progress = 1.000000
slot update_slots: id  0 | task 16432 | prompt done, n_tokens = 32547, batch.n_tokens = 13
slot init_sampler: id  0 | task 16432 | init sampler, took 2.51 ms, tokens: text = 32547, total = 32547
slot print_timing: id  0 | task 16432 |
prompt eval time =     240.75 ms /    13 tokens (   18.52 ms per token,    54.00 tokens per second)
       eval time =   13605.62 ms /   221 tokens (   61.56 ms per token,    16.24 tokens per second)
      total time =   13846.37 ms /   234 tokens
slot      release: id  0 | task 16432 | stop processing: n_tokens = 32767, truncated = 1
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 200
srv  params_from_: Chat format: Qwen3 Coder
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.992 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 16654 | processing task, is_child = 0
slot update_slots: id  0 | task 16654 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 33042
srv    send_error: task id = 16654, error: request (33042 tokens) exceeds the available context size (32768 tokens), try increasing it
slot      release: id  0 | task 16654 | stop processing: n_tokens = 32767, truncated = 0
srv  update_slots: no tokens to decode
srv  update_slots: all slots are idle
srv          stop: cancel task, id_task = 16654
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/messages 127.0.0.1 400
