Gemma 4 chat template that works with opencode. Download the .jinja file and tell vLLM to use it via `--chat-template chat_template_gemma_large_fixed.jinja`.
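For example, a launch command might look like this (the model id is a placeholder; the `--chat-template` flag is as given above):

```shell
# Serve a Gemma model with the fixed chat template.
# <gemma-model> is a placeholder for whichever Gemma checkpoint you are serving.
vllm serve <gemma-model> \
  --chat-template chat_template_gemma_large_fixed.jinja
```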
A simple proxy to convert a non-streaming Chat Completions request into a streaming one
Simple non-streaming to streaming Chat Completions Proxy
This is a simple proxy I use to run non-streaming evals (like BFCL multi_turn) against the vLLM server's streaming request/response path. Run the vLLM server as usual, start the proxy (via `python proxy.py`), and point BFCL at http://localhost:8001/v1 instead of http://localhost:8000/v1 to exercise the streaming path.
This means you can start vLLM once and run BFCL twice, in streaming and non-streaming modes, just by changing `OPENAI_BASE_URL`, to verify basic correctness of the streaming reasoning and tool-call parsers.
The entire script was written by Gemini in one shot, but it has worked so far in basic testing.
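The core trick such a proxy performs can be sketched as follows: forward the client's request upstream with `stream=True`, then fold the SSE delta chunks back into a single non-streaming Chat Completions response. This is a minimal sketch of the chunk-folding step only (function and variable names are mine, not from the gist), assuming chunks shaped like the OpenAI streaming format:

```python
def merge_stream_chunks(chunks: list[dict]) -> dict:
    """Fold streaming Chat Completions chunks into one non-streaming response dict."""
    content_parts: list[str] = []
    role = "assistant"
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        # The first chunk usually carries the role; later chunks carry content deltas.
        if "role" in delta:
            role = delta["role"]
        if delta.get("content"):
            content_parts.append(delta["content"])
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return {
        "choices": [{
            "index": 0,
            "message": {"role": role, "content": "".join(content_parts)},
            "finish_reason": finish_reason,
        }]
    }
```

A real proxy would additionally merge `tool_calls` and `reasoning` deltas and carry through `usage`, which is exactly the behavior the streaming parsers are being tested on.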
Changes required to get latest main of vLLM running Qwen3 MoE NVFP4 on DGX Spark
Dockerfile to create vLLM v0.11.2 containers for DGX Spark
llguidance, vLLM `guided_grammar`, and Hermes models
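For context, vLLM's OpenAI-compatible server accepts a `guided_grammar` field in the request body to constrain generation to a grammar. A hedged sketch of what such a request body might look like (the model id and grammar are illustrative, and the grammar dialect accepted depends on the configured structured-output backend, e.g. llguidance):

```json
{
  "model": "<hermes-model>",
  "messages": [{"role": "user", "content": "Pick a color."}],
  "guided_grammar": "root ::= \"red\" | \"green\" | \"blue\""
}
```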
An example of how to use Pydantic AI with Llama Stack and the Responses API
EBNF grammar (for use with TatSu) for parsing Llama 4 pythonic tool calls
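Llama 4's pythonic tool-call format emits a Python-style list of function calls, e.g. `[get_weather(city="Paris"), get_time(tz="CET")]`. As a rough illustration of the shape such a grammar covers, here is a hedged sketch in TatSu-style EBNF; the rule names and exact productions are mine and not taken from the gist, which handles more literal types:

```ebnf
start = call_list $ ;
call_list = '[' call {',' call} ']' ;
call = ident '(' [kwargs] ')' ;
kwargs = kwarg {',' kwarg} ;
kwarg = ident '=' literal ;
ident = /[a-zA-Z_]\w*/ ;
literal = string | number ;
string = /"[^"]*"/ | /'[^']*'/ ;
number = /-?\d+(\.\d+)?/ ;
```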