@jazzqi
Created April 22, 2026 16:29
Hermes Free Model Support Report

Supported Free Models

| Model | Provider | Context Size | Output Size | Speed (tokens/s) | Notes |
|---|---|---|---|---|---|
| glm-5.1 | nvidia | 200K | 131K | ~100+ | Latest GLM, supports tool calls |
| DeepSeek-R1 | nvidia | 128K | 128K | ~80 | Strong reasoning |
| llama-3.1-nemotron-70b | nvidia | 128K | 64K | ~100 | Optimized for chat |
| chatqa-2 | nvidia | 128K | 64K | ~60 | Question answering |
| mimo-v2-pro | nous | 128K | 64K | ~50 | Code-focused, slower |
| DeepSeek-V3.2-Thinking | nvidia | 128K | 128K | ~70 | Balanced reasoning |
| qwen-flash | 302ai | 1M | 32K | ~40 | Very large context |

Recommendations

  • General use: nvidia/llama-3.1-nemotron-70b (best overall performance)
  • Code/technical: xiaomi/mimo-v2-pro (deep code understanding)
  • Complex reasoning: deepseek-ai/DeepSeek-R1 (strong logical reasoning)
  • Long context: qwen-flash (1M context window)
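The recommendations above amount to a simple task-to-model mapping. A minimal sketch of such a router follows; the model identifiers come from the table, but the `pick_model` helper and its 128K-token fallback threshold are illustrative assumptions, not part of any actual Hermes API:

```python
# Hypothetical task-to-model routing based on the recommendations above.
# Model IDs are taken from the report; the routing logic is an assumption.
RECOMMENDED_MODELS = {
    "general": "nvidia/llama-3.1-nemotron-70b",  # best overall performance
    "code": "xiaomi/mimo-v2-pro",                # deep code understanding
    "reasoning": "deepseek-ai/DeepSeek-R1",      # strong logical reasoning
    "long_context": "qwen-flash",                # 1M context window
}

def pick_model(task: str, prompt_tokens: int = 0) -> str:
    """Return a recommended model for the task, falling back to the
    long-context option when the prompt exceeds an assumed 128K-token
    budget (the context size of most models in the table)."""
    if prompt_tokens > 128_000:
        return RECOMMENDED_MODELS["long_context"]
    return RECOMMENDED_MODELS.get(task, RECOMMENDED_MODELS["general"])

print(pick_model("code"))              # xiaomi/mimo-v2-pro
print(pick_model("general", 500_000))  # qwen-flash
```

Unknown task labels fall back to the general-use model, which keeps the router total over its input space.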

Last updated: 2026-04-22
