| Model | Provider | Context (tokens) | Max Output (tokens) | Reasoning | Speed (tokens/s) | Notes |
|---|---|---|---|---|---|---|
| glm-5.1 | nvidia | 200K | 131K | ✅ | 100+ | Latest GLM, supports tool calls |
| DeepSeek-R1 | nvidia | 128K | 128K | ✅ | ~80 | Strong reasoning |
| llama-3.1-nemotron-70b | nvidia | 128K | 64K | ❌ | ~100 | Optimized for chat |
| chatqa-2 | nvidia | 128K | 64K | ❌ | ~60 | Question answering |
| mimo-v2-pro | nous | 128K | 64K | ❌ | ~50 | Code-focused, slower |
| DeepSeek-V3.2-Thinking | nvidia | 128K | 128K | ✅ | ~70 | Balanced reasoning |
| qwen-flash | 302ai | 1M | 32K | ❌ | ~40 | Very large context |
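
Below is a minimal sketch of calling one of the models above through an OpenAI-compatible chat completions endpoint, using the `openai` Python client. The base URL and the `NVIDIA_API_KEY` environment variable name are assumptions; substitute the values your gateway or provider actually uses.

```python
import os

from openai import OpenAI

# Assumed OpenAI-compatible endpoint; adjust base_url and the
# API-key environment variable for your provider/gateway.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # model ID from the table above
    messages=[
        {"role": "user", "content": "Explain tail-call optimization in one paragraph."},
    ],
    max_tokens=1024,  # must stay within the model's max output size
)
print(response.choices[0].message.content)
```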
Recommended defaults by use case:

- General Use: nvidia/llama-3.1-nemotron-70b (best performance)
- Code/Technical: xiaomi/mimo-v2-pro (deep code understanding)
- Complex Reasoning: deepseek-ai/DeepSeek-R1 (powerful logic)
- Long Context: qwen-flash (1M context window)
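
A small routing table can encode these recommendations in code. This is purely illustrative: the use-case keys and the `pick_model` helper are hypothetical conveniences, not part of any provider API.

```python
# Default model per use case, mirroring the recommendations above.
DEFAULT_MODELS = {
    "general": "nvidia/llama-3.1-nemotron-70b",  # best all-round performance
    "code": "xiaomi/mimo-v2-pro",                # deep code understanding
    "reasoning": "deepseek-ai/DeepSeek-R1",      # powerful logic
    "long_context": "qwen-flash",                # 1M-token context window
}


def pick_model(use_case: str) -> str:
    """Return the recommended model ID for a use case, falling back to general."""
    return DEFAULT_MODELS.get(use_case, DEFAULT_MODELS["general"])


print(pick_model("code"))        # xiaomi/mimo-v2-pro
print(pick_model("unknown"))     # nvidia/llama-3.1-nemotron-70b
```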
Last updated: 2026-04-22