| Model | Size/Class | Format | Hosted Provider | Best Local Path | Notes |
|---|---|---|---|---|---|
| Huihui Gemma 4 E2B Abliterated v2 | E2B | GGUF | No | Ollama / llama.cpp | Gemma 4 MoE with ~2B active params. Multimodal (image+text in, text out). Abliterated for reduced refusal. Lightweight enough to run fast, but MoE active-param sizing means quality punches above its weight class. |
| Huihui Gemma 4 E4B Abliterated | E4B | GGUF | No | Ollama / llama.cpp | Same Gemma 4 MoE family as E2B but with ~4B active params. Multimodal. Better quality ceiling than E2B at the cost of more compute per token. |
| SultrySilicon V2 | 7B | GGUF | No | Ollama / llama.cpp | Roleplay-focused 7B model. Smallest in the set. Good for quick creative/RP sanity checks, not for reasoning or instruction-following benchmarks. |
| Huihui-GLM-4.6V-Flash-Abliterated | 9B | GGUF | No | Ollama / llama.cpp | Based on Z.ai GLM-4.6V-Flash. Vision-language model (image+text). Abliterated. Bilingual Chinese/English. Fast inference variant of the GLM-4.6V family. |
| Gemma-2-Ataraxy-9B | 9B | GGUF | No | Ollama / llama.cpp | Merge of Gemma-2-9B-SimPO and Gemma-2-Gutenberg-9B. Creative writing and roleplay oriented. Scored well on EQ-Bench. Good balance of instruction-following and literary quality at 9B. |
| MythoMax-L2-13B | 13B | GGUF | No | Ollama / llama.cpp | By Gryphe. Llama 2 merge of MythoLogic-L2 and Huginn using experimental per-tensor gradient merging. One of the most downloaded RP/creative models ever (~59k GGUF downloads). Strong at both roleplay and storywriting. Alpaca format. The OG. |
| Dan's PersonalityEngine V1.3.0 | 24B | GGUF | No | Ollama / llama.cpp | Fine-tuned from Mistral Small 3.1 24B Base. Trained on a massive mix: roleplay, storywriting, tool use, math, reasoning, code, medical, legal, and survival topics. Multilingual (EN, AR, DE, FR, ES, HI, PT, JA, KO). A genuine generalist with personality. |
| SuperGemma4 26B Abliterated Multimodal | 26B multimodal | GGUF | No | custom multimodal stack | Based on Gemma 4 26B-A4B. Multimodal (image-text-to-text). Abliterated with low refusal. Optimized for Apple Silicon (MLX). Supports Korean and English. Tagged for tool use and coding. |
| Gemma 3 27B Abliterated | 27B | GGUF | No | Ollama / llama.cpp | Abliterated version of Google's Gemma 3 27B instruct. Multimodal (image-text-to-text). Reduced refusal behavior while preserving instruction-following quality. |
| Huihui Gemma 4 31B Abliterated | 31B | GGUF | No | Ollama / llama.cpp | Abliterated Gemma 4 31B instruct. Multimodal (any-to-any pipeline tag). Dense 31B, not MoE. Strongest Gemma 4 dense abliterated option. |
| Gemma 4 31B Abliterated | 31B | GGUF + safetensors | No | Ollama / llama.cpp | Same base as above (Gemma 4 31B-it) but different abliteration method using mlabonne's harmful_behaviors + harmless_alpaca datasets. Both formats in one repo. |
| Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-Abliterated | 35B A3B | GGUF | No | Ollama / llama.cpp | Qwen 3.5 MoE (35B total, ~3B active). Distilled from Claude 4.6 Opus reasoning. Chain-of-thought and reasoning-focused. Abliterated. Multimodal. Punches well above its active param count on reasoning tasks. |
| Midnight Rose 70B v2.0.3 | 70B | GGUF | No | Ollama / llama.cpp | By sophosympatheia. Complex multi-stage SLERP/DARE-TIES merge of WizardLM, Tulu-2-DPO, Dolphin, and earlier Midnight Rose versions. Uncensored. Designed for roleplay and storytelling. Scored surprisingly high on EQ-Bench even at low quants. ~6k context sweet spot. |
| Midnight Miqu 70B v1.5 | 70B | GGUF | No | Ollama / llama.cpp | Llama-family merge of Midnight-Miqu v1.0 and Tess-70B. Creative writing and roleplay focused. 32k context. Known for strong prose quality and character consistency at 70B scale. |
| Midnight Rose 103B v2.0.3 | 103B | GGUF | No | heavy self-host | Same lineage as the 70B but scaled up. Importance-matrix GGUF by mradermacher. Firmly in the "need real hardware" category. |
| DeepSeek V3 | 671B A37B | safetensors | Yes: DeepInfra, Novita | Hosted preferred | Massive MoE. 671B total, 37B active. Strong on code, math, and instruction-following. Pre-trained on ~15T tokens. Use via OpenRouter, not locally. |
| DeepSeek V3.2 | 685B A37B | safetensors | No confirmed provider yet | Hosted preferred | Successor to V3. Same general architecture class. Not a local play. |
| Behemoth-123B-v1 | 123B | GGUF | No | heavy self-host | Mistral-family 123B. Creative/RP community model. Massive parameter count makes it impractical for casual local use but prized for output quality in the r/LocalLLM community. |
| Monstral-123B | 123B | GGUF | No | heavy self-host | Mistral-family 123B. Text generation and chat focused. Same weight class as Behemoth, different training mix and community lineage. |
| BlackSheep-Large | ~27B | GGUF | No | Ollama / llama.cpp | By TroyDoesAI. Canonical repo is gated. Q8_0 is ~29.5 GB, placing it in the 27B-class. Community RP/creative model. |
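The "27B-class" inference in the last row comes from simple quant arithmetic: llama.cpp's Q8_0 stores blocks of 32 int8 weights plus one fp16 scale, i.e. 34 bytes per 32 weights (~8.5 bits per weight). A minimal sketch of that estimate; real GGUF files also carry metadata and some higher-precision tensors, so treat the result as approximate:

```python
# Rough GGUF file size -> parameter-count estimate, assuming Q8_0's block
# layout: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights.
Q8_0_BYTES_PER_WEIGHT = 34 / 32  # 1.0625 bytes/weight (~8.5 bits/weight)

def params_from_gguf_size(size_gb: float,
                          bytes_per_weight: float = Q8_0_BYTES_PER_WEIGHT) -> float:
    """Approximate parameter count, in billions, for a GGUF of size_gb GB."""
    return size_gb * 1e9 / bytes_per_weight / 1e9

# BlackSheep-Large's ~29.5 GB Q8_0 file:
print(round(params_from_gguf_size(29.5), 1))  # ~27.8 -> consistent with 27B-class
```

The same function works for other quants if you swap in their bytes-per-weight (e.g. roughly 0.56 for Q4_K_M-style 4-bit quants), which is handy for sanity-checking whether a given GGUF will fit in VRAM.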
r/localLlama + r/localLLM + r/sillytavernAI preferred models list - apr 2026

Gist by swyxio. Last active April 16, 2026.