Gist by @swyxio · last active April 16, 2026 15:58
# r/LocalLLaMA + r/LocalLLM + r/SillyTavernAI preferred models list - Apr 2026
| Model | Size/Class | Format | Hosted Provider | Best Local Path | Notes |
| --- | --- | --- | --- | --- | --- |
| Huihui Gemma 4 E2B Abliterated v2 | E2B | GGUF | No | Ollama / llama.cpp | Gemma 4 MoE with ~2B active params. Multimodal (image+text in, text out). Abliterated for reduced refusals. Lightweight enough to run fast, and MoE active-param sizing means quality punches above its weight class. |
| Huihui Gemma 4 E4B Abliterated | E4B | GGUF | No | Ollama / llama.cpp | Same Gemma 4 MoE family as the E2B but with ~4B active params. Multimodal. Higher quality ceiling than E2B at the cost of more compute per token. |
| SultrySilicon V2 | 7B | GGUF | No | Ollama / llama.cpp | Roleplay-focused 7B model. Smallest in the set. Good for quick creative/RP sanity checks, not for reasoning or instruction-following benchmarks. |
| Huihui-GLM-4.6V-Flash-Abliterated | 9B | GGUF | No | Ollama / llama.cpp | Based on Z.ai GLM-4.6V-Flash. Vision-language model (image+text). Abliterated. Bilingual Chinese/English. Fast-inference variant of the GLM-4.6V family. |
| Gemma-2-Ataraxy-9B | 9B | GGUF | No | Ollama / llama.cpp | Merge of Gemma-2-9B-SimPO and Gemma-2-Gutenberg-9B. Creative writing and roleplay oriented. Scored well on EQ-Bench. Good balance of instruction-following and literary quality at 9B. |
| MythoMax-L2-13B | 13B | GGUF | No | Ollama / llama.cpp | By Gryphe. Llama 2 merge of MythoLogic-L2 and Huginn using experimental per-tensor gradient merging. One of the most downloaded RP/creative models ever (~59k GGUF downloads). Strong at both roleplay and storywriting. Alpaca format. The OG. |
| Dan's PersonalityEngine V1.3.0 | 24B | GGUF | No | Ollama / llama.cpp | Fine-tuned from Mistral Small 3.1 24B Base. Trained on a massive mix: roleplay, storywriting, tool use, math, reasoning, code, medical, legal, and survival topics. Multilingual (EN, AR, DE, FR, ES, HI, PT, JA, KO). A genuine generalist with personality. |
| SuperGemma4 26B Abliterated Multimodal | 26B multimodal | GGUF | No | custom multimodal stack | Based on Gemma 4 26B-A4B. Multimodal (image-text-to-text). Abliterated with low refusal rate. Optimized for Apple Silicon (MLX). Supports Korean + English. Tagged for tool use and coding. |
| Gemma 3 27B Abliterated | 27B | GGUF | No | Ollama / llama.cpp | Abliterated version of Google's Gemma 3 27B instruct. Multimodal (image-text-to-text). Reduced refusal behavior while preserving instruction-following quality. |
| Huihui Gemma 4 31B Abliterated | 31B | GGUF | No | Ollama / llama.cpp | Abliterated Gemma 4 31B instruct. Multimodal (any-to-any pipeline tag). Dense 31B, not MoE. Strongest Gemma 4 dense abliterated option. |
| Gemma 4 31B Abliterated | 31B | GGUF + safetensors | No | Ollama / llama.cpp | Same base as above (Gemma 4 31B-it) but a different abliteration method, using mlabonne's harmful_behaviors + harmless_alpaca datasets. Both formats in one repo. |
| Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-Abliterated | 35B A3B | GGUF | No | Ollama / llama.cpp | Qwen 3.5 MoE (35B total, ~3B active). Distilled from Claude 4.6 Opus reasoning traces. Chain-of-thought and reasoning focused. Abliterated. Multimodal. Punches well above its active param count on reasoning tasks. |
| Midnight Rose 70B v2.0.3 | 70B | GGUF | No | Ollama / llama.cpp | By sophosympatheia. Complex multi-stage SLERP/DARE-TIES merge of WizardLM, Tulu-2-DPO, Dolphin, and earlier Midnight Rose versions. Uncensored. Designed for roleplay and storytelling. Scored surprisingly high on EQ-Bench even at low quants. ~6k context sweet spot. |
| Midnight Miqu 70B v1.5 | 70B | GGUF | No | Ollama / llama.cpp | Llama-family merge of Midnight-Miqu v1.0 and Tess-70B. Creative writing and roleplay focused. 32k context. Known for strong prose quality and character consistency at 70B scale. |
| Midnight Rose 103B v2.0.3 | 103B | GGUF | No | heavy self-host | Same lineage as the 70B but scaled up. Importance-matrix GGUF by mradermacher. Firmly in the "need real hardware" category. |
| DeepSeek V3 | 671B A37B | safetensors | Yes: DeepInfra, Novita | Hosted preferred | Massive MoE: 671B total, 37B active. Strong on code, math, and instruction-following. Pre-trained on ~15T tokens. Use via OpenRouter, not locally. |
| DeepSeek V3.2 | 685B A37B | safetensors | No confirmed provider yet | Hosted preferred | Successor to V3. Same general architecture class. Not a local play. |
| Behemoth-123B-v1 | 123B | GGUF | No | heavy self-host | Mistral-family 123B. Creative/RP community model. The massive parameter count makes it impractical for casual local use, but it is prized for output quality in the r/LocalLLM community. |
| Monstral-123B | 123B | GGUF | No | heavy self-host | Mistral-family 123B. Text generation and chat focused. Same weight class as Behemoth, different training mix and community lineage. |
| BlackSheep-Large | ~27B | GGUF | No | Ollama / llama.cpp | By TroyDoesAI. Canonical repo is gated. Q8_0 is ~29.5 GB, placing it in the 27B class. Community RP/creative model. |
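The size classes above map roughly onto GGUF file sizes: on-disk footprint is approximately parameter count × effective bits-per-weight ÷ 8 (e.g. the ~29.5 GB Q8_0 quoted for BlackSheep-Large is consistent with a 27B-class model). A back-of-envelope helper for deciding whether a quant fits your RAM/VRAM; the bits-per-weight figures are approximate community ballpark values, not exact:

```python
# Rough GGUF file-size estimate: params * bits_per_weight / 8 bytes.
# Bits-per-weight values are approximate effective rates for common
# llama.cpp quant types (assumption: community ballpark figures).
QUANT_BITS = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
}

def estimate_gguf_gb(params: float, quant: str) -> float:
    """Return approximate GGUF size in decimal gigabytes."""
    bits = QUANT_BITS[quant]
    return params * bits / 8 / 1e9

# A 27B-class model at Q8_0 lands near the ~29.5 GB quoted above:
print(round(estimate_gguf_gb(27e9, "Q8_0"), 1))    # ~28.7
# Why the 70B RP merges sit in "need real hardware" territory even at Q4:
print(round(estimate_gguf_gb(70e9, "Q4_K_M"), 1))  # ~42.0
```

Add a couple of GB on top for KV cache and runtime overhead when checking against available memory.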
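For the "hosted preferred" rows, the table suggests going through OpenRouter, which exposes an OpenAI-compatible chat-completions endpoint. A minimal sketch of building such a request (no network call made here; the model slug is a hypothetical example and should be verified against OpenRouter's model list before use):

```python
import json

# Sketch of a hosted call for the DeepSeek-class rows via OpenRouter's
# OpenAI-compatible /chat/completions endpoint. Endpoint shape follows
# the OpenAI chat API; verify slugs/params against the provider's docs.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> tuple[dict, dict]:
    """Return (headers, payload) for an OpenAI-style chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

headers, payload = build_chat_request(
    "deepseek/deepseek-chat",  # hypothetical slug; check OpenRouter's listing
    "Summarize the tradeoffs of MoE vs dense models.",
    api_key="sk-or-...",       # placeholder key
)
print(json.dumps(payload, indent=2))
```

Post the payload with any HTTP client (`requests.post(OPENROUTER_URL, headers=headers, json=payload)`); the response follows the standard OpenAI chat-completion shape.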