Skip to content

Instantly share code, notes, and snippets.

View noxgle's full-sized avatar
:dependabot:
I may be slow to respond.

Sebastian noxgle

:dependabot:
I may be slow to respond.
View GitHub Profile
@noxgle
noxgle / opencode-gemma4-ollama-macos.md
Created April 7, 2026 09:16 — forked from daniel-farina/opencode-gemma4-ollama-macos.md
Running OpenCode with Gemma 4 26B on macOS via llama.cpp - fixing tool calling

Running OpenCode with Gemma 4 26B on macOS (via llama.cpp)

As of April 2026, Gemma 4 tool calling is broken in Ollama v0.20.0 (ollama/ollama#15241) - the tool call parser fails and streaming drops tool calls entirely. OpenCode also has issues with local OpenAI-compatible providers (anomalyco/opencode#20669, #20719).

This guide documents a working setup using:

  • llama.cpp (built from source with PR #21326 template fix + PR #21343 tokenizer fix) instead of Ollama
  • OpenCode built from source with PR #16531 tool-call compatibility layer

Tested on macOS Apple Silicon (M1 Max, 32GB) on April 2, 2026.

@noxgle
noxgle / vyos-optimisations
Created February 21, 2023 11:51 — forked from RafPe/vyos-optimisations
vyos throughput optimizations
Server 2 sockets,6 cores each, 2.4ghz
# Set ixgbe options
# Limit RSS queues to the number of physical cores per cpu
# Disable offload
# When you change this, you need to run the command and reboot for it to take.
echo "options ixgbe LRO=0,0 MQ=1,1 RSS=6,6 VMDQ=0,0 vxlan_rx=0,0" > /etc/modprobe.d/ixgbe.conf
# Shut down HT cores
for i in $(seq 1 2 23); do
@noxgle
noxgle / nginx-tuning.md
Created January 7, 2020 18:50 — forked from denji/nginx-tuning.md
NGINX tuning for best performance

NGINX Tuning For Best Performance

For this configuration you can use web server you like, i decided, because i work mostly with it to use nginx.

Generally, properly configured nginx can handle up to 400K to 500K requests per second (clustered), most what i saw is 50K to 80K (non-clustered) requests per second and 30% CPU load, course, this was 2 x Intel Xeon with HyperThreading enabled, but it can work without problem on slower machines.

You must understand that this config is used in testing environment and not in production so you will need to find a way to implement most of those features best possible for your servers.