daniel-farina / opencode-gemma4-ollama-macos.md
Last active May 4, 2026 14:02

Running OpenCode with Gemma 4 26B on macOS (via llama.cpp)

As of April 2026, Gemma 4 tool calling is broken in Ollama v0.20.0 (ollama/ollama#15241) - the tool call parser fails and streaming drops tool calls entirely. OpenCode also has issues with local OpenAI-compatible providers (anomalyco/opencode#20669, #20719).

This guide documents a working setup using:

  • llama.cpp (built from source with PR #21326 template fix + PR #21343 tokenizer fix) instead of Ollama
  • OpenCode built from source with PR #16531 tool-call compatibility layer

Tested on macOS Apple Silicon (M1 Max, 32GB) on April 2, 2026.
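
A minimal sketch of the build-and-serve steps this setup implies. The PR numbers come from the list above, but the exact checkout/merge workflow, the GGUF filename, port, and context size are all assumptions, not the guide's own commands:

```sh
# Build llama.cpp from source with the two fixes applied
# (applying the second PR on top of the first is an assumption).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
gh pr checkout 21326                                      # template fix
git fetch origin pull/21343/head && git merge FETCH_HEAD  # tokenizer fix
cmake -B build
cmake --build build --config Release -j

# Serve the model over llama.cpp's OpenAI-compatible HTTP API.
# --jinja enables the chat template machinery tool calling relies on;
# the model path, port, and context size are placeholders.
./build/bin/llama-server \
  -m ~/models/gemma-4-26b-q4_k_m.gguf \
  --jinja \
  --host 127.0.0.1 --port 8080 \
  -c 8192
```

The patched OpenCode build can then be pointed at `http://127.0.0.1:8080/v1` as an OpenAI-compatible provider; the exact OpenCode configuration keys are out of scope here.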

denji / nginx-tuning.md
Last active April 24, 2026 08:09

NGINX Tuning For Best Performance

You can apply this kind of tuning to whichever web server you like; I chose nginx because it is what I work with most.

Generally, a properly configured nginx can handle up to 400K to 500K requests per second (clustered); the most I have seen is 50K to 80K requests per second (non-clustered) at around 30% CPU load. Of course, that was on 2x Intel Xeon CPUs with Hyper-Threading enabled, but nginx can work without problems on slower machines.

Keep in mind that this configuration was used in a testing environment, not in production, so you will need to adapt these settings to whatever works best for your own servers.
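
As a starting point, here is a minimal sketch of the kind of directives such tuning usually involves. The specific values are illustrative placeholders to benchmark and adjust per machine, not numbers from this guide:

```nginx
# Match worker count to CPU cores and raise the per-worker file limit.
worker_processes auto;
worker_rlimit_nofile 100000;

events {
    # Maximum simultaneous connections per worker.
    worker_connections 4000;
    # Accept as many pending connections as possible at once.
    multi_accept on;
    # Efficient event mechanism on Linux (use kqueue on BSD/macOS).
    use epoll;
}

http {
    # Serve static files via kernel sendfile and batch TCP writes.
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    # Recycle idle keep-alive connections quickly under load.
    keepalive_timeout 30;
    keepalive_requests 1000;
}
```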