@awni
Last active May 5, 2026 13:55
OpenCode with MLX

The following guide will show you how to connect a local model served with MLX to OpenCode for local coding.

1. Install OpenCode

curl -fsSL https://opencode.ai/install | bash

2. Install mlx-lm

pip install mlx-lm
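
Optionally, sanity check the install before wiring it into OpenCode (not part of the original steps; assumes pip put the mlx_lm console scripts on your PATH):

python -c "import mlx_lm"
mlx_lm.server --help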

3. Make a custom provider for OpenCode

Open ~/.config/opencode/opencode.json and paste the following (if you already have a config, just add the MLX provider):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "mlx": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit": {
          "name": "Nemotron 3 Nano"
        }
      }
    }
  }
}
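
If you have never configured OpenCode before, the config directory may not exist yet; creating it first is a small convenience (not part of the original guide):

mkdir -p ~/.config/opencode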

4. Start the mlx-lm server

mlx_lm.server
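
By default the server listens on 127.0.0.1:8080, which matches the baseURL in the config above. If that port is already in use you can move it, a sketch assuming a recent mlx-lm release (remember to update baseURL to match):

mlx_lm.server --port 8081
# then set "baseURL": "http://127.0.0.1:8081/v1" in opencode.json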

5. Start OpenCode and select the provider

In the repo you plan to work in, type opencode.

Once inside the OpenCode TUI:

  1. Enter /connect
  2. Type MLX and select it
  3. For the API key enter none
  4. Select the model
  5. Start planning and building
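
If the model doesn't respond inside OpenCode, you can check the server's OpenAI-compatible endpoint directly (a sketch, not part of the original guide; the model field must match an entry in your config and is loaded on first use):

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit",
    "messages": [{"role": "user", "content": "Say hello"}],
    "max_tokens": 64
  }'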
@electroheadfx

Nice! So I don't need to specify the model checkpoint as a command line option to mlx_lm.server, correct? Will OpenCode attach the model name to the request and trigger the server to load the model?

@wangkuiyi, correct! And if you list multiple models in the config, for example:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "mlx": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "mlx-community/Qwen2.5-Coder-7B-Instruct-8bit": {
            "name": "Qwen 2.5 Coder"
        },
        "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit": {
            "name": "Nemotron 3 Nano"
        }
      }
    }
  }
}

How can you give 2 baseURLs, one for each model? Or can I load 2 models on the same base URL?

Thanks bro!


awni commented Jan 6, 2026

because I can load 2 models on the same base URL ?

Each provider (e.g. MLX) has a URL (localhost for local providers).
Each provider can have an arbitrary number of models.
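
Concretely, both models in the config above share the single baseURL; the server loads whichever model name arrives in the request. For example (a sketch; both requests hit the same URL and only the model field differs):

curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "mlx-community/Qwen2.5-Coder-7B-Instruct-8bit", "messages": [{"role": "user", "content": "hi"}]}'
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit", "messages": [{"role": "user", "content": "hi"}]}'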

@haishengXie0712

I connected to OpenCode using the method described above, but it couldn't write code to the file.


sirolf99 commented May 5, 2026

What if the model is already downloaded? How do I serve it from a local folder (and with what name)?

mlx_lm.server --model /Volumes//huggingface/hub/mlx-community/Qwen3.6-35B-A3B-nvfp4/ --host 0.0.0.0 --

It's still trying to re-download from Hugging Face instead of using the local folder, and looks in another location instead:
/users/.cache/huggingface/hub/models--mlx-community--Qwen3.6-35B-A3B-nvfp4/snapshots/9c1a3a223ddd8a3425212cc421056614f149cf0f
with the error Not Found: No safetensors found, even though the safetensors are in the provided --model folder.


sirolf99 commented May 5, 2026

Qwen 3.6 -> 35B-A3B + OpenCode

2026-05-05 15:38:49,121 - INFO - Prompt processing progress: 43835/43836
2026-05-05 15:38:49,287 - INFO - Prompt processing progress: 43836/43836
2026-05-05 15:38:58,984 - INFO - Prompt Cache: 4 sequences, 3.46 GB
2026-05-05 15:38:58,984 - INFO - - assistant: 2 sequences, 1.93 GB
2026-05-05 15:38:58,984 - INFO - - user: 1 sequences, 0.96 GB
2026-05-05 15:38:58,984 - INFO - - system: 1 sequences, 0.57 GB
192.168.1.73 - - [05/May/2026 15:38:58] "POST /v1/chat/completions HTTP/1.1" 200 -
2026-05-05 15:39:00,384 - INFO - Prompt processing progress: 228/229
2026-05-05 15:39:00,433 - INFO - Prompt processing progress: 229/229
2026-05-05 15:39:05,155 - INFO - Prompt Cache: 6 sequences, 5.40 GB
2026-05-05 15:39:05,155 - INFO - - assistant: 4 sequences, 3.87 GB
2026-05-05 15:39:05,155 - INFO - - user: 1 sequences, 0.96 GB
2026-05-05 15:39:05,155 - INFO - - system: 1 sequences, 0.57 GB
192.168.1.73 - - [05/May/2026 15:39:05] "POST /v1/chat/completions HTTP/1.1" 200 -
2026-05-05 15:39:06,142 - INFO - Prompt processing progress: 89/90
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)

+> Trying a workaround with these params:

--prompt-cache-bytes 8589934592
--prompt-cache-size 6
--prompt-concurrency 2
--decode-concurrency 2
--prefill-step-size 1024

+> Same crash (I don't have that issue with llama.cpp / LM Studio :| )
