
@awni
Last active May 5, 2026 13:55
OpenCode with MLX

This guide shows how to connect a local model served with MLX to OpenCode for local coding.

1. Install OpenCode

curl -fsSL https://opencode.ai/install | bash

2. Install mlx-lm

pip install mlx-lm

3. Make a custom provider for OpenCode

Open ~/.config/opencode/opencode.json and paste the following (if you already have a config, just add the mlx provider):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "mlx": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit": {
          "name": "Nemotron 3 Nano"
        }
      }
    }
  }
}

4. Start the mlx-lm server

mlx_lm.server
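By default the server binds to 127.0.0.1:8080, which matches the baseURL in the config above. If you need a different port, a sketch of an explicit invocation (run mlx_lm.server --help to confirm the flags on your version of mlx-lm):

```shell
# Bind address and port must match the baseURL in opencode.json
# ("http://127.0.0.1:8080/v1" in the config above).
mlx_lm.server --host 127.0.0.1 --port 8080
```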

5. Start OpenCode and select the provider

In the repo you plan to work in, run opencode.

Once inside the OpenCode TUI:

  1. Enter /connect
  2. Type MLX and select it
  3. For the API key, enter none (the local server doesn't require one)
  4. Select the model
  5. Start planning and building
@wangkuiyi

wangkuiyi commented Jan 3, 2026

Nice! So I don't need to specify the model checkpoint as a command-line option of mlx_lm.server, correct? Will OpenCode attach the model name in the request and trigger the server to load the model?

@JoeJoe1313

> Nice! So I don't need to specify the model checkpoint as a command-line option of mlx_lm.server, correct? Will OpenCode attach the model name in the request and trigger the server to load the model?

@wangkuiyi, correct! And if you list multiple models in the config, for example:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "mlx": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "mlx-community/Qwen2.5-Coder-7B-Instruct-8bit": {
            "name": "Qwen 2.5 Coder"
        },
        "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit": {
            "name": "Nemotron 3 Nano"
        }
      }
    }
  }
}

you can switch between these models directly in opencode:

[Screenshot: switching between the two models in the opencode model picker]

The first time you prompt a model, the server will need to download it if you haven't downloaded it before (you can see the progress on the left):

[Screenshot: mlx_lm.server logging the model download during the first prompt]
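Under the hood, OpenCode talks to the server through the OpenAI-compatible chat completions endpoint, and it is the model field in the request body that tells mlx_lm.server which checkpoint to serve. A minimal sketch of such a request (the helper function and prompt are illustrative, not part of either tool):

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"  # same baseURL as in opencode.json

def build_chat_request(model: str, prompt: str):
    """Build an OpenAI-compatible chat completion request (hypothetical helper)."""
    url = f"{BASE_URL}/chat/completions"
    payload = {
        "model": model,  # this name tells the server which checkpoint to load
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

url, payload = build_chat_request(
    "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit", "Write a haiku."
)
# To actually send it (requires the server to be running):
# req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```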

@awni
Author

awni commented Jan 3, 2026

FYI, in addition to this guide you need to be aware of a couple of things:

  1. Until this PR lands (ml-explore/mlx-lm#711), tool calling is basically broken in mlx_lm.server
  2. Even with the above PR, tool-calling support is limited to certain models. If you aren't sure, post the model here or open an issue. We will keep adding more tool parsers to support more models as needed

@wangkuiyi

Thank you so much @JoeJoe1313 and @awni !

@electroheadfx

> > Nice! So I don't need to specify the model checkpoint as a command-line option of mlx_lm.server, correct? Will OpenCode attach the model name in the request and trigger the server to load the model?
>
> @wangkuiyi, correct! And if you list multiple models in the config, for example: [two-model config quoted in full above]

How can you give a separate baseURL for each model? Because it looks like both models are served on the same base URL?

Thanks bro !

@awni
Author

awni commented Jan 6, 2026

> because I can load 2 models on the same base URL?

Each provider (e.g. MLX) has one base URL (localhost for local providers).
Each provider can have an arbitrary number of models.
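That said, if you genuinely want a separate baseURL per model, one option (a sketch, untested; the provider keys and names are illustrative) is to run two mlx_lm.server instances on different ports and define one provider per instance:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "mlx-a": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (port 8080)",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": {
        "mlx-community/Qwen2.5-Coder-7B-Instruct-8bit": { "name": "Qwen 2.5 Coder" }
      }
    },
    "mlx-b": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (port 8081)",
      "options": { "baseURL": "http://127.0.0.1:8081/v1" },
      "models": {
        "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit": { "name": "Nemotron 3 Nano" }
      }
    }
  }
}
```

Each server instance would then be started with its own --port (e.g. mlx_lm.server --port 8081 for the second one).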

@haishengXie0712

I connected to OpenCode using the method described above, but it couldn't write code to the file.

@sirolf99

sirolf99 commented May 5, 2026

What if the model is already downloaded? How do I serve it from a local repo (and with what name)?
mlx_lm.server --model /Volumes//huggingface/hub/mlx-community/Qwen3.6-35B-A3B-nvfp4/ --host 0.0.0.0 --


It's still trying to redownload from Hugging Face instead of using the local folder, and it searches in another location:
/users/.cache/huggingface/hub/models--mlx-community--Qwen3.6-35B-A3B-nvfp4/snapshots/9c1a3a223ddd8a3425212cc421056614f149cf0f
It fails with the error Not Found: No safetensors found, but the safetensors are in the provided --model folder.
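For anyone hitting a similar error, one thing worth checking (a sketch, not a confirmed fix; the model path below is illustrative) is whether --model points at the snapshot directory that actually contains the safetensors, rather than the models--…/ repo root. HF_HUB_OFFLINE is a standard huggingface_hub environment variable that forces the library to use only the local cache:

```shell
# Force huggingface_hub to use the local cache only (no network lookups).
export HF_HUB_OFFLINE=1
# Point --model at the snapshot directory containing the *.safetensors
# files, not the models--.../ repo root (path is illustrative).
MODEL_DIR="$HOME/.cache/huggingface/hub/models--mlx-community--SomeModel/snapshots/abc123"
# mlx_lm.server --model "$MODEL_DIR"
```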

@sirolf99

sirolf99 commented May 5, 2026

qwen 3.6 -> 35b-a3b + opencode

2026-05-05 15:38:49,121 - INFO - Prompt processing progress: 43835/43836
2026-05-05 15:38:49,287 - INFO - Prompt processing progress: 43836/43836
2026-05-05 15:38:58,984 - INFO - Prompt Cache: 4 sequences, 3.46 GB
2026-05-05 15:38:58,984 - INFO - - assistant: 2 sequences, 1.93 GB
2026-05-05 15:38:58,984 - INFO - - user: 1 sequences, 0.96 GB
2026-05-05 15:38:58,984 - INFO - - system: 1 sequences, 0.57 GB
192.168.1.73 - - [05/May/2026 15:38:58] "POST /v1/chat/completions HTTP/1.1" 200 -
2026-05-05 15:39:00,384 - INFO - Prompt processing progress: 228/229
2026-05-05 15:39:00,433 - INFO - Prompt processing progress: 229/229
2026-05-05 15:39:05,155 - INFO - Prompt Cache: 6 sequences, 5.40 GB
2026-05-05 15:39:05,155 - INFO - - assistant: 4 sequences, 3.87 GB
2026-05-05 15:39:05,155 - INFO - - user: 1 sequences, 0.96 GB
2026-05-05 15:39:05,155 - INFO - - system: 1 sequences, 0.57 GB
192.168.1.73 - - [05/May/2026 15:39:05] "POST /v1/chat/completions HTTP/1.1" 200 -
2026-05-05 15:39:06,142 - INFO - Prompt processing progress: 89/90
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)

Trying a workaround with these parameters:

--prompt-cache-bytes 8589934592
--prompt-cache-size 6
--prompt-concurrency 2
--decode-concurrency 2
--prefill-step-size 1024

Same crash (I don't have that issue with llama.cpp / LM Studio :| )
