@awni
Last active May 5, 2026 13:55
OpenCode with MLX

The following guide will show you how to connect a local model served with MLX to OpenCode for local coding.

1. Install OpenCode

curl -fsSL https://opencode.ai/install | bash

2. Install mlx-lm

pip install mlx-lm
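
Optionally, sanity check the install before wiring it into OpenCode (not part of the original steps; assumes pip put the mlx_lm console scripts on your PATH):

python -c "import mlx_lm"
mlx_lm.server --help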

3. Make a custom provider for OpenCode

Open ~/.config/opencode/opencode.json and paste the following (if you already have a config, just add the MLX provider):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "mlx": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit": {
          "name": "Nemotron 3 Nano"
        }
      }
    }
  }
}
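
If you have never configured OpenCode before, the config directory may not exist yet; creating it first is a small convenience (not part of the original guide):

mkdir -p ~/.config/opencode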

4. Start the mlx-lm server

mlx_lm.server
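
By default the server listens on 127.0.0.1:8080, which matches the baseURL in the config above. If that port is already in use you can move it, a sketch assuming a recent mlx-lm release (remember to update baseURL to match):

mlx_lm.server --port 8081
# then set "baseURL": "http://127.0.0.1:8081/v1" in opencode.json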

5. Start OpenCode and select the provider

In the repo you plan to work in, type opencode.

Once inside the OpenCode TUI:

  1. Enter /connect
  2. Type MLX and select it
  3. For the API key enter none
  4. Select the model
  5. Start planning and building
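
If the model doesn't respond inside OpenCode, you can check the server's OpenAI-compatible endpoint directly (a sketch, not part of the original guide; the model field must match an entry in your config and is loaded on first use):

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit",
    "messages": [{"role": "user", "content": "Say hello"}],
    "max_tokens": 64
  }'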
@electroheadfx

Nice! So I don't need to specify the model checkpoint as a command line option to mlx_lm.server, correct? Will OpenCode attach the model name to the request and trigger the server to load the model?

@wangkuiyi, correct! And if you list multiple models in the config, for example:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "mlx": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "mlx-community/Qwen2.5-Coder-7B-Instruct-8bit": {
            "name": "Qwen 2.5 Coder"
        },
        "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit": {
            "name": "Nemotron 3 Nano"
        }
      }
    }
  }
}

How can you give 2 baseURLs, one for each model? Or can I load 2 models on the same base URL?

Thanks bro!


awni commented Jan 6, 2026

because I can load 2 models on the same base URL ?

Each provider (e.g. MLX) has a URL (localhost for local providers).
Each provider can have an arbitrary number of models.
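
Concretely, both models in the config above share the single baseURL; the server loads whichever model name arrives in the request. For example (a sketch; both requests hit the same URL and only the model field differs):

curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "mlx-community/Qwen2.5-Coder-7B-Instruct-8bit", "messages": [{"role": "user", "content": "hi"}]}'
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit", "messages": [{"role": "user", "content": "hi"}]}'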

@haishengXie0712

I connected to OpenCode using the method described above, but it couldn't write code to the file.


sirolf99 commented May 5, 2026

What if the model is already downloaded? How do I serve it from a local folder (and with what name)?

mlx_lm.server --model /Volumes//huggingface/hub/mlx-community/Qwen3.6-35B-A3B-nvfp4/ --host 0.0.0.0 --

It's still trying to re-download from Hugging Face instead of using the local folder, and looks in another location instead:
/users/.cache/huggingface/hub/models--mlx-community--Qwen3.6-35B-A3B-nvfp4/snapshots/9c1a3a223ddd8a3425212cc421056614f149cf0f
with the error Not Found: No safetensors found, even though the safetensors are in the provided --model folder.


sirolf99 commented May 5, 2026

Qwen 3.6 -> 35B-A3B + OpenCode

2026-05-05 15:38:49,121 - INFO - Prompt processing progress: 43835/43836
2026-05-05 15:38:49,287 - INFO - Prompt processing progress: 43836/43836
2026-05-05 15:38:58,984 - INFO - Prompt Cache: 4 sequences, 3.46 GB
2026-05-05 15:38:58,984 - INFO - - assistant: 2 sequences, 1.93 GB
2026-05-05 15:38:58,984 - INFO - - user: 1 sequences, 0.96 GB
2026-05-05 15:38:58,984 - INFO - - system: 1 sequences, 0.57 GB
192.168.1.73 - - [05/May/2026 15:38:58] "POST /v1/chat/completions HTTP/1.1" 200 -
2026-05-05 15:39:00,384 - INFO - Prompt processing progress: 228/229
2026-05-05 15:39:00,433 - INFO - Prompt processing progress: 229/229
2026-05-05 15:39:05,155 - INFO - Prompt Cache: 6 sequences, 5.40 GB
2026-05-05 15:39:05,155 - INFO - - assistant: 4 sequences, 3.87 GB
2026-05-05 15:39:05,155 - INFO - - user: 1 sequences, 0.96 GB
2026-05-05 15:39:05,155 - INFO - - system: 1 sequences, 0.57 GB
192.168.1.73 - - [05/May/2026 15:39:05] "POST /v1/chat/completions HTTP/1.1" 200 -
2026-05-05 15:39:06,142 - INFO - Prompt processing progress: 89/90
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)

+> Trying a workaround with these params:

--prompt-cache-bytes 8589934592
--prompt-cache-size 6
--prompt-concurrency 2
--decode-concurrency 2
--prefill-step-size 1024

+> Same crash (I don't have that issue with llama.cpp / LM Studio :| )
