miles990/litellm-mlx-claude-code-guide.md

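Connecting an MLX Server to Claude Code with LiteLLM

Problem

Claude Code sends requests in the Anthropic Messages API format (POST /v1/messages), while the MLX Server (mlx_lm.server) exposes the OpenAI Chat Completions API format (POST /v1/chat/completions). Because the two formats differ, Claude Code cannot talk to the MLX Server directly.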
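Here is a minimal Python sketch that shows the same request written both ways, to make the mismatch concrete. The prompt, system text and the mlx-model name are illustrative values. The field layout follows the public Anthropic and OpenAI API shapes. The most visible differences are where the system prompt lives, which header carries the key, and whether max_tokens is required.

```python
import json

# One system prompt plus one user turn, written in both wire formats.
# All values are illustrative.

# Anthropic Messages API: POST /v1/messages
anthropic_headers = {"x-api-key": "not-needed", "anthropic-version": "2023-06-01"}
anthropic_request = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 100,                         # required by the Messages API
    "system": "You are a concise assistant.",  # system prompt is a top-level field
    "messages": [{"role": "user", "content": "Hello, say hi in one sentence."}],
}

# OpenAI Chat Completions API: POST /v1/chat/completions
openai_headers = {"Authorization": "Bearer not-needed"}
openai_request = {
    "model": "mlx-model",
    "max_tokens": 100,                         # optional in this API
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},  # system is a message
        {"role": "user", "content": "Hello, say hi in one sentence."},
    ],
}

if __name__ == "__main__":
    print(json.dumps(anthropic_request, indent=2))
    print(json.dumps(openai_request, indent=2))
```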

Solution

Put LiteLLM in between as a proxy. It converts between the two formats automatically:

Claude Code  ──(Anthropic format)──▶  LiteLLM Proxy  ──(OpenAI format)──▶  MLX Server

Prerequisites

Python 3.9+
An MLX Server already running locally (default: http://localhost:8080/v1)
Claude Code installed

Step 1: Install LiteLLM

pip install 'litellm[proxy]'

Step 2: Create the config file

Create litellm_config.yaml:

model_list:
  # Map a Claude model name to your MLX Server
  - model_name: claude-sonnet-4-20250514
    litellm_params:
      model: openai/mlx-model        # the "openai/" prefix tells LiteLLM to use the OpenAI format
      api_base: http://localhost:8080/v1  # your MLX Server address
      api_key: not-needed             # MLX Server needs no key, but the field is required

  # Several model names can map to the same backend
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: openai/mlx-model
      api_base: http://localhost:8080/v1
      api_key: not-needed

Note: model_name must be the model name that Claude Code actually sends. By default, Claude Code uses claude-sonnet-4-20250514.
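If you need to map more Claude model names than the two above, copying the block by hand gets tedious. Here is a minimal sketch that generates the same config with the Python standard library only. build_litellm_config and CLAUDE_MODEL_NAMES are hypothetical names. The list should contain whatever names your Claude Code actually sends. The YAML is written as plain text, which is fine for names like these but would need quoting for values containing characters such as ":".

```python
# Hypothetical generator for litellm_config.yaml: every name in
# CLAUDE_MODEL_NAMES is mapped onto the same MLX backend.

CLAUDE_MODEL_NAMES = [
    "claude-sonnet-4-20250514",
    "claude-haiku-4-5-20251001",
]

def build_litellm_config(model_names,
                         api_base="http://localhost:8080/v1",
                         backend_model="openai/mlx-model",
                         api_key="not-needed"):
    """Return the YAML text of a model_list with one entry per model name."""
    lines = ["model_list:"]
    for name in model_names:
        lines += [
            f"  - model_name: {name}",
            "    litellm_params:",
            f"      model: {backend_model}",
            f"      api_base: {api_base}",
            f"      api_key: {api_key}",
        ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Print instead of writing, so an existing config is never overwritten.
    print(build_litellm_config(CLAUDE_MODEL_NAMES), end="")
```

Usage: `python make_config.py > litellm_config.yaml` (the script name is arbitrary).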

Step 3: Start the LiteLLM proxy

litellm --config litellm_config.yaml --port 4000

You should see output along these lines:

LiteLLM: Proxy initialized with Config, Set models:
LiteLLM Proxy started on http://0.0.0.0:4000

Step 4: Configure Claude Code

In your shell environment, point Claude Code's API endpoint at LiteLLM:

# Add to ~/.zshrc or ~/.bashrc
export ANTHROPIC_BASE_URL=http://localhost:4000

Or for a single run:

ANTHROPIC_BASE_URL=http://localhost:4000 claude

Step 5: Verify

# 1. Check that the MLX Server is running
curl http://localhost:8080/v1/models

# 2. Check that the LiteLLM proxy is running
curl http://localhost:4000/health

# 3. Send an Anthropic-format test request (simulating Claude Code)
curl http://localhost:4000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: not-needed" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hello, say hi in one sentence."}]
  }'

# 4. Start Claude Code
ANTHROPIC_BASE_URL=http://localhost:4000 claude
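You may prefer to script the check, for example in a startup script. Here is a minimal standard-library sketch of test 3 in Python. build_messages_request, extract_text and run_check are hypothetical helper names. The headers and body mirror the curl command above. The 120-second timeout is an arbitrary allowance for slow local generation.

```python
import json
import urllib.request

# Python counterpart to curl test 3. PROXY_URL assumes LiteLLM on port 4000.
PROXY_URL = "http://localhost:4000"

def build_messages_request(base_url, model, prompt, max_tokens=100):
    """Build (but do not send) an Anthropic-format POST /v1/messages request."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": "not-needed",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

def extract_text(response_body):
    """Join the text blocks of an Anthropic Messages response."""
    return "".join(
        block.get("text", "")
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )

def run_check(base_url=PROXY_URL, model="claude-sonnet-4-20250514"):
    """Send the request and return the reply text (needs both servers up)."""
    req = build_messages_request(base_url, model, "Hello, say hi in one sentence.")
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_text(json.load(resp))
```

With both servers running, `print(run_check())` should print a short greeting. If it raises urllib.error.HTTPError, calling `.read()` on the exception shows the error body returned by the proxy.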

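FAQ

Q: My MLX Server isn't on port 8080?

Change api_base in litellm_config.yaml (in every model_list entry) to match.

Q: Claude Code says the model can't be found?

Make sure model_name matches the model name Claude Code sends. To see which models LiteLLM knows about: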

curl http://localhost:4000/v1/models
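To compare that list against the names Claude Code sends, a small Python sketch can parse the response. It assumes the OpenAI-style list shape ({"data": [{"id": ...}, ...]}) returned by the endpoint. model_ids, missing_models and fetch_models are hypothetical helper names. The expected names are whatever you put in model_name.

```python
import json
import urllib.request

# Hypothetical helpers: which expected model names is the proxy missing?

def model_ids(models_response):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return {entry["id"] for entry in models_response.get("data", [])}

def missing_models(models_response, expected_names):
    """Return the expected names that the proxy does not list, in order."""
    available = model_ids(models_response)
    return [name for name in expected_names if name not in available]

def fetch_models(base_url="http://localhost:4000"):
    """GET /v1/models from the proxy and decode the JSON body."""
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Usage: `missing_models(fetch_models(), ["claude-sonnet-4-20250514"])` returns an empty list when every expected name is mapped.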

Q: Streaming isn't working properly?

LiteLLM supports streaming by default. If you run into problems, try adding this to the config:

litellm_settings:
  set_verbose: true  # enable verbose logging to help with debugging
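Another debugging step is to capture the raw stream the proxy returns, by adding `"stream": true` to the test request and running curl with `-N`. Then check that text arrives as Anthropic content_block_delta events. Below is a minimal sketch of a parser for that SSE format. iter_text_deltas is a hypothetical name. It ignores the `event:` lines and reads the event type from each JSON payload's "type" field. It silently skips payloads that are not JSON objects, such as an OpenAI-style `[DONE]` sentinel.

```python
import json

# Minimal sketch: extract the text from an Anthropic-format SSE stream.

def iter_text_deltas(lines):
    """Yield text fragments from content_block_delta / text_delta events.

    `lines` is any iterable of decoded SSE lines, e.g. an open file
    containing output saved from `curl -N`.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. "[DONE]"
        if not isinstance(event, dict):
            continue
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")
```

Usage: `print("".join(iter_text_deltas(open("stream.txt"))))`, where stream.txt holds the saved curl output. An empty result from a non-empty file suggests the stream is not in the Anthropic event format.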

Q: Want to use a specific MLX model?

Specify the model when starting the MLX Server:

mlx_lm.server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 8080

Architecture

┌─────────────┐     Anthropic API      ┌──────────────┐     OpenAI API      ┌────────────┐
│ Claude Code │ ──────────────────────▶ │ LiteLLM      │ ──────────────────▶ │ MLX Server │
│             │ ◀────────────────────── │ :4000        │ ◀────────────────── │ :8080      │
└─────────────┘     Anthropic format   └──────────────┘     OpenAI format   └────────────┘

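LiteLLM automatically handles:

Request conversion: Anthropic Messages → OpenAI Chat Completions
Response conversion: OpenAI format → Anthropic format
Streaming: SSE event formats are converted automatically
Error handling: errors are returned in a unified format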
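Here is a deliberately minimal Python sketch of the first two bullets, covering non-streaming request and response conversion. It is not LiteLLM's implementation. It handles plain text only, with no tool use, images or streaming. The finish_reason → stop_reason mapping is a simplification: a stop caused by a stop sequence, for example, is reported as end_turn. The function names are hypothetical.

```python
# Minimal text-only sketch of Anthropic <-> OpenAI conversion (non-streaming).

def _text_of(content):
    """Anthropic content may be a plain string or a list of content blocks."""
    if isinstance(content, str):
        return content
    return "".join(b.get("text", "") for b in content if b.get("type") == "text")

def anthropic_to_openai(req, backend_model):
    """Convert an Anthropic Messages request body into a Chat Completions body."""
    messages = []
    if "system" in req:  # top-level system prompt becomes a system message
        messages.append({"role": "system", "content": _text_of(req["system"])})
    for msg in req["messages"]:
        messages.append({"role": msg["role"], "content": _text_of(msg["content"])})
    out = {"model": backend_model, "messages": messages, "max_tokens": req["max_tokens"]}
    for key in ("temperature", "top_p"):
        if key in req:
            out[key] = req[key]
    if "stop_sequences" in req:
        out["stop"] = req["stop_sequences"]
    return out

# Simplified OpenAI finish_reason -> Anthropic stop_reason mapping.
_STOP_REASONS = {"stop": "end_turn", "length": "max_tokens", "tool_calls": "tool_use"}

def openai_to_anthropic(resp, requested_model):
    """Convert a Chat Completions response body into an Anthropic Messages body."""
    choice = resp["choices"][0]
    usage = resp.get("usage", {})
    return {
        "id": resp.get("id", ""),
        "type": "message",
        "role": "assistant",
        "model": requested_model,  # echo the Claude name the client asked for
        "content": [{"type": "text", "text": choice["message"].get("content") or ""}],
        "stop_reason": _STOP_REASONS.get(choice.get("finish_reason"), "end_turn"),
        "stop_sequence": None,
        "usage": {
            "input_tokens": usage.get("prompt_tokens", 0),
            "output_tokens": usage.get("completion_tokens", 0),
        },
    }
```

Echoing the requested Claude model name in the response, rather than the backend's name, keeps the client-side view consistent with what it sent.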

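Alternatives

If LiteLLM feels too heavy (it is a full API gateway), you could also consider:

Writing your own lightweight proxy (~180 lines of Node.js) that only handles Anthropic ↔ OpenAI format conversion
Using an MLX serving framework that supports the Anthropic format directly

For most users, though, a pip install plus one YAML file is the fastest route.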