@rogarcia
Forked from olafgeibig/cc-proxy.sh
Created August 1, 2025 21:44
A LiteLLM proxy setup for using Claude Code with models from the Weights & Biases inference service. You need LiteLLM installed, or you can use the Docker container. The easiest way is to install it with `uv tool install "litellm[proxy]"`. Don't worry about the fallback warnings. Either LiteLLM, W&B, or the combination of both is not handling streaming respon…
#!/bin/bash
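# Replace the placeholders; they are quoted so the shell does not parse < and > as redirections
export WANDB_API_KEY="<your key>"
export WANDB_PROJECT="<org/project>"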
litellm --port 4000 --debug --config cc-proxy.yaml
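Because the launcher depends on two environment variables, a small preflight check can catch a forgotten key before the proxy starts. The following is a minimal sketch, not part of the original gist: `require_env` is a hypothetical helper name, and treating any value of the form `<...>` as an unfilled placeholder is an assumption based on the placeholders used in the script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): fail early when a required
# variable is empty or still holds a "<...>" placeholder.
require_env() {
  # $1 is the NAME of the variable to check; read its value indirectly.
  eval "value=\${$1:-}"
  case "$value" in
    ""|"<"*">")
      echo "error: $1 is unset or still a placeholder" >&2
      return 1
      ;;
  esac
}

# Usage sketch: run both checks, then start the proxy as in the script above.
# require_env WANDB_API_KEY && require_env WANDB_PROJECT \
#   && litellm --port 4000 --debug --config cc-proxy.yaml
```

The check runs before `litellm` so a missing key surfaces as a clear local error rather than as an authentication failure from the inference endpoint.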
litellm_settings:
  drop_params: true
  cache: True
  cache_params:
    type: local
  enable_preview_features: True
model_list:
  - model_name: anthropic/claude-sonnet-*
    litellm_params:
      model: openai/Qwen/Qwen3-Coder-480B-A35B-Instruct
      api_key: "os.environ/WANDB_API_KEY"
      api_base: https://api.inference.wandb.ai/v1
      headers:
        OpenAI-Project: "os.environ/WANDB_PROJECT"
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8
    model_info:
      input_cost_per_token: 0.000001
      output_cost_per_token: 0.0000015
  - model_name: anthropic/claude-opus-*
    litellm_params:
      model: openai/Qwen/Qwen3-235B-A22B-Thinking-2507
      api_key: "os.environ/WANDB_API_KEY"
      api_base: https://api.inference.wandb.ai/v1
      headers:
        OpenAI-Project: "os.environ/WANDB_PROJECT"
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.6
      top_k: 40
      top_p: 0.95
    model_info:
      input_cost_per_token: 0.0000001
      output_cost_per_token: 0.0000001
  - model_name: anthropic/claude-3-5-haiku-*
    litellm_params:
      model: openai/Qwen/Qwen3-235B-A22B-Instruct-2507
      api_key: "os.environ/WANDB_API_KEY"
      api_base: https://api.inference.wandb.ai/v1
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8
      headers:
        OpenAI-Project: "os.environ/WANDB_PROJECT"
      input_cost_per_token: 0.0000001
      output_cost_per_token: 0.0000001
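The `model_name` entries in the config are wildcard patterns, so each family of Claude model names that Claude Code requests is redirected to one Qwen model on W&B. The sketch below illustrates that mapping with ordinary shell glob matching. It is only an illustration: `route_model` is a hypothetical function, the requested model names in the usage comment are example inputs, and LiteLLM's own wildcard resolution may differ in detail.

```shell
#!/bin/sh
# Illustration only: mirrors the three model_list patterns with shell globs.
# This is NOT LiteLLM's router; it just shows which target each pattern selects.
route_model() {
  case "$1" in
    anthropic/claude-sonnet-*)
      echo "openai/Qwen/Qwen3-Coder-480B-A35B-Instruct" ;;
    anthropic/claude-opus-*)
      echo "openai/Qwen/Qwen3-235B-A22B-Thinking-2507" ;;
    anthropic/claude-3-5-haiku-*)
      echo "openai/Qwen/Qwen3-235B-A22B-Instruct-2507" ;;
    *)
      # No pattern in the config covers this name.
      echo "no match for $1" >&2
      return 1 ;;
  esac
}

# Example inputs (hypothetical requested names):
# route_model anthropic/claude-sonnet-4-20250514
# route_model anthropic/claude-3-5-haiku-20241022
```

Under this mapping, sonnet requests go to the coder model, opus requests to the thinking model, and haiku requests to the instruct model, which matches the order of the entries in the config.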
#!/bin/bash
export ANTHROPIC_AUTH_TOKEN=sk-1234
export ANTHROPIC_BASE_URL=http://localhost:4000
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
# Starting VS Code, but you could also run claude here
code &