Gemini CLI is open source, so we can configure it to use a custom model through a local proxy.
Install the Gemini CLI globally (the second command removes it again if ever needed):
npm install -g @google/gemini-cli
npm uninstall -g @google/gemini-cli
Export the proxy base URL for gemini-cli, plus the NVIDIA key the proxy will need (get a free API key at build.nvidia.com; phone verification is required):
export GOOGLE_GEMINI_BASE_URL="http://localhost:4000"
export NVIDIA_API_KEY="<your build.nvidia.com API key>"
Alternatively, set them in the shell just before launching gemini-cli:
export GOOGLE_GEMINI_BASE_URL="http://localhost:4000"
export GEMINI_API_KEY="sk-ok"   # must match the proxy's master_key
gemini
Install LiteLLM with its proxy extras and create the config file:
uv tool install 'litellm[proxy]'
nvim ~/litellm_config.yaml
The content of ~/litellm_config.yaml:
model_list:
  # --------------- Google free API (https://aistudio.google.com/api-keys) --------------------
  - model_name: gemini-pro-latest
    litellm_params:
      model: gemini/gemini-pro-latest
      api_key: os.environ/GEMINI_API_KEY
  - model_name: gemini-flash-latest
    litellm_params:
      model: gemini/gemini-flash-latest
      api_key: os.environ/GEMINI_API_KEY
  - model_name: gemini-3.1-flash-lite-preview
    litellm_params:
      model: gemini/gemini-3.1-flash-lite-preview
      api_key: os.environ/GEMINI_API_KEY
  - model_name: gemma-4-31b-it
    litellm_params:
      model: gemini/gemma-4-31b-it
      api_key: os.environ/GEMINI_API_KEY
  # --------------- NVIDIA free API (build.nvidia.com) --------------------
  - model_name: NVIDIA_moonshotai_kimi-k2.6
    litellm_params:
      model: nvidia_nim/moonshotai/kimi-k2.6
      api_key: "os.environ/NVIDIA_API_KEY"
      api_base: "https://integrate.api.nvidia.com/v1"
      max_tokens: 1000000
  - model_name: NVIDIA_deepseek-v4-pro
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-pro
      api_key: "os.environ/NVIDIA_API_KEY"
      api_base: "https://integrate.api.nvidia.com/v1"
      max_tokens: 1000000
  - model_name: NVIDIA_deepseek-v4-flash
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-flash
      api_key: "os.environ/NVIDIA_API_KEY"
      api_base: "https://integrate.api.nvidia.com/v1"
      max_tokens: 1000000
  - model_name: NVIDIA_glm5.1  # custom model name for NVIDIA build.nvidia.com
    litellm_params:
      model: nvidia_nim/z-ai/glm5.1
      api_key: "os.environ/NVIDIA_API_KEY"
      api_base: "https://integrate.api.nvidia.com/v1"
      max_tokens: 1000000
  - model_name: NVIDIA_mistral-medium-3.5-128b
    litellm_params:
      model: nvidia_nim/mistralai/mistral-medium-3.5-128b
      api_key: "os.environ/NVIDIA_API_KEY"
      api_base: "https://integrate.api.nvidia.com/v1"
      max_tokens: 1000000
  - model_name: nvidia_nim_alias  # custom alias the router targets below
    litellm_params:
      model: nvidia_nim/deepseek-ai/deepseek-v4-flash
      api_key: "os.environ/NVIDIA_API_KEY"
      api_base: "https://integrate.api.nvidia.com/v1"
      max_tokens: 1000000
# --------------- Router: alias the Gemini model names gemini-cli asks for --------------------
router_settings:
  model_group_alias:
    "gemini-3.1-pro-preview": "nvidia_nim_alias"
    "gemini-3.1-pro-preview-customtools": "nvidia_nim_alias"
    "gemini-3-flash-preview": "nvidia_nim_alias"
    "gemini-3-flash-preview-customtools": "nvidia_nim_alias"
    "gemini-3.1-flash-lite-preview": "nvidia_nim_alias"
    "gemini-3.1-flash-lite-preview-customtools": "nvidia_nim_alias"
    "gemini-2.5-pro": "nvidia_nim_alias"
    "gemini-2.5-pro-customtools": "nvidia_nim_alias"
    "gemini-2.5-flash": "nvidia_nim_alias"
    "gemini-2.5-flash-customtools": "nvidia_nim_alias"
    "gemini-2.5-flash-lite": "nvidia_nim_alias"
    "gemini-2.5-flash-lite-customtools": "nvidia_nim_alias"
litellm_settings:
  drop_params: true
  success_callback: []
  failure_callback: []
  callbacks: []
general_settings:
  master_key: "sk-ok"  # clients (here, gemini-cli) must send this as their API key
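The model_group_alias block is what lets gemini-cli keep requesting its built-in Gemini model names while the proxy silently serves the aliased NVIDIA model. A toy sketch of that lookup (the `resolve` helper is hypothetical, for illustration only, not LiteLLM's actual routing code):

```python
# Toy illustration of model_group_alias: requests for known Gemini model
# names are rewritten to the aliased deployment before routing.
model_group_alias = {
    "gemini-2.5-pro": "nvidia_nim_alias",
    "gemini-2.5-flash": "nvidia_nim_alias",
}

def resolve(requested: str) -> str:
    # Names without an alias fall through unchanged, like a direct model lookup.
    return model_group_alias.get(requested, requested)

print(resolve("gemini-2.5-pro"))   # -> nvidia_nim_alias
print(resolve("NVIDIA_glm5.1"))    # -> NVIDIA_glm5.1 (no alias, passes through)
```

Any Gemini model name not listed under model_group_alias would be routed as-is, so the Google entries in model_list remain reachable by their own names.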
Start the LiteLLM proxy server (from a shell where GEMINI_API_KEY and NVIDIA_API_KEY hold the real provider keys, since the config reads them via os.environ):
litellm --config ~/litellm_config.yaml --port 4000
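Once the proxy is up, it exposes OpenAI-compatible endpoints, so a quick sanity check is to hit /v1/models with the master_key. The stdlib sketch below only builds the request; the commented lines show how it would be sent against a running proxy (assumed on port 4000):

```python
import urllib.request

# Build a request to the proxy's OpenAI-compatible model-listing endpoint;
# "sk-ok" is the master_key from litellm_config.yaml.
req = urllib.request.Request(
    "http://localhost:4000/v1/models",
    headers={"Authorization": "Bearer sk-ok"},
)

# With the proxy running, the configured model names can be listed like this:
# import json
# with urllib.request.urlopen(req) as resp:
#     print([m["id"] for m in json.load(resp)["data"]])
print(req.full_url)  # -> http://localhost:4000/v1/models
```

If the listing comes back with the model_list names, the proxy is wired up correctly and gemini-cli can be pointed at it.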
Finally, in the shell where gemini-cli runs, point it at the proxy. Note that GEMINI_API_KEY here is the proxy's master_key, not the Google key:
export GOOGLE_GEMINI_BASE_URL="http://localhost:4000"
export GEMINI_API_KEY="sk-ok"
gemini