This guide covers setting up local inference with a permanent, static ngrok tunnel and the "Model Aliasing" trick required to unlock Cursor Agent Mode.
- Cursor: Installed on your local machine.
- Ubuntu GPU Server: With NVIDIA drivers and Docker/CORS configured.
- ngrok: A free account with one Static Domain (e.g., `yourdomain.ngrok-free.app`).
- Storage (optional): A mounted SSD (e.g., `/mnt/models`) for model storage.
For Ollama:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

For LM Studio, follow this guide: https://dimensionquest.net/2024/09/how-i-install-lm-studio-0.3.2-on-ubuntu-studio-24.04-linux/
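Before moving on, it's worth a quick sanity check that the install worked. A sketch, assuming Ollama's default port 11434 and its standard `/api/tags` model-list endpoint:

```shell
# Confirm the binary is on PATH, then ask the local API for its model list.
# Falls back to a message instead of failing if the server is not up yet.
if command -v ollama >/dev/null 2>&1; then
  ollama --version
  curl -s http://localhost:11434/api/tags || echo "server not responding on 11434"
else
  echo "ollama not installed"
fi
```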
Authenticate your server:

```shell
ngrok config add-authtoken YOUR_AUTHTOKEN
```

For Ollama, stop the service and move the model store onto the SSD:

```shell
sudo systemctl stop ollama
sudo mv /usr/share/ollama/.ollama/models /mnt/models/ollama/
sudo chown -R ollama:ollama /mnt/models/ollama
# +x on the directories grants the ollama user traverse access to the path
sudo chmod +x /mnt /mnt/models /mnt/models/ollama
```

For LM Studio:
Switch to Developer mode, open the Models tab, and set the Models Directory at the top.
Cursor communicates through its backend, so the local server must allow cross-origin (CORS) requests from any origin.
For Ollama, edit the service:

```shell
sudo systemctl edit ollama
```

Paste (the `[Service]` header is required in a drop-in):

```
[Service]
Environment="OLLAMA_MODELS=/mnt/models/ollama"
Environment="OLLAMA_ORIGINS=*"
```

Then run `sudo systemctl restart ollama` so the changes take effect. For LM Studio, no changes are needed.
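To verify the CORS setting took effect, send a request with a browser-style `Origin` header and look for the allow header in the reply. A sketch, assuming Ollama on its default port; the origin value is arbitrary:

```shell
# With OLLAMA_ORIGINS=* the response headers should include
# Access-Control-Allow-Origin. Prints a hint instead of failing otherwise.
curl -si http://localhost:11434/api/tags -H "Origin: https://example.com" \
  | grep -i "access-control-allow-origin" \
  || echo "no CORS header returned (is Ollama running with OLLAMA_ORIGINS=*?)"
```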
Create a systemd service to keep the tunnel alive on your static domain.
For Ollama use port 11434; for LM Studio, match the port you selected in the UI.
- Create the file:

```shell
sudo nano /etc/systemd/system/ngrok.service
```

- Paste the config (replace the domain and user):

```
[Unit]
Description=Ngrok Static Tunnel for Local LLM
After=network-online.target

[Service]
User=your_user
ExecStart=/usr/local/bin/ngrok http 11434 --url yourdomain.ngrok-free.app --host-header="localhost:11434"
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

- Start it:

```shell
sudo systemctl enable --now ngrok.service
```
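Once the service is up, the static domain should proxy straight to the local API. A quick end-to-end check (replace the placeholder domain with your own):

```shell
# Hit the public URL; ngrok forwards it to localhost:11434.
# The fallback keeps the check from aborting if the tunnel is not up yet.
curl -s https://yourdomain.ngrok-free.app/api/tags \
  || echo "tunnel not reachable (check: sudo systemctl status ngrok)"
```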
As of January 2026, Cursor has migrated GPT models to a new "Responses API" that uses an `input` field instead of the traditional `messages` array. Local OpenAI-compatible servers cannot parse this, causing errors. However, Gemini identifiers still use the standard format.
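To make the difference concrete, here are the two request shapes side by side. These are illustrative sketches of the payloads, not captured traffic, and the model names are placeholders:

```shell
# Classic Chat Completions body -- what Ollama / LM Studio parse today.
chat_payload='{"model": "gemini-1.5-pro", "messages": [{"role": "user", "content": "Refactor this"}]}'
# Responses-style body -- note "input" replacing "messages"; local servers
# reject this shape.
responses_payload='{"model": "gpt-x", "input": [{"role": "user", "content": "Refactor this"}]}'
printf '%s\n%s\n' "$chat_payload" "$responses_payload"
```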
Create a new model named `gemini-1.5-pro` based on your coder model:

```shell
# Create a file named 'Modelfile' that aliases the coder model
echo "FROM qwen3-coder:30b" > Modelfile
ollama create gemini-1.5-pro -f Modelfile
```

For LM Studio, rename the model's folder to `gemini-1.5-pro`; LM Studio uses the folder name as the model's API identifier.
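You can confirm the alias registered (assumes the Ollama daemon is running):

```shell
# The alias should appear alongside the original model in the local registry.
ollama list | grep gemini-1.5-pro \
  || echo "alias not found (re-run: ollama create gemini-1.5-pro -f Modelfile)"
```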
- Go to Cursor Settings > Models.
- Add Model: add `gemini-1.5-pro`.
- OpenAI API Key: enter `Ollama` (or any string; it's just a placeholder).
- Override OpenAI Base URL: `https://yourdomain.ngrok-free.app/v1`
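Before switching Cursor over, it's worth smoke-testing the same endpoint Cursor will call. A sketch against localhost (swap in the ngrok URL to test the full path; the prompt text is arbitrary):

```shell
# POST a minimal chat request to the OpenAI-compatible endpoint.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-1.5-pro", "messages": [{"role": "user", "content": "Say hi"}]}' \
  || echo "request failed (is the server running?)"
```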
To ensure the local model reliably triggers the "Apply" button and uses tools instead of just chatting, add a project rule.
- Create the file: `.cursor/rules/agent-editing.mdc`
- Content:

```
---
description: Trigger when the agent needs to edit files or use tools
globs:
alwaysApply: true
---
- You are operating in Cursor Agent Mode.
- ALWAYS use the `edit_file` tool for all code modifications.
- NEVER suggest code in the chat if a file edit is possible.
- Use standard SEARCH/REPLACE blocks for diffs.
```

- Check services: `sudo systemctl status ollama` and `sudo systemctl status ngrok`
- Verify the tunnel: `curl http://localhost:4040/api/tunnels`
- Check the model path: `sudo cat /proc/$(pgrep ollama)/environ | tr '\0' '\n' | grep OLLAMA`
- Check logs: `journalctl -u ngrok -f` (to watch incoming requests from Cursor)
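The checks above can be rolled into one quick status pass. A sketch; the service names match the units used earlier:

```shell
# Report each service's state without aborting on the first failure.
for svc in ollama ngrok; do
  if systemctl is-active --quiet "$svc" 2>/dev/null; then
    echo "$svc: running"
  else
    echo "$svc: not running"
  fi
done
# The ngrok agent's local API lists active tunnels when the agent is up.
curl -s http://localhost:4040/api/tunnels || echo "ngrok local API not reachable"
```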