This guide covers setting up local inference with a permanent, static ngrok tunnel and the "Model Aliasing" trick required to unlock Cursor Agent Mode.
- Cursor: Installed on your local machine.
- Ubuntu GPU Server: With NVIDIA drivers and Docker/CORS configured.
- ngrok: A free account with one Static Domain (e.g., `yourdomain.ngrok-free.app`).
- Storage (optional): A mounted SSD (e.g., `/mnt/models`) for model storage.
For Ollama:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

For LM Studio, follow this guide: https://dimensionquest.net/2024/09/how-i-install-lm-studio-0.3.2-on-ubuntu-studio-24.04-linux/
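Before moving on, it's worth a quick sanity check that the install worked. A sketch, assuming Ollama's default port 11434 and its standard `/api/tags` model-list endpoint:

```shell
# Confirm the binary is on PATH, then ask the local API for its model list.
# Falls back to a message instead of failing if the server is not up yet.
if command -v ollama >/dev/null 2>&1; then
  ollama --version
  curl -s http://localhost:11434/api/tags || echo "server not responding on 11434"
else
  echo "ollama not installed"
fi
```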
Authenticate your server:

```shell
ngrok config add-authtoken YOUR_AUTHTOKEN
```

For Ollama, stop the service and move the model store onto the SSD:

```shell
sudo systemctl stop ollama
sudo mv /usr/share/ollama/.ollama/models /mnt/models/ollama/
sudo chown -R ollama:ollama /mnt/models/ollama
# +x on the directories grants the ollama user traverse access to the path
sudo chmod +x /mnt /mnt/models /mnt/models/ollama
```

For LM Studio:
Switch to Developer mode, open the Models tab, and set the Models Directory at the top.
Cursor communicates through its backend, so the local server must allow cross-origin (CORS) requests from any origin.
For Ollama, edit the service:

```shell
sudo systemctl edit ollama
```

Paste (the `[Service]` header is required in a drop-in):

```
[Service]
Environment="OLLAMA_MODELS=/mnt/models/ollama"
Environment="OLLAMA_ORIGINS=*"
```

Then run `sudo systemctl restart ollama` so the changes take effect. For LM Studio, no changes are needed.
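To verify the CORS setting took effect, send a request with a browser-style `Origin` header and look for the allow header in the reply. A sketch, assuming Ollama on its default port; the origin value is arbitrary:

```shell
# With OLLAMA_ORIGINS=* the response headers should include
# Access-Control-Allow-Origin. Prints a hint instead of failing otherwise.
curl -si http://localhost:11434/api/tags -H "Origin: https://example.com" \
  | grep -i "access-control-allow-origin" \
  || echo "no CORS header returned (is Ollama running with OLLAMA_ORIGINS=*?)"
```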
Create a systemd service to keep the tunnel alive on your static domain.
For Ollama use port 11434; for LM Studio, match the port you selected in the UI.
- Create the file:

```shell
sudo nano /etc/systemd/system/ngrok.service
```

- Paste the config (replace the domain and user):

```
[Unit]
Description=Ngrok Static Tunnel for Local LLM
After=network-online.target

[Service]
User=your_user
ExecStart=/usr/local/bin/ngrok http 11434 --url yourdomain.ngrok-free.app --host-header="localhost:11434"
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

- Start it:

```shell
sudo systemctl enable --now ngrok.service
```
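Once the service is up, the static domain should proxy straight to the local API. A quick end-to-end check (replace the placeholder domain with your own):

```shell
# Hit the public URL; ngrok forwards it to localhost:11434.
# The fallback keeps the check from aborting if the tunnel is not up yet.
curl -s https://yourdomain.ngrok-free.app/api/tags \
  || echo "tunnel not reachable (check: sudo systemctl status ngrok)"
```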
As of January 2026, Cursor has migrated GPT models to a new "Responses API" that uses an `input` field instead of the traditional `messages` array. Local OpenAI-compatible servers cannot parse this, causing errors. However, Gemini identifiers still use the standard format.
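To make the difference concrete, here are the two request shapes side by side. These are illustrative sketches of the payloads, not captured traffic, and the model names are placeholders:

```shell
# Classic Chat Completions body -- what Ollama / LM Studio parse today.
chat_payload='{"model": "gemini-1.5-pro", "messages": [{"role": "user", "content": "Refactor this"}]}'
# Responses-style body -- note "input" replacing "messages"; local servers
# reject this shape.
responses_payload='{"model": "gpt-x", "input": [{"role": "user", "content": "Refactor this"}]}'
printf '%s\n%s\n' "$chat_payload" "$responses_payload"
```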
Create a new model named `gemini-1.5-pro` based on your coder model:

```shell
# Create a file named 'Modelfile' that aliases the coder model
echo "FROM qwen3-coder:30b" > Modelfile
ollama create gemini-1.5-pro -f Modelfile
```

For LM Studio, rename the model's folder to `gemini-1.5-pro`; LM Studio uses the folder name as the model's API identifier.
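You can confirm the alias registered (assumes the Ollama daemon is running):

```shell
# The alias should appear alongside the original model in the local registry.
ollama list | grep gemini-1.5-pro \
  || echo "alias not found (re-run: ollama create gemini-1.5-pro -f Modelfile)"
```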
- Go to Cursor Settings > Models.
- Add Model: add `gemini-1.5-pro`.
- OpenAI API Key: enter `Ollama` (or any string; it's just a placeholder).
- Override OpenAI Base URL: `https://yourdomain.ngrok-free.app/v1`
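Before switching Cursor over, it's worth smoke-testing the same endpoint Cursor will call. A sketch against localhost (swap in the ngrok URL to test the full path; the prompt text is arbitrary):

```shell
# POST a minimal chat request to the OpenAI-compatible endpoint.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-1.5-pro", "messages": [{"role": "user", "content": "Say hi"}]}' \
  || echo "request failed (is the server running?)"
```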
To ensure the local model reliably triggers the "Apply" button and uses tools instead of just chatting, add a project rule.
- Create the file: `.cursor/rules/agent-editing.mdc`
- Content:

```
---
description: Trigger when the agent needs to edit files or use tools
globs:
alwaysApply: true
---
- You are operating in Cursor Agent Mode.
- ALWAYS use the `edit_file` tool for all code modifications.
- NEVER suggest code in the chat if a file edit is possible.
- Use standard SEARCH/REPLACE blocks for diffs.
```

- Check services: `sudo systemctl status ollama` and `sudo systemctl status ngrok`
- Verify the tunnel: `curl http://localhost:4040/api/tunnels`
- Check the model path: `sudo cat /proc/$(pgrep ollama)/environ | tr '\0' '\n' | grep OLLAMA`
- Check logs: `journalctl -u ngrok -f` (to watch incoming requests from Cursor)
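The checks above can be rolled into one quick status pass. A sketch; the service names match the units used earlier:

```shell
# Report each service's state without aborting on the first failure.
for svc in ollama ngrok; do
  if systemctl is-active --quiet "$svc" 2>/dev/null; then
    echo "$svc: running"
  else
    echo "$svc: not running"
  fi
done
# The ngrok agent's local API lists active tunnels when the agent is up.
curl -s http://localhost:4040/api/tunnels || echo "ngrok local API not reachable"
```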