
@peak-flow
Last active January 26, 2026 15:46
pc.md

Here's a PC/RTX 5070 version of the Qwen3-TTS setup guide, in the same "ready to hand to a local AI agent (or follow yourself)" format as the XTTS-v2 guide.

We’ll cover:

  1. Environment
  2. Install & dependencies
  3. Running Qwen3-TTS
  4. Performance tips for RTX
  5. Extras (chunking, voice options)
  6. Quick troubleshooting

🚀 Qwen3-TTS on Windows/Linux PC (RTX 5070)

🧠 What This Is

This guide gets you:

  • Qwen3-TTS model running locally
  • On a PC with an RTX 5070 GPU
  • Using Python + PyTorch + CUDA
  • Good quality English TTS

⚠️ Not real-time by default: audio is generated in batches. It's meant for offline generation or near-interactive narration, not live call-and-response.


🖥️ 0️⃣ Prerequisites (System)

✔️ Windows (10/11) or Linux
✔️ NVIDIA RTX 5070 GPU
✔️ CUDA toolkit compatible with your driver (e.g., CUDA 12.x)
✔️ Python 3.10 or 3.11
✔️ 12+ GB VRAM helps, but 8 GB is workable


📦 1️⃣ Create a Python Environment

💡 Use a virtual environment so dependencies don’t conflict:

python3.11 -m venv qwen3tts
source qwen3tts/bin/activate   # Linux/mac
qwen3tts\Scripts\activate      # Windows
pip install --upgrade pip setuptools wheel

🧠 2️⃣ Install PyTorch with CUDA

Install a PyTorch build that matches your CUDA setup from PyTorch's official index. RTX 50-series (Blackwell) cards need the CUDA 12.8 wheels:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

Check CUDA availability:

python - << 'EOF'
import torch
print("CUDA available:", torch.cuda.is_available())
print("CUDA device count:", torch.cuda.device_count())
EOF

Expect:

CUDA available: True

📥 3️⃣ Install Qwen3-TTS Dependencies

You’ll need:

  • transformers
  • accelerate
  • soundfile
  • numpy
  • any extra dependencies listed in the repo

Install:

pip install transformers accelerate soundfile numpy scipy

🧾 4️⃣ Get the Qwen3-TTS Model

Clone the repo (example name, adjust if different):

git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS

Install any internal requirements:

pip install -r requirements.txt

🗣️ 5️⃣ Run the Basic Inference Script

Below is the minimal script to generate audio.

Create generate_qwen3.py:

import torch
import soundfile as sf

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

device = "cuda" if torch.cuda.is_available() else "cpu"

# Model ID is an example; adjust to the actual Hugging Face repo name.
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-TTS")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Qwen/Qwen3-TTS")
model.to(device)
model.eval()

text = "This is Qwen three text to speech running on an RTX 5070."

inputs = processor(text=text, return_tensors="pt").to(device)

with torch.no_grad():
    generated_audio = model.generate(**inputs)

# Drop the batch dimension so soundfile gets a 1-D array.
audio_data = generated_audio.squeeze().cpu().numpy()

# 24000 Hz is an assumption; use the sample rate the model documents.
sf.write("qwen3_output.wav", audio_data, 24000)

Run it:

python generate_qwen3.py

Play the output:

# Windows
start qwen3_output.wav

# macOS
afplay qwen3_output.wav

# Linux
aplay qwen3_output.wav

🧠 6️⃣ Performance Tips (RTX 5070)

🟡 Optimize GPU usage

  • Batch input shorter phrases
  • Use mixed precision (FP16) if model supports it

Example for mixed precision:

with torch.autocast(device_type="cuda", dtype=torch.float16):
    generated_audio = model.generate(**inputs)

🟡 Chunk text

Split long text into sentences for faster partial outputs:

import re

def chunk_text(text):
    # Split on sentence-ending punctuation followed by whitespace.
    return re.split(r'(?<=[.!?])\s+', text)

chunks = chunk_text(your_long_text)

for i, chunk in enumerate(chunks):
    # Generate audio per chunk (see the inference script above),
    # then concatenate the pieces or save them individually.
    ...
This avoids huge memory spikes.


📊 7️⃣ Expected Throughput

| Setup | Behavior |
| --- | --- |
| RTX 5070 | Best open local TTS performance |
| CPU fallback | Very slow |
| Chunked text | Good trade-off |
| Full paragraph | Slower |

Qwen3-TTS isn't designed for streaming, so even on an RTX card it is batch-oriented for now.


🧩 8️⃣ Optional: Voice Customization

Right now Qwen3-TTS workflows often include options like:

  • Speaker IDs
  • Conditioning input
  • Pitch control

Examples often show:

inputs.update({"speaker": speaker_id})

Check model doc / repo for exact syntax.


🐛 9️⃣ Troubleshooting

❌ Out of GPU memory

  • Shorten text chunks
  • Lower batch size
  • Try FP16 generation
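If a long chunk still blows past VRAM, one pragmatic pattern is to catch the OOM error and retry with smaller pieces. A minimal sketch — `generate_fn` is a stand-in for whatever per-chunk generation call you use, not something from the Qwen3-TTS repo:

```python
def generate_with_backoff(generate_fn, chunks):
    """Run generate_fn over text chunks; on a CUDA out-of-memory
    RuntimeError, split the offending chunk in half (by words) and
    retry the halves, preserving output order."""
    outputs = []
    stack = list(reversed(chunks))
    while stack:
        chunk = stack.pop()
        try:
            outputs.append(generate_fn(chunk))
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise  # not an OOM: re-raise unchanged
            words = chunk.split()
            if len(words) < 2:
                raise  # can't split a single word any further
            mid = len(words) // 2
            # Push the second half first so the first half is retried first.
            stack.append(" ".join(words[mid:]))
            stack.append(" ".join(words[:mid]))
    return outputs
```

This keeps the retry logic out of the generation code itself, so it composes with the FP16 and chunking tips above.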

❌ CUDA errors

  • Make sure CUDA & driver versions match
  • Reinstall correct PyTorch wheel

❌ Distorted audio

  • Sample-rate mismatch between the model's output and the WAV header
  • Try resampling to 24 kHz
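Resampling itself is straightforward with scipy (already installed in step 3). A small helper, assuming mono output; `to_24k` is a name introduced here, not part of the Qwen3-TTS repo:

```python
import math

import numpy as np
from scipy.signal import resample_poly

def to_24k(audio: np.ndarray, source_rate: int, target_rate: int = 24000) -> np.ndarray:
    """Resample a mono signal to target_rate using polyphase filtering."""
    if source_rate == target_rate:
        return audio
    # resample_poly wants integer up/down factors; reduce by the GCD.
    g = math.gcd(target_rate, source_rate)
    return resample_poly(audio, target_rate // g, source_rate // g)
```

Then write the file with the matching header rate, e.g. `sf.write("qwen3_output.wav", to_24k(audio_data, model_rate), 24000)`.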

🔁 10️⃣ How to Integrate With Local Agent

Since LM Studio doesn’t natively handle TTS pipelines, your agent must:

  1. Generate text
  2. Save it to a .txt file or variable
  3. Run the Python TTS script
  4. Return qwen3_output.wav to the UI
  5. (Optional) Stream playback

Example flow:

User → Agent → text
↓
Agent runs qwen3 script → audio file
↓
Audio returned & played
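The flow above can be wired up with a small helper that shells out to the TTS script. A sketch under stated assumptions: the `tts_input.txt` handoff file is a convention introduced here, and it assumes you adapt `generate_qwen3.py` to read its text from that file instead of the hardcoded string:

```python
import pathlib
import subprocess
import sys

def run_tts(text: str,
            script: str = "generate_qwen3.py",
            out_path: str = "qwen3_output.wav") -> pathlib.Path:
    """Write the agent's text to a handoff file, run the TTS script,
    and return the path of the generated wav."""
    pathlib.Path("tts_input.txt").write_text(text, encoding="utf-8")
    # Assumes the script reads tts_input.txt and writes out_path.
    subprocess.run([sys.executable, script], check=True)
    wav = pathlib.Path(out_path)
    if not wav.exists():
        raise FileNotFoundError(f"TTS script did not produce {out_path}")
    return wav
```

Keeping the TTS run in a subprocess means the agent process never loads the model itself, so an OOM or crash in generation can't take the agent down.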

🧠 Summary (RTX Version)

| Model | RTX Support | Real-Time | Ease | Best Use |
| --- | --- | --- | --- | --- |
| Qwen3-TTS | 👍 | ❌ (batch) | 🟡 | High-quality offline |
| XTTS-v2 | 👍 | ⚠️ (near RT) | | Live/interactive |
| PersonaPlex | 👍 | ⚠️ (research) | ⚠️ | Expressive prosody experiments |

Want Next?

I can give you:

  • ✅ A local HTTP API for Qwen3-TTS
  • ✅ A combined XTTS + Qwen3 voice switcher
  • ✅ A benchmark script vs ElevenLabs (latency + quality)
  • ✅ A prompt-to-speech orchestrator for multimodal agents

Just pick one 😈
