This document covers how to build, deploy, and test the deepseek-ai/DeepSeek-V4-Pro model (1.6T total params, 49B active, FP4+FP8 mixed precision) using nm-vllm-ent (based on upstream vLLM v0.20.1rc0) on NVIDIA B200 GPUs.
| Field | Value |
|---|---|
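As a quick orientation before the build and deploy steps, a serve invocation might look like the following sketch. The flags and parallelism degree are assumptions, not values from this guide; a model this size requires multi-GPU parallelism, so adjust to your B200 topology.

```shell
# Minimal sketch, not from the gist: flag values are assumptions.
MODEL=deepseek-ai/DeepSeek-V4-Pro
# Print the launch command for review; drop the echo to actually serve.
echo "vllm serve $MODEL --tensor-parallel-size 8 --port 8000"
```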
Poolside Laguna: build, run, and smoke-test how-to
| Field | Value |
|---|---|
| Model | Poolside Laguna (model card TBD) — poolside.ai |
| Image | quay.io/vllm/rhaiis-early-access:poolside-laguna |
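To pull and run the early-access image above, a podman launch might look like the sketch below. The GPU device flag (CDI syntax), port, and Hugging Face cache mount path are assumptions, not values from this gist; adapt them to your host.

```shell
#!/usr/bin/env bash
# Hedged sketch: the run flags and cache mount are assumptions.
IMAGE=quay.io/vllm/rhaiis-early-access:poolside-laguna
# Print the launch command for review; drop the echo to actually run it.
echo podman run --rm -it --device nvidia.com/gpu=all -p 8000:8000 \
  -v "$HOME/.cache/huggingface:/opt/app-root/src/.cache/huggingface" \
  "$IMAGE"
```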
| Field | Value |
|---|---|
| Model | mistralai/Mistral-Small-4-119B-2603 |
| Image | quay.io/vllm/rhaiis-early-access:mistral-4-small |
| Build Run | 24369571413 |
| nm-cicd branch | doug/mistral-4-small |
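Once a container from the image above is serving, a smoke test can go through the OpenAI-compatible chat endpoint that vLLM exposes. The sketch below only builds and validates the request payload; the host and port are placeholders, and the final curl is left commented out since it needs a running server.

```shell
# Hypothetical smoke-test payload; host/port below are placeholders.
cat > /tmp/smoke.json <<'EOF'
{
  "model": "mistralai/Mistral-Small-4-119B-2603",
  "messages": [{"role": "user", "content": "Say hello."}],
  "max_tokens": 16
}
EOF
python3 -m json.tool /tmp/smoke.json   # validate the payload locally
# curl -s http://localhost:8000/v1/chat/completions \
#      -H 'Content-Type: application/json' -d @/tmp/smoke.json
```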
This guide shows you how to run LTX-2 video generation (text-to-video and image-to-video) using vLLM-Omni as the inference backend and ComfyUI as the frontend.
LTX-2 is a powerful video generation model from Lightricks that supports both text-to-video (T2V) and image-to-video (I2V) generation with audio synthesis.
Resources:
This guide covers running and trying out the Red Hat AI Inference Server, powered by vLLM, to serve the Mistral Voxtral-Mini-4B-Realtime-2602 model.
You can find the Voxtral Mini model card at https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
From the model card:
Voxtral Mini 4B Realtime 2602 is a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms. It supports 13 languages and outperforms existing open-source baselines across a range of tasks, making it ideal for applications like voice assistants and live subtitling.
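For a quick offline-style check of a running server, vLLM exposes an OpenAI-compatible audio transcription endpoint. The sketch below prints an example request; the host, port, and audio file name are placeholders, and the realtime (sub-500ms streaming) path is not shown here.

```shell
# Hedged sketch: host, port, and sample.wav are placeholders.
CMD=$(cat <<'EOF'
curl -s http://localhost:8000/v1/audio/transcriptions \
  -F model=mistralai/Voxtral-Mini-4B-Realtime-2602 \
  -F file=@sample.wav
EOF
)
echo "$CMD"
```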
This is a technical quick-start gist for the latest Red Hat AI Inference Server (RHAIIS) preview image, featuring NVIDIA Nemotron v3 Nano 30B-A3B models on vLLM.
Preview image tag (this release):
registry.redhat.io/rhaiis-preview/vllm-cuda-rhel9:nvidia-nemotron-v3

Upstream model family (Hugging Face):
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

Using this dummy CNI script...
Pay attention to the `cniresult()` routine, which returns... two interfaces.
```bash
#!/usr/bin/env bash
# DEBUG=true
# LOGFILE=/tmp/seamless.log
```
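The script's body is elided above, so the following is an illustrative stand-in rather than the gist's exact code: a `cniresult()` that emits a CNI result JSON carrying two interfaces, matching the note about the routine's behavior. Interface names and the CNI version are assumptions.

```shell
#!/usr/bin/env bash
# Hedged reconstruction: interface names and cniVersion are assumptions.
cniresult() {
    cat <<'EOF'
{
  "cniVersion": "0.3.1",
  "interfaces": [
    { "name": "eth0" },
    { "name": "net1" }
  ],
  "ips": []
}
EOF
}
cniresult
```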
Enable the reconciler...
`oc edit networks.operator.openshift.io cluster` and add the `additionalNetworks` section like:
additionalNetworks:
- name: whereabouts-shim
  namespace: openshift-multus
  rawCNIConfig: |-
    {