This document covers how to build, deploy, and test the deepseek-ai/DeepSeek-V4-Pro model (1.6T total params, 49B active, FP4+FP8 mixed precision) using nm-vllm-ent (based on upstream vLLM v0.20.1rc0) on NVIDIA B200 GPUs.
| Field | Value |
|---|---|
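As a quick orientation before the build and deploy steps, a serve invocation might look like the following sketch. The flags and parallelism degree are assumptions, not values from this guide; a model this size requires multi-GPU parallelism, so adjust to your B200 topology.

```shell
# Minimal sketch, not from the gist: flag values are assumptions.
MODEL=deepseek-ai/DeepSeek-V4-Pro
# Print the launch command for review; drop the echo to actually serve.
echo "vllm serve $MODEL --tensor-parallel-size 8 --port 8000"
```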
Poolside Laguna: build, run, and smoke-test how-to
| Field | Value |
|---|---|
| Model | Poolside Laguna (model card TBD) — poolside.ai |
| Image | quay.io/vllm/rhaiis-early-access:poolside-laguna |
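To pull and run the early-access image above, a podman launch might look like the sketch below. The GPU device flag (CDI syntax), port, and Hugging Face cache mount path are assumptions, not values from this gist; adapt them to your host.

```shell
#!/usr/bin/env bash
# Hedged sketch: the run flags and cache mount are assumptions.
IMAGE=quay.io/vllm/rhaiis-early-access:poolside-laguna
# Print the launch command for review; drop the echo to actually run it.
echo podman run --rm -it --device nvidia.com/gpu=all -p 8000:8000 \
  -v "$HOME/.cache/huggingface:/opt/app-root/src/.cache/huggingface" \
  "$IMAGE"
```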
| Field | Value |
|---|---|
| Model | mistralai/Mistral-Small-4-119B-2603 |
| Image | quay.io/vllm/rhaiis-early-access:mistral-4-small |
| Build Run | 24369571413 |
| nm-cicd branch | doug/mistral-4-small |
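Once a container from the image above is serving, a smoke test can go through the OpenAI-compatible chat endpoint that vLLM exposes. The sketch below only builds and validates the request payload; the host and port are placeholders, and the final curl is left commented out since it needs a running server.

```shell
# Hypothetical smoke-test payload; host/port below are placeholders.
cat > /tmp/smoke.json <<'EOF'
{
  "model": "mistralai/Mistral-Small-4-119B-2603",
  "messages": [{"role": "user", "content": "Say hello."}],
  "max_tokens": 16
}
EOF
python3 -m json.tool /tmp/smoke.json   # validate the payload locally
# curl -s http://localhost:8000/v1/chat/completions \
#      -H 'Content-Type: application/json' -d @/tmp/smoke.json
```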
This guide shows you how to run LTX-2 video generation (text-to-video and image-to-video) using vLLM-Omni as the inference backend and ComfyUI as the frontend.
LTX-2 is a powerful video generation model from Lightricks that supports both text-to-video (T2V) and image-to-video (I2V) generation with audio synthesis.
Resources:
This guide covers running and trying out the Red Hat AI Inference Server, powered by vLLM, to serve the Mistral Voxtral-Mini-4B-Realtime-2602 model.
You can find the Voxtral Mini model card at https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
From the model card:
Voxtral Mini 4B Realtime 2602 is a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms. It supports 13 languages and outperforms existing open-source baselines across a range of tasks, making it ideal for applications like voice assistants and live subtitling.
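For a quick offline-style check of a running server, vLLM exposes an OpenAI-compatible audio transcription endpoint. The sketch below prints an example request; the host, port, and audio file name are placeholders, and the realtime (sub-500ms streaming) path is not shown here.

```shell
# Hedged sketch: host, port, and sample.wav are placeholders.
CMD=$(cat <<'EOF'
curl -s http://localhost:8000/v1/audio/transcriptions \
  -F model=mistralai/Voxtral-Mini-4B-Realtime-2602 \
  -F file=@sample.wav
EOF
)
echo "$CMD"
```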
This is a technical quick-start gist for the latest Red Hat AI Inference Server (RHAIIS) preview image, featuring NVIDIA Nemotron v3 Nano 30B-A3B models on vLLM.
Preview image tag (this release):
registry.redhat.io/rhaiis-preview/vllm-cuda-rhel9:nvidia-nemotron-v3

Upstream model family (Hugging Face):
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

Using this dummy CNI script...
Pay attention to the `cniresult()` routine, which returns... two interfaces.
```bash
#!/usr/bin/env bash
# DEBUG=true
# LOGFILE=/tmp/seamless.log
```
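The script's body is elided above, so the following is an illustrative stand-in rather than the gist's exact code: a `cniresult()` that emits a CNI result JSON carrying two interfaces, matching the note about the routine's behavior. Interface names and the CNI version are assumptions.

```shell
#!/usr/bin/env bash
# Hedged reconstruction: interface names and cniVersion are assumptions.
cniresult() {
    cat <<'EOF'
{
  "cniVersion": "0.3.1",
  "interfaces": [
    { "name": "eth0" },
    { "name": "net1" }
  ],
  "ips": []
}
EOF
}
cniresult
```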
Enable the reconciler...
`oc edit networks.operator.openshift.io cluster` and add the `additionalNetworks` section like:
additionalNetworks:
- name: whereabouts-shim
  namespace: openshift-multus
  rawCNIConfig: |-
    {