@changtimwu
Created March 24, 2026 09:53
NemoClaw / OpenShell: Jetson Orin Nano known issues and official NV workaround guide

NemoClaw / OpenShell Issues: Jetson Orin Nano Platform

Tracked issues relevant to the Jetson Orin Nano (and related AGX Thor/Orin) platform. All issues are currently OPEN.


NemoClaw #404 — GPU detection fails + k3s panics on iptables (Orin Nano Specific)

Labels: bug, Platform: AGX Thor/Orin Author: realkim93 (Ryeol.Kim)

Two blockers and one sizing problem on Jetson Orin Nano Super (8GB, JetPack 6.x, kernel 5.15.148-tegra):

  1. GPU not detected: nvidia-smi --query-gpu=memory.total returns [N/A] on unified memory devices, so detectGpu() returns null and preflight reports "No GPU detected".
  2. k3s panic on startup — OpenShell gateway ships iptables v1.8.10 (nf_tables mode), but Tegra kernel lacks nft_chain_filter and related modules:
    iptables v1.8.10 (nf_tables): RULE_INSERT failed (No such file or directory): rule in chain FORWARD
    panic: ...network_policy_controller.go:412
    
  3. Default model too large: nemotron-3-nano:30b does not fit in 8GB unified memory.
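
The [N/A] case in item 1 can be handled by treating any non-numeric memory.total value as "unified memory" and falling back to system RAM. The following is a minimal sketch of that idea, not the NemoClaw implementation; gpu_mem_mib is a hypothetical helper name, and the fallback to MemTotal is an assumption about what a fix could look like.

```sh
# Hypothetical helper: pick a usable memory figure (in MiB) for GPU preflight.
#   $1 = raw output of:
#        nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
#   $2 = MemTotal in kB (for example from /proc/meminfo), used as the fallback
gpu_mem_mib() {
  raw=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$raw" in
    '' | *[!0-9]*)
      # Empty or non-numeric (such as "[N/A]"): assume unified memory and
      # report system RAM, which the GPU shares on Jetson.
      echo $(( $2 / 1024 ))
      ;;
    *)
      echo "$raw"
      ;;
  esac
}

# Example (not run here):
#   gpu_mem_mib \
#     "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)" \
#     "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

Both arguments are expected to be present; the sketch does no further validation.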

NemoClaw #539 — OpenShell gateway crashes: missing nf_tables NAT kernel modules

Labels: bug, Platform: AGX Thor/Orin, status: triage Author: garnetsoft

Onboarding fails at step [2/7] Starting OpenShell gateway because k3s uses iptables-nft but the Jetson kernel has no nft_* modules at all.

Error:

E0320 Failed to bootstrap IPTables: failed to apply partial iptables-restore
Warning: Extension MASQUERADE revision 0 not supported, missing kernel module?
iptables v1.8.10 (nf_tables): CHAIN_ADD failed (No such file or directory)

Root cause: find /lib/modules/5.15.185-tegra -name "nft*" returns nothing. The host uses iptables-legacy but the container uses nf_tables.

Suggested fix: OpenShell gateway should detect missing nf_tables support and fall back to iptables-legacy, similar to the cgroup v2 fix in spark-install.md. A dedicated Jetson setup path (like setup-spark) would also address this.
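
The detection half of that suggestion can be sketched with the same find command used in the root-cause analysis. This is an illustration only: has_nft_modules is a hypothetical name, and the check looks at loadable module files, so it would report "missing" on a kernel that has nf_tables built in rather than built as modules.

```sh
# Hypothetical check: succeed if any nft*/nf_tables* module file exists
# under a kernel module tree.
#   $1 = module directory (defaults to the running kernel's tree)
has_nft_modules() {
  dir=${1:-/lib/modules/$(uname -r)}
  [ -n "$(find "$dir" \( -name 'nft*' -o -name 'nf_tables*' \) 2>/dev/null | head -n 1)" ]
}

# Example (not run here):
#   if has_nft_modules; then
#     echo "nf_tables modules present"
#   else
#     echo "no nf_tables modules: the iptables-legacy backend is needed" >&2
#   fi
```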

Workaround: None — requires kernel recompile or patching the gateway container image.

Environment:

  • Device: Jetson Orin Nano (Engineering Reference Developer Kit)
  • Kernel: 5.15.185-tegra (aarch64)
  • OS: Ubuntu 22.04.5 LTS
  • Docker: 29.3.0, NemoClaw: 0.1.0, OpenShell CLI: 0.0.12

NemoClaw #300 — GPU detection fails on Jetson Thor and Orin

Labels: Platform: AGX Thor/Orin Author: dfry-lhzn (Daniel Fry)

detectGpu() in nemoclaw/bin/lib/nim.js only checks for "GB10" (Grace Blackwell), missing all Jetson-class hardware.

nvidia-smi output on Orin:

$ nvidia-smi --query-gpu=name --format=csv,noheader,nounits
Orin (nvgpu)

Offending code (nim.js:52):

if (nameOutput && nameOutput.includes("GB10")) {

Proposed fix:

if (nameOutput && (nameOutput.includes("GB10") || nameOutput.includes("Thor") || nameOutput.includes("Orin"))) {
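
To see whether a given board would pass the proposed check before touching nim.js, the same substring test can be reproduced in shell. A minimal sketch; is_supported_gpu_name is a hypothetical name and covers only the three substrings named in the issue.

```sh
# Hypothetical shell equivalent of the proposed substring check.
#   $1 = GPU name as printed by:
#        nvidia-smi --query-gpu=name --format=csv,noheader,nounits
is_supported_gpu_name() {
  case "$1" in
    *GB10* | *Thor* | *Orin*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not run here):
#   is_supported_gpu_name "$(nvidia-smi --query-gpu=name --format=csv,noheader,nounits)" \
#     && echo "recognized" || echo "not recognized"
```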

NemoClaw #511 — Installation fails on Jetson AGX (aarch64): npm errors + k3s networking

Labels: bug, Platform: AGX Thor/Orin, status: triage Author: rmegi (Roy Megidish)

Multiple failure stages on Jetson AGX (Tegra kernel 5.15.136-tegra, aarch64):

  1. npm install errors:

    ENOTEMPTY: directory not empty, rename '/usr/lib/node_modules/nemoclaw'
    npm ERR! enoent spawn sh ENOENT
    

    Missing files: openclaw/dist, openclaw/extensions

  2. Gateway fails:

    K8s namespace not ready — timed out waiting for namespace 'openshell'
    RULE_APPEND failed (No such file or directory)
    iptables v1.8.10 (nf_tables) — missing kernel module?
    
  3. k3s pods crash: metrics-server and local-path-provisioner in CrashLoopBackOff.

Environment: Ubuntu 22.04, aarch64, Node.js v22.22.1, npm 10.9.4, Docker 27.5.1


NemoClaw #360 — Install issues on Jetson Thor (multiple)

Labels: Docker, Platform: AGX Thor/Orin Author: theguybieber

Checklist of install issues encountered on Jetson Thor:

  1. Installer curl command includes a spurious $ — copy-paste fails.
  2. Docker must be enabled for the installing user.
  3. openshell not added to PATH after install.
  4. Port conflict check needs sudo (lsof -i :8080).
  5. Preflight GPU detection fails: "No GPU detected — will use cloud inference".
  6. Install hangs at [3/7] Creating sandbox with a custom sandbox name.
  7. nemoclaw onboard fails with custom sandbox names.
  8. nemoclaw start requires an NVIDIA API key even in local-inference mode.

NemoClaw #65 — Feature request: Jetson Nano support

Labels: Platform: AGX Thor/Orin Author: prafsoni

Request for official Jetson Nano support, with particular interest in resource-aware local model loading for the 8GB unified memory constraint.


OpenShell Issues (upstream root cause for gateway blocker)

OpenShell #407 — k3s gateway fails on Jetson Orin Nano: iptables nf_tables / legacy conflict

Labels: state:triage-needed, topic:compatibility Author: guinava — 7 comments

This is the upstream root cause of the gateway crash seen in NemoClaw #404 and #539. The conflict is unresolvable at the NemoClaw level — it requires a fix in the OpenShell gateway container image.

The trap: the iptables state is broken in both possible configurations on Jetson:

| Host nf_tables state | Result |
|---|---|
| Loaded (default) | kube-router panics: RULE_INSERT failed; nf_tables extensions in container conflict with legacy host tables |
| Blacklisted | kube-proxy exits: Could not fetch rule set generation id: Invalid argument; container's iptables v1.8.10 (nf_tables) still calls nf_tables, which is now absent |

Root cause: Gateway container bundles iptables v1.8.10 (nf_tables). On L4T/Tegra, the host kernel nat table is in legacy mode. The container's nf_tables iptables cannot read or write legacy tables regardless of host module state.

Attempted workarounds (all failed):

  • update-alternatives --set iptables /usr/sbin/iptables-legacy
  • Blacklisting nf_tables + reboot
  • Loading all iptable_* and xt_* legacy modules manually
  • nemoclaw setup-spark (configures cgroupns=host)

Required fix (in OpenShell gateway container):

  • Bundle iptables-legacy in the cluster image, OR
  • Use --prefer-bundled-bin k3s arg with legacy mode, OR
  • Add entrypoint logic: run update-alternatives --set iptables /usr/sbin/iptables-legacy before exec k3s when legacy tables are detected on host
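
The third option leaves open how "legacy tables are detected". One possible heuristic is to count the rules each backend can see and prefer legacy when it holds more of them. The sketch below assumes that heuristic; count_rules and choose_iptables_mode are hypothetical names, and the /usr/sbin/iptables-nft path in the example is an assumption about the image layout.

```sh
# Counts rule lines (those starting with "-") in iptables-save output on stdin.
count_rules() {
  grep -c '^-' || true
}

# Prints "legacy" when the legacy backend holds more rules, otherwise "nft".
#   $1 = number of legacy rules, $2 = number of nf_tables rules
choose_iptables_mode() {
  if [ "$1" -gt "$2" ]; then
    echo legacy
  else
    echo nft
  fi
}

# Example entrypoint fragment (not run here):
#   legacy=$(iptables-legacy-save 2>/dev/null | count_rules)
#   nft=$(iptables-nft-save 2>/dev/null | count_rules)
#   mode=$(choose_iptables_mode "$legacy" "$nft")
#   update-alternatives --set iptables "/usr/sbin/iptables-$mode"
#   exec k3s "$@"
```

A host with no rules in either backend resolves to nft here, so on a kernel without nf_tables modules this heuristic would need to be combined with a module check.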

Environment: Jetson Orin Nano, Ubuntu 22.04, kernel 5.15.148-tegra, Docker 28.x, OpenShell v0.0.7, gateway image ghcr.io/nvidia/openshell/cluster:0.0.8


OpenShell #467 — Gateway fails on Jetson Orin / L4T: kube-proxy exits with "Could not fetch rule set generation id"

Labels: (none yet) Author: tangc3022-hub — 1 comment

Independent report of the same issue, with additional context about host network mode being required on Jetson due to missing veth / Docker NAT limitations.

Key finding from investigation:

  • On Orin, Docker cannot use default bridge/NAT (veth missing), so the gateway must use network_mode: host
  • With host networking, the container's iptables-nft invokes the same kernel as the host — which only has legacy nat tables → immediate failure

Mitigations implemented in reporter's fork:

  1. Pass --prefer-bundled-bin to k3s server args (uses k3s's own bundled iptables)
  2. In cluster image entrypoint, before exec k3s: update-alternatives --set iptables /usr/sbin/iptables-legacy
  3. Improved error reporting to not misclassify this as a "Network connectivity issue"
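
Mitigation 3 amounts to recognizing the iptables signatures before falling through to a generic network error. A minimal sketch of such a classifier, using only the log lines quoted in the issues on this page; classify_gateway_error and its category names are hypothetical and are not taken from the reporter's fork.

```sh
# Hypothetical classifier for a single gateway log line.
# Prints "iptables-backend-mismatch" for the signatures quoted in this
# document, otherwise "unknown".
classify_gateway_error() {
  case "$1" in
    *"Could not fetch rule set generation id"*)
      echo iptables-backend-mismatch ;;
    *"(nf_tables)"*"failed (No such file or directory)"*)
      echo iptables-backend-mismatch ;;
    *"missing kernel module?"*)
      echo iptables-backend-mismatch ;;
    *)
      echo unknown ;;
  esac
}
```

Feeding it the RULE_INSERT, CHAIN_ADD or RULE_APPEND lines from the reports above yields iptables-backend-mismatch; a line such as "K8s namespace not ready" yields unknown.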

Environment: Jetson Orin (L4T / tegra-ubuntu), Docker Engine, OpenShell v0.x.x, host uses iptables-legacy


Common Root Causes

| Issue | Root Cause | Repo |
|---|---|---|
| GPU not detected | detectGpu() in nim.js:52 only checks "GB10", missing "Orin" and "Thor" | NemoClaw |
| Gateway crash | Container bundles iptables v1.8.10 (nf_tables); Tegra kernel only supports legacy iptables | OpenShell |
| Default model OOM | nemotron-3-nano:30b too large for 8GB unified memory | NemoClaw |
| npm install failure | Possible aarch64 package or file-system issue during install | NemoClaw |

Workarounds (Until Fixed)

  • GPU detection: None upstream — NemoClaw falls back to cloud inference. The fix in nim.js:52 is simple and proposed in NemoClaw #300.
  • iptables/k3s (gateway crash): See the official NV workaround guide below.
  • Model size: Manually select a smaller model during onboarding. On an 8GB board the model shares unified memory with the OS, Docker and the k3s gateway, so it must be well under 8GB.
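
A rough way to judge the model-size workaround is to compare the model's size against total memory minus a reserve. The sketch below is arithmetic only: model_fits is a hypothetical helper, the 2048 MiB default reserve is an assumed figure rather than a measured one, and the on-disk size (for example as shown by ollama list) is only a lower bound on what the model needs at runtime.

```sh
# Hypothetical helper: succeed if a model leaves the requested headroom.
#   $1 = model size in MiB
#   $2 = total unified memory in MiB
#   $3 = MiB reserved for the OS, Docker and k3s (assumed default: 2048)
model_fits() {
  model_mib=$1
  total_mib=$2
  reserve_mib=${3:-2048}
  [ $(( model_mib + reserve_mib )) -le "$total_mib" ]
}

# Example (not run here):
#   total=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 1024 ))
#   model_fits 4096 "$total" && echo "likely fits" || echo "too large"
```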

Official NVIDIA Workaround Guide

Source: NVIDIA Developer Forums — NemoClaw + OpenShell: Jetson AGX Orin, Orin Super, Nano, and NX

Target: Jetson AGX Orin, Orin Super, Orin Nano, Orin NX

Tested versions: JetPack 6.2.2 (L4T R36.5.0), Ubuntu 22.04.5 LTS, OpenShell 0.0.13, Node.js v22.22.1

Three issues must be addressed before running the NemoClaw installer:

| # | Issue | Impact |
|---|---|---|
| 1 | iptables nf_tables incompatibility | k3s panics, gateway never starts |
| 2 | Missing br_netfilter kernel module | DNS resolution fails inside sandbox |
| 3 | Ollama bound to 127.0.0.1 only | Containers can't reach local Ollama |

Prerequisites

Install Docker (if not already installed):

sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Install NVIDIA Container Toolkit:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Set NVIDIA as the default Docker runtime by editing /etc/docker/daemon.json, then restart Docker (sudo systemctl restart docker) so the change takes effect:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
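
To confirm the edit, the configured value can be read back from the file. A minimal sketch under the assumption that the file is formatted like the example above, with the key and value on one line; default_runtime_of is a hypothetical helper and is not a general JSON parser.

```sh
# Hypothetical helper: print the "default-runtime" value from a daemon.json.
#   $1 = path to the file (defaults to /etc/docker/daemon.json)
default_runtime_of() {
  sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "${1:-/etc/docker/daemon.json}" | head -n 1
}

# Example (not run here):
#   [ "$(default_runtime_of)" = "nvidia" ] || echo "default runtime is not nvidia" >&2
```

Once Docker has been restarted, docker info --format '{{.DefaultRuntime}}' reports the runtime the daemon is actually using.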

Add user to docker group:

sudo usermod -aG docker $USER
newgrp docker

Step 1 — Load br_netfilter kernel module

sudo modprobe br_netfilter
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1

Make persistent across reboots:

echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/k8s.conf
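
Whether the two persistence files hold what Step 1 wrote can be checked without rebooting. A minimal sketch; persists_br_netfilter is a hypothetical helper, and it inspects only the two files created above, not any other modules-load.d or sysctl.d entries that might override them.

```sh
# Hypothetical check: succeed if both persistence files contain the
# entries written in Step 1.
#   $1 = modules-load file (defaults to /etc/modules-load.d/k8s.conf)
#   $2 = sysctl file       (defaults to /etc/sysctl.d/k8s.conf)
persists_br_netfilter() {
  modules_conf=${1:-/etc/modules-load.d/k8s.conf}
  sysctl_conf=${2:-/etc/sysctl.d/k8s.conf}
  grep -qx 'br_netfilter' "$modules_conf" 2>/dev/null &&
    grep -Eq '^net\.bridge\.bridge-nf-call-iptables[[:space:]]*=[[:space:]]*1$' \
      "$sysctl_conf" 2>/dev/null
}

# Example (not run here):
#   persists_br_netfilter || echo "br_netfilter settings will not survive a reboot" >&2
```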

Step 2 — Patch the OpenShell cluster image to use iptables-legacy

This is the core fix for the gateway crash. It patches the container image in-place so k3s uses legacy iptables instead of nf_tables.

IMAGE_NAME="ghcr.io/nvidia/openshell/cluster:0.0.13"

docker run --entrypoint sh --name fix-iptables "$IMAGE_NAME" -c '
  update-alternatives --set iptables /usr/sbin/iptables-legacy
  update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
  ln -sf /usr/sbin/iptables-legacy /usr/sbin/iptables
  ln -sf /usr/sbin/ip6tables-legacy /usr/sbin/ip6tables
  iptables --version
'

docker commit \
  --change 'ENTRYPOINT ["/usr/local/bin/cluster-entrypoint.sh"]' \
  fix-iptables "$IMAGE_NAME"

docker rm fix-iptables

The iptables --version output should show (legacy) to confirm the patch worked.

Note: Update IMAGE_NAME to match the cluster image version pulled by your OpenShell version.
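
Two small checks can make this step less error-prone: listing which cluster image tags are present locally, and confirming the (legacy) marker before committing. The following is a sketch, with is_legacy_iptables and cluster_images as hypothetical helper names; it assumes the last line logged by the fix-iptables container is the iptables --version output, and that the check runs before docker rm.

```sh
# Succeeds when an `iptables --version` line reports the legacy backend.
is_legacy_iptables() {
  case "$1" in
    *"(legacy)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Filters "repository:tag" lines on stdin down to OpenShell cluster images.
cluster_images() {
  grep '^ghcr\.io/nvidia/openshell/cluster:' || true
}

# Example (not run here):
#   docker images --format '{{.Repository}}:{{.Tag}}' | cluster_images
#   is_legacy_iptables "$(docker logs fix-iptables 2>&1 | tail -n 1)" \
#     || echo "patch did not take effect; do not commit" >&2
```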


Step 3 — Configure Ollama to bind on all interfaces (if using local Ollama)

sudo mkdir -p /etc/systemd/system/ollama.service.d
echo -e '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0:11434"' \
  | sudo tee /etc/systemd/system/ollama.service.d/override.conf
sudo systemctl daemon-reload
sudo systemctl restart ollama

Verify Ollama is listening on all interfaces:

ss -tlnp | grep 11434
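
Reading the ss output by eye is easy to get wrong, so the result can be reduced to one word. A minimal sketch that assumes the usual ss column layout, with the local address in the fourth field; ollama_bind_scope is a hypothetical name.

```sh
# Reads `ss -tln` (or `ss -tlnp`) output on stdin and prints one of:
#   all            port 11434 is bound on a wildcard address
#   restricted     port 11434 is bound, but only on specific addresses
#   not-listening  nothing is bound on port 11434
ollama_bind_scope() {
  awk '
    $4 ~ /:11434$/ {
      if ($4 ~ /^(0\.0\.0\.0|\*|\[::\]):11434$/) { all = 1 } else { other = 1 }
    }
    END {
      if (all) print "all"
      else if (other) print "restricted"
      else print "not-listening"
    }
  '
}

# Example (not run here):
#   ss -tln | ollama_bind_scope
```

A result of restricted after Step 3 suggests the systemd override was not picked up.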

Step 4 — Run the NemoClaw installer

curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash

During onboarding, select the local Ollama option.


Verification & Troubleshooting

Check sandbox pod health:

docker exec openshell-cluster-nemoclaw kubectl get pods -n openshell

Verify iptables version inside the container (should show legacy):

docker exec openshell-cluster-nemoclaw iptables --version

Confirm br_netfilter is loaded:

lsmod | grep br_netfilter
cat /proc/sys/net/bridge/bridge-nf-call-iptables

Test DNS resolution inside the cluster:

docker exec openshell-cluster-nemoclaw kubectl run dns-test \
  --namespace=openshell \
  --image=rancher/mirrored-library-busybox:1.37.0 \
  --restart=Never \
  -- nslookup openshell.openshell.svc.cluster.local
sleep 10
docker exec openshell-cluster-nemoclaw kubectl logs -n openshell dns-test
docker exec openshell-cluster-nemoclaw kubectl delete pod -n openshell dns-test
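
The fixed sleep 10 can be too short on a slow image pull. One alternative is to wait on the pod phase and then inspect the logs. The sketch below assumes BusyBox-style nslookup output, where a successful lookup prints a "Name:" line; dns_lookup_ok is a hypothetical helper.

```sh
# Reads the dns-test pod logs on stdin; succeeds if nslookup printed a
# resolved name (assumes a "Name:" line marks a successful lookup).
dns_lookup_ok() {
  grep -q '^Name:'
}

# Example (not run here), replacing the `sleep 10` line:
#   docker exec openshell-cluster-nemoclaw kubectl wait -n openshell pod/dns-test \
#     --for=jsonpath='{.status.phase}'=Succeeded --timeout=60s
#   docker exec openshell-cluster-nemoclaw kubectl logs -n openshell dns-test \
#     | dns_lookup_ok && echo "cluster DNS ok" || echo "cluster DNS failed" >&2
```

If the lookup fails, the pod ends in the Failed phase and the wait times out, which is itself a failure signal.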

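Clean up / fresh start (note that docker volume prune -f acts on unused volumes across the whole host, not only NemoClaw's):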

source ~/.bashrc
nemoclaw destroy 2>/dev/null
docker rm -f openshell-cluster-nemoclaw 2>/dev/null
docker volume prune -f