@kev-bi
kev-bi / gist:b2b6d7abb2fb4c91c8955f0403e56996
Created March 16, 2026 22:18
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5post1 speculative decoding multiple MPI process OK
/app/tensorrt_llm# cat > config.yaml << 'EOF'
speculative_config:
  decoding_type: Eagle3
  max_draft_len: 4
  speculative_model: yuhuili/EAGLE3-LLaMA3.1-Instruct-8B
kv_cache_config:
  free_gpu_memory_fraction: 0.70
  dtype: fp8
  enable_block_reuse: false
trust_remote_code: true
@kev-bi
kev-bi / gist:0935ade82d40fa8daecd6636c26b095c
Created March 16, 2026 22:15
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.1 speculative decoding multiple MPI process crash
/workspace$ cat > config.yaml << 'EOF'
speculative_config:
  decoding_type: Eagle3
  max_draft_len: 4
  speculative_model: yuhuili/EAGLE3-LLaMA3.1-Instruct-8B
kv_cache_config:
  free_gpu_memory_fraction: 0.70
  dtype: fp8
  enable_block_reuse: false
trust_remote_code: true
@kev-bi
kev-bi / workload-identity-federation.md
Created January 12, 2025 06:11
GCP service account to AWS IAM role federation for terraform aws provider

GCP service account to AWS IAM role federation for aws terraform provider

On the off chance you have GCP workloads (e.g. GCE VMs or Cloud Run services) that need to apply Terraform templates to create resources in an AWS account, you can do so by fetching an identity token from the metadata server and using that token for workload identity federation.
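
As a rough illustration of the token-fetching half of this flow, here is a minimal Python sketch. It uses the documented metadata server identity endpoint and the required `Metadata-Flavor: Google` header. The function names and the audience value are hypothetical, and the right audience depends on how the AWS IAM role's trust policy is set up, which this sketch does not cover.

```python
import urllib.parse
import urllib.request

# Identity endpoint of the GCP metadata server for the default service account.
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def build_identity_request(audience: str) -> urllib.request.Request:
    """Build (but do not send) the metadata server request for an identity token.

    `audience` is an assumption here: it must match whatever the AWS role's
    trust policy expects.
    """
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        # The metadata server rejects requests without this header.
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_identity_token(audience: str, timeout: float = 5.0) -> str:
    """Fetch the identity token (a JWT). Only works on GCE, Cloud Run, etc."""
    request = build_identity_request(audience)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_identity_token("example-audience")` from inside the workload should return a JWT, which can then be exchanged for temporary AWS credentials, for example through the STS `AssumeRoleWithWebIdentity` API.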

Prereqs

This assumes your Cloud Run service or GCE VM already has a service account assigned.

Steps