
Finally used OpenAI's Deep Research for the first time

Building a Scalable AI Fine-Tuning Cluster with AMD Ryzen AI Max+ 395

This report outlines a model-independent framework for fine-tuning large AI models on a cluster of AMD Ryzen AI Max+ 395 nodes. The design supports a minimum of two nodes and scales to much larger deployments. We focus on optimizing fine-tuning efficiency using the XDNA 2 neural processing unit (NPU) in these chips, while keeping the setup accessible to developers of open-source AI models. Key areas include architecture and low-level optimizations, model splitting strategies, network and data throughput tuning, alternative computation models, and continuous benchmarking for improvements.

1. Architecture Analysis & Low-Level Optimization

XDNA 2 NPU vs CPU/GPU: AMD’s XDNA 2 NPU (built into Ryzen AI Max chips) is a specialized spatial dataflow engine optimized for AI workloads. It consists of a 2D array of compute tiles with a flexible interconnect and on-chip SRAM.
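The spatial-dataflow idea above can be sketched in plain Python: a blocked matrix multiply in which each position in a 2D grid of tiles accumulates partial products in a tile-local buffer, the analogue of on-chip SRAM. The tile size and grid layout here are illustrative assumptions for the sketch, not XDNA 2 specifics.

```python
import numpy as np

TILE = 4  # illustrative tile edge length; real NPU tile/SRAM sizes differ


def tiled_matmul(A, B):
    """Blocked matmul mimicking a spatial tile array: each (i, j) tile
    accumulates partial products locally while operand blocks stream in."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, TILE):          # tile row in the 2D grid
        for j in range(0, n, TILE):      # tile column in the 2D grid
            # tile-local accumulator: stands in for on-chip SRAM
            acc = np.zeros((TILE, TILE), dtype=A.dtype)
            for k in range(0, n, TILE):  # operand blocks streamed tile by tile
                acc += A[i:i+TILE, k:k+TILE] @ B[k:k+TILE, j:j+TILE]
            C[i:i+TILE, j:j+TILE] = acc
    return C


A = np.random.rand(8, 8).astype(np.float32)
B = np.random.rand(8, 8).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-4)
```

The point of the sketch is the data movement pattern: keeping the accumulator resident in tile-local memory while streaming operand blocks past it is what lets a spatial array avoid round-trips to main memory.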

SwarmStorm / README.md
Created June 21, 2023 05:01 — forked from mukhortov/README.md
Migrate GitHub issues with comments and labels to Jira


Set up the environment

  1. Install Deno, a secure runtime for JavaScript and TypeScript.
  2. Generate a personal access token for GitHub and save it to ~/.github-oauth. You can use the following command to save it:
    echo generated_github_personal_access_token_goes_here > ~/.github-oauth