laurencecwj

tl;dr;

UPDATE Mon Mar 10 10:51:31 AM EDT 2025 Check out the newer ktransformers guide for how to get it running faster! About 3.5 tok/sec on this same gaming rig. Big thanks to Supreeth Koundinya with analyticsindiamag.com for the article!

You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB dynamically quantized weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime. No, this is not a distill model. It works fairly well despite quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup but mmap'd. The KV cache then takes up some RAM. Most of your system RAM is left available to serve as disk cache for whatever experts/weights are currently most u
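To make the mmap idea concrete, here is a minimal sketch using Python's standard `mmap` module. It is not how the inference engine itself is implemented; the small temporary file and the `read_slice` helper are hypothetical stand-ins for a large weights file and a lookup into it. The point it illustrates is that mapping a file does not read it up front: the OS pages data in only when a slice is touched, and keeps it in the page cache afterwards.

```python
import mmap
import os
import tempfile


def read_slice(path: str, offset: int, length: int) -> bytes:
    """Map a file read-only and return the bytes in [offset, offset + length)."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Nothing is read from disk yet;
        # pages are faulted in only when the slice below touches them.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


# Hypothetical stand-in for a weights file: padding, a marker, more padding.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 1024)
    tmp.write(b"expert-7")
    tmp.write(b"\x00" * 1024)
    demo_path = tmp.name

chunk = read_slice(demo_path, 1024, 8)
os.remove(demo_path)
print(chunk)
```

Because the mapping is read-only, accessing it this way only issues reads against the SSD, which is consistent with the claim above that this is not swap.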

DeepSeek R1 Distill: Complete Tutorial for Deployment & Fine-Tuning

This guide shows how to deploy an uncensored DeepSeek R1 Distill model to Google Cloud Run with GPU support and how to perform a basic, functional fine-tuning process. The tutorial is split into:

Environment Setup
FastAPI Inference Server
Docker Configuration
Google Cloud Run Deployment
Fine-Tuning Pipeline (Cold Start, Reasoning RL, Data Collection, Final RL Phase)


The Novice's LLM Training Guide

->Written by Alpin<-
->Inspired by /hdg/'s LoRA train rentry<-

!!!warning This guide is being slowly updated. We've already moved to the axolotl trainer.

[TOC2]

Purpose

Bootstrap knowledge of LLMs ASAP, with a focus on GPT.

Avoid being a link dump. Try to provide only valuable, well-tuned information.

Prelude

Neural network links before starting with transformers.

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

- You can’t create a good loss function
  - Example: how do you calculate a metric to measure if the model’s output was funny?
- You want to train with production data, but you can’t easily label your production data
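The first scenario can be illustrated with a toy policy-gradient sketch. This is not PPO and not a real RLHF pipeline: the three candidate replies, their "funniness" ratings, and the learning rate are all made-up assumptions. It only shows that a model's output distribution can be improved from scalar ratings alone, without a differentiable loss on the outputs themselves.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, rewards, lr=0.5):
    """One exact gradient-ascent step on the expected reward E_p[r]."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    # For a softmax policy: d E[r] / d logit_i = p_i * (r_i - E[r])
    return [
        logit + lr * p * (r - expected)
        for logit, p, r in zip(logits, probs, rewards)
    ]


# Hypothetical human ratings for three candidate replies (higher = funnier).
rewards = [0.1, 0.9, 0.3]
logits = [0.0, 0.0, 0.0]  # start from a uniform policy
initial_probs = softmax(logits)

for _ in range(200):
    logits = policy_gradient_step(logits, rewards)

final_probs = softmax(logits)
print(initial_probs, final_probs)
```

The ratings are treated as a black box: the update needs only their values, not their gradients, which is why a human judgment such as "was this funny?" can serve as the training signal.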

	Cedille: A large autoregressive French language model
	The Wisdom of Hindsight Makes Language Models Better Instruction Followers
	ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
	Query2doc: Query Expansion with Large Language Models
	The Internal State of an LLM Knows When It's Lying
	Structured information extraction from complex scientific text with fine-tuned large language models
	TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
	Large Language Models Encode Clinical Knowledge
	PoET: A generative model of protein families as sequences-of-sequences
	Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

	#!/usr/bin/env python
	from __future__ import annotations
	import os
	from pathlib import Path
	from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
	import cv2
	import numpy as np
	import torch
	from tqdm import tqdm

	FROM rocm/composable_kernel:ck_ub20.04_rocm5.5_rc4

	RUN mkdir /SD

	# Clone SD
	WORKDIR /SD
	RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
	WORKDIR /SD/stable-diffusion-webui
	RUN git reset --hard 22bcc7be428c94e9408f589966c2040187245d81