Anil Keshwani anilkeshwani

"""
The most atomic way to train and run inference for a GPT in pure, dependency-free Python.
This file is the complete algorithm.
Everything else is just efficiency.
@karpathy
"""
import os # os.path.exists
import math # math.log, math.exp
@karpathy
karpathy / add_to_zshrc.sh
Created August 25, 2024 20:43
Git Commit Message AI
# -----------------------------------------------------------------------------
# AI-powered Git Commit Function
# Copy paste this gist into your ~/.bashrc or ~/.zshrc to gain the `gcm` command. It:
# 1) gets the current staged changed diff
# 2) sends them to an LLM to write the git commit message
# 3) allows you to easily accept, edit, regenerate, cancel
# But - just read and edit the code however you like
# the `llm` CLI util is awesome, can get it here: https://llm.datasette.io/en/stable/
gcm() {
@devinschumacher
devinschumacher / cloud-gpus.md
Last active April 26, 2026 01:38
Cloud GPU Hosting // The Best Servers, Services & Providers [RANKED!]
title: The Best Cloud GPU Providers for Artificial Intelligence & Machine Learning
tags: cloud gpu providers, cloud gpu, artificial intelligence

Cloud GPUs: Servers, Providers & Everything You Would Ever Need

@Narsil
Narsil / pure_torch.py
Created November 10, 2022 15:06
Loading a safetensors file with pure torch only
import mmap
import torch
import json
import os
from huggingface_hub import hf_hub_download
def load_file(filename, device):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as m:
@zrruziev
zrruziev / NUMA node problem.md
Last active July 7, 2025 20:48
Fixing "successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero" problem

What is NUMA (Non-Uniform Memory Access)

Non-Uniform Memory Access (NUMA) is a memory design used in multiprocessor systems in which the time to access memory depends on where the memory sits relative to the processor. A processor accesses its local memory (memory connected to that processor) faster than remote memory (memory connected to another processor). In other words, it is a technique for making memory access more efficient when several processors share one motherboard. When one processor is accessing memory, it monopolizes the bus, so the other processors have to sit idle. To avoid this, memory is divided among the processors, each region is designated 'access only here', and such a region is called a NUMA node.

1. Check Nodes

lspci | grep -i nvidia
  
01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 12GB] (rev a1)
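
As a rough sketch of how the `lspci` output above can be used, the snippet below pulls the PCI address (`01:00.0`) out of each NVIDIA line and builds the matching sysfs path. It assumes the usual Linux layout `/sys/bus/pci/devices/<domain>:<bus>:<device>.<function>/numa_node` and the default PCI domain `0000`; the helper names `nvidia_numa_paths` and `read_numa_node` are made up for illustration.

```python
import re
from pathlib import Path

# Matches the leading "bus:device.function" address that lspci prints, e.g. "01:00.0".
PCI_ADDR = re.compile(r"^([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s")


def nvidia_numa_paths(lspci_output, domain="0000"):
    """Return sysfs numa_node paths for NVIDIA devices listed in lspci output.

    Assumes the default PCI domain "0000" (lspci omits it unless run with -D).
    """
    paths = []
    for line in lspci_output.splitlines():
        if "nvidia" not in line.lower():
            continue
        match = PCI_ADDR.match(line.strip())
        if match:
            paths.append(f"/sys/bus/pci/devices/{domain}:{match.group(1)}/numa_node")
    return paths


def read_numa_node(path):
    """Read the NUMA node recorded for a device; None if the file is missing."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


sample = "01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 12GB] (rev a1)"
for path in nvidia_numa_paths(sample):
    print(path, read_numa_node(path))
```

A value of `-1` in that file is the negative value the warning in the title refers to.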
@vzhong
vzhong / download_paper.py
Last active October 25, 2024 09:41
Download paper to Obsidian MD
#!/usr/bin/env python
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
import os
import re
import pathlib
import arxiv
import openreview
import urllib.request
@Guitaricet
Guitaricet / reproducibility.md
Last active September 2, 2025 08:26
Notes on reproducibility in PyTorch

Reproducibility

ML experiments can be very hard to reproduce. You have many hyperparameters, different dataset splits, different ways to preprocess your data, bugs, etc. Ideally, for every single model run you should log the data split (already preprocessed), all hyperparameters (including learning rate scheduling), the initial state of your model and optimizer, the random seeds used for initialization and dataset shuffling, and all of your code. Your GPU should also be in deterministic mode (which is not the default mode). This is a very hard task. A different random seed can significantly change your metrics, and even GPU-induced randomness can be important. We're not solving all of these problems, but we need to address at least what we can handle.

For every result you report in the paper you need (at least) to:

  1. Track your model and optimizer hyperparameters (including learning rate schedule)
  2. Save final model parameters
  3. Report all of the parameters in the pap
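
One possible way to cover the seeding and hyperparameter-tracking points above is sketched below. It is an illustration only, not the method from these notes: the helper names `seed_everything` and `log_run`, the JSON layout, and the example hyperparameters are all assumptions. NumPy and PyTorch are treated as optional, so the sketch still runs with the standard library alone.

```python
import json
import os
import random
import tempfile


def seed_everything(seed):
    """Seed the RNGs that are available; NumPy and PyTorch are optional."""
    random.seed(seed)
    # Only affects subprocesses started afterwards, not the current interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
        torch.backends.cudnn.benchmark = False
        # Ops without a deterministic implementation will raise when called.
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass


def log_run(path, hparams, seed):
    """Write hyperparameters and seed to a JSON file (hypothetical layout)."""
    with open(path, "w", encoding="utf8") as f:
        json.dump({"seed": seed, "hparams": hparams}, f, indent=2, sort_keys=True)


# Example values only; replace with the settings of the actual run.
hparams = {"lr": 3e-4, "lr_schedule": "cosine", "batch_size": 32}
seed_everything(0)
log_run(os.path.join(tempfile.gettempdir(), "run_config.json"), hparams, seed=0)
```

Seeding alone does not guarantee identical results across different hardware or library versions, so it is still worth recording those alongside the config.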

C++ OOPS Concepts

The main aim of OOP is to bind together data and the functions that operate on it, so that no other part of the code can access the data except those functions.

Characteristics of an Object Oriented Programming language
