# The Best Cloud GPU Providers for Artificial Intelligence & Machine Learning
| """ | |
| The most atomic way to train and run inference for a GPT in pure, dependency-free Python. | |
| This file is the complete algorithm. | |
| Everything else is just efficiency. | |
| @karpathy | |
| """ | |
| import os # os.path.exists | |
| import math # math.log, math.exp |
```bash
# -----------------------------------------------------------------------------
# AI-powered Git Commit Function
# Copy paste this gist into your ~/.bashrc or ~/.zshrc to gain the `gcm` command. It:
# 1) gets the diff of the currently staged changes
# 2) sends it to an LLM to write the git commit message
# 3) allows you to easily accept, edit, regenerate, cancel
# But - just read and edit the code however you like
# the `llm` CLI util is awesome, you can get it here: https://llm.datasette.io/en/stable/
gcm() {
  # Minimal sketch of the body (the full gist also lets you edit/regenerate):
  local msg
  msg=$(git diff --cached | llm "Write a concise, one-line git commit message for this diff")
  echo "Proposed commit message: $msg"
  printf "Commit? [y/N] "
  read -r ans
  [[ "$ans" == [yY] ]] && git commit -m "$msg"
}
```
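Usage: stage your changes with `git add`, then run `gcm`; with the sketch above, nothing is committed until you confirm the generated message.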
```python
import mmap
import torch
import json
import os
from huggingface_hub import hf_hub_download

def load_file(filename, device):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as m:
            # safetensors layout: the first 8 bytes are a little-endian u64
            # giving the size of the JSON header that follows.
            header_size = int.from_bytes(m.read(8), "little")
            metadata = json.loads(m.read(header_size))
    # A full loader would next map each tensor's byte range onto `device`;
    # this sketch stops at the parsed header.
    return metadata
```
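A hedged usage example tying the pieces together: `hf_hub_download` (imported above) fetches a file from the Hub, and the parsed header lists the tensors in the file. The repo id below is a placeholder, not a real model:

```python
# Placeholder repo id, for illustration only.
filename = hf_hub_download(repo_id="some-org/some-model", filename="model.safetensors")
header = load_file(filename, device="cpu")
print(list(header)[:5])  # the first few tensor names from the header
```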
Non-Uniform Memory Access (NUMA) is a memory design for multiprocessor systems in which the time to access memory depends on the memory's position relative to the processor. In a NUMA architecture, a processor accesses its local memory (memory attached to that processor itself) faster than remote memory (memory attached to another processor). In other words, it is a technique for keeping memory access efficient while using multiple processors on one motherboard: on a uniform-access design, a processor that saturates the shared memory bus forces the other processors to wait, so NUMA instead gives each processor its own memory region to access first. A processor together with its designated local memory is called a NUMA node.
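On Linux you can inspect the NUMA layout directly from sysfs; a small sketch (assumes a Linux machine that exposes `/sys/devices/system/node`):

```python
import glob
import os

# Each /sys/devices/system/node/nodeN directory describes one NUMA node:
# a set of CPUs plus the memory attached to them.
for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node, "cpulist")) as f:
        print(os.path.basename(node), "cpus:", f.read().strip())
```

The same layout is what `numactl --hardware` prints.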
```bash
lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 12GB] (rev a1)
```
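If the card shows up in `lspci`, you can also confirm that PyTorch can see it (a quick check, assuming a CUDA-enabled PyTorch install):

```python
import torch

if torch.cuda.is_available():
    # Device 0 is the first visible GPU, e.g. the RTX 2060 above.
    print(torch.cuda.get_device_name(0))
else:
    print("No CUDA device visible to PyTorch")
```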
```python
#!/usr/bin/env python
# Imports for a script that fetches papers from arXiv and OpenReview.
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
import os
import re
import pathlib
import arxiv
import openreview
import urllib.request
```
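Given these imports, a small hedged sketch of the arXiv fetch step using the `arxiv` package (the paper id is just a familiar example):

```python
import arxiv

# Fetch one paper's metadata and PDF by arXiv id ("Attention Is All You Need").
search = arxiv.Search(id_list=["1706.03762"])
paper = next(arxiv.Client().results(search))
print(paper.title)
paper.download_pdf(dirpath=".", filename="1706.03762.pdf")
```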
ML experiments can be very hard to reproduce: you have many hyperparameters, different dataset splits, different ways to preprocess your data, bugs, and so on. Ideally, you would log the data split (already preprocessed), all hyperparameters (including the learning rate schedule), the initial state of your model and optimizer, the random seeds used for initialization and dataset shuffling, and all of your code; your GPU should also run in deterministic mode (which is not the default). And that is for every single run, which is a very hard task. A different random seed can significantly change your metrics, and even GPU-induced randomness can matter. We are not solving all of these problems here, but we should at least address what we can handle; a minimal PyTorch sketch follows the checklist below.
For every result you report in the paper you need (at least) to:
- Track your model and optimizer hyperparameters (including learning rate schedule)
- Save final model parameters
- Report all of the parameters in the paper
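The sketch below seeds every RNG, switches PyTorch into deterministic mode, and saves the run artifacts; the function names (`set_seed`, `make_deterministic`, `save_run`) and the checkpoint layout are illustrative assumptions, not a fixed convention:

```python
import json
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed every RNG that affects a training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices


def make_deterministic(seed: int = 42) -> None:
    set_seed(seed)
    # Deterministic GPU mode is not the default and can slow training down.
    # Some CUDA ops also need CUBLAS_WORKSPACE_CONFIG=":4096:8" in the environment.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False


def save_run(model, optimizer, hparams, path="run.pt"):
    """Save everything needed to reproduce (and report) the result."""
    torch.save(
        {
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "hparams": hparams,  # include the learning rate schedule settings
        },
        path,
    )
    # A plain-text copy of the hyperparameters, for the paper and for diffing.
    with open(path + ".hparams.json", "w") as f:
        json.dump(hparams, f, indent=2)
```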
