Anil Keshwani anilkeshwani

@anilkeshwani
anilkeshwani / diff_jsons.sh
Created December 23, 2025 12:58
Bash: Diff JSONs after normalization with jq
#!/usr/bin/env bash
set -euo pipefail
files=(
"./path/to/yourfile.json"
".another/path/to/another/file.json"
"./toplevelfile.json"
)
for f in "${files[@]}"; do
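The preview above is cut off inside the loop, so the normalization step itself is not visible. As a rough sketch of the same idea in Python rather than Bash and jq, each document can be re-serialized with sorted keys and fixed indentation before diffing. The function names `normalize_json` and `diff_normalized` are hypothetical and not taken from the gist.

```python
import difflib
import json


def normalize_json(text: str) -> list[str]:
    """Parse JSON text and re-serialize it with sorted keys and 2-space indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


def diff_normalized(left: str, right: str) -> list[str]:
    """Return a unified diff between the normalized forms of two JSON documents."""
    return list(
        difflib.unified_diff(
            normalize_json(left),
            normalize_json(right),
            fromfile="left",
            tofile="right",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Same keys in a different order, but one value differs.
    first = '{"b": 1, "a": [1, 2]}'
    second = '{"a": [1, 2], "b": 2}'
    print("\n".join(diff_normalized(first, second)))
```

Because key order is normalized first, two documents that differ only in key order produce an empty diff.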
import random
import matplotlib.pyplot as plt
def choose_from(size: int) -> int:
    start, end = 0, size
    dst = size
    while dst > 1:
        if random.getrandbits(1):
            start += dst // 2  # choose rhs
        else:
anilkeshwani / extended_euclidean_algorithm.py
Created July 8, 2025 16:03
Euclidean Algorithm to find the GCD and Extended Euclidean Algorithm to find the GCD and Bézout coefficients
def euclidean_algorithm(a: int, b: int, verbose: bool = False) -> int:
    if a < b:
        a, b = b, a
    while True:
        a, b = b, a % b
        if not b:
            return a
        if verbose:
            print(f"{(a, b) = }")
anilkeshwani / tmux_slurm_launcher_example.sh
Created May 23, 2025 13:35
Example Bash script to launch Slurm jobs in new tmux windows (detached from session) with Python
#!/usr/bin/env bash
for i in $(seq -f "%03g" 86 108); do
file="/mnt/scratch-artemis/anilkeshwani/mls/shards/mls-transcripts_uroman_${i}.jsonl"
echo "Launching ${file}"
tmux new-window -d -t mls -n "align_and_hubert_mls_$i" -- bash -c "
srun --partition a6000 --time=04:00:00 --qos=gpu-long --gres=gpu:1 \
/mnt/scratch-artemis/anilkeshwani/miniconda3/envs/sardalign/bin/python /mnt/scratch-artemis/anilkeshwani/speech-text-alignment/scripts/align_and_hubert_encode.py \
--jsonl \"${file}\" \
--ids-not-paths \
anilkeshwani / compare_defaultdict_regular_dict_benchmark_tokenization.py
Created March 18, 2025 09:20
Benchmark performance of defaultdict vs regular dict with explicit key check for tokenization use case
import random
import string
import timeit
from collections import defaultdict
# Generate a random list of words (simulating a corpus)
random.seed(42)
words = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(100000)]
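The preview stops after the corpus is generated, so the benchmark itself is not shown. One plausible version, assuming the workload is counting token frequencies, might compare the two approaches as follows. The function names `count_with_defaultdict` and `count_with_dict`, the smaller corpus size, and the repetition count are all assumptions.

```python
import random
import string
import timeit
from collections import defaultdict

random.seed(42)
# Smaller corpus than the gist's 100000 words, so the sketch runs quickly
words = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(10_000)]


def count_with_defaultdict(tokens):
    """Count tokens, letting defaultdict create missing keys with value 0."""
    counts = defaultdict(int)
    for token in tokens:
        counts[token] += 1
    return counts


def count_with_dict(tokens):
    """Count tokens with a regular dict and an explicit key check."""
    counts = {}
    for token in tokens:
        if token in counts:
            counts[token] += 1
        else:
            counts[token] = 1
    return counts


if __name__ == "__main__":
    for fn in (count_with_defaultdict, count_with_dict):
        elapsed = timeit.timeit(lambda: fn(words), number=20)
        print(f"{fn.__name__}: {elapsed:.4f}s for 20 runs")
```

Timings depend on the machine and Python version, so no expected figures are given here.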
anilkeshwani / get_pytorch_environment_info.py
Created March 11, 2025 13:06
Query PyTorch, CUDA and NVIDIA environment information
import platform
import subprocess
import sys
import numpy as np
import torch
# PyTorch info
print(f"PyTorch version: {torch.__version__}")
anilkeshwani / show_hf_llama_3_2_3B_ckpt_structure.py
Last active November 26, 2024 16:03
Snippet showing the Llama 3.2 3B checkpoint structure, as an example of how models are split by tensor into multiple files when checkpoints are saved to Hugging Face repos (each file stays under 5GB, even though this is not a hard limit)
#!/usr/bin/env python
"""
See: https://huggingface.co/docs/safetensors/en/index
"""
from pathlib import Path
from pprint import pp
from time import perf_counter
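The rest of the snippet is not visible in this preview. As an illustration of the splitting the description refers to, sharded Hugging Face checkpoints ship a `model.safetensors.index.json` file whose `weight_map` maps each tensor name to the shard file that stores it. The sketch below groups tensors by shard using only the standard library. The function name `tensors_by_shard` and the tiny example index (tensor names, file names, and size) are made up for illustration and are not the real Llama 3.2 3B index.

```python
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict[str, list[str]]:
    """Group tensor names by the shard file listed in an index's weight_map."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


if __name__ == "__main__":
    # Hypothetical, heavily shortened index; real files list every tensor.
    example_index = {
        "metadata": {"total_size": 123},
        "weight_map": {
            "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
            "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
            "model.norm.weight": "model-00002-of-00002.safetensors",
        },
    }
    for shard, names in tensors_by_shard(example_index).items():
        print(shard, len(names), "tensors")
```

With a real checkpoint, the index would be loaded with `json.load` from the downloaded `model.safetensors.index.json` instead of being written inline.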
anilkeshwani / quartz_github_workflows_deploy.yml
Created October 25, 2024 12:43
quartz/.github/workflows/deploy.yml - 2024-10-25 - Source: https://quartz.jzhao.xyz/hosting
name: Deploy Quartz site to GitHub Pages
on:
  push:
    branches:
      - v4
permissions:
  contents: read
  pages: write
anilkeshwani / salted.py
Last active October 19, 2024 21:26
Decoding output from hexdump (hexadecimal integers) and converting to binary
#!/usr/bin/env python
hexstr = "53 61 6c 74 65 64 5f 5f" # ef bf bd ef bf bd ef bf
hexstr = hexstr.replace(" ", "")
print(len(str(hexstr)))
print(bytes.fromhex(hexstr).decode("utf-8"))
decimal = int(hexstr, 16)
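The preview ends before the conversion to binary mentioned in the description. A minimal sketch of the full round trip, assuming the goal is a per-byte binary string, could look like the following. The helper name `decode_hexdump` and its return shape are hypothetical.

```python
def decode_hexdump(hexstr: str) -> tuple[bytes, str, str]:
    """Decode space-separated hex bytes into raw bytes, UTF-8 text and binary digits."""
    raw = bytes.fromhex(hexstr)  # fromhex skips the spaces between byte pairs
    text = raw.decode("utf-8")
    bits = " ".join(f"{byte:08b}" for byte in raw)  # one 8-bit group per byte
    return raw, text, bits


if __name__ == "__main__":
    raw, text, bits = decode_hexdump("53 61 6c 74 65 64 5f 5f")
    print(text)
    print(bits)
    # The integer form matches int(hexstr, 16) computed on the joined hex string
    print(int.from_bytes(raw, "big"))
```

Only the first eight bytes are decoded here; the bytes in the trailing comment of the original (`ef bf bd ...`) are the UTF-8 encoding of the replacement character U+FFFD.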