
@senstella
senstella / parakeet-nemo-to-mlx.py
Created May 6, 2025 14:04
A simple script to convert NeMo Parakeet weights to MLX.
import torch
from safetensors.torch import save_file
INPUT_NAME = "model_weights.ckpt"
OUTPUT_NAME = "model.safetensors"
state = torch.load(INPUT_NAME, map_location="cpu")
new_state = {}
for key, value in state.items():
@rain-1
rain-1 / llama-home.md
Last active March 1, 2026 16:35
How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

llama is a text prediction model similar to GPT-2 and to the version of GPT-3 that has not been fine-tuned yet. I think it is also possible to run fine-tuned versions (like alpaca or vicuna) with this. Those versions are more focused on answering questions.

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is possible to run Llama 13B with a 6GB graphics card now (e.g. an RTX 2060), thanks to the amazing work involved in llama.cpp. The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.
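
To get a rough idea of how many layers to put on the GPU, here is a minimal sketch that divides the model file size evenly across the layers and sees how many fit in the VRAM you have free. Everything in it is an assumption for illustration, not something measured: the function name `layers_that_fit` is made up, the 8 GB file size is a placeholder for whatever quantized file you actually have, the 1.5 GB reserve for scratch buffers and the desktop is a guess, and real layers are not exactly equal in size. The 40 layers figure is the layer count of the 13B model.

```python
def layers_that_fit(vram_gb, model_size_gb, n_layers, reserve_gb=1.5):
    """Estimate how many transformer layers fit in VRAM.

    Assumes every layer takes an equal share of the model file,
    which is only an approximation.
    """
    per_layer_gb = model_size_gb / n_layers
    usable_gb = vram_gb - reserve_gb  # leave headroom for buffers and the display
    if usable_gb <= 0:
        return 0
    # Round down, and never report more layers than the model has.
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Hypothetical numbers: 6 GB card, 8 GB quantized 13B file, 40 layers.
    print(layers_that_fit(vram_gb=6.0, model_size_gb=8.0, n_layers=40))
```

Treat the result as a starting point for the GPU layer count you give llama.cpp, then lower it if you run out of memory or raise it if there is VRAM to spare.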

  • Clone llama.cpp from git. I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.
@DavidBuchanan314
DavidBuchanan314 / widevine_fixup.py
Last active December 24, 2025 21:09
Patch aarch64 widevine blobs from ChromeOS to work on non-ChromeOS linux, including platforms with 16K page size like Apple Silicon / Asahi Linux
"""
MIT License
Copyright (c) 2023 David Buchanan
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is