Tiago Chaves laurelkeys


Other posts

Lessons from Hash Table Merging

Merging two hash maps seems like an O(N) operation. However, while merging millions of keys, I unexpectedly hit a slowdown of more than 10x. This post explores why some of the most popular libraries fall into this trap and how to fix it. The source code is available here.
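For orientation, here is a minimal sketch of the kind of merge the post is talking about. The excerpt does not state the cause of the slowdown or the fix, so none is claimed here: `merge_counts` is a hypothetical helper, summing values on key collisions is an assumed merge policy, and the up-front `reserve` is only a generic precaution against repeated rehashing.

```rust
use std::collections::HashMap;

/// Merge `src` into `dst`, summing the values of keys present in both.
/// Hypothetical helper for illustration; not taken from the post.
fn merge_counts(dst: &mut HashMap<u64, u64>, src: &HashMap<u64, u64>) {
    // Reserve room for the worst case (no shared keys) so that `dst`
    // does not have to grow repeatedly while the loop runs.
    dst.reserve(src.len());
    for (&key, &value) in src {
        // `entry` performs one lookup whether or not the key exists.
        *dst.entry(key).or_insert(0) += value;
    }
}
```

The loop does one insertion per key of `src`, which is where the O(N) expectation in the post comes from.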

@tahatorabpour
tahatorabpour / README.md
Last active April 18, 2026 21:07
Blender, high performance Multithreaded Exporter (Rift exporter)

Article:

https://lotusspring.substack.com

Disclaimer!

This code was written without the intention of being publicly shared. Not much effort was put into beautification or anything like that: it is one big file that does it all! Some effort is required on your part to make this compile.

Python Disclaimer!

I heavily dislike Python and consider the code wasteful slop. I have very little Python experience, so there are likely much better ways of writing the Python portion. Exercise caution!

@castano
castano / recursive_gaussian.cc
Created June 28, 2025 00:37
Recursive Implementation of the Gaussian Filter Using Truncated Cosine Functions
// Implements "Recursive Implementation of the Gaussian Filter Using Truncated Cosine Functions" by Charalampidis [2016].
// https://discovery.researcher.life/article/recursive-implementation-of-the-gaussian-filter-using-truncated-cosine-functions/dcf24675f5eb30dba93c5205cdae3c40
// This code is based on:
// https://github.com/cloudinary/ssimulacra2/blob/main/src/lib/jxl/gauss_blur.cc
// Copyright (c) the JPEG XL Project Authors. All rights reserved.
struct RecursiveGaussian {
    RecursiveGaussian(float sigma);
    float mul_in[3];
@arch1t3cht
arch1t3cht / video_noob_guide.md
Last active April 24, 2026 07:13
What you NEED to know before touching a video file

What you NEED to Know Before Touching a Video File

Hanging out in subtitling and video re-editing communities, I see my fair share of novice video editors and video encoders, and see plenty of them make the classic beginner mistakes when it comes to working with videos. A man can only read "Use Handbrake to convert your mkv to an mp4 :)" so many times before losing it, so I am writing this article to channel the resulting psychic damage into something productive.

If you are new to working with videos (or, let's face it, even if you aren't), please read through this guide to avoid making mistakes that can cost you lots of time, computing power, storage space, or video quality.

const State = struct {
    clowns: StringHashMap(Clown) = .empty,
    const Clown = struct {
        scariness: f32,
        funniness: f32,
    };
    fn deinit(state: *State, gpa: Allocator) void {
        var it = state.clowns.iterator();
@castano
castano / SparkTextureOutput.md
Last active April 22, 2026 14:02
Writing to block compressed textures across different graphics APIs

Writing to Compressed Textures

In general it's not possible to use a block-compressed texture as a render target or as a compute shader output. Instead you have to either alias the block-compressed texture with an uncompressed texture where each texel corresponds to a block, or output the compressed blocks to an uncompressed texture buffer and then copy the compressed blocks from that intermediate memory location to the final compressed texture.
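To make the "each texel corresponds to a block" relationship concrete, here is a small sketch of the size arithmetic. It assumes 4x4 texel blocks, as in the BC formats, with 8 bytes per block for BC1/BC4 and 16 bytes per block for BC2, BC3, BC5, BC6H and BC7. The function names are hypothetical and do not come from any graphics API.

```rust
/// Width and height of a block in texels (BC formats use 4x4 blocks).
const BLOCK_DIM: u32 = 4;

/// Number of blocks needed to cover `texels` along one axis, rounding up.
fn blocks_along(texels: u32) -> u32 {
    (texels + BLOCK_DIM - 1) / BLOCK_DIM
}

/// Extent of the uncompressed alias texture: one texel per compressed block.
fn aliased_extent(width: u32, height: u32) -> (u32, u32) {
    (blocks_along(width), blocks_along(height))
}

/// Bytes needed by an intermediate buffer that holds every block of one mip.
fn intermediate_buffer_bytes(width: u32, height: u32, bytes_per_block: u32) -> u64 {
    let (bw, bh) = aliased_extent(width, height);
    u64::from(bw) * u64::from(bh) * u64::from(bytes_per_block)
}
```

For a 1024x512 BC1 texture this gives a 256x128 alias whose texels are 8 bytes each, so a two-channel 32-bit format is a natural fit; a 16-byte format such as BC7 would pair with a four-channel 32-bit format.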

Each of the graphics APIs exposes this functionality in a different way. This document explains the options available under the following APIs:

  1. Every atomic object has a timeline (TL) of writes:

    • A write is either a store or a read-modify-write (RMW): an RMW reads the latest write and pushes a new one.
    • A write is either tagged Relaxed, Release, or SeqCst.
    • A read observes some write on the timeline:
      • On the same thread, future reads can't go backwards on the timeline.
      • A read is either tagged Relaxed, Acquire, or SeqCst.
      • RMWs can also be tagged Acquire (or AcqRel). If so, the Acquire refers to the "read" portion of "RMW".
  2. Each thread has its own view of the world:

    • Shared write timelines but each thread could be reading at different points.
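The timeline model above can be illustrated with the classic message-passing pattern. This is a minimal sketch using Rust's `std::sync::atomic`; the names `data`, `ready` and `message_passing` are illustrative and are not taken from the notes.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// One thread publishes a value, the calling thread waits for it.
fn message_passing() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let data = Arc::clone(&data);
        let ready = Arc::clone(&ready);
        thread::spawn(move || {
            // Pushes a Relaxed write onto the timeline of `data`.
            data.store(42, Ordering::Relaxed);
            // The Release write to `ready` publishes everything this
            // thread wrote before it, including the store to `data`.
            ready.store(true, Ordering::Release);
        })
    };

    // An Acquire read that observes the Release write moves this thread's
    // view of `data` forward to at least the write of 42.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    // Relaxed is enough here: this read cannot go backwards on the timeline.
    let seen = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}
```

If both accesses to `ready` were Relaxed, the consumer could see `ready == true` while its view of `data` was still at the initial write of 0.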
@Marc-B-Reynolds
Marc-B-Reynolds / test.md.html
Last active November 13, 2024 04:06
Single file: markdeep + plotly that mostly works

Header

Some bold text before an inline function: $y = x^2$

NOTE: The gist preview is completely whacked. Click on raw for the source.

@raphlinus
raphlinus / simd_reduce_test.rs
Last active April 12, 2025 20:33
Comparison of scalar and SIMD max reduction
// run with `RUSTFLAGS='-C target-cpu=native' cargo +nightly bench`
#![feature(test)]
fn main() {
    let mut a = [0u32; 65536];
    a[1] = 42;
    println!("{}", scalar_max(&a));
    println!("{}", avx2_max(&a));
}
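The preview is cut off before `scalar_max` and `avx2_max` are defined. As a rough idea of what is being compared, here is a sketch of a scalar max reduction and a portable eight-lane variant that imitates the shape of a SIMD reduction without intrinsics. Both functions are hypothetical stand-ins, not the gist's implementations, and returning 0 for an empty slice is an assumption.

```rust
/// Straightforward scalar reduction; returns 0 for an empty slice.
fn scalar_max_sketch(xs: &[u32]) -> u32 {
    xs.iter().copied().max().unwrap_or(0)
}

/// Keeps eight independent running maxima, one per lane, then reduces
/// them at the end. This mirrors how a vector register of 8 x u32 is used.
fn lane_max_sketch(xs: &[u32]) -> u32 {
    const LANES: usize = 8;
    let mut acc = [0u32; LANES];
    let mut chunks = xs.chunks_exact(LANES);
    for chunk in &mut chunks {
        for i in 0..LANES {
            acc[i] = acc[i].max(chunk[i]);
        }
    }
    // Elements that did not fill a whole chunk are handled separately.
    let tail = chunks.remainder().iter().copied().max().unwrap_or(0);
    acc.iter().copied().max().unwrap_or(0).max(tail)
}
```

The lane version breaks the single dependency chain of the scalar loop into eight shorter ones, which is the property a real AVX2 implementation exploits.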
@Ipotrick
Ipotrick / Efficient GPU Work Expansion.md
Last active July 11, 2025 16:13
Efficient GPU Work Expansion

What is "Work Expansion"

In a GPU-driven renderer, "work expansion" is a commonly occurring problem. "Work Expansion" means that a single item of work spawns N following work items. Typically one work item will be executed by one shader thread/invocation.

An example of work expansion is GPU-driven meshlet culling following mesh culling. In this example a "work item" is culling a mesh, and each mesh cull work item spawns N following meshlet cull work items.

There are many diverse cases of this problem and many solutions. Some are trivial to solve, for example, when N (how many work items are spawned) is fixed.
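When N is fixed, destination item `d` simply belongs to source item `d / N`. For variable N, one commonly used approach is an exclusive prefix sum over the per-item counts followed by a binary search per destination item. The sketch below models that on the CPU; it is an illustration under these assumptions, not necessarily one of the solutions this article goes on to describe, and all function names are hypothetical.

```rust
/// Exclusive prefix sum of per-item expansion counts.
/// Returns the start offset of each source item and the total item count.
fn exclusive_prefix_sum(counts: &[u32]) -> (Vec<u32>, u32) {
    let mut offsets = Vec::with_capacity(counts.len());
    let mut total = 0u32;
    for &c in counts {
        offsets.push(total);
        total += c;
    }
    (offsets, total)
}

/// Maps a flat destination index to (source item, index within that item).
/// Requires `dst` to be smaller than the total returned above.
fn find_source(offsets: &[u32], dst: u32) -> (usize, u32) {
    // The owner is the last source whose start offset is <= dst.
    // Sources with a count of zero are skipped automatically.
    let src = offsets.partition_point(|&o| o <= dst) - 1;
    (src, dst - offsets[src])
}

/// What each destination thread would compute, gathered into a list.
fn expand(counts: &[u32]) -> Vec<(usize, u32)> {
    let (offsets, total) = exclusive_prefix_sum(counts);
    (0..total).map(|dst| find_source(&offsets, dst)).collect()
}
```

For example, `expand(&[2, 0, 3])` yields two items for source 0, none for source 1 and three for source 2. On a GPU the prefix sum would be its own pass and each destination thread would run the search independently.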