Merging two hash maps seems like an O(N) operation. However, while merging millions of keys, I unexpectedly hit a performance degradation of more than 10x. This post explores why some of the most popular libraries fall into this trap and how to fix it. The source code is available here.
https://lotusspring.substack.com
This code was written without the intention of being publicly shared. Not much effort was put into beautification or anything like that: it is one big file that does it all! Some effort is required on your part to make this compile.
I heavily dislike Python and consider the code wasteful slop. I have very little Python experience, so there are likely much better ways of writing the Python portion. Exercise caution!
// Implements "Recursive Implementation of the Gaussian Filter Using Truncated Cosine Functions" by Charalampidis [2016].
// https://discovery.researcher.life/article/recursive-implementation-of-the-gaussian-filter-using-truncated-cosine-functions/dcf24675f5eb30dba93c5205cdae3c40
// This code is based on:
// https://github.com/cloudinary/ssimulacra2/blob/main/src/lib/jxl/gauss_blur.cc
// Copyright (c) the JPEG XL Project Authors. All rights reserved.
struct RecursiveGaussian {
    RecursiveGaussian(float sigma);
    float mul_in[3];
Hanging out in subtitling and video re-editing communities, I see my fair share of novice video editors and video encoders, and see plenty of them make the classic beginner mistakes when it comes to working with videos. A man can only read "Use Handbrake to convert your mkv to an mp4 :)" so many times before losing it, so I am writing this article to channel the resulting psychic damage into something productive.
If you are new to working with videos (or, let's face it, even if you aren't), please read through this guide to avoid making mistakes that can cost you lots of time, computing power, storage space, or video quality.
const State = struct {
    clowns: StringHashMap(Clown) = .empty,
    const Clown = struct {
        scariness: f32,
        funniness: f32,
    };
    fn deinit(state: *State, gpa: Allocator) void {
        var it = state.clowns.iterator();
In general it's not possible to use a block-compressed texture as a render target or as a compute shader output. Instead you have to do one of two things: alias the block-compressed texture with an uncompressed texture where each texel corresponds to a block, or output the compressed blocks to an uncompressed texture buffer and then copy them from that intermediate memory location to the final compressed texture.
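To make the aliasing option concrete, here is a minimal sketch of the size arithmetic behind "each texel corresponds to a block". It assumes the BCn family, where every block covers 4x4 texels and occupies 8 bytes (BC1) or 16 bytes (BC3, BC7). The `BlockFormat` enum and `alias_extent` function are hypothetical names for illustration, not part of any graphics API.

```rust
/// A few block-compressed formats, for illustration only.
/// All BCn formats encode 4x4 texel blocks.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BlockFormat {
    Bc1, // 8 bytes per block
    Bc3, // 16 bytes per block
    Bc7, // 16 bytes per block
}

impl BlockFormat {
    /// Size in bytes of one compressed 4x4 block.
    fn bytes_per_block(self) -> u32 {
        match self {
            BlockFormat::Bc1 => 8,
            BlockFormat::Bc3 | BlockFormat::Bc7 => 16,
        }
    }
}

/// Returns (width, height, bytes_per_texel) of an uncompressed texture
/// in which one texel holds exactly one compressed block.
/// Hypothetical helper: the dimensions are rounded up to whole blocks.
fn alias_extent(format: BlockFormat, width: u32, height: u32) -> (u32, u32, u32) {
    const BLOCK_DIM: u32 = 4;
    let blocks_x = (width + BLOCK_DIM - 1) / BLOCK_DIM;
    let blocks_y = (height + BLOCK_DIM - 1) / BLOCK_DIM;
    (blocks_x, blocks_y, format.bytes_per_block())
}
```

Under these assumptions, `alias_extent(BlockFormat::Bc7, 1024, 512)` describes a 256x128 texture with 16-byte texels, which would fit a four-channel 32-bit integer format, while BC1 needs only 8-byte texels. How the alias is actually created differs per API.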
Each of the graphics APIs exposes this functionality in a different way. This document explains the options available under the following APIs:
- Every atomic object has a timeline (TL) of writes:
  - A write is either a store or a read-modify-write (RMW): it read latest write & pushed new one.
  - A write is either tagged Relaxed, Release, or SeqCst.
  - A read observes some write on the timeline:
    - On the same thread, future reads can't go backwards on the timeline.
  - A read is either tagged Relaxed, Acquire, or SeqCst.
  - RMWs can also be tagged Acquire (or AcqRel). If so, the Acquire refers to the "read" portion of "RMW".
- Each thread has its own view of the world:
  - Shared write timelines but each thread could be reading at different points.
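As a rough illustration of these tags, here is a minimal Rust sketch of the classic message-passing pattern. It relies on the standard guarantee, which the list above does not spell out, that an Acquire read observing a Release write also sees everything the writing thread wrote before it. The function name `message_passing` and the variables `data` and `ready` are made up for this example.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical demo: one thread publishes a value, another waits for it.
fn message_passing() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            // Relaxed store: pushes a write onto `data`'s timeline.
            data.store(42, Ordering::Relaxed);
            // Release store: publishes the writes made before it.
            ready.store(true, Ordering::Release);
        })
    };

    let consumer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            // Acquire load: spin until the Release write is observed.
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            // This thread's view of `data` now includes the store of 42.
            data.load(Ordering::Relaxed)
        })
    };

    producer.join().unwrap();
    consumer.join().unwrap()
}
```

If both operations on `ready` were Relaxed instead, the consumer could still see `ready == true` while reading `data` at an earlier point on its timeline, which is the "each thread could be reading at different points" case.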
// run with `RUSTFLAGS='-C target-cpu=native' cargo +nightly bench`
#![feature(test)]
fn main() {
    let mut a = [0u32; 65536];
    a[1] = 42;
    println!("{}", scalar_max(&a));
    println!("{}", avx2_max(&a));
}
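The excerpt calls `scalar_max` without showing its definition. As a guess at what such a baseline might look like, here is a minimal sketch. It assumes the function returns the largest element of a `u32` slice and returns 0 for an empty slice; both are assumptions for illustration, not the author's code.

```rust
/// Hypothetical scalar baseline: a plain loop over the slice.
/// Returning 0 for an empty slice is an assumption made for this sketch.
fn scalar_max(values: &[u32]) -> u32 {
    let mut max = 0u32;
    for &v in values {
        if v > max {
            max = v;
        }
    }
    max
}
```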
In a GPU-driven renderer, "work expansion" is a commonly occurring problem. "Work expansion" means that a single item of work spawns N following work items. Typically one work item will be executed by one shader thread/invocation.
An example of work expansion is GPU-driven meshlet culling following mesh culling.
In this example a "work item" is culling a mesh, where each mesh cull work item spawns N following meshlet cull work items.
There are many diverse cases of this problem and many solutions. Some are trivial to solve, for example, when N (how many work items are spawned) is fixed.
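The index arithmetic can be sketched on the CPU in Rust. This is an illustration, not shader code. The fixed-N case is a division and a remainder. For variable N, the sketch uses an exclusive prefix sum plus a binary search, which is one common approach and not necessarily the one this article goes on to describe. All function names here are hypothetical.

```rust
/// Fixed expansion: every source item spawns exactly `n` items.
/// Maps a flat destination index to (source item, local index).
fn expand_fixed(dst_index: u32, n: u32) -> (u32, u32) {
    (dst_index / n, dst_index % n)
}

/// Exclusive prefix sum: offsets[i] is the first destination index
/// that belongs to source item i.
fn exclusive_prefix_sum(counts: &[u32]) -> Vec<u32> {
    let mut offsets = Vec::with_capacity(counts.len());
    let mut running = 0u32;
    for &count in counts {
        offsets.push(running);
        running += count;
    }
    offsets
}

/// Variable expansion: binary-search the offsets to find which source
/// item a destination index belongs to. Returns None when the index is
/// past the end of the expanded work list.
fn expand_variable(dst_index: u32, offsets: &[u32], counts: &[u32]) -> Option<(u32, u32)> {
    // Number of offsets <= dst_index; the last of them is the owner.
    let src = offsets.partition_point(|&o| o <= dst_index).checked_sub(1)?;
    let local = dst_index - offsets[src];
    if local < counts[src] {
        Some((src as u32, local))
    } else {
        None
    }
}
```

With counts `[3, 0, 2]` the offsets are `[0, 3, 3]`, so destination index 3 maps to source item 2, local index 0. The item with zero children is skipped because the search picks the last matching offset.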