This is an implementation of a modified version of the QOA codec, designed for use in Udon for VRChat. You can learn about QOA at https://qoaformat.org/
VRChat cannot decode audio compressed with standard codecs at runtime when the data is loaded via methods such as StringLoader; this project makes runtime decoding possible.
It differs from the standard implementation in a few ways: the encoding parameters are hard-coded, all headers have been stripped, and all data is encoded little-endian instead of big-endian for improved decode performance.
To modify the existing QOA encoder at https://github.com/phoboslab/qoa/blob/master/qoa.h to be compatible with this decoder implementation, you must:
- Modify qoa_write_u64 to write little-endian instead of big-endian.
- Remove the call to qoa_write_u64 inside qoa_encode_frame that writes the frame header, as commented in the code.
- Remove the call to qoa_encode_header from qoa_encode, and replace its assignment to p to just be a constant 0.
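The first change in the list above could look roughly like the following. This is a minimal sketch, not the reference code: `write_u64_le` is a hypothetical name, and it assumes the same calling convention as `qoa_write_u64` (a value, a byte buffer, and a pointer to the write cursor).

```c
#include <stdint.h>

/* Hypothetical little-endian counterpart of qoa_write_u64: stores v with its
   least significant byte first, then advances the write cursor *p by 8.
   The parameter layout is an assumption modelled on the reference encoder. */
static void write_u64_le(uint64_t v, unsigned char *bytes, unsigned int *p) {
    bytes += *p;
    *p += 8;
    for (int i = 0; i < 8; i++) {
        /* Byte i holds bits [8*i, 8*i + 7] of v. */
        bytes[i] = (unsigned char)((v >> (8 * i)) & 0xff);
    }
}
```

Writing `0x0102030405060708` this way yields the bytes `08 07 06 05 04 03 02 01`, which is the reverse of what the big-endian original produces.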
Editing the read side is not documented here, but it needs a similar set of changes and replacements to fill in the missing pieces.
With these changes, the encoder produces a format compatible with the attached decoder. You can also post-process an existing file to apply the same changes: the offsets are all predictable because the encoding follows a consistent structure, so converting after the fact is not difficult.
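As a rough illustration of the post-processing route, the sketch below converts a standard QOA file held in memory into the headerless little-endian layout. It assumes the layout from the QOA specification (an 8-byte file header, then frames that each begin with an 8-byte header whose last two bytes hold the big-endian frame size, header included) and that everything after a frame header is a sequence of 64-bit words. The function name and constants are hypothetical.

```c
#include <stddef.h>

#define FILE_HEADER_BYTES  8  /* "qoaf" magic + sample count */
#define FRAME_HEADER_BYTES 8  /* channels, sample rate, samples, frame size */

/* Hypothetical helper: strips the file and frame headers from a standard QOA
   file and byte-reverses every remaining 64-bit word. `out` must hold at
   least in_len bytes. Returns the bytes written, or 0 on malformed input. */
static size_t qoa_to_headerless_le(const unsigned char *in, size_t in_len,
                                   unsigned char *out) {
    size_t ip = FILE_HEADER_BYTES; /* skip the file header */
    size_t op = 0;
    if (in_len < FILE_HEADER_BYTES) return 0;

    while (ip < in_len) {
        if (in_len - ip < FRAME_HEADER_BYTES) return 0;
        /* Frame size: big-endian u16 in the last two header bytes. */
        size_t frame_size = ((size_t)in[ip + 6] << 8) | in[ip + 7];
        if (frame_size < FRAME_HEADER_BYTES || frame_size > in_len - ip) return 0;
        size_t payload = frame_size - FRAME_HEADER_BYTES;
        if (payload % 8 != 0) return 0;

        const unsigned char *src = in + ip + FRAME_HEADER_BYTES;
        /* Reverse each 8-byte word: big-endian u64 -> little-endian u64. */
        for (size_t w = 0; w < payload; w += 8) {
            for (int b = 0; b < 8; b++) {
                out[op + w + b] = src[w + 7 - b];
            }
        }
        op += payload;
        ip += frame_size;
    }
    return op;
}
```

Because the frame size is read from each header rather than hard-coded, a shorter final frame is handled the same way as a full one. Whether the attached decoder accepts a short final frame is not covered here.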
You may want to re-add the headers for other use cases. In my case I wanted to stream this data over the network, so hard-coding everything and expecting all data to be continuous suited my needs better. Additionally, 256 chunks (slices, in QOA's terminology) per frame is not a hard requirement of the format; it is only a check for my bandwidth-optimized, cut-down version of it. You can also derive this value from a file header. You can read up on how the headers and the rest of the format work at https://qoaformat.org/qoa-specification.pdf (it's only 1 page long!)
Various improvements were planned, the most important being the ability to decode a single audio frame (5120 samples) across multiple game frames. This was never implemented because the initial intent (to work with MIDI) was scrapped before completion. The codec does work as is, but if you wish to use it in practice I would advise adding this improvement yourself.
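One possible shape for that improvement is a resumable decoder that processes a fixed budget of slices per call. The sketch below is written in C for consistency with the reference encoder and would need porting to U#. All names are hypothetical, and `decode_slice` is a stand-in that only counts calls. A real version would also have to keep the per-channel predictor state alive between calls.

```c
#include <stdbool.h>

#define SLICES_PER_FRAME 256  /* per channel, as hard-coded in this format */

/* Hypothetical resumable decode state for one audio frame. */
typedef struct {
    int next_slice;      /* index of the next slice to decode */
    int slices_decoded;  /* stand-in for real output bookkeeping */
} frame_decode_state;

/* Stand-in for the real per-slice decode (20 samples per slice). */
static void decode_slice(frame_decode_state *s, int slice_index) {
    (void)slice_index;
    s->slices_decoded++;
}

/* Decodes at most `budget` slices, then returns. Intended to be called once
   per game frame; returns true once the whole audio frame is decoded. */
static bool decode_step(frame_decode_state *s, int budget) {
    int end = s->next_slice + budget;
    if (end > SLICES_PER_FRAME) end = SLICES_PER_FRAME;
    while (s->next_slice < end) {
        decode_slice(s, s->next_slice);
        s->next_slice++;
    }
    return s->next_slice >= SLICES_PER_FRAME;
}
```

With a budget of 100 slices, a full frame finishes on the third call. The budget would need tuning against the actual per-slice cost in Udon.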
Encoding audio in Udon is too expensive: it requires running the codec 16x as many times and with more operations, including a divide that rounds away from 0, which is too costly to perform in U#. You could probably implement the encoder in a shader, though, or perhaps in the upcoming Soba runtime.
An example of this codec can be found at https://vrchat.com/home/world/wrld_d191101b-e5e9-46cb-bddb-89b60506f484 using the song Tom. (HaTom) - With You Instrumental, encoded at 25000 Hz mono. The original PCM data was 7.9 MB and the included data is 1.6 MB, an approximately 4.9:1 compression ratio.
The codec itself can handle far higher bitrates and far more channels, but the format used in this example was chosen for its approx. 10 kB/s (bytes, not bits) bitrate, which plays nicely with Udon netcode (albeit close to its cap).
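The bitrate figure can be checked with a little arithmetic. The sketch below assumes the frame layout from the QOA specification with the headers removed: 16 bytes of predictor state per channel, followed by 256 slices of 8 bytes, each slice covering 20 samples. The function names are hypothetical, and only full frames are considered.

```c
#define SLICES_PER_FRAME  256
#define SAMPLES_PER_SLICE 20
#define BYTES_PER_SLICE   8
#define LMS_STATE_BYTES   16  /* 4 history + 4 weight values, 2 bytes each */

/* Bytes in one full headerless frame for the given channel count. */
static unsigned int headerless_frame_bytes(unsigned int channels) {
    return channels * (LMS_STATE_BYTES + SLICES_PER_FRAME * BYTES_PER_SLICE);
}

/* Average encoded bytes per second of audio, assuming only full frames. */
static double encoded_bytes_per_second(unsigned int sample_rate,
                                       unsigned int channels) {
    double frames_per_second =
        (double)sample_rate / (SLICES_PER_FRAME * SAMPLES_PER_SLICE);
    return frames_per_second * headerless_frame_bytes(channels);
}
```

For 25000 Hz mono this gives 2064 bytes per frame and 25000 / 5120 × 2064 = 10078.125 bytes per second. If the original PCM was 16-bit (an assumption), it ran at 50000 bytes per second, a ratio of about 4.96:1, which is in line with the figures quoted above.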
This entire project was sparked by seeing https://discord.com/channels/189511567539306508/657394924433571870/1381777755729039450 in the official VRChat Discord. I wanted to get MIDI working here too, but found it too unstable for my use case and ran out of steam on this project. Much credit to occala and KitKat for the help and encouragement when talking MIDI and such!