@TapGhoul
Last active August 18, 2025 06:40

QOA Udon Decoder

This is a decoder for a modified version of the QOA (Quite OK Audio) codec, designed for use in Udon in VRChat. You can learn about QOA at https://qoaformat.org/

VRChat offers no way to decode compressed audio at runtime with standard codecs when the data is loaded through methods such as StringLoader. This project fills that gap.

It differs from the standard format in a few ways: the encoding parameters are hard-coded (mono, 25000 Hz, 256 slices per frame), all headers have been stripped, and all data is stored little-endian instead of big-endian, for faster decoding.

To modify the existing QOA encoder at https://github.com/phoboslab/qoa/blob/master/qoa.h to be compatible with this decoder implementation, you must:

  • Modify qoa_write_u64 to write little-endian instead of big-endian.
  • Remove the call to qoa_write_u64 inside qoa_encode_frame that writes the frame header, as commented in the code.
  • Remove the call to qoa_encode_header from qoa_encode, and initialize p to the constant 0 instead of to its return value.
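As a rough illustration of the first change, here is a C# mirror of what a little-endian `qoa_write_u64` would emit, assuming the stock `qoa.h` packing of the LMS state into one 64-bit word as `h[0] << 48 | h[1] << 32 | h[2] << 16 | h[3]`. It also shows why `DecodeFrame` reads `history0` from byte offset 6: written least-significant byte first, the first history entry ends up in the last two bytes. `LittleEndianSketch`, `WriteU64` and `PackLms` are hypothetical names, not part of `qoa.h` or the decoder.

```csharp
using System;

// Hypothetical helpers mirroring a little-endian qoa_write_u64; not part of qoa.h or the decoder.
public static class LittleEndianSketch
{
    // Writes v least-significant byte first at bytes[p] and advances p by 8,
    // which is what the modified qoa_write_u64 is expected to do.
    public static void WriteU64(ulong v, byte[] bytes, ref int p)
    {
        for (int i = 0; i < 8; i++)
        {
            bytes[p + i] = (byte)(v >> (8 * i));
        }
        p += 8;
    }

    // Packs four 16-bit LMS values the way qoa.h packs history and weights:
    // the first entry lands in the most significant 16 bits.
    public static ulong PackLms(int v0, int v1, int v2, int v3)
    {
        return ((ulong)(ushort)v0 << 48) | ((ulong)(ushort)v1 << 32)
             | ((ulong)(ushort)v2 << 16) | (ulong)(ushort)v3;
    }
}
```

On a little-endian host, `BitConverter.ToInt16(buffer, 6)` on such a word then recovers the first history entry, which matches the offsets used in `DecodeFrame`.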

Changes to the read (decode) side of qoa.h are not documented here, but they are a similar set of changes and replacements that fill in the missing pieces.

With these changes, the encoder produces data in a format compatible with the attached decoder. Alternatively, you can post-process a standard QOA file to apply the same changes: all offsets are predictable because the format has a fixed structure, so converting after the fact is not difficult.
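A minimal post-processing sketch, assuming a standard mono QOA file as produced by the unmodified `qoa.h` encoder: it skips the 8-byte file header, reads each 8-byte frame header only to find the frame size, and copies the 16-byte LMS state and the slices with every 64-bit word byte-reversed from big-endian to little-endian. `QoaRepacker` and `ToUdonLayout` are hypothetical names, and multi-channel files are rejected rather than handled.

```csharp
using System;
using System.IO;

// Hypothetical converter: standard (big-endian, headered) mono QOA -> headerless little-endian layout.
public static class QoaRepacker
{
    public static byte[] ToUdonLayout(byte[] qoa)
    {
        if (qoa.Length < 8 || qoa[0] != (byte)'q' || qoa[1] != (byte)'o'
            || qoa[2] != (byte)'a' || qoa[3] != (byte)'f')
            throw new ArgumentException("Not a QOA file (missing 'qoaf' magic)");

        var output = new MemoryStream();
        int p = 8; // skip the file header: magic + big-endian u32 sample count
        while (p + 8 <= qoa.Length)
        {
            int channels = qoa[p];
            int frameSize = (qoa[p + 6] << 8) | qoa[p + 7]; // big-endian u16, includes the 8-byte header
            if (channels != 1)
                throw new NotSupportedException("Only mono files are handled by this sketch");
            if (frameSize < 24 || (frameSize - 8) % 8 != 0 || p + frameSize > qoa.Length)
                throw new InvalidDataException("Truncated or malformed frame");

            // Everything after the frame header is a run of big-endian u64 words
            // (LMS history, LMS weights, then the slices): emit each one reversed.
            for (int w = p + 8; w < p + frameSize; w += 8)
            {
                for (int i = 7; i >= 0; i--)
                    output.WriteByte(qoa[w + i]);
            }
            p += frameSize;
        }
        return output.ToArray();
    }
}
```

Because the decoder divides the data length by 2064, a shorter final frame (fewer than 256 slices) is simply never played; padding or trimming it is left to the reader.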

You may want to re-add the headers for other use cases. In my case I wanted to stream this data over the network, so hard-coding everything and expecting all of the data to be one continuous stream suited my needs better. Additionally, 256 slices (chunks of 20 samples) per frame is not actually a hard requirement of the format; the decoder only checks for it because it targets my bandwidth-optimized, cut-down version of the format. You can also derive this value from the headers. How the headers, and the rest of the format, work is described at https://qoaformat.org/qoa-specification.pdf (it's only one page long!)
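If the headers are kept, the values the decoder hard-codes can be read back instead. This is a sketch assuming the header layout from the QOA specification: an 8-byte file header (`qoaf` magic, then a big-endian u32 sample count per channel) and an 8-byte frame header (u8 channels, u24 sample rate, u16 samples per channel in the frame, u16 frame size in bytes). `QoaFrameInfo`, `Read` and `TotalSamples` are hypothetical names.

```csharp
using System;

// Hypothetical reader for the standard QOA headers (all fields big-endian).
public readonly struct QoaFrameInfo
{
    public readonly int Channels;
    public readonly int SampleRate;
    public readonly int SamplesPerChannel;
    public readonly int FrameBytes;

    private QoaFrameInfo(int channels, int sampleRate, int samples, int frameBytes)
    {
        Channels = channels;
        SampleRate = sampleRate;
        SamplesPerChannel = samples;
        FrameBytes = frameBytes;
    }

    // Each slice holds 20 samples; a final partial slice still occupies a full slice.
    public int SlicesPerChannel => (SamplesPerChannel + 19) / 20;

    // Reads the frame header starting at offset p (8 for the first frame).
    public static QoaFrameInfo Read(byte[] d, int p)
    {
        if (p + 8 > d.Length) throw new ArgumentException("Frame header out of range");
        return new QoaFrameInfo(
            d[p],
            (d[p + 1] << 16) | (d[p + 2] << 8) | d[p + 3],
            (d[p + 4] << 8) | d[p + 5],
            (d[p + 6] << 8) | d[p + 7]);
    }

    // Samples per channel from the file header (0 indicates a streaming file).
    public static uint TotalSamples(byte[] d) =>
        ((uint)d[4] << 24) | ((uint)d[5] << 16) | ((uint)d[6] << 8) | d[7];
}
```

A full frame is then `8 + channels * (16 + 256 * 8)` bytes; dropping the 8-byte frame header leaves the decoder's `BYTES_PER_FRAME` of 2064 for mono.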

Several improvements were planned, the most important being the ability to spread the decoding of a single audio frame (5120 samples) across multiple game frames. This was never implemented, because the original goal (working with MIDI) was scrapped before completion. The codec works as is, but if you want to use it in practice, I would advise adding this improvement yourself.
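One way that improvement could look, sketched in plain C# rather than U# and not tested in Udon: keep the LMS state in fields and decode a bounded number of slices per call, so a 256-slice frame is spread over several `Update` calls. `IncrementalQoaFrameDecoder`, `Begin` and `DecodeSlices` are hypothetical names; the per-sample math follows `DecodeFrame` in the decoder source, and the dequantization table is passed in rather than repeated.

```csharp
using System;

// Hypothetical resumable version of DecodeFrame (plain C#, not U#): same math,
// but the LMS state lives in fields so decoding can pause after any slice.
public class IncrementalQoaFrameDecoder
{
    private const int BytesPerFrame = 256 * 8 + 16;
    private readonly int[][] _dequant;            // the same 16x8 table as DEQUANT_TABLE
    private readonly int[] _history = new int[4];
    private readonly int[] _weights = new int[4];
    private byte[] _frame;
    private int _readIdx;                         // byte offset of the next slice

    public IncrementalQoaFrameDecoder(int[][] dequantTable) => _dequant = dequantTable;

    public float[] Samples { get; } = new float[256 * 20];
    public int SamplesDecoded { get; private set; }
    public bool Done => _frame != null && _readIdx >= BytesPerFrame;

    // Loads a frame and unpacks its little-endian LMS state, as DecodeFrame does.
    public void Begin(byte[] frame)
    {
        if (frame.Length != BytesPerFrame) throw new ArgumentException("Expected a 2064-byte frame");
        _frame = frame;
        for (int i = 0; i < 4; i++)
        {
            _history[i] = BitConverter.ToInt16(frame, 6 - 2 * i);
            _weights[i] = BitConverter.ToInt16(frame, 14 - 2 * i);
        }
        _readIdx = 16;
        SamplesDecoded = 0;
    }

    // Decodes at most maxSlices slices (20 samples each), e.g. once per Update.
    public void DecodeSlices(int maxSlices)
    {
        if (_frame == null) return;
        for (int s = 0; s < maxSlices && !Done; s++, _readIdx += 8)
        {
            ulong slice = BitConverter.ToUInt64(_frame, _readIdx);
            int[] lut = _dequant[(int)(slice >> 60)];
            for (int bit = 57; bit >= 0; bit -= 3)
            {
                int prediction = 0;
                for (int i = 0; i < 4; i++) prediction += _history[i] * _weights[i];
                prediction >>= 13;
                int dequantized = lut[(int)((slice >> bit) & 7)];
                int reconstructed = Math.Max(-32768, Math.Min(32767, prediction + dequantized));
                Samples[SamplesDecoded++] = reconstructed / 32768f;
                int delta = dequantized >> 4;
                for (int i = 0; i < 4; i++) _weights[i] += _history[i] < 0 ? -delta : delta;
                for (int i = 0; i < 3; i++) _history[i] = _history[i + 1];
                _history[3] = reconstructed;
            }
        }
    }
}
```

Calling `DecodeSlices(32)` from `Update` would finish a frame in 8 calls; the right budget depends on Udon's per-instruction cost, which this sketch does not measure. Once `Done` is true, `Samples` can go to `AudioClip.SetData` as in the existing `Update`. A U# port would likely need this state moved into the behaviour itself.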

Encoding audio in Udon is impractical: the encoder runs the codec 16 times as often as the decoder (once per candidate scale factor) and needs more operations per sample, including a division with rounding away from zero, which is too expensive to perform in U#. You could probably implement the encoder in a shader instead, or perhaps in the upcoming Soba runtime.
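For a sense of the extra work, this is a C# transcription of `qoa_div` from the reference encoder in `qoa.h`: a fixed-point reciprocal division followed by a sign correction that keeps any nonzero residual from quantizing to zero. The reference encoder evaluates it for every sample under each of the 16 scale factors before keeping the one with the lowest error. The reciprocal table is copied from `qoa.h`; `QoaEncodeSketch` and `Div` are hypothetical names, and this is an illustration rather than a drop-in encoder.

```csharp
using System;

// Hypothetical wrapper; Div transcribes qoa_div from qoa.h.
public static class QoaEncodeSketch
{
    // ceil(65536 / scalefactor) for the 16 QOA scale factors, as in qoa_reciprocal_tab.
    private static readonly int[] ReciprocalTab =
        { 65536, 9363, 3121, 1457, 781, 475, 311, 216, 156, 117, 90, 71, 57, 47, 39, 32 };

    public static int Div(int v, int scaleFactorIndex)
    {
        int reciprocal = ReciprocalTab[scaleFactorIndex];
        // Fixed-point multiply by 1/scalefactor, rounding to nearest.
        int n = (v * reciprocal + (1 << 15)) >> 16;
        // If n came out as 0 for a nonzero v, push it to +1/-1 (away from zero);
        // otherwise n already has v's sign and is unchanged.
        n = n + Math.Sign(v) - Math.Sign(n);
        return n;
    }
}
```

For example, with scale factor index 1 (scale factor 7), a residual of 1 still quantizes to 1 rather than 0.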

An example of this codec can be found at https://vrchat.com/home/world/wrld_d191101b-e5e9-46cb-bddb-89b60506f484, using the song "Tom. (HaTom) - With You Instrumental" encoded at 25000 Hz mono. The original PCM data was 7.9 MB and the included data is 1.6 MB, a compression ratio of approximately 4.9:1.

The codec itself can handle far higher bitrates and many more channels, but this example's format was chosen for its bitrate of approximately 10 kB/s (bytes, not bits), which works well with Udon networking, albeit close to its bandwidth cap.
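Both figures can be checked against the decoder's frame constants, assuming the source was 16-bit mono PCM (2 bytes per sample) and every frame is a full 2064-byte, 5120-sample frame:

$$
\text{ratio} = \frac{5120 \times 2\ \text{B}}{2064\ \text{B}} \approx 4.96,
\qquad
\text{rate} = \frac{2064\ \text{B}}{5120\ \text{samples}} \times 25000\ \frac{\text{samples}}{\text{s}} \approx 10078\ \text{B/s}
$$

This agrees, within rounding, with the measured 7.9 MB to 1.6 MB (about 4.9:1) and the quoted 10 kB/s.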

This entire project was sparked by https://discord.com/channels/189511567539306508/657394924433571870/1381777755729039450 on the official VRChat Discord. I wanted to get MIDI working here too, but found it too unstable for my use case and ran out of steam on this project. Much credit to occala and KitKat for their help and encouragement in discussions about MIDI and related topics!

```csharp
#if UNITY_EDITOR && !COMPILER_UDONSHARP
using UnityEditor;
using UdonSharpEditor;
#endif
using System;
using System.Diagnostics;
using UdonSharp;
using UnityEngine;
using Debug = UnityEngine.Debug;

// ReSharper disable once CheckNamespace
namespace TapGhoul.QOA
{
    [UdonBehaviourSyncMode(BehaviourSyncMode.None)]
    // ReSharper disable once InconsistentNaming
    public class QOAUdonDecoder : UdonSharpBehaviour
    {
        // 16 bytes of LMS state (history + weights) followed by 256 slices of 8 bytes each.
        private const int BYTES_PER_FRAME = 256 * 8 + 16;
        // 256 slices of 20 samples each.
        private const int SAMPLES_PER_FRAME = 256 * 20;
        private const int SAMPLE_RATE = 25000;
        // 204.8 ms truncated to 204 ms, so decoding gradually runs ahead of playback.
        private const int FRAME_DURATION_MILLIS = SAMPLES_PER_FRAME * 1000 / SAMPLE_RATE;

        [SerializeField] public byte[] qoaData = new byte[0];

        // ReSharper disable once InconsistentNaming
        private readonly int[][] DEQUANT_TABLE =
        {
            new[] { 1, -1, 3, -3, 5, -5, 7, -7 },
            new[] { 5, -5, 18, -18, 32, -32, 49, -49 },
            new[] { 16, -16, 53, -53, 95, -95, 147, -147 },
            new[] { 34, -34, 113, -113, 203, -203, 315, -315 },
            new[] { 63, -63, 210, -210, 378, -378, 588, -588 },
            new[] { 104, -104, 345, -345, 621, -621, 966, -966 },
            new[] { 158, -158, 528, -528, 950, -950, 1477, -1477 },
            new[] { 228, -228, 760, -760, 1368, -1368, 2128, -2128 },
            new[] { 316, -316, 1053, -1053, 1895, -1895, 2947, -2947 },
            new[] { 422, -422, 1405, -1405, 2529, -2529, 3934, -3934 },
            new[] { 548, -548, 1828, -1828, 3290, -3290, 5117, -5117 },
            new[] { 696, -696, 2320, -2320, 4176, -4176, 6496, -6496 },
            new[] { 868, -868, 2893, -2893, 5207, -5207, 8099, -8099 },
            new[] { 1064, -1064, 3548, -3548, 6386, -6386, 9933, -9933 },
            new[] { 1286, -1286, 4288, -4288, 7718, -7718, 12005, -12005 },
            new[] { 1536, -1536, 5120, -5120, 9216, -9216, 14336, -14336 }
        };

        private AudioSource _audioSource;
        private AudioClip _clip;
        private byte[] _frameData;
        private int _frameIdx;
        private Stopwatch _elapsedTime;

        private void Start()
        {
            _elapsedTime = new Stopwatch();
            _elapsedTime.Start();
            _audioSource = GetComponent<AudioSource>();
            // Only whole frames are decoded; a trailing partial frame is ignored.
            var samples = qoaData.Length / BYTES_PER_FRAME * SAMPLES_PER_FRAME;
            _clip = AudioClip.Create("clip1", samples, 1, SAMPLE_RATE, false);
            _frameData = new byte[BYTES_PER_FRAME];
            _audioSource.clip = _clip;
            _audioSource.loop = true;
            _audioSource.Play();
        }

        private void Update()
        {
            // Decode frame N once roughly N frame durations have elapsed since Start.
            if (_elapsedTime.ElapsedMilliseconds < _frameIdx * FRAME_DURATION_MILLIS) return;
            var frames = qoaData.Length / BYTES_PER_FRAME;
            if (_frameIdx < frames)
            {
                Array.Copy(qoaData, _frameIdx * BYTES_PER_FRAME, _frameData, 0, _frameData.Length);
                var sw = Stopwatch.StartNew();
                var data = DecodeFrame();
                sw.Stop();
                Debug.Log($"Took {sw.ElapsedMilliseconds} ms ({sw.ElapsedTicks} ticks)");
                _clip.SetData(data, _frameIdx * SAMPLES_PER_FRAME);
                _frameIdx++;
            }
        }

        private float[] DecodeFrame()
        {
            if (_frameData.Length != BYTES_PER_FRAME)
            {
                Debug.LogWarning($"Invalid frame length: expected {BYTES_PER_FRAME} bytes, got {_frameData.Length}");
                return null;
            }

            float[] samples = new float[SAMPLES_PER_FRAME];
            // While we operate in short, all ops except for updating the LMS work perfectly in 32-bit.
            // Each 64-bit LMS word is little-endian, so its first (most significant) entry sits at byte 6.
            int history0 = BitConverter.ToInt16(_frameData, 6);
            int history1 = BitConverter.ToInt16(_frameData, 4);
            int history2 = BitConverter.ToInt16(_frameData, 2);
            int history3 = BitConverter.ToInt16(_frameData, 0);
            int weights0 = BitConverter.ToInt16(_frameData, 14);
            int weights1 = BitConverter.ToInt16(_frameData, 12);
            int weights2 = BitConverter.ToInt16(_frameData, 10);
            int weights3 = BitConverter.ToInt16(_frameData, 8);

            var sampleIdx = 0;
            for (var readIdx = 16; readIdx < BYTES_PER_FRAME; readIdx += 8)
            {
                // Top 4 bits: scale factor; below it, 20 residuals of 3 bits each.
                ulong slice = BitConverter.ToUInt64(_frameData, readIdx);
                int scaleFactor = (int)((slice >> 60) & 0xf);
                int[] residualLut = DEQUANT_TABLE[scaleFactor];
                for (var bitOffset = 64 - 7; bitOffset >= 0; bitOffset -= 3)
                {
                    // Perform LMS prediction
                    int prediction = history0 * weights0;
                    prediction += history1 * weights1;
                    prediction += history2 * weights2;
                    prediction += history3 * weights3;
                    prediction >>= 13;

                    // Get quantized residual
                    int residual = (int)((slice >> bitOffset) & 0x7);
                    int dequantized = residualLut[residual];
                    var reconstructed = Mathf.Clamp(prediction + dequantized, -32768, 32767);
                    samples[sampleIdx++] = reconstructed / 32768f;

                    // Update LMS predictor
                    int delta = dequantized >> 4;
                    int negDelta = -delta;
                    weights0 += history0 < 0 ? negDelta : delta;
                    weights1 += history1 < 0 ? negDelta : delta;
                    weights2 += history2 < 0 ? negDelta : delta;
                    weights3 += history3 < 0 ? negDelta : delta;
                    history0 = history1;
                    history1 = history2;
                    history2 = history3;
                    history3 = reconstructed;
                }
            }

            return samples;
        }
    }

#if UNITY_EDITOR && !COMPILER_UDONSHARP
    [CustomEditor(typeof(QOAUdonDecoder))]
    // ReSharper disable once InconsistentNaming
    public class QOAUdonDecoderEditor : Editor
    {
        public override void OnInspectorGUI()
        {
            if (UdonSharpGUI.DrawDefaultUdonSharpBehaviourHeader(target, false, false)) return;
            // base.OnInspectorGUI();
            var items = ((QOAUdonDecoder)target).qoaData.Length;
            GUI.enabled = false;
            EditorGUILayout.TextField("Buffer Size", $"{items}");
            GUI.enabled = true;
            if (GUILayout.Button("Read Data"))
            {
                var data = AssetDatabase.LoadAssetAtPath<TextAsset>("Assets/qoatest.txt");
                if (data == null)
                {
                    Debug.LogError("Assets/qoatest.txt not found");
                    return;
                }
                ((QOAUdonDecoder)target).qoaData = data.bytes;
                // Mark the component dirty so the new data is saved with the scene.
                EditorUtility.SetDirty(target);
                Debug.Log("Read");
            }
        }
    }
#endif
}
```
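The `DEQUANT_TABLE` values are not arbitrary: in `qoa.h`, each entry is a scale factor (`round((s + 1)^2.75)`) multiplied by one of `{0.75, -0.75, 2.5, -2.5, 4.5, -4.5, 7, -7}` and rounded half away from zero. A small C# sketch that regenerates the table from the scale factor list copied from `qoa.h`'s `qoa_scalefactor_tab` can serve as a sanity check when editing it; `DequantTableCheck` and `Build` are hypothetical names.

```csharp
using System;

// Hypothetical sanity check: rebuilds the 16x8 dequantization table as qoa.h describes it.
public static class DequantTableCheck
{
    // qoa_scalefactor_tab from qoa.h: round(pow(s + 1, 2.75)) for s = 0..15.
    private static readonly int[] ScaleFactors =
        { 1, 7, 21, 45, 84, 138, 211, 304, 421, 562, 731, 928, 1157, 1419, 1715, 2048 };

    // Reconstruction levels for the 8 possible 3-bit residual codes.
    private static readonly double[] Levels = { 0.75, -0.75, 2.5, -2.5, 4.5, -4.5, 7, -7 };

    public static int[][] Build()
    {
        var table = new int[16][];
        for (int s = 0; s < 16; s++)
        {
            table[s] = new int[8];
            for (int q = 0; q < 8; q++)
            {
                // These products are exact in double, so midpoint rounding is reliable.
                table[s][q] = (int)Math.Round(ScaleFactors[s] * Levels[q], MidpointRounding.AwayFromZero);
            }
        }
        return table;
    }
}
```

Comparing `Build()` row by row against `DEQUANT_TABLE` should show no differences.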