
@arch1t3cht
Last active March 7, 2026 17:21

Revisions

  1. arch1t3cht revised this gist Jan 11, 2026. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -708,7 +708,7 @@ To clarify: The statements in the following list are *true*, but are written to
    - Simplifying a drawing by removing invisible contours (as ASSWipe does) may not actually improve performance
    when that drawing is used many times.
    - Hiding a bitmap using `\alpha&HFF&` will not improve performance, at least at the time of writing.
    This may change in the future, so it is better to use `\alpha&HFF&` than not using it,
    This may change in the future, so it is better to use `\alpha&HFF&` than not to use it,
    but at the moment you should not rely on it improving performance.
    To *really* remove bitmaps, delete or comment the line and/or use `\bord0` and/or `\shad0` to remove outlines and/or shadows.

  2. arch1t3cht revised this gist Dec 30, 2025. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -610,7 +610,7 @@ Again, this will hence be much more expensive to render than it needs to be.
    Like before, this can be fixed by splitting the two squares into separate runs, and hence separate bitmaps.
    You can do this by inserting a `{}` in the middle of the drawing (note that this only works for drawings, not for text!), but this will then shift
    the second run by the width and height of the first one.
    You can compensate for this by shifting the second component in the opposite direction (arriving at `{\an7\p1}m 0 0 l 100 0 100 100 0 100{}m 900 900 l 1000 900 1000 1000 900 1000`),
    You can compensate for this by shifting the second component in the opposite direction (arriving at `{\an7\p1}m 0 0 l 100 0 100 100 0 100{}m 100 900 l 1100 900 1100 1000 1000 1000`),
    but the much simpler method is to just split the drawing into two separate events.
    Third-party tooling will be able to deal with that better, too.

    @@ -665,8 +665,8 @@ This works quite similarly to a clip gradient, the only difference being that it
    In particular, it can benefit from the composition cache just like a clip gradient will.

    A toy example for this is given by the following lines (on a 1920x1080 PlayRes):
    - `{\pos(960,540)\an5\fnArial\fs200\bord30\shad0\3aH&80&\clip(800,400,960,700)}A{\2a1\alpha&HFF&}A`
    - `{\pos(960,540)\an5\fnArial\fs200\bord30\shad0\3aH&80&\clip(960,400,1200,700)}{\alpha&HFF&}A{\2a1\alpha\3a&H80&}A`
    - `{\pos(960,540)\an5\fnArial\fs200\bord30\shad0\3aH&80&\clip(800,400,960,700)}A{\2c1\alpha&HFF&}A`
    - `{\pos(960,540)\an5\fnArial\fs200\bord30\shad0\3aH&80&\clip(960,400,1200,700)}{\alpha&HFF&}A{\2c1\alpha\3a&H80&}A`

    However, this is very finicky and no good tooling for this exists at the moment.

  3. arch1t3cht revised this gist Dec 30, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -392,7 +392,7 @@ After understanding libass's rendering pipeline, we can now talk about which of

    As a general rule of thumb, the rule about total bitmap size also holds here:
    The only steps that are really relevant for performance are the ones that somehow deal with bitmap data
    (that is, the ones that deal with the actual pixel data and do not just shuffle bitmap *meta*data like positioning or colors around),
    (that is, the ones that deal with the actual pixel data and do not just shuffle bitmap *meta*data like (the integer parts of) positions or colors around),
    and their performance is roughly proportional to the total size of the bitmaps they need to handle.
    Specifically, this means the following:

    @@ -436,7 +436,7 @@ Specifically, this means the following:
    after rounding to 1/64th of an output pixel.
    Hence, an `\xshad0.001` will be faster than an `\xshad0.1`, which will in turn be faster than a `\shad0.1` (since there the shift has to be performed in both directions).
    If you find yourself needing shadows, check if you can get away with rounding all of your shadow offsets to integers.
    (But note that transformations like scale, rotation, and shearing will interfere with this, if present.)
    (But note that transformations like scale, rotation, and shearing will interfere with this, if present, and that this optimization will only work when the display resolution is a multiple of the PlayRes.)
    - Rectangular `\clip` and `\iclip` are actually almost free for the reasons explained above.
    Since `\clip` reduces the resulting bitmap's size, it can even make the rendering (or, to be more precise, the blending) cheaper.
    Similarly, this step of creating a new bitmap structure for the karaoke effect is almost free too,
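The 1/64-pixel rounding rule described in this revision's hunk can be sanity-checked with a small sketch. This assumes simple round-to-nearest behavior; libass's exact rounding internals may differ:

```python
# Sketch: check whether a \shad offset is small enough to round to zero
# in units of 1/64th of an output pixel, per the hunk above.
# Assumes round-to-nearest; libass's exact rounding may differ.

def shad_rounds_to_zero(offset: float) -> bool:
    """True if the shadow offset rounds to 0 in 1/64-pixel units."""
    return round(offset * 64) == 0

print(shad_rounds_to_zero(0.001))  # True  -> effectively no shift at all
print(shad_rounds_to_zero(0.1))    # False -> forces a fractional-pixel shift
print(shad_rounds_to_zero(1.0))    # False, but integer shifts are cheap
```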
  4. arch1t3cht revised this gist Dec 30, 2025. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -438,7 +438,7 @@ Specifically, this means the following:
    If you find yourself needing shadows, check if you can get away with rounding all of your shadow offsets to integers.
    (But note that transformations like scale, rotation, and shearing will interfere with this, if present.)
    - Rectangular `\clip` and `\iclip` are actually almost free for the reasons explained above.
    Since `\clip` reduces the resulting bitmap's size, it can even make the rendering cheaper.
    Since `\clip` reduces the resulting bitmap's size, it can even make the rendering (or, to be more precise, the blending) cheaper.
    Similarly, this step of creating a new bitmap structure for the karaoke effect is almost free too,
    but of course it still creates another bitmap that must then be blended by the player.
    - Finally, there is the vector clip step.
  5. arch1t3cht revised this gist Dec 30, 2025. 1 changed file with 4 additions and 1 deletion.
    5 changes: 4 additions & 1 deletion renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -426,7 +426,7 @@ Specifically, this means the following:
    (assuming the display resolution is equal to the LayoutRes; otherwise everything will be scaled by the ratio of the two).
    Expanding a single 16x16 bitmap to 32x32 with a `\blur2.5` is not a problem for a single bitmap,
    but it is still a growth by a factor of 4 and can become noticeable when there are many such bitmaps.
    Expanding a single 16x16 bitmap to a 796x796 with a `\blur100` (the largest possible value) will have a big impact,
    Expanding a single 16x16 bitmap to a 796x796 with a `\blur100`[^blurcap] will have a big impact,
    but will probably not be necessary in practice.
    - `\be`, on the other hand, gets more and more expensive the stronger its strength is.
    It also does not scale correctly and has issues with its padding, so you should never use it anyway.
    @@ -462,6 +462,9 @@ It is much better to just optimize every frame's rendering independently.
    [^cascadeblur]: libass uses some *extremely* cool mathematics to achieve this.
    If you are interested in how this works, check out the [paper](https://github.com/MrSmile/CascadeBlur) the author wrote about this algorithm.

    [^blurcap]: `\blur100` is the largest possible blur value **in libass**, at the time of writing.
    VSFilter allows larger blur values, so you should not rely on this capping.

    The last two points are quite important!
    A rectangular clip is basically free, while a vector clip can be very expensive.
    This means that
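The blur-padding growth quoted in the revision above ("about 7 times S pixels" of added width and height) can be sketched as a rough estimator. The factor 7 is an empirical approximation from the gist, not an exact libass constant:

```python
# Sketch: estimate a bitmap's padded side length after \blur, using the
# empirical "width/height grow by roughly 7 * strength pixels" rule
# from this gist. The factor 7.0 is approximate, not a libass constant.

def padded_side(side: float, blur_strength: float, factor: float = 7.0) -> float:
    """Estimated side length of the bitmap after blur padding."""
    return side + factor * blur_strength

print(padded_side(16, 2.5))   # 33.5 -> close to the 32x32 quoted in the text
print(padded_side(16, 100))   # 716.0 -> same order as the quoted 796x796
```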
  6. arch1t3cht revised this gist Dec 30, 2025. 1 changed file with 7 additions and 3 deletions.
    10 changes: 7 additions & 3 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -279,7 +279,7 @@ Though then again, maybe you should turn such a drawing into a font instead.

    Note, however, that the shad trick has a performance cost:
    Even though the fill and outline will be fully transparent, they will still be returned by libass and need to be blended by the player (at least at the time of writing).
    Hence, a shad trick line will be roughly three times as expensive to render as a normal line, so it should only be used when necessary.
    Hence, a shad trick line will be more expensive to render than a normal line with only a fill (three times as many bitmaps to blend, twice as many bitmaps to rasterize), so it should only be used when necessary.

    ## The Single Biggest Performance Factor
    We are now ready to talk about the single most important factor when it comes to the rendering performance of subtitles:
    @@ -396,10 +396,13 @@ The only steps that are really relevant for performance are the ones that someho
    and their performance is roughly proportional to the total size of the bitmaps they need to handle.
    Specifically, this means the following:

    - Parsing events and drawings is not very expensive in the grand scheme of things, even when the event's text is very long or when there are very many events.
    - Parsing tags is not very expensive in the grand scheme of things, even when there are many tags or very many events.
    Neither are "animation" effects like moves, transforms, fades, etc.
    ASS rendering always happens on a specific timestamp, so a transform can just be replaced by a constant value (at that timestamp) when parsing.
    The fact that a subtitle changes from frame to frame does not make the subtitles any slower to render - the subtitles need to be rendered and blended every frame anyway, whether they change or not.[^detect_changes]

    Parsing *drawings* *can* have a performance impact when there are *very* many *different* drawings with very long text.
    Most of the time the bitmap sizes are the bigger problem, though.
    - Similarly, splitting style runs, looking up fonts and glyphs in fonts, and transforming shapes is comparatively cheap and not worth worrying about.
    - The first computationally expensive step is rasterizing shapes.
    The shape's complexity (i.e. how many vertices there are per area) does affect the speed there to some degree,
    @@ -424,7 +427,7 @@ Specifically, this means the following:
    Expanding a single 16x16 bitmap to 32x32 with a `\blur2.5` is not a problem for a single bitmap,
    but it is still a growth by a factor of 4 and can become noticeable when there are many such bitmaps.
    Expanding a single 16x16 bitmap to a 796x796 with a `\blur100` (the largest possible value) will have a big impact,
    will probably not be necessary in practice.
    but will probably not be necessary in practice.
    - `\be`, on the other hand, gets more and more expensive the stronger its strength is.
    It also does not scale correctly and has issues with its padding, so you should never use it anyway.
    - Subtracting the fill from the outline once again scales with total bitmap size.
    @@ -567,6 +570,7 @@ While libass's caching behavior for a single frame is somewhat predictable,
    I would not recommend relying on libass's caching *across* frames.
    In fact, I do not recommend relying on *any* kind of behavior across frames at all when it comes to performance.
    Instead, ensure that each individual frame renders fast enough, even if it is the first frame that libass will ever render.
    If you are checking performance in mpv, you can resize mpv's window (e.g. by exiting and reentering fullscreen) to clear all of libass's bitmap caches.

    Note that what is *not* currently cached is the process of applying a vector clip (the rasterization *is* cached, but the multiplication is not).
    This could change in the future, though.
  7. arch1t3cht revised this gist Dec 30, 2025. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -693,7 +693,8 @@ To clarify: The statements in the following list are *true*, but are written to
    In particular, splitting an event with a few seconds of duration into a sequence of frame-by-frame events can greatly increase the file size,
    but will have almost no effect on performance.
    You may have other reasons to worry about subtitle file sizes, but it does not help to worry about it for performance reasons.
    - Large `\blur` is not much more expensive than smaller `\blur` (but it *is* more expensive than no `\blur` at all).
    - Large `\blur` is not much more expensive than smaller `\blur` (but it *is* more expensive than no `\blur` at all)
    unless the blurred bitmap is very small.
    The added cost mainly comes from needing to pad the bitmap more, not from the larger blur strength itself.
    - Rectangular clips are not expensive at all.
    In fact, they are almost free and can even *improve* performance in some cases.
    @@ -717,7 +718,8 @@ Some of the advice here is simplified for the sake of brevity; read the above se
    - Avoid large bitmaps that are mostly empty, like long diagonal text or sparse drawings.
    Split up your lines into multiple runs if you can.
    - Avoid blurring very large bitmaps if you can.
    Avoid using fractional `\shad` on very large bitmaps if you can - use either an integer value like `\shad1` or a value like `\shad0.001` that is small enough to round to 0 in units of 1/64th of an output pixel.
    Avoid blurring very small bitmaps by extreme amounts (or at least be aware that this can greatly increase their bitmap size).
    - Avoid using fractional `\shad` on very large bitmaps if you can - use either an integer value like `\shad1` or a value like `\shad0.001` that is small enough to round to 0 in units of 1/64th of an output pixel.
    - Do not use `\be`.
    - Avoid using large vector clips.
    Consider baking vector clips into drawings when feasible.
  8. arch1t3cht revised this gist Dec 30, 2025. 1 changed file with 10 additions and 2 deletions.
    12 changes: 10 additions & 2 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -416,7 +416,15 @@ Specifically, this means the following:
    That is, a very strong blur will not be any more expensive than a weak blur in principle.[^cascadeblur]
    However, a stronger blur will need to pad the bitmap by a larger amount so that the blur does not cut into the bitmap's edge.
    This in turn makes the bitmap larger, which makes the blur slightly slower.
    This should only be an issue on *extremely* strong blurs, though.
    This should only be an issue on very strong blurs relative to the bitmap sizes, though.

    *How* strong will depend on the size of the blurred bitmap and the count of bitmaps.
    Empirically, a blur with a strength of S will increase the width and height of a bitmap by about 7 times S pixels
    (assuming the display resolution is equal to the LayoutRes; otherwise everything will be scaled by the ratio of the two).
    Expanding a single 16x16 bitmap to 32x32 with a `\blur2.5` is not a problem for a single bitmap,
    but it is still a growth by a factor of 4 and can become noticeable when there are many such bitmaps.
    Expanding a single 16x16 bitmap to a 796x796 with a `\blur100` (the largest possible value) will have a big impact,
    will probably not be necessary in practice.
    - `\be`, on the other hand, gets more and more expensive the stronger its strength is.
    It also does not scale correctly and has issues with its padding, so you should never use it anyway.
    - Subtracting the fill from the outline once again scales with total bitmap size.
    @@ -542,7 +550,7 @@ but some general consequences of this are the following:
    then assuming that the rendering resolution is equal to the PlayRes they will rasterize to the same bitmap and hence one line can use the cached rasterization of the other.
    In fact, the same will happen if one line has a `\pos` of `(300.05,400)`, since `300.05` rounds to `300` when rounded to multiples of 1/8.
    However, a line with a `\pos` of `(300.5,400)` or even `(300.1,400)` will rasterize to a *different* bitmap and hence not be able to use the other line's cached rasterization.
    Hence, rounding a line's position to integer values can be benefitial for caching, at least when the rendering resolution is a multiple of the PlayRes.
    Hence, rounding a line's position to integer values can be beneficial for caching, at least when the rendering resolution is a multiple of the PlayRes.
    (And, as a converse, differently positioned copies of the same shape may not always be able to use caching in the way that you expect.)

    In summary, caching can cause massive performance savings in specific cases like clip gradients, but can also easily break.
  9. arch1t3cht revised this gist Dec 30, 2025. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -96,7 +96,7 @@ This means that **whenever a subtitle file looks different in libass than in VSF
    While it has already gotten very close, libass does not yet match VSFilter's rendering behavior completely.
    Still, matching VSFilter's rendering *is* the library's eventual goal.
    This has an *extremely* important consequence for subtitle authors:
    ***Any*** **behavior of libass that diffes from VSFilter can change at any moment, and should hence not be relied on by subtitle authors.**
    ***Any*** **behavior of libass that differs from VSFilter can change at any moment, and should hence not be relied on by subtitle authors.**

    This means that *even if* you ensure that your viewers will only ever use mpv to view your subtitles,
    if at the moment your script only renders "correctly" (as in, "how you want it to render") on libass and not on VSFilter,
  10. arch1t3cht revised this gist Dec 29, 2025. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -681,7 +681,8 @@ To clarify: The statements in the following list are *true*, but are written to
    - As a result of the previous two points, a subtitle file's file size does not *directly* affect performance.
    Yes, an absolutely massive file will likely perform worse than a smaller one on average,
    but only because a larger file will contain more events, which means more bitmaps, which means higher total bitmap size.
    But splitting an event with a few seconds of duration into a sequence of frame-by-frame events can greatly increase the file size,

    In particular, splitting an event with a few seconds of duration into a sequence of frame-by-frame events can greatly increase the file size,
    but will have almost no effect on performance.
    You may have other reasons to worry about subtitle file sizes, but it does not help to worry about it for performance reasons.
    - Large `\blur` is not much more expensive than smaller `\blur` (but it *is* more expensive than no `\blur` at all).
  11. arch1t3cht revised this gist Dec 29, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -681,9 +681,9 @@ To clarify: The statements in the following list are *true*, but are written to
    - As a result of the previous two points, a subtitle file's file size does not *directly* affect performance.
    Yes, an absolutely massive file will likely perform worse than a smaller one on average,
    but only because a larger file will contain more events, which means more bitmaps, which means higher total bitmap size.
    But splitting an event with a few seconds of duration into a sequence of frame-by-frame events can greatly increase the file size,
    but will have almost no effect on performance.
    You may have other reasons to worry about subtitle file sizes, but it does not help to worry about it for performance reasons.

    Splitting an event with a few seconds of duration into a sequence of frame-by-frame events can greatly increase the file size, but will have almost no effect on performance.
    - Large `\blur` is not much more expensive than smaller `\blur` (but it *is* more expensive than no `\blur` at all).
    The added cost mainly comes from needing to pad the bitmap more, not from the larger blur strength itself.
    - Rectangular clips are not expensive at all.
  12. arch1t3cht revised this gist Dec 29, 2025. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -678,7 +678,7 @@ To clarify: The statements in the following list are *true*, but are written to
    But, in the vast majority of cases, the added cost of parsing a drawing is negligible compared to the cost of rasterizing, blurring, and so on.
    - The complexity of a drawing (that is, the number of vertices) does not have a big effect on its performance cost, at least not until it reaches *absurd* levels.
    - Having separate events for every single frame does not significantly affect performance.
    - As a result of the previous two points, subtitle file's file size does not *directly* affect performance.
    - As a result of the previous two points, a subtitle file's file size does not *directly* affect performance.
    Yes, an absolutely massive file will likely perform worse than a smaller one on average,
    but only because a larger file will contain more events, which means more bitmaps, which means higher total bitmap size.
    You may have other reasons to worry about subtitle file sizes, but it does not help to worry about it for performance reasons.
  13. arch1t3cht revised this gist Dec 29, 2025. 1 changed file with 6 additions and 0 deletions.
    6 changes: 6 additions & 0 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -678,6 +678,12 @@ To clarify: The statements in the following list are *true*, but are written to
    But, in the vast majority of cases, the added cost of parsing a drawing is negligible compared to the cost of rasterizing, blurring, and so on.
    - The complexity of a drawing (that is, the number of vertices) does not have a big effect on its performance cost, at least not until it reaches *absurd* levels.
    - Having separate events for every single frame does not significantly affect performance.
    - As a result of the previous two points, subtitle file's file size does not *directly* affect performance.
    Yes, an absolutely massive file will likely perform worse than a smaller one on average,
    but only because a larger file will contain more events, which means more bitmaps, which means higher total bitmap size.
    You may have other reasons to worry about subtitle file sizes, but it does not help to worry about it for performance reasons.

    Splitting an event with a few seconds of duration into a sequence of frame-by-frame events can greatly increase the file size, but will have almost no effect on performance.
    - Large `\blur` is not much more expensive than smaller `\blur` (but it *is* more expensive than no `\blur` at all).
    The added cost mainly comes from needing to pad the bitmap more, not from the larger blur strength itself.
    - Rectangular clips are not expensive at all.
  14. arch1t3cht revised this gist Dec 29, 2025. 1 changed file with 0 additions and 7 deletions.
    7 changes: 0 additions & 7 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -302,11 +302,6 @@ Let's give some basic examples:
    No matter how complex the movement and `\t` transforms are[^complex], the resulting bitmap will be fairly small so this will render very efficiently.
    - An opaque rectangle covering the entire screen?
    This may be a very short subtitle line, but it will generate a very big bitmap and hence have a fairly big impact on performance.
    - A line of normal-sized text with a very extreme perspective transformation (think `\fry40` on an 1920x1080 PlayRes)?
    Such a line has a risk of resulting in a very large bitmap due to the extreme transformation - possibly a much larger bitmap than
    one might assume from looking at the visual output, since the bitmap size depends on the text's bounding box (which depends on the font's metrics),
    not on which pixels of the bitmap are actually filled.
    Hence, such extreme transformations should only be used with great care and should ideally be split into shorter runs (more on this later) and/or converted to drawings.
    - A gradient by character?
    Here, every character will have a different color and hence create its own shape run.
    This will mean that there will be a lot of bitmaps (one for each character's fill/outline/etc.).
    @@ -706,8 +701,6 @@ Some of the advice here is simplified for the sake of brevity; read the above se
    If you remember one thing from this article, it should be this.
    - Avoid large bitmaps that are mostly empty, like long diagonal text or sparse drawings.
    Split up your lines into multiple runs if you can.
    - Avoid extreme perspective transformations (like high `\fry` on a large LayoutRes); these can make an event's bounding box much larger than it looks.
    Consider converting your event to a drawing and baking in the transformations.
    - Avoid blurring very large bitmaps if you can.
    Avoid using fractional `\shad` on very large bitmaps if you can - use either an integer value like `\shad1` or a value like `\shad0.001` that is small enough to round to 0 in units of 1/64th of an output pixel.
    - Do not use `\be`.
  15. arch1t3cht revised this gist Dec 29, 2025. 1 changed file with 23 additions and 9 deletions.
    32 changes: 23 additions & 9 deletions renderer_internals.md
    Original file line number Diff line number Diff line change
    @@ -33,6 +33,9 @@ You shouldn't need to memorize these definitions to understand the article, but
    - A *run* (or *shape run*) is a contiguous sequence of characters that have the same style parameters, or a single drawing.
    - *Blur* always means `\blur` unless otherwise specified.
    You should never use `\be` anyway.
    - A *transform* is a `\t` tag.
    A *transformation* is a spatial transformation applied to a line, i.e. positioning, scaling, rotation, and shearing or a combination of them.
    This may be the most nonstandard terminology out of all of these, but I am making the distinction here to avoid some confusion.

    ## On the Different Renderers
    The original ASS renderer was guliverkli's VSFilter.
    @@ -247,9 +250,14 @@ Here, our setting is that we would like to thicken a shape, and then either stro
    We know that we can get a thickened version of our shape through the outline, so we apply a large `\bord` to our shape.
    However, this does not quite work:
    If we now apply a `\blur` to our line, it will only blur the outline and not the fill, so the fill will always be fully opaque on top of the blurred outline.
    This is not a problem when we want a small blur, but if the blur radius is larger than the thickening distance, the blur will reach inside the shape's fill
    This is not a problem when we want a small blur, but if the blur radius[^blurradius] is larger than the thickening distance, the blur will reach inside the shape's fill
    and the opaque fill on top of the blurred outline will create a hard edge.

    [^blurradius]: I am using "blur radius" as an informal term here;
    it is not necessarily the same as the number specified in the `\blur` tag.
    (In particular, a Gaussian blur always has an infinite "radius" in theory.)
    What I really just mean here is the blur being strong enough that it visibly reaches inside the shape's fill.

    The core problem here is that the outline and fill are two separate bitmaps.
    We can try to hide the fill using `\1a&HFF`, but this will cause the fill bitmap (before it is made transparent) to be subtracted from the outline bitmap,
    so that the outline will be hollow with a hard edge on the inside.
    @@ -291,22 +299,28 @@ Certain caveats apply here, in particular involving caching (see below), but thi

    Let's give some basic examples:
    - A single letter or small-ish drawing, moving across the screen over time and rapidly changing color?
    No matter how complex the movement and transforms are[^complex], the resulting bitmap will be fairly small so this will render very efficiently.
    No matter how complex the movement and `\t` transforms are[^complex], the resulting bitmap will be fairly small so this will render very efficiently.
    - An opaque rectangle covering the entire screen?
    This may be a very short subtitle line, but it will generate a very big bitmap and hence have a fairly big impact on performance.
    - A line of normal-sized text with a very extreme perspective transform (think `\fry40` on an 1920x1080 PlayRes)?
    Such a line has a risk of resulting in a very large bitmap due to the extreme transform - possibly a much larger bitmap than
arch1t3cht created this gist Dec 29, 2025.
    # What Every Typesetter Should Know about Renderer Internals
    ASS subtitle rendering is seen by many people as some arcane black magic.
    That's because it is.
    Luckily, you do not need to understand most of the black magic to learn many meaningful lessons about
    both the behavior and performance of renderers.

Let's get it over with right away and post the diagram:

    <img width="1200" height="559" alt="A Venn diagram showing two completely disjoint circles. One circle is labeled &quot;What typesetters think is bad for performance.&quot; The other circle is labeled &quot;What's actually bad for performance.&quot;" src="https://gist.github.com/user-attachments/assets/c8a0f6c7-5cdd-49cd-8d06-92be4dfe0812" />

    This image has rightfully become a meme in the typesetting community due to the many, many misconceptions that many typesetters have about ASS rendering performance.
    This article goes into great detail about the inner workings of ASS renderers to clear up as many of these myths as possible.
    As a result, this article is quite long.
    If you are looking for a shorter summary, you can read the two sections at the end,
    but if you do any kind of serious typesetting, I strongly recommend you to read the entire thing when you can.

    ## Definitions
    I take some care to be consistent with my terminology here.
    None of the terms I use should be nonstandard in any way,
    but I try to only use one synonym of any specific term for one specific context to be less ambiguous.
    You shouldn't need to memorize these definitions to understand the article, but you can refer to it if you are confused by something.
    - An *event* or *line* is a single `Dialogue: ` "entry" in an ASS subtitle file.
    That is, an event has a style, a layer, margins, an event text, and so on.
    (I do use both *event* and *line* here since there is no real risk of confusion with *line breaks* here.
    *Event* is the more technical term while *line* is the more informal one.)
    - A *drawing* is a shape that is drawn using the ASS drawing syntax, i.e. `{\p1}m 0 0 l 100 0 ...`
    - A *character* is a visible letter of text that is rendered in an ASS Event.
    - A *glyph* is a shape that a font provides for a certain character (or ligature, etc.).
    - A *shape* is either a glyph or a (parsed) drawing.
    That is, a shape consists of a sequence of vertices that should be connected by lines, bezier curves, or other splines.
    Drawing (or "rasterizing") a *shape* means filling in its *interior*, *not* stroking along its contours, even when drawing outlines.
    - An *outline* is the outline or border drawn around an ASS event with `\bord`
    - A *run* (or *shape run*) is a contiguous sequence of characters that have the same style parameters, or a single drawing.
    - *Blur* always means `\blur` unless otherwise specified.
    You should never use `\be` anyway.

    ## On the Different Renderers
The original ASS renderer was guliverkli's VSFilter.
    This was a DirectShow filter,
    meaning that it could hook into a video playback stream on Windows and draw subtitles directly onto the YCbCr video.

    Since then, multiple variants of VSFilter have appeared, with some differences between them.
    The three most relevant ones are xy-VSFilter, MPC-HC's internal subtitle renderer (MPC-HC ISR for short), and VSFilterMod.

    The former two are intended for live playback of subtitles, and are available in video players like MPC-HC.
    They only have comparatively minor changes in rendering over the original VSFilter.
    VSFilterMod, on the other hand, makes big extensions to the format,
    adding many new tags for effects like distortion, gradients, inserting images, and more.
    It can do this because it is *not intended for live playback* (so in particular it does not need to have realtime performance).
    VSFilterMod can be very useful when burning subtitles onto the video *in advance*, since it gives subtitle authors many more capabilities,
    but its extension to the ASS format cannot be used when authoring subtitles that are to be distributed as softsubs.

    Since, for the most part, this article is centered around authoring softsubs, VSFilterMod is not relevant for most of it.
    Still, I am mentioning it here since it certainly has its place when creating hardsubs,
    and since it is important to understand this distinction between hardsubbing and creating softsubs intended for live playback.

    So, back to xy-VSFilter and MPC-HC ISR.
    Both of these renderers are based on the original VSFilter, and hence use the Windows GDI and DirectWrite APIs for drawing text.
This makes them Windows-only, which in turn meant that users on Linux and macOS could not watch videos with ASS subtitles.

    libass was created to solve this problem.
    libass is a cross-platform ASS rendering library that was written completely from scratch, without copying any of VSFilter's code.
    Unlike the VSFilter variants, libass does not use GDI and DirectWrite for rendering[^libassgdi], so it can be used on all major operating systems.
    However, this also means that it needs to take great care to emulate GDI's behavior as closely as possible in order for its rendering to match VSFilter's.

    [^libassgdi]: It can still use them for looking up fonts on Windows, but that is not too relevant here.

    Being cross-platform was not the only improvement that libass brought to the ecosystem.
    Over time, libass was improved with many performance optimizations that made its rendering much faster and more efficient.
    Today, libass is much faster than the VSFilters in many cases.

    Furthermore, libass is still being actively developed, while development on VSFilter is mostly discontinued
    (with the little development that *is* being done on the VSFilters when necessary often being coordinated by libass's developers).
    Today, this makes libass the effective "standard" renderer for most users.
    libass is used in mpv (which is generally regarded as the best media player around, and the target of many fansub releases),
    as well as in many other players like VLC, Kodi, or even new versions of MPC-HC (though it is not selected by default).

    Now, why do you need to know this?
    Well, many subtitle authors hear about how libass is the "standard" renderer nowadays,
    and conclude that "I only need to worry about libass when authoring subtitles."
    They may then include a text like "This release is intended for playback in mpv" in their releases
    and attribute any rendering errors on VSFilter to their viewers using the wrong player.

    **However, this is not correct.**
    While libass *is* the most popular renderer and most viewers can and should use it,
    libass's rendering behavior is *not* authoritative for the format.
    Neither is the document that is sometimes presented as the "ASS Specification"
    (and neither are any third-party documents like Aegisub's manual).

    While it's unfortunate for all involved parties, the reality is that there is no (authoritative) specification document for the ASS format.
    **The ASS subtitle format is implementation-defined**, with the reference implementation being VSFilter.
    This means that **whenever a subtitle file looks different in libass than in VSFilter, it is VSFilter who is correct and libass who is wrong**.
    While it has already gotten very close, libass does not yet match VSFilter's rendering behavior completely.
    Still, matching VSFilter's rendering *is* the library's eventual goal.
    This has an *extremely* important consequence for subtitle authors:
***Any*** **behavior of libass that differs from VSFilter can change at any moment, and should hence not be relied on by subtitle authors.**

    This means that *even if* you ensure that your viewers will only ever use mpv to view your subtitles,
    if at the moment your script only renders "correctly" (as in, "how you want it to render") on libass and not on VSFilter,
    there is no guarantee that your script will *keep* rendering this way on libass in the future.[^libassextensions]

    [^libassextensions]: At this point it should be mentioned that libass maintains a [list of libass's ASS extensions](https://github.com/libass/libass/wiki/Libass'-ASS-Extensions) on its GitHub wiki.
    If you know what you are doing and are aware of the risks (or if you cannot avoid it, as could be the case when working with bidirectional text), you can rely on these extensions.
    However, you should be aware that these extensions are *not* precisely specified (more on this below) and could hence also have subtle changes in behavior in the future.

    This is something that often causes frustration among libass users,
    with comments along the lines of "Why break my subtitle file's rendering and stagnate the format just to make some 20-year old subtitles that nobody cares about render correctly?"
    This is easy to think, but it's important to realize that there *exist* old subtitle files that currently *cannot* be correctly played back outside of Windows,
    and that rectifying this situation is part of libass's goal.
    Moreover, the alternative isn't any better either:
    If we were to stop respecting VSFilter as the reference implementation, that would mean that *no* reference exists at all, and that libass is free to make up whatever behavior it likes at any moment.
    Unless it were to *precisely* specify *every single edge case* of its rendering (and [Hyrum's law](https://www.hyrumslaw.com) suggests that this may be a lost cause),
    any behavior that you might rely on could be deemed as "unintended" or "a bug" by libass in the future and changed.
    With VSFilter being the reference implementation, there is at least *a* reference implementation, and any libass behavior that matches VSFilter can be relied on to not change in the future
    (of course there could still be rendering regressions, but these can then be objectively deemed as bugs and be fixed).

    The bright side of this story is that this only applies to *rendering*, not to performance.
    If you are sure that your viewers (or at least the viewers that you care about) will only use libass for rendering,
    you're free to rely on libass's performance and author subtitles that may stutter on VSFilter but still render smoothly on libass.

    This is how we arrive at the slogan that I usually use here:
    **You may target only libass for performance, but you should target the *intersection* of libass and VSFilter for rendering.**
    That is the only way to author subtitles that render correctly on libass and are guaranteed to continue to render correctly in the future.

    For a list of known libass/VSFilter differences you should watch out for, refer to the [corresponding page on the libass wiki](https://github.com/libass/libass/wiki/Differences-between-Libass-and-VSFilters) and the [list of ASS quirks](https://fansubbers.miraheze.org/wiki/List_of_ASS_quirks).

    Finally, I should mention that this story gets more complicated when additional third-party authoring tooling gets involved.
    Many authoring tools (think Aegisub, or the various Aegisub scripts) have their own quirks in parsing ASS subtitles
    (or often do not even try to match VSFilter's parsing, and instead apply their own (incorrect) mental model of what the format *could* be).
    To give a few examples:

- At the time of writing, Aegisub's manual claims that `\fad()` is a simple fade taking two arguments while `\fade()` is a complex fade taking seven arguments.
    In reality, `\fad` and `\fade` are completely synonymous and can each be used with either 2 or 7 arguments.
    However, some automation scripts may not be aware of this, so even if it does not change rendering in *any* way,
    you may still want to use `\fad` for two-argument fades and `\fade` for seven-argument fades anyway.
    - Colors are often formatted as `\1c&HFF0000&`, but there is actually no need for either the leading `&H` nor the trailing `&` as far as the renderers are concerned.
    However, Aegisub typically formats colors this way so many automation scripts expect this format.

You can decide for yourself how much you want to care about
these quirks - since they will not affect the final rendering result, this is only a matter of convenience for you as the author - but it is something to be aware of.

    ## How Much to Rely on Internals
    The rest of this article will explain part of how libass works internally (at the time of writing),
    since understanding this can be very helpful for understanding how to estimate or improve the rendering performance of your subtitle file.

    Note that the concepts explained in the previous section also apply here:
    libass's only guarantee is that it will (try its best to) render subtitles the same way VSFilter does.
    It does not make any guarantees related to its performance for certain types of files,
    and *especially* not related to how any of its internals work.
    Understanding libass's internal workings can be helpful to get a general idea of which types of typesetting have which kinds of impacts on performance,
    but there is no guarantee that these internals will forever stay the way that they are right now.
    Of course, it is unlikely that libass will have any major performance regressions in common use cases,
but it would not be impossible for libass to at some point become slightly slower in certain situations if that means becoming much faster or more accurate for other, more important cases.
    Still, the general gist of what kinds of operations are how expensive should probably stay the same in the long run.

    ## Libass's Output Format
    Let us start with something that is not even technically libass's *internals*, but actually its "externals,"
    at least as far as the library is concerned.
    (Not that this makes any difference - the ultimately important part is that this is something that is not immediately visible to the Aegisub end user, but is still extremely helpful to understand.
In fact, understanding this can also help with understanding the format's rendering *output* better.)

    Let's say I am writing a video player and want to use libass to render subtitles:
    How exactly am I supposed to use libass?

It turns out that (given a subtitle file and a timestamp) libass does not simply take an RGB image and spit out an RGB image with the subtitles drawn onto it.
    In fact, libass does not even need the video frames at all; all you need to tell it is the video's resolution and the resolution you want to draw subtitles at.
    Instead, given a subtitle file and a timestamp, libass will return a **sequence of monochromatic bitmaps with an alpha channel**.
    More precisely, it will return a sequence of "bitmaps," where each bitmap consists of the following:

    - A width and a height
    - X and Y coordinates specifying where on the screen the bitmap should be positioned
    - An RGBA color
    - `width*height` alpha values from 0 to 255, specifying the transparency of each pixel in the bitmap's `width x height` rectangle.

    It is then the player's responsibility to blend these bitmaps onto the video one after the other, in the given order.
    (Players may also need to apply some colorspace conversions to the colors of the bitmaps, but that is not too relevant for our purposes.)
    Players are free to do this however they please:
    Aegisub will "naively" blend the bitmaps onto the RGB frame on the CPU while mpv will pack all the bitmaps onto a single large texture and send that to the GPU,
    so that blending can be done on the GPU as part of mpv's resizing and colorspace conversion pipeline.
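To make this concrete, here is a rough Python sketch of what a consumer does with this output. All names and conventions here are my own invention for illustration - libass's real `ASS_Image` struct differs in layout and in its exact color/alpha encoding - but the shape of the data is the point: one flat color, plus per-pixel coverage.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A toy model of the renderer's output (field names are mine, not libass's ABI):
# one single color for the whole bitmap, one alpha value per pixel.
@dataclass
class Bitmap:
    x: int                              # placement of the bitmap on the frame
    y: int
    w: int
    h: int
    color: Tuple[int, int, int, int]    # one (r, g, b, a) for the WHOLE bitmap
    alpha: List[int]                    # w*h per-pixel coverage values, 0..255

def blend_onto(frame: list, frame_w: int, bm: Bitmap) -> None:
    """Naively blend one monochromatic bitmap onto an RGB frame (a flat list
    of (r, g, b) tuples), roughly in the spirit of Aegisub's CPU path."""
    r, g, b, a = bm.color
    for row in range(bm.h):
        for col in range(bm.w):
            # effective opacity = per-pixel coverage scaled by the bitmap's alpha
            # (assuming 255 = fully opaque here; real implementations differ)
            o = (bm.alpha[row * bm.w + col] / 255) * (a / 255)
            i = (bm.y + row) * frame_w + (bm.x + col)
            fr, fg, fb = frame[i]
            frame[i] = (round(fr * (1 - o) + r * o),
                        round(fg * (1 - o) + g * o),
                        round(fb * (1 - o) + b * o))
```

A player would call something like `blend_onto` once per bitmap, in the order the renderer returned them; everything after this point (colorspace conversion, GPU upload) is the player's business.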

    This output format has some very important consequences about the limitations of ASS rendering![^outputformat]
    Let me repeat the slogan again to really hammer this in:
    **Any ASS subtitle file will result in a sequence of monochromatic alpha bitmaps.**
    *Any one* of these bitmaps can only have a *single* color.
    It may have different levels of transparency at different points of its rectangle
    (the most common case being that the bitmap is fully opaque in some areas and fully transparent in other areas, with some transition in between those two areas),
    but its hue, saturation, and brightness cannot change throughout its rectangle.

[^outputformat]: Of course, the way I have described this behavior, it is simply an implementation detail of libass that a priori has nothing to do with the ASS format.
    In reality it is the other way around: VSFilter *also* constructs monochromatic bitmaps out of the subtitles internally, which it then blends onto the video.
    This causes ASS rendering to behave in the way it does, which in turn allows libass to choose this output format.
    I am focusing on libass here since it is the main subject of this article and because it is a bit simpler to describe,
    but either way, understanding that any ASS shape will only ever be able to result in a small number of monochromatic bitmaps with alpha is a very helpful way to internalize the format's abilities and limitations.

    How can a single subtitle line have multiple different colors, then?
Well, one single subtitle line - and, in fact, one single shape run, i.e. one section of text with no tags or line breaks in between, or one drawing - can result in more than one bitmap.
    More specifically, one shape run can result in up to four bitmaps[^iclip]:

    - One bitmap for the shape's fill
    - One bitmap for the shape's outline, obtained by thickening the shape, rasterizing that, and possibly subtracting the fill from it[^borderbox]
    - One bitmap for the shape's shadow, obtained by shifting the outline bitmap by a certain distance
    - One bitmap for karaoke highlighting (using `\k`, `\ko`, `\kf`, etc.), obtained from the shape's fill or outline, possibly cut off horizontally at some position

    [^iclip]: Actually, that is not quite true.
    Any one of these bitmaps can turn into four smaller bitmaps when a rectangular `\iclip` is involved.
    But these four bitmaps will then all have the same color, so this does not change anything about the conclusions we will draw from this simplified statement.
    [^borderbox]: Or as a simple rectangle in the case of BorderStyle=3 or 4

    You may recognize these as exactly the four parts of a line that you can specify colors for via `\1c`, `\3c`, `\4c`, and `\2c`!
    And this is in fact exactly what these tags do in libass: They just control the color that is associated to the corresponding bitmap returned by libass.

    So, what does this mean for the ASS format's capabilities and limitations?
    It means that **any shape run can have at most four colors**!
    And since, when doing advanced effects, these four bitmaps rarely interact with each other in the way you want,
    you can often only use *one* color per shape run.

    So what do you do when you want more colors?
    Well, you need to make more shape runs.
    Either by splitting a single line into multiple shape runs with different colors (a gradient by character),
or by creating many copies of your line with different colors and different positions/clips/etc. (e.g. a clip gradient).
    This is nothing new to a typesetter, but understanding the "monochromatic bitmaps" limitation can help one understand *why* these techniques have to work the way they do.
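As a toy illustration of the "more shape runs" approach, here is a hypothetical helper that builds a gradient by character by starting a new run (and hence a new fill bitmap) for each character. Real gradient tools do considerably more (gamma-aware interpolation, handling of spaces and line breaks); this sketch only shows the run-splitting idea.

```python
def _fmt(c):
    r, g, b = c
    return f"&H{b:02X}{g:02X}{r:02X}&"   # ASS colors are BGR-ordered hex

def gradient_by_character(text: str, start, end) -> str:
    """Split text into one run per character, linearly interpolating \\1c.
    Each {\\1c...} block starts a new shape run, i.e. a new monochromatic bitmap."""
    n = max(len(text) - 1, 1)
    out = []
    for i, ch in enumerate(text):
        t = i / n
        color = tuple(round(a + (b - a) * t) for a, b in zip(start, end))
        out.append(f"{{\\1c{_fmt(color)}}}{ch}")
    return "".join(out)
```

For example, `gradient_by_character("Hi", (255, 0, 0), (0, 0, 255))` produces one red run and one blue run, exactly the kind of per-character splitting discussed above.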

    The fact that the bitmaps returned by libass are blended onto the video frame in sequence (as opposed to first being "added together" in some other way) also has some important consequences:
    **The only way to fully cover the original video's content is with a fully opaque bitmap.**
    One may be tempted to think that if two rectangles with 80% opacity are stacked on top of one another, they will completely mask the background since the two 80% opacity values add up to more than 100%,
    but this is not the case:
    The bitmaps are blended onto the video one after the other, so after blending the first bitmap, the background will still be visible with 20% opacity,
    so after blending the second bitmap it will still be visible with 4% opacity.
    This can be very important when dealing with blurred edges and/or antialiasing.
    For example, two opaque rectangles joined at an edge will completely cover their area, but when those two rectangles are blurred *individually*, the background will bleed through at the edge they share.
    The same will happen when the edge is not completely aligned to the pixel grid in display units, since then the edges will be slightly blurred due to antialiasing.
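The arithmetic behind this can be written out in a few lines:

```python
def background_visibility(opacities):
    """Fraction of the original video still showing through after blending
    bitmaps with the given opacities onto it, one after the other."""
    visible = 1.0
    for o in opacities:
        visible *= 1.0 - o   # each blend lets (1 - o) of what is behind show through
    return visible

# Two 80%-opaque rectangles stacked do NOT add up to full coverage:
# background_visibility([0.8, 0.8]) is about 0.04, i.e. 4% still bleeds through,
# while background_visibility([1.0]) == 0.0 - only full opacity hides the video.
```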

    Finally, understanding how shapes render as four monochromatic bitmaps can also help you to understand other typesetting techniques like simple blur & glow setups:
    `\blur` is implemented by applying a gaussian blur to an individual bitmap's alpha channel.
    It does not change a bitmap's color or interact with any other bitmaps.
    Moreover, `\blur` is only ever applied to one bitmap of a shape at a time (but may then also be applied to the shadow or karaoke color as a consequence, since these are copied from the fill or outline):
    If the line has no outline, the line's fill (i.e. the fill bitmap's alpha channel) will be blurred.
    If the line *has* an outline, the outline is blurred but the fill is not.
    However, the line's outline bitmap is *below* the fill in rendering order, so blurring the outline will only blur the outer side of the outline, and not the transition from the outline to the fill.
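The fact that `\blur` operates purely on the alpha channel can be sketched with a 1-D toy blur. The kernel parameters here are arbitrary and libass's actual implementation is more sophisticated, but the key property - color is never an input, coverage just spreads - is the same:

```python
import math

def _gaussian_kernel(radius: int, sigma: float):
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur_alpha(alpha, radius=2, sigma=1.0):
    """Gaussian-blur a 1-D strip of alpha values. Note that the bitmap's
    color is not an input at all: blurring only spreads coverage around."""
    kernel = _gaussian_kernel(radius, sigma)
    out = []
    for i in range(len(alpha)):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = i + j - radius
            if 0 <= src < len(alpha):    # outside the bitmap there is no coverage
                acc += alpha[src] * w
        out.append(acc)
    return out

# A hard edge (opaque pixels next to empty space) turns into a soft ramp,
# with some coverage leaking into the formerly transparent pixels:
ramp = blur_alpha([255, 255, 255, 0, 0, 0])
```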

    So, if we want the transition from outline to fill to also be blurred, we have to create multiple copies of our line,
    make sure that one of them has a blurred fill and the other has a blurred outline, and then layer them correctly.
    This is exactly what the typical "Blur & Glow" workflow does.

    Similarly, we can understand the "shad trick."
    Here, our setting is that we would like to thicken a shape, and then either strongly blur that thickened version or make it semi-transparent (or both).
    We know that we can get a thickened version of our shape through the outline, so we apply a large `\bord` to our shape.
    However, this does not quite work:
    If we now apply a `\blur` to our line, it will only blur the outline and not the fill, so the fill will always be fully opaque on top of the blurred outline.
    This is not a problem when we want a small blur, but if the blur radius is larger than the thickening distance, the blur will reach inside the shape's fill
    and the opaque fill on top of the blurred outline will create a hard edge.

    The core problem here is that the outline and fill are two separate bitmaps.
    We can try to hide the fill using `\1a&HFF`, but this will cause the fill bitmap (before it is made transparent) to be subtracted from the outline bitmap,
    so that the outline will be hollow with a hard edge on the inside.
    Luckily, the shadow comes to our rescue.
    When our line has nonzero `\bord`, which it does, the base for the shadow bitmap will be a copy of the outline.
    However, if we add a `\ko0` to our line (or give the fill a `\1a&HFE`, making it almost but not *fully* transparent - this was the old method before the `\ko0` method was discovered),
    the fill is not subtracted from the shadow like it is from the outline.
    Hence, if we then force a shadow to be created with `\shad0.001` and remember to hide all other bitmaps with `\1a&HFF\2a&HFF\3a&HFF` we get a beautiful single thickened bitmap.
    This is exactly what the "shad trick" is.[^tshad]
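Putting the pieces together, a shad-trick line could look like the following sketch (hypothetical position, border, and blur values; `\4c` and `\4a` control the color and alpha of the visible shadow bitmap):

```
{\an7\pos(100,100)\bord20\blur10\ko0\shad0.001\1a&HFF&\2a&HFF&\3a&HFF&\4c&HFFFFFF&\4a&H00&}Text
```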

    [^tshad]: The shad trick also has a second benefit:
The shadow's position can be changed freely throughout a line's duration using `\t(\xshad)` and `\t(\yshad)`
    while a line's position can only be changed using `\move` which only allows for a single linear movement in a certain time interval.
    Hence, the shad trick can be used to create a moving line in a more compact way (i.e. resulting in a smaller subtitle file).
    This will usually not help improve performance (in fact it may make it worse due to the cost of the shad trick),
    but it can help in cases where creating frame-by-frame events would make your file size blow up by extreme amounts,
    like when frame-by-frame tracking a very complex drawing.
    Though then again, maybe you should turn such a drawing into a font instead.

    Note, however, that the shad trick has a performance cost:
    Even though the fill and outline will be fully transparent, they will still be returned by libass and need to be blended by the player (at least at the time of writing).
    Hence, a shad trick line will be roughly three times as expensive to render as a normal line, so it should only be used when necessary.

    ## The Single Biggest Performance Factor
    We are now ready to talk about the single most important factor when it comes to the rendering performance of subtitles:

    *No matter* how libass produces the bitmaps it outputs, the player needs to blend them onto the video in some way or another.
    The blending calculations themselves are not hard - CPUs can do math extremely quickly.
    The main "difficulty" (in the sense of performance) turns out to come from the sheer *amount* of data that has to be processed here,
    i.e. the amount of memory that needs to be accessed or copied in order to blend the bitmaps.
    This is a very common theme in image processing:
    Very often the main bottleneck is not CPU speed, but memory bandwidth
    (and, as a corollary, factors like cache size and locality).

    Either way, we can come to the following conclusion:
    **The dominating factor in ASS rendering performance is *total bitmap size*.**
    That is, the sizes (width times height) of all the bitmaps output by libass, summed together.
    Certain caveats apply here, in particular involving caching (see below), but this slogan can already explain the vast majority of performance guidelines.

    Let's give some basic examples:
    - A single letter or small-ish drawing, moving across the screen over time and rapidly changing color?
    No matter how complex the movement and transforms are[^complex], the resulting bitmap will be fairly small so this will render very efficiently.
    - An opaque rectangle covering the entire screen?
    This may be a very short subtitle line, but it will generate a very big bitmap and hence have a fairly big impact on performance.
    - A line of normal-sized text with a very extreme perspective transform (think `\fry40` on an 1920x1080 PlayRes)?
    Such a line has a risk of resulting in a very large bitmap due to the extreme transform - possibly a much larger bitmap than
    one might assume from looking at the visual output, since the bitmap size depends on the text's bounding box (which depends on the font's metrics),
    not on which pixels of the bitmap are actually filled.
    Hence, such extreme transforms should only be used with great care and should ideally be split into shorter runs (more on this later) and/or converted to drawings.
    - A gradient by character?
    Here, every character will have a different color and hence create its own shape run.
    This will mean that there will be a lot of bitmaps (one for each character's fill/outline/etc.).
    However, each of these bitmaps only contains a single character, so the individual bitmaps will be fairly small.
    As the slogan says, what matters is the *total size*, not the count of the bitmaps.
    All of the per-character bitmaps combined will be exactly as large as a single bitmap for the entire line would be
    (in fact, it can also be much smaller when rotations are involved, see below),
    so a gradient by character will not render any slower than a single-color line.
    - A horizontal or vertical clip gradient?
    The situation is very similar here, at least when it comes to total bitmap size:
    There will be a *lot* of bitmaps,
    but each bitmap will just be a single-pixel strip.
    So, once again, the total bitmap size will be no different from the size of a single line with no gradient.
    We will see later on that there are some other factors here that play into why clip gradients are so efficient,
    but we can already see that the total bitmap size works out.
    - A diagonal clip gradient?
    This one is more of a problem.
    Once again, there will be many bitmaps for the many small strips of the shape,
    but this time the individual bitmaps will not be a single pixel wide or tall!
    Bitmaps are always rectangles, so a bitmap containing a narrow diagonal strip will need a comparatively large bitmap to contain it.
    As a result, a diagonal clip gradient will result in a large number of *large* bitmaps,
    and can hence have quite a high performance impact!

    [^complex]: Of course, if you have a thousand transforms in a single line they might also start having an impact on performance.
    But you should simply never need this in practice.
    Even when transforming five different tags frame-by-frame, you'll probably still only have a couple dozen transforms, which will be perfectly fine.
    For the rest of this article, please disregard pathological cases (i.e. cases that are simply redundant on a *syntax* level) like these when I use phrases like "no matter how complex."
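As a back-of-the-envelope illustration of the last two gradient examples, here is a small Python sketch (my own rough model, not libass code) comparing the total bitmap size of an axis-aligned one-pixel clip gradient with a 45-degree diagonal one over the same line:

```python
def gradient_total_bitmap_size(width, height, diagonal=False):
    """Estimate the total bitmap size (sum of bounding-box areas) of a
    1-pixel clip gradient over a width x height line.  Rough model:
    each strip's bitmap is its axis-aligned bounding box."""
    if not diagonal:
        # Horizontal strips: each bounding box is exactly width x 1,
        # so the total is the same as one bitmap for the whole line.
        return width * height
    # 45-degree strips along x + y = i: the bounding box of each strip
    # is a square as wide as the strip's horizontal extent.
    total = 0
    for i in range(width + height):
        extent = min(width, i) - max(0, i - height)
        total += extent * extent
    return total
```

For a hypothetical 800x100 line, the axis-aligned gradient totals 80,000 pixels (no worse than a plain line), while the diagonal version totals millions of pixels - roughly two orders of magnitude more.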

    So, we see that understanding this total bitmap size rule can already tell you a lot about the performance of different types of subtitle effects.
    If you remember one thing from this article, it should be this rule.
    Still, there is more to learn about this topic, as we will see in the following sections.

    ## Libass's Rendering Pipeline
    Let me now explain in more detail how libass renders a subtitle file.
    Roughly, it performs the following steps[^steps]:

    [^steps]: This is not a complete list and may not be in the exact correct order in all cases, it just lists the steps that are relevant for understanding rendering and rendering performance in most common cases.

    - Given a subtitle file and a timestamp, find all events (i.e. subtitle lines) which are visible at the given timestamp.
    - For every such event:
    - Parse the event's text. This goes through the event character by character, parsing override tags when ones are found.
    When tags like `\t`, `\move`, or `\fad` are encountered, their respective values at the current timestamp are computed.
    The result of this step is a list of text characters or drawings, together with a full list of style parameter values
    (including the font name, font size, color, blur, outline/shadow, scale, rotation, shear, etc.) for each individual character or drawing.
    - Split the event's tags into *runs*,
    where each run is a contiguous sequence of characters (or a single drawing) that have the same style parameters.
    Each drawing gets its own run.
    An empty tag block `{}` will not split a run of characters, but it *will* split a drawing into two drawings and hence two runs.
    - For each such run:
    - Turn each individual character or drawing into a shape.
    For characters, this entails looking up the font and reading the corresponding glyphs in the fonts.[^shaping]
    For drawings, it entails parsing the drawing text and turning it into libass's own internal binary format for a shape.
    - Transform the shapes according to their run's style parameters.
    This means applying position shifts, scaling, rotation, and shearing.
    These transformations happen on the level of *shapes*, that is, they transform a shape's end points and spline control points (as opposed to being somehow applied in raster space).
    - The shapes obtained from the previous step will be used for the *fill* of the run's characters or drawings.
    If the run has an outline set, libass will now take the fill shape and expand it by the outline width to obtain a thickened shape that will be used for the outline.
    Note that libass behaves differently from VSFilter here!
    VSFilter draws outlines in raster space, not in vector space.
    This can cause some rendering differences, in particular for zero-width contours, which will have a nontrivial outline with libass but not with VSFilter.
    As explained in the previous sections, this means that you should *not* rely on libass's stroking behavior for zero-width contours.
    - Rasterize the shapes obtained from the previous step to bitmaps.
    - Combine all of the run's fill and outline bitmaps into a single bitmap each.
    - If the run has a nonzero blur, blur the outline bitmap if there is one, otherwise blur the fill bitmap.
    - If necessary, subtract the fill bitmap from the outline bitmap.
    If the line has a non-integer shadow (in output units, not necessarily in PlayRes units), shift the outline or fill bitmap by the necessary amount to obtain the shadow bitmap.
    (Shifting the shadow by an *integer* amount can be done by just adjusting the position fields, without modifying the actual pixel data.)
    - We have now obtained up to three combined bitmaps (fill / outline / shadow) for each run.
    Next, apply rectangular `\clip` or `\iclip` as well as karaoke effects if present by cropping the bitmaps (or a copy of the bitmaps in the case of karaoke effects).
    This is actually very cheap, since it can be done by just modifying the coordinates, sizes, and pointers of the bitmaps, without modifying (or even copying) the underlying pixel data.
    In the case of a rectangular `\iclip`, this can be done by combining four rectangular cropped copies of the same bitmap, covering the four sides of the rectangle that is cut out of the bitmap.
    Here, making a "copy" of the bitmap (both in the case of `\iclip` and for karaoke effects) can be done by simply creating another bitmap object that refers to the same underlying pixel data buffer,
    so this is also very fast and does not need to make a copy of the pixel data.
    - If the line has a vector clip, rasterize the vector clip to a bitmap and multiply each bitmap's output layer with that rasterized bitmap (or its inverse if it's an `\iclip`).
    - Having now obtained a set of bitmaps for each event, check if there are any events on the same layer that collide with one another, and move some of them out of the way if so.

    [^shaping]: It also entails doing *font shaping* beforehand, which involves laying out bidirectional text and applying ligatures and diacritics, but this is not too relevant for our purposes.
    The main takeaway is that there may not be a one-to-one correspondence between "characters" (more specifically, unicode codepoints) in the event text and rendered glyphs.
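The run-splitting step can be sketched in a few lines of Python (an illustration, not libass code; the per-character style records here are invented, and real libass tracks many more parameters such as font, size, colors, blur, and transforms):

```python
from itertools import groupby

def split_runs(styled_chars):
    """styled_chars: list of (char, style) pairs, where `style` is a
    hashable snapshot of all style parameters for that character.
    Consecutive characters with equal styles form a single run."""
    return [
        "".join(ch for ch, _ in group)
        for _, group in groupby(styled_chars, key=lambda pair: pair[1])
    ]
```

For example, `split_runs([("A", "white"), ("B", "white"), ("C", "red")])` yields the two runs `["AB", "C"]`: the color change forces a run break, so each run later gets its own set of bitmaps.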

    ### Consequences for Performance
    After understanding libass's rendering pipeline, we can now talk about which of these different steps affect performance the most (and which don't).

    As a general rule of thumb, the rule about total bitmap size also holds here:
    The only steps that are really relevant for performance are the ones that somehow deal with bitmap data
    (that is, the ones that deal with the actual pixel data and do not just shuffle bitmap *meta*data like positioning or colors around),
    and their performance is roughly proportional to the total size of the bitmaps they need to handle.
    Specifically, this means the following:

    - Parsing events and drawings is not very expensive in the grand scheme of things, even when the event's text is very long or when there are very many events.
    Neither are "animation" effects like moves, transforms, fades, etc.
    ASS rendering always happens on a specific timestamp, so a transform can just be replaced by a constant value (at that timestamp) when parsing.
    The fact that a subtitle changes from frame to frame does not make the subtitles any slower to render - the subtitles need to be rendered and blended every frame anyway, whether they change or not.[^detect_changes]
    - Similarly, splitting style runs, looking up fonts and glyphs in fonts, and transforming shapes is comparatively cheap and not worth worrying about.
    - The first computationally expensive step is rasterizing shapes.
    The shape's complexity (i.e. how many vertices there are per area) does affect the speed there to some degree,
    but the main factor is just the resulting bitmap's size which depends on the shape's bounding box.
    The bounding box is obtained from the font's metrics if the shape comes from a font glyph,
    or computed as the maxima and minima of all vertex coordinates when the shape comes from a drawing.
    - Combining multiple bitmaps into a single one simply scales with total bitmap size.
    - Blurring bitmaps - you guessed it - also scales with the size of the blurred bitmap.
    However, there is a bit more to say here:
    - Out of the various steps scaling with total bitmap size, blurring is generally the most expensive one, *if* it is performed.
    Often there is little you can do to avoid blurring,
    but if you have a rectangle covering the entire screen with a `\blur1` then you might want to ask yourself if you *really* need that blur or if you can achieve your effect without it.
    - *If* `\blur` is present, the cost of a blur will be approximately constant with respect to the blur strength.
    That is, a very strong blur will not be any more expensive than a weak blur in principle.[^cascadeblur]
    However, a stronger blur will need to pad the bitmap by a larger amount so that the blur does not cut into the bitmap's edge.
    This in turn makes the bitmap larger, which makes the blur slightly slower.
    This should only be an issue on *extremely* strong blurs, though.
    - `\be`, on the other hand, gets more and more expensive the stronger its strength is.
    It also does not scale correctly and has issues with its padding, so you should never use it anyway.
    - Subtracting the fill from the outline once again scales with total bitmap size.
    The same holds for shifting the shadow's bitmap,
    but here it is important to mention that this shift will only be performed if there is actually a nontrivial distance to shift by,
    after rounding to 1/64th of an output pixel.
    Hence, an `\xshad0.001` will be faster than an `\xshad0.1`, which will in turn be faster than a `\shad0.1` (since there the shift has to be performed in both directions).
    If you find yourself needing shadows, check if you can get away with rounding all of your shadow offsets to integers.
    (But note that transformations like scale, rotation, and shearing will interfere with this, if present.)
    - Rectangular `\clip` and `\iclip` are actually almost free for the reasons explained above.
    Since `\clip` reduces the resulting bitmap's size, it can even make the rendering cheaper.
    Similarly, this step of creating a new bitmap structure for the karaoke effect is almost free too,
    but of course it still creates another bitmap that must then be blended by the player.
    - Finally, there is the vector clip step.
    (As well as the collision detection, but that is basically free too.)
    Here, the vector clip has to be *rasterized to a bitmap* and applied to the event's bitmaps.
    Hence, this scales with both the total bitmap size *and* the size of the vector clip shape - whichever is larger.
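The shadow-shift rule above can be sketched as a small predicate (a simplification of my own, assuming the 1/64-pixel quantization described in the text):

```python
def shadow_needs_pixel_shift(offset_px):
    """True if a shadow offset (in output pixels), rounded to 1/64 of a
    pixel, has a nonzero fractional part - i.e. if actual pixel data
    must be shifted.  Integer shifts only adjust position metadata."""
    return round(offset_px * 64) % 64 != 0
```

So an `\xshad0.001` (which rounds away entirely) or an integer `\xshad2` costs nothing extra, while an `\xshad0.1` forces a real subpixel shift of the bitmap data.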

    [^detect_changes]: Again, this is actually not completely true.
    Libass *does* have some optimizations for subtitles that do not change from frame to frame.
    We will talk about caching in detail later on, but libass is also able to communicate to the player that "the subtitles for this frame are *exactly* the same as for the last frame you rendered",
    or that "the subtitles for this frame have the same bitmaps and colors as the last frame you rendered; only their positions have changed."
    However, this optimization is not too relevant:
    Apart from the fact that the player still needs to blend the bitmaps either way,
    this *only* works if the subtitles stay the *exact* same across frames.
    As soon as a single event changes, this optimization no longer triggers.
    Moreover, I would advise against relying on behavior across frames in the first place.
    It does not help if your second frame renders very quickly if the first one already causes lags,
    and the viewer could seek around the video and break your assumptions about which frames are rendered after which other frames.
    It is much better to just optimize every frame's rendering independently.

    [^cascadeblur]: libass uses some *extremely* cool mathematics to achieve this.
    If you are interested in how this works, check out the [paper](https://github.com/MrSmile/CascadeBlur) the author wrote about this algorithm.

    The last two points are quite important!
    A rectangular clip is basically free, while a vector clip can be very expensive.
    This means that
    - You should never use a rectangular vector clip.
    If your clip *can* be a rectangular clip, it should be one.
    - You should not make unnecessarily large vector clips.
    Keep them as small as possible.
    At the *very least*, clip them to the frame's boundaries.
    - At least in a vacuum,
    it is more efficient to "bake in" a vector clip by intersecting it with your line's shape and using the resulting shape as a drawing (with no vector clip).
    This will usually get you better visual results anyway, since vector clip edges cannot be blurred while drawing edges can.
    But I say "in a vacuum" here since this can break caching when multiple events are involved, see below.
    In such situations, which option will be faster depends a lot on the specific situation and can probably only be determined with benchmarks,
    but either way, baking in vector clips should be an option to consider.
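To illustrate the first point (hypothetical coordinates): these two tags select the same region, but render very differently performance-wise:

```
{\clip(100,100,500,400)}                       rectangular clip: applied by cropping metadata, nearly free
{\clip(m 100 100 l 500 100 500 400 100 400)}   vector clip of the same rectangle: rasterized to a bitmap and multiplied in
```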

    Finally, one additional corollary of the above pipeline is that, at least at the time of writing,
    clips are only considered very late in the pipeline.
    Events with large drawings or a lot of glyphs will still get fully rasterized, combined, blurred, and shifted,
    even if they have a very small rectangular clip that would hide the majority of the event (or even cut off some shapes or runs entirely).
    The same holds for the frame borders: Shapes are still fully rasterized, even if they are outside of the video frame.
    However, this could very well change in the future, especially for the frame borders.

    ## Caching
    If you read that last paragraph carefully (and have been paying attention),
    you might be confused now:
    Why exactly is a (horizontal or vertical) clip gradient so fast?
    Sure, the total bitmap size that libass *outputs* is the same as that of a normal line with no gradient,
    but that is only *after* applying the rectangular clips.
    Before applying the clips, there is a very large number of lines,
    each of which rasterizes to a bitmap (or sequence of bitmaps) that itself is the size of the entire line.
    That would result in a massive total bitmap size for the rasterization step, right?

    This is exactly where caching comes in.
    Libass will *cache* the results of most of its expensive operations.
    These include (at the time of writing):
    - Looking up a font from a font name (and bold/italic/vertical values)
    - Obtaining a shape from a font's glyph, as well as the glyph's metrics
    - Parsing a drawing into a shape
    - Thickening a shape
    - Rasterizing a shape to a bitmap
    - Creating a "composite" bitmap from all the per-shape bitmaps in a run by (if applicable):
    - combining all the per-shape bitmaps of a run into a single bitmap for fill and outline each,
    - applying blur,
    - subtracting the fill from the outline, and
    - shifting the shadow bitmap.

    Note that this cache is for the *combined* operation of applying these four steps, not for each of the steps individually.

    When libass is about to perform one of these operations, it will first check if it already has the output for the current input in its cache, and use that if so.
    It can do this without copying the pixel data: It can just work with a reference to the same buffer.
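The essential mechanism can be sketched as follows (an illustration of the caching idea, not libass's actual data structures; the cache key shown is invented):

```python
class BitmapCache:
    """Minimal sketch of a render cache: identical inputs return a
    reference to the same already-rendered buffer, with no pixel copy."""
    def __init__(self):
        self._store = {}
        self.misses = 0

    def get_or_render(self, key, render):
        if key not in self._store:
            self.misses += 1          # only rendered on the first request
            self._store[key] = render()
        return self._store[key]       # shared reference, no pixel copy

cache = BitmapCache()
key = ("shape: m 0 0 l ...", "frac_pos: (0, 0)")   # hypothetical cache key
bmp1 = cache.get_or_render(key, lambda: bytearray(100))
bmp2 = cache.get_or_render(key, lambda: bytearray(100))
```

Here `bmp1` and `bmp2` are the *same* buffer object, and the expensive `render` callback ran only once - which is exactly why a thousand identically-shaped gradient strips cost little more than one.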

    With this, we can now *fully* understand how a clip gradient has the performance that it does:
    A clip gradient consists of a lot of copies of the same line, just with different colors and different one-pixel-wide rectangular clips.
    Neither the color nor the rectangular clips affect the rasterization,
    so after rendering the first such copy, all later copies of the same line can skip the entire rasterizing and composing pipeline and just get the result from the cache.
    The result will be that all of these copies of the same line refer to the *same* composite bitmap, just with different crops and colors.
    As a result, we only have to rasterize *once*, resulting in a total rasterized bitmap size as large as our original line,
    and while there will be very many output bitmaps, each of them is very small so the total output bitmap size is *also* only as large as our original line.
    This is what makes clip gradients almost as fast as a single-colored line.

    So, we can now make our golden rule of total bitmap size more precise:
    *Without any caching*, the cost of rendering a frame of subtitles is mostly proportional to the total bitmap size *before* applying rectangular clips, plus the total bitmap size of all vector clips.
    With *perfect* caching, the cost can be lower, but it will never be less than (proportional to) the total bitmap size *output* by libass, i.e. *after* applying rectangular clips.
    In reality, the cost will be somewhere in between these two values, depending on how many things can be cached.

    Once again, it is hard to give *concrete* advice here outside of a very specific context,
    but some general consequences of this are the following:
    - "Simplifying" events, in particular drawings, by removing invisible contours can actually be harmful
    when applied per-event in a context where there are many copies of the same event.
    We have already seen that a drawing's "complexity" does not contribute that much to its performance cost,
    so purely "simplifying" a drawing will not improve its performance by much.

    On the contrary, simplifying each drawing individually can break caching:
    Suddenly, there are many slightly different versions of the same shape, rather than a lot of copies of a single shape that can be cached.
    In particular, ASSWipe (an Aegisub automation script that, among other things, *will* "purge invisible contours") can be counterproductive in some cases.

    Simplifying events *can* be a great performance improvement when it cuts down - you guessed it - total bitmap size,
    for example by culling parts that are off-screen, outside of a clip, or behind some other opaque shape.
    Of course, this may still negatively affect caching, but how exactly these two factors weigh up against each other can only be determined by benchmarks on a case-by-case basis.
    - Similarly, baking a vector clip into a drawing will save the cost of blending the vector clip, but can hurt caching in certain cases
    (like when applied to many copies of the same line with e.g. different transforms or different vector clips).
    Once again, which option is better will depend on the specific situation.
    - In general, when you *can* ensure that multiple variants of the same event (in a performance-critical setting) use the exact same shape (glyph or drawing) and positioning
    without making any big changes to your subtitle file, it may be a good idea to do so.
    - One very important caveat here is that an event's *position* can affect its rasterization (as can other transforms like scaling, rotation, and shearing, but that may be less surprising),
    and hence break the caching of its rasterization (and hence the composite caching too).
    Specifically, the fractional part of a shape's X and Y position (in output units), rounded to multiples of 1/8, affects the shape's rasterization.
    That is, if two lines are identical except that one has a `\pos` of `(100,200)` and the other a `\pos` of `(300,400)`,
    then (assuming the rendering resolution is equal to the PlayRes) they will rasterize to the same bitmap, and hence one line can use the cached rasterization of the other.
    In fact, the same will happen if one line has a `\pos` of `(300.05,400)`, since `300.05` rounds to `300` when rounded to multiples of 1/8.
    However, a line with a `\pos` of `(300.5,400)` or even `(300.1,400)` will rasterize to a *different* bitmap and hence not be able to use the other line's cached rasterization.
    Hence, rounding a line's position to integer values can be beneficial for caching, at least when the rendering resolution is a multiple of the PlayRes.
    (And, as a converse, differently positioned copies of the same shape may not always be able to use caching in the way that you expect.)
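The quantization rule above can be sketched as follows (a simplification: the real cache key contains many more parameters than just this fraction):

```python
def pos_cache_fraction(x, y):
    """Fractional part of a position, quantized to multiples of 1/8 of
    an output pixel.  Only this fraction (not the integer part) enters
    the rasterization cache key, so positions that share it can share
    a cached bitmap."""
    return (round(x * 8) % 8, round(y * 8) % 8)
```

With this model, `(100,200)`, `(300,400)`, and `(300.05,400)` all map to `(0, 0)` and can share one rasterization, while `(300.1,400)` and `(300.5,400)` map to different fractions and rasterize separately.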

    In summary, caching can cause massive performance savings in specific cases like clip gradients, but can also easily break.
    I would recommend not worrying too much about making sure to make use of caching in a situation that does not strictly need it.
    What is more important is being aware of the situations that *do* strictly need caching (which are mainly clip gradients)
    and making sure not to break those for any of the reasons listed above.

    Note, also, that all of these points only apply when there are multiple copies of the same shape (on the same frame) involved.
    If a specific shape only appears once (per frame), you don't need to worry about breaking caching in the first place.

    At the time of writing, libass's caches are preserved throughout the rendering of a single frame,
    but may be cleaned up between frames when they get too large.
    While libass's caching behavior for a single frame is somewhat predictable,
    I would not recommend relying on libass's caching *across* frames.
    In fact, I do not recommend relying on *any* kind of behavior across frames at all when it comes to performance.
    Instead, ensure that each individual frame renders fast enough, even if it is the first frame that libass will ever render.

    Note that what is *not* currently cached is the process of applying a vector clip (the rasterization *is* cached, but the multiplication is not).
    This could change in the future, though.

    ## Bitmaps are Rectangles
    In the previous sections we have learned the following two facts:

    - Performance mainly depends on total bitmap size
    - The fill and outline of each "run" of a subtitle event is rasterized to a single bitmap (from which the shadow and karaoke effect, if present, are then derived).

    Recall that a "run" was a contiguous sequence of characters that have the same style parameters, or a single drawing.

    A bitmap is always rectangular, so each bitmap has to be chosen large enough to fit its entire run into its rectangle.
    For your run-of-the-mill (heh) horizontal (or even vertical) text, this is not an issue, but it can be a big problem once rotations or drawings get involved!

    For example, take a line like `{\frz45}A very very very very very very very long diagonal line`.
    Since there are no changes in override tag values in the middle of this line, this line will split into a single run.
    The bitmap containing this run will be a fairly large square, but most of its space will be "wasted"!
    That is, the majority of the bitmap will be transparent, and only the diagonal will have opaque pixels.
    Of course, the blurring and blending code will not know this and happily blur and blend all the fully transparent pixels.
    This makes this line much more expensive than it needs to be.

    This can be fixed by forcing the line to be split into multiple runs by changing some tag values in between each character.
    A good tag to use for this is `\2a`, since you rarely actually need to set it.
    The result could look like `{\frz45}A{\2a1} {\2a0}v{\2a1}e{\2a0}r...` (or use `\2a&H00&` and `\2a&H01&` if you want to appease broken third-party tools).
    This way, every individual character will receive its own, much smaller rectangle around it, and the resulting total bitmap size will be much smaller.
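A quick Python estimate (my own model with hypothetical character dimensions, not libass code) shows how much bounding-box area the split saves for rotated text:

```python
import math

def rotated_bbox_area(width, height, angle_deg):
    """Area of the axis-aligned bounding box of a width x height
    rectangle rotated by angle_deg."""
    a = math.radians(angle_deg)
    w = abs(width * math.cos(a)) + abs(height * math.sin(a))
    h = abs(width * math.sin(a)) + abs(height * math.cos(a))
    return w * h

# Hypothetical line of 50 characters, each roughly 20x30 pixels, at \frz45:
single_run = rotated_bbox_area(50 * 20, 30, 45)    # one bitmap for the whole run
per_char = 50 * rotated_bbox_area(20, 30, 45)      # one small bitmap per character
```

In this model the single-run bitmap is roughly eight times larger than all fifty per-character bitmaps combined, which is exactly the "wasted" transparent area around the diagonal.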

    <!-- TODO: Pictures for this -->

    Similarly, consider a drawing like `{\an7\p1}m 0 0 l 100 0 100 100 0 100 m 1000 1000 l 1100 1000 1100 1100 1000 1100`.
    This will draw two 100x100 squares that are 1000 pixels (in PlayRes units) apart from one another.
    Since this is a single drawing, this will hence result in a single 1100x1100 bitmap, even though only two 100x100 squares are actually used.
    Again, this will hence be much more expensive to render than it needs to be.
    Like before, this can be fixed by splitting the two squares into separate runs, and hence separate bitmaps.
    You can do this by inserting a `{}` in the middle of the drawing (note that this only works for drawings, not for text!), but this will then shift
    the second run by the width and height of the first one.
    You can compensate for this by shifting the second component in the opposite direction (arriving at `{\an7\p1}m 0 0 l 100 0 100 100 0 100{}m 900 900 l 1000 900 1000 1000 900 1000`),
    but the much simpler method is to just split the drawing into two separate events.
    Third-party tooling will be able to deal with that better, too.

    To give one more example, let's suppose we want to draw a border around the entire frame.
    (That is, for example, draw a 10-pixel-wide strip at each edge of the frame.)
    Even if the resulting drawing would be connected, using a single drawing for this would once again waste a huge amount of bitmap space,
    since the resulting bitmap would be as large as the entire frame.
    Instead, it would be much more performant to split our border into four different drawings, one for each side.
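Such a split could look like the following sketch (assuming a hypothetical 1920x1080 PlayRes and a 10-pixel border):

```
{\an7\pos(0,0)\p1\bord0\shad0}m 0 0 l 1920 0 1920 10 0 10        top strip
{\an7\pos(0,1070)\p1\bord0\shad0}m 0 0 l 1920 0 1920 10 0 10     bottom strip
{\an7\pos(0,10)\p1\bord0\shad0}m 0 0 l 10 0 10 1060 0 1060       left strip
{\an7\pos(1910,10)\p1\bord0\shad0}m 0 0 l 10 0 10 1060 0 1060    right strip
```

The four strips total about 2 * (1920 * 10) + 2 * (10 * 1060) = 59,600 pixels of bitmap, versus roughly 2,070,000 pixels for a single full-frame drawing - a ~35x reduction in total bitmap size.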

    However, things aren't always as simple.
    Remember that bitmaps are blended onto the image one after the other.
    Hence, if transparency is involved[^antialiasing],
    blending a shape with two components as two separate bitmaps will create a different *visual* result than blending it as a single bitmap
    if the components overlap.
(And even if the shapes themselves do not overlap, they might overlap after being given an outline and/or being blurred.)
    For example, consider a line like `{\an5\fs200\bord30\shad0\3a&H80&}AA`.
    Here, the outlines of the two characters are a single transparent bitmap, so they have a constant opacity throughout.
    However, if we replace the line with `{\an5\fs200\bord30\shad0\3a&H80&}A{\2a1}A`, suddenly the overlap between the two outlines becomes much darker[^darker]!
    This is because the two outlines are now two separate transparent bitmaps, so blending them one after the other makes their overlap darker than the rest.

    [^antialiasing]: And transparency will *always* be involved to some degree since shapes will be anti-aliased when rasterized.
    Even if you only have strictly horizontal or vertical edges at integer PlayRes coordinates, those coordinates can become fractional when the subtitles are rendered at a different resolution.

    [^darker]: Assuming a standard style with white fill and a black outline.

    The upside of this is that this gives us a very easy way to check if a certain line has a run break or not - or more generally which shapes belong to separate bitmaps and which belong to the same one.
    Just make everything half-transparent and, if needed, give it a huge `\bord` so that the outlines overlap, and check if the intersections become more opaque or not.
    (Just make sure that changing the transparency and/or the `\bord` does not add or remove additional run breaks.)

    The downside, however, is that this can cause very undesirable rendering in situations where you want to, or have to use separate bitmaps.
For example, forcing a run break at every character in our example of a very long line with `{\frz45}` worked well to reduce total bitmap size,
    but if our line has a large semi-transparent outline and/or a large and blurred outline, splitting the line into separate runs will cause the overlaps between outlines to look bad.
    (This can also happen if the line has *no* outline and a strongly blurred fill, but it's usually much less noticeable there.)

    This *can* be worked around in some situations by using rectangular clips[^vectclipsplit] to ensure that only one (output) bitmap covers every point in the line.
    This can be done by making a bunch of duplicates of your event (remember, this will then benefit from caching), and giving each duplicate a small rectangular clip
such that:

    [^vectclipsplit]: I do not recommend using vector clips here, even if they only contain horizontal and vertical lines aligned to PlayRes pixel grid.
    The performance impact aside, these will not look correct when the subtitles are played back on any other resolution than a multiple of the PlayRes.

    a) All of the clips are disjoint.

    b) All of the clips together cover the entire rendered output.

    c) Whenever two shapes overlap, there are two clips, each fully containing one of the shapes but not fully containing the other.

Then, you can selectively hide individual runs in individual events with `{\alpha&HFF&}` (remember to restore all alpha channels to their previous values after the runs)
    so that every point in the visible output is covered by exactly one visible run.

    This works quite similarly to a clip gradient, the only difference being that it selectively toggles visibility instead of making a linear color gradient.
    In particular, it can benefit from the composition cache just like a clip gradient will.

A toy example for this is the following pair of lines (on a 1920x1080 PlayRes):
- `{\pos(960,540)\an5\fnArial\fs200\bord30\shad0\3a&H80&\clip(800,400,960,700)}A{\2a1\alpha&HFF&}A`
- `{\pos(960,540)\an5\fnArial\fs200\bord30\shad0\3a&H80&\clip(960,400,1200,700)}{\alpha&HFF&}A{\2a1\alpha\3a&H80&}A`

However, this is very finicky and no good tooling for it exists at the moment.
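As a rough sketch of what such tooling could start from (hypothetical; assigning the right `\alpha&HFF&` overrides per tile is the hard part and is not shown), here is a generator for a grid of disjoint rectangular clips covering the frame:

```python
# Hypothetical sketch: yield \clip tags for a cols x rows grid of disjoint
# tiles that together cover a width x height frame (integer tiling, so the
# tile edges absorb any rounding remainder).
def clip_tiles(width: int, height: int, cols: int, rows: int):
    for r in range(rows):
        for c in range(cols):
            x1, x2 = c * width // cols, (c + 1) * width // cols
            y1, y2 = r * height // rows, (r + 1) * height // rows
            yield rf"\clip({x1},{y1},{x2},{y2})"

# list(clip_tiles(1920, 1080, 2, 1)) -> ['\clip(0,0,960,1080)', '\clip(960,0,1920,1080)']
```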

    ## Mythbusting
    This finally concludes most things that can and should be said about rendering performance.
    In these last two sections, I will summarize everything I discussed above with some concrete advice,
    and bring up a few miscellaneous points that haven't been mentioned in detail yet.

    Let us start with a "mythbusting" section:
    I have talked at length about which parameters affect performance the most, which in turn means that most of the other parameters do *not* significantly affect performance.
    There exist many popular misconceptions about ASS rendering performance, so let's spend some time to explicitly address them here.

    To clarify: The statements in the following list are *true*, but are written to contradict myths that assert the opposite.

    - libass does not completely supersede VSFilter.
    Any behavior of libass that differs from VSFilter is not guaranteed to stay this way in the future unless libass *explicitly* promises it.
    - Changing a line's parameters over time with `\move`, `\t`, or `\fad` does not affect performance.
    Rendering happens one frame at a time, and once the frame timestamp is known a `\move` is not any different from a `\pos`.
    - Using drawings does not significantly affect performance (if the drawings are a similar size as a run of characters would be given your current styling).
    Both drawings and characters are just converted to shapes internally and then rendered in the exact same way.

    If you have *very long* drawing strings repeated many times across a file, it may be slightly more efficient to create a custom font that has your drawings as glyphs.
    This is simply because fonts use a binary format and can hence store shapes more compactly than drawing strings.
    But, in the vast majority of cases, the added cost of parsing a drawing is negligible compared to the cost of rasterizing, blurring, and so on.
    - The complexity of a drawing (that is, the number of vertices) does not have a big effect on its performance cost, at least not until it reaches *absurd* levels.
    - Having separate events for every single frame does not significantly affect performance.
    - Large `\blur` is not much more expensive than smaller `\blur` (but it *is* more expensive than no `\blur` at all).
    The added cost mainly comes from needing to pad the bitmap more, not from the larger blur strength itself.
    - Rectangular clips are not expensive at all.
    In fact, they are almost free and can even *improve* performance in some cases.
    - Simplifying a drawing by removing invisible contours (as ASSWipe does) may not actually improve performance
    when that drawing is used many times.
    - Hiding a bitmap using `\alpha&HFF&` will not improve performance, at least at the time of writing.
This may change in the future, so it is better to use `\alpha&HFF&` than not to use it,
    but at the moment you should not rely on it improving performance.
    To *really* remove bitmaps, delete or comment the line and/or use `\bord0` and/or `\shad0` to remove outlines and/or shadows.

    In particular, the shad trick is significantly less efficient than using a single fill bitmap.

    ## Summary of Guidelines
    Finally, let us summarize what we have learned in this article with more concrete advice.
    Some of the advice here is simplified for the sake of brevity; read the above sections for the finer details.

    - You may target exclusively libass for performance, but you should test your subtitle's rendering on both libass *and* VSFilter and make sure they agree.
    - Make sure that your subtitles also render correctly (and, ideally, perform well) when the display resolution is not a multiple of your PlayRes.
    - The single biggest factor when it comes to rendering performance is **total bitmap size**.
    If you remember one thing from this article, it should be this.
    - Avoid large bitmaps that are mostly empty, like long diagonal text or sparse drawings.
    Split up your lines into multiple runs if you can.
    - Avoid extreme perspective transformations (like high `\fry` on a large LayoutRes); these can make an event's bounding box much larger than it looks.
    Consider converting your event to a drawing and baking in the transformations.
    - Avoid blurring very large bitmaps if you can.
    Avoid using fractional `\shad` on very large bitmaps if you can - use either an integer value like `\shad1` or a value like `\shad0.001` that is small enough to round to 0 in units of 1/64th of an output pixel.
    - Do not use `\be`.
    - Avoid using large vector clips.
    Consider baking vector clips into drawings when feasible.
    - Avoid using diagonal or radial clip gradients when possible.
    Using a small number of strongly blurred shapes for gradients may be more efficient.
    - In situations that strongly rely on caching, in particular clip gradients, take care not to break the caching
    by changing parameters other than colors (so in particular also position) between copies of the line.
    - In particular, take great care when using tools like ASSWipe to simplify drawings; they might break caching.
    - Do not use the shad trick unless you are sure you need it.
    - Do not rely on shapes that are outside of the frame borders or outside of your line's rectangular clip being removed and not having a performance cost.
    If you have a very wide line horizontally scrolling across the frame, break it up into multiple sections and delete the invisible sections at each point in time.
    - Prefer `\bord0`, `\shad0`, and commenting or deleting lines or sections to using `\alpha&HFF&` for hiding lines or sections when possible, unless you believe it interferes with caching.
    - Do not rely on caching across frames.
    Ensure that each individual frame renders fast enough, even if it is the first frame libass ever renders.
    - If you have a very long drawing that is repeated many times, consider encoding it as a font.
    This should not be the *first* tool you reach for when improving performance, though.
    - When in doubt about which method has better performance in a specific case, run benchmarks.