jaredpar/plan.md

BlobBuilder Allocation Strategy Plan

Problem Statement

Roslyn's VBCSCompiler process suffers from significant byte[] allocations driven by System.Reflection.Metadata APIs, primarily through BlobBuilder. The goal is to enable a pooling strategy where Roslyn pre-allocates ~4MB of byte[] and PE file writing completes with zero (or near-zero) additional allocations.

BlobBuilder Architecture Deep Dive

What BlobBuilder Is

BlobBuilder is like StringBuilder but for bytes. You call WriteInt32(), WriteBytes(), WriteUTF8(), etc. and it accumulates the data. Internally it manages a linked list of byte[] chunks. Key source: BlobBuilder.cs.

Single-Chunk State (Simple Case)

When you create a new BlobBuilder(capacity: 256), you get a single object with a single byte[]:

[head] ← _nextOrPrevious points to itself
  _buffer = byte[256]
  _length = 0  (no data written yet, high bit clear = "is head")

The head is the only writable node. You write data, _length increases. Simple.

Multi-Chunk Growth (Expand)

When data exceeds the buffer capacity, Expand() is called. This is where it gets interesting.

Say the head has 256 bytes of data in a 256-byte buffer, and we need to write more:

BEFORE: [head: 256/256 bytes used]

Expand() does:
1. AllocateChunk() → creates a NEW BlobBuilder with a new byte[]
2. The head's CURRENT buffer (with 256 bytes of data) is moved to the new chunk
3. The new chunk's EMPTY buffer is moved to the head
4. The new chunk becomes frozen (interior node), head stays writable

AFTER:
  [chunk1:frozen] → [head]
     ^________________|
  chunk1._buffer = the ORIGINAL 256-byte buffer (has data)
  head._buffer   = the NEW buffer from the allocated chunk (empty, ready for writes)

Critical detail: The head always keeps the newest/emptiest buffer. The old buffer with data gets pushed into a frozen interior chunk. The byte[] arrays are swapped between the head and the newly allocated chunk — the head object identity stays the same, but it now holds a different byte[].
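The effect of Expand() can be observed from outside through the public API. The sketch below assumes only the public BlobBuilder.GetBlobs() enumerator and Blob.Length from System.Reflection.Metadata; the ExpandDemo class and its ChunkLengths method are illustrative names, not part of the library.

```csharp
using System.Collections.Generic;
using System.Reflection.Metadata;

static class ExpandDemo
{
    // Writes `byteCount` zero bytes into a builder with the given initial
    // capacity and returns the length of every chunk in content order.
    public static List<int> ChunkLengths(int capacity, int byteCount)
    {
        var builder = new BlobBuilder(capacity);
        builder.WriteBytes(new byte[byteCount]);

        var lengths = new List<int>();
        foreach (Blob blob in builder.GetBlobs())
        {
            lengths.Add(blob.Length);
        }
        return lengths;
    }
}
```

Calling ExpandDemo.ChunkLengths(256, 300) should report more than one chunk, because the write does not fit in the initial 256-byte buffer, while ChunkLengths(256, 100) stays in a single chunk.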

The Frozen Bit

Each node's _length field uses the high bit as a "frozen" flag:

IsHead = high bit is 0 → writable
Frozen = high bit is 1 → read-only interior chunk

Only ONE node in a chain is the head (writable). All others are frozen. This is enforced at runtime.
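A minimal sketch of this encoding, assuming the flag occupies bit 31 of an unsigned 32-bit length field. The FrozenBit class and its member names are illustrative and are not the library's own declarations.

```csharp
static class FrozenBit
{
    // Assumed layout: bit 31 is the frozen flag, bits 0-30 hold the byte count.
    public const uint IsFrozenMask = 0x80000000;

    // A node is the writable head when the flag is clear.
    public static bool IsHead(uint rawLength) => (rawLength & IsFrozenMask) == 0;

    // The byte count is the raw value with the flag masked off.
    public static int Length(uint rawLength) => (int)(rawLength & ~IsFrozenMask);

    // Freezing sets the flag and leaves the byte count untouched.
    public static uint Freeze(uint rawLength) => rawLength | IsFrozenMask;
}
```

Packing the flag into the length means a single field comparison can reject both "buffer full" and "node frozen" on the write path, at the cost of capping chunk length at 2^31 - 1 bytes.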

The Chain Structure: Head Is at the End

The linked list is structured so that the head (the chunk you hold a reference to and write into) is always the last chunk in logical order. The earlier chunks are frozen interior nodes.

From the code comment (BlobBuilder.cs:26-33):
  [1:first]->[2]->[3:last]<-[4:head]
      ^_______________|

  Content order: 1, 2, 3, 4
  The head (4) is LAST in content order, but it's the node the caller holds.

The _nextOrPrevious field serves double duty:

On the head: points to the last frozen chunk (backward pointer)
On frozen chunks: points to the next chunk in forward order
The first chunk is found via head._nextOrPrevious._nextOrPrevious (last chunk's forward link wraps around to first)

When you call GetChunks() or WriteContentTo(), the enumerator navigates to FirstChunk and follows forward links through all frozen chunks, then yields the head last. So the caller holds the tail of the content, and reading the full content requires traversing from the beginning.

This is key: you write at the end (head), but you read from the beginning (first chunk). The head must always remain the last logical position, which is why buffer swaps are necessary during LinkSuffix — the suffix data belongs at the end, so its buffer must end up in the head.
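The traversal rule can be illustrated with a toy model of the circular list. This is a simplified sketch, not the library's code: ChainNode and ChainDemo are hypothetical types that keep only the single NextOrPrevious link described above.

```csharp
using System.Collections.Generic;

sealed class ChainNode
{
    public string Label;
    public ChainNode NextOrPrevious;

    // A lone node points to itself, like a single-chunk builder.
    public ChainNode(string label) { Label = label; NextOrPrevious = this; }
}

static class ChainDemo
{
    // Builds [1:first]->[2]->[3:last]<-[4:head], with 3 wrapping back to 1.
    public static ChainNode BuildExample()
    {
        var n1 = new ChainNode("1");
        var n2 = new ChainNode("2");
        var n3 = new ChainNode("3");
        var head = new ChainNode("4");
        n1.NextOrPrevious = n2;
        n2.NextOrPrevious = n3;
        n3.NextOrPrevious = n1;   // last frozen chunk links forward to the first
        head.NextOrPrevious = n3; // head links backward to the last frozen chunk
        return head;
    }

    // Yields labels in content order: frozen chunks first, head last.
    public static IEnumerable<string> ContentOrder(ChainNode head)
    {
        if (head.NextOrPrevious != head)
        {
            ChainNode last = head.NextOrPrevious;
            ChainNode current = last.NextOrPrevious; // first chunk
            while (true)
            {
                yield return current.Label;
                if (current == last) break;
                current = current.NextOrPrevious;
            }
        }
        yield return head.Label;
    }
}
```

For the example chain, ContentOrder visits 1, 2, 3 and then 4, even though the caller only ever holds a reference to node 4.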

Linking: Composing Two Chains

LinkSuffix(suffix) merges the suffix chain into the current chain. The logical content becomes: [this's content][suffix's content].

Let's walk through a concrete code example:

var builderA = new BlobBuilder(256);
var builderB = new BlobBuilder(256);

builderA.WriteBytes(new byte[] { 0xAA, 0xAA, 0xAA });  // "AAA"
builderB.WriteBytes(new byte[] { 0xBB, 0xBB, 0xBB });  // "BBB"

// Link B as suffix of A: result content = AAA + BBB
builderA.LinkSuffix(builderB);

// After this call:
// - builderA is still the head (writable), content = AAABBB
// - builderB is frozen (read-only interior chunk)
// - builderA._buffer now holds bufferB (BB,BB,BB) ← SWAPPED
// - builderB._buffer now holds bufferA (AA,AA,AA) ← SWAPPED

Here's what happens inside LinkSuffix(builderB). We need the result to represent [AA,AA,AA, BB,BB,BB] with builderA remaining the head (writable end):

BEFORE builderA.LinkSuffix(builderB):
  builderA._buffer = bufferA  (contains AA,AA,AA)  _length = 3
  builderB._buffer = bufferB  (contains BB,BB,BB)  _length = 3

The SWAP (BlobBuilder.cs:466-472):
  builderB._buffer = bufferA  ← builderB gets builderA's old buffer
  builderB._length = frozen   ← builderB becomes a frozen interior chunk
  builderA._buffer = bufferB  ← builderA gets builderB's buffer  
  builderA._length = 3        ← builderA adopts builderB's data length

AFTER LinkSuffix:
  [builderB:frozen] → [builderA:head]
       ^___________________|
       
  builderB._buffer = bufferA  (contains AA,AA,AA)  frozen, length 3
  builderA._buffer = bufferB  (contains BB,BB,BB)  head, length 3

  Enumeration order: builderB(AA,AA,AA) then builderA(BB,BB,BB) = AAABBB correct!
  builderA.Count = 6 (3 previous + 3 current)

Key insight: builderA still has the same object identity (same C# reference), but its _buffer field now points to bufferB — the buffer that was originally allocated by builderB. The original bufferA is now trapped inside the frozen builderB node.

The pooling problem is now clear: If builderA came from a pool with bufferA pre-allocated, returning it to the pool gives back an object holding bufferB. The pool has lost track of bufferA (frozen inside builderB). The pool's byte[] arrays end up scattered across frozen nodes it doesn't own.
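The logical outcome of the walkthrough can be checked through the public API alone. The buffer swap itself is internal and cannot be observed this way, so the sketch below only confirms content and length. LinkSuffixDemo is an illustrative name.

```csharp
using System.Reflection.Metadata;

static class LinkSuffixDemo
{
    // Links builderB onto builderA and returns builderA's combined content and count.
    public static (byte[] Content, int Count) Run()
    {
        var builderA = new BlobBuilder(256);
        var builderB = new BlobBuilder(256);

        builderA.WriteBytes(new byte[] { 0xAA, 0xAA, 0xAA });
        builderB.WriteBytes(new byte[] { 0xBB, 0xBB, 0xBB });

        // builderA remains the object the caller writes to and reads from.
        builderA.LinkSuffix(builderB);

        return (builderA.ToArray(), builderA.Count);
    }
}
```

The returned content is AA AA AA BB BB BB with a count of 6, matching the enumeration order described above.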

Why This Design?

The design optimizes for:

Append writes — always write to the head, which is at the end
Zero-copy composition — linking is O(1), just pointer manipulation + buffer swap
Sequential enumeration — GetChunks() iterates from first to last in logical order
Single-pass stream write — WriteContentTo(Stream) iterates chunks and writes each, no intermediate allocation needed

The alternative (copying data during composition) was avoided because PE sections can be large (IL streams, metadata heaps) and copying would be O(n).
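As a small illustration of composition followed by a single-pass stream write, the sketch below links two builders and writes the result to a MemoryStream using the public WriteContentTo(Stream) overload. StreamWriteDemo and the byte values are made up for the example.

```csharp
using System.IO;
using System.Reflection.Metadata;

static class StreamWriteDemo
{
    // Composes two builders and writes the combined content to a stream.
    public static byte[] Compose()
    {
        var prefix = new BlobBuilder(16);
        prefix.WriteBytes(new byte[] { 1, 2, 3, 4 });

        var suffix = new BlobBuilder(16);
        suffix.WriteBytes(new byte[] { 5, 6, 7 });

        // Linking does not copy the suffix bytes into the prefix's buffer.
        prefix.LinkSuffix(suffix);

        using var stream = new MemoryStream();
        prefix.WriteContentTo(stream); // chunks are written in content order
        return stream.ToArray();
    }
}
```

Only the final stream.ToArray() call copies the data here, and it exists purely so the example can return something to inspect; a real writer would target the output file stream directly.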

Root Causes (in context of the above)

1. Unpooled instantiation: new BlobBuilder() is called directly throughout the library (ManagedPEBuilder, MetadataBuilder heaps, etc.) with no factory/pool indirection.
2. Buffer swapping breaks pool invariants: As described above, LinkSuffix/Expand/Clear all swap byte[] between BlobBuilder instances. A pool can't guarantee that the byte[] a builder holds is the one originally allocated for it.
3. No allocation context: AllocateChunk(int minimalSize) receives only a size, not what the buffer is for. A pool can't differentiate between a 256-byte chunk for a relocation section and a 256-byte chunk that will grow to 500KB for the #Strings heap.

Current State

PR #115294 (open, not merged) adds: protected BlobBuilder(byte[]) constructor, Func<int, BlobBuilder> factory on MetadataBuilder, CreateBlobBuilder virtual on ManagedPEBuilder, OnLinking virtual notification, Buffer property for swapping. This partially addresses root cause 1 (unpooled instantiation) and gives hooks for root cause 2 (buffer swapping), but doesn't solve the fundamental swap issue.
PooledBlobBuilder exists internally and pools 128 instances with 1024-byte buffers. It is used only for temporary encoding in GetOrAddBlobUTF8/GetOrAddBlobUTF16/GetOrAddDocumentName, not for PE writing.
HeapBlobBuilder (private in MetadataBuilder.Heaps.cs) has a custom AllocateChunk that respects _capacityExpansion, but it still creates a new byte[] for every chunk.

Architecture: How PE Serialization Flows

Caller creates:
  MetadataBuilder         → accumulates tables, heaps (#Strings, #US, #Blob, #GUID)
  MetadataRootBuilder     → wraps MetadataBuilder, computes MetadataSizes
  ManagedPEBuilder        → orchestrates PE sections

PEBuilder.Serialize(BlobBuilder mainBuilder)
  │
  ├── SerializeSections()
  │     └── for each section:
  │           SerializeSection(name, location) → returns NEW BlobBuilder
  │             ├── SerializeTextSection():
  │             │     sectionBuilder = new BlobBuilder()        ← ALLOC
  │             │     metadataBuilder = new BlobBuilder()       ← ALLOC
  │             │     debugTableBuilder = new BlobBuilder(size) ← ALLOC (conditional)
  │             │     MetadataRootBuilder.Serialize(metadataBuilder, ...)
  │             │       ├── header, tables written directly
  │             │       └── WriteHeapsTo(metadataBuilder, stringHeap)
  │             │             ├── LinkSuffix(stringHeap)     ← BUFFER SWAP
  │             │             ├── LinkSuffix(userStringBuilder) ← BUFFER SWAP
  │             │             ├── LinkSuffix(guidBuilder)    ← BUFFER SWAP
  │             │             └── ReserveBytes(blobHeapSize) ← may EXPAND
  │             │     textSection.Serialize(sectionBuilder, ..., metadataBuilder, ...)
  │             │       └── LinkSuffix(metadataBuilder)      ← BUFFER SWAP
  │             ├── SerializeResourceSection():
  │             │     sectionBuilder = new BlobBuilder()        ← ALLOC
  │             └── SerializeRelocationSection():
  │                   sectionBuilder = new BlobBuilder()        ← ALLOC
  │
  └── for each serialized section:
        mainBuilder.LinkSuffix(section.Builder)              ← BUFFER SWAP
        mainBuilder.Align(fileAlignment)

Key observation: Sizes are known BEFORE serialization (via MetadataSizes). The two-phase approach (compute sizes → serialize) is already built into the architecture.

Complete BlobBuilder Allocation Inventory

Every BlobBuilder creation in the library, categorized by sizing strategy:

#	File:Line	Code	Initial Capacity	Size Category	Sizing Strategy
Top-level section builders (ManagedPEBuilder)
1	ManagedPEBuilder.cs:123	`new BlobBuilder()` — text section builder	256 (default)	Exact at serialize time	`ManagedTextSection.ComputeSizeOfTextSection()`
2	ManagedPEBuilder.cs:124	`new BlobBuilder()` — metadata builder	256 (default)	Exact at serialize time	`MetadataSizes.MetadataSize`
3	ManagedPEBuilder.cs:147	`new BlobBuilder(TableSize)` — debug table	`TableSize` (exact)	Exact at serialize time	Already correctly sized
4	ManagedPEBuilder.cs:189	`new BlobBuilder()` — resource section builder	256 (default)	Exact at serialize time	`_nativeResourcesOpt.Count` (known)
5	ManagedPEBuilder.cs:198	`new BlobBuilder()` — relocation section builder	256 (default)	Exact at serialize time	12 or 14 bytes (tiny, always fits)
Metadata heap builders (MetadataBuilder)
6	MetadataBuilder.Heaps.cs:36	`new HeapBlobBuilder(4096)` — #US heap	4096	Hint-based	`HeapIndex.UserString` — pool uses high-water mark from previous compilations
7	MetadataBuilder.Heaps.cs:51	`new HeapBlobBuilder(16)` — #GUID heap	16	Hint-based	`HeapIndex.Guid` — almost always 16 bytes (1 GUID)
8	MetadataBuilder.cs:14	`new HeapBlobBuilder(_stringHeapCapacity)` — #String heap	4096	Hint-based	`HeapIndex.String` — pool uses high-water mark
Other builders
9	DebugDirectoryBuilder.cs:26	`new BlobBuilder()` — debug data builder	256 (default)	Hint-based	Small in practice. Pool uses previous size or fixed bucket.
Chunk growth (AllocateChunk — called from Expand)
10	BlobBuilder.cs:66-68	`AllocateChunk(minimalSize)` — default impl	`max(buffer.Length, minimalSize)`	Expansion	Subclass overrides to use pool. Size = whatever is needed.
11	MetadataBuilder.Heaps.cs:22-24	`HeapBlobBuilder.AllocateChunk`	`max(minimalSize, ChunkCapacity, _capacityExpansion)`	Expansion	Heap-specific growth. Pool intercepts via factory.
12	PooledBlobBuilder.cs:25-32	`PooledBlobBuilder.AllocateChunk`	`ChunkSize` (1024) or `minimalSize`	Already pooled	Uses internal ObjectPool (128 instances × 1024 bytes)
Temporary builders (already pooled internally)
13	MetadataBuilder.Heaps.cs:282	`PooledBlobBuilder.GetInstance()` — UTF-16 encoding	1024	Already pooled	Used/freed within single method call
14	MetadataBuilder.Heaps.cs:300	`PooledBlobBuilder.GetInstance()` — UTF-8 encoding	1024	Already pooled	Used/freed within single method call
15	MetadataBuilder.Heaps.cs:325	`PooledBlobBuilder.GetInstance()` — document name	1024	Already pooled	Used/freed within single method call
16	MetadataBuilder.Heaps.cs:328	`PooledBlobBuilder.GetInstance()` — document name parts	1024	Already pooled	Used/freed within single method call
Caller-provided (outside this library)
17	(Roslyn)	IL stream (`_ilStream`)	Varies	Caller-managed	Roslyn knows the IL size; can pre-allocate.
18	(Roslyn)	Managed resources (`_managedResourcesOpt`)	Varies	Caller-managed	Roslyn knows resource size.
19	(Roslyn)	Mapped field data (`_mappedFieldDataOpt`)	Varies	Caller-managed	Roslyn knows field data size.
20	(Roslyn)	Main output builder (passed to `PEBuilder.Serialize`)	Varies	Exact (computable)	Sum of all sections + headers + alignment.

Size categories summary:

Exact at serialize time (1–5): Sizes computed from MetadataSizes, ManagedTextSection, etc. Factory gets exact minimumSize.
Hint-based (6–9): Created before sizes are known. Factory gets HeapIndex or similar hint; pool uses historical high-water marks.
Expansion (10–11): Chunk growth during writes. Subclass overrides AllocateChunk to use pool. If initial size is right (from hint), these rarely fire.
Already pooled (12–16): Internal PooledBlobBuilder — no changes needed.
Caller-managed (17–20): Roslyn controls these directly. Can pre-size using its own knowledge.

Typical Section Sizes (Real Data)

Measured from real IL-only assemblies (not R2R):

Microsoft.CodeAnalysis.CSharp.dll (5.8MB — representative large Roslyn assembly):

Component	Size	% of File	% of Metadata
.text section	5,617,664	92.1%	—
Metadata total	3,169,212	52.0%	100%
— #~ (tables)	1,988,604	—	62.7%
— #Strings	586,300	—	18.5%
— #US	163,672	—	5.2%
— #Blob	430,512	—	13.6%
— #GUID	16	—	~0%
IL (estimated)	2,448,352	40.1%	—
.rsrc section	470,528	7.7%	—
.reloc section	512	~0%	—

Size variability across assemblies (50KB → 6MB):

Component	Range	Variability	Sizing Strategy
#~ (tables)	10KB – 1.9MB	High — proportional to types/members	Historical high-water mark per project
#Strings	10KB – 573KB	High — proportional to identifiers	Historical high-water mark
#Blob	6KB – 420KB	High — proportional to signatures/constants	Historical high-water mark
IL stream	20KB – 2.8MB	High — proportional to code volume	Roslyn knows exact size at emit time
#US (user strings)	2KB – 160KB	Medium — proportional to string literals	Historical high-water mark
#GUID	16 bytes	Constant — always 1 GUID	Fixed: 16 bytes
.reloc	12–14 bytes	Constant (512 with padding)	Fixed: 512 bytes
PE headers	~512 bytes	Constant	Fixed: 512 bytes
Debug directory	28–56 bytes	Nearly constant (1–2 entries)	Fixed: 64 bytes

Key insight for pooling: The highly variable components (#~, #Strings, #Blob, #US, IL) need historical sizing. But within a single Roslyn compilation server process, the same project tends to produce similarly-sized outputs across incremental builds. So after the first compilation, the pool's high-water marks converge quickly. The constant components (GUID, reloc, headers, debug) can use fixed-size pool buckets.

Strategy Analysis

Strategy A: Complete the Factory Pattern (PR #115294 direction)

Approach: Every new BlobBuilder() goes through a virtual/delegate factory. Callers override to return pooled instances.

What PR #115294 does:

ManagedPEBuilder.CreateBlobBuilder(int minimumSize) virtual
MetadataBuilder(Func<int, BlobBuilder> createBlobBuilderFunc) factory delegate
BlobBuilder.OnLinking(BlobBuilder other) notification

Remaining gaps:

AllocateChunk still creates new byte[] during growth (Expand). Even with the factory, each chunk expansion allocates.
LinkSuffix buffer swaps still break the pool invariant: after linking, the pooled builder holds a different byte[] than the one it was created with. The OnLinking notification tells you it happened but doesn't prevent it.
No way to pre-size sections accurately, since the factory receives only a minimumSize and no indication of what the builder will be used for.

Verdict: Necessary but insufficient. Good foundation that other strategies can build on.

Strategy B: Subclass with Metadata Swapping (Pool-Aware BlobBuilder)

Approach: Since BlobBuilder is unsealed, create a subclass that carries pool metadata alongside the byte[] buffer. During link operations and expansion, the metadata swaps along with the buffer so that every node always has a consistent {buffer, metadata} pair. When FreeChunk() is called, the metadata tells the pool exactly which buffer this is and where to return it.

How it works:

// Sketch only: relies on the protected BlobBuilder(byte[]) constructor and the
// Buffer property proposed in PR #115294. IBufferPool is a hypothetical interface.
class PooledBlobBuilder : BlobBuilder
{
    // Metadata that describes THIS buffer's pool identity
    internal int BucketIndex;       // which size bucket this buffer belongs to
    internal IBufferPool OwnerPool; // the pool that owns this buffer

    internal PooledBlobBuilder(byte[] buffer, int bucketIndex, IBufferPool ownerPool)
        : base(buffer)
    {
        BucketIndex = bucketIndex;
        OwnerPool = ownerPool;
    }

    protected override BlobBuilder AllocateChunk(int minimalSize)
    {
        // Rent from the pool and wrap the buffer in a chunk that carries its own metadata
        var (buffer, bucketIndex) = OwnerPool.Rent(minimalSize);
        return new PooledBlobBuilder(buffer, bucketIndex, OwnerPool);
    }

    protected override void FreeChunk()
    {
        // Return THIS buffer to the correct bucket, using the metadata
        OwnerPool.Return(Buffer, BucketIndex);
    }
}

During LinkSuffix, when the base class swaps _buffer between this and suffix, a hook (OnLinking from PR #115294 or a new virtual) also swaps the metadata fields:

protected override void OnLinking(BlobBuilder other)
{
    if (other is PooledBlobBuilder pooled)
    {
        // Swap metadata to stay consistent with swapped buffers
        (BucketIndex, pooled.BucketIndex) = (pooled.BucketIndex, BucketIndex);
        (OwnerPool, pooled.OwnerPool) = (pooled.OwnerPool, OwnerPool);
    }
}

Similarly in Expand(): AllocateChunk creates a new PooledBlobBuilder with correct metadata. When the base class swaps buffers between the head and the new chunk, both are PooledBlobBuilder instances, so the metadata can be swapped via the same hook.

Key requirement: All BlobBuilder instances in the chain must be the same subclass type. This is naturally achieved when AllocateChunk always returns PooledBlobBuilder, and the factory pattern (Strategy A) ensures all top-level builders are PooledBlobBuilder too.
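Why the metadata has to travel with the buffer can be shown with a self-contained toy model that does not depend on BlobBuilder at all. PooledNode, BucketedPool and SwapDemo are hypothetical types invented for this sketch; the two bucket sizes are arbitrary.

```csharp
using System.Collections.Generic;

sealed class PooledNode
{
    public byte[] Buffer;
    public int BucketIndex;

    public PooledNode(byte[] buffer, int bucketIndex)
    {
        Buffer = buffer;
        BucketIndex = bucketIndex;
    }

    // Models a link/expand swap: the buffer and its metadata move together.
    public void SwapWith(PooledNode other)
    {
        (Buffer, other.Buffer) = (other.Buffer, Buffer);
        (BucketIndex, other.BucketIndex) = (other.BucketIndex, BucketIndex);
    }
}

sealed class BucketedPool
{
    public static readonly int[] BucketSizes = { 256, 4096 };
    private readonly List<byte[]>[] _buckets = { new List<byte[]>(), new List<byte[]>() };

    public PooledNode Rent(int bucketIndex) =>
        new PooledNode(new byte[BucketSizes[bucketIndex]], bucketIndex);

    // Files the buffer under whatever bucket the node's metadata names.
    public void Return(PooledNode node) => _buckets[node.BucketIndex].Add(node.Buffer);

    // True when every stored buffer has the size its bucket promises.
    public bool IsConsistent()
    {
        for (int i = 0; i < _buckets.Length; i++)
        {
            foreach (byte[] buffer in _buckets[i])
            {
                if (buffer.Length != BucketSizes[i]) return false;
            }
        }
        return true;
    }
}

static class SwapDemo
{
    public static bool Run()
    {
        var pool = new BucketedPool();
        PooledNode small = pool.Rent(0);
        PooledNode large = pool.Rent(1);

        small.SwapWith(large); // both nodes now hold the other's buffer and metadata

        pool.Return(small);
        pool.Return(large);
        return pool.IsConsistent();
    }
}
```

If SwapWith exchanged only the Buffer fields, the 4096-byte array would be filed under the 256-byte bucket and IsConsistent would return false, which is the pool-invariant failure described earlier.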

Where hooks are needed (places that swap _buffer):

LinkSuffix (BlobBuilder.cs:466-472) — swaps buffers between this and suffix
Expand (BlobBuilder.cs:543-547, 567-571) — swaps buffers between head and new chunk
Clear (BlobBuilder.cs:89-92) — swaps buffer with first chunk

Trade-offs:

✅ Preserves O(1) zero-copy linking — no behavioral change
✅ Pool can correctly reclaim all buffers after use
✅ Works with existing architecture (subclass, virtual overrides)
✅ No copying overhead
⚠️ Requires hooks at all swap sites — PR #115294's OnLinking covers LinkSuffix, but Expand and Clear need similar hooks
⚠️ All builders in the chain must be the same type — mixing pooled and non-pooled builders would lose metadata
⚠️ Subclass fields add per-instance overhead (2 references per chunk)

Verdict: This is the right approach. Preserves O(1) linking while making pooling correct. Needs hooks at all three swap sites (LinkSuffix, Expand, Clear).

Strategy C: Pre-sized Allocation (Complementary)

Approach: When total output size is known, pre-allocate builders with the right capacity to avoid chunk growth entirely.

How it works:

MetadataSizes already computes exact sizes for each metadata stream
ManagedTextSection can compute the exact text section size
The factory/pool can pre-size builders to the expected capacity per section
If builders never grow, there are no expansion allocations and fewer swaps

Combined with Strategy B: The pool maintains per-section size estimates. After the first compilation, it knows the #Strings heap is ~200KB, the IL stream is ~500KB, etc. Subsequent compilations get correctly-sized pooled builders. Growth (Expand) rarely happens, and when it does, the metadata swapping ensures correctness.

Verdict: Important optimization layer on top of Strategy B. Reduces the number of chunks and swaps.
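The benefit of pre-sizing can be seen with the public API: a builder whose initial capacity covers the total output never grows a second chunk. The sketch below is illustrative only; PreSizeDemo and CountChunks are made-up names and the sizes are arbitrary.

```csharp
using System.Reflection.Metadata;

static class PreSizeDemo
{
    // Performs `writes` writes of `bytesPerWrite` bytes each and returns
    // how many chunks the builder ended up with.
    public static int CountChunks(int capacity, int writes, int bytesPerWrite)
    {
        var builder = new BlobBuilder(capacity);
        var payload = new byte[bytesPerWrite];
        for (int i = 0; i < writes; i++)
        {
            builder.WriteBytes(payload);
        }

        int chunks = 0;
        foreach (Blob blob in builder.GetBlobs())
        {
            chunks++;
        }
        return chunks;
    }
}
```

With 600 bytes written in total, CountChunks(1024, 6, 100) stays at one chunk, while CountChunks(256, 6, 100) needs several, each of which would be a separate AllocateChunk call.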

Strategy D: ArrayPool-backed Buffers (Implementation Detail)

Approach: The pool implementation uses ArrayPool<byte> (or a custom pool) for the actual byte[] buffers.

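The subclass from Strategy B would use ArrayPool<byte>.Shared.Rent(size) in AllocateChunk and ArrayPool<byte>.Shared.Return(buffer) in FreeChunk(). Once the pool is warm this avoids most new byte[] allocations, although Rent can still allocate when no suitable array is available and may return an array larger than requested.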

Verdict: Natural implementation choice for Strategy B's pool.
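One detail worth keeping in mind is that ArrayPool<byte>.Shared.Rent guarantees only a minimum length. The short sketch below, using the real System.Buffers API, shows the rent/return pairing; ArrayPoolDemo is an illustrative name.

```csharp
using System.Buffers;

static class ArrayPoolDemo
{
    // Rents a buffer of at least `requested` bytes, reports its actual
    // length, and always returns it to the shared pool.
    public static int RentedLength(int requested)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(requested);
        try
        {
            return buffer.Length;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Because the rented array may be longer than requested, a pool-backed builder would need to decide whether to treat the extra space as usable capacity or to track the requested size separately.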

Recommended Approach: Strategy A + B + C (Phased)

Phase 1: Factory Pattern + Swap Hooks (Foundation)

Complete the factory pattern (PR #115294) and add swap hooks at all three sites.

Changes needed in BlobBuilder:

1. Merge PR #115294's factory infrastructure: CreateBlobBuilder virtual, OnLinking hook, factory delegate on MetadataBuilder.
2. Add swap hooks at all buffer swap sites:

Currently buffers are swapped by directly assigning _buffer fields. We need a virtual notification so subclasses can swap their metadata in sync.

The three swap sites:
- LinkSuffix (lines 466-472): Already has OnLinking from PR #115294
- Expand (lines 543-547, 567-571): Needs a new hook. This could be a virtual OnChunkSwapped(BlobBuilder other), or AllocateChunk could be extended to return the chunk with metadata already set up (since the subclass controls AllocateChunk, it creates the new chunk with correct metadata, and after the swap, the metadata just needs to be swapped too)
- Clear (lines 89-92): Needs a hook, but Clear's swap is with the first chunk (which was created by AllocateChunk), so if all chunks are the same subclass type, a virtual notification works
3. Ensure all new BlobBuilder() sites go through the factory: the 5 sites in ManagedPEBuilder (lines 123, 124, 147, 189, 198) plus the MetadataBuilder heaps.

Phase 2: Pool Implementation (Roslyn-Side or Runtime-Side)

Build the actual PooledBlobBuilder subclass with:

ArrayPool<byte>-backed buffers
Per-buffer metadata (bucket index, pool reference)
Metadata swapping in OnLinking / OnChunkSwapped hooks
FreeChunk() that returns buffers to the correct pool bucket

Phase 3: Size Estimation (Warm Pool)

After the first compilation, the pool learns typical section sizes:

#Strings heap: ~X KB
#US heap: ~Y KB
IL stream: ~Z KB
Total PE: ~W KB

Subsequent compilations get pre-sized builders from the pool, eliminating most chunk growth.

Phase 4: Per-Section Context (Optional)

Add context to the factory so the pool knows WHAT it's allocating for:

enum BlobBuilderPurpose { TextSection, MetadataStream, ILStream, ... }
CreateBlobBuilder(int minimumSize, BlobBuilderPurpose purpose)

This lets the pool maintain separate size estimates per purpose.
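A minimal sketch of what per-purpose estimates might look like, combining the high-water-mark idea from Phase 3 with the purpose enum above. Everything here is hypothetical: BlobBuilderPurpose does not exist in the library today, and SizeEstimates with its Record and SuggestCapacity methods are names invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical purpose enum, mirroring the one proposed above.
enum BlobBuilderPurpose { TextSection, MetadataStream, ILStream }

sealed class SizeEstimates
{
    private readonly Dictionary<BlobBuilderPurpose, int> _highWaterMarks =
        new Dictionary<BlobBuilderPurpose, int>();

    // Called after serialization with the final size a builder reached.
    public void Record(BlobBuilderPurpose purpose, int finalSize)
    {
        _highWaterMarks.TryGetValue(purpose, out int previous);
        _highWaterMarks[purpose] = Math.Max(previous, finalSize);
    }

    // Called by the factory: never returns less than the requested minimum.
    public int SuggestCapacity(BlobBuilderPurpose purpose, int minimumSize)
    {
        _highWaterMarks.TryGetValue(purpose, out int learned);
        return Math.Max(learned, minimumSize);
    }
}
```

A pure high-water mark never shrinks, so one unusually large compilation would pin the estimate for the lifetime of the process. Whether that is acceptable or needs decay is a policy choice this sketch leaves open.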

Impact Analysis

Allocation Site	Phase 1 (Hooks)	Phase 2 (Pool)	Phase 3 (Pre-sizing)
`sectionBuilder = new BlobBuilder()` (×3)	Factory-created	Pooled + ArrayPool buffers	Pre-sized, no growth
`metadataBuilder = new BlobBuilder()`	Factory-created	Pooled + ArrayPool buffers	Pre-sized, no growth
`debugTableBuilder = new BlobBuilder(size)`	Factory-created	Pooled + ArrayPool buffers	Already pre-sized
LinkSuffix buffer swaps	Metadata swaps correctly	Pool reclaims all buffers	Fewer swaps (pre-sized)
HeapBlobBuilder chunk growth	Factory-created chunks	Pooled chunks	Pre-sized, no growth
Main builder expansion	Pooled chunks via AllocateChunk	Pooled chunks	Pre-sized, no growth
BlobBuilder objects themselves	Still allocated	Could be object-pooled	Object-pooled

After Phase 2 + 3: A caller providing correctly sized pooled builders with pre-warmed size estimates achieves zero byte[] allocations during PE serialization.

Key Considerations

API compatibility: All changes must be backward compatible. Existing callers who don't use pooling should see identical behavior. Default BlobBuilder behavior is unchanged (hooks are no-ops).
Mixed types: If a non-pooled BlobBuilder is linked with a PooledBlobBuilder, the metadata swap in OnLinking would need to handle the type mismatch (e.g., no-op if other is not PooledBlobBuilder). This means the non-pooled builder's buffer won't be returned to the pool, which is correct — it wasn't rented from the pool.
Heap data: The #Blob heap uses ReserveBytes + BlobWriter (WriteAlignedBlobHeap) — no LinkSuffix. The #Strings, #US, and #GUID heaps use LinkSuffix and will benefit from swap hooks.
Clear() semantics: When Clear() is called on a pooled builder, all frozen chunks in the chain call FreeChunk(), returning their buffers. The head swaps buffer with the first chunk first (which is fine — the metadata swap hook keeps it consistent), then each frozen chunk is freed with correct metadata.

Open Questions

Should the pool implementation live in System.Reflection.Metadata (as a public type) or in Roslyn (as a consumer)?
For Phase 4 (per-section context), the HeapIndex enum already exists for metadata heaps (UserString, String, Blob, Guid). PE sections are currently just string constants (".text", ".rsrc", ".reloc"). The factory delegate could accept a HeapIndex? for heap allocations and the section name string for section allocations — or a new enum could unify both. Worth considering whether the existing HeapIndex is sufficient context, or whether a broader enum is needed.