@westonruter
Last active April 29, 2026 00:43
Prompt:

ok, so I've used SPX to obtain profiling for the homepage in WP 6.9 and WP 7.0. For 6.9, I'm seeing the report at http://localhost:8869/?SPX_UI_URI=/report.html&key=spx-full-20260428_175015-a4ba424a004a-56-1804289383 and for 7.0 I'm seeing it at http://localhost:8870/?SPX_UI_URI=/report.html&key=spx-full-20260428_175011-83e23fa31a20-70-1714636915
I understand that the data is available in the /tmp/spx directory in the containers for each environment. I want you to analyze them to determine the primary reasons why WP 7.0 is slower than WP 6.9

Claude Opus 4.7 response:

Big-picture numbers:

| Metric | wp69 | wp70 | Δ |
|---|---|---|---|
| wall time (µs) | 104,310 | 133,820 | +28.3% |
| call count | 416,045 | 439,406 | +5.6% |
| unique functions | 2,438 | 2,545 | +4.4% |
| peak memory | 6.55 MB | 6.80 MB | +3.9% |

So wp70 is ~28% slower with only ~5% more calls — meaning the per-call cost has grown, not just the call volume. Need to dig into the trace files. Let me extract them to the host so I can analyze locally.
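
The per-call claim can be checked directly from the table. The following is a minimal sketch, assuming the wall-time row is in microseconds (as the later "Wall (JSON) µs" label suggests); the variable names are illustrative only.

```php
<?php
// Figures copied from the summary table; wall time is assumed to be in µs.
$profiles = array(
	'wp69' => array( 'wall_us' => 104310, 'calls' => 416045 ),
	'wp70' => array( 'wall_us' => 133820, 'calls' => 439406 ),
);

$per_call = array();
foreach ( $profiles as $name => $p ) {
	// Average wall time attributed to a single function call.
	$per_call[ $name ] = $p['wall_us'] / $p['calls'];
}

// Relative growth of the average per-call cost, in percent.
$growth = ( $per_call['wp70'] / $per_call['wp69'] - 1 ) * 100;

printf(
	"wp69 %.3f µs/call, wp70 %.3f µs/call, +%.1f%%\n",
	$per_call['wp69'],
	$per_call['wp70'],
	$growth
);
```

An average per-call increase of roughly 21%, combined with 5.6% more calls, accounts for the 28% wall-time delta.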

Headline

wp70's homepage render is +68% slower at the trace level (719ms vs 428ms inclusive at the root). Note that SPX profiler overhead is amplifying absolute numbers (~4× the actual wall time), but the relative rankings are reliable.

Where all the regression lives: block rendering

Look at the inclusive-time table — the regression is concentrated almost entirely in one call path:

| Function | wp69 | wp70 | Δ |
|---|---|---|---|
| WP_Block::render (×934 calls in both) | 428ms | 719ms | +291ms |
| render_block (×635) | 154ms | 253ms | +100ms |
| do_blocks (×18) | 153ms | 249ms | +96ms |
| render_block_core_post_template (×5) | 66ms | 113ms | +47ms |
| render_block_core_pattern (×4) | 56ms | 89ms | +33ms |
| render_block_core_post_content (×11) | 41ms | 70ms | +30ms |

The block count (934) is identical in both versions, so wp70 isn't rendering more blocks — it's spending more time rendering each one.

Why each block is slower: new per-block work in 7.0

The "new functions in wp70" table is the smoking gun. These didn't exist in 6.9 and now run on essentially every block:

| New function | Calls | Notes |
|---|---|---|
| wp_render_custom_css_class_name | 934 | Once per block render |
| wp_render_custom_css_support_styles | 891 | Once per block (most blocks) |
| block_core_gallery_render_context | 891 | Runs even on non-gallery blocks |
| block_core_paragraph_add_class | 217 | Per paragraph block |
| wp_apply_anchor_support | 169 | Per block with supports |
| wp_register_anchor_support | 110 | Lazy registration |
| wp_mark_auto_generate_control_attributes | 110 | Block supports machinery |

This is consistent with what's landed in 7.0: a new "custom CSS" / anchor / supports pipeline that hooks into every WP_Block::render. Each block now does extra tag-processor work to inject classes/attributes.

Secondary effect: HTML Tag Processor pressure

That new per-block CSS/anchor work is implemented via WP_HTML_Tag_Processor, and it shows in the call-count delta:

| Function | wp69 calls | wp70 calls | Δ |
|---|---|---|---|
| WP_HTML_Tag_Processor::class_name_updates_to_attributes_updates | 11,135 | 13,144 | +2,009 |
| WP_HTML_Tag_Processor::after_tag | 10,656 | 12,388 | +1,732 |
| WP_HTML_Tag_Processor::parse_next_attribute | 7,025 | 8,305 | +1,280 |
| WP_HTML_Tag_Processor::base_class_next_token | 5,389 | 6,255 | +866 |
| block_has_support | 10,710 | 11,775 | +1,065 |

These are individually cheap, but in aggregate the tag processor accounts for ~2ms of pure exclusive-time regression and several ms inclusive — it's the substrate the new block features run on.

Tertiary: WP_Query is ~2× slower per call

Same number of queries (55), but inclusive time per WP_Query::get_posts roughly doubled (12ms → 22ms total). wpdb::_do_query exclusive time also +33% (9ms → 12ms) at the same call count. So queries are doing more work per call — possibly more filter passes (WP_Hook::apply_filters is +375 calls, +46ms inclusive). Worth a separate dig but it's the smaller story compared to block rendering.

Verdict

~80–90% of the wp70 regression is in the block-rendering path, driven by a new set of per-block hooks in 7.0 — wp_render_custom_css_class_name, wp_render_custom_css_support_styles, anchor support registration, and the gallery render-context hook — all of which run for every block (~900 invocations), and all of which lean on WP_HTML_Tag_Processor.

The two concrete things worth investigating first:

  1. wp_render_custom_css_support_styles / wp_render_custom_css_class_name — these run unconditionally on every block. Are they no-ops for blocks without custom CSS? If so, the early-return is too late (after the work is already done).
  2. block_core_gallery_render_context firing 891 times when there are presumably few/no gallery blocks on the homepage — looks like it's registered as a global filter rather than scoped to gallery blocks.

Want me to dig into either of those (find where they're registered, check what they do early-return-wise)?

ok, I greatly increased the number of blocks being rendered on the pages for the two test sites and I obtained new profiles for each:
http://localhost:8869/?SPX_UI_URI=/report.html&key=spx-full-20260428_191520-a4ba424a004a-55-1957747793
http://localhost:8870/?SPX_UI_URI=/report.html&key=spx-full-20260428_191527-83e23fa31a20-56-1957747793
Please analyze these as well.

The picture broadly confirms round 1 but with some interesting refinements.

What changed between the two runs

| | wp69 round 1 | wp69 round 2 | wp70 round 1 | wp70 round 2 |
|---|---|---|---|---|
| WP_Block::render calls | 934 | 1,590 | 934 | 1,590 |
| Wall (JSON) µs | 104,310 | 109,871 | 133,820 | 132,736 |
| Trace inclusive ms | 428 | 470 | 719 | 593 |
| wp70 vs wp69 (JSON) | | | +28% | +21% |
| wp70 vs wp69 (trace) | | | +68% | +26% |

Two things jump out:

  1. wp69 grew by only ~42ms inclusive despite 70% more blocks → most of the blocks added were cheap (probably paragraphs).
  2. The relative regression shrank with more blocks (28% → 21%, or 68% → 26% by trace). That's the opposite of what you'd expect if the regression were purely per-block — meaning a meaningful chunk of wp70's overhead is per-request fixed cost that gets amortized as the page gets heavier.

What's confirmed from round 1

The same call path dominates the regression — WP_Block::render → render_block → do_blocks still owns +123ms inclusive (essentially the entire +123ms total). Same new per-block hooks light up:

| New in 7.0 | Calls | Excl. µs |
|---|---|---|
| wp_render_custom_css_class_name | 1,590 (every block) | 203 |
| wp_render_custom_css_support_styles | 1,501 (95%) | 80 |
| block_core_gallery_render_context | 1,501 (fires globally, not just on galleries) | 26 |
| block_core_paragraph_add_class | 313 (per paragraph) | 80 excl / 1,353 inclusive |
| wp_get_view_transitions_admin_css | 1 | 214 |

block_core_paragraph_add_class is interesting — its exclusive cost is tiny but inclusive is 17× larger, meaning it's pulling the tag processor hard for what looks like a small amount of work per paragraph.

What's stronger in round 2

The hook system is now a bigger relative slice:

| Function | Δ inclusive | Calls Δ |
|---|---|---|
| apply_filters | +20ms | -1,034 (yet slower) |
| WP_Hook::apply_filters | +22ms | +471 |
| do_action | +3.9ms | +1 |
| WP_Hook::do_action | +4.3ms | 0 |

So wp70 is firing more filters per block render and each hook call is slightly heavier (more handlers attached, presumably from the new block-supports machinery).

HTML Tag Processor pressure also scales linearly with block count, as expected — parse_next_attribute got +1,757 additional calls in wp70 (vs +1,280 in round 1), contributing +1ms exclusive, +2.1ms via base_class_next_token.

What's weaker in round 2

  • WP_Query regression collapsed: round 1 showed +10ms inclusive over 55 queries (~180µs/query slower); round 2 shows only +1.7ms over 71 queries (~24µs/query). So either round 1 was noise or the queries in round 2 are simpler. Probably the former — I'd discount the WP_Query story I called out last time.
  • Core-boot regressions (wp-load.php, wp-settings.php) +2.8ms show up here but didn't last time — small fixed cost from new code being included in wp70 startup. Worth noting but not actionable.
  • wp_interactivity_process_directives +1.4ms — Interactivity API doing more directive processing on the rendered HTML in wp70.

Updated verdict

The story is essentially the same: block rendering is the regression, and the cause is the new per-block hooks in 7.0 (custom CSS class/styles, anchor support, gallery render-context) plus the tag-processor work they trigger. Most actionable targets:

  1. block_core_gallery_render_context running 1,501 times when there's likely 0–2 galleries on the page — looks like a global filter that should be scoped.
  2. block_core_paragraph_add_class — 17× inclusive vs exclusive ratio — does a lot of tag-processor work per paragraph for a small payoff. Worth checking whether it can early-return when there's no class to add.
  3. wp_render_custom_css_support_styles running on 95% of blocks — does it early-return for blocks without custom CSS? If yes, the early-return is happening too late (since it still takes 80µs exclusive in aggregate).

Drop the WP_Query angle — the round 2 data shows it was likely noise.

ok, now I've obtained reports for just the Hello World post, to show the minimum number of blocks. Here are the reports:
http://localhost:8869/?SPX_UI_URI=/report.html&key=spx-full-20260428_192308-a4ba424a004a-70-35005211
http://localhost:8870/?SPX_UI_URI=/report.html&key=spx-full-20260428_192314-83e23fa31a20-67-1102520059
Analyze them as well.

The minimal page (124 blocks) gives the cleanest view yet — it isolates the per-request fixed cost from per-block work. Here's the three-round summary:

Three-round view

| Page | Blocks | wp69 ms (JSON) | wp70 ms (JSON) | Δ ms | Δ % |
|---|---|---|---|---|---|
| Hello World | 124 | 46 | 54 | +8 | +16% |
| Homepage v1 | 934 | 104 | 134 | +30 | +28% |
| Homepage v2 | 1,590 | 110 | 133 | +23 | +21% |

The 8ms baseline regression on a near-minimal page is the fixed per-request overhead wp70 has added regardless of block count.

What the small page exposes that the big pages hid

These regressions barely registered on the homepage runs but are now in the top of the inclusive list:

| Function | wp70 - wp69 inclusive | Calls Δ | Notes |
|---|---|---|---|
| WP_Theme_JSON::remove_keys_not_in_schema | +1.9ms | +10 | Schema validation getting expensive |
| wp_enqueue_global_styles | +1.0ms | 0 | Same calls, more work per call |
| WP_Theme_JSON::resolve_custom_css_format | +1.2ms | +10 | New code path? |
| WP_Theme_JSON::__construct | +0.84ms | +2 | Heavier theme.json instantiation |
| WP_Theme_JSON_Resolver::get_merged_data | +0.85ms | +1 | Merge step in theme.json pipeline |
| wp_get_global_styles | +0.77ms | +1 (new in 7.0) | Once per request |
| wp_get_global_stylesheet | +0.77ms | 0 | Same call, more work |
| wp_get_view_transitions_admin_css | +0.16ms | +1 (new) | New view-transitions feature |
| wp-load.php / wp-config.php / wp-settings.php | +0.78ms each | 0 | Heavier core boot |

So the new fixed-cost story is mostly theme.json / global styles processing: at least ~4ms of regression coming from the theme.json pipeline being heavier in 7.0 (more schema work, merging, custom-CSS resolution). This was completely buried under block-rendering work in the homepage profiles.

Decomposing the regression

Rough model from the three points:

  • Per-request fixed cost: ~5ms (theme.json reprocessing, global styles, view transitions CSS, slightly heavier core load)
  • Per-block variable cost: ~15–30µs depending on block type (custom CSS class injection, anchor support registration, gallery render-context, tag-processor invocations)

That's why Hello World shows +16% (fixed cost dominates) but homepages show +21–28% (per-block work piles on top).
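
As a sanity check, the rough model can be evaluated against the three measured pages. This is only a sketch: the function name and the array layout are hypothetical, and the 5ms and 15 to 30µs figures are the estimates quoted above, not fitted values.

```php
<?php
// Rough model from the text: delta_ms = fixed_ms + blocks * per_block_us / 1000.
function predicted_delta_ms( int $blocks, float $per_block_us, float $fixed_ms = 5.0 ): float {
	return $fixed_ms + $blocks * $per_block_us / 1000;
}

// Observed JSON wall-time deltas from the three-round table.
$observed = array(
	'Hello World' => array( 'blocks' => 124, 'delta_ms' => 8 ),
	'Homepage v1' => array( 'blocks' => 934, 'delta_ms' => 30 ),
	'Homepage v2' => array( 'blocks' => 1590, 'delta_ms' => 23 ),
);

$in_band = array();
foreach ( $observed as $page => $o ) {
	$low  = predicted_delta_ms( $o['blocks'], 15.0 ); // cheap blocks
	$high = predicted_delta_ms( $o['blocks'], 30.0 ); // expensive blocks
	// Does the observed delta fall inside the predicted band?
	$in_band[ $page ] = $o['delta_ms'] >= $low && $o['delta_ms'] <= $high;
	printf( "%-12s observed %+d ms, model %.1f to %.1f ms\n", $page, $o['delta_ms'], $low, $high );
}
```

Hello World and Homepage v1 fall inside the predicted band; Homepage v2 lands below it, which fits the earlier observation that most of the added blocks were cheap. The model should be read as a rough guide only.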

Hook overhead is now clearly visible

Even on 124 blocks, WP_Hook::apply_filters fires 148 more times in wp70 with +6.9ms inclusive. That's ~47µs of overhead per extra hook call. On the homepage it was +471 calls / +22ms (same ratio). So this isn't profiler noise — wp70 has measurably more hook handlers attached, and they get fired on every block.

WP_Hook::do_action also +4.9ms inclusive on 24 actions, same pattern.
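
The "same ratio" claim is simple arithmetic on the quoted deltas. A minimal sketch, with illustrative array keys (the +471 / +22ms pair is the round 2 homepage figure):

```php
<?php
// Extra WP_Hook::apply_filters calls and inclusive-time deltas quoted in the text.
$rounds = array(
	'hello-world' => array( 'extra_calls' => 148, 'extra_ms' => 6.9 ),
	'homepage-v2' => array( 'extra_calls' => 471, 'extra_ms' => 22.0 ),
);

$us_per_extra_call = array();
foreach ( $rounds as $page => $r ) {
	// Inclusive-time delta divided by the number of additional hook calls.
	$us_per_extra_call[ $page ] = $r['extra_ms'] * 1000 / $r['extra_calls'];
	printf( "%s: %.1f µs per extra call\n", $page, $us_per_extra_call[ $page ] );
}
```

Both pages come out at about 47µs per extra call, which is what makes the hook overhead look systematic rather than like noise.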

What I'd dig into next

In priority order:

  1. WP_Theme_JSON::remove_keys_not_in_schema +1.9ms — schema-validating theme.json for ~2,500 entries on every request. Is this cached? Looks like it may not be in wp70.
  2. WP_Theme_JSON::resolve_custom_css_format +1.2ms (new-ish code path) — this is per-request work from the new custom-CSS pipeline. Profile its callers.
  3. wp_get_global_styles (new) +0.77ms — one call per request. Could it be lazy-loaded only if a block actually consumes it?
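  4. Core boot: +0.78ms inclusive on each of wp-load.php, wp-config.php and wp-settings.php. Because each of these files includes the next, the three inclusive figures largely overlap and should not be summed to +2.3ms. The cost likely comes from new files being included unconditionally. A git diff on wp-settings.php between 6.9 and trunk would show what's new.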
  5. The per-block items from rounds 1–2 still apply but are now revealed to be a smaller share of the total than the theme.json / global styles fixed cost.

The picture is clearer now: WP 7.0 has both a per-request "tax" (mostly theme.json reprocessing and global styles) AND a per-block "tax" (new CSS/anchor/style hooks). The per-request tax is what makes even tiny pages 16% slower; the per-block tax is what makes homepages worse.

You can run wp-env status for each environment to obtain its install path. The WordPress directory inside each install path holds the files for that site. How about you look at the functions you've identified as regressions in 7.0 and compare them with 6.9 to see whether there are areas for improvement?

Here's the consolidated picture. The agent's findings line up well with what the profiles said and surface concrete improvement leads.

Top regressions and where to fix them

1. WP_Theme_JSON_Resolver::get_merged_data: no result cache (biggest leverage)

Same body in both versions, but called 7× per request and runs 4 full theme.json merges each time. The piece-getters (get_core_data, get_theme_data, etc.) cache themselves; the composition doesn't. Adding a static cache keyed on origin would eliminate ~6 of the 7 merges per request and indirectly cut wp_get_global_styles cost.
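
A static cache keyed on origin could look roughly like the sketch below. Everything here is hypothetical: `expensive_merge()` and `Merge_Counter` are stand-ins that only count how often the merge runs, and this is not the actual core method. A real patch would also need an invalidation path for when theme.json data changes.

```php
<?php
// Hypothetical counter so the effect of the cache is observable.
class Merge_Counter {
	public static int $runs = 0;
}

// Hypothetical stand-in for the expensive multi-origin theme.json merge.
function expensive_merge( string $origin ): array {
	Merge_Counter::$runs++;
	return array( 'origin' => $origin );
}

// Memoized wrapper: at most one merge per origin per request.
function get_merged_data_cached( string $origin = 'custom' ): array {
	static $cache = array();
	if ( ! isset( $cache[ $origin ] ) ) {
		$cache[ $origin ] = expensive_merge( $origin );
	}
	return $cache[ $origin ];
}

// Seven calls for the same origin, as observed in the profile.
for ( $i = 0; $i < 7; $i++ ) {
	get_merged_data_cached( 'custom' );
}
echo Merge_Counter::$runs, "\n"; // The merge itself ran once.
```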

2. wp_get_global_styles: re-fetched per block in layout supports

Trunk added a new caller at block-supports/layout.php:887-911 (wp_render_layout_support_flag):

if ( null === $global_styles ) { $global_styles = wp_get_global_styles(); }

The variable is per-invocation, so on a page with N layout-supporting blocks it runs N times. Making it static (or pulling the per-block-name slice from already-resolved $global_settings['blocks'][...]) would collapse N calls to 1.
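
The static-variable variant can be sketched in isolation. `fetch_global_styles_stub()` and `render_layout_flag_sketch()` are hypothetical stand-ins for wp_get_global_styles() and wp_render_layout_support_flag(); only the caching pattern is the point.

```php
<?php
// Hypothetical counter to show how often the lookup actually runs.
class Fetch_Counter {
	public static int $calls = 0;
}

// Hypothetical stand-in for wp_get_global_styles().
function fetch_global_styles_stub(): array {
	Fetch_Counter::$calls++;
	return array( 'blocks' => array() );
}

// Sketch of a per-block render callback that reuses the styles it fetched.
function render_layout_flag_sketch( string $block_content ): string {
	// A static local persists across invocations within the same request.
	static $global_styles = null;
	if ( null === $global_styles ) {
		$global_styles = fetch_global_styles_stub();
	}
	return $block_content;
}

// Simulate three layout-supporting blocks on one page.
foreach ( array( '<div>a</div>', '<div>b</div>', '<div>c</div>' ) as $html ) {
	render_layout_flag_sketch( $html );
}
echo Fetch_Counter::$calls, "\n";
```

One caveat with this pattern: a static local is never refreshed during the request, so it is only safe if the global styles cannot change between block renders.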

3. WP_Theme_JSON::remove_keys_not_in_schema: schema grew, walked per call

The function gained an is_bool() branch, and the VALID_SETTINGS/VALID_STYLES constants gained new keys (dimensions.height/width, typography.textIndent, lightbox flags). Same call count, more work per call. Best fix: pre-flatten the schema into a path-set at class load and walk iteratively instead of recursively, plus memoize since inputs frequently repeat.

4. resolve_custom_css_format: body identical, walks bigger tree

The regression is downstream of (3), so no fix is needed in this function itself. A strpos() fast-path that skips the recursion when no var: strings are present would still help, as would folding it into the same descent as remove_keys_not_in_schema.

The new per-block hooks (root cause of homepage regression)

The block_has_support() lookup is the recurring cost — fired per block render for both customCSS and anchor supports, even though only a handful of block types opt in. Pre-computing a flat set of opt-in block names at registration and doing isset($set[$blockName]) first would short-circuit the vast majority of calls.
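
The pre-computed opt-in set could be sketched as follows. The `$registered` array is a made-up miniature registry and both helper functions are hypothetical; in core, the set would have to be built from the real block type registry at registration time.

```php
<?php
// Hypothetical miniature registry: block name => declared supports.
$registered = array(
	'core/paragraph' => array( 'anchor' => true ),
	'core/group'     => array( 'anchor' => true, 'customCSS' => true ),
	'core/spacer'    => array(),
);

// Build a flat set (name => true) of blocks that opt in to a feature. Runs once.
function build_opt_in_set( array $registered, string $feature ): array {
	$set = array();
	foreach ( $registered as $name => $supports ) {
		if ( ! empty( $supports[ $feature ] ) ) {
			$set[ $name ] = true;
		}
	}
	return $set;
}

// Per-render check: a single hash lookup instead of walking the supports tree.
function block_opts_in( array $set, string $block_name ): bool {
	return isset( $set[ $block_name ] );
}

$custom_css_blocks = build_opt_in_set( $registered, 'customCSS' );
var_dump( block_opts_in( $custom_css_blocks, 'core/group' ) );
var_dump( block_opts_in( $custom_css_blocks, 'core/spacer' ) );
```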

block_core_paragraph_add_class is the biggest absolute offender per-call: it spins up a full WP_HTML_Tag_Processor for every paragraph just to add wp-block-paragraph to the <p> tag (17× inclusive/exclusive ratio). A targeted preg_replace or moving the class into the editor's save-time serialization would essentially zero out this cost.
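
A targeted preg_replace alternative might look like the sketch below. The function name is hypothetical and this is not core code. It deliberately trades correctness on unusual markup for speed: a regex is not an HTML parser, so single-quoted class attributes or attribute values containing `>` are not handled, which is exactly what WP_HTML_Tag_Processor protects against.

```php
<?php
// Hypothetical regex-based sketch: add a class to the first opening <p> tag only.
function add_paragraph_class_sketch( string $html, string $class = 'wp-block-paragraph' ): string {
	$result = preg_replace_callback(
		'/<p(?=[\s>])([^>]*)>/i', // <p> or <p ...>, but not <pre> or <picture>.
		static function ( array $m ) use ( $class ): string {
			$attrs = $m[1];
			// Look for an existing double-quoted class attribute (not data-class etc.).
			if ( preg_match( '/(?<![\w-])class="([^"]*)"/i', $attrs, $c ) ) {
				$classes = preg_split( '/\s+/', $c[1], -1, PREG_SPLIT_NO_EMPTY );
				if ( in_array( $class, $classes, true ) ) {
					return $m[0]; // Already present: leave the tag untouched.
				}
				$classes[] = $class;
				$attrs     = str_replace( $c[0], 'class="' . implode( ' ', $classes ) . '"', $attrs );
				return '<p' . $attrs . '>';
			}
			// No class attribute yet: add one in front of the other attributes.
			return '<p class="' . $class . '"' . $attrs . '>';
		},
		$html,
		1 // Limit to the first match.
	);
	return $result ?? $html; // Fall back to the input if the regex engine errors.
}

echo add_paragraph_class_sketch( '<p>Hello</p>' ), "\n";
```

Whether the speed-up justifies giving up the tag processor's robustness is a judgment call; moving the class into save-time serialization avoids the question entirely.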

block_core_gallery_render_context fires on every block render (1,501 times on the homepage) just to check if ('core/gallery' === $block['blockName']). Folding the galleryId generation into block_core_gallery_render itself would remove the per-block filter dispatch entirely.

Suggested order of attack (biggest ROI first)

  1. Static cache WP_Theme_JSON_Resolver::get_merged_data — likely 1–2ms per request, very safe.
  2. Static $global_styles in wp_render_layout_support_flag — scales with block count; a one-line change.
  3. Pre-flatten the customCSS / anchor block-type opt-in sets — eliminates per-block block_has_support() calls on every render.
  4. Replace block_core_paragraph_add_class tag-processor with regex (or move to save-time) — biggest per-paragraph win.
  5. Schema-lookup pre-flattening in remove_keys_not_in_schema — bigger refactor but addresses the largest theme.json hotspot.
  6. Drop block_core_gallery_render_context filter, inline into render callback — small but easy.

Items 1, 2, and 3 are the lowest-risk, highest-yield changes — all are pure caching/short-circuit additions with no behavioral change.

Want me to dig into any one of these — confirm the diff against trunk, prototype a patch, or check whether there's already a Trac ticket open?

Twenty Twenty-Five

With Twenty Twenty-Five active and the Theme Unit Test data imported

100 Iterations

| URL | http://wp69perftest.local/ | http://wp70perftest.local/ | Diff (ms) | Diff (%) |
|---|---|---|---|---|
| Success Rate | 100% | 100% | -- | -- |
| Response Time (median) | 62.79 | 69.38 | +6.59 | +10.5% |
| wp-before-template (median) | 9.01 | 9.5 | +0.49 | +5.4% |
| wp-template (median) | 52.13 | 57.77 | +5.64 | +10.8% |
| wp-total (median) | 61.21 | 67.41 | +6.20 | +10.1% |

500 Iterations

| URL | http://wp69perftest.local/ | http://wp70perftest.local/ | Diff (ms) | Diff (%) |
|---|---|---|---|---|
| Success Rate | 100% | 100% | -- | -- |
| Response Time (median) | 64.91 | 72.23 | +7.32 | +11.3% |
| wp-before-template (median) | 9.29 | 9.95 | +0.66 | +7.1% |
| wp-template (median) | 53.92 | 60.6 | +6.68 | +12.4% |
| wp-total (median) | 63.36 | 70.69 | +7.33 | +11.6% |

/template-featured-image-horizontal/

Fast 4G, 20 requests

research
./cli/run.mjs benchmark-web-vitals --url=http://wp69perftest.local/template-featured-image-horizontal/ --url=http://wp70perftest.local/template-featured-image-horizontal/ --output=md --network-conditions="Fast 4G" --diff --number=20

| URL | http://wp69perftest.local/template-featured-image-horizontal/ | http://wp70perftest.local/template-featured-image-horizontal/ | Diff (ms) | Diff (%) |
|---|---|---|---|---|
| Success Rate | 100% | 100% | -- | -- |
| FCP (median) | 228.6 | 233.2 | +4.60 | +2.0% |
| LCP (median) | 421.35 | 408.35 | -13.00 | -3.1% |
| TTFB (median) | 44.35 | 49.1 | +4.75 | +10.7% |
| TTLB (median) | 198.5 | 199.35 | +0.85 | +0.4% |
| LCP-TTFB (median) | 377.1 | 362.5 | -14.60 | -3.9% |

No throttling, 30 requests

research
./cli/run.mjs benchmark-web-vitals --url=http://wp69perftest.local/template-featured-image-horizontal/ --url=http://wp70perftest.local/template-featured-image-horizontal/ --output=md --diff --number=30

| URL | http://wp69perftest.local/template-featured-image-horizontal/ | http://wp70perftest.local/template-featured-image-horizontal/ | Diff (ms) | Diff (%) |
|---|---|---|---|---|
| Success Rate | 100% | 100% | -- | -- |
| FCP (median) | 95.3 | 102.9 | +7.60 | +8.0% |
| LCP (median) | 95.3 | 102.9 | +7.60 | +8.0% |
| TTFB (median) | 45.35 | 50.65 | +5.30 | +11.7% |
| TTLB (median) | 46.05 | 51.2 | +5.15 | +11.2% |
| LCP-TTFB (median) | 49.05 | 51.9 | +2.85 | +5.8% |

Fast 4G, 50 requests

research
./cli/run.mjs benchmark-web-vitals --url=http://wp69perftest.local/template-featured-image-horizontal/ --url=http://wp70perftest.local/template-featured-image-horizontal/ --output=md --network-conditions="Fast 4G" --diff --number=50

| URL | http://wp69perftest.local/template-featured-image-horizontal/ | http://wp70perftest.local/template-featured-image-horizontal/ | Diff (ms) | Diff (%) |
|---|---|---|---|---|
| Success Rate | 100% | 100% | -- | -- |
| FCP (median) | 235.6 | 234.1 | -1.50 | -0.6% |
| LCP (median) | 421.55 | 412.25 | -9.30 | -2.2% |
| TTFB (median) | 45.6 | 50.35 | +4.75 | +10.4% |
| TTLB (median) | 199.7 | 200.55 | +0.85 | +0.4% |
| LCP-TTFB (median) | 374.25 | 362.05 | -12.20 | -3.3% |

Moto G4, Fast 4G, 25 requests

research
./cli/run.mjs benchmark-web-vitals --url=http://wp69perftest.local/template-featured-image-horizontal/ --url=http://wp70perftest.local/template-featured-image-horizontal/ --output=md --network-conditions="Fast 4G" --emulate-device="Moto G4" --diff --number=25

| URL | http://wp69perftest.local/template-featured-image-horizontal/ | http://wp70perftest.local/template-featured-image-horizontal/ | Diff (ms) | Diff (%) |
|---|---|---|---|---|
| Success Rate | 100% | 100% | -- | -- |
| FCP (median) | 237.4 | 237.9 | +0.50 | +0.2% |
| LCP (median) | 409.9 | 426 | +16.10 | +3.9% |
| TTFB (median) | 47.5 | 51.7 | +4.20 | +8.8% |
| TTLB (median) | 200.7 | 203.3 | +2.60 | +1.3% |
| LCP-TTFB (median) | 360.9 | 372.3 | +11.40 | +3.2% |

Moto G4, Fast 4G, 50 requests

research
./cli/run.mjs benchmark-web-vitals --url=http://wp69perftest.local/template-featured-image-horizontal/ --url=http://wp70perftest.local/template-featured-image-horizontal/ --output=md --network-conditions="Fast 4G" --emulate-device="Moto G4" --diff --number=50

| URL | http://wp69perftest.local/template-featured-image-horizontal/ | http://wp70perftest.local/template-featured-image-horizontal/ | Diff (ms) | Diff (%) |
|---|---|---|---|---|
| Success Rate | 100% | 100% | -- | -- |
| FCP (median) | 231.65 | 235.4 | +3.75 | +1.6% |
| LCP (median) | 416.55 | 419.55 | +3.00 | +0.7% |
| TTFB (median) | 47.75 | 53.3 | +5.55 | +11.6% |
| TTLB (median) | 202.75 | 202.7 | -0.05 | -0.0% |
| LCP-TTFB (median) | 366.55 | 365.85 | -0.70 | -0.2% |

Slow 3G, Moto G4, 50 requests

research
./cli/run.mjs benchmark-web-vitals --url=http://wp69perftest.local/template-featured-image-horizontal/ --url=http://wp70perftest.local/template-featured-image-horizontal/ --output=md --network-conditions="Slow 3G" --emulate-device="Moto G4" --diff --number=50

| URL | http://wp69perftest.local/template-featured-image-horizontal/ | http://wp70perftest.local/template-featured-image-horizontal/ | Diff (ms) | Diff (%) |
|---|---|---|---|---|
| Success Rate | 98% | 100% | -- | -- |
| FCP (median) | 2247.2 | 2281.1 | +33.90 | +1.5% |
| LCP (median) | 4762.4 | 4762.45 | +0.05 | +0.0% |
| TTFB (median) | 51.2 | 56.6 | +5.40 | +10.5% |
| TTLB (median) | 2310.6 | 2343.2 | +32.60 | +1.4% |
| LCP-TTFB (median) | 4710.5 | 4705.45 | -5.05 | -0.1% |
@szepeviktor
@westonruter Finally, an SPX user.