Prompt:
ok, so I've used SPX to obtain profiling for the homepage in WP 6.9 and WP 7.0. For 6.9, I'm seeing the report at
http://localhost:8869/?SPX_UI_URI=/report.html&key=spx-full-20260428_175015-a4ba424a004a-56-1804289383 and for 7.0 I'm seeing it at http://localhost:8870/?SPX_UI_URI=/report.html&key=spx-full-20260428_175011-83e23fa31a20-70-1714636915
I understand that the data is available in the /tmp/spx directory in the containers for each environment. I want you to analyze them to determine the primary reasons why WP 7.0 is slower than WP 6.9
Claude Opus 4.7 response:
Big-picture numbers:
| Metric | wp69 | wp70 | Δ |
|---|---|---|---|
| wall time (µs) | 104,310 | 133,820 | +28.3% |
| call count | 416,045 | 439,406 | +5.6% |
| unique functions | 2,438 | 2,545 | +4.4% |
| peak memory | 6.55 MB | 6.80 MB | +3.9% |
So wp70 is ~28% slower with only ~5% more calls — meaning the per-call cost has grown, not just the call volume. Need to dig into the trace files. Let me extract them to the host so I can analyze locally.
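The per-call claim can be checked by dividing wall time by call count. This is a minimal sketch in Python, used only as a calculator. It assumes the wall-time figures are in microseconds, and the variable names are illustrative rather than anything SPX exposes:

```python
# Figures copied from the big-picture table (wall time assumed to be in µs).
wall_us = {"wp69": 104_310, "wp70": 133_820}
calls = {"wp69": 416_045, "wp70": 439_406}

# Average profiled cost of a single function call in each version.
per_call_us = {version: wall_us[version] / calls[version] for version in wall_us}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new / old - 1) * 100


wall_pct = pct_change(wall_us["wp69"], wall_us["wp70"])
calls_pct = pct_change(calls["wp69"], calls["wp70"])
per_call_pct = pct_change(per_call_us["wp69"], per_call_us["wp70"])

print(f"wall time:     {wall_pct:+.1f}%")
print(f"call count:    {calls_pct:+.1f}%")
print(f"per-call cost: {per_call_pct:+.1f}%")
```

On these numbers the average cost per call rises by roughly 21%, which accounts for most of the wall-time increase. The extra call volume contributes only about 5.6%.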
wp70's homepage render is +68% slower at the trace level (719ms vs 428ms inclusive at the root). Note that SPX profiler overhead is amplifying absolute numbers (~4× the actual wall time), but the relative rankings are reliable.
Look at the inclusive-time table — the regression is concentrated almost entirely in one call path:
| Function | wp69 | wp70 | Δ |
|---|---|---|---|
| WP_Block::render (×934 calls in both) | 428ms | 719ms | +291ms |
| render_block (×635) | 154ms | 253ms | +100ms |
| do_blocks (×18) | 153ms | 249ms | +96ms |
| render_block_core_post_template (×5) | 66ms | 113ms | +47ms |
| render_block_core_pattern (×4) | 56ms | 89ms | +33ms |
| render_block_core_post_content (×11) | 41ms | 70ms | +30ms |
The block count (934) is identical in both versions, so wp70 isn't rendering more blocks — it's spending more time rendering each one.
The "new functions in wp70" table is the smoking gun. These didn't exist in 6.9 and now run on essentially every block:
| New function | Calls | Notes |
|---|---|---|
| wp_render_custom_css_class_name | 934 | Once per block render |
| wp_render_custom_css_support_styles | 891 | Once per block (most blocks) |
| block_core_gallery_render_context | 891 | Runs even on non-gallery blocks |
| block_core_paragraph_add_class | 217 | Per paragraph block |
| wp_apply_anchor_support | 169 | Per block with supports |
| wp_register_anchor_support | 110 | Lazy registration |
| wp_mark_auto_generate_control_attributes | 110 | Block supports machinery |
This is consistent with what's landed in 7.0: a new "custom CSS" / anchor / supports pipeline that hooks into every WP_Block::render. Each block now does extra tag-processor work to inject classes/attributes.
That new per-block CSS/anchor work is implemented via WP_HTML_Tag_Processor, and it shows in the call-count delta:
| Function | wp69 calls | wp70 calls | Δ |
|---|---|---|---|
| WP_HTML_Tag_Processor::class_name_updates_to_attributes_updates | 11,135 | 13,144 | +2,009 |
| WP_HTML_Tag_Processor::after_tag | 10,656 | 12,388 | +1,732 |
| WP_HTML_Tag_Processor::parse_next_attribute | 7,025 | 8,305 | +1,280 |
| WP_HTML_Tag_Processor::base_class_next_token | 5,389 | 6,255 | +866 |
| block_has_support | 10,710 | 11,775 | +1,065 |
These are individually cheap, but in aggregate the tag processor accounts for ~2ms of pure exclusive-time regression and several ms inclusive — it's the substrate the new block features run on.
Same number of queries (55), but inclusive time per WP_Query::get_posts roughly doubled (12ms → 22ms total). wpdb::_do_query exclusive time also +33% (9ms → 12ms) at the same call count. So queries are doing more work per call — possibly more filter passes (WP_Hook::apply_filters is +375 calls, +46ms inclusive). Worth a separate dig but it's the smaller story compared to block rendering.
~80–90% of the wp70 regression is in the block-rendering path, driven by a new set of per-block hooks in 7.0 — wp_render_custom_css_class_name, wp_render_custom_css_support_styles, anchor support registration, and the gallery render-context hook — all of which run for every block (~900 invocations), and all of which lean on WP_HTML_Tag_Processor.
The two concrete things worth investigating first:
- wp_render_custom_css_support_styles / wp_render_custom_css_class_name: these run unconditionally on every block. Are they no-ops for blocks without custom CSS? If so, the early-return is too late (after the work is already done).
- block_core_gallery_render_context firing 891 times when there are presumably few/no gallery blocks on the homepage: looks like it's registered as a global filter rather than scoped to gallery blocks.
Want me to dig into either of those (find where they're registered, check what they do early-return-wise)?
ok, I greatly increased the number of blocks being rendered on the pages for the two test sites and I obtained new profiles for each:
http://localhost:8869/?SPX_UI_URI=/report.html&key=spx-full-20260428_191520-a4ba424a004a-55-1957747793
http://localhost:8870/?SPX_UI_URI=/report.html&key=spx-full-20260428_191527-83e23fa31a20-56-1957747793
Please analyze these as well.
The picture broadly confirms round 1 but with some interesting refinements.
| Metric | wp69 round 1 | wp69 round 2 | wp70 round 1 | wp70 round 2 |
|---|---|---|---|---|
| WP_Block::render calls | 934 | 1,590 | 934 | 1,590 |
| Wall (JSON) µs | 104,310 | 109,871 | 133,820 | 132,736 |
| Trace inclusive ms | 428 | 470 | 719 | 593 |
| wp70 vs wp69 (JSON) | | | +28% | +21% |
| wp70 vs wp69 (trace) | | | +68% | +26% |
Two things jump out:
- wp69 grew by only ~42ms inclusive despite 70% more blocks → most of the blocks added were cheap (probably paragraphs).
- The relative regression shrank with more blocks (28% → 21%, or 68% → 26% by trace). That's the opposite of what you'd expect if the regression were purely per-block — meaning a meaningful chunk of wp70's overhead is per-request fixed cost that gets amortized as the page gets heavier.
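The percentages quoted for the two rounds follow directly from the wall and trace figures. This is a minimal sketch in Python, used only as a calculator, that recomputes them. The dictionary layout is illustrative and not an SPX export format:

```python
# (wp69, wp70) pairs per round, copied from the round-comparison table.
rounds = {
    "round 1": {"blocks": 934, "json_us": (104_310, 133_820), "trace_ms": (428, 719)},
    "round 2": {"blocks": 1_590, "json_us": (109_871, 132_736), "trace_ms": (470, 593)},
}


def regression_pct(pair):
    """Percent by which wp70 exceeds wp69 for a (wp69, wp70) pair."""
    wp69, wp70 = pair
    return (wp70 / wp69 - 1) * 100


summary = {
    name: {
        "json_pct": regression_pct(data["json_us"]),
        "trace_pct": regression_pct(data["trace_ms"]),
        "trace_delta_ms": data["trace_ms"][1] - data["trace_ms"][0],
    }
    for name, data in rounds.items()
}

for name, row in summary.items():
    print(
        f"{name}: JSON {row['json_pct']:+.0f}%, "
        f"trace {row['trace_pct']:+.0f}% ({row['trace_delta_ms']:+d} ms)"
    )
```

Both the relative and the absolute trace gap are smaller in round 2 even though the block count is about 70% higher. That is the observation the amortization argument rests on.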
The same call path dominates the regression — WP_Block::render → render_block → do_blocks still owns +123ms inclusive (essentially the entire +123ms total). Same new per-block hooks light up:
| New in 7.0 | calls | excl µs |
|---|---|---|
| wp_render_custom_css_class_name | 1,590 (every block) | 203 |
| wp_render_custom_css_support_styles | 1,501 (95%) | 80 |
| block_core_gallery_render_context | 1,501 (fires globally, not just on galleries) | 26 |
| block_core_paragraph_add_class | 313 (per paragraph) | 80 excl / 1,353 inclusive |
| wp_get_view_transitions_admin_css | 1 | 214 |
block_core_paragraph_add_class is interesting — its exclusive cost is tiny but inclusive is 17× larger, meaning it's pulling the tag processor hard for what looks like a small amount of work per paragraph.
The hook system is now a bigger relative slice:
| Function | Δ inclusive | calls Δ |
|---|---|---|
| apply_filters | +20ms | -1,034 (yet slower) |
| WP_Hook::apply_filters | +22ms | +471 |
| do_action | +3.9ms | +1 |
| WP_Hook::do_action | +4.3ms | 0 |
So wp70 is firing more filters per block render and each hook call is slightly heavier (more handlers attached, presumably from the new block-supports machinery).
HTML Tag Processor pressure also scales linearly with block count, as expected — parse_next_attribute got +1,757 additional calls in wp70 (vs +1,280 in round 1), contributing +1ms exclusive, +2.1ms via base_class_next_token.
- WP_Query regression collapsed: round 1 showed +10ms inclusive over 55 queries (~180µs/query slower); round 2 shows only +1.7ms over 71 queries (~24µs/query). So either round 1 was noise or the queries in round 2 are simpler. Probably the former; I'd discount the WP_Query story I called out last time.
- Core-boot regressions (wp-load.php, wp-settings.php) +2.8ms show up here but didn't last time: a small fixed cost from new code being included in wp70 startup. Worth noting but not actionable.
- wp_interactivity_process_directives +1.4ms: the Interactivity API doing more directive processing on the rendered HTML in wp70.
The story is essentially the same: block rendering is the regression, and the cause is the new per-block hooks in 7.0 (custom CSS class/styles, anchor support, gallery render-context) plus the tag-processor work they trigger. Most actionable targets:
- block_core_gallery_render_context running 1,501 times when there's likely 0–2 galleries on the page: looks like a global filter that should be scoped.
- block_core_paragraph_add_class, with its 17× inclusive vs exclusive ratio, does a lot of tag-processor work per paragraph for a small payoff. Worth checking whether it can early-return when there's no class to add.
- wp_render_custom_css_support_styles running on 95% of blocks: does it early-return for blocks without custom CSS? If yes, the early-return is happening too late (since it still takes 80µs exclusive in aggregate).
Drop the WP_Query angle — the round 2 data shows it was likely noise.
ok, now I've obtained reports for just the Hello World post to just show the minimum number of blocks. Here are the reports:
http://localhost:8869/?SPX_UI_URI=/report.html&key=spx-full-20260428_192308-a4ba424a004a-70-35005211
http://localhost:8870/?SPX_UI_URI=/report.html&key=spx-full-20260428_192314-83e23fa31a20-67-1102520059
Analyze them as well.
The minimal page (124 blocks) gives the cleanest view yet — it isolates the per-request fixed cost from per-block work. Here's the three-round summary:
| Page | Blocks | wp69 ms (JSON) | wp70 ms (JSON) | Δ ms | Δ % |
|---|---|---|---|---|---|
| Hello World | 124 | 46 | 54 | +8 | +16% |
| Homepage v1 | 934 | 104 | 134 | +30 | +28% |
| Homepage v2 | 1,590 | 110 | 133 | +23 | +21% |
The 8ms baseline regression on a near-minimal page is the fixed per-request overhead wp70 has added regardless of block count.
These regressions barely registered on the homepage runs but are now in the top of the inclusive list:
| Function | wp70 - wp69 inclusive | calls Δ | Notes |
|---|---|---|---|
| WP_Theme_JSON::remove_keys_not_in_schema | +1.9ms | +10 | Schema validation getting expensive |
| wp_enqueue_global_styles | +1.0ms | 0 | Same calls, more work per call |
| WP_Theme_JSON::resolve_custom_css_format | +1.2ms | +10 | New code path? |
| WP_Theme_JSON::__construct | +0.84ms | +2 | Heavier theme.json instantiation |
| WP_Theme_JSON_Resolver::get_merged_data | +0.85ms | +1 | Merge step in theme.json pipeline |
| wp_get_global_styles | +0.77ms | +1 (new in 7.0) | Once per request |
| wp_get_global_stylesheet | +0.77ms | 0 | Same call, more work |
| wp_get_view_transitions_admin_css | +0.16ms | +1 (new) | New view-transitions feature |
| wp-load.php / wp-config.php / wp-settings.php | +0.78ms each | 0 | Heavier core boot |
So the new fixed-cost story is mostly theme.json / global styles processing: at least ~4ms of regression coming from the theme.json pipeline being heavier in 7.0 (more schema work, merging, custom-CSS resolution). This was completely buried under block-rendering work in the homepage profiles.
Rough model from the three points:
- Per-request fixed cost: ~5ms (theme.json reprocessing, global styles, view transitions CSS, slightly heavier core load)
- Per-block variable cost: ~15–30µs depending on block type (custom CSS class injection, anchor support registration, gallery render-context, tag-processor invocations)
That's why Hello World shows +16% (fixed cost dominates) but homepages show +21–28% (per-block work piles on top).
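One way to sanity-check the rough model is to take the ~5ms fixed cost as given and see what per-block cost each page then implies. This is a minimal sketch in Python, used only as a calculator. The 5ms value is the model's own estimate, not a measurement, and the deltas are the rounded JSON figures from the three-round summary:

```python
FIXED_MS = 5.0  # assumed per-request fixed cost, taken from the rough model

# page -> (block count, wp70 minus wp69 wall-time delta in ms)
pages = {
    "Hello World": (124, 8),
    "Homepage v1": (934, 30),
    "Homepage v2": (1_590, 23),
}

# Whatever is left after subtracting the fixed cost is attributed to blocks.
implied_us_per_block = {
    name: (delta_ms - FIXED_MS) / blocks * 1000
    for name, (blocks, delta_ms) in pages.items()
}

for name, micros in implied_us_per_block.items():
    print(f"{name}: ~{micros:.1f} µs per block")
```

Under these assumptions Hello World and Homepage v1 land at about 24µs and 27µs per block, inside the 15–30µs band. Homepage v2 comes out nearer 11µs, so the model is best read as a rough estimate rather than a fit.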
Even on 124 blocks, WP_Hook::apply_filters fires 148 more times in wp70 with +6.9ms inclusive. That's ~47µs of overhead per extra hook call. On the homepage it was +471 calls / +22ms (same ratio). So this isn't profiler noise — wp70 has measurably more hook handlers attached, and they get fired on every block.
WP_Hook::do_action also +4.9ms inclusive on 24 actions, same pattern.
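The "same ratio" observation for WP_Hook::apply_filters can be reproduced by dividing the inclusive-time delta by the call-count delta. This is a minimal sketch in Python, used only as a calculator. The homepage figures are assumed to be the round 2 values from the hook table:

```python
# page -> (extra WP_Hook::apply_filters calls in wp70, extra inclusive ms)
samples = {
    "Hello World": (148, 6.9),
    "Homepage (round 2)": (471, 22.0),
}

# Extra inclusive time attributed to each extra hook call, in microseconds.
us_per_extra_call = {
    name: delta_ms * 1000 / extra_calls
    for name, (extra_calls, delta_ms) in samples.items()
}

for name, micros in us_per_extra_call.items():
    print(f"{name}: ~{micros:.1f} µs per extra call")
```

Both pages come out at roughly 47µs per extra call, which is why the pattern looks systematic rather than like noise.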
In priority order:
1. WP_Theme_JSON::remove_keys_not_in_schema +1.9ms: schema-validating theme.json for ~2,500 entries on every request. Is this cached? Looks like it may not be in wp70.
2. WP_Theme_JSON::resolve_custom_css_format +1.2ms (new-ish code path): this is per-request work from the new custom-CSS pipeline. Profile its callers.
3. wp_get_global_styles (new) +0.77ms: one call per request. Could it be lazy-loaded only if a block actually consumes it?
4. Core boot +2.3ms across wp-load/config/settings: likely from new files being included unconditionally. A git diff on wp-settings.php between 6.9 and trunk would show what's new.
5. The per-block items from rounds 1–2 still apply but are now revealed to be a smaller share of the total than the theme.json / global styles fixed cost.
The picture is clearer now: WP 7.0 has both a per-request "tax" (mostly theme.json reprocessing and global styles) AND a per-block "tax" (new CSS/anchor/style hooks). The per-request tax is what makes even tiny pages 16% slower; the per-block tax is what makes homepages worse.
You can run wp-env status for each environment to obtain the install path for each. The WordPress directory in each install path is where files are for each site. How about if you look at the functions you've identified as being regressions in 7.0 and compare them with 6.9 to see if there are areas for improvement?
Here's the consolidated picture. The agent's findings line up well with what the profiles said and surface concrete improvement leads.
1. WP_Theme_JSON_Resolver::get_merged_data — no result cache (biggest leverage)
Same body in both versions, but called 7× per request and runs 4 full theme.json merges each time. The piece-getters (get_core_data, get_theme_data, etc.) cache themselves; the composition doesn't. Adding a static cache keyed on origin would eliminate ~6 of the 7 merges per request and indirectly cut wp_get_global_styles cost.
2. wp_get_global_styles — re-fetched per-block in layout supports
Trunk added a new caller at block-supports/layout.php:887-911 (wp_render_layout_support_flag):
if ( null === $global_styles ) { $global_styles = wp_get_global_styles(); }
The variable is per-invocation, so on a page with N layout-supporting blocks it runs N times. Making it static (or pulling the per-block-name slice from already-resolved $global_settings['blocks'][...]) would collapse N calls to 1.
3. WP_Theme_JSON::remove_keys_not_in_schema — schema grew, walked per call
Function gained an is_bool() branch and the VALID_SETTINGS/VALID_STYLES constants gained new keys (dimensions.height/width, typography.textIndent, lightbox flags). Same call count, more work per call. Best fix: pre-flatten the schema into a path-set at class load and walk iteratively instead of recursively, plus memoize since inputs frequently repeat.
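The pre-flattening idea can be sketched as follows. This is a language-neutral illustration written in Python, not a port of remove_keys_not_in_schema. The toy schema, the helper names, and the rule that a schema leaf accepts any value are all assumptions made for the example:

```python
def flatten_schema(schema):
    """Map every key path in a nested schema to True if it is a leaf."""
    paths = {}
    stack = [((), schema)]
    while stack:
        prefix, node = stack.pop()
        for key, child in node.items():
            path = prefix + (key,)
            is_leaf = not isinstance(child, dict)
            paths[path] = is_leaf
            if not is_leaf:
                stack.append((path, child))
    return paths


def remove_keys_not_in_paths(tree, paths):
    """Copy tree iteratively, keeping only keys whose path is in the schema."""
    result = {}
    stack = [((), tree, result)]
    while stack:
        prefix, src, dst = stack.pop()
        for key, value in src.items():
            path = prefix + (key,)
            if path not in paths:
                continue  # key is not allowed by the schema
            if paths[path] or not isinstance(value, dict):
                dst[key] = value  # schema leaf: keep the value as-is
            else:
                dst[key] = {}
                stack.append((path, value, dst[key]))
    return result


# Built once (for example at class load), then reused for every call.
SCHEMA_PATHS = flatten_schema(
    {"settings": {"typography": {"textIndent": None},
                  "dimensions": {"height": None, "width": None}}}
)

cleaned = remove_keys_not_in_paths(
    {"settings": {"typography": {"textIndent": True, "bogus": 1},
                  "dimensions": {"height": "10px"},
                  "unknown": {}},
     "junk": 1},
    SCHEMA_PATHS,
)
print(cleaned)
```

The schema is flattened once, so each validation pass does one lookup per key it visits instead of re-descending the schema alongside the input.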
4. resolve_custom_css_format — body identical, walks bigger tree
Regression is downstream of (3) — no fix needed in this function, but a strpos() fast-path that skips the recursion when no var: strings are present would help, or fold it into the same descent as remove_keys_not_in_schema.
The block_has_support() lookup is the recurring cost — fired per block render for both customCSS and anchor supports, even though only a handful of block types opt in. Pre-computing a flat set of opt-in block names at registration and doing isset($set[$blockName]) first would short-circuit the vast majority of calls.
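The short-circuit can be sketched as a set that is filled at registration time and consulted before any per-block work. This is a language-neutral illustration written in Python, not WordPress code. The registry, the function names, and the supports declared for each block are all made up for the example:

```python
from collections import defaultdict

# feature name -> set of block names that opted in (hypothetical registry)
_OPT_IN_BLOCKS = defaultdict(set)


def register_block_type(name, supports):
    """Record, once at registration time, which features a block enables."""
    for feature, enabled in supports.items():
        if enabled:
            _OPT_IN_BLOCKS[feature].add(name)


def block_opts_in(name, feature):
    """Constant-time membership test, used as a fast path on every render."""
    return name in _OPT_IN_BLOCKS.get(feature, ())


# Illustrative registrations only; real block supports may differ.
register_block_type("core/paragraph", {"anchor": True, "customCSS": False})
register_block_type("core/group", {"anchor": True, "customCSS": True})
register_block_type("core/spacer", {})

print(block_opts_in("core/group", "customCSS"))
print(block_opts_in("core/spacer", "anchor"))
```

A render-time hook would call block_opts_in() first and return immediately for blocks that never opted in, leaving the fuller support lookup for the few that did.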
block_core_paragraph_add_class is the biggest absolute offender per-call: it spins up a full WP_HTML_Tag_Processor for every paragraph just to add wp-block-paragraph to the <p> tag (17× inclusive/exclusive ratio). A targeted preg_replace or moving the class into the editor's save-time serialization would essentially zero out this cost.
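The regex alternative might look like the sketch below. It is written in Python as a language-neutral illustration of what a targeted preg_replace could do, and is not a drop-in replacement. The function name is hypothetical, and the pattern assumes double-quoted attribute values with no literal ">" inside them, which a full tag processor handles and this does not:

```python
import re

# Opening <p> tag with optional attributes (not <pre>, <path>, and so on).
_P_OPEN = re.compile(r"<p((?:\s[^>]*)?)>", re.IGNORECASE)
# A class="..." attribute that is not part of a longer name like data-class.
_CLASS_ATTR = re.compile(r'(?<![\w-])class="([^"]*)"', re.IGNORECASE)


def add_paragraph_class(html, class_name="wp-block-paragraph"):
    """Add class_name to the first <p> tag in html, if not already there."""

    def _rewrite(match):
        attrs = match.group(1)
        class_match = _CLASS_ATTR.search(attrs)
        if class_match is None:
            # No class attribute yet: add one in front of the others.
            return f'<p class="{class_name}"{attrs}>'
        classes = class_match.group(1).split()
        if class_name in classes:
            return match.group(0)  # already present, leave the tag alone
        merged = " ".join(classes + [class_name])
        start, end = class_match.span(1)
        return f"<p{attrs[:start]}{merged}{attrs[end:]}>"

    return _P_OPEN.sub(_rewrite, html, count=1)


print(add_paragraph_class("<p>Hello</p>"))
print(add_paragraph_class('<p class="has-text-color">Hello</p>'))
```

The trade-off is robustness: the regex is cheap, but it gives up the HTML-spec parsing that motivated using WP_HTML_Tag_Processor in the first place.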
block_core_gallery_render_context fires on every block render (1,501 times on the homepage) just to check if ('core/gallery' === $block['blockName']). Folding the galleryId generation into block_core_gallery_render itself would remove the per-block filter dispatch entirely.
1. Static cache WP_Theme_JSON_Resolver::get_merged_data: likely 1–2ms per request, very safe.
2. Static $global_styles in wp_render_layout_support_flag: scales with block count; a one-line change.
3. Pre-flatten the customCSS / anchor block-type opt-in sets: eliminates per-block block_has_support() calls on every render.
4. Replace block_core_paragraph_add_class tag-processor with regex (or move to save-time): biggest per-paragraph win.
5. Schema-lookup pre-flattening in remove_keys_not_in_schema: bigger refactor but addresses the largest theme.json hotspot.
6. Drop block_core_gallery_render_context filter, inline into render callback: small but easy.
Items 1, 2, and 3 are the lowest-risk, highest-yield changes — all are pure caching/short-circuit additions with no behavioral change.
Want me to dig into any one of these — confirm the diff against trunk, prototype a patch, or check whether there's already a Trac ticket open?
@westonruter Finally, an SPX user.