Payload Analysis: 4.14.0-0.nightly-2026-05-10-173114 - Multus CNI JoinHostPort bug (CVE-2025-47912)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Payload Analysis: 4.14.0-0.nightly-2026-05-10-173114</title>
<style>
:root {
--red: #dc3545;
--green: #28a745;
--yellow: #ffc107;
--orange: #fd7e14;
--blue: #007bff;
--gray: #6c757d;
--light-gray: #f8f9fa;
--dark: #212529;
--border: #dee2e6;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif; line-height: 1.6; color: var(--dark); max-width: 1200px; margin: 0 auto; padding: 20px; background: #fff; }
h1 { font-size: 1.8rem; margin-bottom: 8px; }
h2 { font-size: 1.4rem; margin: 24px 0 12px; border-bottom: 2px solid var(--border); padding-bottom: 6px; }
h3 { font-size: 1.1rem; margin: 16px 0 8px; }
a { color: var(--blue); text-decoration: none; }
a:hover { text-decoration: underline; }
.badge { display: inline-block; padding: 2px 10px; border-radius: 12px; font-size: 0.85rem; font-weight: 600; color: #fff; }
.badge-red { background: var(--red); }
.badge-green { background: var(--green); }
.badge-yellow { background: var(--yellow); color: var(--dark); }
.badge-orange { background: var(--orange); }
.badge-gray { background: var(--gray); }
.executive-summary { background: #fef3f3; border-left: 4px solid var(--red); padding: 16px 20px; margin: 16px 0; border-radius: 0 8px 8px 0; }
.executive-summary h2 { border: none; margin-top: 0; }
.summary-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 12px; margin: 16px 0; }
.summary-card { background: var(--light-gray); border-radius: 8px; padding: 16px; text-align: center; }
.summary-card .label { font-size: 0.8rem; color: var(--gray); text-transform: uppercase; letter-spacing: 0.5px; }
.summary-card .value { font-size: 1.8rem; font-weight: 700; margin-top: 4px; }
.summary-card .value.red { color: var(--red); }
.summary-card .value.green { color: var(--green); }
.summary-card .value.orange { color: var(--orange); }
table { width: 100%; border-collapse: collapse; margin: 12px 0; }
th, td { padding: 10px 14px; text-align: left; border-bottom: 1px solid var(--border); }
th { background: var(--light-gray); font-weight: 600; font-size: 0.85rem; text-transform: uppercase; letter-spacing: 0.3px; }
tr:hover { background: #f1f3f5; }
.status-pass { color: var(--green); font-weight: 600; }
.status-fail { color: var(--red); font-weight: 600; }
.status-pending { color: var(--yellow); font-weight: 600; }
details { margin: 12px 0; border: 1px solid var(--border); border-radius: 8px; overflow: hidden; }
summary { padding: 12px 16px; background: var(--light-gray); cursor: pointer; font-weight: 600; }
summary:hover { background: #e9ecef; }
.detail-content { padding: 16px; }
.error-box { background: #fff5f5; border: 1px solid #fecaca; border-radius: 6px; padding: 12px 16px; margin: 8px 0; font-family: 'SFMono-Regular', Consolas, 'Liberation Mono', Menlo, monospace; font-size: 0.85rem; white-space: pre-wrap; word-break: break-all; }
.timeline { margin: 12px 0; padding-left: 20px; border-left: 3px solid var(--border); }
.timeline-item { margin: 8px 0; padding-left: 12px; position: relative; }
.timeline-item::before { content: ''; width: 10px; height: 10px; border-radius: 50%; background: var(--gray); position: absolute; left: -27px; top: 6px; }
.timeline-item.error::before { background: var(--red); }
.timeline-item .time { font-family: monospace; color: var(--gray); font-size: 0.85rem; }
.streak-bar { display: flex; gap: 4px; margin: 8px 0; }
.streak-block { width: 28px; height: 28px; border-radius: 4px; display: flex; align-items: center; justify-content: center; font-size: 0.65rem; color: #fff; font-weight: 700; }
.streak-block.fail { background: var(--red); }
.streak-block.pass { background: var(--green); }
.streak-block.force { background: var(--orange); }
.streak-block.pending { background: var(--yellow); color: var(--dark); }
.streak-block.target { outline: 3px solid var(--dark); outline-offset: 1px; }
.no-revert-box { background: #f0fdf4; border: 1px solid #bbf7d0; border-radius: 8px; padding: 16px 20px; margin: 16px 0; }
.alert-list { list-style: none; padding: 0; }
.alert-list li { padding: 6px 0; border-bottom: 1px solid #f0f0f0; }
.alert-list li:last-child { border-bottom: none; }
.alert-name { font-weight: 600; color: var(--red); }
.alert-duration { color: var(--gray); font-size: 0.85rem; }
.meta-info { color: var(--gray); font-size: 0.9rem; margin-bottom: 20px; }
.footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid var(--border); color: var(--gray); font-size: 0.8rem; }
</style>
</head>
<body>
<h1>Payload Analysis: 4.14.0-0.nightly-2026-05-10-173114</h1>
<p class="meta-info">
<span class="badge badge-red">Rejected</span>&ensp;
Architecture: <strong>amd64</strong> &middot;
Stream: <strong>nightly</strong> &middot;
Version: <strong>4.14</strong> &middot;
Generated: 2026-05-12 &middot;
<a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-05-10-173114" target="_blank">Release Controller</a>
</p>
<div class="executive-summary">
<h2>Executive Summary</h2>
<p>
This payload was <strong>rejected</strong> due to the persistent failure of a single blocking job:
<strong>gcp-ovn-rt-upgrade-4.14-minor</strong> (4.13&rarr;4.14 GCP OVN RT kernel upgrade).
The job failed on all 4 attempts (the initial run plus 3 retries). The other 9 blocking jobs succeeded.
</p>
<p style="margin-top:8px;">
<strong>Root cause:</strong> A latent bug in the <strong>Multus CNI thin entrypoint</strong>
(<code>cmd/thin_entrypoint/main.go</code>) unconditionally wraps the API server hostname in square brackets
(<code>fmt.Sprintf("%s://[%s]:%s", ...)</code>), which is only valid for IPv6 addresses.
This was exposed when the build toolchain picked up <strong>Go 1.24.8+</strong>, which tightened
<code>url.Parse()</code> to reject brackets around non-IPv6 addresses as a security hardening for
<strong>CVE-2025-47912</strong>. The result: Multus cannot initialize its Kubernetes client, DNS pods
cannot be recycled, the DNS operator degrades, and the upgrade times out.
</p>
<p style="margin-top:8px;">
<strong>This is a long-running permafail (~27 days).</strong> The job has been failing since
<a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-04-16-012435" target="_blank">4.14.0-0.nightly-2026-04-16-012435</a>
(April 16). The last accepted payload (<a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-05-07-213817" target="_blank">4.14.0-0.nightly-2026-05-07-213817</a>)
was <strong>force-accepted</strong> despite this same failure. It has been <strong>~99 hours</strong> since the last accepted payload.
</p>
<p style="margin-top:8px;">
<strong>Fixes are merged on <code>release-4.14</code>, but the job still fails:</strong>
<a href="https://github.com/openshift/multus-cni/pull/287" target="_blank">openshift/multus-cni#287</a> (OCPBUGS-85253, merged May 7) and
<a href="https://github.com/openshift/cluster-network-operator/pull/2996" target="_blank">openshift/cluster-network-operator#2996</a> (OCPBUGS-84184, merged May 6)
replace the bracket-wrapping with <code>net.JoinHostPort()</code>.
Both commits are confirmed present in this payload, but the job upgrades from stable 4.13, and <code>release-4.13</code> has not received the fix.
</p>
</div>
<div class="summary-grid">
<div class="summary-card">
<div class="label">Blocking Jobs</div>
<div class="value">10</div>
</div>
<div class="summary-card">
<div class="label">Passed</div>
<div class="value green">9</div>
</div>
<div class="summary-card">
<div class="label">Failed</div>
<div class="value red">1</div>
</div>
<div class="summary-card">
<div class="label">Hours Since Accepted</div>
<div class="value orange">~99h</div>
</div>
<div class="summary-card">
<div class="label">Failure Streak</div>
<div class="value red">5+ payloads</div>
</div>
<div class="summary-card">
<div class="label">New PRs</div>
<div class="value">0</div>
</div>
</div>
<h2>Blocking Job Results</h2>
<table>
<thead>
<tr><th>Job</th><th>Status</th><th>Retries</th><th>Link</th></tr>
</thead>
<tbody>
<tr>
<td>aws-ovn-serial</td>
<td class="status-pass">Succeeded</td>
<td>0</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.14-e2e-aws-ovn-serial/2053529046221852672" target="_blank">Prow</a></td>
</tr>
<tr>
<td>aws-ovn-upgrade-micro</td>
<td class="status-pass">Succeeded</td>
<td>1</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.14-e2e-aws-ovn-upgrade/2053545492083642368" target="_blank">Prow</a></td>
</tr>
<tr>
<td>aws-sdn-upgrade-4.14-micro</td>
<td class="status-pass">Succeeded</td>
<td>0</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.14-e2e-aws-sdn-upgrade/2053529046167326720" target="_blank">Prow</a></td>
</tr>
<tr>
<td>azure-ovn-upgrade-4.14-micro</td>
<td class="status-pass">Succeeded</td>
<td>0</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.14-e2e-azure-ovn-upgrade/2053529046901329920" target="_blank">Prow</a></td>
</tr>
<tr>
<td>driver-toolkit</td>
<td class="status-pass">Succeeded</td>
<td>0</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.14-e2e-aws-driver-toolkit/2053529051938689024" target="_blank">Prow</a></td>
</tr>
<tr>
<td>fips-scan</td>
<td class="status-pass">Succeeded</td>
<td>0</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.14-fips-payload-scan/2053529046402207744" target="_blank">Prow</a></td>
</tr>
<tr style="background: #fff5f5;">
<td><strong>gcp-ovn-rt-upgrade-4.14-minor</strong></td>
<td class="status-fail">Failed</td>
<td>3</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/2053715790477135872" target="_blank">Prow (final)</a></td>
</tr>
<tr>
<td>hypershift-ovn-conformance</td>
<td class="status-pass">Succeeded</td>
<td>0</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/2053529046251212800" target="_blank">Prow</a></td>
</tr>
<tr>
<td>metal-ipi-ovn-ipv6</td>
<td class="status-pass">Succeeded</td>
<td>0</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.14-e2e-metal-ipi-ovn-ipv6/2053529046305738752" target="_blank">Prow</a></td>
</tr>
<tr>
<td>metal-ipi-sdn-bm</td>
<td class="status-pass">Succeeded</td>
<td>0</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.14-e2e-metal-ipi-sdn-bm/2053529046347681792" target="_blank">Prow</a></td>
</tr>
</tbody>
</table>
<h2>Failure History: gcp-ovn-rt-upgrade-4.14-minor</h2>
<p>This job has failed in every payload since 4.14.0-0.nightly-2026-04-16-012435. The streak below shows the job's status across recent payloads (newest on the right):</p>
<div style="display: flex; align-items: center; gap: 8px; margin: 12px 0; flex-wrap: wrap;">
<div style="display: flex; align-items: center; gap: 4px;">
<div class="streak-block pass" title="4.14.0-0.nightly-2026-04-14-134916 (Accepted, gcp PASSED)" style="font-size:0.5rem;">4/14</div>
<div class="streak-block force" title="4.14.0-0.nightly-2026-04-16-012435 (Accepted/Force, gcp FAILED)" style="font-size:0.5rem;">4/16</div>
<div class="streak-block force" title="4.14.0-0.nightly-2026-05-07-213817 (Accepted/Force, gcp FAILED)" style="font-size:0.5rem;">5/7</div>
<div class="streak-block fail" title="4.14.0-0.nightly-2026-05-08-111625 (Rejected)" style="font-size:0.5rem;">5/8</div>
<div class="streak-block fail" title="4.14.0-0.nightly-2026-05-09-034016 (Rejected)" style="font-size:0.5rem;">5/9a</div>
<div class="streak-block fail" title="4.14.0-0.nightly-2026-05-09-211506 (Rejected)" style="font-size:0.5rem;">5/9b</div>
<div class="streak-block fail target" title="4.14.0-0.nightly-2026-05-10-173114 (Rejected) - TARGET" style="font-size:0.5rem;">5/10</div>
<div class="streak-block fail" title="4.14.0-0.nightly-2026-05-11-095134 (Rejected)" style="font-size:0.5rem;">5/11</div>
</div>
<div style="font-size: 0.75rem; color: var(--gray);">
<span class="streak-block pass" style="width:14px; height:14px; display:inline-flex; font-size:0;">&nbsp;</span> Pass
<span class="streak-block force" style="width:14px; height:14px; display:inline-flex; font-size:0; margin-left:6px;">&nbsp;</span> Force-accepted (gcp failed)
<span class="streak-block fail" style="width:14px; height:14px; display:inline-flex; font-size:0; margin-left:6px;">&nbsp;</span> Rejected
<span style="border: 2px solid var(--dark); width:14px; height:14px; display:inline-flex; border-radius:4px; margin-left:6px;">&nbsp;</span> Target
</div>
</div>
<table>
<thead>
<tr><th>Payload</th><th>Phase</th><th>gcp-ovn-rt-upgrade Status</th><th>New PRs</th></tr>
</thead>
<tbody>
<tr>
<td><a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-04-14-134916" target="_blank">...2026-04-14-134916</a></td>
<td><span class="badge badge-green">Accepted</span></td>
<td class="status-pass">Succeeded (1 retry)</td>
<td>&mdash;</td>
</tr>
<tr style="background: #fff8e1;">
<td><a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-04-16-012435" target="_blank">...2026-04-16-012435</a></td>
<td><span class="badge badge-orange">Force-Accepted</span></td>
<td class="status-fail">Failed (3 retries) &larr; ONSET</td>
<td>0</td>
</tr>
<tr style="background: #fff8e1;">
<td><a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-05-07-213817" target="_blank">...2026-05-07-213817</a></td>
<td><span class="badge badge-orange">Force-Accepted</span></td>
<td class="status-fail">Failed (3 retries)</td>
<td>0</td>
</tr>
<tr>
<td><a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-05-08-111625" target="_blank">...2026-05-08-111625</a></td>
<td><span class="badge badge-red">Rejected</span></td>
<td class="status-fail">Failed (3 retries)</td>
<td>0</td>
</tr>
<tr>
<td><a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-05-09-034016" target="_blank">...2026-05-09-034016</a></td>
<td><span class="badge badge-red">Rejected</span></td>
<td class="status-fail">Failed (3 retries)</td>
<td>0</td>
</tr>
<tr>
<td><a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-05-09-211506" target="_blank">...2026-05-09-211506</a></td>
<td><span class="badge badge-red">Rejected</span></td>
<td class="status-fail">Failed (3 retries)</td>
<td>0</td>
</tr>
<tr style="background: #fef3f3;">
<td><a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-05-10-173114" target="_blank"><strong>...2026-05-10-173114</strong></a></td>
<td><span class="badge badge-red">Rejected</span></td>
<td class="status-fail"><strong>Failed (3 retries) &larr; TARGET</strong></td>
<td>0</td>
</tr>
<tr>
<td><a href="https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2026-05-11-095134" target="_blank">...2026-05-11-095134</a></td>
<td><span class="badge badge-red">Rejected</span></td>
<td class="status-fail">Pending/Failed (2 retries)</td>
<td>0</td>
</tr>
</tbody>
</table>
<h2>Failed Job Analysis: gcp-ovn-rt-upgrade-4.14-minor</h2>
<details open>
<summary style="background: #fef3f3;">
<span class="status-fail">FAILED</span> &mdash; gcp-ovn-rt-upgrade-4.14-minor
<span style="float:right; font-weight: normal; font-size: 0.85rem;">
periodic-ci-openshift-release-main-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade
</span>
</summary>
<div class="detail-content">
<h3>Classification</h3>
<p>
<span class="badge badge-red">Product Bug</span>
<span class="badge badge-gray" style="margin-left: 4px;">Networking / Multus CNI</span>
<span class="badge badge-gray" style="margin-left: 4px;">Permafail ~27 days</span>
</p>
<h3>Failure Type</h3>
<p><strong>Upgrade test failure</strong> &mdash; The 4.13 cluster installs successfully and the RT kernel tuned profile is applied, but the <strong>4.13&rarr;4.14 upgrade fails</strong> due to the DNS operator becoming permanently degraded.</p>
<h3>Root Cause</h3>
<p>During the 4.13&rarr;4.14 upgrade, worker nodes are rebooted by the MachineConfigDaemon to apply the new 4.14 OS. After reboot, the <strong>Multus CNI plugin fails to initialize its Kubernetes client</strong> because the API server URL in its kubeconfig is malformed with brackets around the hostname.</p>
<div class="error-box">Multus: error getting k8s client: host must be a URL or a host:port pair:
"https://[api-int.ci-op-k3f4fgi6-ad64e.XXXXXXXXXXXXXXXXXXXXXX]:6443"</div>
<h3>Buggy Code</h3>
<p>In <a href="https://github.com/openshift/multus-cni/blob/release-4.14/cmd/thin_entrypoint/main.go" target="_blank"><code>openshift/multus-cni &rarr; cmd/thin_entrypoint/main.go:202</code></a>, the kubeconfig URL was constructed as:</p>
<div class="error-box" style="background: #f8f8f8; border-color: #ddd;">fmt.Sprintf("%s://[%s]:%s", kubeProtocol, kubeHost, kubePort)</div>
<p>This <strong>unconditionally wraps</strong> <code>KUBERNETES_SERVICE_HOST</code> in square brackets <code>[...]</code>, which is only valid for IPv6 addresses. For DNS hostnames (like <code>api-int.ci-op-...</code>), the brackets produce an invalid URL.</p>
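<p style="margin-top: 8px;">A minimal Go sketch of the pattern above may make the effect concrete. The helper name <code>buggyServerURL</code> and the example hosts are illustrative assumptions, not code from the repository; only the format string is taken from the line quoted above.</p>

```go
package main

import "fmt"

// buggyServerURL is a hypothetical helper that mirrors the quoted format
// string: the host is always wrapped in square brackets, whether or not
// it is an IPv6 literal.
func buggyServerURL(protocol, host, port string) string {
	return fmt.Sprintf("%s://[%s]:%s", protocol, host, port)
}

func main() {
	// Example hosts are assumptions: a DNS name, an IPv4 address, an IPv6 address.
	for _, host := range []string{"api-int.example.test", "10.0.0.1", "fd00::1"} {
		fmt.Println(buggyServerURL("https", host, "6443"))
	}
}
```

<p style="margin-top: 8px;">Of the three example hosts, only <code>fd00::1</code> yields a URL whose bracketed host is an IPv6 literal; the DNS name and the IPv4 address end up inside brackets where they do not belong.</p>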
<h3>Trigger: Go 1.24.8+ / CVE-2025-47912</h3>
<p>This was a <strong>latent bug</strong> that existed for years but was harmless because older Go versions accepted brackets around any host. When the 4.14 build toolchain picked up <strong>Go 1.24.8+</strong> (around mid-April 2026), the stricter <code>url.Parse()</code> behavior from the <strong>CVE-2025-47912</strong> security fix started rejecting brackets around non-IPv6 addresses, exposing the bug.</p>
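<p style="margin-top: 8px;">The stricter behavior can be approximated with the sketch below. It illustrates the kind of check involved and is not the actual <code>net/url</code> implementation; <code>validateBracketedHost</code> is a hypothetical helper, and using <code>net/netip</code> to decide whether the bracketed text is an IPv6 literal is an assumption made for this example.</p>

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"strings"
)

// validateBracketedHost is a hypothetical helper, not the net/url code.
// It accepts a "host:port" string and rejects a bracketed host unless the
// text inside the brackets parses as an IPv6 address.
func validateBracketedHost(hostport string) error {
	if !strings.HasPrefix(hostport, "[") {
		return nil // no brackets, nothing to check in this sketch
	}
	end := strings.Index(hostport, "]")
	if end == -1 {
		return errors.New("missing closing bracket")
	}
	inner := hostport[1:end]
	addr, err := netip.ParseAddr(inner)
	if err != nil {
		return fmt.Errorf("bracketed host %q is not an IP literal", inner)
	}
	if !addr.Is6() {
		return fmt.Errorf("bracketed host %q is not an IPv6 address", inner)
	}
	return nil
}

func main() {
	// Example inputs are assumptions chosen to cover each branch.
	inputs := []string{
		"[api-int.example.test]:6443",
		"[10.0.0.1]:6443",
		"[fd00::1]:6443",
		"api-int.example.test:6443",
	}
	for _, in := range inputs {
		fmt.Printf("%s: %v\n", in, validateBracketedHost(in))
	}
}
```

<p style="margin-top: 8px;">Under this approximation, a bracketed DNS name like the one in the error message above is rejected, while the same name without brackets is accepted.</p>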
<h3>Fix (Already Merged)</h3>
<table>
<thead><tr><th>PR</th><th>Repo</th><th>Jira</th><th>Merged</th><th>Change</th></tr></thead>
<tbody>
<tr>
<td><a href="https://github.com/openshift/multus-cni/pull/287" target="_blank">#287</a></td>
<td>openshift/multus-cni</td>
<td><a href="https://issues.redhat.com/browse/OCPBUGS-85253" target="_blank">OCPBUGS-85253</a></td>
<td>May 7</td>
<td>Replace <code>fmt.Sprintf("[%s]")</code> with <code>net.JoinHostPort()</code> in thin entrypoint kubeconfig generation</td>
</tr>
<tr>
<td><a href="https://github.com/openshift/cluster-network-operator/pull/2996" target="_blank">#2996</a></td>
<td>openshift/cluster-network-operator</td>
<td><a href="https://issues.redhat.com/browse/OCPBUGS-84184" target="_blank">OCPBUGS-84184</a></td>
<td>May 6</td>
<td>Use <code>net.JoinHostPort()</code> in multus admission controller &amp; cloud network config controller</td>
</tr>
</tbody>
</table>
<p style="margin-top: 8px;">The fix uses <code>net.JoinHostPort()</code>, which adds brackets only when the host contains a colon, as IPv6 literals do. <strong>Both fixes are present in this payload's 4.14 images</strong>, but the job still fails because it starts from stable 4.13, where the fix has not been backported (see Revert Recommendations).</p>
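<p style="margin-top: 8px;">A short sketch of the <code>net.JoinHostPort()</code> approach follows. The helper <code>fixedServerURL</code> and the example hosts are assumptions for illustration; this is not the code from the PRs listed above.</p>

```go
package main

import (
	"fmt"
	"net"
)

// fixedServerURL is a hypothetical helper showing the approach: it lets
// net.JoinHostPort decide whether the host needs square brackets.
func fixedServerURL(protocol, host, port string) string {
	return fmt.Sprintf("%s://%s", protocol, net.JoinHostPort(host, port))
}

func main() {
	// Example hosts are assumptions: a DNS name, an IPv4 address, an IPv6 address.
	for _, host := range []string{"api-int.example.test", "10.0.0.1", "fd00::1"} {
		fmt.Println(fixedServerURL("https", host, "6443"))
	}
}
```

<p style="margin-top: 8px;">With this construction the DNS name and the IPv4 address are emitted without brackets, and only the IPv6 address is bracketed.</p>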
<h3>Failure Cascade</h3>
<ol style="margin: 8px 0 8px 20px;">
<li>Multus CNI cannot initialize &rarr; cannot create/destroy pod sandboxes on affected worker nodes</li>
<li>Pods on affected workers cannot be killed or recreated &rarr; <code>FailedKillPod</code> errors</li>
<li><code>dns-default</code> DaemonSet rollout stuck (readiness probes fail with <code>connection refused</code> on port 8181)</li>
<li>DNS operator reports <code>DNSDegraded</code>, <code>ClusterOperatorDegraded</code> alert fires</li>
<li>Cluster version operator waits 40+ minutes for DNS &rarr; upgrade times out</li>
</ol>
<h3>Timeline (Most Recent Attempt)</h3>
<div class="timeline">
<div class="timeline-item">
<span class="time">06:50 UTC</span> &mdash; RT kernel tuned profile applied, nodes rebooted/updated (pre-upgrade)
</div>
<div class="timeline-item">
<span class="time">06:55 UTC</span> &mdash; 4.13&rarr;4.14 upgrade initiated
</div>
<div class="timeline-item error">
<span class="time">07:27 UTC</span> &mdash; First Multus <code>FailedKillPod</code> errors on worker nodes
</div>
<div class="timeline-item error">
<span class="time">07:44 UTC</span> &mdash; Upgrade reaches 82% (712/861), stuck waiting on DNS operator (degraded)
</div>
<div class="timeline-item error">
<span class="time">08:06 UTC</span> &mdash; Cluster version reports: "Failing: Cluster operator dns is degraded"
</div>
<div class="timeline-item error">
<span class="time">09:25 UTC</span> &mdash; Upgrade times out after exceeding 40 minutes waiting on DNS
</div>
</div>
<h3>Fired Alerts</h3>
<ul class="alert-list">
<li><span class="alert-name">ClusterOperatorDegraded</span> (dns) <span class="alert-duration">&mdash; ~73 minutes</span></li>
<li><span class="alert-name">KubeDaemonSetRolloutStuck</span> (dns-default) <span class="alert-duration">&mdash; ~68 minutes</span></li>
<li><span class="alert-name">KubeDaemonSetRolloutStuck</span> (network-check-target) <span class="alert-duration">&mdash; ~85 minutes</span></li>
<li><span class="alert-name">KubeDeploymentReplicasMismatch</span> (network-check-source) <span class="alert-duration">&mdash; ~100 minutes</span></li>
<li><span class="alert-name">OVNKubernetesNorthdInactive</span> <span class="alert-duration">&mdash; ~5 minutes</span></li>
</ul>
<h3>All Attempts</h3>
<table>
<thead>
<tr><th>#</th><th>Prow Job ID</th><th>Result</th></tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/2053529046070857728" target="_blank">2053529046070857728</a></td>
<td class="status-fail">Failed &mdash; Same root cause (Multus / DNS degraded)</td>
</tr>
<tr>
<td>2</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/2053589708469964800" target="_blank">2053589708469964800</a></td>
<td class="status-fail">Failed &mdash; Same root cause</td>
</tr>
<tr>
<td>3</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/2053654288311259136" target="_blank">2053654288311259136</a></td>
<td class="status-fail">Failed &mdash; Same root cause</td>
</tr>
<tr>
<td>4 (final)</td>
<td><a href="https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/2053715790477135872" target="_blank">2053715790477135872</a></td>
<td class="status-fail">Failed &mdash; Same root cause</td>
</tr>
</tbody>
</table>
<h3>Consistency</h3>
<p>All 4 attempts show the identical failure pattern: DNS operator degraded due to Multus CNI client initialization failure with bracket-malformed API server URL. This is a deterministic, 100% reproducible bug &mdash; not a flake.</p>
</div>
</details>
<h2>Revert Recommendations</h2>
<div class="no-revert-box" style="background: #fff7ed; border-color: #fdba74;">
<h3 style="color: var(--orange);">Fix Merged on 4.14 but Incomplete &mdash; 4.13 Also Needs Fix</h3>
<p>
The fixes have been merged to <code>release-4.14</code> and are <strong>confirmed present in this payload</strong>
(verified via <code>oc adm release info --output json</code>):
</p>
<table>
<thead><tr><th>Image</th><th>Commit in Payload</th><th>Fix PR</th></tr></thead>
<tbody>
<tr>
<td><code>multus-cni</code></td>
<td><code>26b410137fa0</code> (May 7)</td>
<td><a href="https://github.com/openshift/multus-cni/pull/287" target="_blank">#287</a></td>
</tr>
<tr>
<td><code>cluster-network-operator</code></td>
<td><code>af41a362017b</code> (May 6)</td>
<td><a href="https://github.com/openshift/cluster-network-operator/pull/2996" target="_blank">#2996</a></td>
</tr>
</tbody>
</table>
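<p style="margin-top: 8px;">One way such a check could be scripted is sketched below. The JSON layout (<code>references.spec.tags[].annotations</code>) and the annotation key <code>io.openshift.build.commit.id</code> are assumptions about the <code>oc adm release info --output json</code> output and should be confirmed against real output; the embedded sample and its commit IDs are hand-written placeholders, not data from this payload.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// releaseInfo models only the fields this sketch reads. The layout is an
// assumption about the JSON output, not a documented schema.
type releaseInfo struct {
	References struct {
		Spec struct {
			Tags []struct {
				Name        string            `json:"name"`
				Annotations map[string]string `json:"annotations"`
			} `json:"tags"`
		} `json:"spec"`
	} `json:"references"`
}

// commitAnnotation is the assumed annotation key holding the source commit.
const commitAnnotation = "io.openshift.build.commit.id"

// sampleReleaseInfo is a hand-written stand-in with placeholder commit IDs.
const sampleReleaseInfo = `{"references":{"spec":{"tags":[
  {"name":"multus-cni","annotations":{"io.openshift.build.commit.id":"example-commit-a"}},
  {"name":"cluster-network-operator","annotations":{"io.openshift.build.commit.id":"example-commit-b"}}
]}}}`

// commitsByImage maps each image tag name to its commit annotation value.
func commitsByImage(data []byte) (map[string]string, error) {
	info := new(releaseInfo)
	if err := json.Unmarshal(data, info); err != nil {
		return nil, err
	}
	commits := make(map[string]string)
	for _, tag := range info.References.Spec.Tags {
		commits[tag.Name] = tag.Annotations[commitAnnotation]
	}
	return commits, nil
}

func main() {
	commits, err := commitsByImage([]byte(sampleReleaseInfo))
	if err != nil {
		panic(err)
	}
	for _, image := range []string{"multus-cni", "cluster-network-operator"} {
		fmt.Printf("%s: %s\n", image, commits[image])
	}
}
```

<p style="margin-top: 8px;">The commit reported for each image can then be compared with the merge commit of the corresponding fix PR.</p>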
<p style="margin-top: 8px;">
<strong>However, the job still fails despite the fixes being present.</strong> This is because
<code>gcp-ovn-rt-upgrade-4.14-minor</code> is a <strong>4.13&rarr;4.14 upgrade test</strong>. The test installs
a cluster from <strong>stable 4.13</strong> first, then upgrades to the 4.14 nightly. During the initial 4.13 phase,
the <strong>4.13 multus binary</strong> runs and generates the kubeconfig with the bracket-malformed URL.
If the stable 4.13 images have also been rebuilt with Go 1.24.8+ (which includes the CVE-2025-47912 fix),
they hit the same latent bug <em>before</em> the 4.14 upgrade even delivers the fixed binary.
</p>
<p style="margin-top: 8px;">
The <code>release-4.13</code> branch of <code>multus-cni</code> has <strong>not</strong> received this fix &mdash;
its last commit is from April 2024. The same bracket-wrapping code exists there unchanged.
</p>
<h3 style="margin-top: 16px;">Jira Tracking</h3>
<table>
<thead><tr><th>Bug</th><th>Summary</th><th>Status</th></tr></thead>
<tbody>
<tr>
<td><a href="https://issues.redhat.com/browse/OCPBUGS-72411" target="_blank">OCPBUGS-72411</a></td>
<td>CNO fails to start with "host must be a URL or a host:port pair" (parent bug)</td>
<td>Fix available</td>
</tr>
<tr>
<td><a href="https://issues.redhat.com/browse/OCPBUGS-84184" target="_blank">OCPBUGS-84184</a></td>
<td>CNO 4.14 clone &mdash; use <code>net.JoinHostPort</code> for URL construction</td>
<td>Fix merged (May 6)</td>
</tr>
<tr>
<td><a href="https://issues.redhat.com/browse/OCPBUGS-85253" target="_blank">OCPBUGS-85253</a></td>
<td>Multus 4.14 &mdash; Fix server URL in generated kubeconfig</td>
<td>Fix merged (May 7)</td>
</tr>
</tbody>
</table>
<h3 style="margin-top: 16px;">Background</h3>
<p>
The buggy <code>fmt.Sprintf("%s://[%s]:%s", ...)</code> pattern has existed in multus-cni since March 2023
(commit <code>dcf92c8e</code>). It was harmless because older Go versions tolerated brackets around non-IPv6 hosts.
When ART rebuilt builder images with a Go version containing
the <a href="https://github.com/golang/go/issues/75678" target="_blank">CVE-2025-47912</a> fix (around April 14, 2026),
<code>url.Parse()</code> started strictly rejecting the bracketed hostnames, exposing the latent bug.
The fix was backported to <code>release-4.14</code> but <strong>not to <code>release-4.13</code></strong>, which is
the source version for this upgrade test.
</p>
<p style="margin-top: 8px;">
<strong>Recommended actions:</strong>
</p>
<ul style="margin: 8px 0 0 20px;">
<li><strong>Cherry-pick the fixes to <code>release-4.13</code></strong> for both <code>multus-cni</code> and <code>cluster-network-operator</code> so the 4.13 source cluster doesn't hit the bug before the upgrade delivers the 4.14 fix</li>
<li>Consider force-accepting payloads in the interim, since all other 9 blocking jobs pass</li>
<li>Verify the fix also reaches any other z-stream branches where ART has updated the Go toolchain</li>
</ul>
</div>
<h2>Payload Composition</h2>
<p>The release controller reports no new pull requests in this payload compared to its predecessor. It also reports none for any payload in the rejection streak (May 8&ndash;11, 2026), or for any earlier payload back to at least April 16, 2026.</p>
<h2>Related Changes on release-4.14</h2>
<p>Although the payload diff API shows zero PRs, the following changes landed on <code>release-4.14</code> branches or in the build toolchain in the relevant timeframe. The toolchain update is not a pull request, so it does not appear in the payload diff:</p>
<table>
<thead><tr><th>Date</th><th>Repo</th><th>Change</th><th>Relevance</th></tr></thead>
<tbody>
<tr>
<td>Apr 15</td>
<td><a href="https://github.com/openshift/ovn-kubernetes/pull/3073" target="_blank">openshift/ovn-kubernetes#3073</a></td>
<td>Unpin OVN, consume latest from FDP</td>
<td class="status-pass">Not the cause (coincidental timing)</td>
</tr>
<tr>
<td>~Apr 14</td>
<td>(ART builder image)</td>
<td>Go toolchain updated to include CVE-2025-47912 fix</td>
<td class="status-fail">Trigger &mdash; stricter <code>url.Parse()</code> exposed latent Multus bug</td>
</tr>
<tr>
<td>May 6</td>
<td><a href="https://github.com/openshift/cluster-network-operator/pull/2996" target="_blank">openshift/cluster-network-operator#2996</a></td>
<td>Use <code>net.JoinHostPort</code> for URL construction</td>
<td><span class="badge badge-green">Fix</span></td>
</tr>
<tr>
<td>May 7</td>
<td><a href="https://github.com/openshift/multus-cni/pull/287" target="_blank">openshift/multus-cni#287</a></td>
<td>Fix URL in generated kubeconfig</td>
<td><span class="badge badge-green">Fix</span></td>
</tr>
</tbody>
</table>
<div class="footer">
<p>Generated by <code>/ci:analyze-payload</code> &middot; Claude Code &middot; 2026-05-12</p>
<p>Target: <code>4.14.0-0.nightly-2026-05-10-173114</code> &middot; Lookback: 10 payloads &middot; Analysis based on Prow job artifacts and release controller data</p>
</div>
</body>
</html>