Skip to content

Instantly share code, notes, and snippets.

@jmealo
Last active March 16, 2026 23:16
Show Gist options
  • Select an option

  • Save jmealo/63d5c83972e1b9c3d74c1305fd0ce974 to your computer and use it in GitHub Desktop.

Select an option

Save jmealo/63d5c83972e1b9c3d74c1305fd0ce974 to your computer and use it in GitHub Desktop.

Revisions

  1. jmealo revised this gist Mar 16, 2026. 1 changed file with 6 additions and 4 deletions.
    10 changes: 6 additions & 4 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -162,7 +162,7 @@ By event type:
    ---

    ### 6. Two distinct loss mechanisms identified
    ### 6. Two candidate loss mechanisms (not yet verified)

    **Acute — AMQP connection failures**: On Jan 13 at 22:00 UTC, assets-api returned 500s which escalated into AMQP connection death (`CHANNEL_ERROR`, `PRECONDITION_FAILED - unknown delivery tag`, `Failed to ACK message ... 'NoneType'`). This single hour accounts for **1,975 of ~2,000 missed `intel_requests.created` events** on that day ([VLogs][vlogs-asu-errors-jan13]).

    @@ -187,11 +187,13 @@ By event type:
    | `asset_status.updated` | Name→UUID via Redis cache ([source][asu-cache-code]) | 24h TTL | **~5%** |
    | `search.expired` | `intel_request_id` UUID in message | None | **~0%** (sporadic spikes) |

    ## Root Cause: Two Mechanisms
    ## Root Cause: Two Candidates (Not Yet Verified)

    1. **AMQP connection failures** cause bulk loss of all event types during the failure window. Messages in-flight at connection death may fail to ACK, and RabbitMQ's `PRECONDITION_FAILED` on the new channel means the redelivered messages can't be acknowledged either. This is the dominant cause for `intel_requests.created` loss (Cody's original report).
    The following are **correlated with the loss, not proven to cause it**. The March 10 replay (next step) will confirm or rule them out.

    2. **Redis cache misses in `_get_assets_by_name`** cause chronic `asset_status.updated` loss. When the cache returns empty (asset not yet created, or stale TTL), the handler returns `[]` and the event is silently discarded with no API call and no error log.
    1. **AMQP connection failures** correlate with bulk loss during failure windows. On Jan 13 hour 22, assets-api 500s triggered `PRECONDITION_FAILED` and failed ACKs in the same hour that 1,975 `intel_requests.created` events went missing. But correlation ≠ causation — we haven't confirmed that ASU never received those specific messages (vs received but failed to process them).

    2. **`_get_assets_by_name` Redis cache returning empty** can silently discard `asset_status.updated` events — the code path exists ([source][asu-cache-code]) and the replay shows ~5% "new" for that event type. But "new on replay" could also mean the original POST failed for other reasons, not just an empty cache. No production logging currently captures when this code path returns `[]`.

    ## Next Steps

  2. jmealo revised this gist Mar 16, 2026. 1 changed file with 15 additions and 15 deletions.
    30 changes: 15 additions & 15 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -310,7 +310,7 @@ For each correlation_id, search ASU's production logs to see if it logged "Proce
    {service_name="asset-status-updater"} correlation_id:"{cid}" AND "Processing"
    ```

    Time range: March 10 00:00–24:00 UTC (`start=1741564800000&end=1741651200000`).
    Time range: March 10 00:00–24:00 UTC (`start=1773100800000&end=1773187200000`).

    **Batch check** (sample 10 CIDs):
    ```bash
    @@ -370,17 +370,17 @@ Update this RCA with:
    [asu-cache-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/d70ce19a11b0cf28f51b65c26584e5b98bb6fea9/asset_status_updater/processor.py#L554-634
    [replay-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py
    [replay-409]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py#L126-197
    [vlogs-asu]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741614600000%22%2C%22to%22%3A%221741615800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-er]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741614600000%22%2C%22to%22%3A%221741615800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741614600000%22%2C%22to%22%3A%221741615800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-er-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-msgamp]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22message-amplifier%5C%22%7D%20%5C%22c2a627b3%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741614600000%22%2C%22to%22%3A%221741615800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-ir-field]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20intel_request_id%3A%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-ir-any]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20%5C%2256a219a6%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt-ir]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20intel_request_id%3A%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-errors-jan13]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20level%3A%5C%22error%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221736805600000%22%2C%22to%22%3A%221736816400000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-ir-breakdown]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D%20%5C%22%20%20intel_requests.created%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221736899200000%22%2C%22to%22%3A%221736985600000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-totals]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D%20%5C%22TOTAL%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221736899200000%22%2C%22to%22%3A%221742169599000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-sample]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22replay-from-loki-hour-18-4bht6%5C%22%7D%20%5C%22REPLAY%20SUMMARY%5C%22%20OR%20%5C%22TOTAL%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221737072000000%22%2C%22to%22%3A%221737158400000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773150600000%22%2C%22to%22%3A%221773151800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-er]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773150600000%22%2C%22to%22%3A%221773151800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773150600000%22%2C%22to%22%3A%221773151800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773144000000%22%2C%22to%22%3A%221773172800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-er-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773144000000%22%2C%22to%22%3A%221773172800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773144000000%22%2C%22to%22%3A%221773172800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-msgamp]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22message-amplifier%5C%22%7D%20%5C%22c2a627b3%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773150600000%22%2C%22to%22%3A%221773151800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-ir-field]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20intel_request_id%3A%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773144000000%22%2C%22to%22%3A%221773172800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-ir-any]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20%5C%2256a219a6%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773144000000%22%2C%22to%22%3A%221773172800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt-ir]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20intel_request_id%3A%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221773144000000%22%2C%22to%22%3A%221773172800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-errors-jan13]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20level%3A%5C%22error%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221768338000000%22%2C%22to%22%3A%221768345200000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-ir-breakdown]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D%20%5C%22%20%20intel_requests.created%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221768435200000%22%2C%22to%22%3A%221768521600000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-totals]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D%20%5C%22TOTAL%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221768435200000%22%2C%22to%22%3A%221773705599000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-sample]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22replay-from-loki-hour-18-4bht6%5C%22%7D%20%5C%22REPLAY%20SUMMARY%5C%22%20OR%20%5C%22TOTAL%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221768608000000%22%2C%22to%22%3A%221768694400000%22%7D%2C%22compact%22%3Afalse%7D%7D
  3. jmealo revised this gist Mar 16, 2026. 1 changed file with 14 additions and 14 deletions.
    28 changes: 14 additions & 14 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -370,17 +370,17 @@ Update this RCA with:
    [asu-cache-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/d70ce19a11b0cf28f51b65c26584e5b98bb6fea9/asset_status_updater/processor.py#L554-634
    [replay-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py
    [replay-409]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py#L126-197
    [vlogs-asu]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741614600000&to=1741615800000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22
    [vlogs-er]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741614600000&to=1741615800000&expr=%7Bservice_name%3D%22event-recorder%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22
    [vlogs-aipt]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741614600000&to=1741615800000&expr=%7Bservice_name%3D%22ai-prediction-tester%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22
    [vlogs-asu-all]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A~%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%22
    [vlogs-er-all]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22event-recorder%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A~%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%22
    [vlogs-aipt-all]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22ai-prediction-tester%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A~%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%22
    [vlogs-msgamp]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741614600000&to=1741615800000&expr=%7Bservice_name%3D%22message-amplifier%22%7D%20%22c2a627b3%22
    [vlogs-asu-ir-field]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20intel_request_id%3A%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22
    [vlogs-asu-ir-any]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20%2256a219a6%22
    [vlogs-aipt-ir]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22ai-prediction-tester%22%7D%20intel_request_id%3A%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22
    [vlogs-asu-errors-jan13]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1736805600000&to=1736816400000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20level%3A%22error%22
    [vlogs-replay-ir-breakdown]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1736899200000&to=1736985600000&expr=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22%20%20intel_requests.created%22
    [vlogs-replay-totals]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1736899200000&to=1742169599000&expr=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22TOTAL%22
    [vlogs-replay-sample]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1737072000000&to=1737158400000&expr=%7Bservice_name%3D%22replay-from-loki-hour-18-4bht6%22%7D%20%22REPLAY%20SUMMARY%22%20OR%20%22TOTAL%22
    [vlogs-asu]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741614600000%22%2C%22to%22%3A%221741615800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-er]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741614600000%22%2C%22to%22%3A%221741615800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741614600000%22%2C%22to%22%3A%221741615800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-er-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt-all]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20batch_id%3A~%5C%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-msgamp]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22message-amplifier%5C%22%7D%20%5C%22c2a627b3%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741614600000%22%2C%22to%22%3A%221741615800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-ir-field]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20message_type%3A%5C%22intel_requests.created%5C%22%20AND%20intel_request_id%3A%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-ir-any]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20%5C%2256a219a6%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-aipt-ir]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D%20intel_request_id%3A%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221741608000000%22%2C%22to%22%3A%221741636800000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-asu-errors-jan13]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D%20level%3A%5C%22error%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221736805600000%22%2C%22to%22%3A%221736816400000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-ir-breakdown]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D%20%5C%22%20%20intel_requests.created%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221736899200000%22%2C%22to%22%3A%221736985600000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-totals]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D%20%5C%22TOTAL%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221736899200000%22%2C%22to%22%3A%221742169599000%22%7D%2C%22compact%22%3Afalse%7D%7D
    [vlogs-replay-sample]: http://grafana-production.monitoring.svc.cluster.local/explore?schemaVersion=1&panes=%7B%22rca%22%3A%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22datasource%22%3A%7B%22type%22%3A%22victoriametrics-logs-datasource%22%2C%22uid%22%3A%22PD775F2863313E6C7%22%7D%2C%22editorMode%22%3A%22code%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22replay-from-loki-hour-18-4bht6%5C%22%7D%20%5C%22REPLAY%20SUMMARY%5C%22%20OR%20%5C%22TOTAL%5C%22%22%2C%22queryType%22%3A%22instant%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%221737072000000%22%2C%22to%22%3A%221737158400000%22%7D%2C%22compact%22%3Afalse%7D%7D
  4. jmealo revised this gist Mar 16, 2026. 1 changed file with 20 additions and 25 deletions.
    45 changes: 20 additions & 25 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -41,13 +41,8 @@ Counted `intel_requests.created` processing logs per batch per consumer. Each me

    Query (run per service — replace `{service}` with service name):
    ```
    {service_name="{service}"} message_type:exact("intel_requests.created")
    AND (batch_id:exact("c2a627b3-3f7c-4c89-808c-4d79b19c69cb")
    OR batch_id:exact("215181db-45e2-474f-9c05-839eecf61cc5")
    OR batch_id:exact("812f79b7-c50d-4fd2-ac28-cc5d0318ee23")
    OR batch_id:exact("d334c245-898b-4caa-b96c-954b8be31414")
    OR batch_id:exact("2186cea6-3ae5-40c1-805e-878cc123ca5c")
    OR batch_id:exact("7438ef94-89a7-48ae-83d2-71eb8a138f52"))
    {service_name="{service}"} message_type:"intel_requests.created"
    AND batch_id:~"c2a627b3|215181db|812f79b7|d334c245|2186cea6|7438ef94"
    _time:[2026-03-10T13:00:00Z, 2026-03-10T20:00:00Z]
    | stats by (batch_id) count(*) as log_lines
    ```
    @@ -312,7 +307,7 @@ wc -l /tmp/new-correlation-ids.txt # How many missed events
    For each correlation_id, search ASU's production logs to see if it logged "Processing message type" for that CID:

    ```
    {service_name="asset-status-updater"} correlation_id:exact("{cid}") AND "Processing"
    {service_name="asset-status-updater"} correlation_id:"{cid}" AND "Processing"
    ```

    Time range: March 10 00:00–24:00 UTC (`start=1741564800000&end=1741651200000`).
    @@ -322,7 +317,7 @@ Time range: March 10 00:00–24:00 UTC (`start=1741564800000&end=1741651200000`)
    head -10 /tmp/new-correlation-ids.txt | while read cid; do
    echo -n "$cid: "
    curl -s 'http://vlogs-production.monitoring.svc.cluster.local/select/logsql/query' \
    --data-urlencode "query={service_name=\"asset-status-updater\"} correlation_id:exact(\"$cid\") AND \"Processing\"" \
    --data-urlencode "query={service_name=\"asset-status-updater\"} correlation_id:\"$cid\" AND \"Processing\"" \
    --data-urlencode 'start=2026-03-10T00:00:00Z' \
    --data-urlencode 'end=2026-03-11T00:00:00Z' \
    --data-urlencode 'limit=1' | wc -c
    @@ -343,7 +338,7 @@ done
    Check if the missed events cluster around AMQP error windows:

    ```
    {service_name="asset-status-updater"} level:exact("error") _time:[2026-03-10T00:00:00Z, 2026-03-11T00:00:00Z] | stats by (_time:15m) count(*) as errors
    {service_name="asset-status-updater"} level:"error" _time:[2026-03-10T00:00:00Z, 2026-03-11T00:00:00Z] | stats by (_time:15m) count(*) as errors
    ```

    If NEW events cluster in the same 15-minute windows as errors → AMQP connection failure is the dominant cause. If they're spread evenly → application-layer (cache/handler) issue.
    @@ -357,7 +352,7 @@ Update this RCA with:

    ---

    <!-- Reference-style links — Grafana Explore (VictoriaLogs datasource) with time ranges -->
    <!-- Reference-style links — Grafana Explore with VictoriaLogs datasource, time ranges set via from/to epoch ms -->

    [lib-log]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L657-725
    [lib-context]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L632-642
    @@ -375,17 +370,17 @@ Update this RCA with:
    [asu-cache-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/d70ce19a11b0cf28f51b65c26584e5b98bb6fea9/asset_status_updater/processor.py#L554-634
    [replay-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py
    [replay-409]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py#L126-197
    [vlogs-asu]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29%22%7D%5D%7D&from=1741614600000&to=1741615800000
    [vlogs-er]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29%22%7D%5D%7D&from=1741614600000&to=1741615800000
    [vlogs-aipt]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29%22%7D%5D%7D&from=1741614600000&to=1741615800000
    [vlogs-asu-all]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+%28batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29+OR+batch_id%3Aexact%28%5C%22215181db-45e2-474f-9c05-839eecf61cc5%5C%22%29+OR+batch_id%3Aexact%28%5C%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%5C%22%29+OR+batch_id%3Aexact%28%5C%22d334c245-898b-4caa-b96c-954b8be31414%5C%22%29+OR+batch_id%3Aexact%28%5C%222186cea6-3ae5-40c1-805e-878cc123ca5c%5C%22%29+OR+batch_id%3Aexact%28%5C%227438ef94-89a7-48ae-83d2-71eb8a138f52%5C%22%29%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-er-all]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+%28batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29+OR+batch_id%3Aexact%28%5C%22215181db-45e2-474f-9c05-839eecf61cc5%5C%22%29+OR+batch_id%3Aexact%28%5C%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%5C%22%29+OR+batch_id%3Aexact%28%5C%22d334c245-898b-4caa-b96c-954b8be31414%5C%22%29+OR+batch_id%3Aexact%28%5C%222186cea6-3ae5-40c1-805e-878cc123ca5c%5C%22%29+OR+batch_id%3Aexact%28%5C%227438ef94-89a7-48ae-83d2-71eb8a138f52%5C%22%29%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-aipt-all]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+%28batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29+OR+batch_id%3Aexact%28%5C%22215181db-45e2-474f-9c05-839eecf61cc5%5C%22%29+OR+batch_id%3Aexact%28%5C%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%5C%22%29+OR+batch_id%3Aexact%28%5C%22d334c245-898b-4caa-b96c-954b8be31414%5C%22%29+OR+batch_id%3Aexact%28%5C%222186cea6-3ae5-40c1-805e-878cc123ca5c%5C%22%29+OR+batch_id%3Aexact%28%5C%227438ef94-89a7-48ae-83d2-71eb8a138f52%5C%22%29%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-msgamp]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22message-amplifier%5C%22%7D+%5C%22c2a627b3%5C%22%22%7D%5D%7D&from=1741614600000&to=1741615800000
    [vlogs-asu-ir-field]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+intel_request_id%3Aexact%28%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-asu-ir-any]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+%5C%2256a219a6%5C%22%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-aipt-ir]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D+intel_request_id%3Aexact%28%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-asu-errors-jan13]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+level%3Aexact%28%5C%22error%5C%22%29%22%7D%5D%7D&from=1736805600000&to=1736816400000
    [vlogs-replay-ir-breakdown]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D+%5C%22++intel_requests.created%5C%22%22%7D%5D%7D&from=1736899200000&to=1736985600000
    [vlogs-replay-totals]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D+%5C%22TOTAL%5C%22%22%7D%5D%7D&from=1736899200000&to=1742169599000
    [vlogs-replay-sample]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22replay-from-loki-hour-18-4bht6%5C%22%7D+%5C%22REPLAY+SUMMARY%5C%22+OR+%5C%22TOTAL%5C%22%22%7D%5D%7D&from=1737072000000&to=1737158400000
    [vlogs-asu]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741614600000&to=1741615800000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22
    [vlogs-er]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741614600000&to=1741615800000&expr=%7Bservice_name%3D%22event-recorder%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22
    [vlogs-aipt]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741614600000&to=1741615800000&expr=%7Bservice_name%3D%22ai-prediction-tester%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22
    [vlogs-asu-all]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A~%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%22
    [vlogs-er-all]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22event-recorder%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A~%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%22
    [vlogs-aipt-all]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22ai-prediction-tester%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20batch_id%3A~%22c2a627b3%7C215181db%7C812f79b7%7Cd334c245%7C2186cea6%7C7438ef94%22
    [vlogs-msgamp]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741614600000&to=1741615800000&expr=%7Bservice_name%3D%22message-amplifier%22%7D%20%22c2a627b3%22
    [vlogs-asu-ir-field]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20message_type%3A%22intel_requests.created%22%20AND%20intel_request_id%3A%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22
    [vlogs-asu-ir-any]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20%2256a219a6%22
    [vlogs-aipt-ir]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1741608000000&to=1741636800000&expr=%7Bservice_name%3D%22ai-prediction-tester%22%7D%20intel_request_id%3A%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22
    [vlogs-asu-errors-jan13]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1736805600000&to=1736816400000&expr=%7Bservice_name%3D%22asset-status-updater%22%7D%20level%3A%22error%22
    [vlogs-replay-ir-breakdown]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1736899200000&to=1736985600000&expr=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22%20%20intel_requests.created%22
    [vlogs-replay-totals]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1736899200000&to=1742169599000&expr=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22TOTAL%22
    [vlogs-replay-sample]: http://grafana-production.monitoring.svc.cluster.local:80/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=1737072000000&to=1737158400000&expr=%7Bservice_name%3D%22replay-from-loki-hour-18-4bht6%22%7D%20%22REPLAY%20SUMMARY%22%20OR%20%22TOTAL%22
  5. jmealo revised this gist Mar 16, 2026. 1 changed file with 24 additions and 27 deletions.
    51 changes: 24 additions & 27 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -41,7 +41,7 @@ Counted `intel_requests.created` processing logs per batch per consumer. Each me

    Query (run per service — replace `{service}` with service name):
    ```
    service_name:exact("{service}") AND message_type:exact("intel_requests.created")
    {service_name="{service}"} message_type:exact("intel_requests.created")
    AND (batch_id:exact("c2a627b3-3f7c-4c89-808c-4d79b19c69cb")
    OR batch_id:exact("215181db-45e2-474f-9c05-839eecf61cc5")
    OR batch_id:exact("812f79b7-c50d-4fd2-ac28-cc5d0318ee23")
    @@ -312,21 +312,20 @@ wc -l /tmp/new-correlation-ids.txt # How many missed events
    For each correlation_id, search ASU's production logs to see if it logged "Processing message type" for that CID:

    ```
    {service_name="asset-status-updater"} correlation_id:exact("{cid}") "Processing"
    {service_name="asset-status-updater"} correlation_id:exact("{cid}") AND "Processing"
    ```

    Time range: March 10 00:00–24:00 UTC.
    Time range: March 10 00:00–24:00 UTC (`start=1741564800000&end=1741651200000`).

    **Batch check** (sample 10 CIDs):
    ```bash
    head -10 /tmp/new-correlation-ids.txt | while read cid; do
    echo -n "$cid: "
    # Use the MCP tool or curl to VictoriaLogs
    curl -s "http://vlogs-production.monitoring.svc.cluster.local/select/logsql/query" \
    --data-urlencode "query={service_name=\"asset-status-updater\"} correlation_id:exact(\"$cid\") \"Processing\"" \
    --data-urlencode "start=2026-03-10T00:00:00Z" \
    --data-urlencode "end=2026-03-11T00:00:00Z" \
    --data-urlencode "limit=1" | wc -c
    curl -s 'http://vlogs-production.monitoring.svc.cluster.local/select/logsql/query' \
    --data-urlencode "query={service_name=\"asset-status-updater\"} correlation_id:exact(\"$cid\") AND \"Processing\"" \
    --data-urlencode 'start=2026-03-10T00:00:00Z' \
    --data-urlencode 'end=2026-03-11T00:00:00Z' \
    --data-urlencode 'limit=1' | wc -c
    # >2 bytes = found, 0-2 bytes = not found
    done
    ```
    @@ -344,9 +343,7 @@ done
    Check if the missed events cluster around AMQP error windows:

    ```
    {service_name="asset-status-updater"} level:exact("error")
    _time:[2026-03-10T00:00:00Z, 2026-03-11T00:00:00Z]
    | stats by (_time:15m) count(*) as errors
    {service_name="asset-status-updater"} level:exact("error") _time:[2026-03-10T00:00:00Z, 2026-03-11T00:00:00Z] | stats by (_time:15m) count(*) as errors
    ```

    If NEW events cluster in the same 15-minute windows as errors → AMQP connection failure is the dominant cause. If they're spread evenly → application-layer (cache/handler) issue.
    @@ -360,7 +357,7 @@ Update this RCA with:

    ---

    <!-- Reference-style links — kept at bottom, GitHub renders them as clickable -->
    <!-- Reference-style links — Grafana Explore (VictoriaLogs datasource) with time ranges -->

    [lib-log]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L657-725
    [lib-context]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L632-642
    @@ -374,21 +371,21 @@ Update this RCA with:
    [vrl-batch]: https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L213-219
    [mr17]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/merge_requests/17
    [mr48]: https://gitlab.com/gisual/backend/apis/intel-requests-api/-/merge_requests/48
    [vlogs-asu]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-er]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22event-recorder%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-aipt]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-asu-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-msgamp]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22message-amplifier%22%29%20AND%20%22c2a627b3%22&start=1741614600000&end=1741615800000
    [vlogs-er-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22event-recorder%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-aipt-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-field]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-any]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20%2256a219a6%22&start=1741608000000&end=1741636800000
    [vlogs-aipt-ir]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [mr29]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/merge_requests/29
    [asu-cache-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/d70ce19a11b0cf28f51b65c26584e5b98bb6fea9/asset_status_updater/processor.py#L554-634
    [vlogs-asu-errors-jan13]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D%22asset-status-updater%22%7D%20level%3Aexact%28%22error%22%29&start=1736805600000&end=1736816400000
    [vlogs-replay-ir-breakdown]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22%20%20intel_requests.created%22&start=1736899200000&end=1736985600000
    [replay-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py
    [replay-409]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py#L126-197
    [vlogs-replay-totals]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22TOTAL%22&start=1736899200000&end=1742169599000
    [vlogs-replay-sample]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D%22replay-from-loki-hour-18-4bht6%22%7D%20%22REPLAY%20SUMMARY%22%20OR%20%22TOTAL%22&start=1737072000000&end=1737158400000
    [vlogs-asu]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29%22%7D%5D%7D&from=1741614600000&to=1741615800000
    [vlogs-er]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29%22%7D%5D%7D&from=1741614600000&to=1741615800000
    [vlogs-aipt]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29%22%7D%5D%7D&from=1741614600000&to=1741615800000
    [vlogs-asu-all]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+%28batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29+OR+batch_id%3Aexact%28%5C%22215181db-45e2-474f-9c05-839eecf61cc5%5C%22%29+OR+batch_id%3Aexact%28%5C%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%5C%22%29+OR+batch_id%3Aexact%28%5C%22d334c245-898b-4caa-b96c-954b8be31414%5C%22%29+OR+batch_id%3Aexact%28%5C%222186cea6-3ae5-40c1-805e-878cc123ca5c%5C%22%29+OR+batch_id%3Aexact%28%5C%227438ef94-89a7-48ae-83d2-71eb8a138f52%5C%22%29%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-er-all]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22event-recorder%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+%28batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29+OR+batch_id%3Aexact%28%5C%22215181db-45e2-474f-9c05-839eecf61cc5%5C%22%29+OR+batch_id%3Aexact%28%5C%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%5C%22%29+OR+batch_id%3Aexact%28%5C%22d334c245-898b-4caa-b96c-954b8be31414%5C%22%29+OR+batch_id%3Aexact%28%5C%222186cea6-3ae5-40c1-805e-878cc123ca5c%5C%22%29+OR+batch_id%3Aexact%28%5C%227438ef94-89a7-48ae-83d2-71eb8a138f52%5C%22%29%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-aipt-all]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+%28batch_id%3Aexact%28%5C%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%5C%22%29+OR+batch_id%3Aexact%28%5C%22215181db-45e2-474f-9c05-839eecf61cc5%5C%22%29+OR+batch_id%3Aexact%28%5C%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%5C%22%29+OR+batch_id%3Aexact%28%5C%22d334c245-898b-4caa-b96c-954b8be31414%5C%22%29+OR+batch_id%3Aexact%28%5C%222186cea6-3ae5-40c1-805e-878cc123ca5c%5C%22%29+OR+batch_id%3Aexact%28%5C%227438ef94-89a7-48ae-83d2-71eb8a138f52%5C%22%29%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-msgamp]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22message-amplifier%5C%22%7D+%5C%22c2a627b3%5C%22%22%7D%5D%7D&from=1741614600000&to=1741615800000
    [vlogs-asu-ir-field]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+message_type%3Aexact%28%5C%22intel_requests.created%5C%22%29+AND+intel_request_id%3Aexact%28%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-asu-ir-any]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+%5C%2256a219a6%5C%22%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-aipt-ir]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22ai-prediction-tester%5C%22%7D+intel_request_id%3Aexact%28%5C%2256a219a6-7597-42c7-85ea-1636cdfa8e53%5C%22%29%22%7D%5D%7D&from=1741608000000&to=1741636800000
    [vlogs-asu-errors-jan13]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22asset-status-updater%5C%22%7D+level%3Aexact%28%5C%22error%5C%22%29%22%7D%5D%7D&from=1736805600000&to=1736816400000
    [vlogs-replay-ir-breakdown]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D+%5C%22++intel_requests.created%5C%22%22%7D%5D%7D&from=1736899200000&to=1736985600000
    [vlogs-replay-totals]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D~%5C%22replay-from-loki-hour-.%2A%5C%22%7D+%5C%22TOTAL%5C%22%22%7D%5D%7D&from=1736899200000&to=1742169599000
    [vlogs-replay-sample]: http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bservice_name%3D%5C%22replay-from-loki-hour-18-4bht6%5C%22%7D+%5C%22REPLAY+SUMMARY%5C%22+OR+%5C%22TOTAL%5C%22%22%7D%5D%7D&from=1737072000000&to=1737158400000
  6. jmealo revised this gist Mar 16, 2026. 1 changed file with 150 additions and 3 deletions.
    153 changes: 150 additions & 3 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -205,11 +205,158 @@ By event type:
    3. **Investigate AMQP reconnection behavior** — the `PRECONDITION_FAILED - unknown delivery tag` error suggests messages processed on the old connection can't be ACKed after reconnect, potentially losing the entire in-flight batch
    4. **Consider adding logging to `_get_assets_by_name`** when it returns `[]` — currently this is a silent discard

    **Verification after replay**: grep `NEW:` lines from replay logs, extract `correlation_id`s, then:
    ---

    ## Runbook: Replay March 10 Data

    ### Prerequisites

    - [MR 29][mr29] merged and image built (`docker.gisual.net/backend/consumers/asset-status-updater:latest`)
    - March 10 data still in Loki retention (verify with step 1)
    - `kubectl` access to production cluster

    ### Step 1: Verify Loki has March 10 data

    ```bash
    kubectl run --rm -it loki-check --image=curlimages/curl --restart=Never -- \
    curl -s 'http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/query_range' \
    --data-urlencode 'query={source="events", name="intel_requests.created"}' \
    --data-urlencode 'start=1741564800' \
    --data-urlencode 'end=1741568400' \
    --data-urlencode 'limit=1' | head -c 200
    ```

    If empty/error → data has aged out, stop here.

    ### Step 2: Run replay for March 10 (24 parallel hourly jobs)

    ```bash
    cd ~/src/gisual/backend/consumers/asset-status-updater/scripts

    # Dry run first — prints manifests without applying
    ./generate-parallel-jobs.sh 2026-03-10 --dry-run | head -50

    # Apply (ensure IMAGE_TAG points to the build with MR 29)
    IMAGE_TAG=latest ./generate-parallel-jobs.sh 2026-03-10
    ```

    ### Step 3: Monitor progress

    ```bash
    # Watch job completion
    kubectl get jobs -l app.kubernetes.io/name=replay-from-loki -w

    # Tail a specific hour's logs
    kubectl logs -f job/replay-from-loki-hour-13

    # Check for errors across all jobs
    kubectl logs -l app.kubernetes.io/name=replay-from-loki --tail=5 | grep -E "^(ERROR|REPLAY SUMMARY)"
    ```

    ### Step 4: Collect REPLAY SUMMARY from all jobs

    Wait for all 24 jobs to complete, then:

    ```bash
    # Dump all summaries
    for hour in $(seq -w 0 23); do
    echo "=== Hour $hour ==="
    kubectl logs job/replay-from-loki-hour-$hour 2>/dev/null | grep -A 20 "REPLAY SUMMARY"
    echo
    done > /tmp/replay-march10-summaries.txt
    ```
    {service_name="asset-status-updater"} correlation_id:exact("{cid}") AND "Processing"

    Or via VictoriaLogs (after logs are shipped):
    ```
    {service_name=~"replay-from-loki-hour-.*"} "TOTAL"
    ```
    Present = ASU received it but processing failed silently. Absent = RabbitMQ never delivered it.
    Time range: from job start to +4 hours.

    ### Step 5: Extract NEW event IDs

    ```bash
    # Get all NEW lines (events ASU missed)
    for hour in $(seq -w 0 23); do
    kubectl logs job/replay-from-loki-hour-$hour 2>/dev/null | grep "^.*NEW:"
    done > /tmp/replay-march10-new-events.txt

    # Count by event type
    cat /tmp/replay-march10-new-events.txt | grep -oP 'NEW: \S+' | sort | uniq -c | sort -rn
    ```

    ### Step 6: Clean up jobs

    ```bash
    kubectl delete jobs -l app.kubernetes.io/name=replay-from-loki
    ```

    ---

    ## Runbook: Cross-Reference NEW Events Against ASU Production Logs

    Once the replay completes and `NEW:` lines are collected:

    ### Step 1: Extract correlation_ids from NEW lines

    ```bash
    # Parse correlation_ids from the enhanced log format
    grep "NEW:" /tmp/replay-march10-new-events.txt \
    | grep -oP 'correlation_id=\K[a-f0-9-]{36}' \
    | sort -u > /tmp/new-correlation-ids.txt

    wc -l /tmp/new-correlation-ids.txt # How many missed events
    ```

    ### Step 2: Check if ASU ever received these messages

    For each correlation_id, search ASU's production logs to see if it logged "Processing message type" for that CID:

    ```
    {service_name="asset-status-updater"} correlation_id:exact("{cid}") "Processing"
    ```

    Time range: March 10 00:00–24:00 UTC.

    **Batch check** (sample 10 CIDs):
    ```bash
    head -10 /tmp/new-correlation-ids.txt | while read cid; do
    echo -n "$cid: "
    # Use the MCP tool or curl to VictoriaLogs
    curl -s "http://vlogs-production.monitoring.svc.cluster.local/select/logsql/query" \
    --data-urlencode "query={service_name=\"asset-status-updater\"} correlation_id:exact(\"$cid\") \"Processing\"" \
    --data-urlencode "start=2026-03-10T00:00:00Z" \
    --data-urlencode "end=2026-03-11T00:00:00Z" \
    --data-urlencode "limit=1" | wc -c
    # >2 bytes = found, 0-2 bytes = not found
    done
    ```

    ### Step 3: Interpret results

    | ASU logged "Processing" for CID? | Meaning | Root cause |
    |---|---|---|
    | **No** | RabbitMQ never delivered the message to ASU | AMQP connection failure or queue-level loss |
    | **Yes**, and handler made API calls | ASU processed it but the POST failed (transient error) and retries exhausted | Retry/error-exchange exhaustion |
    | **Yes**, but 0 API calls (DISCARDED in replay) | `_get_assets_by_name` returned empty at original time | Redis cache miss or asset not yet created |

    ### Step 4: Correlate with AMQP errors

    Check if the missed events cluster around AMQP error windows:

    ```
    {service_name="asset-status-updater"} level:exact("error")
    _time:[2026-03-10T00:00:00Z, 2026-03-11T00:00:00Z]
    | stats by (_time:15m) count(*) as errors
    ```

    If NEW events cluster in the same 15-minute windows as errors → AMQP connection failure is the dominant cause. If they're spread evenly → application-layer (cache/handler) issue.

    ### Step 5: Document findings

    Update this RCA with:
    - Total NEW count for March 10 (compare to Cody's 2,798)
    - Breakdown: delivery-layer vs application-layer loss
    - Whether the pattern matches Jan 13 (hour-22 spike) or is different

    ---

  7. jmealo revised this gist Mar 16, 2026. 1 changed file with 41 additions and 9 deletions.
    50 changes: 41 additions & 9 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -167,21 +167,49 @@ By event type:
    ---

    ## Root Cause: Unknown
    ### 6. Two distinct loss mechanisms identified

    ASU loses messages at a consistent low rate (~0.08% overall, spiking to ~1% on specific days). event-recorder and AIPT (same library, same exchange) get 100%. Unverified theories:
    **Acute — AMQP connection failures**: On Jan 13 at 22:00 UTC, assets-api returned 500s which escalated into AMQP connection death (`CHANNEL_ERROR`, `PRECONDITION_FAILED - unknown delivery tag`, `Failed to ACK message ... 'NoneType'`). This single hour accounts for **1,975 of ~2,000 missed `intel_requests.created` events** on that day ([VLogs][vlogs-asu-errors-jan13]).

    1. **Silent drop in ASU handler** — early return without logging
    2. **Backpressure** — ASU is 3x slower per message (0.12-0.15s vs 0.05s)
    3. **Queue config** — different prefetch, replicas, or resource limits
    <details>
    <summary><strong>Evidence:</strong> Per-hour intel_requests.created replay results (Jan 13 data)</summary>

    | Hour (UTC) | New (missed) | 409s | Loss rate | Note |
    |---|---|---|---|---|
    | 00-21 | 0-4 each | 5,000-16,000 | <0.05% | Steady state |
    | **22** | **1,975** | **6,553** | **23.2%** | AMQP connection death |
    | 23 | 13 | 6,711 | 0.19% | Tail of recovery |

    > **Verify**: `{service_name=~"replay-from-loki-hour-.*"} " intel_requests.created"` on [Jan 15][vlogs-replay-ir-breakdown]
    </details>
    **Chronic — `asset_status.updated` code path**: The `_process_asset_status_updated_message` handler has a name-to-UUID resolution path ([processor.py:554-634][asu-cache-code]) that uses a Redis cache with 24h TTL. When the cache is cold or the asset doesn't exist yet, `_get_assets_by_name` returns `[]` → zero API calls → event silently discarded. Other event types (`power_status.updated`, `intel_requests.created`) use stable UUIDs from the message body and never hit this cache — explaining why they have 0% loss at steady state.

    | Event type | Asset identity | Cache dependency | Steady-state loss |
    |---|---|---|---|
    | `power_status.updated` | `intel_request_id` UUID in message | None | **0%** |
    | `intel_requests.created` | `intel_request_id` UUID in message | None | **~0%** (only during AMQP failures) |
    | `asset_status.updated` | Name→UUID via Redis cache ([source][asu-cache-code]) | 24h TTL | **~5%** |
    | `search.expired` | `intel_request_id` UUID in message | None | **~0%** (sporadic spikes) |

    ## Root Cause: Two Mechanisms

    1. **AMQP connection failures** cause bulk loss of all event types during the failure window. Messages in-flight at connection death may fail to ACK, and RabbitMQ's `PRECONDITION_FAILED` on the new channel means the redelivered messages can't be acknowledged either. This is the dominant cause for `intel_requests.created` loss (Cody's original report).

    2. **Redis cache misses in `_get_assets_by_name`** cause chronic `asset_status.updated` loss. When the cache returns empty (asset not yet created, or stale TTL), the handler returns `[]` and the event is silently discarded with no API call and no error log.

    ## Next Steps

    1. **Deploy [MR 17][mr17] on ASU** with `LOG_ON_MESSAGE=true` — proves whether messages reach ASU's callback
    2. **Audit ASU handler** for silent return paths on `intel_requests.created`
    3. **Compare ASU queue config** to event-recorder (prefetch, replicas, limits)
    1. **Replay March 10 data** with enhanced logging ([MR 29][mr29]) — the `NEW:` lines now include `correlation_id` and `batch_id`, letting us cross-reference against ASU production logs to confirm whether each missed event was a delivery failure (CID absent from ASU logs) or a processing failure (CID present but no API call)
    2. **Deploy [MR 17][mr17] on ASU** with `LOG_ON_MESSAGE=true` — proves whether messages reach ASU's AMQP callback at all
    3. **Investigate AMQP reconnection behavior** — the `PRECONDITION_FAILED - unknown delivery tag` error suggests messages processed on the old connection can't be ACKed after reconnect, potentially losing the entire in-flight batch
    4. **Consider adding logging to `_get_assets_by_name`** when it returns `[]` — currently this is a silent discard

    **Verification after MR 17**: compare `"Received message"` count on ASU vs event-recorder for same time window. Equal = handler bug. Lower = queue/delivery issue.
    **Verification after replay**: grep `NEW:` lines from replay logs, extract `correlation_id`s, then:
    ```
    {service_name="asset-status-updater"} correlation_id:exact("{cid}") AND "Processing"
    ```
    Present = ASU received it but processing failed silently. Absent = RabbitMQ never delivered it.

    ---

    @@ -209,6 +237,10 @@ ASU loses messages at a consistent low rate (~0.08% overall, spiking to ~1% on s
    [vlogs-asu-ir-field]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-any]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20%2256a219a6%22&start=1741608000000&end=1741636800000
    [vlogs-aipt-ir]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [mr29]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/merge_requests/29
    [asu-cache-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/d70ce19a11b0cf28f51b65c26584e5b98bb6fea9/asset_status_updater/processor.py#L554-634
    [vlogs-asu-errors-jan13]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D%22asset-status-updater%22%7D%20level%3Aexact%28%22error%22%29&start=1736805600000&end=1736816400000
    [vlogs-replay-ir-breakdown]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22%20%20intel_requests.created%22&start=1736899200000&end=1736985600000
    [replay-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py
    [replay-409]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py#L126-197
    [vlogs-replay-totals]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22TOTAL%22&start=1736899200000&end=1742169599000
  8. jmealo revised this gist Mar 16, 2026. 1 changed file with 50 additions and 1 deletion.
    51 changes: 50 additions & 1 deletion 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -120,11 +120,56 @@ Vector's VRL pipeline ([`07-extract-ids.vrl`][vrl-file]) also extracts IDs from
    - The `_on_message` diagnostic logging in [MR 17][mr17] is still the right next diagnostic step
    - Consumer-side loss IS happening — just only in ASU, not everywhere

    ### 5. Replay experiment independently confirms ASU-specific loss

    A [replay tool][replay-code] was built into the ASU image and run as K8s Jobs in production. It reads events from Loki (the same events event-recorder stored), replays them through ASU's `Processor`, and counts: **409 = already processed** (ASU had it), **2xx = new** (ASU missed it). The tool intentionally does NOT add 409 to `allowed_errors` so duplicates are counted separately ([source][replay-409]).

    200 replay jobs ran across 4 dates, covering multiple days of production data:

    | Run date | Jobs | New (missed) | 409s (had it) | Loss rate |
    |---|---|---|---|---|
    | 2026-01-15 | 35 | 4,438 | 428,977 | 1.024% |
    | 2026-01-16 | 112 | 608 | 2,968,829 | 0.021% |
    | 2026-01-17 | 5 | 272 | 3,024,879 | 0.009% |
    | 2026-02-16 | 48 | 183 | 522,793 | 0.035% |
    | **Total** | **200** | **5,501** | **6,945,478** | **0.079%** |

    > **Verify**: `{service_name=~"replay-from-loki-hour-.*"} "TOTAL"` over [Jan 15 – Mar 16][vlogs-replay-totals]
    The Jan 15 run had the highest rate (1.02%) because it was the first replay — subsequent runs found fewer new events as the replay itself backfilled gaps. The steady-state rate of ~0.02-0.04% on reruns suggests the loss is ongoing but at a lower rate than Cody's Mar 10 sample (0.93%).

    <details>
    <summary><strong>Evidence:</strong> Sample replay output (hour 18, Dec 29 data)</summary>

    Pod: `replay-from-loki-hour-18-4bht6`, ran 2026-01-17T13:47Z, replaying Dec 29 18:00-19:00 UTC

    ```
    REPLAY SUMMARY
    Total events fetched from Loki: 952,391
    Successfully processed (new): 104
    Duplicates (409): 872,513
    Discarded (no API call): 79,774
    Errors: 0
    By event type:
    Event Type New 409s Skip
    asset_status.updated 13 788 2047
    incidents.deleted 0 5 7
    incidents.updated 0 532 22060
    intel_requests.created 1 41112 68
    power_status.updated 0 62941 55574
    search.expired 90 767135 18
    TOTAL 104 872513 79774
    ```

    > **Verify**: `{service_name="replay-from-loki-hour-18-4bht6"} "REPLAY SUMMARY" OR "TOTAL"` on [Jan 17][vlogs-replay-sample]
    </details>
    ---

    ## Root Cause: Unknown

    ASU loses ~4% of `intel_requests.created` messages consistently. event-recorder and AIPT (same library, same exchange) get 100%. Unverified theories:
    ASU loses messages at a consistent low rate (~0.08% overall, spiking to ~1% on specific days). event-recorder and AIPT (same library, same exchange) get 100%. Unverified theories:

    1. **Silent drop in ASU handler** — early return without logging
    2. **Backpressure** — ASU is 3x slower per message (0.12-0.15s vs 0.05s)
    @@ -164,3 +209,7 @@ ASU loses ~4% of `intel_requests.created` messages consistently. event-recorder
    [vlogs-asu-ir-field]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-any]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20%2256a219a6%22&start=1741608000000&end=1741636800000
    [vlogs-aipt-ir]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [replay-code]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py
    [replay-409]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/main/asset_status_updater/cli/replay_from_loki.py#L126-197
    [vlogs-replay-totals]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D~%22replay-from-loki-hour-.%2A%22%7D%20%22TOTAL%22&start=1736899200000&end=1742169599000
    [vlogs-replay-sample]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=%7Bservice_name%3D%22replay-from-loki-hour-18-4bht6%22%7D%20%22REPLAY%20SUMMARY%22%20OR%20%22TOTAL%22&start=1737072000000&end=1737158400000
  9. jmealo revised this gist Mar 16, 2026. 1 changed file with 10 additions and 10 deletions.
    20 changes: 10 additions & 10 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -154,13 +154,13 @@ ASU loses ~4% of `intel_requests.created` messages consistently. event-recorder
    [vrl-batch]: https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L213-219
    [mr17]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/merge_requests/17
    [mr48]: https://gitlab.com/gisual/backend/apis/intel-requests-api/-/merge_requests/48
    [vlogs-asu]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-er]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22event-recorder%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-aipt]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-asu-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-msgamp]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22message-amplifier%22%29%20AND%20%22c2a627b3%22&start=1741614600000&end=1741615800000
    [vlogs-er-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22event-recorder%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-aipt-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-field]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-any]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20%2256a219a6%22&start=1741608000000&end=1741636800000
    [vlogs-aipt-ir]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [vlogs-asu]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-er]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22event-recorder%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-aipt]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-asu-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-msgamp]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22message-amplifier%22%29%20AND%20%22c2a627b3%22&start=1741614600000&end=1741615800000
    [vlogs-er-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22event-recorder%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-aipt-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-field]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-any]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20%2256a219a6%22&start=1741608000000&end=1741636800000
    [vlogs-aipt-ir]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
  10. jmealo revised this gist Mar 16, 2026. 1 changed file with 13 additions and 6 deletions.
    19 changes: 13 additions & 6 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -39,7 +39,7 @@ Counted `intel_requests.created` processing logs per batch per consumer. Each me
    <details>
    <summary><strong>Evidence:</strong> Raw VictoriaLogs stats query + output</summary>

    Query (run per service):
    Query (run per service — replace `{service}` with service name):
    ```
    service_name:exact("{service}") AND message_type:exact("intel_requests.created")
    AND (batch_id:exact("c2a627b3-3f7c-4c89-808c-4d79b19c69cb")
    @@ -52,6 +52,8 @@ _time:[2026-03-10T13:00:00Z, 2026-03-10T20:00:00Z]
    | stats by (batch_id) count(*) as log_lines
    ```

    View raw logs: [event-recorder][vlogs-er-all] | [AIPT][vlogs-aipt-all] | [ASU][vlogs-asu-all]

    | service | `c2a627b3` | `215181db` | `812f79b7` | `d334c245` | `2186cea6` | `7438ef94` | **total** | **÷2** |
    |---|---|---|---|---|---|---|---|---|
    | **event-recorder** | 348 | 44 | 136 | 128 | 44 | 108 | 808 | **404** |
    @@ -100,11 +102,11 @@ WHERE intel_request_id IN ('56a219a6-...', 'bc2d6cef-...', ...)

    Why searching by `intel_request_id` is misleading:

    | Service | `intel_request_id` in VictoriaLogs? | Source | Appears on `intel_requests.created` lines? |
    |---|---|---|---|
    | **ASU** | Yes | Vector extracts from `(Intel Request {uuid})` in `_msg` ([VRL][vrl-ir], [Python][asu-discard]) | **No** — only on `power_status.updated` discard lines |
    | **AIPT** | Yes | Python logs it on `fetch prediction` step ([source][aipt-fetch]) | **No**on a different log line, not Processing/Processed |
    | **event-recorder** | **No** | Only stores `correlation_id`, `organization_id` as metadata ([source][er-metadata]) | N/A |
    | Service | `intel_request_id` in VictoriaLogs? | Source | On `intel_requests.created` lines? | Proof |
    |---|---|---|---|---|
    | **ASU** | Yes | Vector extracts from `(Intel Request {uuid})` in `_msg` ([VRL][vrl-ir], [Python][asu-discard]) | **No** — only on `power_status.updated` discard lines | [search returns 0 results][vlogs-asu-ir-field], [free-text finds only discard line][vlogs-asu-ir-any] |
    | **AIPT** | Yes | Python logs it on `fetch prediction` step ([source][aipt-fetch]) | **No** — different log line | [finds `fetch prediction`, not Processing/Processed][vlogs-aipt-ir] |
    | **event-recorder** | **No** | Only stores `correlation_id`, `organization_id` as metadata ([source][er-metadata]) | N/A ||

    The shared amqp-consumer library ([source][lib-log]) logs `correlation_id`, `request_id`, and `batch_id` on every Processing/Processed line. It never logs `intel_request_id`. These context fields are set at [L632-642][lib-context].

    @@ -157,3 +159,8 @@ ASU loses ~4% of `intel_requests.created` messages consistently. event-recorder
    [vlogs-aipt]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-asu-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-msgamp]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22message-amplifier%22%29%20AND%20%22c2a627b3%22&start=1741614600000&end=1741615800000
    [vlogs-er-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22event-recorder%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-aipt-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-field]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
    [vlogs-asu-ir-any]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20%2256a219a6%22&start=1741608000000&end=1741636800000
    [vlogs-aipt-ir]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20intel_request_id%3Aexact%28%2256a219a6-7597-42c7-85ea-1636cdfa8e53%22%29&start=1741608000000&end=1741636800000
  11. jmealo revised this gist Mar 16, 2026. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -37,7 +37,7 @@ Counted `intel_requests.created` processing logs per batch per consumer. Each me
    > **Verify**: [ASU batch c2a627b3][vlogs-asu] (expect 336 lines) vs [event-recorder][vlogs-er] (expect 348 lines) vs [AIPT][vlogs-aipt] (expect 348 lines) — or [ASU all 6 batches][vlogs-asu-all] (expect 776 lines)
    <details>
    <summary>Raw VictoriaLogs stats output</summary>
    <summary><strong>Evidence:</strong> Raw VictoriaLogs stats query + output</summary>

    Query (run per service):
    ```
    @@ -71,7 +71,7 @@ intel_request_id → metadata->>'correlation_id' (per-location, logged as CID)
    The `batch_id` is set to `self.correlation_id` in intel-requests-api ([source][batch-id-code]). The mapping query was run on `production-app-db-2-restore` partition `intel_requests_p20260301`.

    <details>
    <summary>DB query and results</summary>
    <summary><strong>Evidence:</strong> DB query and ID mapping results</summary>

    ```sql
    SELECT intel_request_id::text,
  12. jmealo revised this gist Mar 16, 2026. 1 changed file with 5 additions and 3 deletions.
    8 changes: 5 additions & 3 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -52,9 +52,11 @@ _time:[2026-03-10T13:00:00Z, 2026-03-10T20:00:00Z]
    | stats by (batch_id) count(*) as log_lines
    ```

    - **event-recorder**: c2a627b3=348, 215181db=44, 812f79b7=136, d334c245=128, 2186cea6=44, 7438ef94=108 (total 808 ÷ 2 = 404)
    - **AIPT**: c2a627b3=348, 215181db=44, 812f79b7=136, d334c245=128, 2186cea6=44, 7438ef94=108 (total 808 ÷ 2 = 404)
    - **ASU**: c2a627b3=336, 215181db=42, 812f79b7=132, d334c245=124, 2186cea6=40, 7438ef94=102 (total 776 ÷ 2 = 388)
    | service | `c2a627b3` | `215181db` | `812f79b7` | `d334c245` | `2186cea6` | `7438ef94` | **total** | **÷2** |
    |---|---|---|---|---|---|---|---|---|
    | **event-recorder** | 348 | 44 | 136 | 128 | 44 | 108 | 808 | **404** |
    | **AIPT** | 348 | 44 | 136 | 128 | 44 | 108 | 808 | **404** |
    | **ASU** | 336 | 42 | 132 | 124 | 40 | 102 | 776 | **388** |
    </details>

    ### 2. The previous RCA searched by the wrong ID
  13. jmealo revised this gist Mar 16, 2026. 1 changed file with 7 additions and 4 deletions.
    11 changes: 7 additions & 4 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -1,9 +1,12 @@
    # RCA: Missing intel_requests.created (ENG-3349) — Corrected

    **Status**: PROVISIONAL — ASU-specific loss confirmed, root cause unknown
    **Supersedes**: [Previous RCA](https://gist.github.com/jmealo/d81250648d67089111676b13de6120e7)
    **Incident date**: 2026-03-10 | **Analysis date**: 2026-03-16
    **Reverted**: [intel-requests-api !48](https://gitlab.com/gisual/backend/apis/intel-requests-api/-/merge_requests/48) (batch_id change from !47)
    | | |
    |---|---|
    | **Status** | PROVISIONAL — ASU-specific loss confirmed, root cause unknown |
    | **Supersedes** | [Previous RCA](https://gist.github.com/jmealo/d81250648d67089111676b13de6120e7) |
    | **Incident date** | 2026-03-10 |
    | **Analysis date** | 2026-03-16 |
    | **Reverted** | [intel-requests-api !48](https://gitlab.com/gisual/backend/apis/intel-requests-api/-/merge_requests/48) (batch_id change from !47) |

    ---

  14. jmealo revised this gist Mar 16, 2026. 1 changed file with 104 additions and 107 deletions.
    211 changes: 104 additions & 107 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -1,54 +1,27 @@
    # RCA: Missing intel_requests.created — Corrected Analysis (ENG-3349)
    # RCA: Missing intel_requests.created (ENG-3349) — Corrected

    **Status**: PROVISIONAL — ASU-specific loss confirmed, root cause unknown
    **Date**: 2026-03-10 incident, 2026-03-16 re-analysis
    **JIRA**: [ENG-3349](https://gisual.atlassian.net/browse/ENG-3349)
    **Previous RCA**: [Gist d81250648d](https://gist.github.com/jmealo/d81250648d67089111676b13de6120e7) (superseded by this document)
    **Supersedes**: [Previous RCA](https://gist.github.com/jmealo/d81250648d67089111676b13de6120e7)
    **Incident date**: 2026-03-10 | **Analysis date**: 2026-03-16
    **Reverted**: [intel-requests-api !48](https://gitlab.com/gisual/backend/apis/intel-requests-api/-/merge_requests/48) (batch_id change from !47)

    ---

    ## Summary
    ## TL;DR

    2,798 of 300,839 intel requests on March 10 never had facts created (~0.93%). Re-analysis of 6 sample batches (404 IRs) using correct identifiers shows:
    2,798/300,839 intel requests on March 10 never had facts created (~0.93%).

    - **event-recorder**: 404/404 — **0% loss**[^stats-er]
    - **AIPT**: 404/404 — **0% loss**[^stats-aipt]
    - **ASU**: 388/404 — **3.96% loss**, consistent across every batch[^stats-asu]
    The previous RCA said all consumers were losing messages due to a shutdown race in amqp-consumer. **Wrong.** Only ASU is losing them. event-recorder and AIPT received 100%. The previous analysis searched logs by `intel_request_id` — a field that doesn't exist on the log lines that prove delivery.

    **The loss is ASU-specific.** The previous RCA incorrectly attributed loss to all consumers due to a search methodology error[^prev-error].

    ## What the Previous RCA Got Wrong

    | Claim | Correction |
    |---|---|
    | "event-recorder missed 8,146" | 0% loss — received every message in all sampled batches[^stats-er] |
    | "AIPT received but ASU didn't" | AIPT also received 100%[^stats-aipt] |
    | "Independent consumer-side shutdown race" | Only ASU loses messages — shared library bug would affect all equally |
    | MR 47: separate batch_id fixes lock contention | Broke traceability, didn't address ASU loss — reverted[^mr48] |

    The error: consumer logs index messages by `correlation_id` and `batch_id`, not `intel_request_id`. The original searched by `intel_request_id`, found nothing, and concluded messages were lost. In reality, the messages were there under different identifiers[^id-mapping].

    ## Tracing Methodology

    ### ID Mapping

    Cody provided `intel_request_id` values. Consumer logs don't contain this field on processing lines[^field-proof]. The mapping is:

    ```
    intel_request_id → DB metadata column[^db-query]
    └→ correlation_id (unique per location, logged as CID)
    └→ batch_id (shared per API request, = original correlation_id)[^batch-id-code]
    ```

    10 sample IRs mapped to 6 distinct batches, all org `9de5d801`[^db-query].
    ---

    ### Counting Method
    ## Findings

    Each processed message produces exactly 2 log lines (`Processing...` + `Processed...`) from the shared amqp-consumer library[^lib-logging]. Counting `message_type:exact("intel_requests.created")` lines per `batch_id` per service, divided by 2, gives IRs processed[^stats-query].
    ### 1. Only ASU drops messages

    ## Results
    Counted `intel_requests.created` processing logs per batch per consumer. Each message = 2 log lines (Processing + Processed) from the shared amqp-consumer library ([source][lib-log]).

    | batch_id | event-recorder | AIPT | ASU | ASU missing |
    | batch_id | event-recorder | AIPT | ASU | ASU lost |
    |---|---|---|---|---|
    | `c2a627b3` | 174 | 174 | 168 | 6 (3.4%) |
    | `215181db` | 22 | 22 | 21 | 1 (4.5%) |
    @@ -58,100 +31,124 @@ Each processed message produces exactly 2 log lines (`Processing...` + `Processe
    | `7438ef94` | 54 | 54 | 51 | 3 (5.6%) |
    | **Total** | **404** | **404** | **388** | **16 (3.96%)** |

    ## Why `intel_request_id` Appears as a Structured Field (But Misleads)

    Vector's VRL pipeline[^vrl-extract-ids] extracts IDs from unstructured log text into structured VictoriaLogs fields:

    | Log pattern | Extracted field | VRL source |
    |---|---|---|
    | `(Intel Request {uuid})` | `intel_request_id` | [^vrl-ir] |
    | `(CID {uuid})` | `correlation_id` | [^vrl-cid] |
    | `(Batch ID {uuid})` | `batch_id` | [^vrl-batch] |

    ASU logs `(Intel Request {id}): Received power_status.updated...`[^asu-discard-code] on discard lines — Vector extracts this into `intel_request_id`. But the generic `Processing/Processed message type: intel_requests.created` lines[^lib-logging] don't contain that pattern, so **`intel_request_id` is absent on the lines that prove message delivery**.

    This creates a trap: searching `intel_request_id:exact("56a219a6...")` finds ASU's `power_status.updated` discard (hours later, different batch) but misses the `intel_requests.created` processing entirely.

    Event-recorder never logs individual IR IDs at all[^er-metadata] — it only stores `event_type`, `correlation_id`, and `organization_id` as structured metadata.

    ## What We Don't Know
    > **Verify**: [ASU batch c2a627b3][vlogs-asu] (expect 336 lines) vs [event-recorder][vlogs-er] (expect 348 lines) vs [AIPT][vlogs-aipt] (expect 348 lines) — or [ASU all 6 batches][vlogs-asu-all] (expect 776 lines)
    The root cause of ASU's ~4% loss is unidentified. Theories (all unverified):

    1. **Silent discard in ASU handler** — an early return path that doesn't log[^asu-discard-code]
    2. **Backpressure** — ASU processing time (0.12-0.15s/msg) is 3x slower than event-recorder (0.05s/msg)
    3. **Queue configuration** — different prefetch/QoS or replica count

    ## Next Steps

    1. **Deploy `_on_message` logging (MR 17[^mr17]) on ASU** — determines whether messages reach ASU's callback at all
    2. **Audit ASU handler for silent drops** — check all return paths in `intel_requests.created` processing
    3. **Compare ASU queue config** to event-recorder and AIPT (prefetch, replicas, resource limits)

    ### Verification Query (post MR 17 deployment)
    <details>
    <summary>Raw VictoriaLogs stats output</summary>

    Query (run per service):
    ```
    service_name:exact("asset-status-updater") AND "Received message" AND "intel_requests.created"
    service_name:exact("{service}") AND message_type:exact("intel_requests.created")
    AND (batch_id:exact("c2a627b3-3f7c-4c89-808c-4d79b19c69cb")
    OR batch_id:exact("215181db-45e2-474f-9c05-839eecf61cc5")
    OR batch_id:exact("812f79b7-c50d-4fd2-ac28-cc5d0318ee23")
    OR batch_id:exact("d334c245-898b-4caa-b96c-954b8be31414")
    OR batch_id:exact("2186cea6-3ae5-40c1-805e-878cc123ca5c")
    OR batch_id:exact("7438ef94-89a7-48ae-83d2-71eb8a138f52"))
    _time:[2026-03-10T13:00:00Z, 2026-03-10T20:00:00Z]
    | stats by (batch_id) count(*) as log_lines
    ```

    Compare count to event-recorder's count for the same time window. If equal → loss is post-delivery (handler bug). If lower → loss is pre-delivery (RabbitMQ/queue issue).

    ## Actions Taken

    - **Reverted MR 47**[^mr48] — restored `correlation_id` as `batch_id` (traceability fix)
    - MR 17[^mr17] still open — diagnostic logging, pending Diego's review of shutdown protection

    ---
    - **event-recorder**: c2a627b3=348, 215181db=44, 812f79b7=136, d334c245=128, 2186cea6=44, 7438ef94=108 (total 808 ÷ 2 = 404)
    - **AIPT**: c2a627b3=348, 215181db=44, 812f79b7=136, d334c245=128, 2186cea6=44, 7438ef94=108 (total 808 ÷ 2 = 404)
    - **ASU**: c2a627b3=336, 215181db=42, 812f79b7=132, d334c245=124, 2186cea6=40, 7438ef94=102 (total 776 ÷ 2 = 388)
    </details>

    ## Grounding
    ### 2. The previous RCA searched by the wrong ID

    ### Database
    Cody provided `intel_request_id` values. These don't appear on message-processing log lines — those lines only have `correlation_id` (CID) and `batch_id`. To trace, you must map through the DB:

    [^db-query]: `SELECT intel_request_id::text, (metadata->>'correlation_id')::text, (metadata->>'batch_id')::text FROM intel_requests.intel_requests_p20260301 WHERE intel_request_id IN (...)` on `production-app-db-2-restore`. 10 IRs → 6 batches, all org `9de5d801`.

    [^id-mapping]: `intel_request_id` is stored in the DB row. `correlation_id` and `batch_id` are in `metadata` JSONB. Consumer logs only contain the latter two on processing lines.

    ### VictoriaLogs Queries
    ```
    intel_request_id → metadata->>'correlation_id' (per-location, logged as CID)
    → metadata->>'batch_id' (per-request, = original correlation_id)
    ```

    [^stats-query]: `victorialogs_stats_query` with `service_name:exact("{service}") AND message_type:exact("intel_requests.created") AND (batch_id:exact("c2a627b3-...") OR ...) _time:[2026-03-10T13:00:00Z, 2026-03-10T20:00:00Z] | stats by (batch_id) count(*) as log_lines` run per service at `time=2026-03-10T20:00:00Z`.
    The `batch_id` is set to `self.correlation_id` in intel-requests-api ([source][batch-id-code]). The mapping query was run on `production-app-db-2-restore` partition `intel_requests_p20260301`.

    [^stats-er]: event-recorder raw: `c2a627b3`=348, `215181db`=44, `812f79b7`=136, `d334c245`=128, `2186cea6`=44, `7438ef94`=108. Total=808 lines ÷2 = 404 IRs.
    <details>
    <summary>DB query and results</summary>

    [^stats-aipt]: AIPT raw: `c2a627b3`=348, `215181db`=44, `812f79b7`=136, `d334c245`=128, `2186cea6`=44, `7438ef94`=108. Total=808 lines ÷2 = 404 IRs.
    ```sql
    SELECT intel_request_id::text,
    (metadata->>'correlation_id')::text as correlation_id,
    (metadata->>'batch_id')::text as batch_id,
    created::text
    FROM intel_requests.intel_requests_p20260301
    WHERE intel_request_id IN ('56a219a6-...', 'bc2d6cef-...', ...)
    ```

    [^stats-asu]: ASU raw: `c2a627b3`=336, `215181db`=42, `812f79b7`=132, `d334c245`=124, `2186cea6`=40, `7438ef94`=102. Total=776 lines ÷2 = 388 IRs.
    | intel_request_id | correlation_id | batch_id | created (UTC) |
    |---|---|---|---|
    | `56a219a6` | `11c84002` | `c2a627b3` | 13:55:46 |
    | `bc2d6cef` | `e12e237a` | `c2a627b3` | 13:55:46 |
    | `eac57da4` | `cec14379` | `215181db` | 13:55:50 |
    | `27b6e33c` | `e64cd0a6` | `215181db` | 13:55:50 |
    | `57007346` | `e5234837` | `812f79b7` | 15:18:44 |
    | `819448ae` | `59355b93` | `812f79b7` | 15:18:44 |
    | `55941d68` | `ab310e6b` | `812f79b7` | 15:18:44 |
    | `9f3454be` | `af861b31` | `d334c245` | 15:19:14 |
    | `2cfbd887` | `a578785d` | `2186cea6` | 15:20:27 |
    | `90125c20` | `602b7ea5` | `7438ef94` | 15:20:27 |
    </details>

    [^field-proof]: `victorialogs_field_names` for `message_type:exact("intel_requests.created")` per service, 2026-03-10T13:00-14:00Z. ASU: `intel_request_id` present (27,478 hits on co-stored lines, zero hits when filtered to `intel_requests.created` + specific IR ID). event-recorder: field absent entirely.
    ### 3. Logging is inconsistent across services — and Vector adds a layer

    ### Grafana Verification Links
    Why searching by `intel_request_id` is misleading:

    - [ASU batch c2a627b3](http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=2026-03-10T13%3A50%3A00Z&to=2026-03-10T14%3A10%3A00Z&expr=service_name%3Aexact%28%22asset-status-updater%22%29+AND+message_type%3Aexact%28%22intel_requests.created%22%29+AND+batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29) — expect 336 lines
    - [event-recorder batch c2a627b3](http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=2026-03-10T13%3A50%3A00Z&to=2026-03-10T14%3A10%3A00Z&expr=service_name%3Aexact%28%22event-recorder%22%29+AND+message_type%3Aexact%28%22intel_requests.created%22%29+AND+batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29) — expect 348 lines
    | Service | `intel_request_id` in VictoriaLogs? | Source | Appears on `intel_requests.created` lines? |
    |---|---|---|---|
    | **ASU** | Yes | Vector extracts from `(Intel Request {uuid})` in `_msg` ([VRL][vrl-ir], [Python][asu-discard]) | **No** — only on `power_status.updated` discard lines |
    | **AIPT** | Yes | Python logs it on `fetch prediction` step ([source][aipt-fetch]) | **No** — on a different log line, not Processing/Processed |
    | **event-recorder** | **No** | Only stores `correlation_id`, `organization_id` as metadata ([source][er-metadata]) | N/A |

    ### Source Code
    The shared amqp-consumer library ([source][lib-log]) logs `correlation_id`, `request_id`, and `batch_id` on every Processing/Processed line. It never logs `intel_request_id`. These context fields are set at [L632-642][lib-context].

    [^lib-logging]: amqp-consumer generic logging — [`gisual-amqp-consumer/app.py:657-664`](https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L657-664) (Processing) and [`L717-725`](https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L717-725) (Processed). Logs `correlation_id`, `request_id`, `batch_id` — never `intel_request_id`. Logger context fields set at [`L632-642`](https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L632-642).
    Vector's VRL pipeline ([`07-extract-ids.vrl`][vrl-file]) also extracts IDs from unstructured `_msg` text — `(CID {uuid})` `correlation_id` ([VRL][vrl-cid]), `(Batch ID {uuid})``batch_id` ([VRL][vrl-batch]). All patterns use `if !exists(...)` guards so they don't conflict with structured JSON fields.

    [^batch-id-code]: `batch_id = self.correlation_id` in intel-requests-api — [`intel_requests.py:1079`](https://gitlab.com/gisual/backend/apis/intel-requests-api/-/blob/f8d1813/intel_requests_api/intel_requests.py#L1079) (post-revert).
    **Impact**: The structured fields in VictoriaLogs are correct for our batch-counting methodology. But searching by `intel_request_id` will find the wrong log lines (discard messages, not delivery proof).

    [^asu-discard-code]: ASU `power_on` discard logging — [`processor.py:248-252`](https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/d70ce19a11b0cf28f51b65c26584e5b98bb6fea9/asset_status_updater/processor.py#L248-252). Logs `intel_request_id` in `_msg` text as `(Intel Request %s)`.
    ### 4. What the previous RCA got right

    [^er-metadata]: event-recorder Loki structured metadata — [`app.py:372-387`](https://gitlab.com/gisual/backend/consumers/event-recorder/-/blob/d8e53d560ef911d55a1029f95853ded12b47beed/event_recorder/app.py#L372-387). Only `event_type`, `correlation_id`, `organization_id`, `organization_short_name`.
    - Lock contention between `incidents.received` and `intel_requests.created` in message-amplifier IS real (confirmed in [message-amplifier logs for batch c2a627b3][vlogs-msgamp])
    - The `_on_message` diagnostic logging in [MR 17][mr17] is still the right next diagnostic step
    - Consumer-side loss IS happening — just only in ASU, not everywhere

    ### Vector VRL (Log Field Extraction)
    ---

    [^vrl-extract-ids]: Vector VRL ID extraction pipeline — [`vector/vrl/07-extract-ids.vrl`](https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl). All patterns use `if !exists(...)` guards (only extract from text when field absent from structured JSON).
    ## Root Cause: Unknown

    [^vrl-ir]: `intel_request_id` from `(Intel Request {uuid})`[`07-extract-ids.vrl:198-203`](https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L198-203).
    ASU loses ~4% of `intel_requests.created` messages consistently. event-recorder and AIPT (same library, same exchange) get 100%. Unverified theories:

    [^vrl-cid]: `correlation_id` from `(CID {uuid})`[`07-extract-ids.vrl:205-211`](https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L205-211).
    1. **Silent drop in ASU handler** — early return without logging
    2. **Backpressure** — ASU is 3x slower per message (0.12-0.15s vs 0.05s)
    3. **Queue config** — different prefetch, replicas, or resource limits

    [^vrl-batch]: `batch_id` from `(Batch ID {uuid})`[`07-extract-ids.vrl:213-219`](https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L213-219).
    ## Next Steps

    ### MRs
    1. **Deploy [MR 17][mr17] on ASU** with `LOG_ON_MESSAGE=true` — proves whether messages reach ASU's callback
    2. **Audit ASU handler** for silent return paths on `intel_requests.created`
    3. **Compare ASU queue config** to event-recorder (prefetch, replicas, limits)

    [^mr48]: Revert MR [intel-requests-api !48](https://gitlab.com/gisual/backend/apis/intel-requests-api/-/merge_requests/48) (merged 2026-03-16). Restores `self.correlation_id` as `batch_id`.
    **Verification after MR 17**: compare `"Received message"` count on ASU vs event-recorder for same time window. Equal = handler bug. Lower = queue/delivery issue.

    [^mr17]: Diagnostic logging MR — [gisual-amqp-consumer !17](https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/merge_requests/17) (open, pending review).
    ---

    [^prev-error]: Original RCA gist did not preserve exact queries used. The gist states findings like "No log for IR 56a219a6" without query syntax. Based on field analysis, the search likely used `intel_request_id` or free-text IR UUID, which doesn't match `intel_requests.created` processing lines.
    <!-- Reference-style links — kept at bottom, GitHub renders them as clickable -->

    [lib-log]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L657-725
    [lib-context]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L632-642
    [batch-id-code]: https://gitlab.com/gisual/backend/apis/intel-requests-api/-/blob/f8d1813/intel_requests_api/intel_requests.py#L1079
    [asu-discard]: https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/d70ce19a11b0cf28f51b65c26584e5b98bb6fea9/asset_status_updater/processor.py#L248-252
    [aipt-fetch]: https://gitlab.com/gisual/backend/consumers/ai-prediction-tester/-/blob/1ed1fc570cb3904bc79373caf5f5b1008c93621e/ai_prediction_tester/app.py#L219-224
    [er-metadata]: https://gitlab.com/gisual/backend/consumers/event-recorder/-/blob/d8e53d560ef911d55a1029f95853ded12b47beed/event_recorder/app.py#L372-387
    [vrl-file]: https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl
    [vrl-ir]: https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L198-203
    [vrl-cid]: https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L205-211
    [vrl-batch]: https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L213-219
    [mr17]: https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/merge_requests/17
    [mr48]: https://gitlab.com/gisual/backend/apis/intel-requests-api/-/merge_requests/48
    [vlogs-asu]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-er]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22event-recorder%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-aipt]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22ai-prediction-tester%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29&start=1741614600000&end=1741615800000
    [vlogs-asu-all]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22asset-status-updater%22%29%20AND%20message_type%3Aexact%28%22intel_requests.created%22%29%20AND%20%28batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29%20OR%20batch_id%3Aexact%28%22215181db-45e2-474f-9c05-839eecf61cc5%22%29%20OR%20batch_id%3Aexact%28%22812f79b7-c50d-4fd2-ac28-cc5d0318ee23%22%29%20OR%20batch_id%3Aexact%28%22d334c245-898b-4caa-b96c-954b8be31414%22%29%20OR%20batch_id%3Aexact%28%222186cea6-3ae5-40c1-805e-878cc123ca5c%22%29%20OR%20batch_id%3Aexact%28%227438ef94-89a7-48ae-83d2-71eb8a138f52%22%29%29&start=1741608000000&end=1741636800000
    [vlogs-msgamp]: http://vlogs-production.monitoring.svc.cluster.local/select/vmui/#/logsql?query=service_name%3Aexact%28%22message-amplifier%22%29%20AND%20%22c2a627b3%22&start=1741614600000&end=1741615800000
  15. jmealo created this gist Mar 16, 2026.
    157 changes: 157 additions & 0 deletions 2026-03-16-rca-missing-intel-requests-created.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,157 @@
    # RCA: Missing intel_requests.created — Corrected Analysis (ENG-3349)

    **Status**: PROVISIONAL — ASU-specific loss confirmed, root cause unknown
    **Date**: 2026-03-10 incident, 2026-03-16 re-analysis
    **JIRA**: [ENG-3349](https://gisual.atlassian.net/browse/ENG-3349)
    **Previous RCA**: [Gist d81250648d](https://gist.github.com/jmealo/d81250648d67089111676b13de6120e7) (superseded by this document)

    ---

    ## Summary

    2,798 of 300,839 intel requests on March 10 never had facts created (~0.93%). Re-analysis of 6 sample batches (404 IRs) using correct identifiers shows:

    - **event-recorder**: 404/404 — **0% loss**[^stats-er]
    - **AIPT**: 404/404 — **0% loss**[^stats-aipt]
    - **ASU**: 388/404 — **3.96% loss**, consistent across every batch[^stats-asu]

    **The loss is ASU-specific.** The previous RCA incorrectly attributed loss to all consumers due to a search methodology error[^prev-error].

    ## What the Previous RCA Got Wrong

    | Claim | Correction |
    |---|---|
    | "event-recorder missed 8,146" | 0% loss — received every message in all sampled batches[^stats-er] |
    | "AIPT received but ASU didn't" | AIPT also received 100%[^stats-aipt] |
    | "Independent consumer-side shutdown race" | Only ASU loses messages — shared library bug would affect all equally |
    | MR 47: separate batch_id fixes lock contention | Broke traceability, didn't address ASU loss — reverted[^mr48] |

    The error: consumer logs index messages by `correlation_id` and `batch_id`, not `intel_request_id`. The original searched by `intel_request_id`, found nothing, and concluded messages were lost. In reality, the messages were there under different identifiers[^id-mapping].

    ## Tracing Methodology

    ### ID Mapping

    Cody provided `intel_request_id` values. Consumer logs don't contain this field on processing lines[^field-proof]. The mapping is:

    ```
    intel_request_id → DB metadata column[^db-query]
    └→ correlation_id (unique per location, logged as CID)
    └→ batch_id (shared per API request, = original correlation_id)[^batch-id-code]
    ```

    10 sample IRs mapped to 6 distinct batches, all org `9de5d801`[^db-query].

    ### Counting Method

    Each processed message produces exactly 2 log lines (`Processing...` + `Processed...`) from the shared amqp-consumer library[^lib-logging]. Counting `message_type:exact("intel_requests.created")` lines per `batch_id` per service, divided by 2, gives IRs processed[^stats-query].

    ## Results

    | batch_id | event-recorder | AIPT | ASU | ASU missing |
    |---|---|---|---|---|
    | `c2a627b3` | 174 | 174 | 168 | 6 (3.4%) |
    | `215181db` | 22 | 22 | 21 | 1 (4.5%) |
    | `812f79b7` | 68 | 68 | 66 | 2 (2.9%) |
    | `d334c245` | 64 | 64 | 62 | 2 (3.1%) |
    | `2186cea6` | 22 | 22 | 20 | 2 (9.1%) |
    | `7438ef94` | 54 | 54 | 51 | 3 (5.6%) |
    | **Total** | **404** | **404** | **388** | **16 (3.96%)** |

    ## Why `intel_request_id` Appears as a Structured Field (But Misleads)

    Vector's VRL pipeline[^vrl-extract-ids] extracts IDs from unstructured log text into structured VictoriaLogs fields:

    | Log pattern | Extracted field | VRL source |
    |---|---|---|
    | `(Intel Request {uuid})` | `intel_request_id` | [^vrl-ir] |
    | `(CID {uuid})` | `correlation_id` | [^vrl-cid] |
    | `(Batch ID {uuid})` | `batch_id` | [^vrl-batch] |

    ASU logs `(Intel Request {id}): Received power_status.updated...`[^asu-discard-code] on discard lines — Vector extracts this into `intel_request_id`. But the generic `Processing/Processed message type: intel_requests.created` lines[^lib-logging] don't contain that pattern, so **`intel_request_id` is absent on the lines that prove message delivery**.

    This creates a trap: searching `intel_request_id:exact("56a219a6...")` finds ASU's `power_status.updated` discard (hours later, different batch) but misses the `intel_requests.created` processing entirely.

    Event-recorder never logs individual IR IDs at all[^er-metadata] — it only stores `event_type`, `correlation_id`, and `organization_id` as structured metadata.

    ## What We Don't Know

    The root cause of ASU's ~4% loss is unidentified. Theories (all unverified):

    1. **Silent discard in ASU handler** — an early return path that doesn't log[^asu-discard-code]
    2. **Backpressure** — ASU processing time (0.12-0.15s/msg) is 3x slower than event-recorder (0.05s/msg)
    3. **Queue configuration** — different prefetch/QoS or replica count

    ## Next Steps

    1. **Deploy `_on_message` logging (MR 17[^mr17]) on ASU** — determines whether messages reach ASU's callback at all
    2. **Audit ASU handler for silent drops** — check all return paths in `intel_requests.created` processing
    3. **Compare ASU queue config** to event-recorder and AIPT (prefetch, replicas, resource limits)

    ### Verification Query (post MR 17 deployment)

    ```
    service_name:exact("asset-status-updater") AND "Received message" AND "intel_requests.created"
    ```

    Compare count to event-recorder's count for the same time window. If equal → loss is post-delivery (handler bug). If lower → loss is pre-delivery (RabbitMQ/queue issue).

    ## Actions Taken

    - **Reverted MR 47**[^mr48] — restored `correlation_id` as `batch_id` (traceability fix)
    - MR 17[^mr17] still open — diagnostic logging, pending Diego's review of shutdown protection

    ---

    ## Grounding

    ### Database

    [^db-query]: `SELECT intel_request_id::text, (metadata->>'correlation_id')::text, (metadata->>'batch_id')::text FROM intel_requests.intel_requests_p20260301 WHERE intel_request_id IN (...)` on `production-app-db-2-restore`. 10 IRs → 6 batches, all org `9de5d801`.

    [^id-mapping]: `intel_request_id` is stored in the DB row. `correlation_id` and `batch_id` are in `metadata` JSONB. Consumer logs only contain the latter two on processing lines.

    ### VictoriaLogs Queries

    [^stats-query]: `victorialogs_stats_query` with `service_name:exact("{service}") AND message_type:exact("intel_requests.created") AND (batch_id:exact("c2a627b3-...") OR ...) _time:[2026-03-10T13:00:00Z, 2026-03-10T20:00:00Z] | stats by (batch_id) count(*) as log_lines` — run per service at `time=2026-03-10T20:00:00Z`.

    [^stats-er]: event-recorder raw: `c2a627b3`=348, `215181db`=44, `812f79b7`=136, `d334c245`=128, `2186cea6`=44, `7438ef94`=108. Total=808 lines ÷2 = 404 IRs.

    [^stats-aipt]: AIPT raw: `c2a627b3`=348, `215181db`=44, `812f79b7`=136, `d334c245`=128, `2186cea6`=44, `7438ef94`=108. Total=808 lines ÷2 = 404 IRs.

    [^stats-asu]: ASU raw: `c2a627b3`=336, `215181db`=42, `812f79b7`=132, `d334c245`=124, `2186cea6`=40, `7438ef94`=102. Total=776 lines ÷2 = 388 IRs.

    [^field-proof]: `victorialogs_field_names` for `message_type:exact("intel_requests.created")` per service, 2026-03-10T13:00-14:00Z. ASU: `intel_request_id` present (27,478 hits on co-stored lines, zero hits when filtered to `intel_requests.created` + specific IR ID). event-recorder: field absent entirely.

    ### Grafana Verification Links

    - [ASU batch c2a627b3](http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=2026-03-10T13%3A50%3A00Z&to=2026-03-10T14%3A10%3A00Z&expr=service_name%3Aexact%28%22asset-status-updater%22%29+AND+message_type%3Aexact%28%22intel_requests.created%22%29+AND+batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29) — expect 336 lines
    - [event-recorder batch c2a627b3](http://grafana-production.monitoring.svc.cluster.local/explore?left=%7B%22datasource%22%3A%22PD775F2863313E6C7%22%7D&from=2026-03-10T13%3A50%3A00Z&to=2026-03-10T14%3A10%3A00Z&expr=service_name%3Aexact%28%22event-recorder%22%29+AND+message_type%3Aexact%28%22intel_requests.created%22%29+AND+batch_id%3Aexact%28%22c2a627b3-3f7c-4c89-808c-4d79b19c69cb%22%29) — expect 348 lines

    ### Source Code

    [^lib-logging]: amqp-consumer generic logging — [`gisual-amqp-consumer/app.py:657-664`](https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L657-664) (Processing) and [`L717-725`](https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L717-725) (Processed). Logs `correlation_id`, `request_id`, `batch_id` — never `intel_request_id`. Logger context fields set at [`L632-642`](https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/blob/f2b9dc354f360c7489a2e0aef3f044cc6e1b3f69/gisual_amqp_consumer/app.py#L632-642).

    [^batch-id-code]: `batch_id = self.correlation_id` in intel-requests-api — [`intel_requests.py:1079`](https://gitlab.com/gisual/backend/apis/intel-requests-api/-/blob/f8d1813/intel_requests_api/intel_requests.py#L1079) (post-revert).

    [^asu-discard-code]: ASU `power_on` discard logging — [`processor.py:248-252`](https://gitlab.com/gisual/backend/consumers/asset-status-updater/-/blob/d70ce19a11b0cf28f51b65c26584e5b98bb6fea9/asset_status_updater/processor.py#L248-252). Logs `intel_request_id` in `_msg` text as `(Intel Request %s)`.

    [^er-metadata]: event-recorder Loki structured metadata — [`app.py:372-387`](https://gitlab.com/gisual/backend/consumers/event-recorder/-/blob/d8e53d560ef911d55a1029f95853ded12b47beed/event_recorder/app.py#L372-387). Only `event_type`, `correlation_id`, `organization_id`, `organization_short_name`.

    ### Vector VRL (Log Field Extraction)

    [^vrl-extract-ids]: Vector VRL ID extraction pipeline — [`vector/vrl/07-extract-ids.vrl`](https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl). All patterns use `if !exists(...)` guards (only extract from text when field absent from structured JSON).

    [^vrl-ir]: `intel_request_id` from `(Intel Request {uuid})`[`07-extract-ids.vrl:198-203`](https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L198-203).

    [^vrl-cid]: `correlation_id` from `(CID {uuid})`[`07-extract-ids.vrl:205-211`](https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L205-211).

    [^vrl-batch]: `batch_id` from `(Batch ID {uuid})`[`07-extract-ids.vrl:213-219`](https://gitlab.com/gisual/configuration-management/kubernetes/-/blob/main/applications/infrastructure/charts/vector/vrl/07-extract-ids.vrl#L213-219).

    ### MRs

    [^mr48]: Revert MR — [intel-requests-api !48](https://gitlab.com/gisual/backend/apis/intel-requests-api/-/merge_requests/48) (merged 2026-03-16). Restores `self.correlation_id` as `batch_id`.

    [^mr17]: Diagnostic logging MR — [gisual-amqp-consumer !17](https://gitlab.com/gisual/backend/libraries/gisual-amqp-consumer/-/merge_requests/17) (open, pending review).

    [^prev-error]: Original RCA gist did not preserve exact queries used. The gist states findings like "No log for IR 56a219a6" without query syntax. Based on field analysis, the search likely used `intel_request_id` or free-text IR UUID, which doesn't match `intel_requests.created` processing lines.