@szymdzum
Last active November 26, 2025 22:00

Revisions

  1. szymdzum revised this gist Nov 17, 2025. 1 changed file with 0 additions and 1 deletion.
    1 change: 0 additions & 1 deletion glab.md
    Original file line number Diff line number Diff line change
    @@ -12,7 +12,6 @@ Use this skill when investigating GitLab CI/CD pipeline issues.
     - User reports pipeline failures (e.g., "Pipeline #2961721 failed")
     - Questions about job failures or CI/CD errors
     - Investigating UI test failures
    -- Checking Node 16 vs Node 20 pipeline differences
     - Analyzing job logs or error messages

     ## Quick Start: Finding Pipelines
  2. szymdzum renamed this gist Nov 17, 2025. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  3. szymdzum revised this gist Oct 26, 2025. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions SKILL.md
    Original file line number Diff line number Diff line change
    @@ -180,8 +180,8 @@ Load these files as needed for detailed information:
     ## Project Context

     - **Project ID**: 2558
    -- **GitLab Instance**: gitlab.kfplc.com
    -- **Repository**: next-gen/kf-ng-web
    +- **GitLab Instance**:
    +- **Repository**:
     - **Typical Pipeline**: 80+ jobs across 12 stages
     - **Common Child Pipelines**: UI Tests, Deploy

  4. szymdzum revised this gist Oct 26, 2025. 1 changed file with 0 additions and 1 deletion.
    1 change: 0 additions & 1 deletion gistfile1.txt
    Original file line number Diff line number Diff line change
    @@ -1 +0,0 @@
    Shows agent how to use glab, agent says ok and does `glab --help` already knows more than you.
  5. szymdzum created this gist Oct 26, 2025.
    194 changes: 194 additions & 0 deletions SKILL.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,194 @@
    ---
    name: Pipeline Investigation
    description: Debug GitLab CI/CD pipeline failures using glab CLI. Investigate failed jobs, analyze error logs, trace child pipelines, and compare Node version differences. Use for pipeline failures, job errors, build issues, or when the user mentions GitLab pipelines, CI/CD problems, specific pipeline IDs, failed builds, or job logs.
    ---

    # Investigating GitLab Pipelines

    Use this skill when investigating GitLab CI/CD pipeline issues.

    ## When to Use

    - User reports pipeline failures (e.g., "Pipeline #2961721 failed")
    - Questions about job failures or CI/CD errors
    - Investigating UI test failures
    - Checking Node 16 vs Node 20 pipeline differences
    - Analyzing job logs or error messages

    ## Quick Start: Finding Pipelines

    ### If user asks about "latest failed pipeline" for a branch:

    ```bash
    # Get current branch
    BRANCH=$(git branch --show-current)

    # Find latest failed pipeline for this branch
    glab api "projects/2558/pipelines?ref=$BRANCH&status=failed&per_page=3" | jq '.[] | {id, status, created_at}'

    # Or for specific branch
    glab api "projects/2558/pipelines?ref=feat/node20-migration&status=failed&per_page=3" | jq '.[0]'

    # For merge request pipelines, use the MR ref format:
    glab api "projects/2558/pipelines?ref=refs/merge-requests/<MR_ID>/head&per_page=3" | jq '.[] | {id, status, created_at}'

    # Find latest pipelines (any status) for current branch
    glab api "projects/2558/pipelines?ref=$BRANCH&per_page=5" | jq '.[] | {id, status, created_at}'
    ```

    ### If user provides pipeline ID directly:

    Start with step 1 below.

    ## Core Workflow

    ### 1. Get Pipeline Overview

    ```bash
    # Quick check if pipeline exists and get basic status
    glab api "projects/2558/pipelines/<PIPELINE_ID>" | jq -r '.status // "Pipeline not found"'

    # Get full pipeline status and metadata
    glab api "projects/2558/pipelines/<PIPELINE_ID>" | jq '{status, ref, created_at, duration, web_url}'

    # Verify pipeline has jobs (old pipelines may be cleaned up)
    glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | jq '. | length'
    # If this returns 0, the pipeline's job data is unavailable; try a more recent pipeline
    ```

    ### 2. List Failed Jobs

    **ALWAYS use `--paginate`** when listing jobs: pipelines have 80+ jobs, and an unpaginated request returns only the first 20.

    ```bash
    # Get ALL failed jobs
    glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | jq -r '.[] | select(.status == "failed") | "\(.name) - Job \(.id)"'
    ```

    ### 3. Get Job Logs

    ```bash
    # Get last 100 lines of job log (capture stderr with 2>&1)
    glab ci trace <JOB_ID> 2>&1 | tail -100

    # Search for errors
    glab ci trace <JOB_ID> 2>&1 | grep -E "error|Error|failed|FAIL"
    ```
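
A small wrapper can combine the two commands above so that only the matching lines, with their positions in the log, reach the context. The following is a minimal sketch, not part of the original skill: `summarize_log` is a hypothetical helper name, and the sample log lines are invented for illustration.

```shell
# Hypothetical helper: read a job log on stdin and print at most N matching
# lines, each prefixed with its line number in the log (default N = 20).
summarize_log() {
  local max_lines="${1:-20}"
  grep -nE 'error|Error|ERROR|failed|FAIL' | head -n "$max_lines"
}

# Invented sample log; prints "2:Error: build failed"
printf '%s\n' 'step 1 ok' 'Error: build failed' 'step 3 ok' | summarize_log 5
```

In practice the input would come from the trace command, for example `glab ci trace <JOB_ID> 2>&1 | summarize_log 20`.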

    ### 4. Check for Child Pipelines

    Jobs like UI Tests and Deploy trigger child pipelines. **Always check bridges**:

    ```bash
    # Find child pipelines
    glab api "projects/2558/pipelines/<PIPELINE_ID>/bridges" | jq '.[] | {name, status, child: .downstream_pipeline.id}'

    # If child pipeline exists, get its jobs
    glab api "projects/2558/pipelines/<CHILD_PIPELINE_ID>/jobs" --paginate | jq -r '.[] | "\(.name) | \(.status) | Job \(.id)"'
    ```

    ## Common Patterns

    ### Pattern: Child Pipeline Failures

    ```bash
    # Step 1: Find the failed child pipeline (take the first if several bridges failed)
    CHILD_ID=$(glab api "projects/2558/pipelines/<PIPELINE_ID>/bridges" | jq -r '.[] | select(.status == "failed") | .downstream_pipeline.id' | head -1)

    # Step 2: Get failed jobs from child pipeline
    glab api "projects/2558/pipelines/$CHILD_ID/jobs" --paginate | jq -r '.[] | select(.status == "failed") | "\(.name) - Job \(.id)"'

    # Step 3: Get one job's log (they're usually identical)
    glab ci trace <JOB_ID> 2>&1 | tail -100
    ```

    ### Pattern: Multiple Failed Jobs

    When many jobs fail (e.g., all Image builds), check ONE representative job first - they often have identical errors.

    ```bash
    # Get first failed job
    FIRST_FAILED=$(glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | jq -r '.[] | select(.status == "failed") | .id' | head -1)

    # Check its log
    glab ci trace "$FIRST_FAILED" 2>&1 | tail -100

    # If needed, check whether the error is identical across the failed jobs (first 3)
    glab api "projects/2558/pipelines/<PIPELINE_ID>/jobs" --paginate | \
      jq -r '.[] | select(.status == "failed") | .id' | head -3 | while read -r job_id; do
        echo "=== Job $job_id ==="
        glab ci trace "$job_id" 2>&1 | grep -E "ERROR|Error:|error:" | head -5
      done
    ```
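
When more than a handful of jobs fail, counting distinct error lines makes it easier to see whether they share one root cause. The following is a minimal sketch under stated assumptions: `group_errors` is a hypothetical helper, its tab-separated input format (`job_id`, then the first error line) is an illustrative convention rather than anything glab produces, and the sample records are invented.

```shell
# Hypothetical helper. Input: one "job_id<TAB>first error line" record per line.
# Output: each distinct error line with the number of jobs that hit it,
# most common first.
group_errors() {
  cut -f2- | sort | uniq -c | sort -rn
}

# Invented sample records: two jobs share one error, a third differs
printf '101\tError: manifest unknown\n102\tError: manifest unknown\n103\tError: timeout\n' | group_errors
```

A single dominant line in the output suggests that inspecting one representative job is enough.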

    ## Critical Best Practices

    1. **Always use `--paginate`** for job queries (pipelines have 80+ jobs)
    2. **Always capture stderr** with `2>&1` when getting logs
    3. **Always check for child pipelines** via bridges API
    4. **Limit log output** to avoid overwhelming context (use `tail -100` or `head -50`)
    5. **Use project ID 2558** explicitly (never rely on context)

    ## Common Pitfalls to Avoid

    - ❌ Forgetting `--paginate` (only gets first 20 jobs)
    - ❌ Not checking child pipelines (missing UI Test/Deploy jobs)
    - ❌ Confusing Pipeline IDs (~2M) with Job IDs (~20M+)
    - ❌ Missing stderr output (forgetting `2>&1`)
    - ❌ Dumping entire logs (use tail/head/grep)
    - ❌ Investigating old pipelines with no jobs (check job count first)
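
The pipeline-versus-job ID confusion can be caught before any API call with a rough magnitude check. This is a sketch only: `guess_id_kind` is a hypothetical helper, and the 10,000,000 cutoff is an assumption derived from the magnitudes quoted in the list above, not a rule GitLab enforces.

```shell
# Hypothetical helper: guess whether an ID is a pipeline ID or a job ID.
# ASSUMPTION: the 10,000,000 cutoff is inferred from the magnitudes quoted
# above (pipelines ~2M, jobs ~20M+); adjust it as the instance's IDs grow.
guess_id_kind() {
  case "$1" in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;
  esac
  if [ "$1" -ge 10000000 ]; then
    echo "job"
  else
    echo "pipeline"
  fi
}

guess_id_kind 2961721   # prints "pipeline"
```

A result that contradicts the intended endpoint (for example `job` when about to call `pipelines/<PIPELINE_ID>`) is a cue to double-check the ID.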

    ## Common Error Patterns

    When analyzing logs, look for these signatures:

    **Missing Docker Image:**
    ```
    manifest for <image> not found: manifest unknown
    ```
    → Base runner image not available in ECR (common during Node version transitions)

    **BundleMon Credentials:**
    ```
    bad project credentials
    {"message":"forbidden"}
    ```
    → BundleMon service access issue (doesn't fail the build, but shows in logs)

    **Build Timeout:**
    ```
    ERROR: Job failed: execution took longer than <time>
    ```
    → Checkout server builds can take 44+ minutes (known issue)

    **Test Failures:**
    ```
    FAIL <test-name>
    Expected: <value>
    Received: <value>
    ```
    → Unit test assertion failure (check test logs for specifics)
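
The signatures above can be checked mechanically. The following is a minimal sketch, not part of the original skill: `classify_log` and its category labels are hypothetical names, the sample log line uses an invented duration, and the BundleMon check runs last because that message does not fail the build.

```shell
# Hypothetical helper: read a job log on stdin and print a category label
# based on the error signatures listed above.
classify_log() {
  local log
  log="$(cat)"
  if printf '%s\n' "$log" | grep -q 'manifest unknown'; then
    echo "missing-docker-image"
  elif printf '%s\n' "$log" | grep -q 'execution took longer than'; then
    echo "build-timeout"
  elif printf '%s\n' "$log" | grep -qE '(^|[[:space:]])FAIL '; then
    echo "test-failure"
  elif printf '%s\n' "$log" | grep -q 'bad project credentials'; then
    # BundleMon noise: shows in logs but does not fail the build
    echo "bundlemon-credentials"
  else
    echo "unknown"
  fi
}

# Invented sample line; prints "build-timeout"
printf '%s\n' 'ERROR: Job failed: execution took longer than 1h0m0s' | classify_log
```

A real invocation would pipe the trace in, for example `glab ci trace <JOB_ID> 2>&1 | classify_log`.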

    ## Reference Files

    Load these files as needed for detailed information:

    - **`cli-reference.md`** - Complete glab command syntax, API patterns, jq examples, and advanced queries
    - **`pipeline-stages.md`** - Stage dependencies, timing, critical paths, and optimization strategies
    - **`job-catalog.md`** - Full job descriptions, configurations, durations, and dependencies (all 80+ jobs)

    ## Project Context

    - **Project ID**: 2558
    - **GitLab Instance**: gitlab.kfplc.com
    - **Repository**: next-gen/kf-ng-web
    - **Typical Pipeline**: 80+ jobs across 12 stages
    - **Common Child Pipelines**: UI Tests, Deploy

    ### Common Job Names in This Project:

    - **Install And Build**: Install, WebRunner, Wiremock, Reportportal_Setup
    - **Static Analysis**: ESlint, Typescript, Format, Stylelint
    - **Test**: UnitTests:Main, UnitTests:App, UnitTests:Checkout, UnitTests:Utils, UnitTests:Miscellaneous, Sonar
    - **Image**: Create:Image:{banner}:kits:bbm:app, Create:Image:{banner}:kits:checkout:server, Create:Image:{banner}:kits:pim
    - **Banners**: bquk, bqie, tpuk, cafr, capl
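
The `{banner}` placeholder in the Image job names expands once per banner. As a rough sketch, the full set of names can be generated for matching against a failed-job list; `expand_image_jobs` is a hypothetical helper, and the assumption that every banner builds all three images is not confirmed by the list above.

```shell
# Banner codes taken from the list above
banners="bquk bqie tpuk cafr capl"

# Hypothetical helper: print every Create:Image job name.
# ASSUMPTION: each banner builds all three images (bbm:app, checkout:server, pim).
expand_image_jobs() {
  for banner in $banners; do
    for kit in bbm:app checkout:server pim; do
      printf 'Create:Image:%s:kits:%s\n' "$banner" "$kit"
    done
  done
}

expand_image_jobs
```

The output can be compared with the failed job names from step 2 to see which banners or images are affected.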
    1 change: 1 addition & 0 deletions gistfile1.txt
    Original file line number Diff line number Diff line change
    @@ -0,0 +1 @@
    Shows agent how to use glab, agent says ok and does `glab --help` already knows more than you.