@lissahyacinth
Created March 2, 2026 14:28
Audit Skill for Claude Code
---
name: audit
description: Multi-phase codebase consistency audit. Scans for signals (TODOs, ignored tests, rot), builds shadow documentation of Claude's understanding per-file, then cross-references everything to surface inconsistencies and questions for the human.
argument-hint: "[scan|shadow|crossref|all|reset|fix|answer N]"
allowed-tools: Read, Grep, Glob, Bash, Agent, Write, Edit, AskUserQuestion
---

Codebase Consistency Audit

You maintain a set of audit documents in .claude/audit/ that represent YOUR understanding of this codebase. These documents are yours — the human can read them, but you own them. Never edit the user's source files during an audit.

Run the phase specified by $ARGUMENTS. Default to all if no argument given.

  • scan — Phase 1 only
  • shadow — Phase 2 only (runs Phase 1 first if signals.md doesn't exist)
  • crossref — Phase 3 only (runs Phase 2 first if shadow docs don't exist)
  • all — All three phases sequentially
  • answer — Present the top 1-3 unanswered questions interactively
  • answer N — Bring up question Q-{N} specifically
  • fix — Re-apply any unapplied auto-fixes from Tier 1
  • reset — Delete all files in .claude/audit/ except acknowledged.md (that's the human's), then recreate empty directories. Confirms to the user what was removed.

Directory structure

.claude/audit/
  signals.md              # Phase 1: Raw signals found in codebase
  shadow/                 # Phase 2: One doc per source file
    <flattened-path>.md   # e.g. src--main.rs.md, src--lib--config.rs.md
  domains/                # Domain sub-skills: understanding of external systems
    _index.md             # Maps detection patterns → domain files
    <domain-name>.md      # Per-domain understanding (DB schema, API contracts, etc.)
  findings.md             # Phase 3: Cross-reference results
  questions.md            # Phase 3: Questions needing human domain knowledge
  acknowledged.md         # Human-maintained — never modify this file

Create .claude/audit/ and subdirectories if they don't exist.
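A minimal bootstrap for the tree above (paths exactly as listed; `-p` makes it safe to repeat):

```shell
# Create the audit directory tree; idempotent thanks to -p.
mkdir -p .claude/audit/shadow .claude/audit/domains
```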


Phase 1: Scan

Fast parallel sweep for signals — markers in the code that indicate known debt, deferred work, or potential rot. This phase uses Grep extensively and should be quick.

Signal categories to scan for

Deferred work:

  • TODO, FIXME, HACK, XXX, WORKAROUND, TEMPORARY, TECH_DEBT
  • Language-specific: #[ignore], #[allow(...)] (Rust), @Ignore/@Disabled (Java/Kotlin), .skip/xit/xdescribe (JS), @pytest.mark.skip (Python), #if false/#if 0 (C/C++)
  • Commented-out code blocks (3+ consecutive commented lines that look like code, not prose)
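The deferred-work sweep can be a single Grep pass. A sketch — the marker list is abbreviated, and the sample file exists only to make the snippet self-contained (in a real scan the path would be the project root):

```shell
# Sample input standing in for a real source tree.
mkdir -p demo-src
cat > demo-src/foo.rs <<'EOF'
// TODO: handle timeout case
#[ignore]
fn slow_test() {}
EOF

# One recursive pass; -n yields the line numbers the signals table needs.
grep -rnE 'TODO|FIXME|HACK|XXX|WORKAROUND|#\[ignore\]' demo-src
```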

Staleness indicators:

  • References to file paths in comments or docs — note these for Phase 3 verification
  • Version numbers in comments (e.g. "as of v2.3", "since 1.0")
  • Date references in comments (e.g. "added 2024-01-01", "temporary until Q2")
  • Dead links in markdown files

External system boundaries:

  • Detect imports/usage of database clients, HTTP clients, message queues, cloud SDKs, API clients, ORMs, schema definitions
  • For each detected boundary, record the domain (e.g. "postgresql", "redis", "stripe-api") and which files touch it
  • Check .claude/audit/domains/_index.md — if this domain already has a sub-skill, note the association. If not, this is a new domain that needs a stub created.

Convention breaks:

  • Files without any tests when sibling files have tests
  • Mixed conventions (e.g. some files use one error pattern, others use another — just note the signal, don't judge)

Output

Write .claude/audit/signals.md. Format:

# Signals

Last scan: {date}
Files scanned: {count}

## Deferred Work
| Signal | File | Line | Text |
|--------|------|------|------|
| TODO   | src/foo.rs | 42 | // TODO: handle timeout case |

## Staleness Indicators
| Type | File | Line | Text |
|------|------|------|------|
| path-ref | README.md | 15 | See `src/old_module.rs` for details |

## External System Boundaries
| Domain | Domain File | Files | Status |
|--------|-------------|-------|--------|
| postgresql | postgresql.md | src/db/users.py, src/db/orders.py | known |
| stripe-api | (none) | src/billing/charge.py | NEW — stub needed |

## Convention Observations
Brief notes on patterns observed — what the "usual" convention seems to be and where it deviates. Keep this short. This is not a style guide — it's noting what you see so Phase 3 can use it.

Domain stub creation

For each NEW domain detected, create a stub at .claude/audit/domains/<domain-name>.md:

# Domain: {domain-name}

Status: STUB — needs human input
Detected in: {list of files}
Detected via: {import patterns or API usage that triggered detection}

## What we think this is
{Brief description based on what the code appears to do with this system — e.g. "The code creates and deletes Kubernetes Node objects via the kube client library"}

## What we don't know
{List specific questions — not generic ones. Based on what the code assumes:}
- "The code assumes field `spec.nodeID` is a DNS-1123 subdomain — is this enforced by the external system or only by our validation?"
- "The code retries on 409 Conflict — is this the correct behavior for this API?"

## Schema / Contract
(To be filled in by human or by reading external documentation)

Also update .claude/audit/domains/_index.md to include the new domain mapping.

The _index.md format:

# Domain Index

Maps detection patterns to domain sub-skill files.

| Pattern | Domain File | Description |
|---------|-------------|-------------|
| `psycopg`, `sqlalchemy`, `SELECT`, `INSERT` | postgresql.md | PostgreSQL database interactions |
| `stripe.`, `stripe_client` | stripe-api.md | Stripe payment API |

Phase 2: Shadow Documentation

Build or update Claude's understanding of each significant source file. Use subagents to analyze files in parallel.

Selecting files to shadow

  1. Glob for all source files (detect language from project config files)
  2. Exclude: generated files, lock files, vendored dependencies, build output
  3. Prioritize: files referenced in signals.md, files with doc comments, files that are entry points or define public APIs
  4. Skip files under ~20 lines unless they contain signals

Shadow doc format

For each file, create .claude/audit/shadow/<flattened-path>.md where the path uses -- as separator (e.g. src/controller/mod.rs becomes src--controller--mod.rs.md).
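The flattening is a mechanical substitution; a sketch (the helper name is illustrative):

```shell
# Sketch: map a source path to its shadow-doc filename.
flatten() {
  printf '%s.md\n' "$(printf '%s' "$1" | sed 's|/|--|g')"
}

flatten src/controller/mod.rs   # src--controller--mod.rs.md
```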

The unit of analysis is the block, not the file. A block is a function, method, class, type definition, test case, or any named code unit. The file is just the namespace that contains blocks.

Each shadow doc is produced by a subagent that reads the source file.

The subagent should write:

# Shadow: {original path}

Last updated: {date}
Git checkpoint: {short commit hash}

## Imports
What this file imports from other files in this project (not external packages). List the specific names.

## Blocks

### `function_name(args)` — {type} (lines N-M)
**Comments say:** {exact quotes from doc comments or inline comments on this block}
**Code does:** {brief factual description of what the code actually does — inputs, outputs, branches}
**Internal deps:** {other blocks in this project this block calls or references}
**Signals:** {any TODOs, ignores, or other signals within this block}

### `ClassName` — {type} (lines N-M)
...

### `test_case_name` — test (lines N-M)
**Tests:** {what expressions/inputs are tested and what outputs are asserted}
**Coverage:** {what the test exercises — and notably, what it doesn't}
...

Key principles:

  • "Comments say" vs "Code does" are always separate. The whole point is finding where they disagree.
  • Test blocks record what they actually test, not what they claim to test. Quote the actual assertions.
  • Skip trivial blocks (simple re-exports, empty __init__, one-line accessors) unless they contain signals.

Domain-aware shadowing

Before spawning subagents, check .claude/audit/domains/_index.md and signals.md (External System Boundaries table). For each file being shadowed:

  1. Check if it touches any known domain (via the index patterns)
  2. If yes, read the domain sub-skill file and pass its content to the subagent as context
  3. For blocks that interact with an external system, add a Domain field noting what the block assumes about the system and whether it matches the domain doc

This is how domain knowledge propagates: the human seeds a domain doc once, and every block that touches that domain gets checked against it.

Example of a domain-aware block entry:

### `create_node(offering, config)` — function (lines 45-80)
**Comments say:** "Provisions a node via the Hetzner API and waits for it to join"
**Code does:** POSTs to /servers, polls status until "running", then waits for K8s node to appear
**Domain (hetzner-api):**
  Assumes: POST /servers returns a server object with `server.id`
  Assumes: polling the server's status eventually reports "running"
  Matches domain doc: Yes / No / Partially / Domain doc is a stub

Subagent prompt template

When spawning subagents for shadow docs, use this prompt structure:

Read {file_path} completely. You are creating a shadow document — Claude's own understanding of this file, stored separately.

The unit of analysis is the BLOCK (function, class, type, test case), not the file. For each non-trivial block, extract:

  1. What the comments/docs SAY it does (quote them exactly)
  2. What the code ACTUALLY does (brief factual description)
  3. What other blocks in this project it calls or depends on
  4. Any signals: TODOs, ignored tests, commented-out code
  5. If it interacts with an external system, what does it assume about that system?

Separate "comments say" from "code does" — they are recorded independently because we are looking for disagreements between them.

For test blocks specifically: record what inputs are tested and what assertions are made. Note what IS tested and what is NOT (e.g. "tests Cond with literal values only — no tests with variable references in branches").

Write the shadow doc to {shadow_path}.

If domain context is available, append to the prompt:

This file interacts with the following external system(s). Here is our current understanding — check whether each block's assumptions match: {domain doc content}

Incremental updates via git

Shadow docs track the git commit hash they were built from. On each run:

  1. Run git log -1 --format=%H to get the current HEAD commit
  2. For each existing shadow doc, compare its recorded commit against HEAD
  3. Run git diff {shadow_commit}..HEAD -- {file_path} to check if the source file changed
  4. Skip files with no diff. Regenerate shadow docs for changed files.
  5. New shadow docs record the current HEAD as their checkpoint.
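Steps 2–4 reduce to a small predicate; a sketch, assuming a git work tree (the function name is illustrative):

```shell
# Sketch: decide whether a shadow doc needs regenerating.
# $1 = checkpoint commit recorded in the shadow doc, $2 = source file path.
needs_regen() {
  # True (exit 0) when the file changed between the checkpoint and HEAD.
  ! git diff --quiet "$1"..HEAD -- "$2"
}
```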

Shadow doc header becomes:

# Shadow: {original path}

Last updated: {date}
Git checkpoint: {commit hash (short)}

This is faster than hashing file contents and gives you a meaningful reference point — you can see exactly when Claude last looked at a file relative to the project history. It also means git log {checkpoint}..HEAD -- {path} shows you every change Claude hasn't processed yet.


Phase 3: Cross-reference

Compare everything against everything. This is where inconsistencies surface.

Checks to perform

Shadow vs Shadow:

  • File A's dependency list says it imports X from File B. Does File B actually export X?
  • File A's claim says "must run after init_pool()". Does any caller actually ensure this ordering?
  • Do two files that implement similar patterns (found via Convention Observations) actually agree on the pattern?

Shadow vs User Docs:

  • Read all user-maintained documentation (README, CLAUDE.md, ADRs, RFCs, etc. — discover these dynamically via Glob for *.md at the project root and common doc directories)
  • For each claim in user docs, check if any shadow doc contradicts it
  • For each architectural description, check if the shadow docs collectively tell a different story

Shadow vs Domains:

  • For each shadow doc whose blocks carry Domain entries, compare their assumptions against the domain sub-skill
  • If the domain doc says the API returns field X but the code expects field Y, that's a Contradiction
  • If the domain doc is a STUB, generate a Question asking the human to verify the code's assumptions
  • If multiple files interact with the same domain but assume different things about it (e.g. one file expects a 201, another expects a 200), that's a shadow-vs-shadow finding mediated by domain knowledge

Signals vs Shadow:

  • For each TODO/FIXME: does the shadow doc's "Code does" description suggest the TODO is still relevant, or has the surrounding code changed enough that the TODO might be stale?
  • For each ignored/skipped test: does the shadow doc explain why? If not, flag it.
  • For each file path reference found in Phase 1: does the referenced file actually exist?

ACKNOWLEDGED.md filter: Read .claude/audit/acknowledged.md (if it exists). Any finding ID listed there is suppressed — do not include it in findings.md. Do mention the count of suppressed findings in the summary.
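The filter itself is a one-line membership check; a sketch (the helper name is illustrative):

```shell
# Sketch: true when a finding ID appears in the human's acknowledged list.
# -w keeps F-003 from matching F-0031.
is_suppressed() {
  grep -qw "$1" .claude/audit/acknowledged.md 2>/dev/null
}
```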

Output — Resolution tiers

Every finding must be classified into one of three resolution tiers. This determines how it's presented and what happens next.

Tier 1: Auto-fix

The correct resolution is unambiguous — there's only one reasonable fix and it's mechanical.

Criteria:

  • One artifact is clearly stale/wrong, the other is clearly current
  • The fix is a simple text replacement, not a behavioral change
  • No domain knowledge is needed to determine which side is correct

Examples:

  • Doc comment references a file path that was renamed/moved
  • README says the config key is db_host → code reads database_url
  • Comment says "returns a list" → function returns a generator
  • Stale comment about conditional compilation when the module is always compiled

The finding includes the proposed diff — the exact before/after text. This is both the documentation of the finding and the spec for applying it.

### [F-{NNN}] {Short title}
**Resolution:** Auto-fix
**Source:** {file:line} says: "{quoted text}"
**Target:** {file:line} says/does: "{quoted text}"
**Proposed fix:**
In `{file}:{line}`:
- {old text}
+ {new text}

Auto-fixes are proposed, not applied during Phase 3. The summary presents them as a batch. The human can:

  • Approve all → Claude applies the batch
  • Reject individual fixes → those get promoted to Suggest
  • Reject all → nothing changes, findings stay in findings.md for later

/audit fix re-presents any unapplied auto-fix proposals.

Tier 2: Suggest

The contradiction is clear, but there are 2+ valid resolutions. Claude presents the options; the human picks one.

Criteria:

  • Both artifacts could be "the right one" depending on intent
  • The fix requires choosing between alternatives
  • A reasonable developer could go either way

Examples:

  • Comment says "4 pods" but code creates 3 → update the comment, or add a 4th pod?
  • delete error maps to CreationFailed → rename the variant, or add a DeletionFailed variant?
  • active_count is defined but never written → remove it, or implement the writes?

### [F-{NNN}] {Short title}
**Resolution:** Suggest
**Source:** {file:line} says: "{quoted text}"
**Target:** {file:line} says/does: "{quoted text}"
**Options:**
  A. {first resolution — describe the change}
  B. {second resolution — describe the change}

Present Suggest findings via AskUserQuestion with the options. Record the human's choice.

Tier 3: Raise

Genuinely needs domain knowledge. Can't determine the correct resolution from the codebase alone.

Criteria:

  • The inconsistency might be intentional (a design decision)
  • Resolution depends on planned features, external system behavior, or business logic
  • Claude cannot determine which side is "correct" without context the codebase doesn't contain

Examples:

  • Is this enum variant intentionally absent from the API schema, or was it missed?
  • Is this type planned-but-unimplemented, or dead code?
  • Does this config struct reflect intended future work or abandoned design?

### [F-{NNN}] {Short title}
**Resolution:** Raise
**Source:** {file:line} says: "{quoted text}"
**Target:** {file:line} says/does: "{quoted text}"
**Context:** {Why this can't be resolved from the codebase alone}
**Question:** {Specific question requiring domain knowledge}

Present Raise findings via AskUserQuestion. Record the human's answer.

Tier assignment rules

  • If in doubt between Auto-fix and Suggest, choose Suggest. Don't auto-fix anything where the "correct" side is ambiguous.
  • If in doubt between Suggest and Raise, choose Suggest and include "keep both as-is (this is intentional)" as an option.
  • Never auto-fix behavioral code changes. Auto-fix is for comments, docs, and references only.

findings.md format:

# Audit Findings

Last audit: {date}
Total: {n} findings ({a} auto-fix, {s} suggest, {r} raise, {k} suppressed)

## Auto-fix
{list of Tier 1 findings}

## Suggest
{list of Tier 2 findings}

## Raise
{list of Tier 3 findings}

questions.md:

Only Tier 3 (Raise) findings go here. Preserve any that already have an answer recorded.

### [Q-{NNN}] {Question title}
**Found in:** {file:line}
**Context:** {What you found and why it's ambiguous — quote both sides}
**Question:** {Specific question framed to elicit domain knowledge}
**Status:** Open | Answered
**Answer:** {recorded when human responds via `/audit answer`}

Numbering

Continue numbering from the highest existing F-NNN or Q-NNN in the audit files. Don't restart from 1.
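One way to sketch the continuation — note the base-10 guard, since a zero-padded `012` would otherwise parse as octal in shell arithmetic:

```shell
# Sketch: next finding ID, continuing from the highest existing F-NNN.
next_finding_id() {
  last=$(grep -hoE 'F-[0-9]+' .claude/audit/findings.md 2>/dev/null \
    | sort -t- -k2 -n | tail -1 | cut -d- -f2)
  printf 'F-%03d\n' $(( 10#${last:-0} + 1 ))
}
```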


Human interaction flow

After Phase 3 (end of crossref or all)

Present findings by tier, then act on them:

1. Auto-fix batch. List all Tier 1 fixes as a summary table, then apply them all. The user's normal tool-approval flow handles consent — each edit goes through their permission settings. If a fix is rejected, move it to Suggest tier.

2. Suggest choices. For each Tier 2 finding, use AskUserQuestion to present the options. Apply the chosen resolution immediately. Record the choice in findings.md.

3. Raise questions. For the top 1-3 Tier 3 findings, use AskUserQuestion to present them. Record answers in questions.md. Remaining Raise findings stay open for /audit answer.

/audit answer (no number)

Read questions.md, find all questions with Status: Open. Present the top 1-3 via AskUserQuestion.

/audit answer N

Read questions.md, find Q-{N} specifically. Present it with full context via AskUserQuestion. If Q-{N} is already answered, show the prior answer and ask if they want to revise it.

/audit fix

Re-run just the auto-fix tier — find any unfixed Tier 1 findings and apply them. Useful after a reset or when new auto-fixes were found by a fresh scan.


Summary

Phases 1 and 2 are internal work. Keep their output minimal:

  • Scan: "Scan complete. {N} files, {M} signals, {K} domains detected."
  • Shadow: "Shadow docs updated. {N} files documented."

Phase 3 flow:

  1. Print a one-line summary: "{A} auto-fixes, {S} suggestions, {R} questions raised"
  2. Apply auto-fixes (user approves via normal tool permissions)
  3. Present suggestions as choices
  4. Ask the top questions
  5. Record everything back to findings.md and questions.md

The human should never need to manually edit any audit file.


Rules

  • During the audit phases, write only to .claude/audit/ — never edit user source files while analyzing. Approved fixes from the interaction flow are the only edits that touch user files.
  • Surface inconsistencies with evidence and propose resolutions through the tiers; the human decides what's correct.
  • Never modify acknowledged.md. That file belongs to the human.

The Two-Artifact Rule

Every finding and every question MUST cite two specific artifacts that disagree. An "artifact" is a concrete, quotable thing: a line of code, a doc comment, a README section, a test case, a config entry.

  • Valid finding: "Line 30 of interp.py passes env to interp(). Line 47 does not. These are both recursive calls in the same function."
  • Valid question: "The comment on line 52 says 'we might not ever use the value in conditionals.' The Cond branch on line 47 evaluates both arms before selecting — is the comment referring to this behavior?"
  • Invalid finding: "Lambdas don't capture closures." (Nothing in the codebase claims they should. This is projecting expectations from other languages.)
  • Invalid question: "Is lazy evaluation planned?" (Speculative. The audit surfaces what IS, not what COULD BE.)

If you cannot point to two artifacts that disagree, you do not have a finding. Your general knowledge of how languages/frameworks/databases "usually" work is NOT an artifact. Only things written in THIS codebase count.

Evidence, not assertion

When a finding references behavior (e.g. "the test suite only passes because tests use literals"), you must quote the evidence. Show the actual test cases. Show the actual lines. If you can't produce the quote, downgrade the claim or drop it.

Contradictions vs Questions

A finding is a Contradiction when both sides are unambiguous and they factually conflict. "Every other branch passes env, this one doesn't" is a Contradiction — there's no interpretation where both are correct.

A finding is a Question only when both sides could be intentionally different and you need domain knowledge to know which. Frame questions to help the human document their intent, not to interrogate their decisions. "Is this the behavior the comment on line 52 refers to?" is good. "Should you fix this?" is not your job.

What is NOT a finding

  • Expectations from other languages, frameworks, or "best practices" that nothing in this codebase claims to follow
  • Speculation about future plans or intended features
  • Style preferences or "this could be better" observations
  • Things a linter would catch (unused imports, empty files, placeholder text) — note these in signals.md for completeness but do NOT elevate them to findings or questions