@danwashusen
Last active September 14, 2025 00:53
spec-kit commands to implement and validate tasks

Intelligently implement tasks from a tasks.md file with analysis, validation, and progress tracking.

Given the tasks document path as an argument (e.g., "specs/002-feature/tasks.md"), perform:

  • Pre-implementation analysis to understand current state
  • Smart task selection respecting dependencies and priorities
  • Implementation with validation and quality gates
  • Progress tracking with checkbox updates
  • Post-implementation testing and verification

Inputs

  • Required: path to tasks.md
  • Optional:
    • Task range or specific tasks (e.g., "T001-T010" or "T001,T005,T009")
    • Phase filter (e.g., "Phase 3.2" or "Tests")
    • Category filter for review tasks (e.g., "[Security]" or "[Critical]")
    • Skip completed flag (--skip-completed, default: true)
    • Dry run mode (--dry-run, show what would be done)
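
The optional selectors above can be normalized before any gate runs. A hedged sketch in TypeScript; the option shape and function name are illustrative, not a published spec-kit API:

```typescript
// Illustrative option shape for this command's inputs.
interface RunOptions {
  tasksPath: string;      // required: path to tasks.md
  phase?: string;         // e.g. "Phase 3.2"
  category?: string;      // e.g. "[Security]"
  skipCompleted: boolean; // --skip-completed (default true)
  dryRun: boolean;        // --dry-run
}

// Expand "T001-T003" into ["T001", "T002", "T003"]; pass comma lists through.
function expandTaskSelector(selector: string): string[] {
  const range = selector.match(/^T(\d+)-T(\d+)$/);
  if (range) {
    const start = Number(range[1]);
    const end = Number(range[2]);
    const width = range[1].length; // preserve zero-padding (T001, not T1)
    return Array.from({ length: end - start + 1 }, (_, i) =>
      `T${String(start + i).padStart(width, "0")}`,
    );
  }
  return selector.split(",").map((task) => task.trim());
}
```

Both selector forms normalize to an explicit task list, which simplifies the queue-building steps later in this command.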

Early Gates (stop if any fail)

  1. Tasks Document Gate

    • Verify tasks.md exists and is valid
    • Parse all tasks and their dependencies
    • If invalid format, output Status: "Invalid Tasks Document". STOP.
  2. Completion Audit Gate

    • Run intent-based analysis on all tasks (per validate-tasks.md logic)
    • Identify: ✅ Complete, 🟡 Partial, 🔶 Stub, ❌ Not Started
    • Build implementation queue of incomplete tasks
    • If all tasks complete, output Status: "All Tasks Complete". STOP.
  3. Dependency Analysis Gate

    • Map task dependencies from document structure and explicit notes
    • Verify prerequisites are met for each task
    • Order tasks respecting: Setup → Tests → Implementation → Integration → Polish
    • If circular dependencies found, output Status: "Circular Dependencies". STOP.
  4. TDD Compliance Gate

    • For implementation tasks, verify corresponding tests exist and fail
    • For test tasks, ensure they will run before implementation
    • If TDD violated, output Status: "TDD Violation - Tests Must Fail First". STOP.
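
The four gates above share one control flow: run in order, stop at the first failure, and surface a Status string. A minimal sketch, with illustrative shapes:

```typescript
// Gate result and gate shapes are assumptions for the sketch, not a real API.
interface GateResult { ok: boolean; status?: string }
interface Gate { name: string; check: () => GateResult }

// Run gates in order; the first failure produces the Status output and STOPs.
function runGates(gates: Gate[]): string {
  for (const gate of gates) {
    const result = gate.check();
    if (!result.ok) return `Status: "${result.status}"`;
  }
  return 'Status: "Gates Passed"';
}
```

The short-circuit matters: later gates (e.g. TDD compliance) assume earlier ones (e.g. a parseable tasks document) have already passed.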

Pre-Implementation Analysis (for each task)

Before starting any task, perform a comprehensive analysis:

  1. Current State Assessment:

    • What already exists for this task?
    • Is there partial implementation to build upon?
    • Are there related files that provide patterns to follow?
    • Check git history for previous attempts
  2. Context Gathering:

    • Load related design documents (plan.md, research.md, data-model.md)
    • Identify patterns from similar completed tasks
    • Check for code review feedback (T0XX tasks) affecting this task
    • Review constitutional requirements applicable to this task
  3. Implementation Planning:

    • Determine exact files to create/modify
    • Identify required imports and dependencies
    • Plan test scenarios if implementing features
    • Note integration points with existing code

Task Implementation Strategies

By Task Type:

Setup Tasks (package.json, configs):

  • Check for existing config files to extend
  • Use established patterns from project
  • Validate against TypeScript/ESLint after creation
  • Ensure all dependencies are properly versioned

Test Tasks (*.test.ts, *.test.tsx):

  • MUST be written to fail initially (TDD)
  • Include comprehensive test cases:
    • Happy path scenarios
    • Error conditions
    • Edge cases
    • Security validations if applicable
  • Use existing test setup/utilities
  • Ensure proper async handling
  • Add meaningful assertions, not just existence checks
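
The last bullet deserves emphasis. As a runnable illustration of the difference between an existence check and a meaningful assertion (slugify and its expected behavior are hypothetical, stubbed inline so the contrast executes):

```typescript
// Hypothetical function under test; in a real suite these would be framework cases.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs
    .replace(/^-+|-+$/g, "");    // trim leading/trailing dashes
}

function assertEquals(actual: string, expected: string, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: got "${actual}", expected "${expected}"`);
  }
}

// Weak: only proves the function returned something truthy.
if (!slugify("Hello, World!")) throw new Error("weak existence check");

// Meaningful: pins exact behavior, including an edge case.
assertEquals(slugify("Hello, World!"), "hello-world", "happy path");
assertEquals(slugify("  --  "), "", "edge case: no alphanumerics");
```

A test that only checks "the function returned something" passes against a stub; the pinned assertions fail until the real behavior exists, which is what TDD needs.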

Implementation Tasks (features, services):

  • Follow established patterns in codebase
  • Include proper error handling
  • Add structured logging with context
  • Implement security checks (auth, validation)
  • Use dependency injection via service locator
  • Include TypeScript types/interfaces
  • Add JSDoc comments for public APIs
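
Applied together, these conventions might look like the following sketch of a single service function. The Logger and repository interfaces are stand-ins for the sketch, not a real project API:

```typescript
// Illustrative dependency shapes; a real project would resolve these via its
// service locator rather than passing them inline.
interface Logger {
  info(message: string, context?: Record<string, unknown>): void;
  error(message: string, context?: Record<string, unknown>): void;
}

interface User { id: string; email: string }

interface UserRepository {
  findById(id: string): Promise<User | null>;
}

interface Deps { logger: Logger; users: UserRepository }

/** Fetches a user by id; validates input and logs with structured context. */
async function getUser(id: string, deps: Deps): Promise<User> {
  if (!/^[0-9a-f-]{8,}$/i.test(id)) {
    deps.logger.error("getUser: invalid id", { id }); // no console.log
    throw new Error("Invalid user id");               // explicit error handling
  }
  const user = await deps.users.findById(id);
  if (user === null) throw new Error(`User not found: ${id}`);
  deps.logger.info("getUser: loaded", { userId: user.id }); // context object, not string concat
  return user;
}
```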

Fix/Review Tasks ([Category] fixes):

  • First understand the specific issue
  • Locate exact code needing change
  • Apply minimal fix that resolves issue
  • Verify fix doesn't break existing functionality
  • Add/update tests to prevent regression
  • Update related documentation

Integration Tasks (middleware, routes):

  • Ensure proper connection between layers
  • Add request/response validation
  • Include correlation ID propagation
  • Implement proper error boundaries
  • Add integration tests
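
Correlation ID propagation, for example, can be sketched as Express-style middleware. The header name and the minimal request/response shapes are assumptions for the sketch:

```typescript
import { randomUUID } from "node:crypto";

// Minimal stand-ins for the framework's request/response types.
interface Req { headers: Record<string, string | undefined>; correlationId?: string }
interface Res { setHeader(name: string, value: string): void }

function correlationMiddleware(req: Req, res: Res, next: () => void): void {
  // Reuse an inbound ID so traces join up across service hops.
  const id = req.headers["x-correlation-id"] ?? randomUUID();
  req.correlationId = id;                // visible to downstream handlers/loggers
  res.setHeader("x-correlation-id", id); // echoed so the caller can correlate too
  next();
}
```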

Configuration Tasks:

  • Use environment variables for secrets
  • Provide sensible defaults
  • Add validation for required settings
  • Document all configuration options
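
A sketch of these rules together: secrets come only from environment variables, defaults are provided where safe, and required settings are validated up front. The variable names are examples only:

```typescript
interface AppConfig { port: number; logLevel: string; databaseUrl: string }

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing: string[] = [];
  const required = (name: string): string => {
    const value = env[name];
    if (!value) missing.push(name);
    return value ?? "";
  };
  const config: AppConfig = {
    port: Number(env.PORT ?? "3000"),      // sensible default
    logLevel: env.LOG_LEVEL ?? "info",     // sensible default
    databaseUrl: required("DATABASE_URL"), // secret: no default, never hardcoded
  };
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(", ")}`);
  }
  return config;
}
```

Collecting all missing names before throwing reports every problem at once instead of failing one variable at a time.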

Quality Gates (apply to each implementation)

  1. Code Quality:

    • Passes TypeScript compilation
    • No ESLint errors
    • Follows project conventions
    • No console.log statements (use logger)
    • No commented-out code
    • No TODO/FIXME without ticket reference
  2. Security:

    • Input validation on all user data
    • No hardcoded secrets
    • Proper authentication checks
    • SQL injection prevention (parameterized queries)
    • XSS prevention (output encoding)
  3. Testing:

    • Unit tests for new functions
    • Integration tests for endpoints
    • Tests actually test functionality, not just run
    • Error cases are tested
    • Minimum 80% code coverage for new code
  4. Constitutional Compliance:

    • Library-first: Features as libraries with CLI
    • TDD: Tests written and failing first
    • Service Locator: No singletons
    • Structured Logging: JSON format with context
    • Repository Pattern: Database access abstracted
    • SOC 2: Audit fields, logging, error handling
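
Several of these gates are mechanically checkable. As one example, flagging TODO/FIXME comments that lack a ticket reference; the "TODO(ABC-123)" format is an assumed convention, not a universal one:

```typescript
// Returns 1-based line numbers of TODO/FIXME comments without a ticket reference.
function findUnticketedTodos(source: string): number[] {
  const offending: number[] = [];
  source.split("\n").forEach((line, index) => {
    const hasMarker = /\b(TODO|FIXME)\b/.test(line);
    const hasTicket = /\b(TODO|FIXME)\([A-Z]+-\d+\)/.test(line);
    if (hasMarker && !hasTicket) offending.push(index + 1);
  });
  return offending;
}
```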

Implementation Workflow

For each task in the implementation queue:

  1. Pre-Implementation:

    📋 Task: T001 - Create monorepo structure
    🔍 Analyzing current state...
    ✓ Found partial implementation: package.json exists
    ⚠️ Missing: pnpm-workspace.yaml, turbo.json
    📚 Loading patterns from completed tasks...
    🎯 Implementation plan ready
    
  2. Implementation:

    🚀 Implementing T001...
    ✓ Created pnpm-workspace.yaml
    ✓ Updated package.json with workspace config
    ✓ Added required dependencies
    🧪 Running validation...
    
  3. Validation:

    ✓ TypeScript: No errors
    ✓ ESLint: Passed
    ✓ Tests: N/A (config file)
    ✓ Integration: pnpm install successful
    
  4. Progress Update:

    ✅ T001 Complete - Updating tasks.md
    📊 Progress: 1/50 tasks complete (2%)
    

Progress Tracking

  • Update task checkboxes in real-time:

    • Change - [ ] T001 to - [x] T001 when complete
    • Add completion timestamp comment: <!-- completed: 2024-01-15 14:30 -->
  • Maintain implementation log:

    ## Implementation Log - <YYYY-MM-DD HH:MM>
    
    ### Session Summary
    - Tasks Attempted: 10
    - Tasks Completed: 8
    - Tasks Failed: 2 (T045, T046 - missing dependencies)
    - Time Elapsed: 45 minutes
    
    ### Completed Tasks
    ✅ T001: Monorepo structure (5 min)
    ✅ T002: Root package.json (3 min)
    [...]
    
    ### Failed Tasks
    ❌ T045: Missing Clerk SDK configuration
    ❌ T046: Database connection not available
    
    ### Next Steps
    - Configure Clerk authentication
    - Set up database connection
    - Retry failed tasks
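
The checkbox flip described above is a small, mechanical edit. A sketch, assuming the `- [ ] T001 ...` task-line format shown earlier:

```typescript
// Mark one task complete and append the completion timestamp comment,
// leaving every other line untouched.
function markTaskComplete(markdown: string, taskId: string, timestamp: string): string {
  const pattern = new RegExp(`^(\\s*)- \\[ \\] (${taskId}\\b.*)$`, "m");
  return markdown.replace(pattern, `$1- [x] $2 <!-- completed: ${timestamp} -->`);
}
```

The `\b` after the task ID keeps `T001` from also matching `T0010`, and preserving the leading-whitespace group keeps nested checklists intact.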
    

Error Handling

When implementation fails:

  1. Log detailed error with context
  2. Attempt automatic recovery if possible
  3. Mark task as 🟡 Partial if some progress was made
  4. Document blockers in implementation log
  5. Continue with non-dependent tasks
  6. Provide clear remediation steps

Post-Implementation Actions

After completing all possible tasks:

  1. Run Test Suite:

    pnpm test
    pnpm typecheck
    pnpm lint
  2. Generate Summary Report:

    ## Implementation Summary
    
    ### Statistics
    - Total Tasks: 50
    - Completed: 35 (70%)
    - Partial: 5 (10%)
    - Blocked: 3 (6%)
    - Not Started: 7 (14%)
    
    ### Quality Metrics
    - Test Coverage: 85%
    - TypeScript Errors: 0
    - ESLint Warnings: 3
    - Build Status: ✅ Passing
    
    ### Blockers
    - Missing external dependencies
    - Unclear requirements for T047
    - Database setup required for T040-T043
    
  3. Update Documentation:

    • Update CLAUDE.md with new patterns
    • Add implementation notes to relevant tasks
    • Document any workarounds or decisions made

Output Format

  • Summary: Implementation Complete | Partial Implementation | Blocked by Dependencies | Implementation Failed
  • Progress: XX/XX tasks implemented (XX%)
    • ✅ Completed: [count]
    • 🟡 Partial: [count]
    • 🔶 Stub: [count]
    • ❌ Failed: [count]
  • Quality Gates: TypeScript ✓ | ESLint ✓ | Tests ✓ | Coverage XX%
  • Session Metrics:
    • Time: XX minutes
    • Files Created: XX
    • Files Modified: XX
    • Lines Added: XXXX
  • Blockers: [list any blocking issues]
  • Next Steps: [recommended actions]

Important Notes

  • Always run in project root unless otherwise specified
  • Respect .gitignore patterns when creating files
  • Use atomic commits with descriptive messages
  • If uncertain about an implementation, mark it as 🟡 Partial and document the uncertainty
  • Never skip tests unless explicitly directed
  • Keep security and performance in mind for all implementations

Validate the tasks document and perform an S-tier code review for a specific feature.

Given the tasks document path as an argument (e.g., "specs/002-feature/tasks.md"), perform:

  • A scope-correct validation aligned with the /tasks command intent and .specify/templates/tasks-template.md.
  • An S-tier code review assuming the reviewer LLM is more capable than the implementer/fixer LLM; include detailed reasoning, evidence, and actionable fixes.

Inputs

  • Required: path to tasks.md.
  • Optional (for code review scope):
    • PR number OR commit range (e.g., BASE..HEAD) OR branch to compare against default branch.
    • File filters (globs) to narrow the review set.
    • Known environment or reproduction notes (if any).

Early Gates (stop if any fail)

  1. Design Documents Gate

    • Verify required design documents exist in the feature directory:
      • research.md contains technical decisions and architecture patterns
      • plan.md exists with implementation roadmap
      • data-model.md exists if data entities are involved
      • contracts/ directory exists if API endpoints are defined
    • If critical documents are missing, output Status: "Missing Design Docs" with list of missing files. STOP.
  2. Plan-of-Record Gate

    • Verify <feature>/plan.md exists in the same directory as the tasks doc (or as referenced in Primary Sources).
    • If missing or not referenced, output Status: "Blocked by Plan" with remediation to generate/locate the plan. STOP.
  3. Unknowns Gate

    • Scan the tasks doc for any remaining "[NEEDS CLARIFICATION: …]" items.
    • If any remain, output Status: "Needs Clarification" with a grouped list and suggested, succinct follow-up questions. STOP.
  4. TDD Ordering Gate

    • Validate that test tasks precede implementation tasks:
      • Contract and integration tests appear before related implementation tasks.
      • Where contracts exist in contracts/ directory, there is at least one corresponding contract test task.
    • If violated, output Status: "TDD Violations" with examples and specific reorder suggestions. STOP.
  5. Code Review Scope Gate

    • Establish a concrete review scope:
      • If PR number provided: fetch PR diff.
      • Else if commit range provided: use git diff <range>.
      • Else: compute diff from the feature branch to the repository's default branch (merge-base to HEAD).
    • If unable to determine scope automatically and none provided, request the user to supply PR/range. STOP.
  6. Task Completion Audit Gate

    • For each task in tasks.md, analyze intent and verify implementation status:
      • Parse task ID, description, and target file path from task text
      • Understand the task intent: what should be accomplished?
      • Check for expected artifacts and functionality
      • Classify completion state: Complete/Partial/Stub/Not Started
    • Build completion map: {taskId: completionState, evidence: string}
    • Calculate overall completion percentage
  7. Quickstart Verification Gate

    • Check if quickstart.md exists in the same directory as tasks.md
    • If quickstart.md exists, parse all verification steps from sections like "Verification Steps", "Backend Health Check", "Frontend Access", etc.
    • Map each quickstart verification scenario to corresponding tasks in tasks.md:
      • Health check endpoints → endpoint implementation and test tasks
      • CLI verification commands → CLI interface and test tasks
      • Frontend access flows → frontend component and integration test tasks
      • Database operations → repository and migration test tasks
      • Build/test commands → build configuration and test setup tasks
    • Calculate quickstart coverage: (covered scenarios / total scenarios) * 100
    • If coverage < 80%, output Status: "Insufficient Quickstart Coverage" with list of unmapped scenarios. STOP.
    • If quickstart.md exists but no integration test tasks reference quickstart scenarios, output Status: "Missing Quickstart Integration Tests". STOP.
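
The coverage arithmetic in gate 7 is simple; a sketch, with illustrative scenario names and an assumed scenario-to-tasks mapping shape:

```typescript
interface CoverageReport { percent: number; unmapped: string[] }

// Scenarios with no mapped tasks count as uncovered; the gate STOPs below 80%.
function quickstartCoverage(scenarios: string[], mapped: Map<string, string[]>): CoverageReport {
  const unmapped = scenarios.filter((s) => (mapped.get(s) ?? []).length === 0);
  const covered = scenarios.length - unmapped.length;
  const percent = scenarios.length === 0 ? 100 : Math.round((covered / scenarios.length) * 100);
  return { percent, unmapped };
}
```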

Scope and Sources

  • Primary input: the provided tasks.md.
  • Sibling artifacts (same directory): evaluate only those explicitly referenced or expected by the tasks doc: plan.md, data-model.md, contracts/*, quickstart.md, research.md.
  • Note: Architecture details should be extracted from research.md, which contains feature-specific architectural decisions, NOT from the primary architecture documents, which are too generic.
  • Alignment references: spec.md (WHAT/WHY scope), CONSTITUTION.md (constitutional constraints).
  • Do not require or load docs/architecture.md or docs/ui-architecture.md unless specifically debugging architectural violations.
  • Do not scan unrelated files.

Validation Criteria (when gates pass)

  • Structure & Completeness:

    • Title references correct feature name consistent with plan.md.
    • Tasks are numbered sequentially (T001, T002, …) with unique IDs.
    • Each task includes concrete file paths and clear outcomes; avoid vague actions.
    • Parallelization markers [P] used only when tasks touch different files or independent subsystems.
  • Artifact Mapping:

    • Contracts → at least one contract test task per contract file; endpoint impl tasks exist and depend on prior tests.
    • Data-model → model or schema tasks for each key entity.
    • Quickstart → comprehensive mapping of verification scenarios to test tasks:
      • Health check scenarios → endpoint implementation tasks with corresponding integration tests
      • CLI verification commands → CLI interface tasks with test coverage for help flags and core functionality
      • Frontend access flows → UI component tasks, authentication integration tasks, and end-to-end tests
      • Database setup steps → migration tasks, repository implementation tasks, and connection tests
      • Build/test verification → build configuration tasks, test setup tasks, and deployment pipeline tasks
      • Environment configuration → config file tasks, environment variable validation tasks
      • Logging verification → structured logging implementation tasks and log format validation tests
      • Error handling scenarios → error middleware tasks and error response format tests
      • Each quickstart verification step should have at least one corresponding test task that validates the expected behavior.
  • Architecture Alignment (HOW):

    • Tasks do not cross service boundaries improperly; respect routing/state patterns from UI Architecture.
    • Observability, error handling, and auth constraints from Architecture are represented as task acceptance notes or checklist items.
  • Constitution Check (WHAT/WHY level constraints):

    • High-level requirements (authn, RBAC, logging, input validation, safe errors) are captured as constraints or acceptance criteria without leaking low-level HOW unrelated to tasks.
  • Execution Readiness:

    • Tasks are immediately executable by an agent: specific, unambiguous, and scoped.
    • Dependencies are explicit; examples of parallel groups are provided where feasible.

Task Completion Audit (Intent-Based Analysis)

  • For each task (T001-TXXX), determine completion through intent analysis:

    • Parse task description to understand the expected outcome
    • Identify target artifacts: files, directories, configurations, tests
    • Verify completion through multiple signals:
      • Primary: Does the main artifact exist?
      • Secondary: Does it contain expected functionality?
      • Tertiary: Is it integrated with the rest of the system?
  • Completion State Classification:

    • ✅ Complete: All indicators positive, meaningful implementation exists
    • 🟡 Partial: Main artifact exists but missing key elements
    • 🔶 Stub: File/directory exists but only placeholder content
    • ❌ Not Started: No evidence of implementation
  • Smart Detection Patterns by Task Type:

    • "Create X package structure in /path/" → Check: directory exists, has package.json, has src/ structure
    • "X CLI interface in /path/cli.ts" → Check: file exists, exports CLI class/function, responds to --help
    • "Contract/integration test for X" → Check: test file exists, contains describe/test blocks, tests are not skipped
    • "X model and repository in /path/" → Check: file exists, exports model class/schema, has CRUD operations
    • "X endpoints (GET/POST) in /path/routes.ts" → Check: file exists, exports route handlers, handlers have proper signatures
    • "Configure X in /config.file" → Check: config file exists, contains expected settings, valid syntax
    • "Service/middleware implementation in /path/" → Check: file exists, exports expected functions/classes, has core logic
    • "Health check endpoint in /path/" → Check: endpoint file exists, returns status information, includes service metadata
    • "Environment setup task for X" → Check: environment files exist, contain required variables, have proper format
    • "Database migration/setup in /path/" → Check: migration files exist, contain schema definitions, have proper versioning
    • "Build configuration in /path/" → Check: build config exists, has proper targets/scripts, includes all dependencies
    • "Authentication integration in /path/" → Check: auth middleware exists, handles tokens/sessions, includes error cases
    • "Logging setup/configuration in /path/" → Check: logger config exists, structured format defined, includes log levels
    • "Error handling middleware in /path/" → Check: error handler exists, catches exceptions, returns consistent format
  • Review/Fix Task Patterns:

    • "TXYZ: [Category] Summary — File: path[:line-range]" → Parse category, severity, and fix requirements from the task structure; identify the task type (initial implementation vs fix/review task).
    • For fix tasks, check whether the specific issue is resolved:
      • [Security]: Vulnerable pattern removed, secure alternative present. Examples: Math.random() → crypto.randomUUID(), plaintext → hashed
      • [Correctness]: Logic error fixed, correct implementation exists. Examples: missing null checks added, wrong calculations corrected
      • [Performance]: Optimization applied, inefficiency removed. Examples: race conditions fixed, memory leaks plugged, caching added
      • [Testing]: Tests added/fixed, coverage improved. Examples: import paths corrected, assertions added, tests not skipped
      • [API/Contract]: Endpoint compliance, response format correct. Examples: missing routes implemented, response structure matches spec
      • [Observability]: Logging/metrics added where needed. Examples: correlation IDs added, structured logging implemented
      • [Maintainability]: Code quality improved, patterns consistent. Examples: mixed imports unified, magic numbers replaced with constants
    • Line-specific verification when :line-number is provided:
      • Read the specific line range (line ± 5)
      • Verify the exact issue at that location is fixed
    • File-level verification when no line number is given:
      • Check the entire file for pattern fixes
      • Verify all instances of the issue are resolved
    • Confidence levels for fix verification:
      • ✅ High: Anti-pattern gone, fix implemented, no related TODOs
      • 🟡 Medium: Issue addressed differently, partial fix, or missing tests
      • 🔶 Low: File modified but fix unclear
      • ❌ Not Fixed: Original issue still present
  • Quality Indicators:

    • Non-trivial implementation (not just empty functions)
    • No TODO/FIXME/NOT_IMPLEMENTED comments
    • Exports match expected interface
    • For tests: Contains actual assertions
    • For configs: Not just boilerplate defaults
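
A rough sketch of how the quality indicators above might feed the completion classification. The heuristics are illustrative only; a real audit would combine several signals per task type:

```typescript
type CompletionState = "complete" | "stub" | "not-started";

// Classify one artifact's source (null = file missing) using two cheap signals:
// placeholder markers and whether anything remains after stripping // comments.
function classifyArtifact(source: string | null): CompletionState {
  if (source === null) return "not-started";
  const stripped = source.replace(/\/\/.*$/gm, "").trim();
  const placeholder = /TODO|FIXME|NOT_IMPLEMENTED|throw new Error\(["']not implemented/i;
  if (stripped.length === 0 || placeholder.test(source)) return "stub";
  return "complete";
}
```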

S-Tier Code Review (when scope established)

  • Review depth and reasoning:
    • Provide detailed reasoning for each finding; include evidence (code excerpts with file:line), impact analysis, and suggested fixes.
    • Treat the reviewer as more capable than the implementer; challenge design choices and test adequacy.
    • Classify findings: Correctness, Security, Performance, Reliability, API/Contract, Observability, Testing, Accessibility, Maintainability, Style.
  • Coverage and mapping:
    • Map findings to Architecture/UI boundaries (HOW) and Spec acceptance criteria (WHAT/WHY) to detect scope drift or boundary violations.
    • Verify tests meaningfully exercise critical paths; propose additional tests where coverage is insufficient.
  • Constitution alignment:
    • Evaluate authn/RBAC, logging, input validation, error handling, and data protection practices against CONSTITUTION.md.
  • Remediation quality:
    • For each finding, propose a concrete fix plan with minimal, safe diffs; include test additions/updates, observability hooks, and migration notes when relevant.

Write-Back Behavior (update checkboxes and append feedback to tasks.md)

  • FIRST: Update task completion status based on audit:

    • For each task, update checkbox based on completion state:
      • Change - [ ] T001 to - [x] T001 for ✅ Complete tasks
      • Keep - [ ] T002 for 🟡 Partial, 🔶 Stub, or ❌ Not Started
    • Preserve all other task text exactly as-is
  • SECOND: Add completion status report as a new section:

    ## Task Completion Status - <YYYY-MM-DD HH:MM>
    
    ### Summary
    - Total Tasks: XX
    - ✅ Completed: XX (XX%)
    - 🟡 Partial: XX (XX%)
    - 🔶 Stubs: XX (XX%)
    - ❌ Not Started: XX (XX%)
    
    ### Phase Breakdown
    - Phase 3.1 Setup: X/X complete
    - Phase 3.2 Tests: X/X complete
    - Phase 3.3 Core: X/X complete
    - Phase 3.4 Integration: X/X complete
    - Phase 3.5 Polish: X/X complete
    
    ### Completed Tasks (Evidence)
    ✅ T001: Monorepo structure created (package.json, pnpm-workspace.yaml exist)
    ✅ T014: Service locator implemented (service-locator.ts with full implementation)
    [List all completed with evidence...]
    
    ### Incomplete Tasks (Missing)
    ❌ T040: Base repository not found (/packages/shared-data/src/repositories/base-repository.ts)
    🟡 T020: Package structure partial (missing repository subdirectory)
    [List all incomplete with reasons...]
    
  • THIRD: After completion updates, write actionable feedback as additional tasks in the same tasks.md:

    • Insert a new phase section titled: ## Phase 3.<N>: Code Review Feedback from <YYYY-MM-DD HH:MM> (short local time; 24-hour)
    • Determine <N> by scanning existing headings ## Phase 3.<n>:; if none, start at 3.1.
    • Continue task numbering from the highest existing T### in the file (preserve zero-padding).
    • For each finding, add a task with this structure:
      • TXYZ: [Category] Summary — File: path[:line-range]
        • Why: concise impact rationale (user/system risk)
        • Severity: Critical | Major | Minor
        • Fix: concrete steps (tests first, then implementation)
        • Links: spec/architecture anchors, commits/PR references
    • Respect TDD: include or reference a preceding test task for each implementation fix.
  • If the file is write-protected or editing is not permitted, output a ready-to-apply patch diff instead of modifying files.
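
The numbering rule above ("continue from the highest existing T### in the file, preserve zero-padding") can be sketched as:

```typescript
// Find the highest T### in the document and return the next ID at the same width.
function nextTaskId(markdown: string): string {
  const ids = Array.from(markdown.matchAll(/\bT(\d{3,})\b/g), (m) => m[1]);
  if (ids.length === 0) return "T001";
  const width = Math.max(...ids.map((id) => id.length));
  const next = Math.max(...ids.map(Number)) + 1;
  return `T${String(next).padStart(width, "0")}`;
}
```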

Output Format

  • Summary: Ready for execution | Missing Context | Blocked by Plan | Needs Clarification | TDD Violations | Insufficient Quickstart Coverage | Missing Quickstart Integration Tests | Alignment Issues | Review Complete (XX% tasks implemented) | Review Pending (no scope).
  • Implementation Progress: XX/XX tasks complete (XX%)
    • ✅ Completed phases: [list completed phases]
    • 🚧 In-progress phases: [list partial phases]
    • ⏳ Not started phases: [list pending phases]
  • Quickstart Coverage: XX/XX scenarios covered (XX%)
    • ✅ Covered scenarios: [list scenarios with corresponding tasks]
    • ❌ Missing scenarios: [list unmapped verification steps]
    • 🔍 Recommended tests: [suggest integration tests for uncovered scenarios]
  • Gates: pass/fail for Required Context, Plan-of-Record, Unknowns, TDD Ordering, Task Completion Audit, Quickstart Verification (with notes).
  • Checklist Results: map to Structure & Completeness, Artifact Mapping, Architecture Alignment, Constitution Check, Execution Readiness, Task Completion.
  • Strengths: concise positives to preserve.
  • Gaps & Risks: findings with severity (Critical | Major | Minor), rationale, and section/file references.
  • Proposed Improvements: concrete task-level rewrites or reorderings.
  • Open Questions: any remaining items in "[NEEDS CLARIFICATION: …]" format.
  • Alignment Notes: plan/spec/architecture/Constitution consistency or conflicts.
  • Code Review Report: categorized findings with reasoning, evidence, and proposed diffs/tests.

Important

  • Non-destructive by default; if permitted, append feedback tasks to the same tasks.md as a new Phase 3.<N> section. If not permitted, emit a patch for user approval.