🤖 CrewAgent — Master AI Prompt Collection

Project: Pentest Crew — Multi-agent web application penetration testing pipeline
Stack: Python 3.10+, CrewAI, Burp Suite MCP, 30+ security tool modules
Modes: Single-agent (1 LLM key) or Multi-agent (2–8 LLM keys, 8 specialist agents)


📋 Table of Contents

  1. Master Context Prompt
  2. Code Review Prompt
  3. Architecture Review Prompt
  4. Feature / Improvement Prompt
  5. New Security Tool Creation Prompt
  6. Agent & Task Config Prompt
  7. Bug Fix & Debug Prompt
  8. Test Coverage Prompt
  9. Security Hardening Prompt
  10. Refactoring Prompt

1. Master Context Prompt

Use this as the opener for every session. Paste it first, before any other task, so the AI has full context on the codebase.

You are an expert Python developer and security engineer working on **Pentest Crew** — a multi-agent web application penetration testing pipeline.

## Project Overview
Pentest Crew is built with CrewAI + Burp Suite MCP. It reads HTTP scope/history/scanner state directly from Burp Suite via MCP (Model Context Protocol) and executes request validation through Burp tools without needing manual target input.

## Two Operating Modes
- **Single-Agent Mode** (1 API key set): One `pentester` agent runs all 4 stages sequentially with ALL 118 tools.
- **Multi-Agent Mode** (2+ API keys): 8 specialist agents in a strict sequential pipeline:
  scope_discovery_agent → http_analyst → auth_agent → fuzzing_agent
  → validation_executor → lead_pentester → exploitation_agent → report_generator

## Project Structure
src/pentest_crew/
├── main.py              # Entry point, env validation, input building
├── crew.py              # CrewAI agent/task definitions, LLM selection with fallback
├── llm_mode.py          # Single vs multi-agent detection, model override resolution
├── pipeline_gates.py    # Pre-flight gates that skip stages when preconditions not met
├── config/
│   ├── agents.yaml      # Agent roles, goals, backstories with step-by-step instructions
│   └── tasks.yaml       # Task definitions for all 8 pipeline stages
└── tools/
    ├── __init__.py              # Tool singletons + agent tool groups (ANALYST_TOOLS, EXECUTOR_TOOLS, etc.)
    ├── burp_mcp_client.py       # MCP SSE client, 60s timeout, retry logic, stealth mode
    ├── burp_output_sanitizer.py # Credential redaction, binary suppression, attack-surface extraction
    ├── burp_proxy_tools.py      # HTTP history, WebSocket, scanner, scope, intercept control
    ├── burp_request_tools.py    # HTTP/1, HTTP/2 replay, Repeater, Intruder, editor
    ├── burp_collaborator_tools.py # OOB testing, encoding, random string generation
    ├── autorize_tools.py        # Session-swap bypass detection (no Autorize plugin required)
    ├── [30+ security tool modules covering every OWASP Top 10 category]
    └── ...

## Key Tool Groups
| Group | Agent | Count | Purpose |
|---|---|---|---|
| SCOPE_DISCOVERY_TOOLS | scope_discovery_agent | 11 | robots.txt, favicon, path enum, JS analysis, DNS |
| ANALYST_TOOLS | http_analyst | 17 | HTTP history triage, scanner cross-ref |
| AUTH_TOOLS | auth_agent | 8 | Auth endpoint discovery, session extraction |
| FUZZING_TOOLS | fuzzing_agent | 9 | Parameter discovery and fuzzing |
| EXECUTOR_TOOLS | validation_executor | 105 | All vulnerability validation (SSRF, XSS, SQLi, etc.) |
| REVIEWER_TOOLS | lead_pentester | 14 | CVSS scoring, coverage gap, false positive tracking |
| EXPLOITATION_TOOLS | exploitation_agent | 16 | Post-confirmation data extraction, exploit chaining |
| REPORTER_TOOLS | report_generator | 5 | PoC generation, evidence bundler, report filter |

## Core Conventions (NEVER violate)
1. All tools are **singleton instances** in `tools/__init__.py` — never instantiate inline in crew.py
2. `BurpMCPClient` uses `_blocking_call` with 60s timeout via fresh event loop per call (see the sketch after this list)
3. `REPORTER_TOOLS` contains no Burp-backed tools — report_generator MUST NOT call Burp tools (its 5 tools in the table above are local report utilities)
4. Pipeline is always `Process.sequential` — no parallelism between tasks
5. `call_with_retry` only retries transient errors (timeout, connection refused/reset)
6. CrewAI telemetry is disabled: `CREWAI_DISABLE_TELEMETRY=true`
7. Tests mock `get_client()` — real Burp MCP is never called in tests
8. Autorize bypass uses relative body-size delta (<2%) + structural content match
9. XSS detection: exact unencoded reflection match only (no case-insensitive) to avoid FP
10. SQL UNION detection requires DB fingerprint alongside response changes
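
Convention 2 in code form: a minimal sketch of the fresh-event-loop pattern, not the actual `BurpMCPClient` internals (the function signature here is an assumption):

```python
import asyncio
from typing import Any, Callable, Coroutine

def _blocking_call(coro_factory: Callable[[], Coroutine], timeout: float = 60.0) -> Any:
    """Run an async MCP call from synchronous tool code (sketch).

    A fresh event loop per call (convention 2) means tool code never
    touches a loop that is already running somewhere else.
    """
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(asyncio.wait_for(coro_factory(), timeout=timeout))
    finally:
        loop.close()
```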

## LLM Provider Fallback Chains (Multi-Agent)
- http_analyst: Google → OpenAI → Anthropic → OpenRouter
- validation_executor: OpenAI → Anthropic → Google → OpenRouter
- lead_pentester: Anthropic → OpenAI → Google → OpenRouter
- report_generator: Anthropic → OpenAI → Google → OpenRouter

## Pipeline Gates (pipeline_gates.py)
Pre-flight checks that skip stages with no actionable work (a sketch of the gate shape follows the list):
- `scope_non_empty` → skip scope_discovery if Burp already has traffic
- `auth_endpoints_exist` → skip auth_agent if no auth tokens in history
- `parameters_exist` → skip fuzzing_agent if no parameters discovered
- `confirmed_findings_exist` → skip exploitation_agent if no confirmed findings
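
For reference, each gate follows the `(bool, reason_str)` contract described in Prompt #4. A hypothetical sketch (the real gates read Burp state via MCP; the `pipeline_state` argument is an assumption):

```python
def confirmed_findings_exist(pipeline_state: dict) -> tuple[bool, str]:
    """Gate for exploitation_agent (illustrative sketch only).

    Returns (True, reason) to run the stage, (False, reason) to skip it.
    Per Prompt #8, gates fail open: if the underlying check errors,
    return (True, reason) instead of blocking the stage.
    """
    findings = pipeline_state.get("confirmed_findings") or []
    if findings:
        return True, f"{len(findings)} confirmed finding(s) to exploit"
    return False, "no confirmed findings; skipping exploitation_agent"
```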

## Environment Variables
- LLM: GOOGLE_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY
- Burp: BURP_MCP_HOST (default: 127.0.0.1), BURP_MCP_PORT (default: 9876)
- Engagement: ENGAGEMENT_ID, TARGET_URL, CLIENT_NAME, TEST_TYPE, TESTER_NAME, REPORT_OUTPUT_DIR
- Tuning: COLLABORATOR_WAIT_SECS (default: 30)
- Stealth: STEALTH_MODE, STEALTH_MIN_DELAY_SECS, STEALTH_MAX_DELAY_SECS
- Model overrides: MODEL_PENTESTER, MODEL_ANALYST, MODEL_EXECUTOR, MODEL_REVIEWER, MODEL_REPORTER

When I ask you to work on this project, apply all of the above context automatically.

2. Code Review Prompt

Use this to ask the AI to review code quality, patterns, and potential issues.

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Code Review

Please perform a comprehensive code review of the following file(s):

**File(s) to review:**
[PASTE FILE CONTENT or specify filename]

Review against these criteria:

### 1. Correctness & Logic
- Does the logic correctly implement the intended behavior?
- Are edge cases handled (empty input, None returns, connection failures)?
- Does `_blocking_call` always use a fresh event loop (not the main thread loop)?
- Are tool verdicts (CONFIRMED / NOT_CONFIRMED / INCONCLUSIVE / NEEDS_ESCALATION) returned consistently?

### 2. Security of the Tool Itself
- Are credentials/secrets properly redacted before reaching the LLM agents?
- Does any tool accidentally leak Authorization headers, cookies, or API keys in output?
- Are destructive operations (SQLi payloads, command injection) payload-limited to avoid DoS?

### 3. Pattern Compliance (project conventions)
- Are tools defined as class-based subclasses of `BaseTool` with `name`, `description`, and `_run()`?
- Are all tool instances singletons registered in `tools/__init__.py`?
- Does `call_with_retry` handle permanent vs transient errors correctly?
- Does the tool return structured dicts, not raw strings?

### 4. Performance & Reliability
- Is there any synchronous I/O blocking the main thread?
- Are Collaborator payloads always generated BEFORE sending the request?
- Is `COLLABORATOR_WAIT_SECS` env var respected in OOB wait logic?

### 5. Test Coverage Gaps
- Which code paths are not covered by existing tests in `tests/`?
- Which mock patches are missing for reliable isolation?

**Output format:**
- Severity: CRITICAL / HIGH / MEDIUM / LOW / INFO for each finding
- File + line reference
- Explanation of the issue
- Suggested fix with code example

3. Architecture Review Prompt

Use this for an end-to-end evaluation of the pipeline's design and architecture.

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Architecture Review

Review the overall architecture of Pentest Crew across these dimensions:

### 1. Pipeline Integrity
- Analyze the 8-stage sequential pipeline (scope_discovery → ... → report_generator).
- Identify any stages where context bleed between agents could cause issues (e.g., validation_executor receiving unverified candidates from fuzzing_agent).
- Are the pipeline gates in `pipeline_gates.py` comprehensive enough? Are there missing preconditions?
- What happens if a mid-pipeline agent crashes? Is there recovery logic?

### 2. Tool Group Isolation
- Review whether each agent's tool group is properly scoped (no unnecessary capabilities exposed).
- Flag any tools that should NOT be in EXECUTOR_TOOLS (105 tools) for security isolation reasons.
- Is REPORTER_TOOLS intentionally free of Burp-backed tools — and is that invariant enforced anywhere?

### 3. LLM Fallback Chain
- Review `llm_mode.py` — are provider fallback chains resilient to partial key availability?
- What happens when the preferred provider for a role is unavailable at runtime?
- Is there a risk of a weaker model (e.g., free OpenRouter tier) handling high-stakes validation tasks?

### 4. MCP Client Architecture
- Review `burp_mcp_client.py` singleton pattern: thread safety, event loop lifecycle, retry logic.
- Is the 60s timeout per call appropriate for all tool types? (e.g., Collaborator polling may need longer)
- What is the failure mode if Burp MCP disconnects mid-pipeline?

### 5. Output Sanitizer
- Review `burp_output_sanitizer.py` for completeness of redaction.
- Could any binary data or credential pattern slip through to LLM context?
- Is the `_TEXT_PREVIEW_LIMIT = 2000` and `_BODY_PREVIEW_LIMIT = 1200` appropriate for all response types?

### 6. Single-Agent vs Multi-Agent Parity
- Are there tool behaviors that differ between single-agent (ALL_TOOLS, 118) and multi-agent modes?
- Could context overflow in single-agent mode cause truncation of later pipeline stages?

Provide architectural recommendations with justification for each concern, prioritized by impact.

4. Feature / Improvement Prompt

Use this to add a new feature or improve an existing one.

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Feature Implementation

### Feature Request
[DESCRIBE THE FEATURE — e.g., "Add WebAuthn/FIDO2 bypass testing capability" or "Add rate-limit detection to fuzzing_agent"]

### Scope of Changes Expected
Based on the project architecture, implement this feature following these rules:

1. **New Tool Module** (if a new tool category):
   - Create `src/pentest_crew/tools/<feature>_tools.py`
   - Each tool must subclass `BaseTool` with `name`, `description`, `_run(self, **kwargs)`
   - Return structured dict: `{"status": "...", "findings": [...], "raw": "..."}`
   - Handle the case when Burp MCP returns empty or error — never crash

2. **Tool Registration** (always required when adding tools; see the sketch after this list):
   - Add singleton instances to `tools/__init__.py`
   - Add to the appropriate tool group constant (EXECUTOR_TOOLS, REVIEWER_TOOLS, etc.)
   - If creating a new group, add it to `__all__` and document the agent it belongs to

3. **TOOL_CATEGORIES Update** (if adding validation category):
   - Register the new category in `TOOL_CATEGORIES` dict in `tools/__init__.py`
   - Map category name → list of tool instances

4. **agents.yaml / tasks.yaml Update**:
   - If the feature changes an agent's role or capabilities, update backstory/goal
   - If a new pipeline stage is needed, add to tasks.yaml with clear context_from_previous

5. **Pipeline Gate** (if the feature requires a precondition):
   - Add a gate function to `pipeline_gates.py` returning `(bool, reason_str)`
   - Register it in `PIPELINE_GATES` dict

6. **Tests**:
   - Create `tests/test_<feature>.py`
   - Mock `get_client()` — never call real Burp MCP
   - Test: happy path, empty response, connection failure, malformed input
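
As a reference for step 2, registration in `tools/__init__.py` might look like the fragment below. This is a sketch using the HPP example from Prompt #5; the module, class, and instance names are assumptions:

```python
# tools/__init__.py (fragment): hypothetical HPP tool registration
from pentest_crew.tools.hpp_tools import HPPBasicTestTool, HPPFullTestTool

# Convention 1: singleton instances, never instantiated inline in crew.py
hpp_basic_test = HPPBasicTestTool()
hpp_full_test = HPPFullTestTool()

# Step 2: add to the owning agent's tool group constant
EXECUTOR_TOOLS += [hpp_basic_test, hpp_full_test]

# Step 3: register the category if it is new
TOOL_CATEGORIES["http_parameter_pollution"] = [hpp_basic_test, hpp_full_test]
```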

### Design Constraints
- Do not add new mandatory environment variables without updating `.env.example`
- Do not add any Burp-backed tools to REPORTER_TOOLS
- Keep tools idempotent — calling the same tool twice should not cause side effects
- Payloads must be bounded — no unbounded fuzzing loops

Please implement the feature end-to-end with all required changes.

5. New Security Tool Creation Prompt

Use this specifically when you want to add a new security tool (a new vulnerability class).

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Create New Security Tool

### Vulnerability Class
[SPECIFY — e.g., "HTTP Parameter Pollution (HPP)", "Insecure Deserialization", "OAuth Token Leakage"]

### Template to Follow

Create a new file: `src/pentest_crew/tools/<vuln_name>_tools.py`

Use this exact structure:

```python
"""
<vuln_name>_tools.py
────────────────────
<One-paragraph description of what these tools test and why.>
"""
from __future__ import annotations
from typing import Any
from crewai.tools import BaseTool
from pentest_crew.tools.burp_mcp_client import get_client


class <VulnName>BasicTestTool(BaseTool):
    name: str = "<vuln_name>_basic_test"
    description: str = (
        "Tests for <Vulnerability>. "
        "Input: target_url (str), parameter (str), method (str, default POST). "
        "Returns dict with status, verdict, evidence, request_sent, response_received."
    )

    def _run(self, target_url: str, parameter: str, method: str = "POST", **kwargs: Any) -> dict:
        client = get_client()
        
        # Step 1: Get baseline
        baseline = client.call_with_retry("send_http1_request", {
            "url": target_url,
            "method": method,
            "parameters": {parameter: "baseline_value"},
        })
        
        # Step 2: Send test payload(s) — BOUNDED, max 5 iterations
        payloads = self._get_payloads(parameter)  # max 5 payloads
        results = []
        for payload in payloads[:5]:
            resp = client.call_with_retry("send_http1_request", {
                "url": target_url,
                "method": method,
                "parameters": {parameter: payload},
            })
            verdict = self._analyze(baseline, resp)
            if verdict == "CONFIRMED":
                return {
                    "status": "CONFIRMED",
                    "finding": f"<Vulnerability> at {target_url} param={parameter}",
                    "payload": payload,
                    "evidence": resp,
                }
        
        return {"status": "NOT_CONFIRMED", "tested_payloads": len(payloads)}

    def _get_payloads(self, parameter: str) -> list[str]:
        # Return bounded list of test payloads specific to this vulnerability
        return []

    def _analyze(self, baseline: dict, response: dict) -> str:
        # Compare baseline vs mutated response
        # Return: CONFIRMED / NOT_CONFIRMED / INCONCLUSIVE
        return "NOT_CONFIRMED"

### Required Deliverables
1. The full tool module with at least 3 tool classes (basic, advanced/OOB, full_test)
2. Registration in `tools/__init__.py` (import + singleton + group assignment)
3. Entry in `TOOL_CATEGORIES` if this is a new category
4. Test file `tests/test_<vuln_name>_tools.py` with at minimum:
   - `test_basic_tool_confirmed()` — mock returns exploitable response
   - `test_basic_tool_not_confirmed()` — mock returns normal response
   - `test_connection_failure()` — mock raises ConnectionError

### Quality Gates
- All payloads must be minimal PoC (no destructive, no DoS)
- OOB tests must use `generate_collaborator_payload` before sending the request
- Return consistent structure: `{"status": str, "finding": str, "evidence": dict}`
- Never hardcode secrets or target IPs

6. Agent & Task Config Prompt

Use this to modify or add a new agent in agents.yaml / tasks.yaml.

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Update Agent / Task Configuration

### What to Change
[DESCRIBE — e.g., "Add a new mobile_api_agent focused on Android/iOS API testing" or "Improve auth_agent backstory to better handle OAuth 2.0 PKCE flows"]

### agents.yaml Template

```yaml
<agent_name>:
  role: <Descriptive Role Title>
  
  goal: |
    <One clear statement of what this agent must accomplish for engagement {engagement_id}.>
    You MUST call tools before drawing conclusions.
    Do NOT produce findings without observable evidence from tool calls.

  backstory: |
    ## Your Identity
    <2-3 sentences describing expertise and methodology.>
    
    ## Step-by-Step Execution (follow in order)
    STEP 1 — <First action, always a tool call>
    STEP 2 — <Second action>
    ...
    STEP N — Produce structured output for the next pipeline stage.
    
    ## Escalation
    If a case cannot be tested (missing permissions, unsupported protocol, rate-limited):
    - Mark verdict as NEEDS_ESCALATION
    - Document what was attempted and why it could not be completed
    
    ## Output Format
    Return a JSON-serializable dict:
    {
      "agent": "<agent_name>",
      "engagement_id": "{engagement_id}",
      "findings": [...],
      "candidates_for_next_stage": [...],
      "skipped": [...]
    }
```

### tasks.yaml Template

```yaml
<task_name>:
  description: |
    <Clear, directive description of what to accomplish.>
    Engagement: {engagement_id} | Target: {target_url} | Client: {client_name}
    
    ## Required Actions
    1. <First required action>
    2. <Second required action>
    
    ## Output Requirements
    - List only CONFIRMED or INCONCLUSIVE findings
    - Include: endpoint, parameter, payload, response delta, CVSS score estimate
    - Do NOT include speculative findings without evidence

  expected_output: |
    A structured Markdown report with:
    ## Summary
    ## Findings (CONFIRMED)
    ## Candidates for Next Stage
    ## Untested / Escalated
    
  agent: <agent_name>
```

### Rules When Modifying YAML Configs
1. Never remove `{engagement_id}` template variable — it's interpolated at runtime
2. Agent backstories must include explicit "call tools first" instruction
3. Tasks must specify `expected_output` clearly so the next stage agent knows what to parse
4. If adding a new agent to multi-agent pipeline, also:
   - Add corresponding LLM fallback chain in `crew.py`
   - Add tool group constant in `tools/__init__.py`
   - Add pipeline gate in `pipeline_gates.py` if there's a precondition
   - Update README.md Agent Reference table

Please implement the requested changes and explain the reasoning for each decision.

7. Bug Fix & Debug Prompt

Use this when there is an error or unexpected behavior.

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Debug and Fix

### Bug Report

**Symptom:**
[DESCRIBE WHAT'S HAPPENING — e.g., "validation_executor crashes with KeyError: 'statusCode' on some Burp responses"]

**Error / Stack Trace:**

[PASTE FULL TRACEBACK HERE]

**Reproduction Steps:**
1. [Step 1]
2. [Step 2]

**Expected Behavior:**
[WHAT SHOULD HAPPEN]

**Affected File(s):**
[PASTE FILE CONTENT OR FILENAME]

### Debug Checklist — Apply These First

1. **MCP Response Key Access**: Always use `response.get("statusCode") or response.get("status")` — never bare dict access. Check `_get_status()` helper in autorize_tools.py for the canonical pattern (a sketch follows this checklist).

2. **Event Loop Safety**: `_blocking_call` creates a fresh event loop per call. If the error is `RuntimeError: This event loop is already running`, a coroutine is being called from within an async context. Use `asyncio.run_coroutine_threadsafe()` instead.

3. **Empty Burp Responses**: Burp MCP returns `{}` or `{"data": ""}` when history/scope is empty. All tools must handle this gracefully — return `{"status": "NO_DATA", "reason": "..."}`, never raise.

4. **Singleton Import Order**: If getting `ImportError` or `AttributeError` from `tools/__init__.py`, check for circular imports — all tool files must import `get_client` from `burp_mcp_client`, never from `tools/__init__`.

5. **YAML Interpolation**: If seeing `KeyError: 'engagement_id'` in CrewAI, the `{engagement_id}` template in agents.yaml/tasks.yaml must match the key in `inputs` dict from `main.py`.

6. **Test Isolation**: Tests must `patch("pentest_crew.tools.<module>.get_client")` — patching the wrong module path means the real client gets called.
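
For checklist item 1, the tolerant-access helper plausibly looks like this (a sketch, not the actual `_get_status()` from autorize_tools.py):

```python
def _get_status(response: dict) -> int | None:
    """Extract the HTTP status from an MCP response tolerantly (sketch).

    Burp MCP responses do not guarantee a single key name, so fall back
    across known variants instead of using bare dict access.
    """
    if not isinstance(response, dict):
        return None
    status = response.get("statusCode")
    if status is None:
        status = response.get("status")
    try:
        return int(status) if status is not None else None
    except (TypeError, ValueError):
        return None
```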

Please:
1. Identify the root cause
2. Show the minimal fix
3. Add a regression test that would have caught this bug
4. Flag any related code paths with the same vulnerability

8. Test Coverage Prompt

Use this to add or improve test coverage.

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Improve Test Coverage

### Target File(s) for Testing
[SPECIFY — e.g., "src/pentest_crew/tools/jwt_security_tools.py" or "src/pentest_crew/pipeline_gates.py"]

### Test Writing Rules for This Project

1. **Always mock the MCP client — never hit real Burp:**
```python
from unittest.mock import MagicMock, patch

@patch("pentest_crew.tools.jwt_security_tools.get_client")
def test_jwt_none_bypass_confirmed(mock_get_client):
    mock_client = MagicMock()
    mock_get_client.return_value = mock_client
    mock_client.call_with_retry.return_value = {
        "statusCode": 200,
        "body": '{"user": "admin", "role": "admin"}'
    }
    # ... rest of test
```

2. **Required test scenarios for each tool:**
   - ✅ Happy path: tool returns CONFIRMED with mock exploitable response
   - ✅ Not confirmed: tool returns NOT_CONFIRMED with normal baseline response
   - ✅ Empty/no data: Burp returns `{}` — tool must not crash
   - ✅ Connection failure: `get_client()` raises `ConnectionRefusedError` — tool returns error dict
   - ✅ Malformed response: Burp returns unexpected schema — tool handles gracefully
   - ✅ Rate limit / 429 response: tool recognizes and marks INCONCLUSIVE

3. **Required test scenarios for pipeline gates:**
   - Gate returns `(True, reason)` when precondition met
   - Gate returns `(False, reason)` when precondition not met
   - Gate returns `(True, reason)` when MCP call fails (fail-open, not fail-closed)

4. **Assertions to always include:**
   - `result["status"]` is one of: CONFIRMED, NOT_CONFIRMED, INCONCLUSIVE, NEEDS_ESCALATION, NO_DATA, ERROR
   - Tool never returns `None`
   - Tool never raises unhandled exception

5. **Test file naming:** `tests/test_<module_name>.py`
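
Putting rules 1-5 together, a complete test pair might look like this. The sketch assumes a hypothetical `sql_injection_tools` module exporting `SQLUnionTestTool` with the standard `_run()` contract:

```python
# tests/test_sql_injection_tools.py (sketch; module and class names are assumptions)
from unittest.mock import MagicMock, patch

from pentest_crew.tools.sql_injection_tools import SQLUnionTestTool

VALID_STATUSES = {
    "CONFIRMED", "NOT_CONFIRMED", "INCONCLUSIVE",
    "NEEDS_ESCALATION", "NO_DATA", "ERROR",
}


@patch("pentest_crew.tools.sql_injection_tools.get_client")
def test_empty_burp_response_does_not_crash(mock_get_client):
    """Rule 2: Burp returns {}; the tool must not crash or return None."""
    mock_client = MagicMock()
    mock_get_client.return_value = mock_client
    mock_client.call_with_retry.return_value = {}

    result = SQLUnionTestTool()._run(target_url="https://example.com/search", parameter="q")

    assert result is not None
    assert result["status"] in VALID_STATUSES


@patch("pentest_crew.tools.sql_injection_tools.get_client")
def test_connection_failure_returns_error_dict(mock_get_client):
    """Rule 2: connection refused; the tool returns an error dict, never raises."""
    mock_get_client.side_effect = ConnectionRefusedError("Burp MCP down")

    result = SQLUnionTestTool()._run(target_url="https://example.com/search", parameter="q")

    assert result["status"] in {"ERROR", "INCONCLUSIVE", "NO_DATA"}
```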

### Generate Tests For
[SPECIFY WHICH SCENARIOS / METHODS TO TEST]

Please generate complete, runnable test code with all imports. Each test must be isolated and not depend on execution order.

9. Security Hardening Prompt

Use this for a security audit of the tool itself — making sure the pentest tool cannot be abused.

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Security Hardening Audit

Review the Pentest Crew codebase for security weaknesses IN THE TOOL ITSELF (meta-security audit).

### 1. Credential Leakage to LLM Context
Audit `burp_output_sanitizer.py`:
- Are ALL of these header patterns redacted: Authorization, Cookie, Set-Cookie, X-API-Key, Bearer, JWT, PHPSESSID, session_id, API-Key, X-Auth-Token?
- Is `_SENSITIVE_HEADER_RE` regex comprehensive and case-insensitive?
- What happens with custom auth headers like `X-Internal-Token` or `Tenant-Secret`?
- Are response body patterns (JSON fields `"password"`, `"secret"`, `"apiKey"`) also redacted?
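
As a baseline for this part of the audit, a case-insensitive header-redaction pass along these lines is the minimum bar (a sketch, not the project's actual `_SENSITIVE_HEADER_RE`):

```python
import re

# Sketch only; the real pattern lives in burp_output_sanitizer.py
_SENSITIVE_HEADER_RE = re.compile(
    r"^(authorization|proxy-authorization|cookie|set-cookie"
    r"|x-api-key|api-key|x-auth-token)\s*:\s*.+$",
    re.IGNORECASE | re.MULTILINE,
)

def redact_headers(raw_headers: str) -> str:
    """Replace sensitive header values, keeping the header name for context."""
    return _SENSITIVE_HEADER_RE.sub(lambda m: f"{m.group(1)}: [REDACTED]", raw_headers)
```

The audit should verify the real regex also catches value-level patterns (Bearer/JWT tokens, PHPSESSID and session_id cookies) and custom names like `X-Internal-Token`, which a fixed header-name list misses.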

### 2. Payload Injection (Prompt Injection via Burp Responses)
- Could a malicious web server return a crafted HTTP response containing "Ignore previous instructions" or similar prompt injection payloads that manipulate the LLM agents?
- Is there any sanitization of HTTP response bodies before they reach the agent's context?
- Review `burp_output_sanitizer.py` `_TEXT_PREVIEW_LIMIT` and `_BODY_PREVIEW_LIMIT` — do they prevent oversized injections?

### 3. Path Traversal in Evidence/Report Storage
Review `evidence_capture_tools.py` and `fp_tracker_tools.py`:
- Is `engagement_id` sanitized before use in file paths like `evidence_store/{engagement_id}_*.json`?
- Could a crafted `ENGAGEMENT_ID=../../../../etc/passwd` cause path traversal?
- Are file writes bounded (no unbounded evidence bundle size)?
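
The fix the audit should expect here is an allowlist check plus containment validation before any write. A minimal sketch:

```python
import re
from pathlib import Path

_SAFE_ENGAGEMENT_ID = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def evidence_path(engagement_id: str, suffix: str = "findings") -> Path:
    """Build evidence_store/{engagement_id}_{suffix}.json safely (sketch).

    Rejects IDs like '../../../../etc/passwd' before they touch the
    filesystem, then confirms the resolved path stays inside the store.
    """
    if not _SAFE_ENGAGEMENT_ID.fullmatch(engagement_id):
        raise ValueError(f"unsafe ENGAGEMENT_ID: {engagement_id!r}")
    base = Path("evidence_store").resolve()
    path = (base / f"{engagement_id}_{suffix}.json").resolve()
    if base not in path.parents:
        raise ValueError("evidence path escapes evidence_store/")
    return path
```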

### 4. SSRF via Stealth Mode
Review `burp_mcp_client.py` stealth mode (UA rotation):
- Does the stealth mode call `set_project_options` to inject User-Agents? Could a malicious UA string cause issues?
- Is `BURP_MCP_HOST` validated to only allow localhost/127.0.0.1?

### 5. Scope Enforcement
- Is there any check that `target_url` in `.env` matches the actual Burp scope?
- Could an agent be manipulated to test out-of-scope targets via prompt injection in HTTP responses?

### 6. Log Security
- Does `logs/pentest_crew_log.txt` contain redacted credentials? Or are raw requests/responses logged?
- Is log rotation implemented to prevent disk exhaustion?

For each finding, provide:
- Severity: CRITICAL / HIGH / MEDIUM / LOW
- Attack scenario
- Code location
- Recommended fix with code

10. Refactoring Prompt

Use this to refactor code to be more maintainable, DRY, or idiomatic Python.

[PASTE MASTER CONTEXT PROMPT FIRST]

## Task: Refactoring

### Refactoring Goal
[DESCRIBE — e.g., "Eliminate code duplication across sql_injection_tools.py, xss_bypass_tools.py, and ssrf_tools.py — they all have similar _run() boilerplate" or "Extract common HTTP baseline+mutate+compare pattern into a reusable mixin"]

### Refactoring Principles for This Project

1. **Keep tools as BaseTool subclasses** — do not change the API contract
2. **Preserve singleton pattern** — after refactoring, `tools/__init__.py` must still export the same singleton names
3. **No behavior changes** — existing tests must still pass unchanged
4. **Extract common patterns** into a shared base class or utility function in `tools/_base.py` (create if needed)

### Common Duplication Patterns to Address

**Pattern A — Baseline + Mutate + Compare:**
```python
# BEFORE (repeated in 20+ tools):
baseline = client.call_with_retry("send_http1_request", {...baseline params...})
mutated = client.call_with_retry("send_http1_request", {...mutated params...})
if mutated.get("statusCode") != baseline.get("statusCode"):
    return {"status": "CONFIRMED", ...}

# AFTER (extracted utility):
from pentest_crew.tools._base import baseline_mutate_compare
result = baseline_mutate_compare(client, url, param, payload, method)
```
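
What the extracted utility might look like: a sketch of `tools/_base.py` that mirrors the boilerplate above, defaulting to a status-code delta with a hook for tool-specific comparators:

```python
# tools/_base.py: sketch of the Pattern A extraction
from typing import Callable

def _status_differs(baseline: dict, mutated: dict) -> bool:
    return baseline.get("statusCode") != mutated.get("statusCode")

def baseline_mutate_compare(
    client,
    url: str,
    param: str,
    payload: str,
    method: str = "POST",
    differs: Callable[[dict, dict], bool] = _status_differs,
) -> dict:
    """Send baseline + mutated requests and compare (shared boilerplate)."""
    baseline = client.call_with_retry("send_http1_request", {
        "url": url, "method": method, "parameters": {param: "baseline_value"},
    })
    mutated = client.call_with_retry("send_http1_request", {
        "url": url, "method": method, "parameters": {param: payload},
    })
    status = "CONFIRMED" if differs(baseline, mutated) else "NOT_CONFIRMED"
    return {
        "status": status,
        "evidence": {"baseline": baseline, "mutated": mutated, "payload": payload},
    }
```

Per convention 10, tools like SQL UNION detection should pass a stricter `differs` comparator (e.g., requiring a DB fingerprint) rather than rely on the status delta alone.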

**Pattern B — OOB Collaborator flow:**
```python
# Repeated in ssrf_tools, xxe_tools, command_injection_tools:
collab = client.call_with_retry("generate_collaborator_payload", {})
# ... send request with collab payload ...
interactions = client.call_with_retry("poll_collaborator_with_wait", {...})
```

**Pattern C — Standard tool return dict:**
```python
# Inconsistent across tools — standardize to:
{
  "status": "CONFIRMED" | "NOT_CONFIRMED" | "INCONCLUSIVE" | "NEEDS_ESCALATION" | "NO_DATA" | "ERROR",
  "finding": str,        # human-readable finding description
  "evidence": dict,      # raw request/response pair
  "cvss_hint": str,      # optional: e.g. "High (8.1)"
  "wstg_id": str,        # optional: e.g. "WSTG-INPV-04"
}
```
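
One way to enforce Pattern C is a small constructor in the same `tools/_base.py`. A sketch:

```python
_VALID_STATUSES = {
    "CONFIRMED", "NOT_CONFIRMED", "INCONCLUSIVE",
    "NEEDS_ESCALATION", "NO_DATA", "ERROR",
}

def make_result(status: str, finding: str = "", evidence: dict | None = None,
                cvss_hint: str = "", wstg_id: str = "") -> dict:
    """Build the standardized tool return dict (Pattern C sketch)."""
    if status not in _VALID_STATUSES:
        raise ValueError(f"invalid status: {status}")
    return {
        "status": status,
        "finding": finding,
        "evidence": evidence or {},
        "cvss_hint": cvss_hint,
        "wstg_id": wstg_id,
    }
```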

### Deliverables
1. Refactored code with extracted utility/base class
2. Updated `tools/__init__.py` if new modules added
3. Confirmation that all existing tests still pass (no behavior changes)
4. Brief explanation of what was extracted and why

Apply changes incrementally — one pattern at a time — so each step can be reviewed independently.

💡 Prompt Usage Tips

Recommended Order for a New Session

  1. Paste Prompt #1 (Master Context) → make sure the AI understands the architecture
  2. Run the specific task (Review, Feature, Bug Fix, etc.)
  3. For large changes, finish with Prompt #8 (Test Coverage)

Powerful Prompt Combinations

| Goal | Combination |
|---|---|
| Add a new vulnerability class | #1 → #5 → #8 |
| Review and fix existing code | #1 → #2 → #7 → #8 |
| Full security audit | #1 → #3 → #9 |
| Code optimization & cleanup | #1 → #2 → #10 → #8 |
| Add a new agent to the pipeline | #1 → #6 → #4 → #8 |

Variables to Replace

Every prompt uses [BRACKET] placeholders that must be replaced:

  • [PASTE MASTER CONTEXT PROMPT FIRST] → paste the contents of Prompt #1 verbatim
  • [PASTE FILE CONTENT or specify filename] → the content of the file under review
  • [DESCRIBE THE FEATURE] → a description of the desired feature
  • [SPECIFY] → the file name or vulnerability class

Generated for Pentest Crew — CrewAI + Burp Suite MCP multi-agent penetration testing pipeline
