---
name: vibe-pentest-ai-security-testing
description: AI-powered automated penetration testing tool using multi-agent architecture for web applications, APIs, and admin panels
triggers:
  - run automated penetration test on web application
  - perform AI-based security testing with vibe-pentest
  - execute multi-agent black box security scan
  - scan web app for vulnerabilities using AI agents
  - conduct automated pentest with business logic testing
  - generate security assessment report with vibe-pentest
  - test web application security using AI automation
  - perform comprehensive web vulnerability scan
---

# vibe-pentest-ai-security-testing

> Skill by [ara.so](https://ara.so) — Security Skills collection.

Vibe Pentest is an AI Agent-based automated penetration testing tool that uses a multi-agent parallel execution architecture to perform comprehensive black-box penetration testing (including business logic vulnerability assessment) on web applications, APIs, and admin backends. It outputs stable and reliable security reports with actionable remediation recommendations.

## Overview

Vibe Pentest orchestrates multiple AI agents to:
- Fingerprint web technologies and frameworks
- Crawl and map attack surfaces
- Execute parallel vulnerability testing across multiple categories
- Identify business logic flaws
- Generate comprehensive security reports in JSON, HTML, and DOCX formats

**Version**: v1.0.7  
**License**: AGPL-3.0  
**Primary Language**: Python

## Installation

### Prerequisites

1. **Git** (required for auto-update mechanism):
```bash
# Clone from Gitee (recommended for better access)
git clone https://gitee.com/ok-helloworld/vibe-pentest
cd vibe-pentest

# Or from GitHub
git clone https://github.com/ok-helloworld/vibe-pentest
cd vibe-pentest
```

2. **Python 3.10+** and dependencies:
```bash
pip install playwright python-docx matplotlib requests urllib3 httpx charset-normalizer chardet
playwright install chromium
```

3. **Katana Crawler** (included for Windows, download for other OS):
   - Windows version included in `tools/katana`
   - For other OS, see `tools/katana_downloads.json` for download links

### Automated Installation

You can also ask your AI coding agent to install everything:

```text
Install vibe-pentest skill including all runtime dependencies from: https://gitee.com/ok-helloworld/vibe-pentest
```

## Project Structure

```
vibe-pentest/
├── scripts/              # Core testing scripts
│   ├── run_katana.py    # Crawler wrapper
│   ├── prepare_agent_findings.py  # Multi-agent orchestration
│   ├── generate_report.py         # Report generation
│   └── ...
├── tools/               # External tools (katana, etc.)
├── workspace/           # Test outputs (created during execution)
│   ├── sessions/        # Browser session data
│   ├── findings/        # Vulnerability findings
│   └── report_result/   # Final reports
└── prompts/             # AI prompt templates
```

## Core Testing Workflow

Vibe Pentest follows a workflow of seven main phases (0 through 6), with intermediate sub-phases (0.5, 4.5, 5.5, and 5.6):

### Phase 0: Fingerprinting
Identify web technologies, frameworks, and server information.

```python
# Example: Running fingerprint detection
import subprocess
import json

result = subprocess.run(
    ["python", "scripts/fingerprint.py", "--url", "https://example.com"],
    capture_output=True,
    text=True
)

fingerprint = json.loads(result.stdout)
print(f"Detected: {fingerprint.get('framework')}, {fingerprint.get('server')}")
```
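
As a rough illustration of the kind of signal fingerprinting draws on, the sketch below reads two common response headers. `fingerprint_from_headers` is a hypothetical helper, not part of vibe-pentest, and a real fingerprint would combine many more indicators.

```python
def fingerprint_from_headers(headers):
    """Guess server and framework hints from HTTP response headers (hypothetical helper)."""
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "server": lowered.get("server"),
        "framework": lowered.get("x-powered-by"),
    }


print(fingerprint_from_headers({"Server": "nginx/1.24.0", "X-Powered-By": "Express"}))
```

The returned keys match the `framework` and `server` fields read in the example above.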

### Phase 0.5: Backend Entry Scanning
Scan for admin panels and sensitive endpoints.
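
One way to picture this phase is as building a list of candidate URLs to probe. The sketch below is an assumption about how that could look; `candidate_admin_urls` and the short path list are illustrative and do not come from the vibe-pentest scripts.

```python
from urllib.parse import urljoin

# Illustrative wordlist only; a real scan would use a much larger list.
CANDIDATE_PATHS = ["admin/", "login", "manage/", "console/"]


def candidate_admin_urls(base_url, paths=CANDIDATE_PATHS):
    """Build absolute candidate URLs under base_url (hypothetical helper)."""
    # Ensure a trailing slash so urljoin appends instead of replacing the last segment.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, path) for path in paths]


for url in candidate_admin_urls("https://example.com"):
    print(url)
```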

### Phase 1: Authorization Confirmation
Verify written authorization before proceeding.
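
A simple guard can make this check explicit in code. The following is a minimal sketch, assuming the written authorization lists the hostnames in scope; both function names are hypothetical.

```python
from urllib.parse import urlparse


def is_in_scope(target_url, authorized_hosts):
    """Return True only if the target's hostname is explicitly authorized."""
    host = (urlparse(target_url).hostname or "").lower()
    return host in {h.lower() for h in authorized_hosts}


def require_authorization(target_url, authorized_hosts):
    """Raise before any testing starts if the target is out of scope."""
    if not is_in_scope(target_url, authorized_hosts):
        raise PermissionError(f"{target_url} is not covered by the written authorization")


require_authorization("https://example.com/login", ["example.com"])
```

Failing closed with an exception keeps later phases from running against a host that was never authorized.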

### Phase 2: Browser Login & Credential Extraction
Launch browser for manual login, extract session cookies/tokens.

```python
# Example: Browser session extraction
from playwright.sync_api import sync_playwright

def extract_session(target_url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        
        page.goto(target_url)
        input("Press Enter after logging in manually...")
        
        # Extract cookies and local storage
        cookies = context.cookies()
        storage = page.evaluate("() => Object.assign({}, localStorage)")
        
        browser.close()
        return {"cookies": cookies, "storage": storage}
```

### Phase 3: Katana Crawling
Use Katana crawler to discover all endpoints and parameters.

```python
# Example: Running Katana crawler via script
import json
import subprocess
import time

# Run crawler (must execute outside sandbox)
# Output is discarded: an unread PIPE can fill up and stall the child process
proc = subprocess.Popen(
    ["python", "scripts/run_katana.py", 
     "--url", "https://example.com",
     "--cookies", "session=abc123"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL
)

# Monitor for max 20 minutes
timeout = 1200
start_time = time.time()
while proc.poll() is None and (time.time() - start_time) < timeout:
    time.sleep(10)

if proc.poll() is None:
    proc.terminate()
    time.sleep(5)  # Wait for results to flush

# Read crawl results
with open("workspace/crawl_summary.json") as f:
    crawl_data = json.load(f)
```

### Phase 4: Data Cleaning
Process crawler output, deduplicate URLs, extract parameters.
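
The cleaning step can be sketched as follows. This is an illustration under the assumption that the crawler yields a flat list of URLs; `clean_urls` is a hypothetical helper and not the logic of `scripts/clean_crawl_data.py`.

```python
from urllib.parse import parse_qsl, urlsplit


def clean_urls(urls):
    """Deduplicate URLs by endpoint; return endpoint -> sorted parameter names."""
    endpoints = {}
    for url in urls:
        parts = urlsplit(url.strip())
        if not parts.scheme or not parts.netloc:
            continue  # skip relative or malformed entries
        endpoint = f"{parts.scheme}://{parts.netloc}{parts.path or '/'}"
        names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
        # Merge parameter names seen across all variants of the same endpoint.
        endpoints.setdefault(endpoint, set()).update(names)
    return {endpoint: sorted(names) for endpoint, names in endpoints.items()}


print(clean_urls([
    "https://example.com/item?id=1",
    "https://example.com/item?id=2&sort=asc",
]))
```

Keying on scheme, host, and path collapses URLs that differ only in parameter values, while the union of parameter names is kept for later testing.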

### Phase 4.5: Attack Surface Mapping
Map discovered endpoints to vulnerability test categories.
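
A keyword heuristic gives a feel for how such a mapping might work. The rules below are invented for illustration; the category names follow the six agents described in Phase 5, and `map_attack_surface` is a hypothetical function rather than the behavior of `scripts/map_attack_surface.py`.

```python
# Hypothetical keyword rules, keyed by agent category.
RULES = {
    "auth": ("login", "logout", "password", "token"),
    "upload": ("upload", "file", "download"),
    "api": ("/api/", "graphql"),
}


def map_attack_surface(endpoints):
    """Assign each endpoint (url -> parameter names) to test categories."""
    surface = {}
    for url, params in endpoints.items():
        lowered = url.lower()
        categories = {cat for cat, words in RULES.items() if any(w in lowered for w in words)}
        if params:
            # Any parameterized endpoint is a candidate for injection and IDOR checks.
            categories.update({"injection", "logic"})
        # Endpoints that match nothing still get an information-disclosure pass.
        surface[url] = sorted(categories) or ["info"]
    return surface


print(map_attack_surface({"https://example.com/api/users": ["id"]}))
```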

### Phase 5: Multi-Agent Parallel Testing
Distribute testing across 6 specialized agents using prepared skeleton files.

```python
# Example: Preparing agent findings skeleton
import subprocess
import json

# Generate skeleton findings for 6 agents
subprocess.run([
    "python", "scripts/prepare_agent_findings.py",
    "--targets", "workspace/targets.txt",
    "--fingerprint", "workspace/fingerprint.json",
    "--output", "workspace/findings"
])

# Each agent gets assigned specific test categories:
# Agent 1: Authentication & Authorization
# Agent 2: Injection Attacks (SQLi, XSS, etc.)
# Agent 3: Business Logic & IDOR
# Agent 4: File Upload & Path Traversal
# Agent 5: API Security & Rate Limiting
# Agent 6: Information Disclosure & Misconfigurations
```

### Phase 5.5: Attack Chain Analysis
Identify cross-agent attack chains and compound vulnerabilities.
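
One simple reading of "cross-agent" is that several agents reported issues on the same endpoint. The sketch below groups findings that way; the `agent`, `endpoint`, and `title` fields are assumed for illustration and may not match the actual findings schema.

```python
from collections import defaultdict


def find_attack_chains(findings):
    """Group findings by endpoint and keep endpoints reported by more than one agent."""
    by_endpoint = defaultdict(list)
    for finding in findings:
        by_endpoint[finding["endpoint"]].append(finding)

    chains = []
    for endpoint, group in by_endpoint.items():
        agents = sorted({f["agent"] for f in group})
        if len(agents) > 1:
            chains.append({
                "endpoint": endpoint,
                "agents": agents,
                "titles": [f["title"] for f in group],
            })
    return chains
```

A real analysis would also reason about whether one finding enables another; co-location on an endpoint is only a starting point for manual review.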

### Phase 5.6: Evidence Verification
Re-verify confirmed vulnerabilities with HTTP evidence.

```python
# Example: Verifying SQLi finding
import time

import httpx

def verify_sqli(endpoint, payload, original_response_time):
    # Time-based SQLi verification
    url = f"{endpoint}?id={payload}"
    
    start = time.time()
    response = httpx.get(url, timeout=30)
    elapsed = time.time() - start
    
    if elapsed > original_response_time + 5:
        return {
            "verified": True,
            "method": "GET",
            "url": url,
            "response_time": elapsed,
            "status_code": response.status_code
        }
    return {"verified": False}
```

### Phase 6: Report Generation
Generate comprehensive reports in multiple formats.

```python
# Example: Generating final report
import subprocess
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

subprocess.run([
    "python", "scripts/generate_report.py",
    "--findings-dir", "workspace/findings",
    "--fingerprint", "workspace/fingerprint.json",
    "--output-json", f"workspace/report_{timestamp}.json",
    "--output-html", f"workspace/report_{timestamp}.html",
    "--output-docx", f"workspace/report_{timestamp}.docx"
])
```

## Configuration

### Environment Variables

```bash
# Set custom workspace directory
export VIBE_WORKSPACE="/path/to/workspace"

# Configure crawler timeout (seconds)
export KATANA_TIMEOUT=1200

# Set max concurrent agents
export MAX_AGENTS=6

# Configure LLM provider (for AI agents)
export OPENAI_API_KEY=your_key_here
export ANTHROPIC_API_KEY=your_key_here
```
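
For scripts that wrap the tool, these variables could be read in one place. The sketch below is an assumption about how a wrapper might do so; `load_config` is hypothetical, and the defaults simply reuse the example values shown above.

```python
import os


def load_config(env=os.environ):
    """Read workspace and limit settings, falling back to the example values."""
    return {
        "workspace": env.get("VIBE_WORKSPACE", "workspace"),
        "katana_timeout": int(env.get("KATANA_TIMEOUT", "1200")),
        "max_agents": int(env.get("MAX_AGENTS", "6")),
    }


print(load_config({"KATANA_TIMEOUT": "600"}))
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary.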

### Testing Principles

**Critical Rules**:
1. All sub-agents MUST actively investigate, not wait for prompts
2. If Katana runs >20 minutes, terminate and collect results
3. Test ALL discovered functionality, not just entry points
4. Attempt 2-3 bypass techniques on failed tests
5. **Iron Law**: May modify/delete own test data; NEVER modify production data

## Common Usage Patterns

### Standard Authorized Testing

```python
"""
Complete penetration test workflow with single account
"""
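import json
import os
import subprocess
from datetime import datetime

def run_standard_pentest(target_url, auth_statement, account_info):
    workspace = "workspace"
    # Create the workspace and the sessions subdirectory written to in Phase 2
    os.makedirs(f"{workspace}/sessions", exist_ok=True)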
    
    # Phase 0: Fingerprinting
    print("[Phase 0] Fingerprinting...")
    subprocess.run(["python", "scripts/fingerprint.py", 
                   "--url", target_url,
                   "--output", f"{workspace}/fingerprint.json"])
    
    # Phase 0.5: Backend scanning
    print("[Phase 0.5] Scanning for admin panels...")
    subprocess.run(["python", "scripts/admin_scanner.py",
                   "--url", target_url,
                   "--output", f"{workspace}/admin_entries.json"])
    
    # Phase 1: Confirm authorization
    print(f"[Phase 1] Authorization: {auth_statement}")
    
    # Phase 2: Extract session (extract_session is the helper from the Phase 2 example)
    print("[Phase 2] Launch browser for manual login...")
    session_data = extract_session(target_url)
    with open(f"{workspace}/sessions/session.json", "w") as f:
        json.dump(session_data, f)
    
    # Phase 3: Crawl (run_katana is assumed to wrap the Phase 3 crawler example; it is not defined here)
    print("[Phase 3] Running Katana crawler...")
    run_katana(target_url, session_data)
    
    # Phase 4: Clean data
    print("[Phase 4] Processing crawler data...")
    subprocess.run(["python", "scripts/clean_crawl_data.py",
                   "--input", f"{workspace}/crawled_anonymous.jsonl",
                   "--output", f"{workspace}/targets.txt"])
    
    # Phase 4.5: Map attack surface
    print("[Phase 4.5] Mapping attack surface...")
    subprocess.run(["python", "scripts/map_attack_surface.py",
                   "--targets", f"{workspace}/targets.txt",
                   "--fingerprint", f"{workspace}/fingerprint.json",
                   "--output", f"{workspace}/attack_surface.json"])
    
    # Phase 5: Multi-agent testing
    print("[Phase 5] Launching 6 parallel agents...")
    subprocess.run(["python", "scripts/prepare_agent_findings.py",
                   "--targets", f"{workspace}/targets.txt",
                   "--fingerprint", f"{workspace}/fingerprint.json",
                   "--output", f"{workspace}/findings"])
    
    # Phase 5.5: Attack chain analysis
    print("[Phase 5.5] Analyzing attack chains...")
    subprocess.run(["python", "scripts/analyze_chains.py",
                   "--findings", f"{workspace}/findings",
                   "--output", f"{workspace}/attack_chains.json"])
    
    # Phase 5.6: Verify evidence
    print("[Phase 5.6] Verifying vulnerability evidence...")
    subprocess.run(["python", "scripts/verify_findings.py",
                   "--findings", f"{workspace}/findings"])
    
    # Phase 6: Generate reports
    print("[Phase 6] Generating final reports...")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    subprocess.run(["python", "scripts/generate_report.py",
                   "--findings-dir", f"{workspace}/findings",
                   "--output-json", f"{workspace}/report_{timestamp}.json",
                   "--output-html", f"{workspace}/report_{timestamp}.html",
                   "--output-docx", f"{workspace}/report_{timestamp}.docx"])
    
    print(f"✓ Reports generated in {workspace}/ (report_{timestamp}.json, .html, .docx)")

# Usage
run_standard_pentest(
    target_url="https://example.com",
    auth_statement="Written authorization obtained for full-scope testing",
    account_info={"username": "testuser", "password": "from_env"}
)
```

### Multi-Account Privilege Escalation Testing

```python
"""
Test for privilege escalation and horizontal authorization bypass
using multiple accounts with different permission levels
"""
import json
import os
import subprocess

def run_multiuser_pentest(target_url, accounts):
    workspace = "workspace"
    os.makedirs(f"{workspace}/sessions", exist_ok=True)
    
    # Extract sessions for all accounts
    sessions = {}
    for role, account in accounts.items():
        print(f"[Phase 2.{role}] Login as {role}...")
        sessions[role] = extract_session(target_url)
        with open(f"{workspace}/sessions/{role}_session.json", "w") as f:
            json.dump(sessions[role], f)
    
    # Crawl with each role
    for role, session in sessions.items():
        print(f"[Phase 3.{role}] Crawling as {role}...")
        run_katana(target_url, session, output_prefix=role)
    
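    # Merge crawl results
    # Note: an argument list runs without a shell, so the pattern is passed literally;
    # this assumes merge_crawl_results.py expands the glob itself
    subprocess.run(["python", "scripts/merge_crawl_results.py",
                   "--inputs", f"{workspace}/*_crawled.jsonl",
                   "--output", f"{workspace}/targets.txt"])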
    
    # Continue with standard workflow...
    # Phase 5 agents will automatically test for IDOR/privilege escalation
    # using the multiple session data

# Usage
run_multiuser_pentest(
    target_url="https://example.com",
    accounts={
        "admin": {"username": "admin", "password": "from_env"},
        "user": {"username": "normaluser", "password": "from_env"}
    }
)
```

### Report-Only Generation

```python
"""
Generate reports from existing findings without re-scanning
Useful when you need to regenerate reports after manual review
"""
import json
import subprocess
from datetime import datetime

def generate_reports_only(workspace="workspace"):
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    # Phase 5.5: Re-analyze attack chains
    subprocess.run(["python", "scripts/analyze_chains.py",
                   "--findings", f"{workspace}/findings",
                   "--output", f"{workspace}/attack_chains.json"])
    
    # Phase 5.6: Re-verify findings
    subprocess.run(["python", "scripts/verify_findings.py",
                   "--findings", f"{workspace}/findings"])
    
    # Phase 6: Generate reports
    subprocess.run(["python", "scripts/generate_report.py",
                   "--findings-dir", f"{workspace}/findings",
                   "--fingerprint", f"{workspace}/fingerprint.json",
                   "--output-json", f"{workspace}/report_{timestamp}.json",
                   "--output-html", f"{workspace}/report_{timestamp}.html",
                   "--output-docx", f"{workspace}/report_{timestamp}.docx"])
    
    # Validate report formats
    validate_reports(workspace, timestamp)

def validate_reports(workspace, timestamp):
    import os
    
    json_path = f"{workspace}/report_{timestamp}.json"
    html_path = f"{workspace}/report_{timestamp}.html"
    docx_path = f"{workspace}/report_{timestamp}.docx"
    
    assert os.path.exists(json_path), "JSON report missing"
    assert os.path.exists(html_path), "HTML report missing"
    assert os.path.exists(docx_path), "DOCX report missing"
    
    with open(json_path) as f:
        report_data = json.load(f)
        assert "findings" in report_data
        assert "summary" in report_data
    
    print("✓ All report formats validated")
```

## Troubleshooting

### Katana Crawler Issues

**Problem**: Crawler returns empty results or finishes in <20 seconds

```python
# Solution: Verify session cookies are valid
import json

with open("workspace/sessions/session.json") as f:
    session = json.load(f)
    
# Check cookie expiration
for cookie in session["cookies"]:
    if "expires" in cookie:
        print(f"{cookie['name']}: expires {cookie['expires']}")

# Re-extract session if cookies expired
session = extract_session(target_url)
```
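
The expiry check can also be automated rather than read by eye. This is a minimal sketch with a hypothetical helper name; it relies on Playwright reporting `expires` as a Unix timestamp in seconds, with `-1` for session cookies.

```python
import time


def expired_cookies(cookies, now=None):
    """Return the names of cookies whose `expires` timestamp has passed."""
    now = time.time() if now is None else now
    # Session cookies (expires == -1) and cookies without the field never count as expired.
    return [c["name"] for c in cookies if 0 < c.get("expires", -1) < now]
```

If the returned list is non-empty, re-extracting the session before crawling is the likely fix.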

**Problem**: Crawler times out or hangs

```bash
# Solution: Reduce crawl depth and concurrency
python scripts/run_katana.py \
    --url https://example.com \
    --depth 3 \
    --concurrency 5 \
    --timeout 600
```

### Multi-Agent Coordination

**Problem**: Agents not finding vulnerabilities

```python
# Solution: Check that skeleton findings were properly generated
import os
import subprocess

findings_dir = "workspace/findings"
agents = ["auth", "injection", "logic", "upload", "api", "info"]

for agent in agents:
    skeleton_path = f"{findings_dir}/{agent}_findings.json"
    if not os.path.exists(skeleton_path):
        print(f"Missing skeleton for {agent} agent")
        # Regenerate skeletons
        subprocess.run(["python", "scripts/prepare_agent_findings.py",
                       "--targets", "workspace/targets.txt",
                       "--fingerprint", "workspace/fingerprint.json",
                       "--output", findings_dir])
        break
```

**Problem**: Agents marking everything as "Potential" without confirmation

```text
Reminder for AI agents:
- Must attempt actual exploitation, not just theory
- Require HTTP request/response evidence for "Confirmed" status
- Try 2-3 bypass techniques on WAF/validation failures
- Mark as "Potential" only if technical constraints prevent confirmation
```
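
The evidence rule can be expressed as a small check. The sketch below assumes each finding carries an `evidence` object with `request` and `response` entries; that layout and the function name are illustrative, not the tool's actual schema.

```python
def classify_finding(finding):
    """Return 'Confirmed' only when both request and response evidence are present."""
    evidence = finding.get("evidence") or {}
    has_http_evidence = bool(evidence.get("request")) and bool(evidence.get("response"))
    return "Confirmed" if has_http_evidence else "Potential"


print(classify_finding({"title": "SQLi", "evidence": {"request": "GET /item?id=1"}}))
```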

### Report Generation Failures

**Problem**: Report missing sections or malformed

```python
# Solution: Validate findings structure before report generation
def validate_findings_structure(findings_dir):
    import glob
    import json
    
    for finding_file in glob.glob(f"{findings_dir}/*_findings.json"):
        with open(finding_file) as f:
            data = json.load(f)
            
        required_fields = ["agent_name", "findings", "summary"]
        for field in required_fields:
            assert field in data, f"Missing {field} in {finding_file}"
            
        for finding in data["findings"]:
            assert "title" in finding
            assert "severity" in finding
            assert "status" in finding  # Confirmed, Potential, or False Positive
            assert "evidence" in finding
            
    print("✓ All findings files valid")

validate_findings_structure("workspace/findings")
```

### Session Extraction Issues

**Problem**: Browser doesn't launch or session not captured

```python
# Solution: Use an explicit browser path and a customized browser context
from playwright.sync_api import sync_playwright

def extract_session_robust(target_url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,
            executable_path="/usr/bin/chromium",  # Adjust for your system
            args=["--disable-blink-features=AutomationControlled"]
        )
        
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."
        )
        
        page = context.new_page()
        page.goto(target_url)
        
        print("Please log in manually. Press Enter when done...")
        input()
        
        # Comprehensive credential extraction
        cookies = context.cookies()
        storage = page.evaluate("() => Object.assign({}, localStorage)")
        session_storage = page.evaluate("() => Object.assign({}, sessionStorage)")
        
        # Capture auth headers from network traffic
        auth_headers = {}
        def handle_response(response):
            if "authorization" in response.request.headers:
                auth_headers["Authorization"] = response.request.headers["authorization"]
        
        page.on("response", handle_response)
        page.reload()
        page.wait_for_load_state("networkidle")
        
        browser.close()
        
        return {
            "cookies": cookies,
            "localStorage": storage,
            "sessionStorage": session_storage,
            "headers": auth_headers
        }
```

## Best Practices

1. **Always use separate workspaces** for different targets to avoid cross-contamination
2. **Verify authorization** documentation before starting any test
3. **Test on staging/UAT** environments when possible, not production
4. **Review findings manually** before delivering reports to clients
5. **Keep vibe-pentest updated** using `git pull` to get latest detection techniques
6. **Use multiple accounts** to thoroughly test authorization controls
7. **Document custom test data** created during testing for cleanup
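
For the first practice, a per-target directory name can be derived from the target URL. This is one possible sketch; `workspace_for` and the `workspaces` root are assumptions, not part of vibe-pentest.

```python
import re
from urllib.parse import urlparse


def workspace_for(target_url, root="workspaces"):
    """Derive a per-target workspace directory from the target's host and port."""
    netloc = urlparse(target_url).netloc.lower()
    # Replace characters that are awkward in directory names, such as the port colon.
    safe = re.sub(r"[^a-z0-9.-]", "_", netloc)
    return f"{root}/{safe}"


print(workspace_for("https://example.com:8443/app"))
```

The result could then be exported as `VIBE_WORKSPACE` before a run so that findings from different targets never share a directory.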

## Additional Resources

- **3-minute tutorial video**: [bilibili.com/video/BV1RiGX6rESQ/](https://www.bilibili.com/video/BV1RiGX6rESQ/)
- **Sub-agent setup guide**: See `sub_agent.md` in repository
- **Katana documentation**: [github.com/projectdiscovery/katana](https://github.com/projectdiscovery/katana)

Source: creator's repository, aradotso/security-skills