---
name: vibe-pentest-ai-security-testing
description: AI-powered automated penetration testing tool using multi-agent architecture for web applications, APIs, and admin panels
triggers:
  - run automated penetration test on web application
  - perform AI-based security testing with vibe-pentest
  - execute multi-agent black box security scan
  - scan web app for vulnerabilities using AI agents
  - conduct automated pentest with business logic testing
  - generate security assessment report with vibe-pentest
  - test web application security using AI automation
  - perform comprehensive web vulnerability scan
---

# vibe-pentest-ai-security-testing

> Skill by [ara.so](https://ara.so) — Security Skills collection.

Vibe Pentest is an AI-agent-based automated penetration testing tool. It runs multiple agents in parallel to perform black-box penetration testing, including business logic vulnerability assessment, on web applications, APIs, and admin backends. It produces security reports with actionable remediation recommendations.

## Overview

Vibe Pentest orchestrates multiple AI agents to:
- Fingerprint web technologies and frameworks
- Crawl and map attack surfaces
- Execute parallel vulnerability testing across multiple categories
- Identify business logic flaws
- Generate comprehensive security reports in HTML and DOCX formats

**Version**: v1.0.7  
**License**: AGPL-3.0  
**Primary Language**: Python

## Installation

### Prerequisites

1. **Git** (required for auto-update mechanism):
```bash
# Clone from Gitee (recommended for better access)
git clone https://gitee.com/ok-helloworld/vibe-pentest
cd vibe-pentest

# Or from GitHub
git clone https://github.com/ok-helloworld/vibe-pentest
cd vibe-pentest
```

2. **Python 3.10+** and dependencies:
```bash
# argparse ships with the Python standard library, so it does not need to be installed
pip install playwright python-docx matplotlib requests urllib3 httpx charset-normalizer chardet
playwright install chromium
```

3. **Katana Crawler** (included for Windows, download for other OS):
   - Windows version included in `tools/katana`
   - For other OS, see `tools/katana_downloads.json` for download links

### Automated Installation

You can also ask your AI coding agent to install everything:

```text
Install vibe-pentest skill including all runtime dependencies from: https://gitee.com/ok-helloworld/vibe-pentest
```

## Project Structure

```
vibe-pentest/
├── scripts/              # Core testing scripts
│   ├── run_katana.py    # Crawler wrapper
│   ├── prepare_agent_findings.py  # Multi-agent orchestration
│   ├── generate_report.py         # Report generation
│   └── ...
├── tools/               # External tools (katana, etc.)
├── workspace/           # Test outputs (created during execution)
│   ├── sessions/        # Browser session data
│   ├── findings/        # Vulnerability findings
│   └── report_result/   # Final reports
└── prompts/             # AI prompt templates
```

## Core Testing Workflow

Vibe Pentest follows a 7-phase workflow (Phases 0 through 6), with intermediate sub-phases (0.5, 4.5, 5.5, and 5.6):

### Phase 0: Fingerprinting
Identify web technologies, frameworks, and server information.

```python
# Example: Running fingerprint detection
import subprocess
import json

result = subprocess.run(
    ["python", "scripts/fingerprint.py", "--url", "https://example.com"],
    capture_output=True,
    text=True
)

fingerprint = json.loads(result.stdout)
print(f"Detected: {fingerprint.get('framework')}, {fingerprint.get('server')}")
```

### Phase 0.5: Backend Entry Scanning
Scan for admin panels and sensitive endpoints.
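
As a rough illustration of this phase, the sketch below builds candidate admin URLs from a small wordlist and sorts HTTP status codes into coarse labels. The wordlist, the function names, and the classification rules are assumptions made for illustration; they are not taken from vibe-pentest's own scanner.

```python
from urllib.parse import urljoin

# Hypothetical wordlist; a real scan would use a much larger one.
CANDIDATE_PATHS = ["admin/", "login", "manage/", "console/"]


def candidate_urls(base_url, paths=CANDIDATE_PATHS):
    """Join each candidate path onto the base URL."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, path) for path in paths]


def classify_status(status_code):
    """Map an HTTP status code to a coarse label for triage."""
    if status_code in (200, 401, 403):
        return "exists"    # reachable, or present but access-controlled
    if status_code in (301, 302, 307, 308):
        return "redirect"  # often a redirect to a login page
    return "absent"
```

Treating 401 and 403 as "exists" is a design choice in this sketch: a protected path still tells the tester that a backend entry is present.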

### Phase 1: Authorization Confirmation
Verify written authorization before proceeding.
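
One way to make this gate explicit is to refuse to continue unless an authorization record is complete and currently valid. The sketch below is a minimal example under that assumption; the record fields (`target`, `authorized_by`, `valid_from`, `valid_until`) and the function name are hypothetical and not part of vibe-pentest.

```python
from datetime import date

# Hypothetical fields for an authorization record.
REQUIRED_FIELDS = ("target", "authorized_by", "valid_from", "valid_until")


def authorization_problems(record, today=None):
    """Return a list of problems; an empty list means testing may proceed."""
    today = today or date.today()
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not record.get(name)]
    if problems:
        return problems
    if not (record["valid_from"] <= today <= record["valid_until"]):
        problems.append("authorization window does not cover today")
    return problems
```

A caller would stop the workflow whenever the returned list is non-empty.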

### Phase 2: Browser Login & Credential Extraction
Launch browser for manual login, extract session cookies/tokens.

```python
# Example: Browser session extraction
from playwright.sync_api import sync_playwright

def extract_session(target_url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        
        page.goto(target_url)
        input("Press Enter after logging in manually...")
        
        # Extract cookies and local storage
        cookies = context.cookies()
        storage = page.evaluate("() => Object.assign({}, localStorage)")
        
        browser.close()
        return {"cookies": cookies, "storage": storage}
```

### Phase 3: Katana Crawling
Use Katana crawler to discover all endpoints and parameters.

```python
# Example: Running Katana crawler via script
import json
import subprocess
import time

# Run crawler (must execute outside sandbox)
proc = subprocess.Popen(
    ["python", "scripts/run_katana.py", 
     "--url", "https://example.com",
     "--cookies", "session=abc123"],
    # Discard output: a PIPE that is never read can fill up and block the crawler
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL
)

# Monitor for max 20 minutes
timeout = 1200
start_time = time.time()
while proc.poll() is None and (time.time() - start_time) < timeout:
    time.sleep(10)

if proc.poll() is None:
    proc.terminate()
    time.sleep(5)  # Wait for results to flush

# Read crawl results
with open("workspace/crawl_summary.json") as f:
    crawl_data = json.load(f)
```

### Phase 4: Data Cleaning
Process crawler output, deduplicate URLs, extract parameters.
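
The deduplication step can be sketched as follows. This is a minimal example that assumes two URLs count as the same target when they share a scheme, host, path, and set of parameter names; the function names and the output shape are hypothetical, and `scripts/clean_crawl_data.py` may apply different rules.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def normalize_url(url):
    """Return (URL without query or fragment, sorted tuple of parameter names)."""
    parts = urlsplit(url)
    names = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
    canonical = urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                            parts.path or "/", "", ""))
    return canonical, tuple(names)


def dedupe_targets(urls):
    """Keep the first URL seen for each (endpoint, parameter names) pair."""
    seen = set()
    targets = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            targets.append({"url": key[0], "params": list(key[1])})
    return targets
```

Keying on parameter names rather than values means `?id=1` and `?id=2` collapse into one target, while `?id=3&sort=asc` is kept as a separate one.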

### Phase 4.5: Attack Surface Mapping
Map discovered endpoints to vulnerability test categories.
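
A simple keyword heuristic illustrates the idea. The rules below are assumptions chosen for the example, and the category names reuse the agent labels that appear in the troubleshooting section (`auth`, `injection`, `logic`, `upload`, `api`); the actual mapping in `scripts/map_attack_surface.py` is not documented here.

```python
# Hypothetical keyword rules; a real mapping would be far more detailed.
CATEGORY_RULES = {
    "auth": ("login", "logout", "password", "token", "session"),
    "upload": ("upload", "file", "import", "attachment"),
    "api": ("/api/", "/graphql", "/v1/", "/v2/"),
}


def map_target(url, params):
    """Return the sorted list of test categories suggested for one endpoint."""
    url_lower = url.lower()
    categories = {name for name, keywords in CATEGORY_RULES.items()
                  if any(keyword in url_lower for keyword in keywords)}
    if params:
        categories.add("injection")  # any parameter is an injection candidate
    if any(p.lower() == "id" or p.lower().endswith("_id") for p in params):
        categories.add("logic")      # object identifiers suggest IDOR checks
    return sorted(categories)
```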

### Phase 5: Multi-Agent Parallel Testing
Distribute testing across 6 specialized agents using prepared skeleton files.

```python
# Example: Preparing agent findings skeleton
import subprocess
import json

# Generate skeleton findings for 6 agents
subprocess.run([
    "python", "scripts/prepare_agent_findings.py",
    "--targets", "workspace/targets.txt",
    "--fingerprint", "workspace/fingerprint.json",
    "--output", "workspace/findings"
])

# Each agent gets assigned specific test categories:
# Agent 1: Authentication & Authorization
# Agent 2: Injection Attacks (SQLi, XSS, etc.)
# Agent 3: Business Logic & IDOR
# Agent 4: File Upload & Path Traversal
# Agent 5: API Security & Rate Limiting
# Agent 6: Information Disclosure & Misconfigurations
```

### Phase 5.5: Attack Chain Analysis
Identify cross-agent attack chains and compound vulnerabilities.
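
One way to picture chain analysis is to link findings where one finding yields something another finding needs. The sketch below assumes each finding may carry hypothetical `grants` and `requires` fields; these are not part of the findings format described in this document, and `scripts/analyze_chains.py` may work quite differently.

```python
def find_chains(findings):
    """Pair findings where the first grants a capability the second requires."""
    chains = []
    for first in findings:
        for second in findings:
            if first is second:
                continue
            granted = first.get("grants")
            if granted and granted == second.get("requires"):
                chains.append((first["title"], second["title"]))
    return chains
```

For example, an information disclosure that leaks user IDs would chain into an IDOR finding that needs valid user IDs to exploit.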

### Phase 5.6: Evidence Verification
Re-verify confirmed vulnerabilities with HTTP evidence.

```python
# Example: Verifying a time-based SQLi finding
import time

import httpx

def verify_sqli(endpoint, payload, original_response_time):
    # Time-based SQLi verification: a delay payload should slow the response
    start = time.time()
    # Pass the payload via params so httpx URL-encodes it
    response = httpx.get(endpoint, params={"id": payload}, timeout=30)
    elapsed = time.time() - start
    
    # Treat a delay of more than 5 seconds over the baseline as confirmation
    if elapsed > original_response_time + 5:
        return {
            "verified": True,
            "method": "GET",
            "url": str(response.url),
            "response_time": elapsed,
            "status_code": response.status_code
        }
    return {"verified": False}

### Phase 6: Report Generation
Generate comprehensive reports in multiple formats.

```python
# Example: Generating final report
import subprocess
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

subprocess.run([
    "python", "scripts/generate_report.py",
    "--findings-dir", "workspace/findings",
    "--fingerprint", "workspace/fingerprint.json",
    "--output-json", f"workspace/report_{timestamp}.json",
    "--output-html", f"workspace/report_{timestamp}.html",
    "--output-docx", f"workspace/report_{timestamp}.docx"
])
```

## Configuration

### Environment Variables

```bash
# Set custom workspace directory
export VIBE_WORKSPACE="/path/to/workspace"

# Configure crawler timeout (seconds)
export KATANA_TIMEOUT=1200

# Set max concurrent agents
export MAX_AGENTS=6

# Configure LLM provider (for AI agents)
export OPENAI_API_KEY=your_key_here
export ANTHROPIC_API_KEY=your_key_here
```
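
On the Python side, these variables could be read with defaults as in the sketch below. Whether vibe-pentest's scripts read them this way is an assumption; the defaults simply mirror the values shown above, and `load_config` is a hypothetical helper.

```python
import os


def load_config(env=os.environ):
    """Read workflow settings from environment variables, with defaults."""
    return {
        "workspace": env.get("VIBE_WORKSPACE", "workspace"),
        "katana_timeout": int(env.get("KATANA_TIMEOUT", "1200")),  # seconds
        "max_agents": int(env.get("MAX_AGENTS", "6")),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching the real environment.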

### Testing Principles

**Critical Rules**:
1. All sub-agents MUST actively investigate, not wait for prompts
2. If Katana runs >20 minutes, terminate and collect results
3. Test ALL discovered functionality, not just entry points
4. Attempt 2-3 bypass techniques on failed tests
5. **Iron Law**: May modify/delete own test data; NEVER modify production data

## Common Usage Patterns

### Standard Authorized Testing

```python
"""
Complete penetration test workflow with single account
"""
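import os
import subprocess
import json
from datetime import datetime

# Assumes extract_session() from Phase 2 is defined, and that run_katana() is a
# helper wrapping scripts/run_katana.py as in Phase 3 (not shown here).

def run_standard_pentest(target_url, auth_statement, account_info):
    workspace = "workspace"
    # Create the workspace and the sessions subdirectory written to in Phase 2
    os.makedirs(f"{workspace}/sessions", exist_ok=True)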
    
    # Phase 0: Fingerprinting
    print("[Phase 0] Fingerprinting...")
    subprocess.run(["python", "scripts/fingerprint.py", 
                   "--url", target_url,
                   "--output", f"{workspace}/fingerprint.json"])
    
    # Phase 0.5: Backend scanning
    print("[Phase 0.5] Scanning for admin panels...")
    subprocess.run(["python", "scripts/admin_scanner.py",
                   "--url", target_url,
                   "--output", f"{workspace}/admin_entries.json"])
    
    # Phase 1: Confirm authorization
    print(f"[Phase 1] Authorization: {auth_statement}")
    
    # Phase 2: Extract session
    print("[Phase 2] Launch browser for manual login...")
    session_data = extract_session(target_url)
    with open(f"{workspace}/sessions/session.json", "w") as f:
        json.dump(session_data, f)
    
    # Phase 3: Crawl
    print("[Phase 3] Running Katana crawler...")
    run_katana(target_url, session_data)
    
    # Phase 4: Clean data
    print("[Phase 4] Processing crawler data...")
    subprocess.run(["python", "scripts/clean_crawl_data.py",
                   "--input", f"{workspace}/crawled_anonymous.jsonl",
                   "--output", f"{workspace}/targets.txt"])
    
    # Phase 4.5: Map attack surface
    print("[Phase 4.5] Mapping attack surface...")
    subprocess.run(["python", "scripts/map_attack_surface.py",
                   "--targets", f"{workspace}/targets.txt",
                   "--fingerprint", f"{workspace}/fingerprint.json",
                   "--output", f"{workspace}/attack_surface.json"])
    
    # Phase 5: Multi-agent testing
    print("[Phase 5] Launching 6 parallel agents...")
    subprocess.run(["python", "scripts/prepare_agent_findings.py",
                   "--targets", f"{workspace}/targets.txt",
                   "--fingerprint", f"{workspace}/fingerprint.json",
                   "--output", f"{workspace}/findings"])
    
    # Phase 5.5: Attack chain analysis
    print("[Phase 5.5] Analyzing attack chains...")
    subprocess.run(["python", "scripts/analyze_chains.py",
                   "--findings", f"{workspace}/findings",
                   "--output", f"{workspace}/attack_chains.json"])
    
    # Phase 5.6: Verify evidence
    print("[Phase 5.6] Verifying vulnerability evidence...")
    subprocess.run(["python", "scripts/verify_findings.py",
                   "--findings", f"{workspace}/findings"])
    
    # Phase 6: Generate reports
    print("[Phase 6] Generating final reports...")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    subprocess.run(["python", "scripts/generate_report.py",
                   "--findings-dir", f"{workspace}/findings",
                   "--output-json", f"{workspace}/report_{timestamp}.json",
                   "--output-html", f"{workspace}/report_{timestamp}.html",
                   "--output-docx", f"{workspace}/report_{timestamp}.docx"])
    
    print(f"✓ Reports written to {workspace}/report_{timestamp}.*")

# Usage
run_standard_pentest(
    target_url="https://example.com",
    auth_statement="Written authorization obtained for full-scope testing",
    account_info={"username": "testuser", "password": "from_env"}
)
```

### Multi-Account Privilege Escalation Testing

```python
"""
Test for privilege escalation and horizontal authorization bypass
using multiple accounts with different permission levels
"""
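import json
import os
import subprocess

# Reuses extract_session() from Phase 2; run_katana() is a helper that is not shown here

def run_multiuser_pentest(target_url, accounts):
    workspace = "workspace"
    os.makedirs(f"{workspace}/sessions", exist_ok=True)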
    
    # Extract sessions for all accounts
    sessions = {}
    for role, account in accounts.items():
        print(f"[Phase 2.{role}] Login as {role}...")
        sessions[role] = extract_session(target_url)
        with open(f"{workspace}/sessions/{role}_session.json", "w") as f:
            json.dump(sessions[role], f)
    
    # Crawl with each role
    for role, session in sessions.items():
        print(f"[Phase 3.{role}] Crawling as {role}...")
        run_katana(target_url, session, output_prefix=role)
    
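    # Merge crawl results. The wildcard is passed literally (subprocess does not
    # expand it without a shell), so merge_crawl_results.py must expand it itself.
    subprocess.run(["python", "scripts/merge_crawl_results.py",
                   "--inputs", f"{workspace}/*_crawled.jsonl",
                   "--output", f"{workspace}/targets.txt"])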
    
    # Continue with standard workflow...
    # Phase 5 agents will automatically test for IDOR/privilege escalation
    # using the multiple session data

# Usage
run_multiuser_pentest(
    target_url="https://example.com",
    accounts={
        "admin": {"username": "admin", "password": "from_env"},
        "user": {"username": "normaluser", "password": "from_env"}
    }
)
```

### Report-Only Generation

```python
"""
Generate reports from existing findings without re-scanning
Useful when you need to regenerate reports after manual review
"""
import json
import subprocess
from datetime import datetime

def generate_reports_only(workspace="workspace"):
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    # Phase 5.5: Re-analyze attack chains
    subprocess.run(["python", "scripts/analyze_chains.py",
                   "--findings", f"{workspace}/findings",
                   "--output", f"{workspace}/attack_chains.json"])
    
    # Phase 5.6: Re-verify findings
    subprocess.run(["python", "scripts/verify_findings.py",
                   "--findings", f"{workspace}/findings"])
    
    # Phase 6: Generate reports
    subprocess.run(["python", "scripts/generate_report.py",
                   "--findings-dir", f"{workspace}/findings",
                   "--fingerprint", f"{workspace}/fingerprint.json",
                   "--output-json", f"{workspace}/report_{timestamp}.json",
                   "--output-html", f"{workspace}/report_{timestamp}.html",
                   "--output-docx", f"{workspace}/report_{timestamp}.docx"])
    
    # Validate report formats
    validate_reports(workspace, timestamp)

def validate_reports(workspace, timestamp):
    import os
    
    json_path = f"{workspace}/report_{timestamp}.json"
    html_path = f"{workspace}/report_{timestamp}.html"
    docx_path = f"{workspace}/report_{timestamp}.docx"
    
    assert os.path.exists(json_path), "JSON report missing"
    assert os.path.exists(html_path), "HTML report missing"
    assert os.path.exists(docx_path), "DOCX report missing"
    
    with open(json_path) as f:
        report_data = json.load(f)
        assert "findings" in report_data
        assert "summary" in report_data
    
    print("✓ All report formats validated")
```

## Troubleshooting

### Katana Crawler Issues

**Problem**: Crawler returns empty results or finishes in <20 seconds

```python
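# Solution: Verify session cookies are valid
import json
import time

with open("workspace/sessions/session.json") as f:
    session = json.load(f)

# Check cookie expiration (Playwright reports `expires` as a Unix timestamp, -1 for session cookies)
now = time.time()
for cookie in session["cookies"]:
    expires = cookie.get("expires", -1)
    if expires != -1 and expires < now:
        print(f"{cookie['name']}: expired")

# Re-extract the session if cookies have expired (extract_session is defined in Phase 2)
session = extract_session("https://example.com")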
```

**Problem**: Crawler times out or hangs

```bash
# Solution: Reduce crawl depth and concurrency
python scripts/run_katana.py \
    --url https://example.com \
    --depth 3 \
    --concurrency 5 \
    --timeout 600
```

### Multi-Agent Coordination

**Problem**: Agents not finding vulnerabilities

```python
# Solution: Check that skeleton findings were properly generated
import os
import subprocess

findings_dir = "workspace/findings"
agents = ["auth", "injection", "logic", "upload", "api", "info"]

for agent in agents:
    skeleton_path = f"{findings_dir}/{agent}_findings.json"
    if not os.path.exists(skeleton_path):
        print(f"Missing skeleton for {agent} agent")
        # Regenerate skeletons
        subprocess.run(["python", "scripts/prepare_agent_findings.py",
                       "--targets", "workspace/targets.txt",
                       "--fingerprint", "workspace/fingerprint.json",
                       "--output", findings_dir])
        break
```

**Problem**: Agents marking everything as "Potential" without confirmation

```text
Reminder for AI agents:
- Must attempt actual exploitation, not just theory
- Require HTTP request/response evidence for "Confirmed" status
- Try 2-3 bypass techniques on WAF/validation failures
- Mark as "Potential" only if technical constraints prevent confirmation
```
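
The evidence rule above can also be enforced mechanically before a finding is accepted. The sketch below assumes that a finding's `evidence` is a dict holding the raw HTTP `request` and `response`; that shape and the `finding_status` helper are hypothetical, not the documented findings format.

```python
def finding_status(finding):
    """Return "Confirmed" only when both request and response evidence exist."""
    evidence = finding.get("evidence") or {}
    if evidence.get("request") and evidence.get("response"):
        return "Confirmed"
    return "Potential"
```

A reviewer could run this over each agent's findings and downgrade any "Confirmed" entry that lacks evidence.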

### Report Generation Failures

**Problem**: Report missing sections or malformed

```python
# Solution: Validate findings structure before report generation
import glob
import json

def validate_findings_structure(findings_dir):
    
    for finding_file in glob.glob(f"{findings_dir}/*_findings.json"):
        with open(finding_file) as f:
            data = json.load(f)
            
        required_fields = ["agent_name", "findings", "summary"]
        for field in required_fields:
            assert field in data, f"Missing {field} in {finding_file}"
            
        for finding in data["findings"]:
            assert "title" in finding
            assert "severity" in finding
            assert "status" in finding  # Confirmed, Potential, or False Positive
            assert "evidence" in finding
            
    print("✓ All findings files valid")

validate_findings_structure("workspace/findings")
```

### Session Extraction Issues

**Problem**: Browser doesn't launch or session not captured

```python
# Solution: Use an explicit browser path and a custom browser context
from playwright.sync_api import sync_playwright

def extract_session_robust(target_url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,
            executable_path="/usr/bin/chromium",  # Adjust for your system
            args=["--disable-blink-features=AutomationControlled"]
        )
        
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."
        )
        
        page = context.new_page()
        page.goto(target_url)
        
        print("Please log in manually. Press Enter when done...")
        input()
        
        # Comprehensive credential extraction
        cookies = context.cookies()
        storage = page.evaluate("() => Object.assign({}, localStorage)")
        session_storage = page.evaluate("() => Object.assign({}, sessionStorage)")
        
        # Capture auth headers from network traffic
        auth_headers = {}
        def handle_response(response):
            if "authorization" in response.request.headers:
                auth_headers["Authorization"] = response.request.headers["authorization"]
        
        page.on("response", handle_response)
        page.reload()
        page.wait_for_load_state("networkidle")
        
        browser.close()
        
        return {
            "cookies": cookies,
            "localStorage": storage,
            "sessionStorage": session_storage,
            "headers": auth_headers
        }
```

## Best Practices

1. **Always use separate workspaces** for different targets to avoid cross-contamination
2. **Verify authorization** documentation before starting any test
3. **Test on staging/UAT** environments when possible, not production
4. **Review findings manually** before delivering reports to clients
5. **Keep vibe-pentest updated** using `git pull` to get latest detection techniques
6. **Use multiple accounts** to thoroughly test authorization controls
7. **Document custom test data** created during testing for cleanup
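
For the first practice, a per-target workspace path can be derived from the target's host. This is a small sketch under the assumption that one directory per host and port is enough separation; the `workspaces` root and the `workspace_for` helper are hypothetical names.

```python
import re
from urllib.parse import urlsplit


def workspace_for(target_url, root="workspaces"):
    """Derive a filesystem-safe workspace directory from the target host."""
    host = urlsplit(target_url).netloc.lower()
    safe = re.sub(r"[^a-z0-9.-]", "_", host)  # e.g. the ":" before a port
    return f"{root}/{safe}"
```

The result could then be exported as `VIBE_WORKSPACE` before starting a test.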

## Additional Resources

- **3-minute tutorial video**: [bilibili.com/video/BV1RiGX6rESQ/](https://www.bilibili.com/video/BV1RiGX6rESQ/)
- **Sub-agent setup guide**: See `sub_agent.md` in repository
- **Katana documentation**: [github.com/projectdiscovery/katana](https://github.com/projectdiscovery/katana)
