---
name: vibe-pentest-ai-security-testing
description: AI-powered automated penetration testing tool using multi-agent architecture for web applications, APIs, and admin panels
triggers:
- run automated penetration test on web application
- perform AI-based security testing with vibe-pentest
- execute multi-agent black box security scan
- scan web app for vulnerabilities using AI agents
- conduct automated pentest with business logic testing
- generate security assessment report with vibe-pentest
- test web application security using AI automation
- perform comprehensive web vulnerability scan
---
# vibe-pentest-ai-security-testing
> Skill by [ara.so](https://ara.so) — Security Skills collection.
Vibe Pentest is an AI-agent-based automated penetration testing tool. It runs multiple agents in parallel to perform black-box penetration testing, including business logic vulnerability assessment, on web applications, APIs, and admin backends, and outputs stable, reliable security reports with actionable remediation recommendations.
## Overview
Vibe Pentest orchestrates multiple AI agents to:
- Fingerprint web technologies and frameworks
- Crawl and map attack surfaces
- Execute parallel vulnerability testing across multiple categories
- Identify business logic flaws
- Generate comprehensive security reports in HTML and DOCX formats
**Version**: v1.0.7
**License**: AGPL-3.0
**Primary Language**: Python
## Installation
### Prerequisites
1. **Git** (required for auto-update mechanism). Clone from one of the two mirrors:
```bash
# Clone from Gitee (recommended for better access)
git clone https://gitee.com/ok-helloworld/vibe-pentest
# Or from GitHub (use this instead of the Gitee command, not in addition to it)
# git clone https://github.com/ok-helloworld/vibe-pentest
cd vibe-pentest
```
2. **Python 3.10+** and dependencies:
```bash
# argparse ships with the Python standard library, so it does not need to be installed
pip install playwright python-docx matplotlib requests urllib3 httpx charset-normalizer chardet
playwright install chromium
```
3. **Katana Crawler** (included for Windows, download for other OS):
- Windows version included in `tools/katana`
- For other OS, see `tools/katana_downloads.json` for download links
### Automated Installation
You can also ask your AI coding agent to install everything:
```text
Install vibe-pentest skill including all runtime dependencies from: https://gitee.com/ok-helloworld/vibe-pentest
```
## Project Structure
```
vibe-pentest/
├── scripts/ # Core testing scripts
│ ├── run_katana.py # Crawler wrapper
│ ├── prepare_agent_findings.py # Multi-agent orchestration
│ ├── generate_report.py # Report generation
│ └── ...
├── tools/ # External tools (katana, etc.)
├── workspace/ # Test outputs (created during execution)
│ ├── sessions/ # Browser session data
│ ├── findings/ # Vulnerability findings
│ └── report_result/ # Final reports
└── prompts/ # AI prompt templates
```
## Core Testing Workflow
Vibe Pentest follows a 7-phase workflow (Phases 0 through 6), with intermediate sub-phases such as 0.5, 4.5, 5.5, and 5.6:
### Phase 0: Fingerprinting
Identify web technologies, frameworks, and server information.
```python
# Example: Running fingerprint detection
import json
import subprocess

result = subprocess.run(
    ["python", "scripts/fingerprint.py", "--url", "https://example.com"],
    capture_output=True,
    text=True
)
fingerprint = json.loads(result.stdout)
print(f"Detected: {fingerprint.get('framework')}, {fingerprint.get('server')}")
```
### Phase 0.5: Backend Entry Scanning
Scan for admin panels and sensitive endpoints.
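As a rough illustration of this phase, the sketch below builds a list of candidate backend URLs and triages probe responses by status code. The path list, the function names, and the status-code buckets are all assumptions made for the example; the tool's own wordlist and logic are not shown in this document.

```python
from urllib.parse import urljoin

# Hypothetical candidate paths; not the tool's real wordlist
CANDIDATE_PATHS = ["admin/", "login", "manage/", "console/", "api/docs"]


def build_candidate_urls(base_url, paths=CANDIDATE_PATHS):
    """Return de-duplicated absolute URLs to probe, preserving order."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    seen, urls = set(), []
    for path in paths:
        url = urljoin(base, path.lstrip("/"))
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def classify_status(status_code):
    """Rough triage of a probe response by HTTP status code."""
    if status_code in (200, 401, 403):
        return "exists"      # reachable, possibly behind authentication
    if status_code in (301, 302, 307, 308):
        return "redirect"    # often a redirect to a login page
    return "absent"
```

A 401 or 403 is counted as "exists" here because a protected endpoint is still a backend entry worth recording.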
### Phase 1: Authorization Confirmation
Verify written authorization before proceeding.
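One way to make this check hard to skip is to gate the rest of the workflow on an explicit authorization record. The following is a minimal sketch; the `Authorization` class, its fields, and `confirm_authorization` are hypothetical names introduced for illustration and are not part of the tool.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class Authorization:
    """Hypothetical record of a written authorization."""
    reference: str             # e.g. contract or ticket identifier
    in_scope_hosts: frozenset  # hostnames covered by the authorization
    valid_from: date
    valid_until: date


def confirm_authorization(auth, target_url, today=None):
    """Raise PermissionError unless the target is covered right now."""
    today = today or date.today()
    host = urlparse(target_url).hostname
    if not auth.reference.strip():
        raise PermissionError("No written authorization reference recorded")
    if host not in auth.in_scope_hosts:
        raise PermissionError(f"{host} is not in the authorized scope")
    if not (auth.valid_from <= today <= auth.valid_until):
        raise PermissionError("Authorization window is not currently valid")
    return True
```

Raising an exception rather than returning `False` means a caller that forgets to check the result still cannot proceed to later phases.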
### Phase 2: Browser Login & Credential Extraction
Launch browser for manual login, extract session cookies/tokens.
```python
# Example: Browser session extraction
from playwright.sync_api import sync_playwright

def extract_session(target_url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto(target_url)
        input("Press Enter after logging in manually...")
        # Extract cookies and local storage
        cookies = context.cookies()
        storage = page.evaluate("() => Object.assign({}, localStorage)")
        browser.close()
        return {"cookies": cookies, "storage": storage}
```
### Phase 3: Katana Crawling
Use Katana crawler to discover all endpoints and parameters.
```python
# Example: Running Katana crawler via script
import json
import subprocess
import time

# Run crawler (must execute outside sandbox).
# Output is discarded here: an unread PIPE can fill up and stall the child.
proc = subprocess.Popen(
    ["python", "scripts/run_katana.py",
     "--url", "https://example.com",
     "--cookies", "session=abc123"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL
)

# Monitor for max 20 minutes
timeout = 1200
start_time = time.time()
while proc.poll() is None and (time.time() - start_time) < timeout:
    time.sleep(10)

# Terminate the crawler if it is still running after the timeout
if proc.poll() is None:
    proc.terminate()
    time.sleep(5)  # Wait for results to flush

# Read crawl results
with open("workspace/crawl_summary.json") as f:
    crawl_data = json.load(f)
```
### Phase 4: Data Cleaning
Process crawler output, deduplicate URLs, extract parameters.
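The cleaning step can be pictured as collapsing URLs that differ only in parameter values or fragments. The sketch below is an assumption about what such a step might look like, not the logic of `scripts/clean_crawl_data.py`; the function name and the output shape are illustrative.

```python
from urllib.parse import parse_qsl, urlsplit


def clean_urls(urls):
    """Collapse URLs that differ only in parameter values or fragments."""
    targets = {}
    for raw in urls:
        parts = urlsplit(raw.strip())
        # Skip anything that is not an absolute HTTP(S) URL
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        host = parts.netloc.lower()
        path = parts.path or "/"
        # Keep parameter names only; values are irrelevant for deduplication
        params = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
        key = (parts.scheme, host, path, tuple(params))
        targets.setdefault(key, {"url": f"{parts.scheme}://{host}{path}", "params": params})
    return list(targets.values())
```

Keying on the set of parameter names keeps `/item?id=1` and `/item?id=2` as one target while still treating `/item?id=1&sort=asc` as a distinct one.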
### Phase 4.5: Attack Surface Mapping
Map discovered endpoints to vulnerability test categories.
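A simple way to think about this mapping is a set of keyword rules that assign each endpoint to one or more test categories. The rules, the parameter names, and the category labels below are assumptions chosen for illustration; the actual mapping used by the tool may differ.

```python
# Hypothetical keyword rules keyed by test category
PATH_RULES = {
    "auth": ("login", "logout", "register", "password", "token"),
    "upload": ("upload", "import", "attachment"),
    "api": ("/api/", "graphql"),
}
# Hypothetical parameter names that hint at object identifiers
ID_PARAMS = {"id", "uid", "user_id", "order_id"}


def map_attack_surface(targets):
    """Return {category: [urls]} for a list of {"url", "params"} targets."""
    surface = {}
    for target in targets:
        url = target["url"]
        lowered = url.lower()
        params = {p.lower() for p in target.get("params", [])}
        categories = {
            category
            for category, words in PATH_RULES.items()
            if any(word in lowered for word in words)
        }
        if params:
            categories.add("injection")  # any parameter is an injection candidate
        if params & ID_PARAMS:
            categories.add("logic")      # object identifiers suggest IDOR checks
        if not categories:
            categories.add("info")       # fall back to disclosure checks
        for category in sorted(categories):
            surface.setdefault(category, []).append(url)
    return surface
```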
### Phase 5: Multi-Agent Parallel Testing
Distribute testing across 6 specialized agents using prepared skeleton files.
```python
# Example: Preparing agent findings skeleton
import subprocess
import json

# Generate skeleton findings for 6 agents
subprocess.run([
    "python", "scripts/prepare_agent_findings.py",
    "--targets", "workspace/targets.txt",
    "--fingerprint", "workspace/fingerprint.json",
    "--output", "workspace/findings"
])

# Each agent gets assigned specific test categories:
# Agent 1: Authentication & Authorization
# Agent 2: Injection Attacks (SQLi, XSS, etc.)
# Agent 3: Business Logic & IDOR
# Agent 4: File Upload & Path Traversal
# Agent 5: API Security & Rate Limiting
# Agent 6: Information Disclosure & Misconfigurations
```
### Phase 5.5: Attack Chain Analysis
Identify cross-agent attack chains and compound vulnerabilities.
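As a starting point for chain analysis, one could look for endpoints where two or more agents each confirmed a finding. The sketch below assumes every finding carries an `endpoint` field, which is a hypothetical addition for this example; it only surfaces candidates and does not judge whether the findings actually combine into an exploit.

```python
from collections import defaultdict


def find_chain_candidates(findings_by_agent):
    """Group confirmed findings by endpoint; keep endpoints hit by 2+ agents."""
    by_endpoint = defaultdict(list)
    for agent, findings in findings_by_agent.items():
        for finding in findings:
            if finding.get("status") == "Confirmed":
                # "endpoint" is an assumed field, not a documented one
                by_endpoint[finding["endpoint"]].append((agent, finding["title"]))
    return {
        endpoint: hits
        for endpoint, hits in by_endpoint.items()
        if len({agent for agent, _ in hits}) >= 2
    }
```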
### Phase 5.6: Evidence Verification
Re-verify confirmed vulnerabilities with HTTP evidence.
```python
# Example: Verifying SQLi finding
import time

import httpx

def verify_sqli(endpoint, payload, original_response_time):
    # Time-based SQLi verification
    url = f"{endpoint}?id={payload}"
    start = time.time()
    response = httpx.get(url, timeout=30)
    elapsed = time.time() - start
    # A delay of more than 5 seconds over the baseline counts as verified
    if elapsed > original_response_time + 5:
        return {
            "verified": True,
            "method": "GET",
            "url": url,
            "response_time": elapsed,
            "status_code": response.status_code
        }
    return {"verified": False}
```
### Phase 6: Report Generation
Generate comprehensive reports in multiple formats.
```python
# Example: Generating final report
import subprocess
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
subprocess.run([
    "python", "scripts/generate_report.py",
    "--findings-dir", "workspace/findings",
    "--fingerprint", "workspace/fingerprint.json",
    "--output-json", f"workspace/report_{timestamp}.json",
    "--output-html", f"workspace/report_{timestamp}.html",
    "--output-docx", f"workspace/report_{timestamp}.docx"
])
```
## Configuration
### Environment Variables
```bash
# Set custom workspace directory
export VIBE_WORKSPACE="/path/to/workspace"
# Configure crawler timeout (seconds)
export KATANA_TIMEOUT=1200
# Set max concurrent agents
export MAX_AGENTS=6
# Configure LLM provider (for AI agents)
export OPENAI_API_KEY=your_key_here
export ANTHROPIC_API_KEY=your_key_here
```
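If you drive the workflow from your own Python script, these variables can be read once and validated up front. The sketch below is an assumption about how a wrapper script might consume them; the `Settings` class and `load_settings` are hypothetical names, and the defaults simply mirror the values shown above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    workspace: str
    katana_timeout: int  # seconds
    max_agents: int


def load_settings(env=os.environ):
    """Read workflow settings from the environment, with fallbacks."""
    timeout = int(env.get("KATANA_TIMEOUT", "1200"))
    agents = int(env.get("MAX_AGENTS", "6"))
    if timeout <= 0 or agents <= 0:
        raise ValueError("KATANA_TIMEOUT and MAX_AGENTS must be positive integers")
    return Settings(env.get("VIBE_WORKSPACE", "workspace"), timeout, agents)
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.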
### Testing Principles
**Critical Rules**:
1. All sub-agents MUST actively investigate, not wait for prompts
2. If Katana runs >20 minutes, terminate and collect results
3. Test ALL discovered functionality, not just entry points
4. Attempt 2-3 bypass techniques on failed tests
5. **Iron Law**: May modify/delete own test data; NEVER modify production data
## Common Usage Patterns
### Standard Authorized Testing
```python
"""
Complete penetration test workflow with single account
"""
import json
import os
import subprocess
from datetime import datetime

# Assumes extract_session() from Phase 2 is in scope. run_katana() stands for
# a helper that wraps scripts/run_katana.py as shown in Phase 3.

def run_standard_pentest(target_url, auth_statement, account_info):
    workspace = "workspace"
    # Create the workspace and the sessions directory written to in Phase 2
    os.makedirs(f"{workspace}/sessions", exist_ok=True)

    # Phase 0: Fingerprinting
    print("[Phase 0] Fingerprinting...")
    subprocess.run(["python", "scripts/fingerprint.py",
                    "--url", target_url,
                    "--output", f"{workspace}/fingerprint.json"])

    # Phase 0.5: Backend scanning
    print("[Phase 0.5] Scanning for admin panels...")
    subprocess.run(["python", "scripts/admin_scanner.py",
                    "--url", target_url,
                    "--output", f"{workspace}/admin_entries.json"])

    # Phase 1: Confirm authorization
    print(f"[Phase 1] Authorization: {auth_statement}")

    # Phase 2: Extract session
    print("[Phase 2] Launch browser for manual login...")
    session_data = extract_session(target_url)
    with open(f"{workspace}/sessions/session.json", "w") as f:
        json.dump(session_data, f)

    # Phase 3: Crawl
    print("[Phase 3] Running Katana crawler...")
    run_katana(target_url, session_data)

    # Phase 4: Clean data
    print("[Phase 4] Processing crawler data...")
    subprocess.run(["python", "scripts/clean_crawl_data.py",
                    "--input", f"{workspace}/crawled_anonymous.jsonl",
                    "--output", f"{workspace}/targets.txt"])

    # Phase 4.5: Map attack surface
    print("[Phase 4.5] Mapping attack surface...")
    subprocess.run(["python", "scripts/map_attack_surface.py",
                    "--targets", f"{workspace}/targets.txt",
                    "--fingerprint", f"{workspace}/fingerprint.json",
                    "--output", f"{workspace}/attack_surface.json"])

    # Phase 5: Multi-agent testing
    print("[Phase 5] Launching 6 parallel agents...")
    subprocess.run(["python", "scripts/prepare_agent_findings.py",
                    "--targets", f"{workspace}/targets.txt",
                    "--fingerprint", f"{workspace}/fingerprint.json",
                    "--output", f"{workspace}/findings"])

    # Phase 5.5: Attack chain analysis
    print("[Phase 5.5] Analyzing attack chains...")
    subprocess.run(["python", "scripts/analyze_chains.py",
                    "--findings", f"{workspace}/findings",
                    "--output", f"{workspace}/attack_chains.json"])

    # Phase 5.6: Verify evidence
    print("[Phase 5.6] Verifying vulnerability evidence...")
    subprocess.run(["python", "scripts/verify_findings.py",
                    "--findings", f"{workspace}/findings"])

    # Phase 6: Generate reports
    print("[Phase 6] Generating final reports...")
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    subprocess.run(["python", "scripts/generate_report.py",
                    "--findings-dir", f"{workspace}/findings",
                    "--output-json", f"{workspace}/report_{timestamp}.json",
                    "--output-html", f"{workspace}/report_{timestamp}.html",
                    "--output-docx", f"{workspace}/report_{timestamp}.docx"])
    # Report the location that matches the --output-* paths above
    print(f"✓ Reports generated: {workspace}/report_{timestamp}.*")

# Usage
run_standard_pentest(
    target_url="https://example.com",
    auth_statement="Written authorization obtained for full-scope testing",
    account_info={"username": "testuser", "password": "from_env"}
)
```
### Multi-Account Privilege Escalation Testing
```python
"""
Test for privilege escalation and horizontal authorization bypass
using multiple accounts with different permission levels
"""
import json
import os
import subprocess

# Assumes extract_session() from Phase 2 is in scope. run_katana() stands for
# a helper that wraps scripts/run_katana.py as shown in Phase 3.

def run_multiuser_pentest(target_url, accounts):
    workspace = "workspace"
    os.makedirs(f"{workspace}/sessions", exist_ok=True)

    # Extract sessions for all accounts (one manual login per role)
    sessions = {}
    for role, account in accounts.items():
        print(f"[Phase 2.{role}] Login as {role}...")
        sessions[role] = extract_session(target_url)
        with open(f"{workspace}/sessions/{role}_session.json", "w") as f:
            json.dump(sessions[role], f)

    # Crawl with each role
    for role, session in sessions.items():
        print(f"[Phase 3.{role}] Crawling as {role}...")
        run_katana(target_url, session, output_prefix=role)

    # Merge crawl results. No shell expands the "*" here, so the pattern
    # reaches the script as a literal string.
    subprocess.run(["python", "scripts/merge_crawl_results.py",
                    "--inputs", f"{workspace}/*_crawled.jsonl",
                    "--output", f"{workspace}/targets.txt"])

    # Continue with standard workflow...
    # Phase 5 agents will automatically test for IDOR/privilege escalation
    # using the multiple session data

# Usage
run_multiuser_pentest(
    target_url="https://example.com",
    accounts={
        "admin": {"username": "admin", "password": "from_env"},
        "user": {"username": "normaluser", "password": "from_env"}
    }
)
```
### Report-Only Generation
```python
"""
Generate reports from existing findings without re-scanning
Useful when you need to regenerate reports after manual review
"""
import json
import os
import subprocess
from datetime import datetime

def generate_reports_only(workspace="workspace"):
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Phase 5.5: Re-analyze attack chains
    subprocess.run(["python", "scripts/analyze_chains.py",
                    "--findings", f"{workspace}/findings",
                    "--output", f"{workspace}/attack_chains.json"])

    # Phase 5.6: Re-verify findings
    subprocess.run(["python", "scripts/verify_findings.py",
                    "--findings", f"{workspace}/findings"])

    # Phase 6: Generate reports
    subprocess.run(["python", "scripts/generate_report.py",
                    "--findings-dir", f"{workspace}/findings",
                    "--fingerprint", f"{workspace}/fingerprint.json",
                    "--output-json", f"{workspace}/report_{timestamp}.json",
                    "--output-html", f"{workspace}/report_{timestamp}.html",
                    "--output-docx", f"{workspace}/report_{timestamp}.docx"])

    # Validate report formats
    validate_reports(workspace, timestamp)

def validate_reports(workspace, timestamp):
    json_path = f"{workspace}/report_{timestamp}.json"
    html_path = f"{workspace}/report_{timestamp}.html"
    docx_path = f"{workspace}/report_{timestamp}.docx"

    # All three output files must exist
    assert os.path.exists(json_path), "JSON report missing"
    assert os.path.exists(html_path), "HTML report missing"
    assert os.path.exists(docx_path), "DOCX report missing"

    # The JSON report must contain the top-level sections
    with open(json_path) as f:
        report_data = json.load(f)
    assert "findings" in report_data
    assert "summary" in report_data
    print("✓ All report formats validated")
```
## Troubleshooting
### Katana Crawler Issues
**Problem**: Crawler returns empty results or finishes in <20 seconds
```python
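# Solution: Verify session cookies are valid
import json
import time

with open("workspace/sessions/session.json") as f:
    session = json.load(f)

# Check cookie expiration. Playwright reports `expires` as a Unix timestamp
# in seconds, or -1 for session cookies.
now = time.time()
for cookie in session["cookies"]:
    expires = cookie.get("expires", -1)
    if expires != -1:
        state = "EXPIRED" if expires < now else "valid"
        print(f"{cookie['name']}: expires {expires} ({state})")

# Re-extract session if cookies expired (extract_session is defined in Phase 2)
target_url = "https://example.com"
session = extract_session(target_url)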
```
**Problem**: Crawler times out or hangs
```bash
# Solution: Reduce crawl depth and concurrency
python scripts/run_katana.py \
--url https://example.com \
--depth 3 \
--concurrency 5 \
--timeout 600
```
### Multi-Agent Coordination
**Problem**: Agents not finding vulnerabilities
```python
# Solution: Check that skeleton findings were properly generated
import os
import subprocess

findings_dir = "workspace/findings"
agents = ["auth", "injection", "logic", "upload", "api", "info"]
for agent in agents:
    skeleton_path = f"{findings_dir}/{agent}_findings.json"
    if not os.path.exists(skeleton_path):
        print(f"Missing skeleton for {agent} agent")
        # Regenerate skeletons (one run recreates all of them, hence the break)
        subprocess.run(["python", "scripts/prepare_agent_findings.py",
                        "--targets", "workspace/targets.txt",
                        "--fingerprint", "workspace/fingerprint.json",
                        "--output", findings_dir])
        break
```
**Problem**: Agents marking everything as "Potential" without confirmation
```text
Reminder for AI agents:
- Must attempt actual exploitation, not just theory
- Require HTTP request/response evidence for "Confirmed" status
- Try 2-3 bypass techniques on WAF/validation failures
- Mark as "Potential" only if technical constraints prevent confirmation
```
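The evidence rule in this reminder can also be enforced mechanically before findings reach the report. The sketch below assumes each finding's `evidence` is a dictionary with `request` and `response` entries; that structure and the `resolve_status` name are hypothetical and only illustrate the idea.

```python
def resolve_status(finding):
    """Downgrade a "Confirmed" finding that lacks request/response evidence."""
    evidence = finding.get("evidence") or {}
    # Assumed evidence shape: {"request": "...", "response": "..."}
    has_http_pair = bool(evidence.get("request")) and bool(evidence.get("response"))
    status = finding.get("status", "Potential")
    if status == "Confirmed" and not has_http_pair:
        return "Potential"
    return status
```

Only the "Confirmed" status is touched, so "Potential" and "False Positive" findings pass through unchanged.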
### Report Generation Failures
**Problem**: Report missing sections or malformed
```python
# Solution: Validate findings structure before report generation
import glob
import json

def validate_findings_structure(findings_dir):
    for finding_file in glob.glob(f"{findings_dir}/*_findings.json"):
        with open(finding_file) as f:
            data = json.load(f)
        # Top-level fields expected in every agent file
        required_fields = ["agent_name", "findings", "summary"]
        for field in required_fields:
            assert field in data, f"Missing {field} in {finding_file}"
        # Per-finding fields
        for finding in data["findings"]:
            assert "title" in finding
            assert "severity" in finding
            assert "status" in finding  # Confirmed, Potential, or False Positive
            assert "evidence" in finding
    print("✓ All findings files valid")

validate_findings_structure("workspace/findings")
```
### Session Extraction Issues
**Problem**: Browser doesn't launch or session not captured
```python
# Solution: Use explicit browser path and user data directory
from playwright.sync_api import sync_playwright

def extract_session_robust(target_url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,
            executable_path="/usr/bin/chromium",  # Adjust for your system
            args=["--disable-blink-features=AutomationControlled"]
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."
        )
        page = context.new_page()
        page.goto(target_url)
        print("Please log in manually. Press Enter when done...")
        input()

        # Comprehensive credential extraction
        cookies = context.cookies()
        storage = page.evaluate("() => Object.assign({}, localStorage)")
        session_storage = page.evaluate("() => Object.assign({}, sessionStorage)")

        # Capture auth headers from network traffic
        auth_headers = {}

        def handle_response(response):
            if "authorization" in response.request.headers:
                auth_headers["Authorization"] = response.request.headers["authorization"]

        page.on("response", handle_response)
        page.reload()
        page.wait_for_load_state("networkidle")
        browser.close()
        return {
            "cookies": cookies,
            "localStorage": storage,
            "sessionStorage": session_storage,
            "headers": auth_headers
        }
```
## Best Practices
1. **Always use separate workspaces** for different targets to avoid cross-contamination
2. **Verify authorization** documentation before starting any test
3. **Test on staging/UAT** environments when possible, not production
4. **Review findings manually** before delivering reports to clients
5. **Keep vibe-pentest updated** using `git pull` to get latest detection techniques
6. **Use multiple accounts** to thoroughly test authorization controls
7. **Document custom test data** created during testing for cleanup
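For the first practice, a small helper can derive a separate workspace directory from each target URL. This is a sketch under assumptions: the `workspaces` root, the naming scheme, and both function names are hypothetical, while the subdirectory names follow the project structure shown earlier.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def workspace_for(target_url, root="workspaces"):
    """Derive a per-target workspace path from the host and port."""
    parsed = urlparse(target_url)
    host = parsed.hostname or "unknown"
    port = f"_{parsed.port}" if parsed.port else ""
    # Replace anything that is unsafe in a directory name
    slug = re.sub(r"[^a-z0-9._-]", "_", f"{host}{port}".lower())
    return Path(root) / slug


def create_workspace(target_url, root="workspaces"):
    """Create the workspace with the documented subdirectories."""
    base = workspace_for(target_url, root)
    for sub in ("sessions", "findings", "report_result"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base
```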
## Additional Resources
- **3-minute tutorial video**: [bilibili.com/video/BV1RiGX6rESQ/](https://www.bilibili.com/video/BV1RiGX6rESQ/)
- **Sub-agent setup guide**: See `sub_agent.md` in repository
- **Katana documentation**: [github.com/projectdiscovery/katana](https://github.com/projectdiscovery/katana)