---
name: esaa-security-audit
description: Execute deterministic, event-sourced security audits using ESAA-Security's LLM-based agent architecture with 95 checks across 16 security domains
triggers:
  - run a security audit using ESAA
  - audit this codebase with ESAA-Security
  - execute security checks with event sourcing
  - generate deterministic security report
  - verify security findings with ESAA
  - audit AI-generated code for vulnerabilities
  - run PARCER security playbooks
  - trace security audit with event store
---

# ESAA-Security Audit Skill

> Skill by [ara.so](https://ara.so) — Security Skills collection.

## Overview

ESAA-Security applies the Event Sourcing for Autonomous Agents (ESAA) architecture to automated security auditing. It executes structured security audits across **16 security domains** with **95 executable checks**, governed by an immutable append-only event log. Every finding, classification, and remediation decision is recorded as a verifiable fact.

**Key differentiators:**
- **Deterministic audits**: replaying the same event log always reproduces the same read model (`roadmap.json`)
- **Hallucination prevention**: schema-validated outputs with evidence requirements
- **Complete audit trail**: `.roadmap/activity.jsonl` records every check execution
- **Governed agents**: PARCER contracts enforce decision hygiene and token budgets
- **Verifiable reports**: SHA-256 hash verification from events to final output

## Installation

```bash
# Clone the repository
git clone https://github.com/elzobrito/ESAA-Security.git
cd ESAA-Security

# Install Python dependencies
pip install -r requirements.txt

# Set up environment variables
export OPENAI_API_KEY=$YOUR_OPENAI_KEY
export ANTHROPIC_API_KEY=$YOUR_ANTHROPIC_KEY  # if using Claude
export AUDIT_TARGET_REPO="/path/to/repo"
```

**Requirements:**
- Python 3.9+
- LLM API access (OpenAI GPT-4, Anthropic Claude, or compatible)
- Target repository must be readable by the audit agent

## Repository Structure

```
.roadmap/                              # Event sourcing core
├── activity.jsonl                     # Immutable event store
├── roadmap.json                       # Derived audit progress
├── issues.json                        # Structured findings
├── AGENT_CONTRACT.yaml                # Agent boundaries
├── ORCHESTRATOR_CONTRACT.yaml         # State mutation rules
└── PROJECTION_SPEC.md                 # Event → state mapping

playbooks/
├── playbooks.security.json            # 95 security checks
└── global_input_contract.json         # Input requirements

reports/
├── phase1/                            # Reconnaissance
├── phase2/                            # Domain audits
├── phase3/                            # Risk classification
├── phase4/                            # Recommendations
└── final/                             # Compiled report
```

## Core Concepts

### Event Store (`activity.jsonl`)

Every audit action is an immutable event:

```json
{
  "event_id": "evt_001",
  "timestamp": "2026-05-14T10:30:00Z",
  "event_type": "task.started",
  "task_id": "SEC-010",
  "phase": "phase2",
  "domain": "authentication",
  "agent": "agent-impl"
}
```

```json
{
  "event_id": "evt_002",
  "timestamp": "2026-05-14T10:32:15Z",
  "event_type": "check.completed",
  "task_id": "SEC-010",
  "check_id": "AU-002",
  "status": "fail",
  "severity": "high",
  "finding": "Password stored without bcrypt/argon2",
  "evidence": {
    "file": "auth/user.py",
    "line": 45,
    "code_snippet": "user.password = request.form['password']"
  },
  "hash": "a3f8b2..."
}
```
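The `hash` field suggests each event is tamper-evident. A minimal sketch of an append-only JSONL store, assuming a hash chain in which every event's SHA-256 covers the previous event's hash plus its own canonical JSON body. The chain layout, the `prev_hash` field, the `GENESIS` value, and the function names are all hypothetical, not ESAA-Security's documented format.

```python
import hashlib
import json
from pathlib import Path

GENESIS = "0" * 64  # hypothetical "previous hash" for the first event


def canonical_json(obj: dict) -> str:
    # Sorted keys and compact separators: one stable byte encoding per event
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def last_hash(store: Path) -> str:
    # Hash of the newest event, or GENESIS when the store is empty or absent
    if not store.exists():
        return GENESIS
    lines = [ln for ln in store.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return json.loads(lines[-1])["hash"] if lines else GENESIS


def append_event(store: Path, event: dict) -> dict:
    # Hypothetical chain: hash = SHA-256(previous hash + canonical body)
    body = {k: v for k, v in event.items() if k not in ("hash", "prev_hash")}
    prev = last_hash(store)
    digest = hashlib.sha256((prev + canonical_json(body)).encode("utf-8")).hexdigest()
    record = {**body, "prev_hash": prev, "hash": digest}
    # Append mode only: earlier lines are never rewritten
    with store.open("a", encoding="utf-8") as f:
        f.write(canonical_json(record) + "\n")
    return record
```

Because each hash depends on the one before it, editing any earlier line invalidates every later hash. `last_hash` rereads the file, so the sketch assumes a single writer, which matches the rule that only the orchestrator appends events.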

### Read Model (`roadmap.json`)

Projection of audit progress (derived from events):

```json
{
  "version": "0.4.0",
  "phases": {
    "phase1": {
      "status": "done",
      "tasks": {
        "SEC-001": {"status": "done", "output": "reports/phase1/tech-stack.md"}
      }
    },
    "phase2": {
      "status": "in_progress",
      "domains": {
        "authentication": {
          "checks_passed": 5,
          "checks_failed": 3,
          "tasks": ["SEC-010", "SEC-011"]
        }
      }
    }
  }
}
```
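A read model is a pure fold over the log. The sketch below is a minimal version of such a fold, assuming the field names from the two example events above. Because a `check.completed` event carries no `phase` or `domain`, it looks them up from the task's earlier `task.started` event. The function name and output shape are illustrative only, not what `PROJECTION_SPEC.md` defines.

```python
def project_roadmap(events: list) -> dict:
    """Fold events, in log order, into a roadmap-shaped read model (illustrative schema)."""
    phases: dict = {}
    task_home: dict = {}  # task_id -> (phase, domain), learned from task.started events
    for ev in events:
        if ev["event_type"] == "task.started":
            task_home[ev["task_id"]] = (ev["phase"], ev["domain"])
            domains = phases.setdefault(ev["phase"], {"domains": {}})["domains"]
            entry = domains.setdefault(
                ev["domain"], {"checks_passed": 0, "checks_failed": 0, "tasks": []}
            )
            if ev["task_id"] not in entry["tasks"]:
                entry["tasks"].append(ev["task_id"])
        elif ev["event_type"] == "check.completed":
            # KeyError here means a check arrived for a task that never started
            phase, domain = task_home[ev["task_id"]]
            entry = phases[phase]["domains"][domain]
            entry["checks_failed" if ev["status"] == "fail" else "checks_passed"] += 1
        # Other event types do not affect this projection
    return {"phases": phases}
```

The fold reads nothing but the events, so replaying the same log yields the same dict. The replay check later in this skill relies on exactly this property.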

## Running an Audit

### Phase 1: Reconnaissance

```python
# orchestrator.py
import os

from esaa_security import Orchestrator, Agent

# Initialize orchestrator
orchestrator = Orchestrator(
    event_store=".roadmap/activity.jsonl",
    roadmap_path=".roadmap/roadmap.json",
    target_repo=os.getenv("AUDIT_TARGET_REPO")
)

# Initialize reconnaissance agent
agent_spec = Agent(
    role="agent-spec",
    contract_path=".roadmap/AGENT_CONTRACT.yaml",
    parcer_profile="PARCER_PROFILE.agent-spec.yaml"
)

# Execute reconnaissance phase
recon_tasks = ["SEC-001", "SEC-002", "SEC-003", "SEC-004"]
for task_id in recon_tasks:
    result = agent_spec.execute_task(task_id, orchestrator.get_context())
    orchestrator.validate_and_append(task_id, result)
```

**Task outputs:**
- `SEC-001`: Tech stack inventory (languages, frameworks, dependencies)
- `SEC-002`: Architecture map (components, trust boundaries)
- `SEC-003`: Data flow diagram (inputs, storage, outputs)
- `SEC-004`: Attack surface enumeration (endpoints, file uploads, APIs)

### Phase 2: Domain Audit Execution

```python
import json

# Load security playbooks
with open("playbooks/playbooks.security.json") as f:
    playbooks = json.load(f)

# Initialize audit execution agent
agent_impl = Agent(
    role="agent-impl",
    contract_path=".roadmap/AGENT_CONTRACT.yaml",
    parcer_profile="PARCER_PROFILE.agent-impl.yaml"
)

# Execute checks for a domain (e.g., Authentication)
auth_checks = ["AU-001", "AU-002", "AU-003", "AU-004", "AU-005", "AU-006", "AU-007", "AU-008"]

for check_id in auth_checks:
    playbook = playbooks["checks"][check_id]
    
    result = agent_impl.execute_check(
        check_id=check_id,
        playbook=playbook,
        context=orchestrator.get_context()
    )
    
    # Orchestrator validates against schema
    orchestrator.validate_and_append(
        task_id=f"SEC-{check_id}",
        result=result
    )
```

**Example check result:**

```python
# agent_impl output for AU-002 (Password Storage)
{
    "check_id": "AU-002",
    "status": "fail",
    "severity": "critical",
    "title": "Weak Password Hashing",
    "description": "Passwords stored using SHA-256 instead of bcrypt/argon2",
    "evidence": {
        "files": ["auth/models.py"],
        "lines": [67],
        "code": "hashlib.sha256(password.encode()).hexdigest()"
    },
    "cwe": "CWE-916",
    "owasp": "A02:2021 Cryptographic Failures",
    "recommendation": "Replace SHA-256 with bcrypt (cost factor 12+)",
    "references": [
        "https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html"
    ]
}
```

### Phase 3: Risk Classification

```python
import json

# Initialize QA/risk agent
agent_qa = Agent(
    role="agent-qa",
    contract_path=".roadmap/AGENT_CONTRACT.yaml",
    parcer_profile="PARCER_PROFILE.agent-qa.yaml"
)

# Classify vulnerabilities
findings = orchestrator.get_all_findings()
risk_matrix = agent_qa.classify_risks(findings)

# Write risk classification
with open("reports/phase3/risk-matrix.json", "w") as f:
    json.dump(risk_matrix, f, indent=2)
```

**Risk matrix output:**

```json
{
  "critical": [
    {"id": "AU-002", "cvss": 9.1, "exploitability": "high"}
  ],
  "high": [
    {"id": "IV-003", "cvss": 7.5, "exploitability": "medium"}
  ],
  "medium": [
    {"id": "SH-001", "cvss": 5.3, "exploitability": "low"}
  ],
  "remediation_priority": ["AU-002", "AZ-001", "IV-003"]
}
```
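The severity buckets in the example matrix line up with the CVSS v3.1 qualitative rating scale: 9.0 to 10.0 is critical, 7.0 to 8.9 high, 4.0 to 6.9 medium, and 0.1 to 3.9 low. `agent-qa` classifies with an LLM, so a deterministic cross-check can be useful. The sketch below is a minimal one, assuming each finding already carries `id`, `cvss`, and `exploitability`. The function names are hypothetical, and ranking purely by descending score is a simplification of whatever prioritization `agent-qa` applies.

```python
def cvss_rating(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def build_risk_matrix(findings: list) -> dict:
    """Group findings by CVSS rating and order remediation by descending score."""
    matrix: dict = {"critical": [], "high": [], "medium": [], "low": []}
    for f in findings:
        rating = cvss_rating(f["cvss"])
        if rating in matrix:  # findings rated "none" are left out
            matrix[rating].append(
                {"id": f["id"], "cvss": f["cvss"], "exploitability": f["exploitability"]}
            )
    # sorted() is stable, so equal scores keep their input order (reproducible output)
    ranked = sorted(findings, key=lambda f: f["cvss"], reverse=True)
    matrix["remediation_priority"] = [f["id"] for f in ranked if f["cvss"] > 0.0]
    return matrix
```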

### Phase 4: Report Generation

```python
import json

# Generate final report
report = agent_qa.generate_report(
    findings=orchestrator.get_all_findings(),
    risk_matrix=risk_matrix,
    context=orchestrator.get_context()
)

# Write final outputs
with open("reports/final/security-audit-report.md", "w") as f:
    f.write(report["markdown"])

with open("reports/final/security-audit-report.json", "w") as f:
    json.dump(report["structured"], f, indent=2)
```

## Event Replay and Verification

```python
# Verify audit determinism
from esaa_security import EventReplay, HashVerifier

# Replay events from scratch
replayer = EventReplay(event_store=".roadmap/activity.jsonl")
replayed_roadmap = replayer.project_roadmap()

# Compare hash
original_hash = HashVerifier.compute_hash(".roadmap/roadmap.json")
replayed_hash = HashVerifier.compute_hash(replayed_roadmap)

assert original_hash == replayed_hash, "Non-deterministic projection detected"
```
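The replay check above hashes a file path on one side and an in-memory projection on the other. Both sides must therefore be normalized to the same byte form first. Otherwise indentation or key order in `roadmap.json` would show up as a false divergence. The sketch below is a minimal normalization using canonical JSON with sorted keys. `projection_hash` is a hypothetical helper and is not `HashVerifier`'s documented behavior.

```python
import hashlib
import json
from pathlib import Path
from typing import Union


def projection_hash(source: Union[str, Path, dict]) -> str:
    """SHA-256 of a projection, given either a JSON file path or an in-memory dict."""
    if isinstance(source, (str, Path)):
        with open(source, encoding="utf-8") as f:
            source = json.load(f)
    # Canonical form: sorted keys, no insignificant whitespace
    canonical = json.dumps(source, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```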

## Security Domain Coverage

### Critical Domains (8 total; selected examples below)

**Secrets & Configuration (SC-001 to SC-008):**
```python
# Example: Check for hardcoded secrets
playbook = {
    "check_id": "SC-001",
    "title": "Hardcoded Secrets Detection",
    "patterns": [
        r'password\s*=\s*["\'][^"\']+["\']',
        r'api_key\s*=\s*["\'][^"\']+["\']',
        r'AWS_SECRET_ACCESS_KEY'
    ],
    "severity": "critical"
}
```
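The playbook stores detection patterns as data. Executing them amounts to applying each regex line by line and recording where it hits. The sketch below is a minimal deterministic version that reuses the SC-001 patterns and emits evidence in the `file`/`line`/`code_snippet` shape of the `check.completed` event. The file-suffix filter and function name are hypothetical. In ESAA-Security the agent performs the check, so treat this as a pre-filter, not the project's implementation.

```python
import re
from pathlib import Path

# Patterns copied from the SC-001 playbook example above
SC_001_PATTERNS = [
    r'password\s*=\s*["\'][^"\']+["\']',
    r'api_key\s*=\s*["\'][^"\']+["\']',
    r'AWS_SECRET_ACCESS_KEY',
]


def scan_for_secrets(root: str, patterns=SC_001_PATTERNS,
                     suffixes=(".py", ".js", ".json", ".yaml", ".yml")) -> list:
    """Return one evidence dict per matching line, in a stable (sorted-path) order."""
    compiled = [re.compile(p) for p in patterns]
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(rx.search(line) for rx in compiled):
                findings.append({
                    "file": path.relative_to(root).as_posix(),
                    "line": lineno,
                    "code_snippet": line.strip(),
                })
    return findings
```

Here `code_snippet` contains the matched secret verbatim. A real pipeline should redact the value before it reaches the event store.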

**Authentication (AU-001 to AU-008):**
- Password hashing strength
- MFA enforcement
- Session token generation
- Credential transmission (HTTPS)

**Authorization (AZ-001 to AZ-006):**
- RBAC implementation
- Privilege escalation checks
- IDOR vulnerabilities
- API authorization

**Input Validation (IV-001 to IV-007):**
- SQL injection (ORM usage, parameterized queries)
- XSS (output encoding)
- Command injection
- Path traversal
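The SQL injection item targets code like the IV-001 evidence shown under Troubleshooting (`f"SELECT * FROM users WHERE id={user_id}"`). The small self-contained sketch below contrasts that pattern with a parameterized query. It uses Python's standard-library `sqlite3` purely for illustration; the audited repository's database layer may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def find_user_unsafe(user_id: str) -> list:
    # Vulnerable: attacker-controlled text is spliced into the SQL itself
    return conn.execute(f"SELECT * FROM users WHERE id={user_id}").fetchall()


def find_user_safe(user_id: str) -> list:
    # Parameterized: the driver binds user_id as a value, never as SQL
    return conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchall()


payload = "1 OR 1=1"
print(find_user_unsafe(payload))  # the injected OR clause returns every row
print(find_user_safe(payload))    # no row's id equals the literal string
```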

**Data Security (DA-001 to DA-005):**
- Encryption at rest
- PII handling
- Data retention policies

### High Priority Domains (7 total; one example below)

**AI/LLM Security (AI-001 to AI-005):**
```python
# Example: Check for prompt injection vulnerabilities
playbook = {
    "check_id": "AI-001",
    "title": "Prompt Injection Defense",
    "checks": [
        "user_input_sanitization",
        "system_prompt_isolation",
        "output_validation",
        "context_length_limits"
    ],
    "severity": "high"
}
```

## Configuration

### Agent Contract (`.roadmap/AGENT_CONTRACT.yaml`)

```yaml
agent_impl:
  can:
    - read: ["**/*.py", "**/*.js", "**/*.java", "config/**"]
    - write: ["reports/phase2/**"]
    - execute_checks: true
  cannot:
    - write: [".roadmap/activity.jsonl", ".roadmap/roadmap.json"]
    - modify_state: true
    - append_events: true
  output_schema: "agent_result.schema.json"
  token_budget: 8000
```
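The contract separates what an agent may touch from what only the orchestrator may do. The sketch below shows how the orchestrator could enforce the `write` rules before accepting an agent's output. It assumes deny-overrides-allow semantics and glob matching via `fnmatch`, and both are assumptions. The YAML is represented as a hypothetical in-memory dict rather than parsed. Paths are normalized first so that `..` segments cannot escape an allowed prefix.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the agent_impl write rules above (not parsed from YAML)
CONTRACT = {
    "agent_impl": {
        "can_write": ["reports/phase2/**"],
        "cannot_write": [".roadmap/activity.jsonl", ".roadmap/roadmap.json"],
    }
}


def may_write(agent: str, path: str) -> bool:
    rules = CONTRACT[agent]
    # Collapse "a/../b" so traversal cannot sneak past a prefix match
    normalized = posixpath.normpath(path)
    if normalized == ".." or normalized.startswith("../") or posixpath.isabs(normalized):
        return False
    # Deny rules take precedence over allow rules
    if any(fnmatchcase(normalized, p) for p in rules["cannot_write"]):
        return False
    # fnmatch's "*" also matches "/", so "reports/phase2/**" covers nested paths
    return any(fnmatchcase(normalized, p) for p in rules["can_write"])
```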

### PARCER Profile (Token Budgets)

```yaml
# PARCER_PROFILE.agent-impl.yaml
budget:
  max_tokens: 8000
  per_check: 500
  context_window: 4000
  
fallback:
  strategy: "map_reduce"
  chunk_size: 2000
  
validation:
  require_evidence: true
  require_cwe_mapping: true
  schema: "agent_result.schema.json"
```
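The `map_reduce` fallback implies three steps: split oversized input into chunks of at most `chunk_size` tokens, analyze each chunk separately, and merge the per-chunk results. The sketch below is a minimal version of that idea. It assumes whitespace-separated words as a stand-in for real tokenizer counts and line-aligned chunks. The names are hypothetical, and PARCER's actual fallback may differ. The `max_chunks` guard mirrors the setting shown under Troubleshooting.

```python
from typing import Callable, List, Optional


def approx_tokens(text: str) -> int:
    # Stand-in for a real tokenizer: whitespace-separated words
    return len(text.split())


def chunk_lines(lines: List[str], chunk_size: int,
                count_tokens: Callable[[str], int] = approx_tokens) -> List[List[str]]:
    """Greedily pack whole lines into chunks of at most chunk_size tokens.

    A single line larger than chunk_size still becomes its own (oversized) chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for line in lines:
        cost = count_tokens(line)
        if current and used + cost > chunk_size:
            chunks.append(current)
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append(current)
    return chunks


def map_reduce(lines: List[str], chunk_size: int, map_fn: Callable[[List[str]], list],
               max_chunks: Optional[int] = None) -> list:
    chunks = chunk_lines(lines, chunk_size)
    if max_chunks is not None and len(chunks) > max_chunks:
        raise ValueError(f"{len(chunks)} chunks exceed max_chunks={max_chunks}")
    merged: list = []
    for chunk in chunks:               # map: analyze each chunk on its own
        for finding in map_fn(chunk):  # reduce: merge, dropping exact duplicates
            if finding not in merged:
                merged.append(finding)
    return merged
```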

## Programmatic Usage

### Custom Audit Pipeline

```python
import os

from esaa_security import AuditPipeline, SecurityDomain

# Define custom domain subset
domains = [
    SecurityDomain.AUTHENTICATION,
    SecurityDomain.AUTHORIZATION,
    SecurityDomain.INPUT_VALIDATION,
    SecurityDomain.AI_LLM_SECURITY
]

# Initialize pipeline
pipeline = AuditPipeline(
    target_repo=os.getenv("AUDIT_TARGET_REPO"),
    domains=domains,
    event_store=".roadmap/activity.jsonl"
)

# Execute with streaming
for event in pipeline.execute_streaming():
    if event["event_type"] == "check.completed":
        print(f"✓ {event['check_id']}: {event['status']}")
    elif event["event_type"] == "finding.detected":
        print(f"⚠ {event['severity']}: {event['title']}")

# Get final report
report = pipeline.get_report()
```

### Query Event Store

```python
from esaa_security import EventQuery

query = EventQuery(".roadmap/activity.jsonl")

# Find all critical findings
critical = query.filter(
    event_type="check.completed",
    status="fail",
    severity="critical"
).to_list()

# Get domain coverage
coverage = query.aggregate_by("domain")
# {"authentication": 8, "authorization": 6, ...}

# Audit timeline
timeline = query.timeline(group_by="1h")
```
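The same questions can also be asked of the raw JSONL with the standard library. This is useful for sanity-checking `EventQuery` results. The sketch below is minimal and assumes one JSON object per line and the field names from the example events. `read_events`, `filter_events`, and `count_by` are hypothetical helpers. Counting by `domain` only covers events that actually carry a `domain` field.

```python
import json
from collections import Counter


def read_events(path: str) -> list:
    # One JSON object per non-blank line
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def filter_events(events: list, **criteria) -> list:
    # Keep events whose fields equal every criterion (AND semantics)
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


def count_by(events: list, field: str) -> dict:
    # Events without the field are skipped instead of being counted under None
    return dict(Counter(e[field] for e in events if field in e))
```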

## Common Patterns

### Incremental Audit (Skip Completed)

```python
# Assumes `orchestrator`, an `agent`, and `all_tasks` (a list of task IDs) from the phases above
roadmap = orchestrator.load_roadmap()

for task_id in all_tasks:
    if roadmap.get_task_status(task_id) == "done":
        print(f"Skip {task_id} (already completed)")
        continue

    result = agent.execute_task(task_id, orchestrator.get_context())
    orchestrator.validate_and_append(task_id, result)
```

### Parallel Domain Execution

```python
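from concurrent.futures import ThreadPoolExecutor

# Assumes `orchestrator`, `playbooks` (Phase 2), and a `domain_map` such as
# {"authentication": ["AU-001", ...], "authorization": ["AZ-001", ...]} are defined
context = orchestrator.get_context()  # one snapshot, read-only in the workers

def audit_domain(domain_name, checks):
    # Workers only execute checks; they never write to the event store
    agent = Agent(
        role="agent-impl",
        contract_path=".roadmap/AGENT_CONTRACT.yaml",
        parcer_profile="PARCER_PROFILE.agent-impl.yaml",
    )
    results = []
    for check_id in checks:
        result = agent.execute_check(
            check_id=check_id,
            playbook=playbooks["checks"][check_id],
            context=context,
        )
        results.append((check_id, result))
    return domain_name, results

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(audit_domain, domain, checks)
        for domain, checks in domain_map.items()
    ]

    # Consume in submission order (not as_completed) so events are appended
    # from one thread and in the same order on every run
    for future in futures:
        domain, results = future.result()
        for check_id, result in results:
            orchestrator.validate_and_append(task_id=f"SEC-{check_id}", result=result)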
```

### Export to SARIF

```python
import json

from esaa_security import SARIFExporter

exporter = SARIFExporter(event_store=".roadmap/activity.jsonl")
sarif = exporter.to_sarif()

with open("security-audit.sarif", "w") as f:
    json.dump(sarif, f, indent=2)
```
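SARIF 2.1.0 is an OASIS standard, so the core of such an export can be built directly from `check.completed` findings. The sketch below is minimal and assumes findings shaped like the event example earlier in this skill: `check_id`, `severity`, `finding`, and an `evidence` object with `file` and `line`. The severity-to-`level` mapping and the function name are hypothetical, not `SARIFExporter`'s documented behavior.

```python
# Assumed mapping from this skill's severities to SARIF result levels
LEVELS = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def findings_to_sarif(findings: list, tool_name: str = "ESAA-Security") -> dict:
    results = []
    for f in findings:
        results.append({
            "ruleId": f["check_id"],
            "level": LEVELS.get(f.get("severity", ""), "warning"),
            "message": {"text": f.get("finding") or f.get("title", f["check_id"])},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["evidence"]["file"]},
                    "region": {"startLine": f["evidence"]["line"]},
                }
            }],
        })
    rule_ids = sorted({f["check_id"] for f in findings})
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [{
            "tool": {"driver": {"name": tool_name, "rules": [{"id": r} for r in rule_ids]}},
            "results": results,
        }],
    }
```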

## Troubleshooting

### Issue: Schema Validation Failure

```python
# Error: agent_result failed schema validation
# Fix: Check output structure matches agent_result.schema.json

# Validate manually
from jsonschema import validate
import json

with open(".roadmap/agent_result.schema.json") as f:
    schema = json.load(f)

with open("reports/phase2/results/SEC-010.json") as f:
    result = json.load(f)

validate(instance=result, schema=schema)  # Raises ValidationError with details
```

### Issue: Event Store Corruption

```python
# Verify event store integrity
from esaa_security import EventStoreValidator

validator = EventStoreValidator(".roadmap/activity.jsonl")
errors = validator.validate()

if errors:
    print("Corrupt events:")
    for err in errors:
        print(f"Line {err['line']}: {err['message']}")
else:
    print("✓ Event store valid")
```
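The validator above reports a `line` and `message` for each problem, and a standard-library pass over the file can produce the same shape. The sketch below is minimal. The required fields, the uniqueness rule, and the non-decreasing timestamp rule are inferred from the example events, not taken from `ORCHESTRATOR_CONTRACT.yaml`. `validate_event_store` is a hypothetical name.

```python
import json

REQUIRED = ("event_id", "timestamp", "event_type")  # assumed minimum, from the example events


def validate_event_store(path: str) -> list:
    errors, seen, last_ts = [], set(), ""
    with open(path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            if not raw.strip():
                continue
            try:
                event = json.loads(raw)
            except json.JSONDecodeError as exc:
                errors.append({"line": lineno, "message": f"invalid JSON: {exc.msg}"})
                continue
            if not isinstance(event, dict):
                errors.append({"line": lineno, "message": "event is not a JSON object"})
                continue
            missing = [k for k in REQUIRED if k not in event]
            if missing:
                errors.append({"line": lineno, "message": f"missing fields: {', '.join(missing)}"})
                continue
            if event["event_id"] in seen:
                errors.append({"line": lineno, "message": f"duplicate event_id {event['event_id']}"})
            seen.add(event["event_id"])
            # Same-format ISO-8601 UTC strings sort lexicographically in time order
            if event["timestamp"] < last_ts:
                errors.append({"line": lineno, "message": "timestamp goes backwards"})
            last_ts = max(last_ts, event["timestamp"])
    return errors
```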

### Issue: Non-Deterministic Replay

```python
# Debug: Find which event causes divergence
from esaa_security import ReplayDebugger

debugger = ReplayDebugger(
    event_store=".roadmap/activity.jsonl",
    expected_roadmap=".roadmap/roadmap.json"
)

divergent_event = debugger.find_divergence()
print(f"Divergence at event: {divergent_event['event_id']}")
print(f"Expected: {divergent_event['expected_state']}")
print(f"Actual: {divergent_event['actual_state']}")
```

### Issue: Agent Exceeds Token Budget

```yaml
# Error: Agent exceeded 8000 token budget
# Fix: Enable Map-Reduce fallback in PARCER profile

# PARCER_PROFILE.agent-impl.yaml
fallback:
  strategy: "map_reduce"
  chunk_size: 2000
  max_chunks: 10
  
# Or reduce context window
budget:
  context_window: 3000  # from 4000
```

### Issue: Missing Evidence in Findings

```python
# Orchestrator rejects findings without evidence
# Fix: Ensure agent output includes code snippets

# Valid finding structure
{
    "check_id": "IV-001",
    "status": "fail",
    "evidence": {
        "file": "api/routes.py",
        "line": 23,
        "code_snippet": "query = f\"SELECT * FROM users WHERE id={user_id}\""
    }
}
```
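The rejection rule can be expressed as a small predicate over a finding. The sketch below is minimal and assumes evidence is required only for `fail` results and must carry the `file`, `line`, and `code_snippet` keys of the structure above. The real gate is `agent_result.schema.json` plus the PARCER `require_evidence` setting, and this sketch does not reproduce their exact rules. `evidence_problems` is a hypothetical name.

```python
REQUIRED_EVIDENCE = ("file", "line", "code_snippet")


def evidence_problems(finding: dict) -> list:
    """Return reasons a failing finding would be rejected; an empty list means it passes."""
    if finding.get("status") != "fail":
        return []  # in this sketch only failures must carry evidence
    evidence = finding.get("evidence")
    if not isinstance(evidence, dict):
        return ["missing evidence object"]
    problems = [f"evidence.{k} is missing or empty" for k in REQUIRED_EVIDENCE if not evidence.get(k)]
    line = evidence.get("line")
    if line is not None and (not isinstance(line, int) or line < 1):
        problems.append("evidence.line must be a positive integer")
    return problems
```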

## Integration with CI/CD

### GitHub Actions

```yaml
# .github/workflows/security-audit.yml
name: ESAA Security Audit

on: [push, pull_request]

jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # required to upload SARIF results
    steps:
      - uses: actions/checkout@v4

      - name: Run ESAA-Security Audit
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          AUDIT_TARGET_REPO: ${{ github.workspace }}
        run: |
          pip install -r requirements.txt
          python orchestrator.py --full-audit

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: security-audit.sarif
      
      - name: Fail on Critical Findings
        run: |
          python -c "import json; \
          report = json.load(open('reports/final/security-audit-report.json')); \
          exit(1 if report['critical_count'] > 0 else 0)"
```

## Best Practices

1. **Always verify event store integrity** before generating reports
2. **Use deterministic replay** to validate audit reproducibility
3. **Configure token budgets** per agent role to prevent runaway costs
4. **Enable Map-Reduce fallback** for large repositories (>10k LOC)
5. **Review PARCER profiles** to adjust validation strictness
6. **Export to SARIF** for GitHub Security tab integration
7. **Archive `.roadmap/` directory** for audit forensics

## References

- [ESAA Paper (arXiv:2602.23193)](https://arxiv.org/abs/2602.23193)
- [ESAA-Security Paper (arXiv:2603.06365)](https://arxiv.org/abs/2603.06365)
- [PARCER Paper (arXiv:2603.00856)](https://arxiv.org/abs/2603.00856)
- [PARCER v1.6.0 Security Auditor](docs/PARCER_v1.6.0-security-audit.yaml)
Source: creator's repository `aradotso/security-skills` on GitHub.