pydantic-ai-harness

Extend Pydantic AI agents with batteries-included capabilities from pydantic-ai-harness — currently Code Mode, which collapses many tool calls into one sandboxed Python execution. Use when the user mentions pydantic-ai-harness, CodeMode, Monty, code mode, or tool sandboxing, when they want an agent to run agent-written Python, or when a Pydantic AI agent would benefit from orchestrating multiple tool calls in a single sandboxed script.

Skill file

Preview skill file
---
name: pydantic-ai-harness
description: Extend Pydantic AI agents with batteries-included capabilities from pydantic-ai-harness — currently Code Mode, which collapses many tool calls into one sandboxed Python execution. Use when the user mentions pydantic-ai-harness, CodeMode, Monty, code mode, or tool sandboxing, when they want an agent to run agent-written Python, or when a Pydantic AI agent would benefit from orchestrating multiple tool calls in a single sandboxed script.
license: MIT
compatibility: Requires Python 3.10+ and pydantic-ai-slim>=1.95.1
metadata:
  version: "0.1.0"
  author: pydantic
---

# Building with Pydantic AI Harness

Pydantic AI Harness is the official capability library for Pydantic AI. Capabilities that need model or
framework support — and those fundamental to every agent — live in core `pydantic-ai`; optional,
batteries-included capabilities live here. Both are composed onto an agent through the same
`capabilities=[...]` API.

This skill covers the capabilities shipped by `pydantic-ai-harness`. For the core framework — agents,
tools, structured output, hooks, and testing — use the `building-pydantic-ai-agents` skill instead.

## When to Use This Skill

Invoke this skill when:
- The user mentions `pydantic-ai-harness`, `CodeMode`, code mode, or the Monty sandbox
- An agent makes many sequential tool calls that could collapse into one sandboxed Python execution
- The user wants the model to write Python that loops, branches, aggregates, or parallelizes tool calls with `asyncio.gather`
- The user asks to sandbox or constrain the code an agent runs

Do **not** use this skill for:
- Core Pydantic AI usage — building agents, adding tools, structured output, streaming, or testing (use `building-pydantic-ai-agents`)
- Capabilities that ship in core `pydantic-ai`, such as web search, tool search, and thinking
- The Pydantic validation library on its own (`pydantic`/`BaseModel` without agents)

## Supported Capabilities

| Capability | Description | Reference |
|---|---|---|
| `CodeMode` | Wraps eligible tools into a single sandboxed `run_code` tool so the model orchestrates them in Python | [Code Mode](./references/CODE-MODE.md) |

More capability areas are tracked in the
[capability matrix](https://github.com/pydantic/pydantic-ai-harness#capability-matrix); as they stabilize,
this skill grows to cover them.

## Install

```bash
uv add pydantic-ai-harness
```

Each capability declares its own extra. Code Mode needs the Monty sandbox:

```bash
uv add "pydantic-ai-harness[codemode]"   # `code-mode` is also accepted as an alias
```

Requires Python 3.10+ and `pydantic-ai-slim>=1.95.1`.

## Quick Start

A harness capability is added to the agent like any other. Here `CodeMode` wraps an MCP server's tools into
a single `run_code` tool that the model drives with Python.

```python {test="skip"}
from pydantic_ai import Agent
from pydantic_ai.capabilities import MCP  # MCP ships in core pydantic-ai

from pydantic_ai_harness import CodeMode

agent = Agent(
    'anthropic:claude-sonnet-4-6',
    capabilities=[
        # native=False routes the MCP tools through a local toolset so CodeMode can wrap them.
        # Without it, providers with native MCP run the tools server-side and bypass the sandbox.
        MCP('https://hn.caseyjhand.com/mcp', native=False),
        CodeMode(),
    ],
)

result = agent.run_sync(
    'Across the top and best Hacker News feeds, find the most-discussed story with at '
    'least 100 points and summarize its comment thread in one paragraph.'
)
print(result.output)
#> The most-discussed story clearing 100 points is ...
```

Instead of one model round-trip per tool call, the model writes a single Python script that fetches both
feeds with `asyncio.gather`, dedupes and ranks them in plain Python, and pulls the winning thread —
collapsing many calls into one `run_code`.

## Key Practices

- **Confirm a harness capability is actually needed.** If core Pydantic AI tools and capabilities are enough, use the `building-pydantic-ai-agents` skill instead — don't reach for the harness by default.
- **Read the reference before writing code.** Each capability has its own configuration, constraints, and gotchas — load the linked reference (e.g. [Code Mode](./references/CODE-MODE.md)) first.
- **Install the capability's extra.** Importing `CodeMode` without `pydantic-ai-harness[codemode]` raises an `ImportError`; the Monty sandbox is an optional dependency.

## Common Gotchas

- **`native=True` tools bypass `CodeMode`.** Provider-native MCP servers and web search execute server-side, so `run_code` never sees them. Construct them with `native=False` to keep them local and wrappable.
- **The Monty sandbox is a Python subset.** No class definitions, no third-party imports, and only a small stdlib allowlist — read [Code Mode](./references/CODE-MODE.md#sandbox-restrictions) before debugging generated code that fails to run.
- **`CodeMode` needs its extra.** Install `pydantic-ai-harness[codemode]`, not the bare package.

Source

Creator's repository · pydantic/skills

View on GitHub

License: MIT

Security

Security checks in progress
Results will appear here once audits complete
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk