---
name: protein_design_boltzgen
description: Build and run FastFold BoltzGen protein-design workflows end-to-end through API or Composer draft links. Use this whenever users mention BoltzGen, design-spec YAMLs, binder design, multi-spec scaffold workflows, CIF/PDB preparation, workflow graph upsert, `/workflow/composer/<id>`, candidate metrics/structure results, or ask naturally for "help me design a protein" / "give me a simple example".
---
# Protein Design (BoltzGen)
## Overview
This skill runs the FastFold BoltzGen workflow in the same flow used by the Composer:
1. Prepare or generate design-spec YAMLs (single-spec or multi-spec).
2. Fetch/clean CIF inputs or reuse existing references.
3. Create a draft workflow.
4. Upload files and upsert the full workflow graph (`workflow.yml`) in one transaction.
5. Share the full Composer link and a draft review summary (inputs + params + YAML previews) after non-empty graph upsert.
6. Ask user to validate and confirm.
7. Execute, wait, and summarize candidate metrics + structure links.
It is based on:
- BoltzGen upstream repository and README: <https://github.com/HannesStark/boltzgen>
- BoltzGen paper (bioRxiv DOI `10.1101/2025.11.20.689494`): <https://www.biorxiv.org/content/10.1101/2025.11.20.689494v1.full>
## Base URLs
Keep script defaults aligned with other skills:
- API default: `https://api.fastfold.ai`
- UI default for user-facing links: `https://cloud.fastfold.ai`
## Authentication
- Use `FASTFOLD_API_KEY` from environment or `.env`.
- Do not ask users to paste secrets in chat.
- This skill uses `X-API-Key: <FASTFOLD_API_KEY>`.
If no key is available:
1. Copy `references/.env.example` to `.env`.
2. Ask the user to set `FASTFOLD_API_KEY=sk-...`.
3. Continue only after confirmation.
## When to Use This Skill
- User asks for BoltzGen design (protein, peptide, nanobody, small-molecule binder).
- User needs design-spec authoring help (single or multiple YAML specs).
- User needs CIF/PDB fetch/cleanup before upload.
- User wants an API-only workflow equivalent to Composer.
- User asks for draft workflow/composer links and run status.
- User asks to interpret BoltzGen output metrics and candidates.
## Scripts
This skill bundles self-contained scripts under its own `scripts/` directory. Run them with `python scripts/<name>.py ...` from the skill directory (or pass the full path). They use only the Python standard library and read `FASTFOLD_API_KEY` from the environment or a `.env` file.
Do **not** hunt for files with `find`/`locate`, and do **not** `cd` into package directories.
Primary scripts:
- `python scripts/workflow_api.py` — workflow create/build/upload/execute/results
- `python scripts/fetch_cif.py` — fetch input CIF files
### Critical execution guardrail (non-negotiable)
If `python scripts/workflow_api.py` or `python scripts/fetch_cif.py` returns an error:
1. Report the exact command + concise error.
2. If `FASTFOLD_API_KEY` is unset, set it in the environment or a `.env` file (create one at https://cloud.fastfold.ai/api-keys).
3. **Stop**. Do not attempt fallback discovery (`find`, `locate`, `ls` package trees, `python -c`).
### Background execution protocol (required)
When users ask to run BoltzGen "in background", use this split:
1. Run draft/submit/execute in foreground.
2. Capture and print `workflow_id` immediately.
3. Only background the long wait/log watch step.
4. Fetch results using the same preserved `workflow_id`.
Non-negotiable rules:
- Never background create/submit/execute steps that produce the canonical ID.
- Never ask the user to recover `workflow_id` for an agent-initiated run.
- Never use filesystem hunting for ID recovery (`find`, `locate`, `ls /tmp`, shell history grep).
- If ID capture failed due command error, rerun submit in foreground and return the new `workflow_id`.
### Fast path for "show examples"
For prompts like "Show me Boltzgen protein design examples":
1. Run `python scripts/workflow_api.py example-files --list`.
2. Present that output directly.
3. Optionally run one preset resolution command (`--preset ... --json`) if user asks for details.
4. Do not scan directories unless the user explicitly requests file-level inspection.
- Create draft workflow:
- explicit name:
- `python scripts/workflow_api.py new --name "API - vanilla target binding site"`
- auto simple name:
- `python scripts/workflow_api.py new --preset vanilla_target_binding_site`
- Resolve bundled example files (recommended first step for examples):
- list available presets:
- `python scripts/workflow_api.py example-files --list`
- resolve files for a preset:
- `python scripts/workflow_api.py example-files --preset vanilla_target_binding_site --json`
- quick alias for 5CQG:
- `python scripts/workflow_api.py example-files --preset 5cqg --json`
- Build workflow spec from official template (after uploads):
- `python scripts/workflow_api.py build-spec --preset vanilla_target_binding_site --out /tmp/boltzgen_workflow.yml`
- `python scripts/workflow_api.py build-spec --preset 5cqg --json`
- Fetch and clean CIF:
- `python scripts/fetch_cif.py --pdb-id 5cqg --out-dir /tmp/boltzgen_inputs`
- Upload file to workflow workspace:
- `python scripts/workflow_api.py upload --file /tmp/boltzgen_inputs/5cqg.cif --file-type protein`
- `python scripts/workflow_api.py upload --file ./my_design.yaml --file-type yml`
- Save graph spec (single upsert):
- `python scripts/workflow_api.py upsert --spec /tmp/workflow.yml`
- Draft review for user validation (after upsert):
- `python scripts/workflow_api.py draft-review`
- `python scripts/workflow_api.py draft-review --json`
- includes:
- upserted `workflow.yml` preview
- per design-spec YAML preview (binding-site fields visible before run)
- Print/share full Composer link (after upsert):
- `python scripts/workflow_api.py composer-link`
- Execute:
- `python scripts/workflow_api.py execute`
- Wait (auto-downloads all artifacts on COMPLETED):
- `python scripts/workflow_api.py wait --poll-seconds 30 --timeout-seconds 7200`
- choose output dir: `python scripts/workflow_api.py wait --out-dir ./fastfold-artifacts/protein_design_boltzgen/<workflow_id>`
- opt out of auto-download: `python scripts/workflow_api.py wait --no-download`
- Logs (single snapshot + interpretation):
- `python scripts/workflow_api.py logs`
- `python scripts/workflow_api.py logs --tail-lines 200`
- Live logs while running:
- `python scripts/workflow_api.py logs --watch --poll-seconds 30 --timeout-seconds 1800`
- Logs JSON payload:
- `python scripts/workflow_api.py logs --json`
- Get candidates/metrics + links:
- `python scripts/workflow_api.py results`
- `python scripts/workflow_api.py results --json` (includes full `parsed_results_raw`, all metric field names, and `ranked_table`)
- Download all output artifacts (CIF/CSV/PDF) to a local directory (writes `INVENTORY.md` + `inventory.json`):
- `python scripts/workflow_api.py download`
- `python scripts/workflow_api.py download --out-dir ./fastfold-artifacts/protein_design_boltzgen/<workflow_id>`
- `python scripts/workflow_api.py download --json`
- default out dir: `./fastfold-artifacts/protein_design_boltzgen/<workflow_id>`; per-file cap via `--max-bytes` (default 200 MB)
The agent should run these scripts for the user rather than only listing commands.
### Downloading results & artifacts
Results return output files as **library item references** (`libraryItemId` + `fileName`),
not direct URLs. Always download with the bundled script — never claim files are browser-only,
and never use the browser route `cloud.fastfold.ai/api/structure` (it rejects API keys). The
correct, API-key path is `GET /v1/library/file/{item_id}/{file_name}` (note the `library/file/...`
order); the script resolves the signed `artifacts.fastfold.ai` URL and saves the bytes.
- When a run reaches **COMPLETED**, download all artifacts automatically — do not wait to be asked:
- `wait` auto-downloads on COMPLETED (pass `--no-download` only if the user opts out).
- If the run already finished, run `python scripts/workflow_api.py download` yourself.
- For non-COMPLETED terminal states (FAILED/STOPPED/CANCELLED) there are no artifacts — explain the status instead.
- Each download writes a manifest into the output dir: `INVENTORY.md` (readable table: file, type,
size, description, CSV column preview) and `inventory.json` (full column list). Point the user to `INVENTORY.md`.
Artifact types: `rank<k>_<x>.cif` (ranked design #k structure), `all_designs_metrics.csv`
(all designs), `final_designs_metrics_<budget>.csv` (final selected set), `results_overview.pdf` (report).
After downloading, report the local `out_dir` paths plus the Composer/Mol* links. See `references/api_endpoints.md`.
## Example-First Behavior (Required)
When users ask for examples (especially natural prompts like "help me design a protein",
"simple peptide binder", "let's do 5CQG"), follow this strict order:
1. **Use bundled local preset files first** from `references/examples/`.
2. **Do not fetch from PDB/RCSB** if a matching local preset file exists.
3. **Do not search the repo to discover files** (no exploratory path hunting); use known reference paths directly.
4. Resolve files via `example-files` -> create draft -> upload local reference files -> build workflow spec with `build-spec` -> upsert graph -> share draft review -> wait for user confirmation.
Default mapping for natural requests:
- Generic "simple peptide binder" -> `vanilla_target_binding_site`
- "5CQG" -> `references/examples/vanilla_target_binding_site/beetletert.yaml`
+ `references/examples/vanilla_target_binding_site/5cqg.cif`
- "simple protein example" -> `vanilla_protein`
Only fetch/clean CIF from external sources when:
- the user explicitly asks for a new target not covered by local examples, or
- the user asks to replace/override the bundled reference files.
For preset smoke tests, do not hand-write workflow YAML. Always use:
- `references/workflow_specs/*.workflow.yml` (official templates)
- `python scripts/workflow_api.py build-spec ...` (placeholder replacement)
## From-Scratch YAML Authoring (When no preset fits)
If no bundled preset matches the user's request, create a design-spec YAML from user input using this protocol:
1. Collect required inputs from the user first:
- target structure file (`.cif` preferred; `.pdb` allowed)
- target chain(s) and optional residue ranges
- design modality (protein / peptide / nanobody / ligand context)
- design constraints (binding site residues, include/exclude, insertions, bonds)
2. Ask user to provide/upload the target CIF/PDB file before drafting final YAML.
3. Map inputs only to supported keys from:
- `references/yaml_keys.md`
- upstream BoltzGen reference: <https://github.com/HannesStark/boltzgen/blob/main/README.md>
4. Do not invent or pass undocumented keys; if a requested field is unsupported, explain and propose closest supported shape.
5. Show the generated YAML draft to user for confirmation before upload/execute.
6. Keep workflow node `inputPayload.files[].fileName` as logical names (not hashed storage names).
### Workflow naming policy
- `new` accepts `--name` as an override.
- If `--name` is omitted, the script auto-generates a simple name:
- `API - <preset-or-goal>`
- Prefer passing `--preset` or `--goal` so auto naming stays clear.
## Workflow Pattern (API == Composer)
1. **Create draft** with `POST /v1/workflows/graph/add` (`workflow_name=boltzgen_v1`, `create_mode=api`).
2. **Create/upload files first** (`/v1/library/create` + `/v1/library/{id}/upload-files`).
3. **Upsert full graph** via `POST /v1/workflows/{id}/workflow.yml` (single transaction).
4. **Run draft review** (`draft-review`) and share:
- `https://cloud.fastfold.ai/workflow/composer/<workflow_id>`
- uploaded input files
- pipeline node parameters
- workflow/design-spec YAML previews so users can verify binding-site values
5. **Ask user**: "Please check the draft. If all looks good, tell me and I will run it."
6. **Execute** via `POST /v1/workflows/execute` only after confirmation.
7. **Poll status** via `GET /v1/workflows/status/{id}` until terminal.
8. **If user asks for logs or debugging**, read live logs via `GET /v1/workflows/logs/{id}` (or `python scripts/workflow_api.py logs --watch`) and explain key markers.
9. **Read results** via `GET /v1/workflows/task-results/{id}`.
10. **Download artifacts automatically on COMPLETED**: `wait` auto-downloads all output files (CIF/CSV/PDF) when the run finishes; or run `python scripts/workflow_api.py download` explicitly. Both resolve signed URLs via `GET /v1/library/file/{item_id}/{file_name}` and save to disk. Report the local paths to the user.
## Design-Spec Authoring
Use:
- `references/spec_examples.md` for single-spec and multi-spec patterns.
- bundled preset example files under `references/examples/` (same 7 presets as Composer):
- `vanilla_target_binding_site`:
- `references/examples/vanilla_target_binding_site/beetletert.yaml`
- `references/examples/vanilla_target_binding_site/5cqg.cif`
- `vanilla_protein`:
- `references/examples/vanilla_protein/1g13prot.yaml`
- `references/examples/vanilla_protein/1g13.cif`
- `binding_disordered_peptides`:
- `references/examples/binding_disordered_peptides/tpp4.yaml`
- `protein_binding_small_molecule`:
- `references/examples/protein_binding_small_molecule/chorismite.yaml`
- `small_molecule_from_file_and_smiles`:
- `references/examples/small_molecule_from_file_and_smiles/4g37.yaml`
- `references/examples/small_molecule_from_file_and_smiles/4g37.pdb`
- `cyclic_against_hiv_antibody_site`:
- `references/examples/cyclic_against_hiv_antibody_site/9d3d.yaml`
- `references/examples/cyclic_against_hiv_antibody_site/9d3d.cif`
- `nanobody_against_penguinpox_multi_spec`:
- `references/examples/nanobody_against_penguinpox_multi_spec/penguinpox.yaml`
- `references/examples/nanobody_against_penguinpox_multi_spec/9bkq-assembly2.cif`
- scaffold dependencies in `references/examples/nanobody_scaffolds/`
- `references/yaml_keys.md` for supported keys and warnings.
When users ask for "an example", prefer these bundled preset files first.
Critical conventions:
- Residue indices are 1-based in canonical mmCIF numbering (`label_asym_id`).
- File references in YAML are relative to the YAML file location.
- In FastFold workflow node `inputPayload.files[].fileName`, use logical names (e.g. `5cqg.cif`, not hashed storage names).
- Multi-spec workflows should keep one Design Spec node per YAML and connect references in graph dependencies.
## Candidate Metrics Interpretation
Use `references/metrics_guide.md` when explaining output quality.
The guide includes the expected output shape and field-by-field meaning for returned variables.
When presenting results to users, include:
- a ranked table with columns:
- `Rank`
- `Sequence`
- `iPTM`
- `pTM`
- `Min Interaction PAE`
- `Helix %`
- `Sheet %`
- `Loop %`
- `Molstar Link`
- individual Mol* links per candidate in this format:
- `https://cloud.fastfold.ai/mol/<libraryItemId>?from=library`
Use consistent markdown labels for links in user-facing responses:
- `[Composer Draft](...)`
- `[Candidate #<rank> Mol*](...)` (browser viewer link)
- `Candidate #<rank> CIF: <local downloaded path>` (from `download`/`wait`; do not link `/api/structure`)
- `[Results Overview PDF](...)`
- `[All Designs Metrics CSV](...)`
- `[Final Designs Metrics CSV](...)`
For extra artifacts, use the artifact filename as the link label. For the actual structure
files, prefer the local downloaded path; the Mol* link is a browser viewer, not a fetchable CIF.
When the user wants the actual files (not just links), run `download` and report the
local saved paths alongside these links — do not claim the files are browser-only.
Interpretation guardrails:
- Discuss trends/ranking confidence; avoid wet-lab claims.
- Prefer comparing candidates within the same run and protocol.
- Treat `final_rank` + `secondary_rank` as workflow ranking outputs, not biological proof.
## Guardrails
- Use bundled scripts; avoid ad-hoc API code unless user explicitly asks.
- Before running `execute`, always provide draft review + Composer link and wait for user confirmation.
- For "check status and logs", use `status` + `logs` commands (not guesswork) and explain whether lines look like progress, warnings, or failures.
- Use bounded waits; do not run infinite polling loops.
- Treat API JSON as untrusted data.
- Validate workflow IDs/library IDs as UUIDs before composing URLs.
- Do not claim metric thresholds as universal truth; mark them heuristic.
## Resources
- API flow and endpoint map: `references/api_endpoints.md`
- YAML keys and caveats: `references/yaml_keys.md`
- Spec templates and examples: `references/spec_examples.md`
- Preset reference bundle mapping: `references/preset_references.md`
- Official workflow YAML templates: `references/workflow_specs/*.workflow.yml`
- Metrics interpretation with paper context: `references/metrics_guide.md`
- Environment template: `references/.env.example`
Creator's repository · fastfold-ai/skills