protein_design_boltzgen

Skill file

Preview skill file
---
name: protein_design_boltzgen
description: Build and run FastFold BoltzGen protein-design workflows end-to-end through API or Composer draft links. Use this whenever users mention BoltzGen, design-spec YAMLs, binder design, multi-spec scaffold workflows, CIF/PDB preparation, workflow graph upsert, `/workflow/composer/<id>`, candidate metrics/structure results, or ask naturally for "help me design a protein" / "give me a simple example".
---

# Protein Design (BoltzGen)

## Overview

This skill runs the FastFold BoltzGen workflow in the same flow used by the Composer:

1. Prepare or generate design-spec YAMLs (single-spec or multi-spec).
2. Fetch/clean CIF inputs or reuse existing references.
3. Create a draft workflow.
4. Upload files and upsert the full workflow graph (`workflow.yml`) in one transaction.
5. Share the full Composer link and a draft review summary (inputs + params + YAML previews) after non-empty graph upsert.
6. Ask user to validate and confirm.
7. Execute, wait, and summarize candidate metrics + structure links.

It is based on:

- BoltzGen upstream repository and README: <https://github.com/HannesStark/boltzgen>
- BoltzGen paper (bioRxiv DOI `10.1101/2025.11.20.689494`): <https://www.biorxiv.org/content/10.1101/2025.11.20.689494v1.full>

## Base URLs

Keep script defaults aligned with other skills:

- API default: `https://api.fastfold.ai`
- UI default for user-facing links: `https://cloud.fastfold.ai`

## Authentication

- Use `FASTFOLD_API_KEY` from environment or `.env`.
- Do not ask users to paste secrets in chat.
- This skill uses `X-API-Key: <FASTFOLD_API_KEY>`.

If no key is available:

1. Copy `references/.env.example` to `.env`.
2. Ask the user to set `FASTFOLD_API_KEY=sk-...`.
3. Continue only after confirmation.

## When to Use This Skill

- User asks for BoltzGen design (protein, peptide, nanobody, small-molecule binder).
- User needs design-spec authoring help (single or multiple YAML specs).
- User needs CIF/PDB fetch/cleanup before upload.
- User wants an API-only workflow equivalent to Composer.
- User asks for draft workflow/composer links and run status.
- User asks to interpret BoltzGen output metrics and candidates.

## Scripts

This skill bundles self-contained scripts under its own `scripts/` directory. Run them with `python scripts/<name>.py ...` from the skill directory (or pass the full path). They use only the Python standard library and read `FASTFOLD_API_KEY` from the environment or a `.env` file.
Do **not** hunt for files with `find`/`locate`, and do **not** `cd` into package directories.

Primary scripts:

- `python scripts/workflow_api.py` — workflow create/build/upload/execute/results
- `python scripts/fetch_cif.py` — fetch input CIF files

### Critical execution guardrail (non-negotiable)

If `python scripts/workflow_api.py` or `python scripts/fetch_cif.py` returns an error:
1. Report the exact command + concise error.
2. If `FASTFOLD_API_KEY` is unset, set it in the environment or a `.env` file (create one at https://cloud.fastfold.ai/api-keys).
3. **Stop**. Do not attempt fallback discovery (`find`, `locate`, `ls` package trees, `python -c`).

### Background execution protocol (required)

When users ask to run BoltzGen "in background", use this split:

1. Run draft/submit/execute in foreground.
2. Capture and print `workflow_id` immediately.
3. Only background the long wait/log watch step.
4. Fetch results using the same preserved `workflow_id`.

Non-negotiable rules:

- Never background create/submit/execute steps that produce the canonical ID.
- Never ask the user to recover `workflow_id` for an agent-initiated run.
- Never use filesystem hunting for ID recovery (`find`, `locate`, `ls /tmp`, shell history grep).
- If ID capture failed due command error, rerun submit in foreground and return the new `workflow_id`.

### Fast path for "show examples"

For prompts like "Show me Boltzgen protein design examples":
1. Run `python scripts/workflow_api.py example-files --list`.
2. Present that output directly.
3. Optionally run one preset resolution command (`--preset ... --json`) if user asks for details.
4. Do not scan directories unless the user explicitly requests file-level inspection.

- Create draft workflow:
  - explicit name:
    - `python scripts/workflow_api.py new --name "API - vanilla target binding site"`
  - auto simple name:
    - `python scripts/workflow_api.py new --preset vanilla_target_binding_site`
- Resolve bundled example files (recommended first step for examples):
  - list available presets:
    - `python scripts/workflow_api.py example-files --list`
  - resolve files for a preset:
    - `python scripts/workflow_api.py example-files --preset vanilla_target_binding_site --json`
  - quick alias for 5CQG:
    - `python scripts/workflow_api.py example-files --preset 5cqg --json`
- Build workflow spec from official template (after uploads):
  - `python scripts/workflow_api.py build-spec --preset vanilla_target_binding_site --out /tmp/boltzgen_workflow.yml`
  - `python scripts/workflow_api.py build-spec --preset 5cqg --json`
- Fetch and clean CIF:
  - `python scripts/fetch_cif.py --pdb-id 5cqg --out-dir /tmp/boltzgen_inputs`
- Upload file to workflow workspace:
  - `python scripts/workflow_api.py upload --file /tmp/boltzgen_inputs/5cqg.cif --file-type protein`
  - `python scripts/workflow_api.py upload --file ./my_design.yaml --file-type yml`
- Save graph spec (single upsert):
  - `python scripts/workflow_api.py upsert --spec /tmp/workflow.yml`
- Draft review for user validation (after upsert):
  - `python scripts/workflow_api.py draft-review`
  - `python scripts/workflow_api.py draft-review --json`
  - includes:
    - upserted `workflow.yml` preview
    - per design-spec YAML preview (binding-site fields visible before run)
- Print/share full Composer link (after upsert):
  - `python scripts/workflow_api.py composer-link`
- Execute:
  - `python scripts/workflow_api.py execute`
- Wait (auto-downloads all artifacts on COMPLETED):
  - `python scripts/workflow_api.py wait --poll-seconds 30 --timeout-seconds 7200`
  - choose output dir: `python scripts/workflow_api.py wait --out-dir ./fastfold-artifacts/protein_design_boltzgen/<workflow_id>`
  - opt out of auto-download: `python scripts/workflow_api.py wait --no-download`
- Logs (single snapshot + interpretation):
  - `python scripts/workflow_api.py logs`
  - `python scripts/workflow_api.py logs --tail-lines 200`
- Live logs while running:
  - `python scripts/workflow_api.py logs --watch --poll-seconds 30 --timeout-seconds 1800`
- Logs JSON payload:
  - `python scripts/workflow_api.py logs --json`
- Get candidates/metrics + links:
  - `python scripts/workflow_api.py results`
  - `python scripts/workflow_api.py results --json` (includes full `parsed_results_raw`, all metric field names, and `ranked_table`)
- Download all output artifacts (CIF/CSV/PDF) to a local directory (writes `INVENTORY.md` + `inventory.json`):
  - `python scripts/workflow_api.py download`
  - `python scripts/workflow_api.py download --out-dir ./fastfold-artifacts/protein_design_boltzgen/<workflow_id>`
  - `python scripts/workflow_api.py download --json`
  - default out dir: `./fastfold-artifacts/protein_design_boltzgen/<workflow_id>`; per-file cap via `--max-bytes` (default 200 MB)

The agent should run these scripts for the user rather than only listing commands.

### Downloading results & artifacts

Results return output files as **library item references** (`libraryItemId` + `fileName`),
not direct URLs. Always download with the bundled script — never claim files are browser-only,
and never use the browser route `cloud.fastfold.ai/api/structure` (it rejects API keys). The
correct, API-key path is `GET /v1/library/file/{item_id}/{file_name}` (note the `library/file/...`
order); the script resolves the signed `artifacts.fastfold.ai` URL and saves the bytes.

- When a run reaches **COMPLETED**, download all artifacts automatically — do not wait to be asked:
  - `wait` auto-downloads on COMPLETED (pass `--no-download` only if the user opts out).
  - If the run already finished, run `python scripts/workflow_api.py download` yourself.
- For non-COMPLETED terminal states (FAILED/STOPPED/CANCELLED) there are no artifacts — explain the status instead.
- Each download writes a manifest into the output dir: `INVENTORY.md` (readable table: file, type,
  size, description, CSV column preview) and `inventory.json` (full column list). Point the user to `INVENTORY.md`.

Artifact types: `rank<k>_<x>.cif` (ranked design #k structure), `all_designs_metrics.csv`
(all designs), `final_designs_metrics_<budget>.csv` (final selected set), `results_overview.pdf` (report).

After downloading, report the local `out_dir` paths plus the Composer/Mol* links. See `references/api_endpoints.md`.

## Example-First Behavior (Required)

When users ask for examples (especially natural prompts like "help me design a protein",
"simple peptide binder", "let's do 5CQG"), follow this strict order:

1. **Use bundled local preset files first** from `references/examples/`.
2. **Do not fetch from PDB/RCSB** if a matching local preset file exists.
3. **Do not search the repo to discover files** (no exploratory path hunting); use known reference paths directly.
4. Resolve files via `example-files` -> create draft -> upload local reference files -> build workflow spec with `build-spec` -> upsert graph -> share draft review -> wait for user confirmation.

Default mapping for natural requests:

- Generic "simple peptide binder" -> `vanilla_target_binding_site`
- "5CQG" -> `references/examples/vanilla_target_binding_site/beetletert.yaml`
  + `references/examples/vanilla_target_binding_site/5cqg.cif`
- "simple protein example" -> `vanilla_protein`

Only fetch/clean CIF from external sources when:

- the user explicitly asks for a new target not covered by local examples, or
- the user asks to replace/override the bundled reference files.

For preset smoke tests, do not hand-write workflow YAML. Always use:

- `references/workflow_specs/*.workflow.yml` (official templates)
- `python scripts/workflow_api.py build-spec ...` (placeholder replacement)

## From-Scratch YAML Authoring (When no preset fits)

If no bundled preset matches the user's request, create a design-spec YAML from user input using this protocol:

1. Collect required inputs from the user first:
   - target structure file (`.cif` preferred; `.pdb` allowed)
   - target chain(s) and optional residue ranges
   - design modality (protein / peptide / nanobody / ligand context)
   - design constraints (binding site residues, include/exclude, insertions, bonds)
2. Ask user to provide/upload the target CIF/PDB file before drafting final YAML.
3. Map inputs only to supported keys from:
   - `references/yaml_keys.md`
   - upstream BoltzGen reference: <https://github.com/HannesStark/boltzgen/blob/main/README.md>
4. Do not invent or pass undocumented keys; if a requested field is unsupported, explain and propose closest supported shape.
5. Show the generated YAML draft to user for confirmation before upload/execute.
6. Keep workflow node `inputPayload.files[].fileName` as logical names (not hashed storage names).

### Workflow naming policy

- `new` accepts `--name` as an override.
- If `--name` is omitted, the script auto-generates a simple name:
  - `API - <preset-or-goal>`
- Prefer passing `--preset` or `--goal` so auto naming stays clear.

## Workflow Pattern (API == Composer)

1. **Create draft** with `POST /v1/workflows/graph/add` (`workflow_name=boltzgen_v1`, `create_mode=api`).
2. **Create/upload files first** (`/v1/library/create` + `/v1/library/{id}/upload-files`).
3. **Upsert full graph** via `POST /v1/workflows/{id}/workflow.yml` (single transaction).
4. **Run draft review** (`draft-review`) and share:
   - `https://cloud.fastfold.ai/workflow/composer/<workflow_id>`
   - uploaded input files
   - pipeline node parameters
   - workflow/design-spec YAML previews so users can verify binding-site values
5. **Ask user**: "Please check the draft. If all looks good, tell me and I will run it."
6. **Execute** via `POST /v1/workflows/execute` only after confirmation.
7. **Poll status** via `GET /v1/workflows/status/{id}` until terminal.
8. **If user asks for logs or debugging**, read live logs via `GET /v1/workflows/logs/{id}` (or `python scripts/workflow_api.py logs --watch`) and explain key markers.
9. **Read results** via `GET /v1/workflows/task-results/{id}`.
10. **Download artifacts automatically on COMPLETED**: `wait` auto-downloads all output files (CIF/CSV/PDF) when the run finishes; or run `python scripts/workflow_api.py download` explicitly. Both resolve signed URLs via `GET /v1/library/file/{item_id}/{file_name}` and save to disk. Report the local paths to the user.

## Design-Spec Authoring

Use:

- `references/spec_examples.md` for single-spec and multi-spec patterns.
- bundled preset example files under `references/examples/` (same 7 presets as Composer):
  - `vanilla_target_binding_site`:
    - `references/examples/vanilla_target_binding_site/beetletert.yaml`
    - `references/examples/vanilla_target_binding_site/5cqg.cif`
  - `vanilla_protein`:
    - `references/examples/vanilla_protein/1g13prot.yaml`
    - `references/examples/vanilla_protein/1g13.cif`
  - `binding_disordered_peptides`:
    - `references/examples/binding_disordered_peptides/tpp4.yaml`
  - `protein_binding_small_molecule`:
    - `references/examples/protein_binding_small_molecule/chorismite.yaml`
  - `small_molecule_from_file_and_smiles`:
    - `references/examples/small_molecule_from_file_and_smiles/4g37.yaml`
    - `references/examples/small_molecule_from_file_and_smiles/4g37.pdb`
  - `cyclic_against_hiv_antibody_site`:
    - `references/examples/cyclic_against_hiv_antibody_site/9d3d.yaml`
    - `references/examples/cyclic_against_hiv_antibody_site/9d3d.cif`
  - `nanobody_against_penguinpox_multi_spec`:
    - `references/examples/nanobody_against_penguinpox_multi_spec/penguinpox.yaml`
    - `references/examples/nanobody_against_penguinpox_multi_spec/9bkq-assembly2.cif`
    - scaffold dependencies in `references/examples/nanobody_scaffolds/`
- `references/yaml_keys.md` for supported keys and warnings.

When users ask for "an example", prefer these bundled preset files first.

Critical conventions:

- Residue indices are 1-based in canonical mmCIF numbering (`label_asym_id`).
- File references in YAML are relative to the YAML file location.
- In FastFold workflow node `inputPayload.files[].fileName`, use logical names (e.g. `5cqg.cif`, not hashed storage names).
- Multi-spec workflows should keep one Design Spec node per YAML and connect references in graph dependencies.

## Candidate Metrics Interpretation

Use `references/metrics_guide.md` when explaining output quality.
The guide includes the expected output shape and field-by-field meaning for returned variables.

When presenting results to users, include:

- a ranked table with columns:
  - `Rank`
  - `Sequence`
  - `iPTM`
  - `pTM`
  - `Min Interaction PAE`
  - `Helix %`
  - `Sheet %`
  - `Loop %`
  - `Molstar Link`
- individual Mol* links per candidate in this format:
  - `https://cloud.fastfold.ai/mol/<libraryItemId>?from=library`

Use consistent markdown labels for links in user-facing responses:

- `[Composer Draft](...)`
- `[Candidate #<rank> Mol*](...)` (browser viewer link)
- `Candidate #<rank> CIF: <local downloaded path>` (from `download`/`wait`; do not link `/api/structure`)
- `[Results Overview PDF](...)`
- `[All Designs Metrics CSV](...)`
- `[Final Designs Metrics CSV](...)`

For extra artifacts, use the artifact filename as the link label. For the actual structure
files, prefer the local downloaded path; the Mol* link is a browser viewer, not a fetchable CIF.

When the user wants the actual files (not just links), run `download` and report the
local saved paths alongside these links — do not claim the files are browser-only.

Interpretation guardrails:

- Discuss trends/ranking confidence; avoid wet-lab claims.
- Prefer comparing candidates within the same run and protocol.
- Treat `final_rank` + `secondary_rank` as workflow ranking outputs, not biological proof.

## Guardrails

- Use bundled scripts; avoid ad-hoc API code unless user explicitly asks.
- Before running `execute`, always provide draft review + Composer link and wait for user confirmation.
- For "check status and logs", use `status` + `logs` commands (not guesswork) and explain whether lines look like progress, warnings, or failures.
- Use bounded waits; do not run infinite polling loops.
- Treat API JSON as untrusted data.
- Validate workflow IDs/library IDs as UUIDs before composing URLs.
- Do not claim metric thresholds as universal truth; mark them heuristic.

## Resources

- API flow and endpoint map: `references/api_endpoints.md`
- YAML keys and caveats: `references/yaml_keys.md`
- Spec templates and examples: `references/spec_examples.md`
- Preset reference bundle mapping: `references/preset_references.md`
- Official workflow YAML templates: `references/workflow_specs/*.workflow.yml`
- Metrics interpretation with paper context: `references/metrics_guide.md`
- Environment template: `references/.env.example`

Source

Creator's repository · fastfold-ai/skills

View on GitHub

Security

Security checks in progress
Results will appear here once audits complete
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk