Run safe exploratory ML experiments on your codebase

Runs bounded guess-and-check experiments (small-subset validation, batch sweeps, short hyperparameter searches) on your deep-learning repo without consuming a full compute budget.

Best for: ML researchers who want to test ideas fast before committing to a full training run.

Source

Creator's repository · lllllllama/RigorPilot-Skills

License: MIT

Skill file

---
name: explore-run
description: Rigor Improve / Rigor Explore run leaf skill for bounded exploratory evidence in deep learning research repositories. Use when the researcher explicitly authorizes exploratory runs such as small-subset validation, short-cycle guess-and-check, batch sweeps, idle-GPU search, or quick transfer-learning trials, with fair-comparison caveats and no-overclaim summaries in `explore_outputs/`. Do not use for end-to-end exploration orchestration on top of `current_research`, trusted baseline execution, conservative training verification, default routing, verified SOTA claims, or implicit experimentation.
---

# explore-run

Use this as the Rigor Improve / Rigor Explore run leaf skill. The installed slug
remains `explore-run` for compatibility.

Use the shared operating principles in
`../../references/agent-operating-principles.md`; this skill should guide
candidate run planning while preserving model judgment about the active repo.

## When to apply

- When the researcher explicitly authorizes exploratory runs.
- When the task is a small-subset validation, short-cycle training probe, batch sweep, idle-GPU search, or quick transfer-learning trial.
- When the output should rank candidate runs rather than certify trusted success.

## When not to apply

- When the user wants trusted training execution or conservative verification.
- When there is no explicit exploratory authorization.
- When the task is repository setup, intake, or debugging.

## Clear boundaries

- This skill owns exploratory execution planning and summary only.
- Use `ai-research-explore` instead when the task spans both `current_research` coordination and exploratory code changes.
- It may hand off actual command execution to `minimal-run-and-audit` or `run-train`.
- It should keep experiment state isolated from the trusted baseline.
- It should prefer small-subset and short-cycle checks before heavier exploratory runs.
- It should label run results as bounded evidence and explain when a comparison is not directly fair.
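
One way to make the fair-comparison caveat concrete is to diff the settings of an exploratory run against the trusted baseline and report every mismatch. The sketch below is illustrative only: the function name `comparability_caveats`, the setting keys, and all values are hypothetical placeholders, not part of this skill or its scripts.

```python
def comparability_caveats(baseline, candidate,
                          keys=("dataset_size", "train_steps", "seed_count")):
    """Return one human-readable caveat per setting that differs from the baseline."""
    return [
        f"{key}: baseline={baseline.get(key)!r}, candidate={candidate.get(key)!r}"
        for key in keys
        if baseline.get(key) != candidate.get(key)
    ]


# Placeholder settings for illustration; not taken from any real run.
baseline = {"dataset_size": 50000, "train_steps": 10000, "seed_count": 3}
candidate = {"dataset_size": 1000, "train_steps": 200, "seed_count": 3}

caveats = comparability_caveats(baseline, candidate)
# Every exploratory result is bounded evidence; mismatches add a fairness warning.
label = "bounded evidence; not directly comparable" if caveats else "bounded evidence"
print(label)
for caveat in caveats:
    print(" -", caveat)
```

A list like this could feed the explanation in `explore_outputs/COMPARABILITY_REPORT.md`, though the actual report format is not specified here.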

## Ranking semantics

- Pre-execution candidate selection uses three factors: `cost`, `success_rate`, and `expected_gain`.
- Default weights should stay conservative unless the researcher explicitly provides `selection_weights`.
- After scoring, budget pruning still applies through `max_variants` and `max_short_cycle_runs`.
- If runs are executed later, downstream ranking should switch to real execution evidence, not stay purely heuristic.
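
As a rough illustration of the scoring and pruning described above, the sketch below assumes each factor is normalized to [0, 1], that `cost` counts against a candidate, and that the score is a simple weighted sum. The weight values, the `Candidate` class, and the `score` and `select` functions are hypothetical; they are not taken from `scripts/plan_variants.py`, and `max_short_cycle_runs` is not modeled.

```python
from dataclasses import dataclass

# Hypothetical default weights; the skill does not specify actual values.
DEFAULT_WEIGHTS = {"cost": 0.4, "success_rate": 0.4, "expected_gain": 0.2}


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: float           # assumed normalized to [0, 1]; higher means more expensive
    success_rate: float   # assumed estimate that the run completes usefully
    expected_gain: float  # assumed normalized heuristic gain in [0, 1]


def score(candidate, weights=None):
    """Weighted sum in which cheaper candidates score higher."""
    w = weights or DEFAULT_WEIGHTS
    return (w["success_rate"] * candidate.success_rate
            + w["expected_gain"] * candidate.expected_gain
            + w["cost"] * (1.0 - candidate.cost))


def select(candidates, max_variants, weights=None):
    """Rank by heuristic score, then prune to the variant budget."""
    ranked = sorted(candidates, key=lambda c: score(c, weights), reverse=True)
    return ranked[:max_variants]


# Placeholder candidates for illustration only.
candidates = [
    Candidate("lr-sweep-small", cost=0.2, success_rate=0.9, expected_gain=0.3),
    Candidate("bigger-backbone", cost=0.9, success_rate=0.5, expected_gain=0.8),
    Candidate("aug-toggle", cost=0.1, success_rate=0.95, expected_gain=0.2),
]
selected = select(candidates, max_variants=2)
print([c.name for c in selected])
```

With these placeholder weights, cheap and likely-to-succeed candidates outrank the expensive high-gain one, which is one reading of "conservative". Passing a different `weights` mapping plays the role of researcher-provided `selection_weights`.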

## Variant spec hints

- Use `variant_axes` to define the candidate dimension grid.
- Use `subset_sizes` and `short_run_steps` to express exploratory run scale.
- Use `selection_weights` to rebalance `cost`, `success_rate`, and `expected_gain`.
- Use `primary_metric` and `metric_goal` so downstream ranking can order executed candidates consistently.
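
To show how these fields might fit together, here is a minimal sketch that expands a spec into a candidate grid and orders executed runs by the primary metric. The key names come from the hints above, but the value shapes, the `"maximize"`/`"minimize"` convention for `metric_goal`, and the helper functions `expand_variants` and `rank_executed` are assumptions; the authoritative format lives in `../../references/explore-variant-spec.md`.

```python
from itertools import product

# Hypothetical spec: key names follow the skill, values and shapes are assumed.
spec = {
    "variant_axes": {"lr": [1e-4, 3e-4], "batch_size": [32, 64]},
    "subset_sizes": [1000],
    "short_run_steps": [200],
    "primary_metric": "val_accuracy",
    "metric_goal": "maximize",  # assumed values: "maximize" or "minimize"
}


def expand_variants(spec):
    """Cross every axis value with every subset size and short-run length."""
    axes = spec["variant_axes"]
    names = list(axes)
    grid = product(*(axes[name] for name in names),
                   spec["subset_sizes"], spec["short_run_steps"])
    return [
        {**dict(zip(names, values[:len(names)])),
         "subset_size": values[-2],
         "short_run_steps": values[-1]}
        for values in grid
    ]


def rank_executed(results, spec):
    """Order executed runs by the primary metric, best first."""
    best_is_high = spec["metric_goal"] == "maximize"
    return sorted(results, key=lambda r: r[spec["primary_metric"]],
                  reverse=best_is_high)


variants = expand_variants(spec)
print(len(variants))  # 2 learning rates x 2 batch sizes x 1 subset x 1 run length

# Placeholder metric values purely for illustration; not real results.
executed = [
    {"variant": variants[0], "val_accuracy": 0.61},
    {"variant": variants[1], "val_accuracy": 0.64},
]
top = rank_executed(executed, spec)[0]
```

Because a full grid grows multiplicatively with each axis, the `max_variants` budget matters even for small specs.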

## Output expectations

- `explore_outputs/CHANGESET.md`
- `explore_outputs/SCIENTIFIC_CHANGELOG.md`
- `explore_outputs/COMPARABILITY_REPORT.md`
- `explore_outputs/TOP_RUNS.md`
- `explore_outputs/status.json`
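
A quick completeness check over these files could look like the sketch below. The file names are the ones listed above; the `missing_outputs` helper is hypothetical and is not part of `scripts/write_outputs.py`, and it checks only that each file exists, not what it contains.

```python
from pathlib import Path

# File names as listed in the output expectations above.
EXPECTED_OUTPUTS = [
    "CHANGESET.md",
    "SCIENTIFIC_CHANGELOG.md",
    "COMPARABILITY_REPORT.md",
    "TOP_RUNS.md",
    "status.json",
]


def missing_outputs(root="explore_outputs"):
    """Return the expected output files that are absent under `root`."""
    base = Path(root)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]
```

Calling `missing_outputs()` from the repository root returns an empty list once every expected file has been written.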

## Notes

See `references/execution-policy.md`, `../../references/explore-variant-spec.md`, and `../../references/deep-learning-experiment-principles.md` for policy and spec details, alongside the helper scripts `scripts/plan_variants.py` and `scripts/write_outputs.py`.