vss-summarize-video

Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for live RTSP captioning or incident-range reports.

Skill file

Preview skill file
---
name: vss-summarize-video
description: Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for live RTSP captioning or incident-range reports.
license: Apache-2.0
metadata:
  version: "3.2.0"
  author: "NVIDIA Video Search and Summarization team"
  github-url: "https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization"
  tags: "nvidia blueprint operational"
---
## Instructions

Follow the routing tables and step-by-step workflows below. Each section that ends in *workflow*, *quick start*, or *flow* is intended to be executed top-to-bottom. Detailed reference material lives in `references/` and helper scripts live in `scripts/` — call them via `run_script` when the skill points to a script by name.

## Examples

Worked end-to-end examples are kept under `evals/` (each `*.json` manifest contains a runnable scenario) and inline in the per-workflow `curl` blocks below. Run a Tier-3 evaluation with `nv-base validate <this-skill-dir> --agent-eval` to replay them.

You are a video summarization assistant. You call the VLM NIM or the video summarization
microservice **directly**. Always run `curl` commands yourself; never instruct the user to run them.

Primary video workflow query type: **"Summarize this video."** Direct video summarization API
and service-ops requests are handled by the reference-routed sections below.

## Purpose

Produce a single, polished narrative summary of one recorded video clip, with
timestamped events when the LVS microservice path is reachable.

**Do NOT use this skill for:**
- Live RTSP captioning — use `vss-deploy-dense-captioning`.
- Incident-range or alert-window reports — use `vss-generate-video-report` Mode B.
- Semantic search across the archive — use `vss-search-archive`.

## Prerequisites

- VSS `lvs` profile running on `$HOST_IP` (port 38111) OR a reachable
  VLM/RT-VLM endpoint as a fallback. The `vss-deploy-profile` skill brings
  these up.
- Network reachability from the agent host to both endpoints; clip URLs from
  VIOS must be fetchable by the chosen backend.
- `jq` and `curl` available on the agent host.

## Limitations

- Direct VLM fallback uses a single fixed prompt and cannot target
  scenario/events — output quality is lower than the LVS path.
- Remote VLM endpoints generally cannot reach `localhost`/private clip URLs.
- One backend call per request; no parallel hedging or multi-pass summaries.

## Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| `/v1/ready` returns 503 repeatedly | LVS service still warming up | Retry up to ~30 s as shown in *Setup*; if it never returns 200 the service may not be deployed |
| Empty `video_summary` and `events` | Clip does not contain the requested events | Re-run with broader `scenario` or different `events` |
| VLM returns `<think>` block | Cosmos Reason 2 reasoning mode | Strip everything up to `</think>` before rendering |
| Empty stdout from `curl /v1/ready` | Service legitimately returns 200 with empty body | Always check HTTP status with `-o /dev/null -w '%{http_code}'`, never inspect the body |

See [`references/video-summarization-debugging.md`](references/video-summarization-debugging.md) for deeper diagnostics.

## Reference Map

Use these references only when the user asks for the relevant detail, or when
the core workflow below needs deeper video summarization information:

- **video summarization API details**: [`references/video-summarization-api.md`](references/video-summarization-api.md) for
  `/v1/summarize`, `/summarize`, `/v1/generate_captions`,
  `/v1/stream_summarize`, health probes, `/models`, `/recommended_config`,
  `/metrics`, request fields, response shapes, and API gotchas.
- **video summarization service configuration and ops**:
  [`references/video-summarization-deployment.md`](references/video-summarization-deployment.md) for
  the VSS `lvs` profile, ports, required env vars, logs, status, dry-runs,
  teardown, model/backend swaps, Elasticsearch/Neo4j/ArangoDB backend
  selection, and service-level troubleshooting.
- **Extended video summarization ops references**:
  [`references/video-summarization-environment-variables.md`](references/video-summarization-environment-variables.md),
  [`references/video-summarization-debugging.md`](references/video-summarization-debugging.md), and
  `assets/video-summarization.env.example`.

Load `video-summarization-api.md` only when you need a request field, response shape, or
endpoint that is not already covered by the Step 2 LVS or fallback VLM
example below, or when handling a direct video summarization API
request. Load `video-summarization-deployment.md` only for deployment,
configuration, or service operations.

## Video Summarization API And Service Ops Requests

If the user asks to call or debug video summarization endpoints directly, answer from
[`references/video-summarization-api.md`](references/video-summarization-api.md) instead of running the
end-to-end video summarization workflow. Examples: list video summarization models, check
readiness, get recommended chunking config, inspect metrics, explain a 422
response, or build a `/v1/summarize` request body.

If the user asks to configure, deploy, restart, tear down, or troubleshoot the
video summarization service, prefer the `vss-deploy-profile` skill for full VSS profile
deployment and use [`references/video-summarization-deployment.md`](references/video-summarization-deployment.md)
for video summarization-specific service details.

## Routing

Decide purely from video summarization service availability (probed in
*Setup → Availability checks* below). **Duration does not drive routing.**

| `/v1/ready` | Backend | Endpoint |
|---|---|---|
| HTTP 200 | LVS microservice with HITL | `POST ${LVS_BACKEND_URL}/v1/summarize` |
| Anything else | VLM / RT-VLM with the default prompt + fallback note | `POST ${VLM_BASE_URL}/v1/chat/completions` |

Fallback message when the LVS service is unreachable — copy verbatim above the summary:

> ⚠ **Note:** Input video `<name>` is `<N>`s long.
> The video summarization service is not deployed, so this summary was
> produced by the VLM alone with a generic default prompt. Deploy the
> `lvs` profile for higher-quality summaries with scenario/events
> targeting.

## Deployment prerequisite

The VSS **lvs** profile on `$HOST_IP` is the primary backend. If the
`/v1/ready` probe (see *Setup → Availability checks*) returns anything
other than 200 after the warmup retries, ask the user:

> *"The VSS `lvs` profile isn't running on `$HOST_IP`. Shall I deploy it now using the `/vss-deploy-profile` skill with `-p lvs`? Reply `no` to summarize with the VLM-only fallback instead (lower quality, no scenario/events targeting)."*

- **Yes** → hand off to `/vss-deploy-profile`, then re-probe and continue with Step 2 (LVS + HITL).
- **No** → go straight to **Step 2 fallback (VLM with default prompt)** and prepend the Routing fallback note. Do not ask again, and do not run scenario/events HITL.
- **Pre-authorized to deploy autonomously** (caller said so explicitly) → skip the confirmation and invoke `/vss-deploy-profile` directly.
- **Pre-authorized to use VLM fallback** ("skip lvs, just use the VLM") → go straight to Step 2 fallback without prompting.

---

## Setup

**Endpoints (defaults for a local VSS `lvs` deployment):**

- VLM / RT-VLM: `${VLM_BASE_URL}` — default `${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}`
- LVS service: `${LVS_BACKEND_URL}` — default `http://${HOST_IP:-localhost}:38111`
- VIOS: owned by `vss-manage-video-io-storage`; refer there.

Use env vars when set (strip trailing `/v1` from the VLM base — the skill appends it). Otherwise use the defaults. If neither works, ask the user — do not scan ports or read config files to guess.

**Model name:** read `${VLM_NAME}` (default
`nim_nvidia_cosmos-reason2-8b_hf-1208`). It must match the id RT-VLM
`/v1/models` advertises; do not substitute the friendly
`nvidia/cosmos-reason2-8b`.

For endpoint schemas, optional fields, response envelopes, and error handling, see [`references/video-summarization-api.md`](references/video-summarization-api.md).

**Availability checks** (run both before routing).
**Readiness is determined by the HTTP status code only** — the LVS
`/v1/ready` may legitimately return `200` with an empty body, so do not
inspect the body.

```bash
VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"

# VLM / RT-VLM: 200 on /v1/models
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 \
  "$VLM/v1/models")
[ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"

# Video summarization service: 200 on /v1/ready, with retry on 503 (warmup) for up to ~30s
VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}
video_sum_code=000
for i in $(seq 1 10); do
  video_sum_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$VIDEO_SUMMARIZATION_URL/v1/ready")
  case "$video_sum_code" in
    200) echo "video summarization OK"; break ;;
    503) sleep 3 ;;                 # warming up; keep polling
    *)   break ;;                   # any other code = not reachable, stop retrying
  esac
done
[ "$video_sum_code" = "200" ] || echo "video summarization service not reachable (HTTP $video_sum_code)"
```

**How to interpret the results:**

- `video_sum_code = 200` → **Step 2 (LVS + HITL)** for every video.
- `video_sum_code != 200`, `vlm_code = 200` → **Step 2 fallback (VLM)**; prepend the Routing fallback note.
- `vlm_code != 200` → fail; at least one backend must be reachable.
- A non-200 LVS code after the retry loop is the ONLY signal of unavailability. Empty stdout or missing JSON fields are NOT "unavailable."

---

## Step 1 - Get the clip URL via `vss-manage-video-io-storage` (sub-task, NOT the final answer)

**Use the `vss-manage-video-io-storage` skill for all VIOS interactions** — it
owns the canonical curl recipes, parameter defaults, and delete/upload flows.
Do not fabricate URLs or hand-roll VIOS calls; they will drift.

This step is a sub-task — do NOT end your turn here; do NOT return the clip
URL as the final answer. From VIOS collect three values:

1. **`streamId`** (via `sensor/list` → `sensor/<id>/streams`, or directly from an upload response).
2. **Timeline** - `{startTime, endTime}` (ISO 8601 UTC). `endTime - startTime` is the duration; needed only for the user-facing header (routing is driven solely by `/v1/ready`).
3. **Temporary MP4 clip URL** — the `/storage/file/<streamId>/url` variant with `container=mp4`. Response field: `.videoUrl`. Both backends need an HTTP(S) URL they can `GET`.

Everything else (auth, upload, `disableAudio`, expiry, etc.) lives in the
`vss-manage-video-io-storage` skill — refer users there if VIOS fails.

---

## Step 2 — Primary: video summarization microservice with HITL

Use this path **whenever** `/v1/ready` returned 200 in Setup. Duration is irrelevant.

For advanced fields (`media_info`, `schema`, structured output, stream captioning, metrics, recommended config) see [`references/video-summarization-api.md`](references/video-summarization-api.md).

### HITL: collect scenario and events first (REQUIRED — do not skip)

Full walk-through is in [`references/hitl-prompts.md`](references/hitl-prompts.md). Always run HITL before calling the LVS service.

**Autonomous-mode defaults.** When the caller has bypassed HITL ("run
autonomously without prompting") AND the original query asks for
`default`/`defaults` (or gives none), use
`scenario="activity monitoring"` and `events=["notable activity"]`
**verbatim** — do not infer from filename or sensor name. Note the
defaults in the final reply and offer a re-run with more specific
parameters. This is the ONLY supported HITL bypass; "the video is
short" or "the user seems in a hurry" are not valid reasons.

Prefer `POST /v1/summarize` (3.2 GA route); `/summarize` is a compatibility alias.

```bash
VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}

# From HITL reply:
SCENARIO='warehouse monitoring'
EVENTS_JSON='["notable activity"]'
OBJECTS_JSON=''  # '' to omit, else '["forklifts","pallets","workers"]'

curl -s -X POST "$VIDEO_SUMMARIZATION_URL/v1/summarize" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg url "<clip_url_from_vss_manage_video_io_storage>" \
        --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_hf-1208}" \
        --arg scenario "$SCENARIO" \
        --argjson events "$EVENTS_JSON" \
        --argjson objects "${OBJECTS_JSON:-null}" '{
    url: $url,
    model: $model,
    scenario: $scenario,
    events: $events,
    chunk_duration: 10,
    num_frames_per_second_or_fixed_frames_chunk: 20,
    use_fps_for_chunking: false,
    seed: 1
  } + (if $objects == null then {} else {objects_of_interest: $objects} end)')" \
  | jq -r '.choices[0].message.content' \
  | jq '{video_summary, events}'
```

If both `video_summary` and `events` are empty, the clip probably doesn't contain the requested events — re-run with broader `scenario`/`events`, don't report "no content".

**Tuning:** `chunk_duration` (default `10`s; `0` = single chunk),
`num_frames_per_second_or_fixed_frames_chunk` (default `20`; meaning depends
on `use_fps_for_chunking`), `seed` (default `1`). `num_frames_per_chunk` is
deprecated.

---

## Step 2 fallback — VLM direct with default prompt

Use this path **only** when `/v1/ready` did not return 200 after warmup. Do NOT run HITL — the user did not opt in; you fell back because the service was missing. Prepend the Routing fallback note to the response.

```bash
VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"
PROMPT='Describe in detail what is happening in this video,
including all visible people, vehicles, equipments, objects,
actions, and environmental conditions.
OUTPUT REQUIREMENTS:
[timestamp-timestamp] Description of what is happening.
EXAMPLE:
[0.0s-4.0s] <description of the first event>
[4.0s-12.0s] <description of the second event>'

curl -s -X POST "$VLM/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_hf-1208}" \
        --arg text "$PROMPT" \
        --arg url "<clip_url_from_vss_manage_video_io_storage>" \
        '{
          model: $model,
          temperature: 0.0,
          max_tokens: 1024,
          messages: [{
            role: "user",
            content: [
              {type: "text", text: $text},
              {type: "video_url", video_url: {url: $url}}
            ]
          }]
        }')" | jq -r '.choices[0].message.content'
```

**Response:** standard OpenAI chat-completion envelope. The summary is in
`choices[0].message.content`.

**Cosmos-model notes:** Cosmos Reason 2 supports reasoning via
`<think>...</think><answer>...</answer>` blocks. Omit the reasoning
instructions if you want a plain summary. Frame sampling and pixel limits
are applied server-side; no client-side prep is required when you pass a
`video_url`.

---

## End-to-end example

See [`references/end-to-end-example.md`](references/end-to-end-example.md) for
the full LVS-or-VLM-fallback script that probes `/v1/ready` and runs the
appropriate path.

---

## Responses

- **VLM** returns an OpenAI chat-completion envelope; summary is
  `choices[0].message.content`.
- **LVS service** returns the same envelope but `content` is a JSON string —
  run `jq -r '.choices[0].message.content' | jq` to reach `{video_summary, events}`.
- **Errors** surface as HTTP non-2xx plus JSON `{error: ...}`. LVS `503` usually
  means warmup — retry `/v1/ready`.

### Presenting the output to the user

Surface backend output with **minimal transformation** — do not paraphrase,
re-voice, add emojis, or reformat. **One backend call → one rendering**: no
parallel hedging, no duplicate headers, never call both LVS and VLM for the
same video.

**Header line.** Start with exactly one:

```
Summary of <video_name> (<duration>)
```

`<duration>` = `Ns` for `< 60 s`, else `Mm Ss` (e.g. `3m 30s`).

**LVS output:** render `video_summary` **verbatim** (polished, tone-controlled
report — rewriting loses fidelity). Render each `events` entry with its
`start_time`, `end_time`, `type`, and full `description` verbatim (table when
the client renders one cleanly, otherwise a per-event list). You MAY add a
one-line header and a closing offer to re-run with different parameters.

**VLM output:** render `choices[0].message.content` verbatim. If the model
produced `<think>…</think><answer>…</answer>` blocks, drop the `<think>`
block and show the answer.

**Fallback warning** (when applicable) goes **above** the summary, never
mixed into it.

## Tips

- **Route by service availability, not by duration.** Probe `/v1/ready` once
  in Setup; HTTP 200 → LVS+HITL for every clip; anything else → VLM fallback.
- **HITL is mandatory on the LVS path.** The `defaults` opt-in is the only
  sanctioned bypass. The VLM fallback path is silent (no HITL).
- **Readiness = HTTP 200 on `/v1/ready`. Nothing else.** Body may be empty.
  Always use `curl -s -o /dev/null -w '%{http_code}'` — never pipe through
  `jq`/`grep`/`head`.
- **Delegate VIOS to `vss-manage-video-io-storage`** — it is a sub-task; the
  final answer is the Step 2 summary, not the clip URL.
- **`jq` twice for LVS output.** First unwraps the OpenAI envelope, second
  parses the JSON string inside `content`.
- **Prefer `/v1/summarize` for 3.2 GA**; `/summarize` is a compatibility alias.
- **Use the exact VLM model id advertised by the endpoint** (default
  `nim_nvidia_cosmos-reason2-8b_hf-1208`).
- **Render output verbatim** — no paraphrasing, no reformatting, no rewriting
  the `video_summary` or `choices[0].message.content`.
- **One call, one render.** No parallel hedging, no double renderings.

## Cross-reference

- **vss-deploy-profile** — bring up the `base` (VLM only) or `lvs` (VLM + video summarization service) profile
- **vss-manage-video-io-storage** (VIOS API) — upload videos, list streams, get clip URLs
- **vss-search-archive** — semantic search across the archive (different profile)
- **vss-query-analytics** — query incidents/events from Elasticsearch
- **video summarization API reference** — [`references/video-summarization-api.md`](references/video-summarization-api.md)
- **video summarization service ops reference** — [`references/video-summarization-deployment.md`](references/video-summarization-deployment.md)

bump:1

Source

Creator's repository · nvidia/skills

View on GitHub

License: Apache-2.0

Security

Security checks in progress
Results will appear here once audits complete
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk