---
name: music
description: |
  Generate, remix, extend, edit, and analyze AI music (Mureka). Triggers on:
  "音乐", "music", "生成音乐", "generate music", "翻唱", "cover", "混音", "remix",
  "续写", "extend", "纯音乐", "instrumental", "配乐", "soundtrack", "分轨", "stem",
  "识别歌词", "recognize lyrics", "作曲", "compose",
  "create a song", "做一首歌".
metadata:
  openclaw:
    emoji: "🎵"
    requires:
      bin: ["listenhub"]
    primaryBin: "listenhub"
---

## When to Use

- User wants to generate original AI music from a prompt and/or lyrics
- User wants to remix / re-create an existing song with new lyrics
- User wants a pure instrumental, or a soundtrack scored to an image or video
- User wants to extend a song or isolate/generate a single track
- User wants to analyze audio — recognize lyrics, describe a song, or split stems
- User says "音乐", "music", "生成音乐", "generate music", "翻唱"/"混音"/"remix", "续写"/"extend", "纯音乐"/"instrumental", "配乐"/"soundtrack", "分轨"/"stem", "识别歌词", "作曲", "compose", "create a song", or "做一首歌"

## When NOT to Use

- User wants text-to-speech reading (use `/speech`)
- User wants a podcast discussion (use `/podcast`)
- User wants an explainer video with narration (use `/explainer`)
- User wants to transcribe spoken audio to text — not song lyrics (use `/asr`)

## Purpose

Full ListenHub music toolkit, powered by the **Mureka** provider via the `listenhub music` CLI. Capabilities:

**Generation (async — return a task to poll):**

1. **generate** — text and/or lyrics → a new song. Optional style, title, instrumental, and a cloned `--vocal-id`.
2. **remix** — an existing song + new lyrics → a re-creation. Input is one of an audio file, an audio URL, or a provider song ID.
3. **instrumental** — a pure instrumental from a prompt, or guided by a reference audio.
4. **soundtrack** — music scored to an image or a video.
5. **track** — isolate or generate a single instrument/vocal track from a song.
6. **extend** — make a song longer.
7. **cover** *(deprecated)* — older cover flow; prefer **remix**.

**Analysis (sync — return results immediately):**

8. **recognize** — lyrics with line-level timestamps.
9. **describe** — description, tags, genres, instruments.
10. **stem** — split a song into separated stems (ZIP download URLs).

**Task management:** `list` (recent tasks) and `get <taskId>` (status/result of one task).

Models for generation commands: `auto` (default), `mureka-7.6`, `mureka-8`, `mureka-9`, `mureka-o2`. See `references/music-api.md` for the full per-command parameter reference.

## Hard Constraints

- Always read config following `shared/config-pattern.md` before any interaction
- Follow `shared/cli-patterns.md` for execution modes, error handling, and interaction patterns
- Always follow `shared/cli-authentication.md` for auth checks
- Never save files to `~/Downloads/` or `.listenhub/` — save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming)
- No speakers involved — music generation does not use speaker selection
- File limits (all max 10 MB): audio mp3/m4a (`track` also accepts wav); image jpg/jpeg/png/webp; video mp4/mov/avi/mkv/webm
- All time-range flags (`--generate-start`, `--generate-end`) take values in **seconds**
- For async generation commands, use a long timeout: `run_in_background: true` with `timeout: 660000` (660 s, comfortably above the ~10-minute worst case). Sync commands (`recognize`, `describe`, `stem`) return immediately
- `cover` is deprecated — steer users to `remix` unless they explicitly ask for `cover`

<HARD-GATE>
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any CLI command until the user has explicitly confirmed.

</HARD-GATE>

## Step -1: CLI Auth Check

Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.

## Step 0: Config Setup

Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot).

**If the file doesn't exist**, silently create it with defaults and proceed:
```bash
mkdir -p ".listenhub/music"
echo '{"outputMode":"download","language":null}' > ".listenhub/music/config.json"
CONFIG_PATH=".listenhub/music/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
**Do NOT ask any setup questions.** Proceed directly to the Interaction Flow.

**If the file exists**, read the config silently and proceed:
```bash
CONFIG_PATH=".listenhub/music/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/music/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```

### Setup Flow (user-initiated reconfigure only)

Only run when the user explicitly asks to reconfigure. Display current settings:
```
当前配置 (music)：
  输出方式：{inline / download / both}
  语言偏好：{zh / en / 未设置}
```

Then ask:

1. **outputMode**: Follow `shared/output-mode.md` § Setup Flow Question.

2. **Language** (optional): "默认语言？"
   - "中文 (zh)"
   - "English (en)"
   - "每次手动选择" → keep `null`

After collecting answers, save immediately:
```bash
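NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
if [ -n "$LANGUAGE" ] && [ "$LANGUAGE" != "null" ]; then
  # Store the chosen default language ("zh" or "en")
  NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
else
  # "每次手动选择": reset to null so a previously saved language is cleared
  NEW_CONFIG=$(echo "$NEW_CONFIG" | jq '. + {"language": null}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")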
```

## Interaction Flow

### Step 1: Capability

Pick the capability. Skip the question if the user's intent is already clear (e.g., "翻唱"/"混音"/"remix" → remix; "作曲"/"compose"/"做一首歌" → generate; "纯音乐"/"instrumental" → instrumental; "续写"/"extend" → extend; "分轨"/"stem" → stem; "识别歌词" → recognize).

```
Question: "想做什么？"
Options:
  - "原创 (Generate)" — 用文字 / 歌词生成全新歌曲
  - "混音 (Remix)" — 基于已有歌曲 + 新歌词重新创作
  - "纯音乐 (Instrumental)" — 生成无人声的器乐
  - "配乐 (Soundtrack)" — 为图片或视频配乐
  - "其他" — 续写 / 单轨 / 识别歌词 / 描述 / 分轨
```

If the user picks "其他", follow up with a second AskUserQuestion listing: 续写 (Extend)、单轨 (Track)、识别歌词 (Recognize)、描述 (Describe)、分轨 (Stem).

`get <taskId>` and `list` are not interactive flows — run them directly when the user asks about a task's status.
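
The keyword shortcuts in Step 1 could be approximated with a plain substring match. The sketch below is illustrative only: `detect_capability` is a hypothetical helper, not part of the `listenhub` CLI, and literal matching is cruder than real intent detection (it would miss "做一首关于夏天的歌" and would misread "system" as `stem`). It prints `ask` when nothing matches, which is the cue to show the Step 1 question.

```bash
# Hypothetical helper: map a user utterance to a capability name.
# Literal substring matching only; prints "ask" when nothing matches.
detect_capability() {
  local text
  # Lowercase ASCII letters so "Remix" and "remix" match alike.
  text=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case "$text" in
    *翻唱*|*混音*|*remix*)        echo "remix" ;;
    *纯音乐*|*instrumental*)      echo "instrumental" ;;
    *续写*|*extend*)              echo "extend" ;;
    *分轨*|*stem*)                echo "stem" ;;
    *识别歌词*)                   echo "recognize" ;;
    *作曲*|*compose*|*做一首歌*)  echo "generate" ;;
    *)                            echo "ask" ;;
  esac
}

detect_capability "把 demo.mp3 混音成 city pop"              # prints: remix
detect_capability "Create an instrumental electronic track"  # prints: instrumental
detect_capability "来点好听的"                                # prints: ask
```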

### Step 2: Gather inputs (per capability)

Use the per-capability fields below. Ask for required inputs; offer optional ones. For any audio/image/video file, **validate** before confirming:

- Local path: verify the file exists and the extension matches the allowed list for that command (see Hard Constraints / `references/music-api.md`).
- URL: accept as-is (the CLI validates).
- Size: reject local files over 10 MB.

```bash
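# Try BSD/macOS stat first, then fall back to GNU/Linux stat
FILE_SIZE=$(stat -f%z "{path}" 2>/dev/null || stat -c%s "{path}" 2>/dev/null)
if [ -z "$FILE_SIZE" ]; then echo "File not found or unreadable"
elif [ "$FILE_SIZE" -gt 10485760 ]; then echo "File exceeds 10 MB limit"; fi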
```
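
The size check covers only one of the three validations listed above. The extension check might look like the following sketch, where `ext_allowed` and its kind labels (`audio`, `track-audio`, `image`, `video`) are hypothetical names, and the extension lists are copied from Hard Constraints.

```bash
# Hypothetical helper: succeed when the file's extension is allowed for the
# given input kind. Lists mirror the Hard Constraints section.
ext_allowed() {
  local kind="$1" path="$2" ext
  ext="${path##*.}"                            # text after the last dot
  [ "$ext" = "$path" ] && return 1             # no dot means no extension
  ext=$(printf '%s' "$ext" | tr 'A-Z' 'a-z')   # compare case-insensitively
  case "$kind:$ext" in
    audio:mp3|audio:m4a)                                return 0 ;;
    track-audio:mp3|track-audio:m4a|track-audio:wav)    return 0 ;;
    image:jpg|image:jpeg|image:png|image:webp)          return 0 ;;
    video:mp4|video:mov|video:avi|video:mkv|video:webm) return 0 ;;
    *)                                                  return 1 ;;
  esac
}

ext_allowed audio "demo.MP3"       && echo "demo.MP3: ok"
ext_allowed audio "take.wav"       || echo "take.wav: rejected for this command"
ext_allowed track-audio "take.wav" && echo "take.wav: ok for track"
```

Keeping `track` as its own kind reflects the one exception in the limits: it is the only command that also accepts wav.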

**generate** — `--prompt` and/or `--lyrics` (at least one); optional `--style`, `--title`, `--model`, `--instrumental`, `--vocal-id`.

**remix** — exactly one input source: `--audio` (file) / `--audio-url` / `--provider-song-id`; plus `--lyrics` and `--prompt` (both required); optional `--style`, `--title`, `--model`.

**instrumental** — exactly one of `--prompt` / `--reference-audio`; optional `--title`, `--model`.

**soundtrack** — exactly one of `--image` / `--video`; optional `--prompt`, `--title`, `--model`.

**track** — exactly one input source `--audio` / `--provider-song-id`; `--generate-type` (one of Vocals|Instrumental|Drums|Bass|Guitar|Keyboard|Percussion|Strings|Synth|FX|Brass|Woodwinds); optional `--prompt`; `--lyrics` only when type is Vocals; `--vocal-gender male|female`; `--generate-start`/`--generate-end` (seconds); `--model`.

**extend** — one input source `--audio` / `--provider-song-id`; optional `--prompt`, `--model`.

**recognize** / **describe** / **stem** — `--audio` only. `stem` also takes `--model audio-separation-1|audio-separation-2`.

For multi-choice fields (model, generate-type, vocal-gender, instrumental yes/no) use the AskUserQuestion tool. Free-text fields (prompt, lyrics, style, title) accept plain text.
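
Several capabilities require exactly one input source, and remix additionally requires both lyrics and a prompt. A minimal sketch of checking this before the confirmation step follows; `exactly_one` and the `SRC_*` variable names are hypothetical and exist only for illustration.

```bash
# Hypothetical helper: succeed only when exactly one argument is non-empty.
exactly_one() {
  local count=0 value
  for value in "$@"; do
    [ -n "$value" ] && count=$((count + 1))
  done
  [ "$count" -eq 1 ]
}

# remix: one of --audio / --audio-url / --provider-song-id, plus lyrics and prompt.
SRC_FILE="demo.mp3"; SRC_URL=""; SRC_SONG_ID=""
LYRICS="new lyrics"; PROMPT="rework as city pop"

if ! exactly_one "$SRC_FILE" "$SRC_URL" "$SRC_SONG_ID"; then
  echo "remix needs exactly one input source"
elif [ -z "$LYRICS" ] || [ -z "$PROMPT" ]; then
  echo "remix needs both lyrics and a prompt"
else
  echo "remix inputs look complete"
fi
```

The same helper would fit the `instrumental` (`--prompt` / `--reference-audio`) and `soundtrack` (`--image` / `--video`) rules.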

### Step 3: Confirm

Summarize the capability and every collected parameter, then ask the user to confirm. Examples:

**generate:**
```
准备生成音乐：
  能力：原创 (Generate)
  描述：{prompt / 无}
  歌词：{lyrics / 无}
  风格：{style / 自动}
  标题：{title / 自动}
  模型：{model / auto}
  人声：{带人声 / 纯音乐}
  Vocal ID：{vocal-id / 无}
  确认？
```

**remix:**
```
准备混音：
  能力：混音 (Remix)
  原曲：{audio / audio-url / provider-song-id}
  新歌词：{lyrics}
  描述：{prompt}
  风格：{style / 自动}
  标题：{title / 自动}
  模型：{model / auto}
  确认？
```

For analysis capabilities (recognize / describe / stem) the summary is just the capability + the input audio (+ separation model for stem); these run synchronously, so confirmation can be lightweight.

Wait for explicit confirmation before running any CLI command.

## Workflow

### Async generation commands

`generate`, `remix`, `instrumental`, `soundtrack`, `track`, `extend`, `cover`.

1. **Submit (background)** with `run_in_background: true` and `timeout: 660000`. Always pass `--json`. Include only the flags the user provided; omit the rest.

   **generate:**
   ```bash
   listenhub music generate \
     --prompt "{prompt}" \
     --lyrics "{lyrics}" \
     --model "{model}" \
     --style "{style}" \
     --title "{title}" \
     --instrumental \
     --vocal-id "{vocal-id}" \
     --json
   ```

   **remix:**
   ```bash
   listenhub music remix \
     --audio "{path}" \
     --lyrics "{lyrics}" \
     --prompt "{prompt}" \
     --style "{style}" \
     --title "{title}" \
     --json
   ```
   (use exactly one of `--audio` / `--audio-url` / `--provider-song-id`)

   **instrumental:**
   ```bash
   listenhub music instrumental \
     --prompt "{prompt}" \
     --model "{model}" \
     --json
   ```
   (or `--reference-audio "{path}"` instead of `--prompt`)

   **soundtrack:**
   ```bash
   listenhub music soundtrack \
     --image "{path}" \
     --prompt "{prompt}" \
     --json
   ```
   (or `--video "{path}"` instead of `--image`)

   **track:**
   ```bash
   listenhub music track \
     --audio "{path}" \
     --generate-type "Vocals" \
     --lyrics "{lyrics}" \
     --vocal-gender "female" \
     --generate-start 0 --generate-end 30 \
     --json
   ```
   (or `--provider-song-id`; `--lyrics` only when `--generate-type Vocals`)

   **extend:**
   ```bash
   listenhub music extend \
     --audio "{path}" \
     --prompt "{how to continue}" \
     --json
   ```

   The CLI handles polling internally. Generation can take up to ~10 minutes.

2. Tell the user the task is submitted and that they'll be notified when it finishes. If they only have a `taskId`, they can check with `listenhub music get <taskId> --json` or `listenhub music list --json`.

3. When notified of completion, **present the result**. The CLI JSON is a task object: the song is in `tracks[0]`, the credit cost is in `creditCost`, and `duration` is in **seconds**. Parse the key fields:
   ```bash
   AUDIO_URL=$(echo "$RESULT" | jq -r '.tracks[0].audioUrl // empty')
   TITLE=$(echo "$RESULT" | jq -r '[.tracks[0].title, .params.title, "Untitled"] | map(select(. != null and . != "")) | .[0]')
   # duration is seconds (older pre-rollout Mureka tasks may still be ms → a value ≥ 3600 means ms)
   DURATION=$(echo "$RESULT" | jq -r '.tracks[0].duration // 0' \
     | awk '{d=$1; if (d>=3600) d/=1000; printf "%d:%02d", int(d/60), int(d%60)}')
   CREDITS=$(echo "$RESULT" | jq -r '.creditCost // empty')
   ```

   Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.

   **`inline` or `both`**: Display the audio URL as a clickable link.
   ```
   音乐已生成！

   标题：{title}
   在线收听：{audioUrl}
   时长：{duration}
   消耗积分：{credits}
   ```

   **`download` or `both`**: Also download the file. Generate a slug from the title following `shared/config-pattern.md` § Artifact Naming.
   ```bash
   SLUG="{slug}"  # e.g. "summer-breeze"
   NAME="${SLUG}.mp3"
   # Dedup: if file exists, append -2, -3, etc.
   BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
   while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
   curl -fsSL -o "$NAME" "{audioUrl}"  # -f: fail on HTTP errors; -L: follow redirects
   ```
   ```
   已保存到当前目录：
     {NAME}
   ```
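
The submit step above says to include only the flags the user provided. One way to do that without hand-editing the command is to build the argument list conditionally. This is a sketch for `generate` only; `build_generate_args`, `ARGS`, and the `INSTRUMENTAL="yes"` convention are hypothetical, while the flags themselves are the ones shown in the `generate` template.

```bash
# Hypothetical helper: collect arguments for `listenhub music generate`,
# skipping every value that is empty or unset.
build_generate_args() {
  ARGS=(music generate)
  [ -n "${PROMPT:-}" ]   && ARGS+=(--prompt "$PROMPT")
  [ -n "${LYRICS:-}" ]   && ARGS+=(--lyrics "$LYRICS")
  [ -n "${MODEL:-}" ]    && ARGS+=(--model "$MODEL")
  [ -n "${STYLE:-}" ]    && ARGS+=(--style "$STYLE")
  [ -n "${TITLE:-}" ]    && ARGS+=(--title "$TITLE")
  [ "${INSTRUMENTAL:-}" = "yes" ] && ARGS+=(--instrumental)
  [ -n "${VOCAL_ID:-}" ] && ARGS+=(--vocal-id "$VOCAL_ID")
  ARGS+=(--json)   # always requested, per the submit step
}

PROMPT="关于夏天海边的歌"; LYRICS=""; MODEL=""; STYLE=""; TITLE=""
INSTRUMENTAL="no"; VOCAL_ID=""
build_generate_args

# Show the command that would run, one argument per line.
printf '%s\n' listenhub "${ARGS[@]}"
# After the user confirms, the actual call would be: listenhub "${ARGS[@]}"
```

Using an array keeps multi-word values such as a style or lyrics intact as single arguments.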

### Sync analysis commands

`recognize`, `describe`, `stem` return results in the same call — run them in the foreground (no background, no long timeout) and present immediately.

   **recognize** (lyrics + timestamps):
   ```bash
   listenhub music recognize --audio "{path}" --json
   ```

   **describe** (description, tags, genres, instruments):
   ```bash
   listenhub music describe --audio "{path}" --json
   ```

   **stem** (separated stems → ZIP download URLs):
   ```bash
   listenhub music stem --audio "{path}" --model "audio-separation-2" --json
   ```
   In `download`/`both` mode, download the ZIP URL(s) promptly to cwd.
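
Both the generation flow and the stem step branch on the configured output mode. A small sketch of that branching follows; `wants_inline` and `wants_download` are hypothetical helpers, and how unknown values should be treated is left to `shared/output-mode.md` (here they simply match neither branch).

```bash
# Hypothetical helpers: decide what to do for a given outputMode value.
wants_inline()   { case "$1" in inline|both)   return 0 ;; *) return 1 ;; esac; }
wants_download() { case "$1" in download|both) return 0 ;; *) return 1 ;; esac; }

# In the skill this value would come from the config, for example:
#   OUTPUT_MODE=$(echo "$CONFIG" | jq -r '.outputMode // "download"')
OUTPUT_MODE="both"

if wants_inline "$OUTPUT_MODE"; then
  echo "show the URL as a clickable link"
fi
if wants_download "$OUTPUT_MODE"; then
  echo "download the file to the current working directory"
fi
```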

### Task management

```bash
listenhub music list --json            # recent tasks
listenhub music get "{taskId}" --json   # one task's status / result
```

### After Successful Generation

Update config with the language used this session if the user explicitly specified one:

```bash
if [ -n "$LANGUAGE" ]; then
  NEW_CONFIG=$(echo "$CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
  echo "$NEW_CONFIG" > "$CONFIG_PATH"
fi
```

**Estimated times**:
- Music generation: 5-10 minutes

## Resources

- Per-command parameter reference: `references/music-api.md`
- CLI authentication: `shared/cli-authentication.md`
- CLI patterns: `shared/cli-patterns.md`
- Config pattern: `shared/config-pattern.md`
- Output mode: `shared/output-mode.md`

## Composability

- **Invokes**: nothing
- **Invoked by**: content-planner (Phase 3)

## Examples

**Generate original:**

> "帮我做一首关于夏天海边的歌"

1. Detect: generate capability ("做一首歌")
2. Read config (first run: create defaults with `outputMode: "download"`)
3. Infer: capability = generate, prompt = "关于夏天海边的歌"
4. Ask: style? title? instrumental?
5. Confirm summary → user confirms

```bash
listenhub music generate \
  --prompt "关于夏天海边的歌" \
  --json
```

Wait for the CLI result, then download `{slug}.mp3` to cwd.

**Remix an existing song:**

> "用 demo.mp3 重新填词混音，把它做成 city pop 风格"

1. Detect: remix capability ("混音")
2. Validate: `demo.mp3` exists, is mp3/m4a, under 10 MB
3. Ask: new lyrics (`--lyrics`, required), prompt/direction (`--prompt`, required), style, title
4. Confirm summary → user confirms

```bash
listenhub music remix \
  --audio "demo.mp3" \
  --lyrics "{new lyrics}" \
  --prompt "rework as upbeat city pop" \
  --style "city pop" \
  --json
```

Wait for the CLI result, then download `{slug}.mp3` to cwd.

**Generate instrumental:**

> "Create an instrumental electronic track for a game intro"

1. Detect: instrumental capability ("instrumental")
2. Infer: prompt = "electronic track for a game intro"
3. Confirm summary → user confirms

```bash
listenhub music instrumental \
  --prompt "electronic track for a game intro" \
  --json
```

Wait for the CLI result, then download `{slug}.mp3` to cwd.

**Soundtrack for a video:**

> "给这段 clip.mp4 配一段紧张的背景音乐"

1. Detect: soundtrack capability ("配乐"), input is a video
2. Validate: `clip.mp4` exists (mp4/mov/avi/mkv/webm), under 10 MB
3. Infer: prompt = "紧张的背景音乐"
4. Confirm summary → user confirms

```bash
listenhub music soundtrack \
  --video "clip.mp4" \
  --prompt "tense, suspenseful background score" \
  --json
```

**Recognize lyrics (sync):**

> "帮我识别 song.mp3 里的歌词"

1. Detect: recognize capability ("识别歌词")
2. Validate: `song.mp3` exists, under 10 MB
3. Run in foreground and show lyrics with timestamps

```bash
listenhub music recognize --audio "song.mp3" --json
```

**Split stems (sync):**

> "把 track.mp3 分轨"

1. Detect: stem capability ("分轨")
2. Validate: `track.mp3` exists, is mp3/m4a, under 10 MB
3. Ask: separation model (audio-separation-1 / audio-separation-2)
4. Run in foreground; in download mode, fetch the ZIP URL to cwd

```bash
listenhub music stem --audio "track.mp3" --model "audio-separation-2" --json
```

Source: creator's repository, marswaveai/skills