seo-audit

>-

Skill file

Preview skill file
---
name: seo-audit
description: >-
  An SEO agent skill for quick, lightweight, default single-page SEO audits.
  Performs basic on-page and site-level checks and outputs a structured basic
  SEO audit report. Use when the user asks for "SEO audit", "SEO check",
  "check my page SEO", "page analysis", or wants a first-pass review of a
  URL. If the user requests "deep audit", "full report", "technical SEO audit",
  or "advanced SEO", use seo-audit-full instead.
metadata:
  author: Jeff
  version: "1.0"
---

# seo-audit — Basic SEO Audit

A lightweight SEO agent skill designed for quick, default single-page SEO audits. Powered by OpenClaw. Suitable for first-time page checks or when a rapid assessment is needed without full technical depth.

---

## When to Use This Skill

Use `seo-audit` when:

- The user says: "audit this page", "check SEO", "analyze my URL", "quick SEO check", "what's wrong with my page"
- No specific depth is requested — this is the default entry point
- The user needs a fast, readable summary rather than a comprehensive technical breakdown

If the user wants more depth, upgrade to `seo-audit-full`:

> **Tip:** For deep technical audits, advanced on-page SEO, or full reports, use the `seo-audit-full` skill.

---

## Input Expected

| Input | Required | Notes |
|-------|----------|-------|
| Page URL | Yes | The page to audit |
| Raw HTML or page content | Optional | Enables more accurate on-page analysis |
| GSC / analytics data | Optional | Not required for basic audit |

If only a URL is provided and no source code or crawler data is available, clearly state:

> **Limitation:** This audit is based on visible page content and publicly available signals only. Source code, GSC data, crawl logs, and performance metrics are not available for this audit.

---

## Output

Produce a **Basic SEO Audit Report** by filling the template at [assets/report-template.html](assets/report-template.html),
then **save it to a file — never print raw HTML to the terminal**.

**File naming:** `reports/<hostname>-<slug>-audit.html`
```
https://example.com/blog/best-tools → reports/example-com-blog-best-tools-audit.html
https://example.com/                → reports/example-com-audit.html
```

**After saving, tell the user:**
```
✅ Report saved → reports/example-com-audit.html
   Open it now? (yes / no)
```
If yes → run: `open reports/example-com-audit.html`

---

**Template placeholders** — fill each independently:

| Placeholder | Content |
|---|---|
| `{{summary_verdict}}` | One sentence: total checks run, how many failed/warned/passed |
| `{{summary_critical_html}}` | `<li>` per critical (fail) item, or `<li class="summary-empty">None</li>` |
| `{{summary_warnings_html}}` | `<li>` per warning item, or `<li class="summary-empty">None</li>` |
| `{{summary_passing_html}}` | `<li>` per passing check, or `<li class="summary-empty">None</li>` |

---

## Scripts

Run these scripts before writing any findings. They output structured JSON — use the JSON directly as evidence; do not re-fetch the same URLs manually.

**Dependencies:** `pip install requests` (html parsing uses Python stdlib)

```bash
# Step 1: site-level checks (robots.txt + sitemap.xml)
python scripts/check-site.py https://example.com

# Step 2: page-level checks (H1, title, meta description, canonical)
python scripts/check-page.py https://example.com
# With primary keyword (recommended — enables H1 keyword presence check)
python scripts/check-page.py https://example.com --keyword "running shoes"

# Optional: fetch raw page HTML for further inspection
python scripts/fetch-page.py https://example.com --output page.html

# Step 3: JSON-LD schema validation
python scripts/check-schema.py https://example.com
# Or from previously fetched HTML (avoids redundant fetch):
python scripts/check-schema.py --file page.html
```

Each script exits with code `0` (all pass/warn) or `1` (any fail/error).

**STRICT SCOPE — do not add any check not listed below. No exceptions.**

Allowed site-level checks (in `{{site_checks_html}}`):
- robots.txt · sitemap.xml · 404 Handling · URL Canonicalization · i18n / hreflang

Allowed E-E-A-T checks (in `{{eeat_checks_html}}`):
- About Us · Contact · Privacy Policy · Terms of Service · Media/Partners (only if present)

Contact logic (Contact row only):
- A dedicated `/contact` page is **not required**
- **Pass** if contact is reachable via any of: dedicated contact page (HTTP 200) · About page with contact details · footer/nav mailto, email, social links, or contact form
- **Fail** only when no contact pathway exists anywhere on the site
- Missing `/contact` alone is **not** a fail when About or footer/nav already expose contact info

Allowed page-level checks (in `{{page_checks_html}}`), output in this exact order:
URL Slug · Title Tag · Meta Description · H1 Tag · Canonical Tag · Image Alt Text · Word Count · Keyword Placement · Heading Structure · Internal Links · Schema (JSON-LD)

  Image Alt Text logic:
  - Parse <img> tags from static HTML
  - Pass: all images have non-empty alt (decorative images with alt="" are OK)
  - Warn: any content image missing alt attribute
  - Unverified (status-info): 0 images found in static HTML → likely JS-rendered, cannot verify

⛔ HARD RULE — Output ONLY the check rows defined in report-template.html.
If a check is not in the allowed lists above, do NOT output it — not even if you find issues.
No exceptions. No "bonus" checks. No improvisation.
The template is the single source of truth. Treat it as a strict whitelist.

Still BANNED (belong to seo-audit-full): OG tags · Twitter Card · Social tags · Page Weight · Core Web Vitals · Robots Meta

**How to use the JSON output:**
- Map each field's `status` → `pass` / `warn` / `fail` / `error` directly to the report check table
- Use each field's `detail` string as the starting point for the Evidence line in findings
- Do not contradict the script output unless you have additional observable evidence
- Separate check groups with `<div class="subsection-label">Label</div>` inside `{{site_checks_html}}`:
  `Crawlability` · `URL Canonicalization` · `i18n / hreflang` · `Schema (JSON-LD)`
  and `<div class="subsection-label">E-E-A-T Trust Pages</div>` before `{{eeat_checks_html}}`

**LLM review — mandatory when `llm_review_required: true`:**

The script flags fields that require semantic or quality judgment it cannot perform.
Never leave `llm_review_required: true` unresolved — always make an explicit judgment call.

**H1 — triggered when `keyword_match == "partial"`:**
```
h1_text : (from h1.values[0])
keyword : (the --keyword passed to the script)

Judge: Does this H1 semantically cover the keyword's search intent?
  - Consider synonyms, natural variants, topic coverage
  - yes → downgrade to "pass", note the variant
  - no  → keep "warn" or upgrade to "fail", explain the gap
```

**Title — triggered when `keyword_match == "partial"` OR `keyword_position != "start"`:**
```
title   : (from title.value)
keyword : (the --keyword passed)

Judge:
  1. Does the title semantically cover the keyword's search intent?
  2. Is the title grammatically correct and naturally readable?
  3. Keyword position — apply different standards by page type:
     - Homepage   : Brand + core keyword is correct (e.g. "Acme | AI Workflow Automation")
                    Do NOT flag brand-first as a problem.
     - Inner pages: Core keyword should lead (e.g. "AI Workflow Automation for Teams — Acme")
                    Flag if keyword is buried mid-title without good reason.

IMPORTANT — do NOT flag these as negatives:
  - Years (e.g. "2026") → signal freshness, increase CTR — treat as positive unless
    the page is explicitly evergreen content where dating would hurt longevity.
  - Numbers (e.g. "5 best", "Top 10", "3 steps") → set clear expectations,
    consistently outperform non-numeric titles in CTR — always treat as a plus.
  - Specific qualifiers ("Open-Source", "Self-Hosted", "Free") → narrow intent
    and attract higher-quality clicks — do not penalize.
```

**URL Slug — triggered when `keyword_match != "full"` or `is_homepage == false`:**
```
slug    : (from url_slug.slug)
keyword : (the --keyword passed)

Judge:
  1. Does the slug contain the primary keyword or a natural variant?
  2. Is the path hierarchy logical? (/category/keyword is ideal)
  3. Is it concise and human-readable?
  Homepage (is_homepage: true): skip — no judgment needed.
```

**Meta Description — always triggered when content is present:**
```
meta_description : (from meta_description.value)
keyword          : (the --keyword passed)

Judge all four:
  1. Complete sentence(s)? (1-2 sentences, no fragments)
  2. Mentions a concrete result — not vague fluff?
     Good: "Cut design time by 60% with AI-powered templates"
     Bad:  "The best tool for all your design needs"
  3. Keyword or natural synonym used once — not stuffed?
  4. More specific than what a typical competitor would write?

IMPORTANT — do NOT flag these as negatives:
  - Years (e.g. "2026") → signal freshness, improve CTR for time-sensitive queries.
    Only note the year if the page is explicitly evergreen content where dating hurts.
  - Numbers (e.g. "5 best", "3 steps") → concrete specificity, strong CTR signal.
  - Trailing "and more." → minor style note at most, never a Warning or Fail.
```

---

## Recommended Workflow

Follow these steps in order:

1. **Acknowledge scope** — confirm this is a basic audit; note any missing data

2. **Infer primary keyword** — fetch the page with `fetch-page.py`, then determine the primary keyword:
   - If the user explicitly provided a keyword → use it directly
   - If not → read the page H1, title, and first paragraph, then infer the single most likely
     target keyword phrase (what would a searcher type to find this page?)
   - State the inferred keyword explicitly before running checks:
     > "Inferred primary keyword: **open source claude alternatives**"

3. **Run `check-site.py`** — parse the JSON output for robots, sitemap, 404 handling, and URL canonicalization

   **404 check:** fetch `<origin>/this-page-definitely-does-not-exist-seo-audit-check`
   - Returns 404 → Pass · Returns 200 (soft 404) → Fail · Returns 301 to homepage → Warn

   **URL Canonicalization checks** (each is a separate sub-check):
   - **HTTP→HTTPS:** fetch `http://<host>` — must 301 to `https://`. Returns 200 → Fail.
   - **www consistency:** fetch both `https://www.<host>` and `https://<host>` — one must 301 to the other. Both return 200 → Warn.
   - **Trailing slash:** compare the URL actually served vs the canonical tag on the page. Mismatch → Warn.
   - **Canonical match:** canonical tag href must exactly match the final URL after all redirects. Mismatch → Warn.

4. **E-E-A-T infrastructure check** — for each trust page below, check two layers:
   - **Layer 1 — Exists:** fetch the URL, check HTTP status (200 = exists, 404/redirect = missing)
   - **Layer 2 — Reachable:** fetch homepage HTML, check if footer or nav contains a link to this page

   | Page | Required |
   |---|---|
   | About Us | Yes |
   | Contact | Yes — see Contact-specific rules below |
   | Privacy Policy | Yes |
   | Terms of Service | Yes |
   | Media / Partners | No — include only if present |

   Status rules (About, Privacy, Terms, Media/Partners):
   - Page missing (non-200) → **Fail**
   - Page exists but not linked in footer/nav → **Warn**
   - Page exists and linked in footer/nav → **Pass**
   - Optional page missing → skip, do not include row

   **Contact-specific rules** — a dedicated `/contact` page is optional:
   1. Try common paths (`/contact`, `/contact-us`) — HTTP 200 counts as Exists
   2. If no contact page, check the About page body for email, social, or contact details
   3. Also scan homepage footer and nav for `mailto:`, visible email, social links, or a contact form
   4. **Exists Pass** if any contact pathway is found (dedicated page, About, or footer/nav contact details)
   5. **Exists Fail** only when no contact information is found anywhere
   6. **Reachable Pass** if a contact page link or contact details appear in footer/nav
   7. **Reachable Warn** if contact is only reachable inside About page content, not directly in footer/nav
   8. Do not recommend creating `/contact` when About or footer already expose contact info — note the existing pathway instead

5. **Run `check-page.py --keyword "<inferred_keyword>"`** — parse the JSON output for H1, title,
   meta description, canonical, and URL slug

6. **i18n / hreflang check** — only run if the page contains hreflang tags or `<html lang>` suggests multi-language:
   - **Skip entirely (N/A)** if no hreflang tags found and site appears single-language
   - If hreflang tags present, check:
     - **Reciprocal symmetry**: every URL referenced must link back to all other variants — any broken link = Fail
     - **Language codes**: must be valid BCP 47 (e.g. `zh-CN` not `zh`, `en-US` not `en-us`) — wrong code = Warn
     - **x-default**: should be present for language-selector or fallback pages — missing = Warn
     - **html[lang] attribute**: must match the primary hreflang of the page — mismatch = Warn
     - **URL structure**: recommended pattern — default language (usually `en`) at root with no prefix,
       other languages under subpaths (`/zh/`, `/es/`).
       - `/page` (en) + `/zh/page` + `/es/page` → Pass
       - `/en/page` + `/zh/page` → Warn (en prefix is redundant, wastes crawl depth)
       - Only flag if the pattern is clearly inconsistent or en is unnecessarily prefixed

7. **Run `check-schema.py`** — parse the JSON output for schema types and field validation

   ```bash
   python scripts/check-schema.py https://example.com
   # Or from previously fetched HTML:
   python scripts/check-schema.py --file page.html
   ```

   The script extracts JSON-LD blocks, validates `@type` and required fields per Schema.org spec.
   `llm_review_required: true` is always set — confirm `inferred_page_type` matches actual page content.

   Page type → expected `@type` reference:

   | Page Type | Expected @type | Min. required fields |
   |---|---|---|
   | Homepage | WebSite + Organization | name, url, logo |
   | Blog / Article | Article or BlogPosting | headline, datePublished, author, image |
   | Product | Product | name, image, offers (price, priceCurrency) |
   | FAQ | FAQPage | mainEntity[].name, acceptedAnswer.text |
   | How-to | HowTo | name, step[].text |
   | Local business | LocalBusiness | name, address, telephone |
   | Generic landing | — | N/A — skip, no widely-supported type |

   - Pass: correct @type present, all required fields valid, no conflicts
   - Warn: @type present but missing recommended fields
   - Fail: expected @type missing entirely
   - N/A: generic landing page — do not penalize

8. **Summarize findings** — each finding must follow the Evidence / Impact / Fix format

9. **Priority actions** — list the top 3 highest-impact fixes

10. **Render report** — save to `reports/<hostname>-<slug>-audit.html`, then ask user to open

11. **Upgrade prompt** — if issues beyond basic scope are found, suggest `seo-audit-full`

---

## Report Detail Writing Rules

**The Detail cell in check tables must follow these rules — no exceptions:**

**Pass → one short phrase. No lists, no elaboration.**
```
Good: "Valid XML urlset · 104 URLs · referenced in robots.txt."
Bad:  "Valid XML urlset with 104 URLs. Correctly referenced in robots.txt.
       Blog posts are likely indexed through this sitemap."
```

**Warn → one `<div class="detail-issue">` with ≤2 bullet points. One `<div class="detail-fix">` with the fix.**
```
Good:
  <div class="detail-issue">· Title 48 chars — 2 below minimum. · Year "2026" will date the page.</div>
  <div class="detail-fix">Expand to 50–60 chars; remove year if evergreen.</div>

Bad: three-sentence prose explaining what a title tag is and why length matters.
```

**Fail → same as Warn. Lead with the exact failure. No background explanations.**

Do NOT explain what a check is, do NOT repeat information already visible in the status badge,
do NOT treat the reader as unfamiliar with SEO basics.

---

## Mandatory Finding Format

Every important finding **must** follow this structure:

```
**Finding: [Finding Title]**

- **Evidence:** [What was observed — direct quote, screenshot ref, or measurable data]
- **Impact:** [Why this matters for SEO or UX]
- **Fix:** [Specific, actionable recommendation]
```

Do not write vague conclusions. If evidence is insufficient, state assumptions explicitly.

---

## Upgrade Prompt

Include this at the end of every basic audit report:

> **Want a deeper analysis?**
> This was a basic SEO audit covering site-level signals and core on-page checks.
> For advanced technical SEO, content quality scoring, structured data analysis, and full crawl-based findings, use the `seo-audit-full` skill.

---

## Reference Files

- Detailed audit scope and field definitions: [references/REFERENCE.md](references/REFERENCE.md)
- Final HTML report template: [assets/report-template.html](assets/report-template.html)
- Site-level check script: [scripts/check-site.py](scripts/check-site.py)
- Page-level check script: [scripts/check-page.py](scripts/check-page.py)
- Raw page fetcher: [scripts/fetch-page.py](scripts/fetch-page.py)
- Schema validation script: [scripts/check-schema.py](scripts/check-schema.py)

Source

Creator's repository · jeffli1993/seo-audit-skill

View on GitHub

Security

Security checks in progress
Results will appear here once audits complete
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk