browser-screenshot

Take focused, region-specific screenshots from web pages. Navigates to the right page based on user context (URL, search query, social media post), locates the target region via DOM selectors, and crops to a clean, focused screenshot.
Skill file

Preview skill file↓↑
---
name: browser-screenshot
description: Take focused, region-specific screenshots from web pages. Navigates to the right page based on user context (URL, search query, social media post), locates the target region via DOM selectors, and crops to a clean, focused screenshot.
---

# Skill: Browser Screenshot

Take focused screenshots of specific regions on web pages — a Reddit post, a tweet, an article section, a chart, etc. — not just a full-page dump.

> **Prerequisite**: agent-browser must be installed and Chrome must have remote debugging enabled. See `references/agent-browser-setup.md` if unsure.

---

## Overview

This skill handles the full pipeline:

1. **Research** the best page to screenshot (web search, fetch)
2. **Navigate** to the right page in the browser
3. **Locate** the target element/region on the page
4. **Capture** a focused, cropped screenshot of just that region

### Hard Rule: No Full-Screen Screenshots

**NEVER output an uncropped full-viewport or full-page screenshot as a final result.** Full screenshots contain too much noise (nav bars, sidebars, ads, unrelated content) and are unsuitable as article illustrations. Every screenshot MUST be cropped to a focused region.

---

## Step 0: Research — Find and Validate Sources Before Opening the Browser

**The browser is for capturing, not for browsing.** Before opening anything in Chrome, use text-based tools (WebSearch, WebFetch) to find candidate pages, read their content, and decide which ones are actually worth screenshotting.

### Research-First Workflow

1. **WebSearch** to find candidate pages for the topic
2. **WebFetch** each candidate to read its text content — check if it has the information/visual you need
3. **Evaluate**: Is this page worth a screenshot? Does it have a clear, focused region that would work as an illustration?
4. **Only then** open the browser to capture the screenshot

This saves significant time — most candidate pages won't be worth screenshotting, and you can eliminate them without the overhead of browser navigation.

### When to Use Browser-First Instead

Skip the WebSearch/WebFetch phase and go directly to Chrome browsing when:

- **The target platform requires login** — Reddit, LinkedIn, X/Twitter, and other social platforms often gate content behind login walls. If the user's Chrome session is already logged in, use the browser directly.
- **The user specifies a platform with a clear search need** — e.g., "find a Reddit post about X" or "screenshot a tweet about Y". Go straight to the platform's search in Chrome.
- **WebFetch returns blocked/incomplete content** — some sites aggressively block non-browser requests. If you get a 403, a CAPTCHA page, or stripped content, switch to Chrome.

In these cases, Chrome browsing replaces WebSearch — navigate to the platform's search page, browse results, and evaluate pages visually before deciding what to screenshot.

### Page Selection Strategy

The right page depends on the context of the article and how recent/notable the subject is:

| Subject Type | Best Page to Find | How to Find It |
|--------------|-------------------|----------------|
| **New model/feature launch** (< 6 months) | Official blog post announcing it | WebSearch `"<model name>" site:<vendor-domain> blog` |
| **Established product** (> 6 months) | Product landing page or docs overview | WebSearch `"<model name>" official page` |
| **Open-source model** | HuggingFace model card or GitHub repo | Direct URL: `huggingface.co/<org>/<model>` |
| **API service** | API documentation page | WebSearch `"<service name>" API docs` |

> **Note**: This table lists common subject types but is **not exhaustive**. Apply the same research-first strategy to any subject type — find the most authoritative and visually clean source page for the topic at hand.

### What Makes a Good Screenshot Source

**Core principle: Less is more. Focus on content, not chrome.**

A good screenshot source contains a **focused, self-contained piece of information** — a paragraph of text, a key quote, a data table, a diagram. It should NOT be a busy page full of buttons, navigation, sidebars, and interactive elements.

- **Prefer**: A section of a blog post with a clear heading and 1-2 paragraphs of text. A single chart or diagram. A model card header with name and description. A quote or key finding.
- **Avoid**: Full landing pages with CTAs and navigation. Dashboard views with multiple panels. Pages dominated by UI controls (buttons, dropdowns, forms) rather than readable content.
- **Official blog posts** are ideal: they have hero images, prominent titles, and concise descriptions designed for sharing
- **Product landing pages** can work but only if you crop to the hero section — ignore the rest
- **HuggingFace model cards** are reliable for open-source models: consistent layout, model name + description always at top
- **API docs** are acceptable fallback: show the product name and key specs

> **Rule of thumb**: If the region you plan to capture contains more interactive UI elements (buttons, links, nav items) than readable text content, it's a bad crop. Find a more content-rich region, or pick a different page entirely.

### Pre-Flight URL Validation

Before opening in the browser, validate URLs with WebFetch (lightweight HEAD/GET) to avoid wasting time on 404s or redirects:

```
WebFetch: <candidate-url>
→ Check status code, title, and content snippet
→ If 404 or redirect to unrelated page, try next candidate
```

### Region Selection Strategy

Think about **what the article reader needs to see** in this screenshot:

| Article Context | What to Capture | Target Region |
|-----------------|----------------|---------------|
| Introducing a model in a lineup | Model name + key tagline/description | Blog hero section or HF model card header |
| Comparing capabilities | Feature highlights or spec table | Blog section showing specs/features |
| Discussing a specific feature | The feature description | Relevant section heading + 1-2 paragraphs |
| Showing a product/service | Brand identity + value prop | Landing page hero (title + subtitle + visual) |

The screenshot should make the reader think "ah, that's what this model/product is" — not "what am I looking at?"

---

## Step 1: Navigate to the Target Page

### Always Start by Listing Tabs

```bash
agent-browser --auto-connect tab list
```

Check if the page is already open. Reuse existing tabs — they have login sessions and correct state.

### Navigation by Input Type

| User Provides | Strategy |
|---------------|----------|
| Direct URL | `agent-browser --auto-connect open <url>` |
| Search query | `open https://www.google.com/search?q=<encoded-query>` → find and click the best result |
| Platform + topic | Construct platform search URL (see below) → locate target content |
| Vague description | Google search → evaluate results → navigate to best match |

### Platform-Specific Search URLs

| Platform | Search URL Pattern |
|----------|-------------------|
| Reddit | `https://www.reddit.com/search/?q=<query>` |
| X / Twitter | `https://x.com/search?q=<query>` |
| LinkedIn | `https://www.linkedin.com/search/results/content/?keywords=<query>` |
| Hacker News | `https://hn.algolia.com/?q=<query>` |
| GitHub | `https://github.com/search?q=<query>` |
| YouTube | `https://www.youtube.com/results?search_query=<query>` |

### Wait for Page Load

After navigation, wait for content to settle:

```bash
agent-browser --auto-connect wait --load networkidle
```

> **Note**: Some sites (Reddit, X, LinkedIn) never reach `networkidle`. If `open` already shows the page title in its output, skip the wait. Use `wait 2000` as a safe alternative.

---

## Step 2: Locate the Target Region

This is the critical step. The goal is to find a **CSS selector** that precisely wraps the content to capture.

### Primary Method: DOM Selector Discovery

1. **Take an annotated screenshot** to understand the page layout:
   ```bash
   agent-browser --auto-connect screenshot --annotate
   ```

2. **Take a snapshot** to see the page's accessibility tree:
   ```bash
   agent-browser --auto-connect snapshot -i
   ```

3. **Identify the target container element**. Look for:
   - Semantic HTML containers: `<article>`, `<main>`, `<section>`
   - Platform-specific components (see [Platform Selectors](#platform-selectors))
   - Data attributes: `[data-testid="..."]`, `[data-id="..."]`

4. **Verify with `get box`** to confirm the element has a reasonable bounding box:
   ```bash
   agent-browser --auto-connect get box "<selector>"
   ```
   This returns `{ x, y, width, height }`. Sanity-check:
   - Width should be > 100px and < viewport width
   - Height should be > 50px
   - If the box is the entire page, the selector is too broad — refine it

5. **If the selector is hard to find**, use `eval` to explore the DOM:
   ```bash
   agent-browser --auto-connect eval "document.querySelector('article')?.getBoundingClientRect()"
   ```

### Platform Selectors

Common container selectors for popular platforms:

| Platform | Target | Typical Selector |
|----------|--------|-----------------|
| Reddit | A post | `shreddit-post`, `[data-testid="post-container"]` |
| X / Twitter | A tweet | `article[data-testid="tweet"]` |
| LinkedIn | A feed post | `.feed-shared-update-v2` |
| Hacker News | A story + comments | `#hnmain .fatitem` |
| GitHub | A repo card | `[data-hpc]`, `.repository-content` |
| YouTube | Video player area | `#player-container-outer` |
| Generic article | Main content | `article`, `main`, `[role="main"]`, `.post-content`, `.article-body` |

> These selectors may change over time. Always verify with `get box` before using.

### Multiple Matching Elements

If the selector matches multiple elements (e.g., multiple tweets on a timeline), narrow it down:

```bash
# Count matches
agent-browser --auto-connect get count "article[data-testid='tweet']"

# Use nth-child or :first-of-type, or a more specific selector
# Or use eval to find the right one by text content:
agent-browser --auto-connect eval --stdin <<'EOF'
const posts = document.querySelectorAll('article[data-testid="tweet"]');
for (let i = 0; i < posts.length; i++) {
  const text = posts[i].textContent.substring(0, 80);
  console.log(i, text);
}
EOF
```

Then target a specific one using `:nth-of-type(N)` or a unique parent selector.

---

## Step 3: Capture the Focused Screenshot

### Method A: Scroll + Viewport Screenshot (Preferred for Viewport-Sized Targets)

Best when the target element fits within the viewport.

```bash
# Scroll the target into view
agent-browser --auto-connect scrollintoview "<selector>"
agent-browser --auto-connect wait 500

# Take viewport screenshot
agent-browser --auto-connect screenshot /tmp/browser-screenshot-raw.png
```

Then crop using the bounding box (see [Cropping](#cropping)).

### Method B: Full-Page Screenshot + Crop (For Any Size Target)

Best when the target might be larger than the viewport or when precise cropping is needed.

```bash
# Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/browser-screenshot-full.png

# Get the target element's bounding box
agent-browser --auto-connect get box "<selector>"
# Output: { x: 200, y: 450, width: 680, height: 520 }
```

Then crop (see [Cropping](#cropping)).

### Cropping

Use ImageMagick (`magick` on IMv7, `convert` is deprecated) to crop the screenshot to the target region. Add padding for visual breathing room.

#### Retina Display Handling

**Critical**: On macOS Retina displays, screenshots are captured at 2x resolution. A 1728x940 viewport produces a 3456x1880 image. You MUST account for this:

1. **Detect the scale factor**: Compare viewport size vs actual image dimensions:
   ```bash
   # Check actual image dimensions
   magick identify /tmp/screenshot.png
   # → 3456x1880 means 2x scale on a 1728x940 viewport
   ```

2. **Multiply `get box` coordinates by the scale factor** before cropping:
   ```bash
   # get box returns viewport coordinates: { x: 200, y: 450, width: 680, height: 520 }
   # For 2x Retina, actual image coordinates are:
   SCALE=2
   X=$((200 * SCALE))
   Y=$((450 * SCALE))
   W=$((680 * SCALE))
   H=$((520 * SCALE))
   PADDING=$((16 * SCALE))
   ```

#### Crop Command

```bash
magick /tmp/browser-screenshot-full.png \
  -crop $((W + PADDING*2))x$((H + PADDING*2))+$((X - PADDING))+$((Y - PADDING)) \
  +repage \
  <output-path>.png
```

> **Important**: `get box` returns floating-point values. Round them to integers before passing to ImageMagick.

> **Padding**: Use 12–20px (viewport px). Increase to ~30px if the target has a distinct visual boundary (card, bordered box). Use 0 if the user wants a tight crop.

### Output Path

- If the user specifies an output path, use that
- Otherwise, save to a descriptive name in the current directory, e.g., `reddit-post-screenshot.png`, `tweet-screenshot.png`

---

## Step 4: Verify the Result

After cropping, **read the output image** to verify it captured the right content:

```bash
# Use the Read tool to visually inspect the cropped screenshot
```

If the crop is wrong (missed content, too much whitespace, wrong element), adjust the selector or bounding box and retry.

---

## Fallback: Visual Highlight Confirmation

When DOM-based location is uncertain — the selector might be wrong, multiple candidates exist, or the target is ambiguous — use **JS-injected highlighting** to visually confirm before cropping.

### How It Works

1. **Inject a highlight border** on the candidate element:
   ```bash
   agent-browser --auto-connect eval --stdin <<'EOF'
   (function() {
     const el = document.querySelector('<selector>');
     if (!el) { console.log('NOT_FOUND'); return; }
     el.style.outline = '4px solid red';
     el.style.outlineOffset = '2px';
     el.scrollIntoView({ block: 'center' });
   })();
   EOF
   ```

2. **Take a screenshot** and visually inspect:
   ```bash
   agent-browser --auto-connect screenshot /tmp/highlight-check.png
   ```
   Read the screenshot to check if the red border surrounds the correct content.

3. **If correct**, remove the highlight and proceed with cropping:
   ```bash
   agent-browser --auto-connect eval "document.querySelector('<selector>').style.outline = ''; document.querySelector('<selector>').style.outlineOffset = '';"
   ```

4. **If wrong**, try the next candidate or refine the selector, re-highlight, and re-check.

### When to Use This Fallback

- The page has complex/nested components and you're not sure which container is right
- Multiple similar elements exist and you need to pick the correct one
- The user's description is vague ("that chart in the middle of the page")
- The `get box` result looks suspicious (too large, too small, zero-sized)

---

## Page Preparation: Clean Up Before Capture

Before taking the final screenshot, clean up the page for a better result:

```bash
# Dismiss cookie banners, popups, overlays
agent-browser --auto-connect eval --stdin <<'EOF'
(function() {
  // Common cookie/popup selectors
  const selectors = [
    '[class*="cookie"] button',
    '[class*="consent"] button',
    '[class*="banner"] [class*="close"]',
    '[class*="modal"] [class*="close"]',
    '[class*="popup"] [class*="close"]',
    '[aria-label="Close"]',
    '[data-testid="close"]'
  ];
  selectors.forEach(sel => {
    document.querySelectorAll(sel).forEach(el => {
      if (el.offsetParent !== null) el.click();
    });
  });

  // Hide fixed/sticky elements that overlay content (nav bars, banners)
  document.querySelectorAll('*').forEach(el => {
    const style = getComputedStyle(el);
    if ((style.position === 'fixed' || style.position === 'sticky') && el.tagName !== 'HTML' && el.tagName !== 'BODY') {
      el.style.display = 'none';
    }
  });
})();
EOF
```

> **Use with caution**: Hiding fixed elements might remove important context. Only run this when overlays visibly obstruct the target region.

### Cookie Banners That Won't Dismiss

Some cookie consent banners (e.g., Jina AI's Usercentrics) live in shadow DOM or iframes and cannot be dismissed via JS `click()` or `remove()`. Don't waste time with multiple JS attempts. Instead:

1. **Crop it out** — if the banner is at the top or bottom, simply adjust the crop region to exclude it. This is the fastest and most reliable approach.
2. **Scroll past it** — scroll the target content away from the banner area before capturing.

---

## Viewport Sizing

For consistent, high-quality screenshots, set the viewport before capturing:

```bash
# Standard desktop viewport
agent-browser --auto-connect set viewport 1280 800

# Wider for dashboard/data-heavy pages
agent-browser --auto-connect set viewport 1440 900

# Narrower for mobile-like content (social media posts)
agent-browser --auto-connect set viewport 800 600
```

Choose a viewport width that makes the target content render cleanly — not too cramped, not too stretched.

---

## Complete Example: Screenshot a Reddit Post

User: "Screenshot the top post on r/programming"

```bash
# 1. List existing tabs
agent-browser --auto-connect tab list

# 2. Navigate to subreddit
agent-browser --auto-connect open https://www.reddit.com/r/programming/
agent-browser --auto-connect wait 2000

# 3. Find the first post container
agent-browser --auto-connect eval "document.querySelector('shreddit-post')?.getBoundingClientRect()"

# 4. Scroll it into view
agent-browser --auto-connect scrollintoview "shreddit-post"
agent-browser --auto-connect wait 500

# 5. Get bounding box
agent-browser --auto-connect get box "shreddit-post"
# → { x: 312, y: 80, width: 656, height: 420 }

# 6. Take full-page screenshot
agent-browser --auto-connect screenshot --full /tmp/reddit-raw.png

# 7. Crop with padding
convert /tmp/reddit-raw.png \
  -crop 688x452+296+64 +repage \
  reddit-post-screenshot.png

# 8. Verify by reading the output image
```

---

## Key Commands Quick Reference

| Command | Purpose |
|---------|---------|
| `tab list` | List open tabs |
| `open <url>` | Navigate to URL |
| `wait 2000` | Wait for content to settle |
| `snapshot -i` | See interactive elements |
| `screenshot --annotate` | Visual overview with labels |
| `screenshot --full <path>` | Full-page screenshot |
| `get box "<selector>"` | Get element bounding box |
| `scrollintoview "<sel>"` | Scroll element into view |
| `eval <js>` | Run JavaScript in page |
| `set viewport <w> <h>` | Set viewport dimensions |

---

## Troubleshooting

### `get box` returns null or zero-sized
- The selector doesn't match any element. Use `get count "<selector>"` to verify.
- The element may be hidden or not yet rendered. Try `wait 2000` and retry.

### Cropped image is blank or wrong area
- The full-page screenshot coordinates may differ from viewport coordinates. Use `screenshot --full` with `get box` (they use the same coordinate system).
- Check if the page has horizontal scroll — `get box` x values may be offset.

### Target element is inside an iframe
- `get box` and `snapshot -i` cannot see inside iframes.
- Use `eval` to access iframe content:
  ```bash
  agent-browser --auto-connect eval "document.querySelector('iframe').contentDocument.querySelector('<sel>').getBoundingClientRect()"
  ```
  Note: Only works for same-origin iframes.

### `open` succeeded but page content is wrong
- The browser may have switched to a different tab (e.g., a popup or redirect opened a new tab). Always verify after navigation:
  ```bash
  agent-browser --auto-connect eval "document.location.href"
  ```
- If the URL is wrong, use `tab list` to find the correct tab and `tab goto <N>` to switch.

### Screenshot command times out on fonts
- Some pages (e.g., Google developer docs) hang on `document.fonts.ready`. Force-resolve it first:
  ```bash
  agent-browser --auto-connect eval "document.fonts.ready.then(() => 'ok')"
  ```
  Then retry the screenshot.

### Page has lazy-loaded content
- Scroll down to trigger loading before taking the screenshot:
  ```bash
  agent-browser --auto-connect scroll down 1000
  agent-browser --auto-connect wait 1500
  agent-browser --auto-connect scroll up 1000
  ```
Source

Creator's repository · zc277584121/marketing-skills
View on GitHub ↗
Security

Security checks in progress
Results will appear here once audits complete
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk