image-to-text

Extract text from images using OCR. Use when the user shares a screenshot and you need to read the text content, copy UI labels, or extract copy from a design mockup.

Skill file

Preview skill file↓↑

---
name: image-to-text
description: Extract text from images using OCR. Use when the user shares a screenshot and you need to read the text content, copy UI labels, or extract copy from a design mockup.
metadata:
  author: pascalorg
  version: "1.0.0"
---

# Image to Text

Extract all readable text from an image using OCR (Tesseract). Returns the full text content along with word-level bounding boxes and confidence scores.

## When to Use

- Reading text content from a screenshot or design mockup
- Extracting UI copy (labels, buttons, headings) so you don't have to retype it
- Getting text positions and bounding boxes from a design image

## How It Works

1. The image is passed to Tesseract.js for optical character recognition
2. Tesseract segments the image into lines and words
3. Returns the full text plus word-level details (position, confidence)

## Usage

```bash
bash <skill-path>/scripts/image-to-text.sh <image-path> [language]
```

**Arguments:**
- `image-path` — Path to the image file (required)
- `language` — OCR language code (optional, defaults to `eng`). Common: `eng`, `fra`, `deu`, `spa`, `chi_sim`, `jpn`

**Examples:**

```bash
# Extract text from a screenshot
bash <skill-path>/scripts/image-to-text.sh ./screenshot.png

# Extract French text
bash <skill-path>/scripts/image-to-text.sh ./mockup.png fra
```

## Output

```json
{
  "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
  "confidence": 87.4,
  "words": [
    {
      "text": "Request",
      "confidence": 94.2,
      "bbox": { "x0": 142, "y0": 180, "x1": 268, "y1": 204 }
    },
    {
      "text": "work",
      "confidence": 96.1,
      "bbox": { "x0": 274, "y0": 180, "x1": 332, "y1": 204 }
    }
  ],
  "lines": [
    {
      "text": "Request work",
      "confidence": 95.1,
      "bbox": { "x0": 142, "y0": 180, "x1": 332, "y1": 204 }
    }
  ]
}
```

| Field      | Type   | Description                                      |
|------------|--------|--------------------------------------------------|
| text       | String | Full extracted text, newline-separated            |
| confidence | Number | Overall confidence score (0-100)                  |
| words      | Array  | Each word with text, confidence, and bounding box |
| lines      | Array  | Each line with text, confidence, and bounding box |

## Present Results to User

After extracting text, present the content grouped by lines:

```
Extracted text (87.4% confidence):

  Request work
  Suggestions
  Plumbing
  HVAC
  Cleaning
  Electrical

Found 6 lines, 6 words.
```

Use the extracted text directly when implementing UI copy from a design.

## Troubleshooting

**Low confidence / garbled text** — Tesseract works best with clean, high-contrast text. Screenshots of rendered UI work well. Photos of text at angles or with noise may produce poor results.

**Wrong language** — Pass the correct language code as the second argument. Tesseract needs the right language model to recognize characters.

**First run is slow** — Tesseract downloads language data (~4MB for English) on the first run. Subsequent runs are faster.

Source

Creator's repository · pascalorg/skills

View on GitHub ↗

Security

Security checks in progress

Results will appear here once audits complete

What this skill can do

Reads your filesConnects to the internetRuns code on your machine

Checked by 3 independent security firms

Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub

Does it sneak in hidden code?Not yet checkedPending · Socket

Does it have known bugs?Not yet checkedPending · Snyk