image-to-text

Extract text from images using OCR. Use when the user shares a screenshot and you need to read the text content, copy UI labels, or extract copy from a design mockup.

---
name: image-to-text
description: Extract text from images using OCR. Use when the user shares a screenshot and you need to read the text content, copy UI labels, or extract copy from a design mockup.
metadata:
  author: pascalorg
  version: "1.0.0"
---

# Image to Text

Extract all readable text from an image using OCR (Tesseract). The script returns the full text content along with word-level and line-level bounding boxes and confidence scores.

## When to Use

- Reading text content from a screenshot or design mockup
- Extracting UI copy (labels, buttons, headings) so you don't have to retype it
- Getting text positions and bounding boxes from a design image

## How It Works

1. The image is passed to Tesseract.js for optical character recognition
2. Tesseract segments the image into lines and words
3. The script returns the full text plus word-level and line-level details (bounding box, confidence)
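
To make the relationship between word-level and line-level results concrete, here is a minimal Python sketch that rebuilds one line entry from its words, using the two word entries from the sample in the Output section. It is an illustration only, not Tesseract's internal algorithm: the function name `merge_words_into_line` is hypothetical, and averaging the word confidences is an assumption that may not match how Tesseract scores a line.

```python
from statistics import mean


def merge_words_into_line(words):
    """Combine word entries (text, confidence, bbox) into one line entry."""
    if not words:
        raise ValueError("a line needs at least one word")
    # Read words left to right by the left edge of their bounding boxes.
    ordered = sorted(words, key=lambda w: w["bbox"]["x0"])
    return {
        "text": " ".join(w["text"] for w in ordered),
        # Assumption: line confidence is the plain mean of word confidences.
        "confidence": mean(w["confidence"] for w in ordered),
        # The line box is the smallest box that contains every word box.
        "bbox": {
            "x0": min(w["bbox"]["x0"] for w in ordered),
            "y0": min(w["bbox"]["y0"] for w in ordered),
            "x1": max(w["bbox"]["x1"] for w in ordered),
            "y1": max(w["bbox"]["y1"] for w in ordered),
        },
    }


words = [
    {"text": "work", "confidence": 96.1,
     "bbox": {"x0": 274, "y0": 180, "x1": 332, "y1": 204}},
    {"text": "Request", "confidence": 94.2,
     "bbox": {"x0": 142, "y0": 180, "x1": 268, "y1": 204}},
]
line = merge_words_into_line(words)
print(line["text"])  # Request work
print(line["bbox"])  # {'x0': 142, 'y0': 180, 'x1': 332, 'y1': 204}
```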

## Usage

```bash
bash <skill-path>/scripts/image-to-text.sh <image-path> [language]
```

**Arguments:**
- `image-path` — Path to the image file (required)
- `language` — OCR language code (optional, defaults to `eng`). Common: `eng`, `fra`, `deu`, `spa`, `chi_sim`, `jpn`

**Examples:**

```bash
# Extract text from a screenshot
bash <skill-path>/scripts/image-to-text.sh ./screenshot.png

# Extract French text
bash <skill-path>/scripts/image-to-text.sh ./mockup.png fra
```
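
If the script is called from other code rather than a shell, a thin wrapper can build the same command line and parse the result. The Python sketch below assumes the script prints a single JSON document to stdout and exits non-zero on failure; neither is confirmed by this file. `build_command` and `run_image_to_text` are hypothetical helper names, and the script path in the demo is a placeholder.

```python
import json
import subprocess


def build_command(script_path, image_path, language="eng"):
    """Return the argv list for one OCR run; language defaults to eng."""
    return ["bash", str(script_path), str(image_path), language]


def run_image_to_text(script_path, image_path, language="eng"):
    """Run the script and parse its output.

    Assumption: the script prints a single JSON document to stdout.
    """
    completed = subprocess.run(
        build_command(script_path, image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


# Placeholder script path; substitute the real location of the skill.
print(build_command("path/to/skill/scripts/image-to-text.sh", "./mockup.png", "fra"))
```

Passing the arguments as a list avoids shell quoting problems when an image path contains spaces.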

## Output

```json
{
  "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
  "confidence": 87.4,
  "words": [
    {
      "text": "Request",
      "confidence": 94.2,
      "bbox": { "x0": 142, "y0": 180, "x1": 268, "y1": 204 }
    },
    {
      "text": "work",
      "confidence": 96.1,
      "bbox": { "x0": 274, "y0": 180, "x1": 332, "y1": 204 }
    }
  ],
  "lines": [
    {
      "text": "Request work",
      "confidence": 95.1,
      "bbox": { "x0": 142, "y0": 180, "x1": 332, "y1": 204 }
    }
  ]
}
```

| Field      | Type   | Description                                      |
|------------|--------|--------------------------------------------------|
| text       | String | Full extracted text, newline-separated            |
| confidence | Number | Overall confidence score (0-100)                  |
| words      | Array  | Each word with text, confidence, and bounding box |
| lines      | Array  | Each line with text, confidence, and bounding box |
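
As an example of consuming these fields, the Python sketch below looks up a word's bounding box and derives its size. It assumes `x0`/`y0` and `x1`/`y1` are opposite corners of the box in image pixels, which is how Tesseract reports boxes; the helper names `find_word_boxes` and `bbox_size` are hypothetical, and the input is an abridged copy of the sample output.

```python
import json


def bbox_size(bbox):
    """Return (width, height) of a bounding box in pixels."""
    return bbox["x1"] - bbox["x0"], bbox["y1"] - bbox["y0"]


def find_word_boxes(result, query):
    """Return the box of every word whose text matches query, ignoring case."""
    return [
        word["bbox"]
        for word in result["words"]
        if word["text"].lower() == query.lower()
    ]


# Abridged copy of the sample output: only the first two words are listed.
raw = """
{
  "text": "Request work",
  "confidence": 87.4,
  "words": [
    {"text": "Request", "confidence": 94.2,
     "bbox": {"x0": 142, "y0": 180, "x1": 268, "y1": 204}},
    {"text": "work", "confidence": 96.1,
     "bbox": {"x0": 274, "y0": 180, "x1": 332, "y1": 204}}
  ]
}
"""
result = json.loads(raw)

for box in find_word_boxes(result, "request"):
    width, height = bbox_size(box)
    print(width, height)  # 126 24
```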

## Present Results to User

After extracting text, present the content grouped by lines:

```
Extracted text (87.4% confidence):

  Request work
  Suggestions
  Plumbing
  HVAC
  Cleaning
  Electrical

Found 6 lines, 7 words.
```

Use the extracted text directly when implementing UI copy from a design.
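
The summary layout above could be produced from the parsed result with a small formatter. This is a sketch, not the skill's own code: `format_summary` is a hypothetical name, the sample data is made up, and it assumes the `words` array lists every recognized word.

```python
def format_summary(result):
    """Render an OCR result as a line-grouped summary string."""
    # Skip blank lines so empty OCR rows do not inflate the line count.
    lines = [line for line in result["text"].split("\n") if line.strip()]
    body = "\n".join(f"  {line}" for line in lines)
    return (
        f"Extracted text ({result['confidence']:.1f}% confidence):\n\n"
        f"{body}\n\n"
        f"Found {len(lines)} lines, {len(result['words'])} words."
    )


# Made-up result for illustration; bounding boxes are omitted for brevity.
sample = {
    "text": "Save\nCancel",
    "confidence": 91.0,
    "words": [{"text": "Save"}, {"text": "Cancel"}],
}
print(format_summary(sample))
```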

## Troubleshooting

**Low confidence / garbled text** — Tesseract works best with clean, high-contrast text. Screenshots of rendered UI work well. Photos of text at angles or with noise may produce poor results.

**Wrong language** — Pass the correct language code as the second argument. Tesseract needs the right language model to recognize characters.

**First run is slow** — Tesseract downloads language data (~4MB for English) on the first run. Subsequent runs are faster.
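
For the low-confidence case, one option is to flag doubtful words before using the text. The Python sketch below is an assumption-laden example: the threshold of 60 is arbitrary and should be tuned per use case, and `low_confidence_words` and `needs_review` are hypothetical helpers. The sample data is made up.

```python
def low_confidence_words(result, threshold=60.0):
    """Return the text of every word scored below the threshold."""
    return [w["text"] for w in result["words"] if w["confidence"] < threshold]


def needs_review(result, threshold=60.0):
    """True if the overall score or any single word falls below the threshold."""
    return result["confidence"] < threshold or bool(
        low_confidence_words(result, threshold)
    )


# Made-up result: one clean word and one likely misread.
sample = {
    "text": "Plumbing HVAG",
    "confidence": 71.3,
    "words": [
        {"text": "Plumbing", "confidence": 93.0},
        {"text": "HVAG", "confidence": 41.5},
    ],
}
print(low_confidence_words(sample))  # ['HVAG']
print(needs_review(sample))          # True
```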

Source

Creator's repository · pascalorg/skills
