firecrawl-knowledge-base

Build a knowledge base from web content with Firecrawl. Use for local reference docs, RAG-ready chunks, fine-tuning datasets, documentation mirrors, topic corpora, or LLM-ready markdown organized from web sources.

Skill file

---
name: firecrawl-knowledge-base
description: Build a knowledge base from web content with Firecrawl. Use for local reference docs, RAG-ready chunks, fine-tuning datasets, documentation mirrors, topic corpora, or LLM-ready markdown organized from web sources.
license: ISC
metadata:
  author: firecrawl
  version: "0.1.0"
  homepage: https://www.firecrawl.dev
  source: https://github.com/firecrawl/firecrawl-workflows
inputs:
  - name: FIRECRAWL_API_KEY
    description: Firecrawl API key for hosted Firecrawl requests.
    required: true
---

# Firecrawl Knowledge Base

Use this skill to turn URLs or topics into organized, LLM-ready content.

## Onboarding Interview

Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately.

Ask at most three concise questions, and only if blocked: for example, the source URL or topic; whether the output is for reference, RAG, training, or docs; or the training format if training is requested.

## Firecrawl Collection Plan

Use Firecrawl `map` to enumerate pages on documentation sites and `search` to find sources for topic-based corpora. Scrape each page into markdown, preserving code examples and tables.
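
As a rough illustration of the scrape step, the sketch below builds (but does not send) an HTTP request that asks for one page as markdown. The endpoint URL, the payload shape, and the names `SCRAPE_ENDPOINT` and `build_scrape_request` are assumptions made for this example; check them against the current Firecrawl API reference before relying on them.

```python
import json
import os
import urllib.request

# Assumption: endpoint path and payload shape. Verify against the Firecrawl API reference.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a page as markdown."""
    payload = {"url": url, "formats": ["markdown"]}
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The key comes from the FIRECRAWL_API_KEY input; the fallback is a placeholder.
request = build_scrape_request(
    "https://docs.example.com/intro",
    os.environ.get("FIRECRAWL_API_KEY", "fc-placeholder"),
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The response format is not shown here because it should be taken from the API reference rather than guessed.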

For files, follow the Firecrawl download-style convention:

```text
.firecrawl/
  <hostname>/
    <path>/
      index.md
```
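
The layout above can be derived from a page URL. This is a minimal sketch; the function name `local_path_for` and the decision to drop `.` and `..` segments are choices made for this example, not documented Firecrawl behavior.

```python
from pathlib import Path
from urllib.parse import urlsplit


def local_path_for(url: str, root: str = ".firecrawl") -> Path:
    """Map a page URL to <root>/<hostname>/<path>/index.md."""
    parts = urlsplit(url)
    # Drop empty and relative segments so odd URLs cannot escape the root directory.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    return Path(root, parts.hostname or "unknown-host", *segments) / "index.md"


print(local_path_for("https://docs.example.com/guides/quickstart/").as_posix())
```

Query strings and fragments are ignored in this sketch, so two URLs that differ only in their query would map to the same file.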

## Parallel Work

If appropriate, use sub-agents or equivalent parallel task runners. Split the work along one of these lines:

- by docs section: one section per researcher
- by source type: official docs, tutorials, community discussions, and references
- by pipeline stage: source scraping, chunk generation, and manifest generation
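
One way to split work by docs section is to group URLs by their first path segment and hand each group to a worker. The sketch below uses a thread pool as a stand-in for sub-agents. `collect_section` is a hypothetical placeholder that only counts URLs, where a real worker would scrape them.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def group_by_section(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by their first path segment, treated here as the docs section."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        groups[segments[0] if segments else "(root)"].append(url)
    return dict(groups)


def collect_section(section: str, urls: list[str]) -> tuple[str, int]:
    """Hypothetical worker: a real one would scrape every URL in its section."""
    return section, len(urls)


def run_in_parallel(urls: list[str], max_workers: int = 4) -> dict[str, int]:
    """Process each section in its own task and return pages handled per section."""
    groups = group_by_section(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda item: collect_section(*item), groups.items())
        return dict(results)


print(run_in_parallel([
    "https://docs.example.com/guides/a",
    "https://docs.example.com/guides/b",
    "https://docs.example.com/api/x",
]))
```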

## Output Modes

- Reference: markdown files, `index.md`, and `sources.json`.
- RAG: markdown files plus chunk files and `manifest.json`.
- Training: scraped source files plus `training-data.jsonl` and `training-metadata.json`.
- Docs mirror: complete markdown mirror with a table of contents.
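
For the RAG mode, a minimal sketch of chunk and manifest generation might look like the following. It splits at markdown headings but never inside a fenced code block, so code examples stay whole. The manifest field names (`source_url`, `chunk_count`, `id`, `sha256`, `chars`) are illustrative assumptions, not a schema defined by Firecrawl.

```python
import hashlib
import json

FENCE = "`" * 3  # the markdown code-fence marker


def chunk_markdown(markdown: str) -> list[str]:
    """Split markdown at headings, never inside a fenced code block."""
    chunks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # A heading outside a fence closes the current chunk and starts a new one.
        if line.startswith("#") and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def build_manifest(source_url: str, chunks: list[str]) -> dict:
    """Describe each chunk; the field names are illustrative only."""
    return {
        "source_url": source_url,
        "chunk_count": len(chunks),
        "chunks": [
            {
                "id": f"chunk-{i:04d}",
                "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                "chars": len(text),
            }
            for i, text in enumerate(chunks)
        ],
    }


sample = "\n".join([
    "# Intro", "Some text.",
    "## Install", FENCE + "bash", "# not a heading", "pip install example", FENCE,
    "## Usage", "Call it.",
])
manifest = build_manifest("https://docs.example.com/intro", chunk_markdown(sample))
print(json.dumps(manifest, indent=2))
```

Heading-based splitting keeps each chunk tied to a topic. A production pipeline would likely also cap chunk size, which this sketch leaves out.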

## Final Deliverable

```markdown
# Knowledge Base: [Source]

## Summary
[What was collected and why]

## Output Structure
[Files/directories created]

## Coverage
[Sections, source types, counts]

## Usage Notes
[How to use in RAG, docs, training, or agent context]

## Sources
[URLs collected]

## Rerun Inputs
workflow: firecrawl-knowledge-base
source: [url/topic]
goal: [reference/rag/train/docs]
depth: [quick/thorough/exhaustive]
output_dir: [.firecrawl/]
```

## Quality Bar

- Preserve code examples and formatting.
- Remove boilerplate navigation where possible.
- Include source URLs in frontmatter or metadata.
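
To illustrate the source-URL requirement, the sketch below prepends a small frontmatter block to a scraped page. The helper name `with_frontmatter` and the `source_url` and `fetched_at` keys are assumptions chosen for this example.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def with_frontmatter(markdown: str, source_url: str,
                     fetched_at: Optional[datetime] = None) -> str:
    """Prepend YAML frontmatter that records where and when the page was collected."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    # json.dumps emits double-quoted strings, which are also valid YAML scalars.
    header = [
        "---",
        f"source_url: {json.dumps(source_url)}",
        f"fetched_at: {json.dumps(fetched_at.isoformat())}",
        "---",
        "",
    ]
    return "\n".join(header) + "\n" + markdown


print(with_frontmatter("# Intro\n\nHello.\n", "https://docs.example.com/intro"))
```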

Source

Creator's repository · firecrawl/firecrawl-workflows

License: ISC

Security

Verified: safe to install
Passed all 3 independent security checks

What this skill can do

- Reads your files
- Connects to the internet
- Runs code on your machine

Checked by 3 independent security firms

- Does it try to trick the AI? No · SAFE · Gen Agent Trust Hub
- Does it sneak in hidden code? No · No alerts · Socket
- Does it have known bugs? No · Med risk · Snyk