---
name: agent-reach-internet-access
description: Give AI agents eyes to see the internet — scrape Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu with zero API fees
triggers:
  - search twitter for mentions
  - get youtube video transcript
  - scrape this reddit thread
  - read this xiaohongshu post
  - search github repositories
  - get bilibili video subtitles
  - read this web page content
  - search the internet for
---

# Agent Reach — Internet Access for AI Agents

> Skill by [ara.so](https://ara.so) — AI Agent Skills collection.

Agent Reach is a scaffolding tool that gives AI agents the ability to read and search across Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu, and more — all without paid APIs. It orchestrates best-in-class upstream tools (yt-dlp, twitter-cli, rdt-cli, gh CLI, etc.) and provides a unified interface for AI agents.

## Installation

Agent Reach is installed via pip and automatically sets up dependencies:

```bash
# Basic installation
pip install agent-reach

# The tool will auto-detect and install:
# - Node.js (for some MCP servers)
# - gh CLI (for GitHub)
# - mcporter (for MCP integrations)
# - twitter-cli (for Twitter/X)
# - rdt-cli (for Reddit)
# - yt-dlp (for YouTube/Bilibili)
```

After installation, run diagnostics to check what's working:

```bash
agent-reach doctor
```

This shows status for each channel: ✅ (works out of box), 🔧 (needs config), or ❌ (not available).
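
To act on the doctor report from a script, one option is to bucket its lines by status marker. The sketch below assumes each channel is reported on its own line containing one of the three markers; the exact output format of `agent-reach doctor` is not documented here, so `summarize_doctor_output` is a hypothetical helper to adapt once you have seen real output.

```python
import subprocess

# Status markers described above; the one-channel-per-line layout is an assumption.
MARKERS = {"✅": "ready", "🔧": "needs_config", "❌": "unavailable"}

def summarize_doctor_output(text):
    """Group report lines by status marker (one marker per line assumed)."""
    summary = {label: [] for label in MARKERS.values()}
    for line in text.splitlines():
        for marker, label in MARKERS.items():
            if marker in line:
                summary[label].append(line.replace(marker, "").strip())
                break
    return summary

def run_doctor():
    """Run `agent-reach doctor` and summarize whatever it prints."""
    result = subprocess.run(
        ["agent-reach", "doctor"], capture_output=True, text=True
    )
    return summarize_doctor_output(result.stdout)
```

Lines without a marker are ignored, so headers or blank lines in the report should not affect the summary.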

## Core Capabilities

### 1. Web Page Reading (Jina Reader)

Read any web page as clean markdown:

```bash
# Read a web page
curl https://r.jina.ai/https://example.com

# Get JSON format
curl https://r.jina.ai/https://example.com \
  -H "Accept: application/json"

# With images
curl https://r.jina.ai/https://example.com \
  -H "X-With-Images-Summary: true"
```

**Python usage:**

```python
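import requests

url = "https://example.com"
# Prefix the target URL with the Reader endpoint; fail fast on HTTP errors
response = requests.get(f"https://r.jina.ai/{url}", timeout=30)
response.raise_for_status()
markdown_content = response.text

# With options
headers = {
    "X-With-Links-Summary": "true",
    "X-With-Images-Summary": "true"
}
response = requests.get(f"https://r.jina.ai/{url}", headers=headers, timeout=30)
response.raise_for_status()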
```

### 2. YouTube & Video (yt-dlp)

Extract subtitles, metadata, and search videos:

```bash
# Get video metadata as JSON (subtitle tracks are listed as URLs; no files are written)
yt-dlp --dump-json --write-auto-subs --skip-download \
  "https://www.youtube.com/watch?v=VIDEO_ID"

# Search YouTube (top 5 results, one JSON object per line)
yt-dlp "ytsearch5:AI agents tutorial" --dump-json

# Download subtitles in a specific language, without the video
yt-dlp --write-subs --sub-lang en --skip-download URL

# Bilibili videos (works the same way)
yt-dlp --dump-json "https://www.bilibili.com/video/BV..."
```

**Python usage:**

```python
import subprocess
import json

def get_video_info(url):
    # check=True raises CalledProcessError if yt-dlp exits non-zero
    result = subprocess.run(
        ["yt-dlp", "--dump-json", "--write-auto-subs",
         "--skip-download", url],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

# Search videos
def search_youtube(query, max_results=5):
    result = subprocess.run(
        ["yt-dlp", f"ytsearch{max_results}:{query}", "--dump-json"],
        capture_output=True, text=True
    )
    # yt-dlp prints one JSON object per line; skip blank lines
    return [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
```

### 3. Twitter/X (twitter-cli)

Requires cookie authentication. Export your cookies with the [Cookie-Editor](https://chromewebstore.google.com/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm) Chrome extension.

```bash
# Configure (paste exported cookies when prompted)
twitter configure

# Read a tweet
twitter tweet https://twitter.com/user/status/123456789

# Search tweets
twitter search "AI agents" --limit 20

# Get user timeline
twitter timeline @username --limit 50

# Get tweet thread
twitter thread https://twitter.com/user/status/123456789
```

**Configuration file location:** `~/.twitter-cli/config.json`

### 4. Reddit (rdt-cli)

Requires cookie authentication:

```bash
# Login with cookies
rdt login

# Search posts
rdt search "machine learning" --limit 20

# Read post with comments
rdt post https://reddit.com/r/programming/comments/...

# Get subreddit posts
rdt subreddit r/python --limit 30
```

### 5. GitHub (gh CLI)

```bash
# Login (opens browser OAuth flow)
gh auth login

# View repository
gh repo view owner/repo

# Search repositories
gh search repos "LLM framework" --limit 20

# Search issues
gh search issues "bug" --repo owner/repo

# View issue
gh issue view 123 --repo owner/repo

# Create issue
gh issue create --repo owner/repo \
  --title "Bug report" --body "Description"
```

**Python usage:**

```python
import subprocess
import json

def search_repos(query, limit=20):
    # --json makes gh emit machine-readable output for the listed fields
    result = subprocess.run(
        ["gh", "search", "repos", query,
         "--limit", str(limit), "--json", "name,description,url"],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

def get_repo_info(owner_repo):
    # owner_repo uses the "owner/repo" form, as in the CLI examples above
    result = subprocess.run(
        ["gh", "repo", "view", owner_repo, "--json",
         "description,stargazerCount,forkCount,url"],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

### 6. XiaoHongShu (xhs-cli via mcporter)

Requires cookie authentication:

```bash
# Configure (sets up MCP server)
mcporter add xiaohongshu

# The MCP server provides these tools:
# - search_notes: Search XHS posts
# - get_note_detail: Get post content
# - post_note: Create new post
# - comment_note: Add comment
# - like_note: Like a post
```

Configuration stored in: `~/.mcporter/xiaohongshu/config.json`

### 7. Bilibili Enhanced (bili-cli)

```bash
# Get hot videos
bili hot --limit 20

# Search videos
bili search "Python tutorial" --limit 30

# Get video info
bili video BV1xx411c7mD

# Get user dynamics
bili user-dynamic 123456
```

### 8. Internet Search (Exa via mcporter)

Semantic search across the web:

```bash
# Add Exa MCP server (no API key needed for basic use)
mcporter add exa

# The MCP server provides:
# - search: AI-powered semantic search
# - find_similar: Find similar pages
# - get_contents: Extract page contents
```

**For advanced features, set API key:**

```bash
export EXA_API_KEY=your_key_here
```

### 9. RSS Feeds

```python
import feedparser

# Parse RSS feed
feed = feedparser.parse("https://example.com/feed.xml")

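for entry in feed.entries:
    # Feeds differ in which fields they provide, so read them with .get()
    print(f"Title: {entry.get('title', '')}")
    print(f"Link: {entry.get('link', '')}")
    print(f"Published: {entry.get('published', 'unknown')}")
    print(f"Summary: {entry.get('summary', '')}")
    print("---")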
```

### 10. WeChat Official Accounts

Search and read WeChat articles via Exa + Camoufox:

```python
# Use Exa search to find WeChat articles
# Articles are auto-extracted when URLs contain mp.weixin.qq.com
```

### 11. Weibo (微博)

```bash
# Search content
agent-reach weibo search "AI" --type content

# Get hot search
agent-reach weibo hot

# Get user posts
agent-reach weibo user USER_ID

# Get comments
agent-reach weibo comments POST_ID
```

### 12. V2EX

```bash
# Get hot topics
agent-reach v2ex hot

# Get node topics
agent-reach v2ex node python

# Get topic details
agent-reach v2ex topic 123456
```

## Configuration Patterns

### Cookie-Based Services

For Twitter, Reddit, XiaoHongShu — use Cookie-Editor:

1. Log in to the service in your browser
2. Install [Cookie-Editor](https://chromewebstore.google.com/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm)
3. Click the extension → Export → Copy
4. Paste into the CLI's config command (for example `twitter configure` or `rdt login`)

**Never commit cookies to version control.** They're stored in:
- Twitter: `~/.twitter-cli/config.json`
- Reddit: `~/.rdt-cli/cookies.json`
- XHS: `~/.mcporter/xiaohongshu/config.json`
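
Because these files hold live session cookies, it can help to make sure only your user can read them. The following is a minimal sketch for POSIX systems, assuming the three paths listed above; `restrict_to_owner` is a hypothetical helper, not part of Agent Reach or the upstream tools.

```python
import stat
from pathlib import Path

# Paths listed above; adjust if your tools store credentials elsewhere.
COOKIE_FILES = [
    "~/.twitter-cli/config.json",
    "~/.rdt-cli/cookies.json",
    "~/.mcporter/xiaohongshu/config.json",
]

def restrict_to_owner(path):
    """Set mode 0600 on an existing file; return True if the mode changed."""
    p = Path(path).expanduser()
    if not p.is_file():
        return False  # nothing to do for tools that are not configured
    current = stat.S_IMODE(p.stat().st_mode)
    if current == 0o600:
        return False
    p.chmod(0o600)
    return True
```

Calling `restrict_to_owner(path)` for each entry in `COOKIE_FILES` leaves missing files alone and reports which ones were tightened.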

### Proxy Configuration (Server Deployments)

For Bilibili access from servers:

```bash
# Set proxy environment variables
export HTTP_PROXY=http://proxy-server:port
export HTTPS_PROXY=http://proxy-server:port

# Or configure per-tool
yt-dlp --proxy http://proxy-server:port URL
```
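
When only some calls should go through the proxy, the same variables can be set for a single subprocess instead of the whole shell. This is a sketch under that assumption; `proxy_env` and `dump_json_via_proxy` are hypothetical helper names, and the proxy URL is a placeholder.

```python
import os
import subprocess

def proxy_env(proxy_url, base_env=None):
    """Return a copy of the environment with both proxy variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    return env

def dump_json_via_proxy(url, proxy_url):
    """Run yt-dlp for one URL with the proxy applied only to this call."""
    return subprocess.run(
        ["yt-dlp", "--dump-json", url],
        capture_output=True, text=True, env=proxy_env(proxy_url),
    )
```

Copying the environment rather than mutating `os.environ` keeps other tools in the same process on a direct connection.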

### GitHub Authentication

```bash
# OAuth login (recommended)
gh auth login

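# Or use a token: gh reads GH_TOKEN or GITHUB_TOKEN from the environment,
# so exporting it is enough and no login step is needed
export GITHUB_TOKEN=ghp_your_token_here

# Alternatively, store a token in gh's credential store. Run this with
# GH_TOKEN/GITHUB_TOKEN unset (gh refuses otherwise); token.txt is a placeholder
gh auth login --with-token < token.txt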
```

## Common Workflows

### Scrape Twitter Thread for Research

```python
import subprocess

def get_twitter_thread(url):
    # Returns the CLI's raw text output; raises if the command fails
    result = subprocess.run(
        ["twitter", "thread", url],
        capture_output=True, text=True, check=True
    )
    return result.stdout

thread_content = get_twitter_thread(
    "https://twitter.com/user/status/123456789"
)
```

### Extract YouTube Video Summary

```python
import subprocess
import json

def get_video_transcript(url):
    # Get metadata, including the available subtitle tracks
    result = subprocess.run(
        ["yt-dlp", "--dump-json", "--write-auto-subs",
         "--skip-download", url],
        capture_output=True, text=True, check=True
    )
    data = json.loads(result.stdout)

    # Prefer uploader-provided subtitles, fall back to automatic captions.
    # Both hold track metadata (format and URL), not the transcript text.
    return {
        'title': data.get('title'),
        'description': data.get('description'),
        'duration': data.get('duration'),
        'subtitles': data.get('subtitles') or data.get('automatic_captions', {})
    }
```
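
To get from that metadata to actual transcript text, you still need to pick one track and download it. The sketch below assumes the layout yt-dlp typically emits, where `subtitles` and `automatic_captions` are dicts keyed by language code and each value is a list of tracks with `ext` and `url` fields; verify this against real output before relying on it. `pick_caption_track` is a hypothetical helper name.

```python
def pick_caption_track(info, lang="en", ext="vtt"):
    """Return the URL of a caption track, preferring manual subtitles."""
    for key in ("subtitles", "automatic_captions"):
        tracks = (info.get(key) or {}).get(lang, [])
        for track in tracks:
            if track.get("ext") == ext and track.get("url"):
                return track["url"]
    return None  # no track in that language and format
```

The returned URL can then be fetched with any HTTP client; a `None` result usually means trying another language code or format.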

### Search GitHub for Solutions

```python
import subprocess
import json

def search_github_issues(query, repo=None):
    cmd = ["gh", "search", "issues", query,
           "--limit", "20", "--json",
           "title,url,state,body,commentsCount"]
    if repo:
        cmd.extend(["--repo", repo])
    
    result = subprocess.run(cmd, capture_output=True, text=True)
    return json.loads(result.stdout)

# Search across all repos
issues = search_github_issues("memory leak in agents")

# Search a specific repo (replace owner/repo with a real repository)
issues = search_github_issues("bug", repo="owner/repo")
```

### Monitor Reddit for Mentions

```bash
# Search and save results
rdt search "your_product_name" --limit 50 > mentions.txt

# Get specific subreddit
rdt subreddit r/artificial --limit 100
```
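
The same monitoring step can be driven from Python when it needs to run on a schedule. This is a rough sketch: `run_tool` and `save_mentions` are hypothetical helpers, and because the output format of `rdt search` is not specified here, the result is saved as raw text rather than parsed.

```python
import shutil
import subprocess
from pathlib import Path

def run_tool(cmd, timeout=60):
    """Run a CLI command and return stdout, raising a clear error on failure."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout

def save_mentions(term, out_path="mentions.txt", limit=50):
    """Save raw `rdt search` output and return the number of lines written."""
    text = run_tool(["rdt", "search", term, "--limit", str(limit)])
    Path(out_path).write_text(text, encoding="utf-8")
    return len(text.splitlines())
```

Checking for the binary first separates "tool not installed" from "tool failed", which usually points to expired cookies.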

### Read Web Page Content for Analysis

```python
import requests

def get_clean_content(url):
    response = requests.get(
        f"https://r.jina.ai/{url}",
        headers={
            "X-With-Links-Summary": "true",
            "X-No-Cache": "true"
        }
    )
    return response.text

content = get_clean_content("https://news.ycombinator.com")
```

## Troubleshooting

### Doctor Command Shows ❌

Run diagnostics:

```bash
agent-reach doctor
```

Each ❌ includes a fix suggestion. Common issues:

**Twitter/Reddit not working:**
- Both need cookie authentication
- Use Cookie-Editor to export cookies
- Run `twitter configure` or `rdt login`

**Bilibili 403 on server:**
- Need proxy for non-CN IPs
- Set `HTTP_PROXY` and `HTTPS_PROXY` env vars

**GitHub rate limited:**
- Authenticate: `gh auth login`
- Authenticated requests are limited to 5,000/hour, versus 60/hour unauthenticated

**yt-dlp fails:**
- Update to latest: `pip install -U yt-dlp`
- yt-dlp is actively maintained and updated frequently, so an outdated install is a common cause of failures
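
For the GitHub rate-limit case above, the remaining quota can be checked before starting a large batch of searches. The sketch below uses `gh api rate_limit`, which calls GitHub's rate-limit endpoint and generally requires gh to be authenticated; the helper names are hypothetical.

```python
import json
import subprocess

def core_rate_limit(payload):
    """Extract (remaining, limit) for the core REST API from a rate_limit payload."""
    core = payload["resources"]["core"]
    return core["remaining"], core["limit"]

def check_github_rate_limit():
    """Ask gh for the rate-limit status of the active credentials."""
    result = subprocess.run(
        ["gh", "api", "rate_limit"], capture_output=True, text=True, check=True
    )
    return core_rate_limit(json.loads(result.stdout))
```

The payload also reports a separate `search` bucket with its own, lower limit, which is worth checking when the failing command is `gh search`.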

### MCP Server Connection Issues

```bash
# Check mcporter status
mcporter list

# Restart a server
mcporter restart xiaohongshu

# View server logs
mcporter logs exa
```

### Proxy Not Working

```bash
# Test proxy connection
curl -x http://proxy:port https://api.bilibili.com

# Set for specific command
export HTTPS_PROXY=http://proxy:port
yt-dlp URL
```

### Cookie Expired

Re-export fresh cookies:
1. Log in to the service in your browser
2. Export with Cookie-Editor
3. Reconfigure CLI tool

## Environment Variables

```bash
# Proxy (for server deployments)
export HTTP_PROXY=http://proxy:port
export HTTPS_PROXY=http://proxy:port

# GitHub
export GITHUB_TOKEN=ghp_xxxxx

# Exa (optional, for advanced features)
export EXA_API_KEY=your_key_here

# Custom config paths (optional)
export AGENT_REACH_CONFIG_DIR=~/.config/agent-reach
```
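
A script that wraps several of these tools can read the variables once at startup. This is only a sketch: `load_settings` is a hypothetical helper, and treating `~/.config/agent-reach` as the fallback when `AGENT_REACH_CONFIG_DIR` is unset is an assumption based on the example value above, not a documented default.

```python
import os
from pathlib import Path

def load_settings(env=None):
    """Collect the variables listed above into one dict."""
    env = os.environ if env is None else env
    # Fallback path is an assumption taken from the example above
    config_dir = env.get("AGENT_REACH_CONFIG_DIR", "~/.config/agent-reach")
    return {
        "proxy": env.get("HTTPS_PROXY") or env.get("HTTP_PROXY"),
        "github_token_set": bool(env.get("GITHUB_TOKEN")),
        "exa_key_set": bool(env.get("EXA_API_KEY")),
        "config_dir": Path(config_dir).expanduser(),
    }
```

Only whether the secrets are set is recorded, not their values, so the resulting dict is safe to log.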

## Safety & Privacy

- **All cookies stored locally** in `~/.twitter-cli/`, `~/.rdt-cli/`, etc.
- **No data uploaded** to Agent Reach servers (there are none)
- **Code is open source** — audit anytime
- Use `--safe` mode during install to review system package installs

## Updating

```bash
# Update agent-reach
pip install -U agent-reach

# Update individual tools
pip install -U yt-dlp
gh extension upgrade --all
npm update -g mcporter
```

Check for breaking changes: https://github.com/Panniantong/agent-reach/blob/main/CHANGELOG.md

## Platform Support Matrix

| Platform | Out of Box | After Config | Notes |
|----------|-----------|--------------|-------|
| Web | ✅ | — | Jina Reader, no API key needed |
| YouTube | ✅ | — | yt-dlp, 1800+ sites |
| RSS | ✅ | — | feedparser |
| GitHub | ✅ | 🔧 Auth for private | gh CLI |
| Twitter | 🔧 Cookie | 🔧 Cookie | twitter-cli |
| Reddit | 🔧 Cookie | 🔧 Cookie | rdt-cli |
| Bilibili | ✅ Local | 🔧 Proxy (server) | yt-dlp |
| XiaoHongShu | 🔧 Cookie | 🔧 Cookie | xhs-cli via MCP |
| Search | 🔧 MCP | 🔧 API key (optional) | Exa |
| WeChat | ✅ | — | Via Exa search |
| Weibo | ✅ | — | Direct API |
| V2EX | ✅ | — | Direct API |

Legend: ✅ Works immediately | 🔧 Needs configuration

Source

Creator's repository · aradotso/ai-agent-skills
