---
name: agent-reach-internet-access
description: Give AI agents eyes to see the internet — scrape Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu with zero API fees
triggers:
  - search twitter for mentions
  - get youtube video transcript
  - scrape this reddit thread
  - read this xiaohongshu post
  - search github repositories
  - get bilibili video subtitles
  - read this web page content
  - search the internet for
---

# Agent Reach — Internet Access for AI Agents

> Skill by [ara.so](https://ara.so) — AI Agent Skills collection.

Agent Reach is a scaffolding tool that gives AI agents the ability to read and search across Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu, and more — all without paid APIs. It orchestrates best-in-class upstream tools (yt-dlp, twitter-cli, rdt-cli, gh CLI, etc.) and provides a unified interface for AI agents.

## Installation

Agent Reach is installed via pip and automatically sets up dependencies:

```bash
# Basic installation
pip install agent-reach

# The tool will auto-detect and install:
# - Node.js (for some MCP servers)
# - gh CLI (for GitHub)
# - mcporter (for MCP integrations)
# - twitter-cli (for Twitter/X)
# - rdt-cli (for Reddit)
# - yt-dlp (for YouTube/Bilibili)
```

After installation, run diagnostics to check what's working:

```bash
agent-reach doctor
```

This shows status for each channel: ✅ (works out of the box), 🔧 (needs config), or ❌ (not available).

## Core Capabilities

### 1. Web Page Reading (Jina Reader)

Read any web page as clean markdown:

```bash
# Read a web page
curl https://r.jina.ai/https://example.com

# Get JSON format
curl https://r.jina.ai/https://example.com \
  -H "Accept: application/json"

# With images
curl https://r.jina.ai/https://example.com \
  -H "X-With-Images-Summary: true"
```

**Python usage:**

```python
import requests

url = "https://example.com"
response = requests.get(f"https://r.jina.ai/{url}")
markdown_content = response.text

# With options
headers = {
    "X-With-Links-Summary": "true",
    "X-With-Images-Summary": "true"
}
response = requests.get(f"https://r.jina.ai/{url}", headers=headers)
```
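The same requests can also be made with only the Python standard library. The sketch below keeps the header options from the curl examples in one place. `build_reader_request` and `read_page` are hypothetical helper names. The sketch assumes Jina Reader accepts the `Accept` and `X-With-*` headers exactly as shown above.

```python
import urllib.request

READER_PREFIX = "https://r.jina.ai/"

def build_reader_request(url: str, *, as_json: bool = False,
                         links: bool = False, images: bool = False) -> urllib.request.Request:
    """Build a Jina Reader request using the headers from the curl examples."""
    headers = {}
    if as_json:
        headers["Accept"] = "application/json"
    if links:
        headers["X-With-Links-Summary"] = "true"
    if images:
        headers["X-With-Images-Summary"] = "true"
    # Jina Reader takes the target URL appended to its own prefix
    return urllib.request.Request(READER_PREFIX + url, headers=headers)

def read_page(url: str, timeout: float = 30.0, **options) -> str:
    """Fetch a page through Jina Reader; raises urllib.error.HTTPError on 4xx/5xx."""
    request = build_reader_request(url, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `read_page("https://example.com", links=True)` returns the markdown body with a links summary.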

### 2. YouTube & Video (yt-dlp)

Extract subtitles, metadata, and search videos:

```bash
# Get metadata as JSON, including subtitle/caption track URLs (no files are written)
yt-dlp --dump-json --write-auto-subs --skip-download \
  "https://www.youtube.com/watch?v=VIDEO_ID"

# Search YouTube
yt-dlp "ytsearch5:AI agents tutorial" --dump-json

# Download subtitles in a specific language (no video)
yt-dlp --write-subs --sub-langs en --skip-download URL

# Bilibili videos (works the same way)
yt-dlp --dump-json "https://www.bilibili.com/video/BV..."
```

**Python usage:**

```python
import subprocess
import json

def get_video_info(url):
    # --dump-json prints metadata (including caption track URLs) without downloading
    result = subprocess.run(
        ["yt-dlp", "--dump-json", "--write-auto-subs",
         "--skip-download", url],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

# Search videos
def search_youtube(query, max_results=5):
    result = subprocess.run(
        ["yt-dlp", f"ytsearch{max_results}:{query}", "--dump-json"],
        capture_output=True, text=True, check=True
    )
    # One JSON object per line; skipping blank lines makes empty results return []
    return [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
```
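The JSON from `--dump-json` lists caption tracks, not transcript text. `subtitles` (uploaded) and `automatic_captions` (auto-generated) each map a language code to a list of formats, and each format has an `ext` and a `url`.

Below is a minimal sketch for choosing one track, preferring uploaded subtitles. `pick_caption_track` is a hypothetical helper. Language codes vary by video (for example `en` vs `en-US`), so treat the defaults as assumptions.

```python
from typing import Optional, Tuple

def pick_caption_track(info: dict, lang: str = "en",
                       ext: str = "vtt") -> Optional[Tuple[str, str]]:
    """Return (source, url) for the first matching caption track, or None.

    Uploaded subtitles are checked before auto-generated captions.
    """
    for source in ("subtitles", "automatic_captions"):
        tracks = info.get(source) or {}
        for fmt in tracks.get(lang, []):
            if fmt.get("ext") == ext and fmt.get("url"):
                return source, fmt["url"]
    return None
```

For example, `pick_caption_track(get_video_info(url))` yields a track URL you can download.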

### 3. Twitter/X (twitter-cli)

Requires Cookie authentication. Export cookies using the [Cookie-Editor](https://chromewebstore.google.com/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm) Chrome extension.

```bash
# Configure (paste exported cookies when prompted)
twitter configure

# Read a tweet
twitter tweet https://twitter.com/user/status/123456789

# Search tweets
twitter search "AI agents" --limit 20

# Get user timeline
twitter timeline @username --limit 50

# Get tweet thread
twitter thread https://twitter.com/user/status/123456789
```

**Configuration file location:** `~/.twitter-cli/config.json`

### 4. Reddit (rdt-cli)

Requires Cookie authentication:

```bash
# Log in with cookies
rdt login

# Search posts
rdt search "machine learning" --limit 20

# Read post with comments
rdt post https://reddit.com/r/programming/comments/...

# Get subreddit posts
rdt subreddit r/python --limit 30
```

### 5. GitHub (gh CLI)

```bash
# Log in (opens a browser OAuth flow)
gh auth login

# View repository
gh repo view owner/repo

# Search repositories
gh search repos "LLM framework" --limit 20

# Search issues
gh search issues "bug" --repo owner/repo

# View issue
gh issue view 123 --repo owner/repo

# Create issue
gh issue create --repo owner/repo \
  --title "Bug report" --body "Description"
```

**Python usage:**

```python
import subprocess
import json

def search_repos(query, limit=20):
    result = subprocess.run(
        ["gh", "search", "repos", query, 
         "--limit", str(limit), "--json", "name,description,url"],
        capture_output=True, text=True
    )
    return json.loads(result.stdout)

def get_repo_info(owner_repo):
    result = subprocess.run(
        ["gh", "repo", "view", owner_repo, "--json", 
         "description,stargazerCount,forkCount,url"],
        capture_output=True, text=True
    )
    return json.loads(result.stdout)
```

### 6. XiaoHongShu (xhs-cli via mcporter)

Requires Cookie authentication:

```bash
# Configure (sets up MCP server)
mcporter add xiaohongshu

# The MCP server provides these tools:
# - search_notes: Search XHS posts
# - get_note_detail: Get post content
# - post_note: Create new post
# - comment_note: Add comment
# - like_note: Like a post
```

Configuration stored in: `~/.mcporter/xiaohongshu/config.json`

### 7. Bilibili Enhanced (bili-cli)

```bash
# Get hot videos
bili hot --limit 20

# Search videos
bili search "Python tutorial" --limit 30

# Get video info
bili video BV1xx411c7mD

# Get user dynamics
bili user-dynamic 123456
```

### 8. Internet Search (Exa via mcporter)

Semantic search across the web:

```bash
# Add Exa MCP server (no API key needed for basic use)
mcporter add exa

# The MCP server provides:
# - search: AI-powered semantic search
# - find_similar: Find similar pages
# - get_contents: Extract page contents
```

**For advanced features, set API key:**

```bash
export EXA_API_KEY=your_key_here
```

### 9. RSS Feeds

```python
import feedparser

# Parse RSS feed
feed = feedparser.parse("https://example.com/feed.xml")

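for entry in feed.entries:
    # Not every feed provides every field, so use .get() with a default
    print(f"Title: {entry.get('title', '')}")
    print(f"Link: {entry.get('link', '')}")
    print(f"Published: {entry.get('published', 'n/a')}")
    print(f"Summary: {entry.get('summary', '')}")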
```
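For monitoring, it is often useful to keep only recent entries. feedparser exposes parsed dates as UTC `time.struct_time` values in `published_parsed` (or `updated_parsed`).

Below is a minimal sketch of a cutoff filter. `recent_entries` is a hypothetical helper, and entries without a parseable date are skipped.

```python
import calendar
import time
from typing import Iterable, List, Optional

def recent_entries(entries: Iterable, max_age_hours: float = 24.0,
                   now: Optional[float] = None) -> List:
    """Keep entries whose published (or updated) time is within max_age_hours."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    recent = []
    for entry in entries:
        parsed = entry.get("published_parsed") or entry.get("updated_parsed")
        if parsed is None:
            continue  # no parseable date: skip rather than guess
        # feedparser's *_parsed values are UTC, so convert with timegm, not mktime
        if calendar.timegm(parsed) >= cutoff:
            recent.append(entry)
    return recent
```

`recent_entries(feed.entries, max_age_hours=6)` keeps only entries from the last six hours.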

### 10. WeChat Official Accounts

Search and read WeChat articles via Exa + Camoufox. Use Exa search to find WeChat articles; articles are auto-extracted when their URLs contain `mp.weixin.qq.com`.

### 11. Weibo (微博)

```bash
# Search content
agent-reach weibo search "AI" --type content

# Get hot search
agent-reach weibo hot

# Get user posts
agent-reach weibo user USER_ID

# Get comments
agent-reach weibo comments POST_ID
```

### 12. V2EX

```bash
# Get hot topics
agent-reach v2ex hot

# Get node topics
agent-reach v2ex node python

# Get topic details
agent-reach v2ex topic 123456
```

## Configuration Patterns

### Cookie-Based Services

For Twitter, Reddit, XiaoHongShu — use Cookie-Editor:

1. Log in to the service in your browser
2. Install [Cookie-Editor](https://chromewebstore.google.com/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm)
3. Click the extension → Export → Copy
4. Paste into the tool's config command (`twitter configure`, `rdt login`, or `mcporter add xiaohongshu`)

**Never commit cookies to version control.** They're stored in:
- Twitter: `~/.twitter-cli/config.json`
- Reddit: `~/.rdt-cli/cookies.json`
- XHS: `~/.mcporter/xiaohongshu/config.json`
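Beyond keeping them out of git, it is worth making sure these files are readable only by your user. The sketch below is a suggestion, not an Agent Reach feature. It flags any listed file that group or other users can access and tightens it to `0600`.

It is POSIX-only. The paths are copied from the list above and may differ across tool versions.

```python
import os
import stat
from pathlib import Path

# Paths as listed above; adjust if your tool versions store cookies elsewhere
COOKIE_FILES = [
    Path("~/.twitter-cli/config.json"),
    Path("~/.rdt-cli/cookies.json"),
    Path("~/.mcporter/xiaohongshu/config.json"),
]

def loose_cookie_files(paths=COOKIE_FILES):
    """Return existing files that grant any permission to group or others."""
    flagged = []
    for path in paths:
        path = path.expanduser()
        if path.is_file() and path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
            flagged.append(path)
    return flagged

def tighten_cookie_files(paths=COOKIE_FILES):
    """chmod every flagged file to owner read/write only; return what changed."""
    flagged = loose_cookie_files(paths)
    for path in flagged:
        os.chmod(path, 0o600)
    return flagged
```

Calling `tighten_cookie_files()` after each `twitter configure` or `rdt login` keeps freshly written cookie files private.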

### Proxy Configuration (Server Deployments)

For Bilibili access from servers:

```bash
# Set proxy environment variables
export HTTP_PROXY=http://proxy-server:port
export HTTPS_PROXY=http://proxy-server:port

# Or configure per-tool
yt-dlp --proxy http://proxy-server:port URL
```
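Exporting the variables affects every later command in the shell. To proxy only one tool call from Python, you can pass the proxy through the child process environment instead.

Below is a minimal sketch. `run_with_proxy` is a hypothetical helper, and it only helps with tools that read the standard proxy variables. Both upper- and lower-case names are set because some programs, curl among them, only read the lower-case `http_proxy`.

```python
import os
import subprocess

def run_with_proxy(args, proxy, **kwargs):
    """Run one command with proxy variables set for that child process only."""
    env = dict(os.environ)
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy
    # The current process's os.environ is left untouched
    return subprocess.run(args, env=env, capture_output=True, text=True, **kwargs)
```

For example: `run_with_proxy(["yt-dlp", "--dump-json", url], "http://proxy-server:8080", check=True)`. For yt-dlp specifically, the `--proxy` flag shown above does the same job explicitly.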

### GitHub Authentication

```bash
# OAuth login (recommended)
gh auth login

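# Or use a personal access token: gh reads GITHUB_TOKEN (or GH_TOKEN)
# from the environment, so no login step is needed
export GITHUB_TOKEN=ghp_your_token_here
gh auth status

# To store a token in gh's config instead, unset GITHUB_TOKEN and pass it on stdin
gh auth login --with-token < mytoken.txt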
```
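To confirm that authentication took effect, you can query the REST `rate_limit` endpoint through `gh api`. The `core` bucket should report the authenticated limit mentioned under Troubleshooting; search commands draw on a separate `search` bucket in the same response.

Below is a minimal sketch with hypothetical helper names. The response shape (`resources.core.limit` / `remaining`) follows GitHub's REST rate-limit endpoint.

```python
import json
import subprocess

def fetch_rate_limit() -> dict:
    """Call GET /rate_limit via the gh CLI (uses gh's stored or env-var token)."""
    result = subprocess.run(["gh", "api", "rate_limit"],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

def core_quota(payload: dict) -> tuple:
    """Return (remaining, limit) for the core REST API bucket."""
    core = payload["resources"]["core"]
    return core["remaining"], core["limit"]
```

`remaining, limit = core_quota(fetch_rate_limit())` gives a quick check before a large batch of `gh` calls.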

## Common Workflows

### Scrape Twitter Thread for Research

```python
import subprocess

def get_twitter_thread(url):
    # check=True raises CalledProcessError if twitter-cli fails (e.g. expired cookies)
    result = subprocess.run(
        ["twitter", "thread", url],
        capture_output=True, text=True, check=True
    )
    return result.stdout

thread_content = get_twitter_thread(
    "https://twitter.com/user/status/123456789"
)
```
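When these workflows run unattended, some failures are easier to diagnose through a single shared wrapper. Expired cookies and a missing binary are typical examples. The wrapper checks that the binary exists, applies a timeout, and surfaces stderr.

Below is a minimal sketch. `run_cli` and `CLIError` are hypothetical names, and the 120-second default timeout is an arbitrary choice.

```python
import shutil
import subprocess

class CLIError(RuntimeError):
    """Raised when an upstream CLI is missing, times out, or exits non-zero."""

def run_cli(args, timeout=120.0):
    """Run a CLI command and return its stdout, raising CLIError on failure."""
    if shutil.which(args[0]) is None:
        raise CLIError(f"{args[0]!r} not found on PATH; try `agent-reach doctor`")
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise CLIError(f"{args[0]} timed out after {timeout}s") from exc
    if result.returncode != 0:
        raise CLIError(f"{args[0]} exited with {result.returncode}: {result.stderr.strip()}")
    return result.stdout
```

For example, `run_cli(["twitter", "thread", url])` returns the thread text or raises `CLIError` with twitter-cli's own error message.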

### Extract YouTube Video Metadata and Captions

```python
import subprocess
import json

def get_video_transcript(url):
    # Fetch metadata; caption tracks are listed as URLs, not as transcript text
    result = subprocess.run(
        ["yt-dlp", "--dump-json", "--write-auto-subs",
         "--skip-download", url],
        capture_output=True, text=True, check=True
    )
    data = json.loads(result.stdout)

    # Uploaded subtitles live in data['subtitles'], auto-generated ones in
    # data['automatic_captions']; each maps a language code to a list of formats
    return {
        'title': data.get('title'),
        'description': data.get('description'),
        'duration': data.get('duration'),
        'subtitles': data.get('subtitles') or {},
        'automatic_captions': data.get('automatic_captions') or {},
    }
```
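The caption URLs in that metadata point to subtitle files. Downloading one in WebVTT (`vtt`) format and stripping the cue timings gives plain transcript text that an agent can summarize.

Below is a rough sketch, assuming YouTube-style VTT. `vtt_to_text` is a hypothetical helper. It removes the header, timing lines, and inline tags, and it drops consecutive duplicate lines. Rolling auto-captions may still leave some repetition.

```python
import html
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:04.000 align:start"
_TIMING = re.compile(r"^(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->")
# Inline markup such as <c>, </c> or word-level <00:00:01.234> timestamps
_TAG = re.compile(r"<[^>]+>")

def vtt_to_text(vtt: str) -> str:
    """Convert WebVTT captions to a single line of plain text."""
    lines = []
    for raw in vtt.splitlines():
        line = raw.strip()
        # Skip blanks, the WEBVTT header, YouTube's Kind/Language metadata, and timings
        if (not line or line.startswith(("WEBVTT", "Kind:", "Language:"))
                or _TIMING.match(line)):
            continue
        text = html.unescape(_TAG.sub("", line)).strip()
        # Auto-generated captions often repeat the previous line
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return " ".join(lines)
```

A `vtt` track URL from `automatic_captions` can be fetched with `urllib.request.urlopen(track_url).read().decode("utf-8")` and passed straight to `vtt_to_text`.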

### Search GitHub for Solutions

```python
import subprocess
import json

def search_github_issues(query, repo=None):
    # gh search issues exposes the comment count as commentsCount
    cmd = ["gh", "search", "issues", query,
           "--limit", "20", "--json",
           "title,url,state,body,commentsCount"]
    if repo:
        cmd.extend(["--repo", repo])

    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Search across all repos
issues = search_github_issues("memory leak in agents")

# Search a specific repo
issues = search_github_issues("bug", repo="owner/repo")
```

### Monitor Reddit for Mentions

```bash
# Search and save results
rdt search "your_product_name" --limit 50 > mentions.txt

# Get specific subreddit
rdt subreddit r/artificial --limit 100
```
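To turn the saved search into a monitor, each run can be compared with the results seen before. Below is a minimal sketch. It assumes `rdt search` prints one stable line per post; that output format is not documented here, so check it first. `new_mentions` and the `seen.txt` state file are hypothetical.

```python
from pathlib import Path
from typing import List

def new_mentions(current_output: str, seen_file: Path) -> List[str]:
    """Return lines not seen in earlier runs, and record them in seen_file."""
    seen = set()
    if seen_file.exists():
        seen = set(seen_file.read_text(encoding="utf-8").splitlines())
    fresh = []
    for line in current_output.splitlines():
        line = line.strip()
        if line and line not in seen:
            fresh.append(line)
            seen.add(line)  # also dedupes repeats within this run
    if fresh:
        with seen_file.open("a", encoding="utf-8") as fh:
            fh.write("\n".join(fresh) + "\n")
    return fresh
```

Feed it the stdout of `rdt search "your_product_name" --limit 50`, for example `new_mentions(output, Path("seen.txt"))`, and it returns only posts that were not reported on a previous run.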

### Read Web Page Content for Analysis

```python
import requests

def get_clean_content(url):
    response = requests.get(
        f"https://r.jina.ai/{url}",
        headers={
            "X-With-Links-Summary": "true",
            "X-No-Cache": "true"
        }
    )
    return response.text

content = get_clean_content("https://news.ycombinator.com")
```

## Troubleshooting

### Doctor Command Shows ❌

Run diagnostics:

```bash
agent-reach doctor
```

Each ❌ includes a fix suggestion. Common issues:

**Twitter/Reddit not working:**
- Need Cookie authentication
- Use Cookie-Editor to export cookies
- Run `twitter configure` or `rdt login`

**Bilibili 403 on server:**
- Need proxy for non-CN IPs
- Set `HTTP_PROXY` and `HTTPS_PROXY` env vars

**GitHub rate limited:**
- Authenticate: `gh auth login`
- Authenticated: 5,000 requests/hour vs 60/hour unauthenticated

**yt-dlp fails:**
- Update to latest: `pip install -U yt-dlp`
- yt-dlp ships fixes frequently as sites change

### MCP Server Connection Issues

```bash
# Check mcporter status
mcporter list

# Restart a server
mcporter restart xiaohongshu

# View server logs
mcporter logs exa
```

### Proxy Not Working

```bash
# Test proxy connection
curl -x http://proxy:port https://api.bilibili.com

# Set for a single command only
HTTPS_PROXY=http://proxy:port yt-dlp URL
```

### Cookie Expired

Re-export fresh cookies:
1. Log in to the service in your browser
2. Export with Cookie-Editor
3. Re-run the tool's config command (for example `twitter configure` or `rdt login`)

## Environment Variables

```bash
# Proxy (for server deployments)
export HTTP_PROXY=http://proxy:port
export HTTPS_PROXY=http://proxy:port

# GitHub
export GITHUB_TOKEN=ghp_xxxxx

# Exa (optional, for advanced features)
export EXA_API_KEY=your_key_here

# Custom config paths (optional)
export AGENT_REACH_CONFIG_DIR=~/.config/agent-reach
```

## Safety & Privacy

- **All cookies stored locally** in `~/.twitter-cli/`, `~/.rdt-cli/`, etc.
- **No data uploaded** to Agent Reach servers (there are none)
- **Code is open source** — audit anytime
- Use `--safe` mode during install to review system package installs

## Updating

```bash
# Update agent-reach
pip install -U agent-reach

# Update individual tools
pip install -U yt-dlp
gh extension upgrade --all   # gh extensions only; update gh itself with your system package manager
npm update -g mcporter
```

Check for breaking changes: https://github.com/Panniantong/agent-reach/blob/main/CHANGELOG.md

## Platform Support Matrix

| Platform | Out of Box | After Config | Notes |
|----------|-----------|--------------|-------|
| Web | ✅ | — | Jina Reader, no API key needed |
| YouTube | ✅ | — | yt-dlp, 1800+ sites |
| RSS | ✅ | — | feedparser |
| GitHub | ✅ | 🔧 Auth for private | gh CLI |
| Twitter | 🔧 Cookie | 🔧 Cookie | twitter-cli |
| Reddit | 🔧 Cookie | 🔧 Cookie | rdt-cli |
| Bilibili | ✅ Local | 🔧 Proxy (server) | yt-dlp |
| XiaoHongShu | 🔧 Cookie | 🔧 Cookie | xhs-cli via MCP |
| Search | 🔧 MCP | 🔧 API key (optional) | Exa |
| WeChat | ✅ | — | Via Exa search |
| Weibo | ✅ | — | Direct API |
| V2EX | ✅ | — | Direct API |

Legend: ✅ Works immediately | 🔧 Needs configuration
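An agent wrapping these tools needs to decide which channel handles a given URL. Below is a minimal routing sketch derived from the table above. The `ROUTES` mapping and `route` function are hypothetical and are not Agent Reach's own dispatcher. Short links such as `b23.tv` would need extra entries, and unknown hosts fall back to Jina Reader.

```python
from urllib.parse import urlparse

# Hypothetical host -> channel mapping based on the support matrix above
ROUTES = {
    "youtube.com": "yt-dlp",
    "youtu.be": "yt-dlp",
    "bilibili.com": "yt-dlp",
    "github.com": "gh",
    "twitter.com": "twitter-cli",
    "x.com": "twitter-cli",
    "reddit.com": "rdt-cli",
    "xiaohongshu.com": "xhs-cli (MCP)",
    "mp.weixin.qq.com": "exa (WeChat)",
    "weibo.com": "agent-reach weibo",
    "v2ex.com": "agent-reach v2ex",
}

def route(url: str) -> str:
    """Pick a channel by exact host or subdomain match; default to Jina Reader."""
    host = (urlparse(url).hostname or "").lower()
    for domain, channel in ROUTES.items():
        # Match "reddit.com" and "old.reddit.com", but not "notreddit.com"
        if host == domain or host.endswith("." + domain):
            return channel
    return "jina-reader"
```

Matching on whole host labels rather than substrings keeps look-alike domains from being sent to the wrong tool.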

Source: aradotso/ai-agent-skills (creator's repository)
