Let Claude query your data warehouse safely

Wraps your data warehouse in an executable context layer that teaches Claude the schema, business logic, and access rules — so Claude can answer questions without breaking things or hallucinating.

Best for: Teams who want their data warehouse accessible to AI agents without manual SQL writing or security risk.

Engineering / pipelines-databundlefor-engineersneeds-integrationfrom-repo

Skill file

Preview skill file
---
name: ktx-context-layer-data-agents
description: Teach AI agents how to query data warehouses accurately using ktx - an executable context layer with skills, memory, and a semantic layer
triggers:
  - setup ktx for data warehouse queries
  - configure ktx context layer
  - build ktx semantic layer from database
  - integrate ktx with claude code
  - query warehouse using ktx
  - ingest dbt metrics into ktx
  - search ktx wiki or semantic layer
  - troubleshoot ktx mcp server
---

# ktx Context Layer for Data Agents

> Skill by [ara.so](https://ara.so) — AI Agent Skills collection.

ktx is an executable context layer that teaches AI agents how to query data warehouses accurately. It automatically builds a semantic layer from your database, ingests business knowledge from wikis and tools like dbt/Looker, detects joinable columns, resolves fan/chasm traps, and exposes everything through CLI and MCP tools for agent execution.

## Installation

Install ktx globally via npm:

```bash
npm install -g @kaelio/ktx
```

Or add to a project:

```bash
npm install --save-dev @kaelio/ktx
```

## Quick Setup

Run the interactive setup wizard:

```bash
ktx setup
```

This will:
- Create or resume a ktx project in the current directory
- Configure LLM provider (Anthropic API, Google Vertex AI, or Claude Agent SDK)
- Configure embedding provider (OpenAI, Anthropic, Vertex AI)
- Set up database connections (PostgreSQL, Snowflake, BigQuery, etc.)
- Configure context sources (dbt, LookML, Looker, Metabase, Notion)
- Build initial context
- Install agent integration (Codex, Claude Code, etc.)

Check project status:

```bash
ktx status
```

Expected output after successful setup:

```text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
```

## Project Structure

ktx creates this structure:

```text
my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources
├── wiki/global/                     # Shared business context
├── wiki/user/<user-id>/             # User-scoped notes
├── raw-sources/<connection-id>/     # Ingest artifacts and reports
└── .ktx/                            # Local state and secrets (git-ignored)
```

Commit `ktx.yaml`, `semantic-layer/`, and `wiki/`. Keep `.ktx/` local.

## Configuration

### ktx.yaml

Example project configuration:

```yaml
version: 1
project_id: analytics_project
llm_provider: anthropic
embedding_provider: openai

connections:
  - id: warehouse
    type: postgres
    config:
      host: localhost
      port: 5432
      database: analytics
      user: readonly_user
      password_env: POSTGRES_PASSWORD
      ssl: false

context_sources:
  - id: dbt_main
    type: dbt
    config:
      manifest_path: ./dbt/target/manifest.json
      catalog_path: ./dbt/target/catalog.json
  
  - id: looker_metrics
    type: looker
    config:
      base_url_env: LOOKER_BASE_URL
      client_id_env: LOOKER_CLIENT_ID
      client_secret_env: LOOKER_CLIENT_SECRET

agent_integrations:
  - type: codex
    scope: project
```

### Environment Variables

Store secrets in environment variables:

```bash
# LLM Provider
export ANTHROPIC_API_KEY=sk-ant-...
# or for Vertex AI
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

# Embedding Provider
export OPENAI_API_KEY=sk-...

# Database Credentials
export POSTGRES_PASSWORD=yourpassword
export SNOWFLAKE_PASSWORD=yourpassword

# Tool Integrations
export LOOKER_BASE_URL=https://company.looker.com
export LOOKER_CLIENT_ID=your_client_id
export LOOKER_CLIENT_SECRET=your_secret
export NOTION_TOKEN=secret_...
```

## Core Commands

### Build Context

Ingest from all configured connections and sources:

```bash
ktx ingest
```

Ingest from specific connection:

```bash
ktx ingest --connection warehouse
```

Ingest from specific context source:

```bash
ktx ingest --context-source dbt_main
```

### Search Semantic Layer

Search for metrics, dimensions, and entities:

```bash
ktx sl "revenue"
ktx sl "customer lifetime value"
ktx sl "monthly active users"
```

Example output:

```text
Found 3 semantic sources matching "revenue":

1. metric.monthly_recurring_revenue
   Type: metric
   Connection: warehouse
   Description: Sum of all active subscription values in a given month
   SQL: SUM(subscriptions.monthly_value)
   
2. dimension.revenue_tier
   Type: dimension
   Entity: customer
   Description: Customer revenue bracket (low/medium/high)
   
3. entity.revenue_events
   Type: entity
   Table: prod.revenue_events
   Primary key: event_id
```

### Search Wiki

Search business knowledge and documentation:

```bash
ktx wiki "refund policy"
ktx wiki "customer segmentation"
```

### Query Warehouse

Execute SQL queries through ktx:

```bash
ktx query "SELECT customer_tier, COUNT(*) FROM customers GROUP BY customer_tier"
```

Query with metric resolution:

```bash
ktx query --use-metrics "SELECT monthly_recurring_revenue FROM time WHERE month = '2024-01'"
```

### MCP Server

Start the Model Context Protocol server for agent integration:

```bash
ktx mcp start
```

Specify project directory:

```bash
ktx mcp start --project-dir /path/to/project
```

The MCP server exposes these tools to agents:
- `ktx_search_semantic_layer` - Search metrics and dimensions
- `ktx_search_wiki` - Search business knowledge
- `ktx_query` - Execute SQL queries
- `ktx_get_schema` - Retrieve table schemas
- `ktx_get_metric_definition` - Get canonical metric SQL

## Working with Semantic Sources

ktx automatically generates semantic sources during ingestion. You can also define them manually.

### Metric Definition

```yaml
# semantic-layer/warehouse/metrics/mrr.yaml
name: monthly_recurring_revenue
type: metric
description: Sum of all active subscription values in a given month
entity: subscription
sql: SUM(subscriptions.monthly_value)
filters:
  - sql: subscriptions.status = 'active'
dimensions:
  - customer_tier
  - plan_type
time_dimension: subscription_start_date
```

### Entity Definition

```yaml
# semantic-layer/warehouse/entities/customer.yaml
name: customer
type: entity
table: prod.customers
primary_key: customer_id
description: Customer master table
dimensions:
  - name: customer_tier
    type: categorical
    sql: tier
  - name: signup_date
    type: time
    sql: created_at
```

### Join Configuration

ktx auto-detects joins, but you can override:

```yaml
# semantic-layer/warehouse/joins/customer_subscription.yaml
from_entity: customer
to_entity: subscription
type: one_to_many
join_sql: customers.customer_id = subscriptions.customer_id
```

## Common Patterns

### Pattern 1: Setup New Project

```bash
# Navigate to analytics directory
cd ~/analytics

# Initialize ktx
ktx setup

# Follow prompts to configure:
# - LLM provider (Anthropic API recommended)
# - Embedding provider (OpenAI recommended)
# - Database connection (read-only credentials)
# - dbt integration (point to manifest.json)

# Verify setup
ktx status

# Build initial context
ktx ingest
```

### Pattern 2: Query with Agent Context

From Claude Code, Cursor, Codex, or OpenCode:

```text
User: What was our MRR in January 2024?

Agent uses ktx:
1. ktx sl "monthly recurring revenue" → finds metric definition
2. ktx query --use-metrics "SELECT monthly_recurring_revenue FROM time WHERE month = '2024-01'"
3. Returns accurate result using canonical metric logic
```

### Pattern 3: Add Business Context

Create wiki pages for business knowledge:

```bash
# Create global wiki page
mkdir -p wiki/global
cat > wiki/global/refund-policy.md << 'EOF'
# Refund Policy

Customers can request refunds within 30 days of purchase.

## Refund Eligibility
- Full refund if < 7 days
- Prorated refund if 7-30 days
- No refund if > 30 days

## Accounting Treatment
Refunds are recorded as negative revenue in the month issued, not the original purchase month.
EOF

# Ingest wiki content
ktx ingest
```

### Pattern 4: Integrate dbt Metrics

```bash
# Ensure dbt artifacts exist
cd dbt-project
dbt compile
dbt docs generate

# Configure in ktx.yaml
cat >> ktx.yaml << 'EOF'
context_sources:
  - id: dbt_main
    type: dbt
    config:
      manifest_path: ./dbt-project/target/manifest.json
      catalog_path: ./dbt-project/target/catalog.json
EOF

# Ingest dbt metrics
ktx ingest --context-source dbt_main

# Search for dbt metrics
ktx sl "customers"
```

### Pattern 5: Agent Integration

For Codex:

```bash
# Install during setup or manually
ktx setup
# Select "codex" when prompted for agent integration

# Or install explicitly
npx skills add Kaelio/ktx --skill ktx
```

For Claude Code:

```bash
# ktx setup handles this automatically
# Adds MCP server config to Claude Code settings
```

Manual MCP configuration for Claude Desktop:

```json
{
  "mcpServers": {
    "ktx": {
      "command": "ktx",
      "args": ["mcp", "start", "--project-dir", "/path/to/project"]
    }
  }
}
```

## TypeScript API (Programmatic Usage)

While ktx is primarily a CLI tool, you can use it programmatically:

```typescript
import { KtxProject } from '@kaelio/ktx';

// Load existing project
const project = await KtxProject.load('/path/to/project');

// Search semantic layer
const results = await project.searchSemanticLayer('revenue');
console.log(results);

// Search wiki
const wikiResults = await project.searchWiki('refund policy');
console.log(wikiResults);

// Execute query
const queryResult = await project.query(
  'warehouse',
  'SELECT * FROM customers LIMIT 10'
);
console.log(queryResult.rows);

// Get metric definition
const metric = await project.getMetric('monthly_recurring_revenue');
console.log(metric.sql);
```

## Troubleshooting

### MCP Server Not Starting

**Symptom**: Agent can't connect to ktx

**Solution**:

```bash
# Check status
ktx status

# Start MCP server manually
ktx mcp start --project-dir /path/to/project

# Check logs
tail -f ~/.ktx/logs/mcp.log
```

### Ingestion Failures

**Symptom**: `ktx ingest` fails with connection errors

**Solution**:

```bash
# Test database connection
ktx test-connection warehouse

# Verify environment variables
env | grep -E 'POSTGRES|SNOWFLAKE|ANTHROPIC|OPENAI'

# Check ktx.yaml syntax
ktx validate

# Ingest with verbose logging
ktx ingest --verbose
```

### Missing Metrics

**Symptom**: `ktx sl "metric_name"` returns no results

**Solution**:

```bash
# Re-ingest context
ktx ingest --force

# Check semantic layer directory
ls -la semantic-layer/warehouse/metrics/

# Manually create metric if needed
mkdir -p semantic-layer/warehouse/metrics
cat > semantic-layer/warehouse/metrics/my_metric.yaml << 'EOF'
name: my_metric
type: metric
description: My custom metric
entity: my_entity
sql: COUNT(*)
EOF

# Rebuild context
ktx ingest
```

### LLM Provider Issues

**Symptom**: Context building fails with API errors

**Solution**:

```bash
# Verify API key
echo $ANTHROPIC_API_KEY

# Test LLM connection
ktx test-llm

# Switch provider if needed
ktx setup
# Select different LLM provider

# Check rate limits and quota
# Anthropic: https://console.anthropic.com
# OpenAI: https://platform.openai.com/usage
```

### Permission Errors

**Symptom**: Can't write to project directory

**Solution**:

```bash
# Check directory permissions
ls -la .

# Fix ownership
sudo chown -R $USER:$USER .

# Or specify writable project directory
ktx setup --project-dir ~/my-ktx-project
```

### Semantic Layer Contradictions

**Symptom**: ktx flags conflicting metric definitions

**Solution**:

```bash
# Review contradictions report
cat raw-sources/warehouse/contradictions.json

# Resolve by editing semantic sources
vim semantic-layer/warehouse/metrics/revenue.yaml

# Or update source (dbt, Looker, etc.)
# Then re-ingest
ktx ingest --force
```

## Advanced Configuration

### Custom Sampling Strategy

Control how ktx samples tables during ingestion:

```yaml
# ktx.yaml
connections:
  - id: warehouse
    type: postgres
    config:
      host: localhost
      database: analytics
      sample_strategy: adaptive
      max_sample_rows: 10000
      min_sample_rows: 100
```

### Join Detection Tuning

Adjust automatic join detection:

```yaml
# ktx.yaml
semantic_layer:
  join_detection:
    min_confidence: 0.8
    sample_size: 1000
    detect_fan_traps: true
    detect_chasm_traps: true
```

### Wiki Organization

Structure wiki for better retrieval:

```bash
wiki/
├── global/
│   ├── metrics/
│   │   ├── revenue-definitions.md
│   │   └── user-engagement.md
│   ├── policies/
│   │   ├── data-retention.md
│   │   └── refund-policy.md
│   └── glossary/
│       └── business-terms.md
└── user/
    └── <user-id>/
        └── scratch.md
```

## Best Practices

1. **Use read-only database credentials** - ktx never writes, but enforce it at the DB level
2. **Commit semantic layer and wiki** - Share context across team
3. **Keep .ktx/ local** - Contains secrets and local state
4. **Re-ingest after schema changes** - `ktx ingest` after dbt runs or migrations
5. **Document metrics in wiki** - Add business context beyond SQL definitions
6. **Use environment variables for secrets** - Never commit credentials
7. **Test queries before agents use them** - `ktx query` validates SQL
8. **Review contradiction reports** - Resolve conflicting definitions promptly

## Project Resolution

ktx finds projects in this order:

1. `--project-dir` flag
2. `KTX_PROJECT_DIR` environment variable
3. Nearest `ktx.yaml` in parent directories
4. Current working directory

For scripting, always use explicit project dir:

```bash
ktx ingest --project-dir /opt/analytics
ktx mcp start --project-dir /opt/analytics
```

---

**Documentation**: https://docs.kaelio.com/ktx  
**GitHub**: https://github.com/Kaelio/ktx  
**Slack Community**: https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ

Source

Creator's repository · aradotso/ai-agent-skills

View on GitHub

Security

Security checks in progress
Results will appear here once audits complete
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk