Query your data warehouse in plain English

Installs ktx as a semantic layer between Claude and your warehouse. Ask questions about revenue, churn, or unit economics in natural language and get accurate SQL queries without writing DDL.

Best for: Founders and operators who need weekly dashboards but don't want to learn SQL or bug the data team.

Finance / reporting-dashboardsatomicfor-foundersfor-opsneeds-integration

Skill file

Preview skill file
---
name: ktx-ai-data-context-layer
description: Install and configure ktx, an executable context layer for data and analytics agents that enables accurate warehouse queries through MCP with semantic layer and skills
triggers:
  - set up ktx for data agent queries
  - configure ktx semantic layer for warehouse
  - install ktx context layer with MCP
  - help me query my data warehouse with ktx
  - set up ktx for Claude Code data analysis
  - configure ktx ingestion from dbt and warehouse
  - troubleshoot ktx mcp server connection
  - build ktx context from database metadata
---

# ktx AI Data Context Layer

> Skill by [ara.so](https://ara.so) — AI Agent Skills collection.

**ktx** is a self-improving context layer that teaches AI agents how to query your data warehouse accurately. It ingests metadata from databases, dbt, Looker, Metabase, and wikis to build a semantic layer with approved metric definitions, joinable columns, and business knowledge. Agents access this through MCP (Model Context Protocol) tools and CLI commands.

## What ktx Does

- **Learns from company knowledge** — ingests wiki content, removes duplicates, flags contradictions
- **Maps the data stack** — samples tables, detects joinable columns, captures metadata and usage patterns
- **Builds a semantic layer** — combines raw tables and high-level metrics with automatic join resolution
- **Serves agents at execution** — exposes CLI and MCP tools with semantic search across wiki and semantic-layer entities

Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.

## Installation

### Global CLI Installation

```bash
npm install -g @kaelio/ktx
```

### Project-Specific Installation

```bash
cd /path/to/your/project
npm install @kaelio/ktx
```

### Verify Installation

```bash
ktx --version
ktx --help
```

## Quick Setup

```bash
# Create or resume a ktx project in current directory
ktx setup

# Check project readiness
ktx status
```

The `ktx setup` command will:
1. Create or resume a local ktx project
2. Configure LLM and embedding providers
3. Set up database connections
4. Configure context sources (dbt, Looker, etc.)
5. Build initial context
6. Install agent integration (MCP)

## Project Structure

```
my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources
├── wiki/global/                     # Shared business context
├── wiki/user/<user-id>/             # User-scoped notes
├── raw-sources/<connection-id>/     # Ingest artifacts and reports
└── .ktx/                            # Local state and secrets (git-ignored)
```

**Important**: Commit `ktx.yaml`, `semantic-layer/`, and `wiki/`. Keep `.ktx/` local.

## Configuration

### ktx.yaml Structure

```yaml
version: 1
name: my-analytics-project

llm:
  provider: anthropic
  model: claude-sonnet-4-6

embeddings:
  provider: openai
  model: text-embedding-3-small

databases:
  warehouse:
    type: postgres
    host: localhost
    port: 5432
    database: analytics
    # Credentials stored in .ktx/secrets.yaml

context-sources:
  dbt_main:
    type: dbt
    path: ./dbt-project
    target: prod
  
  looker_main:
    type: looker
    base_url: https://company.looker.com
    # API credentials in .ktx/secrets.yaml

agent-integration:
  type: mcp
  clients:
    - codex:project
    - claude-code
```

### LLM Configuration

ktx supports multiple LLM providers:

```bash
# During setup, choose provider
ktx setup

# Supported providers:
# - anthropic (requires ANTHROPIC_API_KEY)
# - vertex-ai (requires Google Cloud credentials)
# - ai-gateway (custom endpoint)
# - claude-agent-sdk (local Claude Code session)
```

Set API keys via environment variables:

```bash
export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
```

### Database Connection

Example PostgreSQL connection in `ktx.yaml`:

```yaml
databases:
  warehouse:
    type: postgres
    host: localhost
    port: 5432
    database: analytics
    schema: public
    ssl: true
```

Credentials in `.ktx/secrets.yaml` (auto-generated, git-ignored):

```yaml
databases:
  warehouse:
    user: readonly_user
    password: ${DB_PASSWORD}
```

### Context Sources

#### dbt Integration

```yaml
context-sources:
  dbt_main:
    type: dbt
    path: ./dbt-project
    target: prod
    manifest_path: ./dbt-project/target/manifest.json
```

#### Looker Integration

```yaml
context-sources:
  looker_main:
    type: looker
    base_url: https://company.looker.com
    # client_id and client_secret in .ktx/secrets.yaml
```

#### Notion Integration

```yaml
context-sources:
  company_wiki:
    type: notion
    # token in .ktx/secrets.yaml
    database_ids:
      - abc123def456
```

## Key Commands

### Setup and Status

```bash
# Create/resume project
ktx setup

# Check project status
ktx status

# Example output:
# ktx project: /home/user/analytics
# Project ready: yes
# LLM ready: yes (claude-sonnet-4-6)
# Embeddings ready: yes (text-embedding-3-small)
# Databases configured: yes (warehouse)
# Context sources configured: yes (dbt_main)
# ktx context built: yes
# Agent integration ready: yes (codex:project)
```

### Building Context

```bash
# Ingest all configured sources
ktx ingest

# Ingest specific connection
ktx ingest --connection warehouse

# Ingest specific context source
ktx ingest --source dbt_main

# Force re-ingestion (skip cache)
ktx ingest --force
```

### Searching Context

```bash
# Search semantic layer (metrics, dimensions, entities)
ktx sl "revenue"
ktx sl "monthly active users"

# Search wiki pages
ktx wiki "refund policy"
ktx wiki "data retention"

# Combined search (semantic layer + wiki)
ktx search "customer churn"
```

### MCP Server

```bash
# Start MCP server for agent clients
ktx mcp start

# Start with specific project directory
ktx mcp start --project-dir /path/to/project

# Check MCP server status
ktx mcp status

# Stop MCP server
ktx mcp stop
```

### Semantic Layer Management

```bash
# List semantic sources
ktx sl list

# Validate semantic layer definitions
ktx sl validate

# Show semantic source details
ktx sl show users
ktx sl show revenue_metrics
```

### Project Management

```bash
# Initialize new project (alternative to setup)
ktx init

# Update project configuration
ktx config set llm.provider anthropic
ktx config set llm.model claude-sonnet-4-6

# Show current configuration
ktx config show

# Clean cached context
ktx clean --cache

# Clean everything (reset project)
ktx clean --all
```

## Real Usage Examples

### Example 1: Set Up ktx for PostgreSQL Warehouse

```bash
# Install globally
npm install -g @kaelio/ktx

# Navigate to project
cd ~/my-analytics-project

# Set LLM API key
export ANTHROPIC_API_KEY=sk-ant-...

# Run setup (interactive)
ktx setup
# Choose: anthropic, claude-sonnet-4-6
# Choose: openai, text-embedding-3-small
# Add database: postgres
# Host: localhost
# Port: 5432
# Database: analytics
# User: readonly_user
# Password: [enter securely]

# Verify setup
ktx status
```

### Example 2: Ingest dbt and Build Context

```bash
# Configure dbt source in ktx.yaml
cat >> ktx.yaml <<EOF
context-sources:
  dbt_main:
    type: dbt
    path: ./dbt-project
    target: prod
EOF

# Run ingestion
ktx ingest

# Search for dbt models
ktx sl "dim_customers"
ktx sl "fct_orders"
```

### Example 3: Query Semantic Layer from TypeScript

```typescript
import { execSync } from 'child_process';

// Search semantic layer
function searchSemanticLayer(query: string): string {
  const result = execSync(`ktx sl "${query}"`, { encoding: 'utf-8' });
  return result;
}

// Example usage
const revenueMetrics = searchSemanticLayer('revenue');
console.log(revenueMetrics);

// Search wiki
function searchWiki(query: string): string {
  const result = execSync(`ktx wiki "${query}"`, { encoding: 'utf-8' });
  return result;
}

const refundPolicy = searchWiki('refund policy');
console.log(refundPolicy);
```

### Example 4: Set Up MCP for Claude Code

```bash
# Navigate to project
cd ~/my-analytics-project

# Run setup with MCP integration
ktx setup
# During setup, select: "Install MCP integration? yes"
# Select agent: "Claude Code (codex)"

# Start MCP server
ktx mcp start --project-dir ~/my-analytics-project

# Keep terminal open, open Claude Code in another window
# Claude Code will auto-discover the MCP server
```

### Example 5: Create Semantic Layer Definition

Create `semantic-layer/warehouse/customers.yaml`:

```yaml
kind: SemanticSource
name: customers
type: entity

description: Customer dimension with lifetime metrics

columns:
  - name: customer_id
    type: dimension
    data_type: integer
    primary_key: true
    
  - name: email
    type: dimension
    data_type: string
    
  - name: created_at
    type: dimension
    data_type: timestamp
    
  - name: lifetime_value
    type: metric
    data_type: numeric
    sql: SUM(order_total)
    aggregation: sum
    
  - name: total_orders
    type: metric
    data_type: integer
    sql: COUNT(DISTINCT order_id)
    aggregation: count

sql: |
  SELECT 
    c.id as customer_id,
    c.email,
    c.created_at,
    COALESCE(SUM(o.total), 0) as lifetime_value,
    COUNT(o.id) as total_orders
  FROM customers c
  LEFT JOIN orders o ON c.id = o.customer_id
  GROUP BY c.id, c.email, c.created_at

joins:
  - to: orders
    type: one_to_many
    on: customers.customer_id = orders.customer_id
```

Validate and ingest:

```bash
ktx sl validate
ktx ingest --force
ktx sl show customers
```

### Example 6: Add Wiki Context

Create `wiki/global/data-definitions.md`:

```markdown
# Data Definitions

## Revenue Recognition

Revenue is recognized at the point of sale, not when payment is received.

**Key Metrics:**
- `gross_revenue`: Total sales before discounts
- `net_revenue`: Sales minus refunds and discounts
- `arr`: Annual Recurring Revenue (subscription only)

## Customer Classification

- **Active**: Made a purchase in last 90 days
- **Churned**: No purchase in 90+ days
- **New**: First purchase within last 30 days

## Refund Policy

Refunds are processed within 7 days and deducted from revenue in the same period.
```

Ingest wiki:

```bash
ktx ingest
ktx wiki "revenue recognition"
```

## Common Patterns

### Pattern 1: Daily Context Refresh

```bash
#!/bin/bash
# refresh-context.sh

cd /path/to/ktx/project
ktx ingest --force
ktx sl validate
```

Run via cron:

```bash
# Run daily at 2am
0 2 * * * /path/to/refresh-context.sh
```

### Pattern 2: Multi-Environment Setup

```yaml
# ktx.yaml
databases:
  warehouse_prod:
    type: postgres
    host: prod.example.com
    database: analytics
    
  warehouse_staging:
    type: postgres
    host: staging.example.com
    database: analytics
```

```bash
# Ingest only production
ktx ingest --connection warehouse_prod

# Ingest only staging
ktx ingest --connection warehouse_staging
```

### Pattern 3: CI/CD Validation

```yaml
# .github/workflows/validate-ktx.yml
name: Validate ktx Semantic Layer

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Install ktx
        run: npm install -g @kaelio/ktx
        
      - name: Validate semantic layer
        run: ktx sl validate
```

### Pattern 4: Environment-Specific Secrets

```bash
# .env.production
ANTHROPIC_API_KEY=sk-ant-prod-...
DB_PASSWORD=prod_password

# .env.staging
ANTHROPIC_API_KEY=sk-ant-staging-...
DB_PASSWORD=staging_password
```

Load before running:

```bash
source .env.production
ktx ingest
```

## Troubleshooting

### Issue: `ktx command not found`

**Solution**: Ensure global installation or use npx:

```bash
npm install -g @kaelio/ktx
# or
npx @kaelio/ktx setup
```

### Issue: MCP server not connecting

**Solution**: Check server is running and project directory is correct:

```bash
ktx mcp status

# If not running:
ktx mcp start --project-dir /absolute/path/to/project

# Verify in Claude Code settings:
# MCP servers should show ktx with status "Connected"
```

### Issue: Database connection fails

**Solution**: Verify credentials and network access:

```bash
# Test connection manually
psql -h localhost -p 5432 -U readonly_user -d analytics

# Check ktx configuration
ktx config show

# Re-run setup to update credentials
ktx setup
```

### Issue: Ingestion fails with "LLM error"

**Solution**: Verify API key and quota:

```bash
# Check API key is set
echo $ANTHROPIC_API_KEY

# Verify in ktx config
ktx config show

# Try with different model
ktx config set llm.model claude-sonnet-3-5-20240620
ktx ingest
```

### Issue: Semantic layer validation errors

**Solution**: Check YAML syntax and required fields:

```bash
# Run validation with verbose output
ktx sl validate --verbose

# Common issues:
# - Missing 'kind' field
# - Invalid column types (must be: dimension, metric, or attribute)
# - Invalid join syntax
# - Missing SQL or columns definition
```

### Issue: Wiki search returns no results

**Solution**: Verify wiki ingestion completed:

```bash
# Re-run ingestion
ktx ingest --force

# Check wiki files exist
ls -la wiki/global/

# Verify embeddings configured
ktx config show | grep embeddings
```

### Issue: Slow ingestion performance

**Solution**: Use incremental ingestion and tune sampling:

```bash
# Skip force refresh (use cache)
ktx ingest

# Reduce table sampling (in ktx.yaml)
databases:
  warehouse:
    sampling:
      max_rows: 1000  # default is 10000
```

### Issue: Agent can't find ktx MCP tools

**Solution**: Ensure MCP server is running and configured:

```bash
# Start MCP server in background
ktx mcp start --project-dir $(pwd) &

# In Claude Code, verify MCP connection:
# Settings > MCP Servers > ktx should show "Connected"

# If using Cursor/other, check their MCP configuration
```

## Advanced Usage

### Custom LLM Provider (AI Gateway)

```yaml
llm:
  provider: ai-gateway
  endpoint: https://gateway.example.com/v1/chat/completions
  model: custom-model
  api_key_env: CUSTOM_API_KEY
```

### Multiple Semantic Layer Sources

```yaml
context-sources:
  dbt_core:
    type: dbt
    path: ./dbt-core
    
  dbt_marketing:
    type: dbt
    path: ./dbt-marketing
    
  metabase:
    type: metabase
    url: https://metabase.example.com
```

### Programmatic Access (Node.js)

```typescript
import { spawn } from 'child_process';

async function queryKtx(query: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const proc = spawn('ktx', ['search', query]);
    let output = '';
    
    proc.stdout.on('data', (data) => {
      output += data.toString();
    });
    
    proc.on('close', (code) => {
      if (code === 0) {
        resolve(output);
      } else {
        reject(new Error(`ktx exited with code ${code}`));
      }
    });
  });
}

// Usage
const results = await queryKtx('customer metrics');
console.log(results);
```

## Project Resolution

ktx resolves the project directory in this order:

1. `--project-dir` flag
2. `KTX_PROJECT_DIR` environment variable
3. Nearest `ktx.yaml` (walk up from current directory)
4. Current directory

```bash
# Explicit project directory
ktx status --project-dir /path/to/project

# Using environment variable
export KTX_PROJECT_DIR=/path/to/project
ktx status

# Auto-discover from current directory
cd /path/to/project
ktx status
```

## Resources

- [Documentation](https://docs.kaelio.com/ktx/docs/)
- [CLI Reference](https://docs.kaelio.com/ktx/docs/cli-reference/ktx)
- [Agent Quickstart](https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart)
- [GitHub Repository](https://github.com/Kaelio/ktx)
- [Slack Community](https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ)

Source

Creator's repository · aradotso/ai-agent-skills

View on GitHub

Security

Security checks in progress
Results will appear here once audits complete
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk