Fetches aggregated trace metrics (token usage, latency, trace counts, quality evaluations) from MLflow tracking servers. Triggers on requests to show metrics, analyze token usage, view LLM costs, check usage trends, or query trace statistics.
---
name: querying-mlflow-metrics
description: Fetches aggregated trace metrics (token usage, latency, trace counts, quality evaluations) from MLflow tracking servers. Triggers on requests to show metrics, analyze token usage, view LLM costs, check usage trends, or query trace statistics.
---
# MLflow Metrics
Run `scripts/fetch_metrics.py` to query metrics from an MLflow tracking server.
## Examples
**Token usage summary:**
```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM,AVG
```
Output: `AVG: 223.91 SUM: 7613`
**Hourly token trend (last 24h):**
```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM \
-t 3600 --start-time="-24h" --end-time=now
```
Output: Time-bucketed token sums per hour
**Latency percentiles by trace:**
```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m latency -a AVG,P95 -d trace_name
```
**Error rate by status:**
```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m trace_count -a COUNT -d trace_status
```
**Quality scores by evaluator (assessments):**
```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
-m assessment_value -a AVG,P50 -d assessment_name
```
Output: Average and median scores for each evaluator (e.g., correctness, relevance)
**Assessment count by name:**
```bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
-m assessment_count -a COUNT -d assessment_name
```
**JSON output:** Add `-o json` to any command.
## Arguments
| Arg | Required | Description |
|-----|----------|-------------|
| `-s, --server` | Yes | MLflow server URL |
| `-x, --experiment-ids` | Yes | Experiment IDs (comma-separated) |
| `-m, --metric` | Yes | `trace_count`, `latency`, `input_tokens`, `output_tokens`, `total_tokens` |
| `-a, --aggregations` | Yes | `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `P50`, `P95`, `P99` |
| `-d, --dimensions` | No | Group by: `trace_name`, `trace_status` |
| `-t, --time-interval` | No | Bucket size in seconds (3600=hourly, 86400=daily) |
| `--start-time` | No | `-24h`, `-7d`, `now`, ISO 8601, or epoch ms |
| `--end-time` | No | Same formats as start-time |
| `-o, --output` | No | `table` (default) or `json` |
For SPANS metrics (`span_count`, `latency`), add `-v SPANS`.
For ASSESSMENTS metrics, add `-v ASSESSMENTS`.
See [references/api_reference.md](references/api_reference.md) for filter syntax and full API details.
Creator's repository · mlflow/skills