# Embedding Atlas

> Visualization and interpretation guide for the 2D embedding atlas of evaluation ratings.

## Overview

The Embedding Atlas is an interactive visualization tool that displays all evaluation ratings as points in a 2D semantic space. Each point represents an agent response (witness, judge, or opposing counsel) that has been rated either by a human evaluator or by automated LLM evaluation.

## What Are Embeddings?

Embeddings are high-dimensional vector representations of text that capture semantic meaning. When we evaluate an agent response, we create an embedding that encodes:

* **The agent's response** (the actual message text)
* **Context** (preceding conversation/question that prompted the response)
* **Feedback** (human or automated reasoning about quality)
* **Metadata** (agent type, courtroom phase)

These embeddings use OpenAI's `text-embedding-ada-002` model, producing **1536-dimensional vectors**.
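As a rough illustration of how the four ingredients above could be combined into a single text before embedding, here is a minimal sketch. The `RatedResponse` record, its field names, the example phase value, and the labeled-line layout are all assumptions made for illustration; this guide does not specify the actual concatenation format. Only the model name and vector size come from the text above.

```python
from dataclasses import dataclass

EMBEDDING_MODEL = "text-embedding-ada-002"  # model named in this guide
EMBEDDING_DIMS = 1536                       # vector size stated in this guide


@dataclass
class RatedResponse:
    """Hypothetical record; field names are illustrative, not the real schema."""
    response: str    # the agent's message text
    context: str     # preceding conversation or question
    feedback: str    # human or automated reasoning about quality
    agent_type: str  # e.g. "witness", "judge", "opposing_counsel"
    phase: str       # courtroom phase (value below is made up)


def build_embedding_input(r: RatedResponse) -> str:
    """Join metadata, context, response, and feedback into one labeled text."""
    parts = [
        f"Agent type: {r.agent_type}",
        f"Phase: {r.phase}",
        f"Context: {r.context}",
        f"Response: {r.response}",
        f"Feedback: {r.feedback}",
    ]
    return "\n".join(parts)


example = RatedResponse(
    response="I did not see the defendant that evening.",
    context="Where were you on the night in question?",
    feedback="Answer is direct and consistent with the witness profile.",
    agent_type="witness",
    phase="cross_examination",
)
embedding_input = build_embedding_input(example)
```

The resulting string would then be sent to the embedding model, which returns one vector of `EMBEDDING_DIMS` floats per input.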

## Understanding the Axes

The visualization projects 1536 dimensions onto a 2D plane by selecting the **two dimensions with the highest variance** across all embeddings. The axes do NOT have fixed semantic meanings -- they represent whichever embedding dimensions contain the most variation in your current dataset.

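**Key insight:** The absolute position on X or Y is not meaningful. What matters is the **relative distance** between points. Because only two of the 1536 dimensions are plotted, treat on-screen proximity as a rough guide to similarity rather than an exact measure.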
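The axis selection described above can be sketched in a few lines of plain Python. This is a toy version on 4-dimensional vectors rather than 1536-dimensional ones, and the function names are illustrative, not the atlas's actual code.

```python
from statistics import pvariance


def top_two_variance_dims(vectors):
    """Return the indices of the two dimensions with the highest variance."""
    n_dims = len(vectors[0])
    variances = [pvariance([v[d] for v in vectors]) for d in range(n_dims)]
    ranked = sorted(range(n_dims), key=lambda d: variances[d], reverse=True)
    return ranked[0], ranked[1]


def project_2d(vectors):
    """Keep only the two highest-variance dimensions as (x, y) coordinates."""
    x_dim, y_dim = top_two_variance_dims(vectors)
    return [(v[x_dim], v[y_dim]) for v in vectors]


# Toy data: dimension 2 varies most, dimension 1 second most.
toy_vectors = [
    [0.0, 1.0, 5.0, 0.1],
    [0.0, 2.0, -5.0, 0.1],
    [0.0, 3.0, 5.0, 0.2],
    [0.0, 4.0, -5.0, 0.1],
]
axes = top_two_variance_dims(toy_vectors)
points = project_2d(toy_vectors)
```

Because the chosen dimensions depend on the data, adding or filtering ratings can change which two dimensions are plotted, which is why the axes carry no fixed meaning.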

## Interpreting the Visualization

### Clusters

| Pattern               | Possible Interpretation                                          |
| --------------------- | ---------------------------------------------------------------- |
| Cluster by agent type | Different agents produce fundamentally different response styles |
| Cluster by rating     | Quality issues manifest as similar semantic patterns             |
| Cluster by phase      | Courtroom phases elicit different response types                 |
| Isolated outliers     | Unusual responses that don't fit normal patterns                 |
| Overlapping clusters  | Multiple factors contribute to semantic similarity               |

### Filters

* **Source**: Human ratings vs Automated (LLM-judged) ratings
* **Agent Type**: Witness, Judge, Opposing Counsel
* **Rating Label**: Excellent (5), Good (4), Average (3), Poor (2), Bad (1)
* **Phase**: Courtroom phase where response occurred
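To make the filter semantics concrete, the sketch below treats each filter as an optional equality test and keeps only the points that pass every active filter. The `AtlasPoint` record, its field names, and the string values for source, agent type, and phase are assumptions for illustration; the rating labels and scores are taken from the list above.

```python
from dataclasses import dataclass

# Label-to-score mapping as listed under Rating Label.
RATING_LABELS = {5: "Excellent", 4: "Good", 3: "Average", 2: "Poor", 1: "Bad"}


@dataclass
class AtlasPoint:
    """Hypothetical point record; field names and values are illustrative."""
    source: str      # "human" or "automated"
    agent_type: str  # "witness", "judge", or "opposing_counsel"
    rating: int      # 1 to 5
    phase: str       # courtroom phase (values below are made up)


def apply_filters(points, source=None, agent_type=None, rating=None, phase=None):
    """Keep points that match every filter that is set (None means no filter)."""
    wanted = {"source": source, "agent_type": agent_type,
              "rating": rating, "phase": phase}
    active = {k: v for k, v in wanted.items() if v is not None}
    return [p for p in points
            if all(getattr(p, k) == v for k, v in active.items())]


sample_points = [
    AtlasPoint("human", "witness", 5, "direct_examination"),
    AtlasPoint("automated", "witness", 2, "cross_examination"),
    AtlasPoint("automated", "judge", 2, "cross_examination"),
]
low_rated_witness = apply_filters(sample_points, agent_type="witness", rating=2)
```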

## Practical Use Cases

1. **Quality Pattern Detection** -- Look for clusters of low-rated responses to reveal systematic quality issues in agent prompts.
2. **Human vs Automated Agreement** -- Compare cluster positions between human and automated ratings to verify evaluation criteria alignment.
3. **Agent Behavior Analysis** -- Filter by agent type to assess whether responses are repetitive (tight clusters) or varied (spread out).
4. **Finding Outliers** -- Points far from all clusters may be edge cases, data quality issues, or novel scenarios.
5. **Evaluating Prompt Changes** -- After modifying an agent's prompt, compare new response positions to historical data.
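Use cases 3 and 4 can also be approximated numerically rather than by eye. The sketch below measures how tightly a set of points clusters (mean distance to the centroid) and flags points that sit unusually far from it. The function names and the `factor=2.0` threshold are arbitrary choices for illustration, not values used by the atlas.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def spread(points):
    """Mean Euclidean distance to the centroid; smaller means a tighter cluster."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def find_outliers(points, factor=2.0):
    """Indices of points farther from the centroid than factor * mean distance."""
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    mean_dist = sum(dists) / len(dists)
    return [i for i, d in enumerate(dists) if d > factor * mean_dist]


tight_cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
with_stray_point = tight_cluster + [(10.0, 10.0)]

tight_spread = spread(tight_cluster)
stray_indices = find_outliers(with_stray_point)
```

The same functions work on full-dimensional vectors, which avoids the distortion introduced by plotting only two dimensions.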

## Data Pipeline

```
Agent Response -> Eval Rating -> Embedding Generation -> eval_embeddings DB -> /api/embeddings -> Atlas Visualization
```

## Troubleshooting

* **"No Embeddings Yet"**: Embeddings are generated only when a human or automated eval batch completes, so the atlas stays empty until at least one batch has finished.
* **Points All in One Cluster**: Either the rated responses have low semantic variety, or embeddings are not being generated correctly. Check both.
* **Visualization Not Loading**: Check browser console for CORS errors, DuckDB-WASM initialization failures, or memory issues.
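When checking whether embeddings are generating correctly, a quick sanity pass over the stored vectors can rule out the most common failure modes. The sketch below is one way to do that, assuming the vectors have already been loaded as lists of floats; the function name and the specific checks are illustrative and not part of the product.

```python
import math

EXPECTED_DIMS = 1536  # vector size stated in this guide


def check_embeddings(vectors, expected_dims=EXPECTED_DIMS):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for i, v in enumerate(vectors):
        if len(v) != expected_dims:
            problems.append(f"vector {i}: {len(v)} dims, expected {expected_dims}")
        elif not all(math.isfinite(x) for x in v):
            problems.append(f"vector {i}: contains NaN or infinity")
    # Identical vectors would collapse every point onto the same spot.
    if len(vectors) > 1 and all(list(v) == list(vectors[0]) for v in vectors[1:]):
        problems.append("all vectors are identical")
    return problems


healthy = [[0.1] * EXPECTED_DIMS, [0.2] * EXPECTED_DIMS]
collapsed = [[0.1] * EXPECTED_DIMS, [0.1] * EXPECTED_DIMS]
truncated = [[0.1] * 10]

healthy_report = check_embeddings(healthy)
collapsed_report = check_embeddings(collapsed)
truncated_report = check_embeddings(truncated)
```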