Overview
The Embedding Atlas is an interactive visualization tool that displays all evaluation ratings as points in a 2D semantic space. Each point represents an agent response (witness, judge, or opposing counsel) that has been rated either by a human evaluator or by automated LLM evaluation.

What Are Embeddings?
Embeddings are high-dimensional vector representations of text that capture semantic meaning. When we evaluate an agent response, we create an embedding that encodes:
- The agent’s response (the actual message text)
- Context (preceding conversation/question that prompted the response)
- Feedback (human or automated reasoning about quality)
- Metadata (agent type, courtroom phase)
These inputs are concatenated and embedded with OpenAI’s text-embedding-ada-002 model, producing 1536-dimensional vectors.
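As a rough sketch of how such an input might be assembled and embedded (this assumes the OpenAI Node SDK; the `RatedResponse` shape and the `embedRating` helper are illustrative, not the Atlas’s actual code):

```typescript
import OpenAI from "openai";

// Illustrative shape of a rated response; not the Atlas's actual schema.
interface RatedResponse {
  agentType: "witness" | "judge" | "opposing_counsel";
  phase: string;     // courtroom phase
  context: string;   // preceding conversation/question
  response: string;  // the agent's message text
  feedback: string;  // human or automated reasoning about quality
}

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Concatenate the rated response's parts into one text and embed it.
async function embedRating(r: RatedResponse): Promise<number[]> {
  const input = [
    `Agent: ${r.agentType} | Phase: ${r.phase}`,
    `Context: ${r.context}`,
    `Response: ${r.response}`,
    `Feedback: ${r.feedback}`,
  ].join("\n");
  const res = await client.embeddings.create({
    model: "text-embedding-ada-002",
    input,
  });
  return res.data[0].embedding; // 1536 numbers
}
```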
Understanding the Axes
The visualization projects the 1536 dimensions onto a 2D plane by selecting the two dimensions with the highest variance across all embeddings. The axes do NOT have fixed semantic meanings; they represent whichever embedding dimensions contain the most variation in your current dataset.

Key insight: The absolute position on X or Y is not meaningful. What matters is the relative distances between points.
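A minimal sketch of this axis selection, written to match the description above rather than the Atlas’s actual implementation:

```typescript
// Given N embeddings of dimension D, pick the two dimensions with the
// highest variance and use them as the X and Y coordinates.
function projectByVariance(embeddings: number[][]): Array<[number, number]> {
  const n = embeddings.length;
  const d = embeddings[0].length; // 1536 for text-embedding-ada-002
  const variances = new Array(d).fill(0).map((_, dim) => {
    const mean = embeddings.reduce((s, e) => s + e[dim], 0) / n;
    return embeddings.reduce((s, e) => s + (e[dim] - mean) ** 2, 0) / n;
  });
  // Indices of the two highest-variance dimensions become the axes.
  const [xDim, yDim] = variances
    .map((v, i) => [v, i] as const)
    .sort((a, b) => b[0] - a[0])
    .slice(0, 2)
    .map(([, i]) => i);
  return embeddings.map((e) => [e[xDim], e[yDim]]);
}
```

This is why the axes shift as you filter: a different subset of points can change which two dimensions carry the most variance.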
Interpreting the Visualization
Clusters
| Pattern | Possible Interpretation |
|---|---|
| Cluster by agent type | Different agents produce fundamentally different response styles |
| Cluster by rating | Quality issues manifest as similar semantic patterns |
| Cluster by phase | Courtroom phases elicit different response types |
| Isolated outliers | Unusual responses that don’t fit normal patterns |
| Overlapping clusters | Multiple factors contribute to semantic similarity |
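One way to put numbers on these patterns is to compare within-group and between-group distances for a candidate grouping (agent type, rating, or phase). The sketch below is purely illustrative, not an Atlas feature:

```typescript
interface LabeledPoint { x: number; y: number; label: string } // label = agent type, rating, or phase

// Mean pairwise distance within groups vs. across groups; a ratio well
// below 1 suggests the labels really do form separate clusters.
function clusterSeparation(points: LabeledPoint[]): number {
  let intra = 0, intraN = 0, inter = 0, interN = 0;
  for (let i = 0; i < points.length; i++) {
    for (let j = i + 1; j < points.length; j++) {
      const d = Math.hypot(points[i].x - points[j].x, points[i].y - points[j].y);
      if (points[i].label === points[j].label) { intra += d; intraN++; }
      else { inter += d; interN++; }
    }
  }
  if (intraN === 0 || interN === 0) return NaN; // need at least two labels
  return (intra / intraN) / (inter / interN);
}
```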
Filters
- Source: Human ratings vs Automated (LLM-judged) ratings
- Agent Type: Witness, Judge, Opposing Counsel
- Rating Label: Excellent (5), Good (4), Average (3), Poor (2), Bad (1)
- Phase: Courtroom phase where response occurred
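Since the visualization runs on DuckDB-WASM (see Troubleshooting below), filters like these plausibly translate into a SQL WHERE clause. A sketch, with a table and column names that are assumptions rather than the Atlas’s real schema:

```typescript
// Hypothetical filter state; values mirror the filter options above.
interface AtlasFilters {
  source?: "human" | "automated";
  agentType?: "witness" | "judge" | "opposing_counsel";
  ratingLabel?: 1 | 2 | 3 | 4 | 5;
  phase?: string;
}

// Build a SELECT over an assumed `ratings` table. String interpolation is
// used for brevity; real code should use parameterized queries.
function buildFilterQuery(f: AtlasFilters): string {
  const clauses: string[] = [];
  if (f.source) clauses.push(`source = '${f.source}'`);
  if (f.agentType) clauses.push(`agent_type = '${f.agentType}'`);
  if (f.ratingLabel) clauses.push(`rating_label = ${f.ratingLabel}`);
  if (f.phase) clauses.push(`phase = '${f.phase}'`);
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  return `SELECT id, x, y, rating_label FROM ratings${where}`;
}
```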
Practical Use Cases
- Quality Pattern Detection — Look for clusters of low-rated responses to reveal systematic quality issues in agent prompts.
- Human vs Automated Agreement — Compare cluster positions between human and automated ratings to verify evaluation criteria alignment.
- Agent Behavior Analysis — Filter by agent type to assess whether responses are repetitive (tight clusters) or varied (spread out).
- Finding Outliers — Points far from all clusters may be edge cases, data quality issues, or novel scenarios (see the sketch after this list).
- Evaluating Prompt Changes — After modifying an agent’s prompt, compare new response positions to historical data.
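A crude way to surface outlier candidates is to measure each point’s distance from the overall centroid and flag anything more than a couple of standard deviations out. An illustrative sketch, not an Atlas API:

```typescript
interface Point { id: string; x: number; y: number }

// Flag points whose distance from the centroid exceeds the mean distance
// by more than `zThreshold` standard deviations.
function findOutliers(points: Point[], zThreshold = 2): Point[] {
  const cx = points.reduce((s, p) => s + p.x, 0) / points.length;
  const cy = points.reduce((s, p) => s + p.y, 0) / points.length;
  const dists = points.map((p) => Math.hypot(p.x - cx, p.y - cy));
  const mean = dists.reduce((s, d) => s + d, 0) / dists.length;
  const std = Math.sqrt(
    dists.reduce((s, d) => s + (d - mean) ** 2, 0) / dists.length
  );
  return points.filter((_, i) => dists[i] > mean + zThreshold * std);
}
```

This flags global outliers only; a point sitting between two tight clusters would need a nearest-neighbor check instead.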
Data Pipeline
Embeddings are generated when a human or automated evaluation batch completes: each rated response, together with its context, feedback, and metadata, is concatenated into a single text, embedded with text-embedding-ada-002, and projected onto the 2D plane described above.
Troubleshooting
- “No Embeddings Yet”: Embeddings are generated only when a human or automated eval batch completes, so an empty atlas usually means no batch has finished yet.
- Points All in One Cluster: Check whether the responses genuinely have low semantic variety, or verify that embeddings are being generated correctly.
- Visualization Not Loading: Check browser console for CORS errors, DuckDB-WASM initialization failures, or memory issues.
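If the visualization fails to load, one way to isolate a DuckDB-WASM initialization failure is to run the standard bundle setup by hand and watch the console. This follows the stock @duckdb/duckdb-wasm jsDelivr setup; the Atlas’s own bootstrap may differ:

```typescript
import * as duckdb from "@duckdb/duckdb-wasm";

// Standard jsDelivr-hosted bundle setup from the duckdb-wasm docs,
// wrapped so CORS and instantiation errors surface clearly.
async function initDuckDB(): Promise<duckdb.AsyncDuckDB> {
  try {
    const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
    const workerUrl = URL.createObjectURL(
      new Blob([`importScripts("${bundle.mainWorker}");`], { type: "text/javascript" })
    );
    const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl));
    await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
    URL.revokeObjectURL(workerUrl);
    return db;
  } catch (err) {
    // CORS failures, WASM fetch errors, and memory issues all land here.
    console.error("DuckDB-WASM failed to initialize:", err);
    throw err;
  }
}
```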