
Overview

The Embedding Atlas is an interactive visualization tool that displays all evaluation ratings as points in a 2D semantic space. Each point represents an agent response (witness, judge, or opposing counsel) that has been rated either by a human evaluator or by automated LLM evaluation.

What Are Embeddings?

Embeddings are high-dimensional vector representations of text that capture semantic meaning. When we evaluate an agent response, we create an embedding that encodes:
  • The agent’s response (the actual message text)
  • Context (preceding conversation/question that prompted the response)
  • Feedback (human or automated reasoning about quality)
  • Metadata (agent type, courtroom phase)
These embeddings use OpenAI’s text-embedding-ada-002 model, producing 1536-dimensional vectors.
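A minimal sketch of the generation step, assuming the official openai Python SDK; the field names (response_text, context, feedback) and the concatenation format are illustrative, not the pipeline's actual schema:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_rating(response_text: str, context: str, feedback: str,
                 agent_type: str, phase: str) -> list[float]:
    """Embed one rated response. Field names and document layout here
    are assumptions for illustration, not the production schema."""
    document = (
        f"Agent: {agent_type}\n"
        f"Phase: {phase}\n"
        f"Context: {context}\n"
        f"Response: {response_text}\n"
        f"Feedback: {feedback}"
    )
    resp = client.embeddings.create(
        model="text-embedding-ada-002",
        input=document,
    )
    return resp.data[0].embedding  # 1536-dimensional vector
```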

Understanding the Axes

The visualization projects the 1536 dimensions onto a 2D plane by selecting the two dimensions with the highest variance across all embeddings. The axes do NOT have fixed semantic meanings; they represent whichever embedding dimensions contain the most variation in your current dataset.

Key insight: The absolute position on X or Y is not meaningful. What matters is the relative distances between points.
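The projection itself is simple to sketch. A minimal NumPy version of the idea described above (the Atlas may implement it differently):

```python
import numpy as np

def project_top_variance(embeddings: np.ndarray) -> np.ndarray:
    """Project an (n, 1536) embedding matrix onto the two dimensions
    with the highest variance across the dataset."""
    variances = embeddings.var(axis=0)           # variance per dimension
    top_two = np.argsort(variances)[-2:][::-1]   # indices of the two largest
    return embeddings[:, top_two]                # shape (n, 2): x and y
```

Because the selected dimensions depend on the dataset, adding or removing points can change which dimensions become the axes. This is why absolute positions are not comparable across sessions.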

Interpreting the Visualization

Clusters

  • Cluster by agent type: Different agents produce fundamentally different response styles
  • Cluster by rating: Quality issues manifest as similar semantic patterns
  • Cluster by phase: Courtroom phases elicit different response types
  • Isolated outliers: Unusual responses that don’t fit normal patterns
  • Overlapping clusters: Multiple factors contribute to semantic similarity

Filters

  • Source: Human ratings vs Automated (LLM-judged) ratings
  • Agent Type: Witness, Judge, Opposing Counsel
  • Rating Label: Excellent (5), Good (4), Average (3), Poor (2), Bad (1)
  • Phase: Courtroom phase where response occurred
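Filters are applied when fetching points from the API. A hedged sketch of querying /api/embeddings, where the query parameter names (source, agent_type, rating_label, phase) are assumptions mirroring the filter list above, not a documented interface:

```python
import requests

BASE_URL = "https://your-deployment.example"  # placeholder for your instance

# Hypothetical query parameters; the real /api/embeddings interface
# may name or encode these differently.
params = {
    "source": "human",             # or "automated"
    "agent_type": "witness",       # "witness", "judge", "opposing_counsel"
    "rating_label": 2,             # Poor
    "phase": "cross_examination",  # example phase value
}

resp = requests.get(f"{BASE_URL}/api/embeddings", params=params)
resp.raise_for_status()
points = resp.json()  # assumed: list of point records with embeddings and metadata
```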

Practical Use Cases

  1. Quality Pattern Detection — Look for clusters of low-rated responses to reveal systematic quality issues in agent prompts.
  2. Human vs Automated Agreement — Compare cluster positions between human and automated ratings to verify evaluation criteria alignment.
  3. Agent Behavior Analysis — Filter by agent type to assess whether responses are repetitive (tight clusters) or varied (spread out).
  4. Finding Outliers — Points far from all clusters may be edge cases, data quality issues, or novel scenarios (see the scoring sketch after this list).
  5. Evaluating Prompt Changes — After modifying an agent’s prompt, compare new response positions to historical data.
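For use case 4, one simple way to score outliers in the projected space is distance to the nearest cluster centroid. A sketch using scikit-learn, which is not necessarily part of the Atlas stack; the choice of k is arbitrary and should match the number of visible clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

def outlier_scores(points_2d: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each projected point by its distance to the nearest of k
    cluster centroids; large distances flag candidates for review."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points_2d)
    return np.linalg.norm(points_2d - km.cluster_centers_[km.labels_], axis=1)
```

Points with the highest scores are the ones to open first; whether they are edge cases or data quality issues still takes manual reading.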

Data Pipeline

Agent Response -> Eval Rating -> Embedding Generation -> eval_embeddings DB -> /api/embeddings -> Atlas Visualization

Troubleshooting

  • “No Embeddings Yet”: Embeddings are generated only after a human or automated eval batch completes; run at least one batch to populate the Atlas.
  • Points All in One Cluster: Check for low semantic variety in the rated responses, or verify that embeddings are being generated correctly (see the sanity-check sketch below).
  • Visualization Not Loading: Check the browser console for CORS errors, DuckDB-WASM initialization failures, or memory issues.
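For the one-cluster symptom, a quick sanity check over raw vectors, assuming you can export them from the eval_embeddings table as a NumPy array:

```python
import numpy as np

def embedding_sanity_check(embeddings: np.ndarray) -> None:
    """Flag degenerate embedding sets: (near-)identical vectors collapse
    every point into a single cluster in the Atlas."""
    total_var = embeddings.var(axis=0).sum()
    n_unique = len(np.unique(embeddings, axis=0))
    print(f"total variance: {total_var:.6f}, "
          f"unique vectors: {n_unique}/{len(embeddings)}")
    if total_var < 1e-6 or n_unique <= 1:
        print("WARNING: embeddings are (nearly) identical; check generation.")
```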