Documentation Index
Fetch the complete documentation index at: https://docs.litigationlabs.io/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Every courtroom session in CaseSim involves dozens of AI decisions: the opposing counsel chooses whether to object, the judge weighs a ruling, a witness formulates an answer. Trace observability is the system that records each of these decisions — what went in, what came out, and how long it took — so you can review the full reasoning chain after a session ends.
Think of it as a court reporter for the AI. Where the session transcript shows you what happened in the courtroom, trace observability shows you why it happened — the exact instructions each agent received and the raw responses it produced.
Key Concepts
Traces
A trace is a single request-level record. When you ask a question during cross-examination, the system creates one trace for that entire interaction. The trace captures:
- Input — What triggered the request (your question, an objection, a procedural action).
- Output — The final result (the witness answer, the judge’s ruling, the decision of the opposing counsel agent (OCA)).
- Metadata — Session ID, scenario, witness, trial phase, and timing.
Each courtroom action maps to a named trace:
| Action | Trace Name | What It Records |
|---|---|---|
| You ask a witness a question | courtroom/turn-stream | Your question, OCA’s objection decision, witness response, score changes |
| You object to OCA’s question | courtroom/objection | Your objection type and basis, judge’s ruling, whether the witness answered |
| You reply to OCA’s objection | courtroom/reply | Your reply strategy, OCA’s counter-argument, judge’s ruling |
| You allow or proceed past an OCA question | courtroom/proceed | The procedural action, plus the resulting judge ruling or witness response |
| OCA generates a new question | courtroom/oca-question | The generated question, whether it was intentionally defective, linked facts |
| Cross-examination begins | courtroom/begin-cross | Which side examines first, the witness on the stand |
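As a concrete illustration, a single courtroom/turn-stream trace might be represented as a record like the following. The field names and values here are illustrative sketches, not the exact Langfuse schema:

```python
# Illustrative trace record; field names are hypothetical, not the
# exact Langfuse schema.
trace = {
    "name": "courtroom/turn-stream",
    "session_id": "sess_abc123",  # hypothetical session ID
    "input": {"question": "Isn't it true that you saw the defendant at 9 PM?"},
    "output": {
        "objection": "leading",
        "ruling": "sustained",
        "witness_answered": False,
    },
    "metadata": {"scenario": "state-v-doe", "phase": "direct", "witness": "w-01"},
    "tags": ["courtroom", "direct"],
}

print(trace["name"])  # → courtroom/turn-stream
```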
Generations
Nested inside each trace are one or more generations — individual calls to the language model. A single trace for a player question might contain three generations:
- OCA objection check — Did opposing counsel object?
- Judge ruling — If so, was it sustained or overruled?
- Witness answer — What did the witness say?
Each generation records the system prompt, the conversation messages sent to the model, the model’s raw response, token usage, and latency.
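A minimal sketch of how generations nest inside a trace, and how per-generation latency data can pinpoint which agent call dominated a slow turn. The data structures here are illustrative, not the Langfuse SDK:

```python
from dataclasses import dataclass, field

@dataclass
class Generation:
    function_id: str   # e.g. "judge-ruling" or "witness-answer"
    latency_ms: int
    tokens_in: int = 0
    tokens_out: int = 0

@dataclass
class Trace:
    name: str
    generations: list[Generation] = field(default_factory=list)

    def slowest(self) -> Generation:
        # The generation that contributed most to this trace's duration.
        return max(self.generations, key=lambda g: g.latency_ms)

# The three generations from the player-question example above:
turn = Trace("courtroom/turn-stream", [
    Generation("oca-objection-check", latency_ms=420),
    Generation("judge-ruling", latency_ms=310),
    Generation("witness-answer", latency_ms=1650),
])
print(turn.slowest().function_id)  # → witness-answer
```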
Sessions and Users
Traces are grouped by session ID, which corresponds to your CaseSim session. This means you can open a single session in the dashboard and see every AI interaction from start to finish, in order — a complete audit trail of the entire simulated trial.
What You Can See
The Full Decision Chain
For any courtroom moment, you can trace the complete chain of AI reasoning. For example, when you ask a leading question during direct examination:
- Input: Your question (“Isn’t it true that you saw the defendant at 9 PM?”)
- OCA generation: The opposing counsel agent receives the question, the trial context, and its instructions. It decides to object for leading.
- Judge generation: The judge agent receives the objection, the question, and the applicable rules. It rules to sustain.
- Output: The objection was sustained, the witness did not answer, and your score reflects the sustained objection.
Every step is visible. You can read the exact prompt the judge received and understand precisely why it ruled the way it did.
Each generation includes latency data. If a session felt slow, you can identify which agent call took the longest and whether the delay was in the model response or in processing.
Patterns Across Sessions
Because traces carry metadata (scenario, phase, witness, tags), you can filter across sessions to answer questions like:
- How often does the judge sustain objections in the direct examination phase?
- Which witness generates the longest model responses?
- Are OCA questions becoming repetitive in longer sessions?
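The first question above could be answered with a filter-and-aggregate pass like this. The trace dicts are illustrative; in practice you would pull them from the Langfuse dashboard or an export rather than hard-coding them:

```python
# Hypothetical exported traces; the shapes mirror the fields described above.
traces = [
    {"name": "courtroom/objection", "metadata": {"phase": "direct"},
     "output": {"ruling": "sustained"}},
    {"name": "courtroom/objection", "metadata": {"phase": "direct"},
     "output": {"ruling": "overruled"}},
    {"name": "courtroom/objection", "metadata": {"phase": "cross"},
     "output": {"ruling": "sustained"}},
]

# Filter to objection traces in the direct-examination phase.
direct = [t for t in traces
          if t["name"] == "courtroom/objection"
          and t["metadata"]["phase"] == "direct"]

sustain_rate = sum(t["output"]["ruling"] == "sustained" for t in direct) / len(direct)
print(f"{sustain_rate:.0%}")  # → 50%
```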
Accessing the Dashboard
The trace dashboard is hosted on Langfuse, a dedicated observability platform. Access it through LLabs Connect at the URL provided by your team administrator.
Navigating a Session
- Filter by session — Enter the session ID (visible in your browser URL during a CaseSim session) to see all traces for that session, ordered chronologically.
- Select a trace — Click any trace to expand it. The top level shows the input and output summary.
- Inspect generations — Expand the nested generation spans to see the full prompt, model response, and token counts.
Reading a Trace
Each trace displays:
| Field | What It Means |
|---|---|
| Name | The courtroom action (e.g., courtroom/turn-stream) |
| Input | The triggering data — your question, objection details, or procedural action |
| Output | The result — witness response, ruling, score changes, or flow outcome |
| Tags | Categorization labels like courtroom, the phase (direct, cross), and action type |
| Duration | Total time from request start to stream completion |
| Metadata | Scenario ID, witness ID, player side, and trial phase |
Reading a Generation
Inside each trace, generation spans show:
| Field | What It Means |
|---|---|
| Function ID | Which agent produced this generation (e.g., judge-ruling, witness-answer, oca-objection-check) |
| Input messages | The system prompt and conversation history sent to the model |
| Output | The model’s raw response text |
| Model | Which language model was used |
| Tokens | Input and output token counts |
| Latency | Time for this specific model call |
Traced Routes
Beyond courtroom interactions, the following operations are also traced:
| Route | Trace Name | Purpose |
|---|---|---|
| Session insights generation | insights/generate | Post-session analysis and performance summary |
| Automated evaluations | evals/automated | Batch quality scoring of agent responses |
| Prompt tuning | prompt-tuning/generate | Generating prompt variations for agent improvement |
| Scenario generation | generate-scenario | AI-assisted creation of new case scenarios |
Practical Uses
Reviewing a Specific Ruling
If a judge ruling felt incorrect during your session, locate the trace for that turn. Expand the judge generation to read the full prompt and the model’s reasoning. This reveals whether the issue was in the prompt instructions, the context provided, or the model’s interpretation.
Understanding OCA Behavior
If opposing counsel seemed to object too frequently or not enough, filter traces by courtroom/turn-stream for your session. Each trace’s output includes the OCA’s decision and reasoning, letting you see the pattern across the full examination.
Validating Score Calculations
Turn-stream traces include score deltas in their output — points awarded, elicits unlocked, and rebuttal coverage. If a score seems off, the trace shows exactly which facts the system matched and what points were assigned.
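For example, if the turn-stream output carries per-fact score deltas, the session total can be recomputed and cross-checked against the displayed score. The delta field names below are hypothetical:

```python
# Hypothetical score-delta entries as they might appear in a trace's output.
deltas = [
    {"fact": "defendant-at-scene", "points": 10},
    {"fact": "time-of-arrival", "points": 5},
    {"fact": "leading-question-penalty", "points": -3},
]

# Recompute the total and compare it with the score shown in the session UI.
recomputed = sum(d["points"] for d in deltas)
print(recomputed)  # → 12
```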
Comparing Sessions
Run the same scenario twice with different questioning strategies. Filter each session’s traces side by side to see how different approaches affected OCA behavior, judge rulings, and witness responses.
Architecture Summary
The observability system operates in two layers:
- Trace layer — Each route handler wraps its logic in a trace context that records input, output, session metadata, and tags. This is the high-level “what happened” record.
- Generation layer — Each call to the language model (via the AI SDK) emits telemetry that Langfuse captures as a nested generation span. This is the low-level “what the model saw and said” record.
Both layers flush their data asynchronously after the response is sent to your browser, so tracing adds no perceptible latency to your session.
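In outline, the trace layer can be pictured as a context manager wrapped around each route handler. This is a simplified sketch of the pattern, not the actual CaseSim implementation; the `pending` queue stands in for the asynchronous flush described above:

```python
import time
from contextlib import contextmanager

pending = []  # records queued here; a background task would flush them later

@contextmanager
def trace(name, session_id, **metadata):
    # Open a trace record, hand it to the route handler, and finalize it
    # regardless of whether the handler succeeds.
    record = {"name": name, "session_id": session_id,
              "metadata": metadata, "start": time.monotonic()}
    try:
        yield record
    finally:
        record["duration_s"] = time.monotonic() - record["start"]
        pending.append(record)  # queued for asynchronous flush

# A route handler wraps its logic in the trace context:
with trace("courtroom/turn-stream", "sess_abc123", phase="direct") as t:
    t["input"] = {"question": "Where were you at 9 PM?"}
    t["output"] = {"objection": None, "answered": True}

print(pending[0]["name"])  # → courtroom/turn-stream
```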