## Overview
Every AI call in CaseSim produces metrics — token counts, latency, model cost, and custom scores that capture the outcome of each courtroom interaction. These metrics flow into Langfuse automatically, giving you a real-time view of system performance and agent quality.

## Automatic Metrics
Langfuse captures these for every generation (LLM call) without any additional configuration:

| Metric | What It Measures |
|---|---|
| Input tokens | How many tokens were in the prompt sent to the model |
| Output tokens | How many tokens the model generated in response |
| Total tokens | Sum of input + output |
| Latency | Time from request to complete response, in milliseconds |
| Model | Which model handled the call (e.g., gpt-4o, claude-sonnet-4-20250514) |
| Cost | Estimated cost based on token counts and model pricing |
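With the Langfuse TypeScript SDK, a route only has to open and end a generation for all of these to be captured. A minimal sketch, assuming the SDK's standard trace/generation API (the `witness-answer` name and `callModel` helper are illustrative, not part of CaseSim):

```ts
import { Langfuse } from "langfuse";

// Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY from the environment.
const langfuse = new Langfuse();

// Stand-in for whatever actually calls the model in your route.
declare function callModel(
  question: string
): Promise<{ text: string; inputTokens: number; outputTokens: number }>;

export async function askWitness(question: string): Promise<string> {
  const trace = langfuse.trace({ name: "courtroom/turn-stream" });

  // Creating the generation records the start time; end() records the
  // end time, so latency is derived without extra bookkeeping.
  const generation = trace.generation({
    name: "witness-answer",
    model: "gpt-4o",
    input: [{ role: "user", content: question }],
  });

  const answer = await callModel(question);

  // Reported token counts let Langfuse compute total tokens and estimate
  // cost from its per-model pricing.
  generation.end({
    output: answer.text,
    usage: { input: answer.inputTokens, output: answer.outputTokens },
  });

  await langfuse.flushAsync(); // make sure buffered events are delivered
  return answer.text;
}
```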
## Custom Scores
Beyond automatic metrics, each traced route pushes custom scores into its trace output. These capture the outcome of the courtroom interaction, not just the infrastructure cost.

### Turn Stream (`courtroom/turn-stream`)
When a player asks a question, the trace output includes:
- `scoreDelta` — Points earned from the witness’s response (based on elicits unlocked).
- `rebuttalPoints` — Points from cross-examination rebuttal coverage.
- `ocaDecision` — Whether opposing counsel objected (`objection` or `no_objection`).
- `witnessResponse` — The full text of the witness answer.
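A rough sketch of how a route might push these values, assuming the Langfuse TypeScript SDK (the `recordTurnOutcome` helper is illustrative; the other routes below follow the same shape with their own fields):

```ts
import type { LangfuseTraceClient } from "langfuse";

interface TurnOutcome {
  scoreDelta: number;
  rebuttalPoints: number;
  ocaDecision: "objection" | "no_objection";
  witnessResponse: string;
}

function recordTurnOutcome(trace: LangfuseTraceClient, outcome: TurnOutcome) {
  // Attach the outcome to the trace output so it is visible on the trace view.
  trace.update({ output: outcome });

  // Numeric fields can also be pushed as Langfuse scores, which makes them
  // filterable and chartable across traces in the UI.
  trace.score({ name: "scoreDelta", value: outcome.scoreDelta });
  trace.score({ name: "rebuttalPoints", value: outcome.rebuttalPoints });
}
```

Writing numeric outcomes as scores, in addition to the trace output, is what makes them aggregatable in the quality-trend views described later.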
### Player Objection (`courtroom/objection`)
When a player objects to an OCA question:
- `playerObjectionAccuracy` — Whether the player’s objection was legally correct.
- `playerObjectionRuling` — The judge’s ruling (`sustain` or `overrule`).
- `playerCorrectCount` — Running total of correct objections in the session.
- `playerIncorrectCount` — Running total of incorrect objections.
### OCA Question (`courtroom/oca-question`)
When the OCA generates a question during cross-examination:
- `questionGenerated` — The full text of the question.
- `isIntentionallyDefective` — Whether the question was designed to be objectionable (pedagogical trap).
- `coveredTopics` — Topics already addressed in the examination.
- `pendingTopics` — Topics remaining for OCA to cover.
## Session-Level Metrics
These metrics accumulate across an entire CaseSim session and are persisted in the database:

| Metric | Description |
|---|---|
| `questionsAsked` | Total questions submitted by the player |
| `objectionsRaised` | Total objections raised by OCA |
| `objectionsSustained` | How many OCA objections the judge sustained |
| `playerObjectionsRaised` | Total objections raised by the player |
| `playerCorrectObjections` | Player objections that were legally correct |
| `playerIncorrectObjections` | Player objections that were incorrect |
| `playerMissedObjections` | Defective OCA questions the player failed to catch |
| `intentionallyIncorrectObjections` | Pedagogical traps set by OCA |
| `score.points` | Running point total |
| `score.elicitsHit` | Number of key facts successfully established |
| `score.unlockedElicitIds` | Which specific facts have been proven |
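Expressed as a TypeScript type, the persisted shape corresponds to something like the sketch below (the `SessionMetrics` name is illustrative; the fields mirror the table above):

```ts
interface SessionMetrics {
  questionsAsked: number;
  objectionsRaised: number;
  objectionsSustained: number;
  playerObjectionsRaised: number;
  playerCorrectObjections: number;
  playerIncorrectObjections: number;
  playerMissedObjections: number;
  intentionallyIncorrectObjections: number;
  score: {
    points: number;
    elicitsHit: number;
    unlockedElicitIds: string[]; // which specific facts have been proven
  };
}
```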
## Using Metrics in Langfuse
### Identifying Slow Calls
Filter generations by latency to find which agent calls are taking the longest. Common patterns:

- Witness answers tend to be the slowest (longest output).
- Judge rulings are typically fast (short, structured JSON output).
- OCA objection checks vary — complex transcript context increases latency.
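The same investigation can be done programmatically. A sketch using Langfuse's public observations endpoint (basic auth with your project keys), with latency computed client-side from the returned timestamps:

```ts
const baseUrl = process.env.LANGFUSE_BASEURL ?? "https://cloud.langfuse.com";
const auth = Buffer.from(
  `${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
).toString("base64");

const res = await fetch(
  `${baseUrl}/api/public/observations?type=GENERATION&limit=100`,
  { headers: { Authorization: `Basic ${auth}` } }
);
const { data } = await res.json();

// Rank generations by wall-clock latency (endTime - startTime).
const slowest = data
  .map((g: { name: string; model: string; startTime: string; endTime: string }) => ({
    name: g.name,
    model: g.model,
    latencyMs:
      new Date(g.endTime).getTime() - new Date(g.startTime).getTime(),
  }))
  .sort((a, b) => b.latencyMs - a.latencyMs)
  .slice(0, 10);

console.table(slowest);
```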
### Tracking Cost
Use Langfuse’s cost dashboard to see spend broken down by agent type, model, and time period. If cost spikes, you can identify whether it’s due to increased usage, longer prompts, or a model override change.
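For the same breakdown outside the UI, Langfuse exposes a daily metrics endpoint; a sketch, assuming the `/api/public/metrics/daily` path and its per-day response shape:

```ts
const baseUrl = process.env.LANGFUSE_BASEURL ?? "https://cloud.langfuse.com";
const auth = Buffer.from(
  `${process.env.LANGFUSE_PUBLIC_KEY}:${process.env.LANGFUSE_SECRET_KEY}`
).toString("base64");

const res = await fetch(`${baseUrl}/api/public/metrics/daily`, {
  headers: { Authorization: `Basic ${auth}` },
});
const { data } = await res.json();

// One entry per day: total cost plus a per-model usage breakdown, which
// is enough to tell a usage spike from a model override change.
for (const day of data) {
  console.log(day.date, day.totalCost, day.usage);
}
```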
### Spotting Quality Trends

Filter traces by tag (e.g., `direct` vs. `cross`) and compare custom scores. If `scoreDelta` is consistently low during cross-examination, it may indicate the witness agent isn’t surfacing elicits effectively during that phase.
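For that tag filter to work, each trace needs the phase attached when it is created. A minimal sketch with the TypeScript SDK, using the tag values mentioned above:

```ts
import { Langfuse } from "langfuse";

const langfuse = new Langfuse();

// Tag each trace with the examination phase so traces can be filtered
// and compared by "direct" vs. "cross" in the Langfuse UI.
const trace = langfuse.trace({
  name: "courtroom/turn-stream",
  tags: ["cross"], // or ["direct"]
});
```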