
Documentation Index

Fetch the complete documentation index at: https://docs.litigationlabs.io/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Every AI call in CaseSim produces metrics — token counts, latency, model cost, and custom scores that capture the outcome of each courtroom interaction. These metrics flow into Langfuse automatically, giving you a real-time view of system performance and agent quality.

Automatic Metrics

Langfuse captures these for every generation (LLM call) without any additional configuration:
| Metric | What It Measures |
| --- | --- |
| Input tokens | How many tokens were in the prompt sent to the model |
| Output tokens | How many tokens the model generated in response |
| Total tokens | Sum of input + output |
| Latency | Time from request to complete response, in milliseconds |
| Model | Which model handled the call (e.g., gpt-4o, claude-sonnet-4-20250514) |
| Cost | Estimated cost based on token counts and model pricing |
These metrics are captured at the generation level, so you can compare performance across individual agent calls within a single trace.
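
To make the cost metric concrete, here is a minimal sketch of how a cost estimate is derived from token counts and per-model pricing. The price table below is an illustrative placeholder, not real model pricing, and `estimate_cost` is a hypothetical helper, not part of the Langfuse SDK.

```python
# Sketch: estimating cost from token counts and model pricing.
# Prices are hypothetical placeholders -- check your provider's real rates.
PRICES_PER_1K = {
    # model: (input price, output price) in USD per 1K tokens
    "gpt-4o": (0.0025, 0.01),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Combine input and output token counts with per-1K-token pricing."""
    price_in, price_out = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price_in + (output_tokens / 1000) * price_out

cost = estimate_cost("gpt-4o", input_tokens=1200, output_tokens=300)
```

Langfuse performs an equivalent calculation automatically using its built-in model price definitions.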

Custom Scores

Beyond automatic metrics, each traced route pushes custom scores into its trace output. These capture the outcome of the courtroom interaction, not just the infrastructure cost.

Turn Stream (courtroom/turn-stream)

When a player asks a question, the trace output includes:
  • scoreDelta — Points earned from the witness’s response (based on elicits unlocked).
  • rebuttalPoints — Points from cross-examination rebuttal coverage.
  • ocaDecision — Whether opposing counsel objected (objection or no_objection).
  • witnessResponse — The full text of the witness answer.
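
Put together, a turn-stream trace output is a small structured payload. The field names below come from the list above; the values are hypothetical examples.

```python
# Sketch of the payload a turn-stream trace might attach as its output.
# Field names match the docs; values are made up for illustration.
turn_stream_output = {
    "scoreDelta": 10,                 # points from elicits unlocked this turn
    "rebuttalPoints": 0,              # cross-examination rebuttal coverage
    "ocaDecision": "no_objection",    # "objection" or "no_objection"
    "witnessResponse": "I left the office at nine that evening.",
}
```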

Player Objection (courtroom/objection)

When a player objects to an OCA question:
  • playerObjectionAccuracy — Whether the player’s objection was legally correct.
  • playerObjectionRuling — The judge’s ruling (sustain or overrule).
  • playerCorrectCount — Running total of correct objections in the session.
  • playerIncorrectCount — Running total of incorrect objections.
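
The two running counters above can be maintained with a simple accumulator. This is a sketch under the assumption that each ruling increments exactly one counter; `record_player_objection` is a hypothetical helper, not an actual CaseSim function.

```python
# Sketch: updating the running objection tallies after each ruling.
def record_player_objection(session: dict, was_correct: bool) -> dict:
    """Increment playerCorrectCount or playerIncorrectCount for one objection."""
    key = "playerCorrectCount" if was_correct else "playerIncorrectCount"
    session[key] = session.get(key, 0) + 1
    return session

session = {}
record_player_objection(session, was_correct=True)
record_player_objection(session, was_correct=False)
record_player_objection(session, was_correct=True)
# session now tracks 2 correct and 1 incorrect objection
```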

OCA Question (courtroom/oca-question)

When the OCA generates a question during cross-examination:
  • questionGenerated — The full text of the question.
  • isIntentionallyDefective — Whether the question was designed to be objectionable (pedagogical trap).
  • coveredTopics — Topics already addressed in the examination.
  • pendingTopics — Topics remaining for OCA to cover.
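
The `coveredTopics`/`pendingTopics` pair behaves like a simple work queue: once the OCA asks about a topic, it moves from pending to covered. A minimal sketch, with hypothetical topic names:

```python
# Sketch: moving a topic from pendingTopics to coveredTopics once asked.
def mark_topic_covered(covered: list, pending: list, topic: str):
    """Transfer a topic from the pending list to the covered list."""
    if topic in pending:
        pending.remove(topic)
        covered.append(topic)
    return covered, pending

covered, pending = [], ["alibi", "timeline", "motive"]
mark_topic_covered(covered, pending, "timeline")
# covered: ["timeline"]; pending: ["alibi", "motive"]
```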

Session-Level Metrics

These metrics accumulate across an entire CaseSim session and are persisted in the database:
| Metric | Description |
| --- | --- |
| questionsAsked | Total questions submitted by the player |
| objectionsRaised | Total objections raised by OCA |
| objectionsSustained | How many OCA objections the judge sustained |
| playerObjectionsRaised | Total objections raised by the player |
| playerCorrectObjections | Player objections that were legally correct |
| playerIncorrectObjections | Player objections that were incorrect |
| playerMissedObjections | Defective OCA questions the player failed to catch |
| intentionallyIncorrectObjections | Pedagogical traps set by OCA |
| score.points | Running point total |
| score.elicitsHit | Number of key facts successfully established |
| score.unlockedElicitIds | Which specific facts have been proven |
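
The `score.*` fields in particular accumulate together: unlocking an elicit bumps the hit count, records the ID, and awards points. The sketch below assumes duplicate unlocks are ignored; the structure, point values, and `unlock_elicit` helper are assumptions based on the field names above.

```python
# Sketch: how the score.* session fields might accumulate per unlocked elicit.
score = {"points": 0, "elicitsHit": 0, "unlockedElicitIds": []}

def unlock_elicit(score: dict, elicit_id: str, points: int) -> dict:
    """Award points for a newly proven fact, ignoring repeat unlocks."""
    if elicit_id not in score["unlockedElicitIds"]:
        score["unlockedElicitIds"].append(elicit_id)
        score["elicitsHit"] += 1
        score["points"] += points
    return score

unlock_elicit(score, "elicit-alibi", 10)
unlock_elicit(score, "elicit-alibi", 10)   # duplicate: no double counting
unlock_elicit(score, "elicit-motive", 15)
```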

Using Metrics in Langfuse

Identifying Slow Calls

Filter generations by latency to find which agent calls are taking the longest. Common patterns:
  • Witness answers tend to be the slowest (longest output).
  • Judge rulings are typically fast (short, structured JSON output).
  • OCA objection checks vary — complex transcript context increases latency.
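
The same latency filtering can be done offline against exported trace data. The record shape below (`name`, `latency_ms`) and the 1-second threshold are assumptions for illustration:

```python
# Sketch: surfacing slow generations from exported trace data,
# slowest first. Record shape and threshold are illustrative.
generations = [
    {"name": "witness-answer", "latency_ms": 4200},
    {"name": "judge-ruling", "latency_ms": 600},
    {"name": "oca-objection-check", "latency_ms": 1800},
]

slow = sorted(
    (g for g in generations if g["latency_ms"] > 1000),
    key=lambda g: g["latency_ms"],
    reverse=True,
)
```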

Tracking Cost

Use Langfuse’s cost dashboard to see spend broken down by agent type, model, and time period. If cost spikes, you can identify whether it’s due to increased usage, longer prompts, or a model override change.

Comparing Agent Quality

Filter traces by tag (e.g., direct vs. cross) and compare custom scores. If scoreDelta is consistently low during cross-examination, it may indicate the witness agent isn’t surfacing elicits effectively during that phase.
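
That tag-based comparison amounts to grouping traces by phase and averaging scoreDelta. A minimal sketch over hypothetical exported records (real data would come from Langfuse, e.g., via its API):

```python
from collections import defaultdict

# Sketch: mean scoreDelta per phase tag. Trace records are hypothetical.
traces = [
    {"tag": "direct", "scoreDelta": 12},
    {"tag": "direct", "scoreDelta": 8},
    {"tag": "cross", "scoreDelta": 2},
    {"tag": "cross", "scoreDelta": 4},
]

by_tag = defaultdict(list)
for t in traces:
    by_tag[t["tag"]].append(t["scoreDelta"])

means = {tag: sum(vals) / len(vals) for tag, vals in by_tag.items()}
# A persistently low "cross" mean suggests the witness agent underperforms
# during cross-examination.
```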