
Documentation Index

Fetch the complete documentation index at: https://docs.litigationlabs.io/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Every AI call in CaseSim produces metrics — token counts, latency, model cost, and custom scores that capture the outcome of each courtroom interaction. These metrics flow into Langfuse automatically, giving you a real-time view of system performance and agent quality.

Automatic Metrics

Langfuse captures these for every generation (LLM call) without any additional configuration:
| Metric | What It Measures |
| --- | --- |
| Input tokens | How many tokens were in the prompt sent to the model |
| Output tokens | How many tokens the model generated in response |
| Total tokens | Sum of input + output |
| Latency | Time from request to complete response, in milliseconds |
| Model | Which model handled the call (e.g., gpt-4o, claude-sonnet-4-20250514) |
| Cost | Estimated cost based on token counts and model pricing |
These metrics are captured at the generation level, so you can compare performance across individual agent calls within a single trace.
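
To make the cost metric concrete, here is a minimal sketch of how a cost estimate is derived from token counts and per-model pricing. The price table below is an illustrative placeholder, not real model pricing, and `estimate_cost` is a hypothetical helper, not part of the Langfuse SDK.

```python
# Sketch: estimating cost from token counts and model pricing.
# Prices are hypothetical placeholders -- check your provider's real rates.
PRICES_PER_1K = {
    # model: (input price, output price) in USD per 1K tokens
    "gpt-4o": (0.0025, 0.01),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Combine input and output token counts with per-1K-token pricing."""
    price_in, price_out = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price_in + (output_tokens / 1000) * price_out

cost = estimate_cost("gpt-4o", input_tokens=1200, output_tokens=300)
```

Langfuse performs an equivalent calculation automatically using its built-in model price definitions.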

Custom Scores

Beyond automatic metrics, each traced route pushes custom scores into its trace output. These capture the outcome of the courtroom interaction, not just the infrastructure cost.

Turn Stream (courtroom/turn-stream)

When a player asks a question, the trace output includes:
  • scoreDelta — Points earned from the witness’s response (based on elicits unlocked).
  • rebuttalPoints — Points from cross-examination rebuttal coverage.
  • ocaDecision — Whether opposing counsel objected (objection or no_objection).
  • witnessResponse — The full text of the witness answer.
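
Put together, a turn-stream trace output is a small structured payload. The field names below come from the list above; the values are hypothetical examples.

```python
# Sketch of the payload a turn-stream trace might attach as its output.
# Field names match the docs; values are made up for illustration.
turn_stream_output = {
    "scoreDelta": 10,                 # points from elicits unlocked this turn
    "rebuttalPoints": 0,              # cross-examination rebuttal coverage
    "ocaDecision": "no_objection",    # "objection" or "no_objection"
    "witnessResponse": "I left the office at nine that evening.",
}
```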

Player Objection (courtroom/objection)

When a player objects to an OCA question:
  • playerObjectionAccuracy — Whether the player’s objection was legally correct.
  • playerObjectionRuling — The judge’s ruling (sustain or overrule).
  • playerCorrectCount — Running total of correct objections in the session.
  • playerIncorrectCount — Running total of incorrect objections.
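
The two running counters above can be maintained with a simple accumulator. This is a sketch under the assumption that each ruling increments exactly one counter; `record_player_objection` is a hypothetical helper, not an actual CaseSim function.

```python
# Sketch: updating the running objection tallies after each ruling.
def record_player_objection(session: dict, was_correct: bool) -> dict:
    """Increment playerCorrectCount or playerIncorrectCount for one objection."""
    key = "playerCorrectCount" if was_correct else "playerIncorrectCount"
    session[key] = session.get(key, 0) + 1
    return session

session = {}
record_player_objection(session, was_correct=True)
record_player_objection(session, was_correct=False)
record_player_objection(session, was_correct=True)
# session now tracks 2 correct and 1 incorrect objection
```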

OCA Question (courtroom/oca-question)

When the OCA generates a question during cross-examination:
  • questionGenerated — The full text of the question.
  • isIntentionallyDefective — Whether the question was designed to be objectionable (pedagogical trap).
  • coveredTopics — Topics already addressed in the examination.
  • pendingTopics — Topics remaining for OCA to cover.
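
The `coveredTopics`/`pendingTopics` pair behaves like a simple work queue: once the OCA asks about a topic, it moves from pending to covered. A minimal sketch, with hypothetical topic names:

```python
# Sketch: moving a topic from pendingTopics to coveredTopics once asked.
def mark_topic_covered(covered: list, pending: list, topic: str):
    """Transfer a topic from the pending list to the covered list."""
    if topic in pending:
        pending.remove(topic)
        covered.append(topic)
    return covered, pending

covered, pending = [], ["alibi", "timeline", "motive"]
mark_topic_covered(covered, pending, "timeline")
# covered: ["timeline"]; pending: ["alibi", "motive"]
```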

Session-Level Metrics

These metrics accumulate across an entire CaseSim session and are persisted in the database:
| Metric | Description |
| --- | --- |
| questionsAsked | Total questions submitted by the player |
| objectionsRaised | Total objections raised by OCA |
| objectionsSustained | How many OCA objections the judge sustained |
| playerObjectionsRaised | Total objections raised by the player |
| playerCorrectObjections | Player objections that were legally correct |
| playerIncorrectObjections | Player objections that were incorrect |
| playerMissedObjections | Defective OCA questions the player failed to catch |
| intentionallyIncorrectObjections | Pedagogical traps set by OCA |
| score.points | Running point total |
| score.elicitsHit | Number of key facts successfully established |
| score.unlockedElicitIds | Which specific facts have been proven |
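
The `score.*` fields in particular accumulate together: unlocking an elicit bumps the hit count, records the ID, and awards points. The sketch below assumes duplicate unlocks are ignored; the structure, point values, and `unlock_elicit` helper are assumptions based on the field names above.

```python
# Sketch: how the score.* session fields might accumulate per unlocked elicit.
score = {"points": 0, "elicitsHit": 0, "unlockedElicitIds": []}

def unlock_elicit(score: dict, elicit_id: str, points: int) -> dict:
    """Award points for a newly proven fact, ignoring repeat unlocks."""
    if elicit_id not in score["unlockedElicitIds"]:
        score["unlockedElicitIds"].append(elicit_id)
        score["elicitsHit"] += 1
        score["points"] += points
    return score

unlock_elicit(score, "elicit-alibi", 10)
unlock_elicit(score, "elicit-alibi", 10)   # duplicate: no double counting
unlock_elicit(score, "elicit-motive", 15)
```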

Using Metrics in Langfuse

Identifying Slow Calls

Filter generations by latency to find which agent calls are taking the longest. Common patterns:
  • Witness answers tend to be the slowest (longest output).
  • Judge rulings are typically fast (short, structured JSON output).
  • OCA objection checks vary — complex transcript context increases latency.
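
The same latency filtering can be done offline against exported trace data. The record shape below (`name`, `latency_ms`) and the 1-second threshold are assumptions for illustration:

```python
# Sketch: surfacing slow generations from exported trace data,
# slowest first. Record shape and threshold are illustrative.
generations = [
    {"name": "witness-answer", "latency_ms": 4200},
    {"name": "judge-ruling", "latency_ms": 600},
    {"name": "oca-objection-check", "latency_ms": 1800},
]

slow = sorted(
    (g for g in generations if g["latency_ms"] > 1000),
    key=lambda g: g["latency_ms"],
    reverse=True,
)
```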

Tracking Cost

Use Langfuse’s cost dashboard to see spend broken down by agent type, model, and time period. If cost spikes, you can identify whether it’s due to increased usage, longer prompts, or a model override change.

Comparing Agent Quality

Filter traces by tag (e.g., direct vs. cross) and compare custom scores. If scoreDelta is consistently low during cross-examination, it may indicate the witness agent isn’t surfacing elicits effectively during that phase.
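
That tag-based comparison amounts to grouping traces by phase and averaging scoreDelta. A minimal sketch over hypothetical exported records (real data would come from Langfuse, e.g., via its API):

```python
from collections import defaultdict

# Sketch: mean scoreDelta per phase tag. Trace records are hypothetical.
traces = [
    {"tag": "direct", "scoreDelta": 12},
    {"tag": "direct", "scoreDelta": 8},
    {"tag": "cross", "scoreDelta": 2},
    {"tag": "cross", "scoreDelta": 4},
]

by_tag = defaultdict(list)
for t in traces:
    by_tag[t["tag"]].append(t["scoreDelta"])

means = {tag: sum(vals) / len(vals) for tag, vals in by_tag.items()}
# A persistently low "cross" mean suggests the witness agent underperforms
# during cross-examination.
```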