Documentation Index
Fetch the complete documentation index at: https://docs.litigationlabs.io/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Langfuse captures the full prompt, messages, and model response for every agent call in CaseSim. While LitigationLabs doesn’t use Langfuse’s Playground feature directly, the generation data in Langfuse gives you everything you need to test and iterate on prompts.
Using Generations as a Playground
Every generation span in Langfuse contains:
- The system prompt — The full instructions the agent received.
- The message history — The conversation context sent to the model.
- The model response — What the agent produced.
- The model and parameters — Which model, temperature, and tokens were used.
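The fields above map naturally onto a chat-completion request. A minimal sketch of that mapping (the `Generation` class and its field names are illustrative, not the Langfuse SDK's actual types):

```python
from dataclasses import dataclass, field

@dataclass
class Generation:
    """Local model of the fields a Langfuse generation span records.
    Names are illustrative; check the actual span in the Langfuse UI."""
    system_prompt: str        # full instructions the agent received
    messages: list            # conversation context sent to the model
    response: str             # what the agent produced
    model: str                # e.g. "gpt-4o"
    temperature: float
    usage: dict = field(default_factory=dict)  # token counts, if recorded

def replay_payload(gen: Generation) -> dict:
    """Assemble a chat-completion style request from a recorded generation,
    so the same context can be re-run in a playground or API client."""
    return {
        "model": gen.model,
        "temperature": gen.temperature,
        "messages": [{"role": "system", "content": gen.system_prompt}, *gen.messages],
    }
```

Because everything the model saw is captured in the span, this payload is enough to reproduce the call exactly.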
Replaying a Prompt
- Open a trace in Langfuse and find the generation you want to test.
- Copy the system prompt and messages from the generation details.
- Paste them into any LLM playground (OpenAI Playground, Anthropic Console, or Langfuse’s built-in playground if enabled).
- Modify the prompt and re-run to see how the output changes.
- When satisfied, update the agent config in Payload to deploy the new prompt.
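Step 4 amounts to swapping the system message while keeping the rest of the copied context intact. A small helper for that (the request shape follows the common chat-completion format; adapt it to whichever playground you paste into):

```python
import copy

def with_new_system_prompt(request: dict, new_prompt: str) -> dict:
    """Return a copy of a replay request with the system message replaced,
    leaving the message history untouched for a fair comparison."""
    modified = copy.deepcopy(request)  # don't mutate the original capture
    for msg in modified["messages"]:
        if msg["role"] == "system":
            msg["content"] = new_prompt
            break
    return modified
```

Keeping the original request untouched lets you re-run both versions side by side before committing the change in Payload.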
Comparing Across Sessions
To evaluate a prompt change:
- Note the generation outputs for a specific scenario before the change.
- Apply the prompt update via the Prompt Editor.
- Run the same scenario again.
- Compare the new generations against the old ones in Langfuse.
Filter by trace name (e.g. courtroom/turn-stream) and date range to isolate before/after data.
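Once you have the generations for a scenario, the date-range filter boils down to partitioning records around the deployment time of the prompt change. A sketch, assuming each exported record carries a `timestamp` and an `output` (hypothetical field names):

```python
from datetime import datetime, timezone

def split_before_after(generations: list, change_time: datetime) -> tuple:
    """Partition generation records around the time a prompt change was
    deployed, so old and new outputs can be compared side by side."""
    before = [g for g in generations if g["timestamp"] < change_time]
    after = [g for g in generations if g["timestamp"] >= change_time]
    return before, after
```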
Testing with Evaluations
For more structured testing, use the evaluation system:
- Run an automated eval batch against sessions that used the old prompt.
- Update the prompt and run new sessions.
- Run another eval batch against the new sessions.
- Compare metric scores (Affidavit Faithfulness, Ruling Correctness, etc.) between the two batches.
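The final comparison step can be sketched as computing the per-metric delta in mean score between the two batches (the input shape, a metric-name-to-scores mapping, is an assumption about how you export the eval results):

```python
from statistics import mean

def compare_batches(old_scores: dict, new_scores: dict) -> dict:
    """Compare mean eval scores per metric between two batches.
    Each input maps a metric name (e.g. 'Ruling Correctness') to a
    list of per-session scores; only shared metrics are compared."""
    deltas = {}
    for metric in old_scores.keys() & new_scores.keys():
        deltas[metric] = round(mean(new_scores[metric]) - mean(old_scores[metric]), 3)
    return deltas
```

A positive delta means the new prompt scored higher on that metric across its batch.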