Click the Evaluate button on any trace turn to score its branches. Each evaluation makes two judge calls with swapped candidate orderings to cancel position bias, then averages the scores. LLM evaluations are subjective — treat scores as a directional signal, not a measurement.
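The swapped-ordering average can be sketched as follows. This is a hypothetical illustration, not the app's actual code: `judgeOnce` stands in for one LLM judge call and returns `{ first, second }`, the scores for the candidates in the order they were shown.

```javascript
// Hypothetical sketch of the two-call, swapped-ordering evaluation.
// `judgeOnce(first, second)` is an assumed stand-in for one judge call.
async function evaluateTurn(judgeOnce, candA, candB) {
  const run1 = await judgeOnce(candA, candB); // A shown first
  const run2 = await judgeOnce(candB, candA); // B shown first
  // Average each candidate's score across both positions. A constant
  // first-position bonus now contributes equally to both averages, so
  // the comparison between A and B is unbiased.
  return {
    scoreA: (run1.first + run2.second) / 2,
    scoreB: (run1.second + run2.first) / 2,
  };
}
```

With a judge that always favors the first slot by a fixed amount, the gap between `scoreA` and `scoreB` comes out the same as it would with an unbiased judge, which is the point of running both orderings.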
The server binds to 127.0.0.1 only. Do not expose it on a network. Your API keys are sent from this page to the local backend and forwarded to Anthropic or OpenAI; they are never logged.
Cache pricing: cache reads bill at roughly 10% of the input-token rate, cache writes at roughly 125%. These figures are approximate; verify against current pricing. Prices are hardcoded in public/app.js. Note: Opus 4.7 uses a new tokenizer that may produce up to ~35% more tokens for the same text, so effective cost can be higher than the per-MTok rate suggests.
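A cost estimate using those multipliers can be sketched as below. The per-MTok rates here are illustrative placeholders, not the values hardcoded in public/app.js:

```javascript
// Illustrative per-MTok rates in USD (assumed, not the app's real prices).
const RATE_PER_MTOK = { input: 15, output: 75 };

// Estimate cost from token counts, billing cache reads at ~10% of the
// input rate and cache writes at ~125%, per the note above.
function estimateCost(tokens) {
  const { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } = tokens;
  const billedInput = input + cacheRead * 0.10 + cacheWrite * 1.25;
  return (billedInput * RATE_PER_MTOK.input + output * RATE_PER_MTOK.output) / 1e6;
}
```

For example, one million cache-written input tokens cost 1.25x what one million fresh input tokens would, while one million cache-read tokens cost only a tenth as much.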
A Playground for the Claude Advisor Tool
Created by iBuildWith.ai
Want to run it locally? Get it on GitHub
Check out the release notes
Contact Marcelo Lewin