Benchmarks
Structured evaluations of frontier AI models on TarantuBench scenarios. Each benchmark tests a specific set of models and configurations under controlled conditions. New benchmarks are added as models and tooling evolve.
TarantuBench is an open evaluation framework for measuring how frontier AI models perform on realistic web security scenarios — from simple injections to multi-step exploit chains.
Scenarios span SQL injection, XSS, auth bypass, SSRF, IDOR, command injection, JWT exploits, and multi-step chains.
Claude 4.5 Sonnet, GPT-5, and Gemini 3 Pro are evaluated with identical tooling and constraints.
Every scenario runs live in WebContainers, so you can interact with real vulnerable applications with no setup required.
Interactive security scenarios. Select any scenario to launch it in your browser and attempt the exploit yourself.