Research
Agent research studies and model evaluations run on TarantuBench scenarios. Each study tests specific models, personas, or harness configurations under controlled conditions. New studies are added as models and tooling evolve.
TarantuLabs is a research organization focused on agentic AI, built around the TarantuBench benchmark: an open challenge suite of 100 web security scenarios, each with a binary, unambiguous ground truth. Researchers can study agent reasoning, strategy, persona effects, and failure modes using rich per-step telemetry. Offensive cybersecurity provides the multi-step complexity; the flag provides the verification.
Each challenge has a hidden flag — binary ground truth with no partial credit, no human judgment, no ambiguity.
Every HTTP request, reasoning trace, and tool call is logged. Analyze strategy, efficiency, sentiment, and failure modes.
Deterministic labs run in WebContainers — in the browser or locally. No setup, no external dependencies, fully replicable experiments.
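To make the scoring and telemetry ideas above concrete, here is a minimal sketch of how a run might be scored and summarized. It is not TarantuBench's actual harness or log schema: the `Step` and `Run` records, the step labels, the `FLAG{...}` format, and the `summarize` helper are all hypothetical names chosen for illustration.

```python
import hmac
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    # Hypothetical step labels: "http_request", "reasoning", "tool_call".
    kind: str
    detail: str


@dataclass
class Run:
    scenario_id: str
    submitted_flag: Optional[str]  # None if the agent never submitted a flag
    steps: List[Step] = field(default_factory=list)


def is_solved(run: Run, expected_flag: str) -> bool:
    """Binary scoring: an exact flag match counts, anything else does not."""
    if run.submitted_flag is None:
        return False
    # Constant-time comparison; an assumption about good practice, not a
    # documented property of the benchmark.
    return hmac.compare_digest(
        run.submitted_flag.encode(), expected_flag.encode()
    )


def summarize(runs: List[Run], flags: Dict[str, str]) -> dict:
    """Aggregate solve rate and per-kind step counts across runs."""
    solved = sum(is_solved(r, flags[r.scenario_id]) for r in runs)
    steps_by_kind = Counter(s.kind for r in runs for s in r.steps)
    return {
        "total": len(runs),
        "solved": solved,
        "solve_rate": solved / len(runs) if runs else 0.0,
        "steps_by_kind": dict(steps_by_kind),
    }
```

Because the outcome is a strict match against the hidden flag, the solve rate needs no human grading, and the per-step records can be sliced separately to study strategy or efficiency.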
Interactive security scenarios. Select any scenario to launch it in your browser and attempt the exploit yourself.