Code repository for AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
security benchmarking benchmark research ai evaluations hacking artificial-intelligence cybersecurity ctf agents offensive-security ai-agents benchmark-datasets llm cyber-evals
Updated Sep 3, 2025 - Jupyter Notebook