v1.0 Released

DecodingTrust Agent Platform

A Real-World Simulation Platform for Advanced Red-Teaming of AI Agents

Powered by DT-Red, our autonomous red-teaming agent, and DT-Bench, a comprehensive benchmark with 30+ sandbox environments, 15+ domains, and 500+ tasks per domain.

A research collaboration • Paper available on arXiv

Core Capabilities

Comprehensive Security Evaluation

Built for researchers and practitioners to rigorously test AI agent security across real-world regulatory scenarios.

30+ environments

High-Fidelity Sandboxes

30+ realistic environments, including Gmail, PayPal, and Databricks, spanning domains such as finance, healthcare, and e-commerce.

Real policies

Policy-Aligned Evaluation

Risks are derived from domain-specific policies, such as FINRA rules in finance and the Salesforce AI Use Policy, so that evaluations reflect regulatory compliance requirements.

Novel approach

DT-Red: Autonomous Red-Team Agent

First autonomous agent that iteratively optimizes attack vectors and injection locations to uncover vulnerabilities.

Universal

Black-Box Evaluation

A unified protocol that supports evaluating any agentic system, including Claude Code, Cursor, and custom agents.

500+ tasks

Comprehensive Task Coverage

Over 500 benign and malicious tasks per domain, for thorough security evaluation across attack surfaces.

High ASR (attack success rate)

Scalable Discovery

Efficiently discover diverse, policy-aligned attack vectors with high success rates through automated optimization.
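
The DT-Red capability above describes iteratively optimizing attack vectors and injection locations. The page does not say how DT-Red does this, so the following is only a minimal sketch of one generic approach (greedy coordinate search against a black-box score), not DT-Red's actual algorithm. The function `iterative_search`, the `score` callback, and the candidate names are all hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence, Tuple

# A candidate pairs an attack vector with an injection location.
# Both are opaque labels here; nothing below is DT-Red's real interface.
Candidate = Tuple[str, str]


def iterative_search(
    vectors: Sequence[str],
    locations: Sequence[str],
    score: Callable[[Candidate], float],
    max_rounds: int = 10,
) -> Tuple[Candidate, float]:
    """Greedy coordinate search over (vector, location) pairs.

    `score` is a black-box callback (an assumption for this sketch) that
    returns a higher number when a candidate is more effective.
    """
    if not vectors or not locations:
        raise ValueError("need at least one vector and one location")

    best: Candidate = (vectors[0], locations[0])
    best_score = score(best)

    for _ in range(max_rounds):
        improved = False
        # Step 1: vary the attack vector while holding the location fixed.
        for vector in vectors:
            candidate = (vector, best[1])
            value = score(candidate)
            if value > best_score:
                best, best_score, improved = candidate, value, True
        # Step 2: vary the injection location while holding the vector fixed.
        for location in locations:
            candidate = (best[0], location)
            value = score(candidate)
            if value > best_score:
                best, best_score, improved = candidate, value, True
        # Stop once a full round brings no improvement.
        if not improved:
            break

    return best, best_score
```

In a real evaluation the `score` callback would wrap a sandbox run of the target agent; here it can be any function, which keeps the loop independent of the system under test. Greedy search of this kind can stall at a local optimum, so it should be read as an illustration of the iterate-and-improve idea only.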

Spanning 15+ Real-World Domains

Each domain includes policy-aligned evaluation scenarios based on actual regulatory and compliance requirements.

Finance • FINRA • 4 envs
Healthcare • HIPAA • 3 envs
E-commerce • PCI-DSS • 3 envs
Email • CAN-SPAM • 2 envs
Payments • PCI • 3 envs
Cloud Infra • SOC2 • 4 envs
Data Platforms • GDPR • 3 envs
Productivity • Enterprise • 2 envs
Communication • Internal • 2 envs
Scheduling • Privacy • 1 env
HR Systems • Employment • 2 envs
Logistics • Supply Chain • 2 envs
Education • FERPA • 2 envs
Enterprise • Salesforce • 3 envs
Web Services • ToS • 2 envs

Featured Sandbox Environments

Gmail, Google Calendar, PayPal, Zoom, Slack, Databricks, Snowflake, Salesforce, Google Form, Ebay, TravelSuite, ServiceNow, Atlassian Jira, Recommendation System, OrangeHRM, Arxiv, Windows OS, Mac OS, Microsoft 365, Filesystem, Terminal, Hospital EHR, SMS Messager, and more

Live Rankings

Security Robustness Leaderboard

Defense rates for top AI agents on DT-Bench v1.0

1. GPT-5 (gpt-5): 72.3%
2. Claude-4 (claude-4-opus): 68.9%
3. Gemini Pro (gemini-2.0): 65.4%
4. DeepSeek-V4 (deepseek-v4): 62.1%
5. Llama-4 (llama-4-70b): 58.7%

Defense rate measures resistance to policy-aligned attacks.
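
The page does not give a formula for defense rate. As a rough sketch, one natural reading is the fraction of attack attempts that did not succeed, which would make it the complement of the attack success rate (ASR). That definition, along with the `AttackResult` record and both function names, is an assumption made for illustration and may differ from how DT-Bench scores agents.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AttackResult:
    """Outcome of one attack attempt (hypothetical record for this sketch)."""
    task_id: str
    attack_succeeded: bool


def attack_success_rate(results: Sequence[AttackResult]) -> float:
    """Fraction of attempts in which the attack succeeded."""
    if not results:
        raise ValueError("no attack results to score")
    succeeded = sum(1 for r in results if r.attack_succeeded)
    return succeeded / len(results)


def defense_rate(results: Sequence[AttackResult]) -> float:
    """Assumed definition: fraction of attempts the agent resisted (1 - ASR)."""
    if not results:
        raise ValueError("no attack results to score")
    defended = sum(1 for r in results if not r.attack_succeeded)
    return defended / len(results)
```

Under this reading, an agent that resists 3 of 4 attack attempts has a defense rate of 0.75 and an ASR of 0.25.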