EQS AI Benchmark Volume 2: Latest Frontier Models Make Agentic Compliance Workflows a Practical Reality

Second benchmark edition shows major gains in open-ended compliance work, shifting the focus from model choice to real-world deployment.

AI has crossed a practical threshold in compliance & ethics. The EQS AI Benchmark Volume 2 shows that the latest generation of AI models not only improves performance, but can now reliably handle multi-step compliance workflows – a capability that was out of reach just six months ago.

Building on the first volume published in October 2025, EQS Group tested four newly released frontier AI models on the same set of 120 real-world compliance tasks. The updated benchmark, created in collaboration with the German association Berufsverband der Compliance Manager e.V. (BCM), now compares a total of ten leading models, providing a direct view of how the latest generation performs against last year’s frontier.

Frontier models converge at the top

In Volume 2, OpenAI’s GPT-5.4 leads the benchmark with a score of 87.6%, closely followed by Google’s Gemini 3.1 Pro (87.4%) and Anthropic’s Claude Opus 4.6 (86.1%). The top three models are separated by just 1.5 percentage points. This clustering signals a clear shift: while performance gains continue, leading models are approaching a practical ceiling for general compliance tasks, making deployment strategy more important than marginal differences in model capability.

Biggest gains in open-ended compliance work

The most meaningful improvements came in open-ended tasks such as drafting reports, policies, or investigation plans – tasks that closely mirror the work compliance teams deliver to internal stakeholders, management, and regulators. Across all vendors, performance on these tasks increased significantly, by as much as 17 to 18 percentage points compared to the first report, moving outputs from “usable with heavy editing” to “usable with light review.”

Agentic compliance workflows cross a key threshold

The most important finding of the benchmark lies beyond individual task performance: AI models are now approaching the capability needed to support multi-step compliance workflows end-to-end. In a simulated Conflict of Interest process – covering classification, risk assessment, review routing, and mitigation – a single frontier model (GPT-5.4) achieved above 90% performance across each individual workflow step. While the benchmark did not test a fully connected agentic workflow, the results indicate that such workflows are becoming significantly more feasible than they were just six months ago.

“The benchmark shows how quickly AI is becoming a real driver of innovation in Compliance,” said Dr. Martin Benda, President of BCM. “The opportunity now is to translate these capabilities into practical applications – in a way that strengthens both effectiveness and responsible oversight.”

“Six months ago, the question was whether AI could support real compliance work. Today, the question is how we design workflows around it,” said Moritz Homann, Head of AI at EQS Group. “Agentic compliance is no longer a question of feasibility, but of design, especially where to place the right human oversight. The latest models are strong enough to handle multi-step processes, but the real differentiator is the context around them: the tools and checkpoints that make AI reliable in practice.”
