Fri Jun 20 2025 | News

Task Spotlight June 20th 2025: Scientific Computing and Cryptography

Highlighting two new tasks for Terminal-Bench

Since releasing Terminal-Bench, we've added a dozen new tasks, and we're not stopping anytime soon!

This is the first in a series of posts highlighting exciting new tasks. We're continuing to expand the set of tasks to help you test and improve your agents, with an emphasis on realistic, challenging, and valuable tasks.

Parallelize Graph Algorithm: Scientific Computing with UPC

Difficulty: Hard | Domain: High-Performance Computing + Bioinformatics

Contributed by Jason Poulos

Our parallelize-graph task ventures into the specialized world of Unified Parallel C (UPC) programming, requiring agents to implement parallel de Bruijn graph construction for genome assembly. It's based on a real scientific computing problem: reconstructing a complete genome from short fragments of DNA (k-mers).
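To make the underlying problem concrete, here is a minimal serial sketch in Python of de Bruijn graph construction. It is not the task's UPC code or its input format: the reads, the value of k, and the helper names (`kmers`, `build_de_bruijn`, `walk_unambiguous`) are all assumptions chosen for illustration.

```python
from collections import defaultdict

def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unambiguous(graph, start):
    """Extend a contig from start while every node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop if the walk loops back on itself
            break
        seen.add(node)
        contig += node[-1]
    return contig

# Hypothetical overlapping reads, chosen so the walk has no branches.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)  # ATGGCGTGCA
```

A parallel version has to split the node table across threads and coordinate the walks, which is where the task's difficulty lies; this sketch sidesteps all of that.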

Why This Challenges AI Agents

This task presents unique difficulties for agents operating in terminal environments:

  • Complex Tool Management: Agents must navigate building Berkeley UPC from source, managing SMP configurations, and debugging compilation issues through command-line interfaces.
  • Multi-step Verification: Agents must compile, run, and compare outputs across different implementations while managing complex dependencies.

The combination of complex build systems and domain expertise makes this a great task for Terminal-Bench.
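The verification step can be sketched as follows. This assumes, purely for illustration, that each implementation writes one result per line and that a parallel run may emit those lines in a different order than the serial one; the function name `same_contigs` and the sample data are hypothetical, not the task's actual checker.

```python
from collections import Counter

def same_contigs(serial_lines, parallel_lines):
    """True when both outputs hold the same non-empty lines with the same counts, in any order."""
    def normalize(lines):
        return Counter(line.strip() for line in lines if line.strip())
    return normalize(serial_lines) == normalize(parallel_lines)

# Hypothetical outputs: same contigs, different order, stray blank line.
serial_out = ["ATGGCGTGCA\n", "TTACG\n"]
parallel_out = ["TTACG\n", "\n", "ATGGCGTGCA\n"]
print(same_contigs(serial_out, parallel_out))   # True
print(same_contigs(serial_out, ["TTACG\n"]))    # False
```

Comparing multisets rather than raw files avoids flagging a correct parallel run as wrong just because of output ordering.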

Terminus in Action

Terminus with GPT-4.1 is unable to complete this task. It implements a serial version of the algorithm but fails to parallelize it correctly.

FEAL Linear Cryptanalysis: Breaking Ciphers

Difficulty: Hard | Domain: Cryptography + Security

Contributed by Nicholas Carlini

The feal-linear-cryptanalysis task brings academic cryptanalysis into a hands-on programming challenge, requiring agents to implement a sophisticated known-plaintext attack against a FEAL-like cipher. FEAL is a family of block ciphers published in the 1980s and 1990s. It's known to be vulnerable to linear cryptanalysis, a technique that uses linear approximations of the cipher's behavior to break it. Read more here.
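As a small illustration of what a "linear approximation" means here, the sketch below checks one well-known linear relation in the textbook FEAL S-box, S_delta(a, b) = ROL2((a + b + delta) mod 256). The task uses a FEAL-like cipher whose internals may differ, so treat the S-box definition and the helper names as assumptions rather than the task's code.

```python
def rol2(x):
    """Rotate an 8-bit value left by two bits."""
    return ((x << 2) | (x >> 6)) & 0xFF

def feal_sbox(a, b, delta):
    """Textbook FEAL S-box: ROL2((a + b + delta) mod 256), with delta in {0, 1}."""
    return rol2((a + b + delta) & 0xFF)

def bit(x, i):
    """Return bit i of x (bit 0 is the least significant)."""
    return (x >> i) & 1

def relation_holds(a, b, delta):
    # The low bit of a sum has no carry-in, so it is a pure XOR of the inputs;
    # the rotation then moves that bit from position 0 to position 2.
    return bit(feal_sbox(a, b, delta), 2) == bit(a, 0) ^ bit(b, 0) ^ delta

total = 256 * 256 * 2
matches = sum(
    relation_holds(a, b, d)
    for a in range(256) for b in range(256) for d in (0, 1)
)
print(matches, total)  # 131072 131072
```

The relation holds for every input, which is what makes it useful: an output bit can be predicted from input bits with XOR alone, despite the modular addition in between.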

Why This Challenges AI Agents

Cryptanalysis presents a perfect storm of difficulties for AI agents working in terminal environments:

  • Iterative Hypothesis Testing: The attack requires trying different approaches, analyzing results, and adapting strategy based on intermediate findings.
  • Bit Manipulation: Success depends on precise understanding of binary operations, bit patterns, and cipher internals that are difficult to verify without domain expertise.
  • Multi-stage Problem Solving: Agents must coordinate statistical analysis, key recovery, and verification steps while managing multiple executables and data files.
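The guess-and-check loop described above can be sketched on a toy example. The two-round, 8-bit cipher below is hypothetical and far weaker than the cipher in the task; the key bytes and function names are assumptions for illustration only. Each candidate for the last-round key is used to undo the final round, and the guess survives only if a linear relation across the first round stays constant over all known plaintext/ciphertext pairs.

```python
def rol2(x):
    """Rotate an 8-bit value left by two bits."""
    return ((x << 2) | (x >> 6)) & 0xFF

def ror2(x):
    """Rotate an 8-bit value right by two bits."""
    return ((x >> 2) | (x << 6)) & 0xFF

def round_fn(x, k):
    """Toy round: add the key byte mod 256, then rotate left by two."""
    return rol2((x + k) & 0xFF)

def encrypt(p, k1, k2):
    """Hypothetical two-round toy cipher (not the task's cipher)."""
    return round_fn(round_fn(p, k1), k2)

def surviving_guesses(pairs):
    """Keep last-round key guesses under which mid[2] ^ p[0] is constant."""
    survivors = []
    for guess in range(256):
        parities = set()
        for p, c in pairs:
            mid = (ror2(c) - guess) & 0xFF          # undo the last round
            parities.add(((mid >> 2) & 1) ^ (p & 1))
        if len(parities) == 1:
            survivors.append(guess)
    return survivors

k1, k2 = 0x3A, 0xC5                                  # hypothetical secret key bytes
pairs = [(p, encrypt(p, k1, k2)) for p in range(256)]
survivors = surviving_guesses(pairs)
print(len(survivors), k2 in survivors)  # 64 True
```

In this toy, the single relation only narrows 256 candidates to the 64 that share the true key's two low bits, so the remaining bits would need further relations or a brute-force check against the known pairs.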

Terminus in Action

Terminus with GPT-4.1 also fails this task. It is overconfident in its approach, pursues a method that doesn't work, and ultimately signals completion prematurely.

Get Started

Both tasks (and 90 more!) are available in the Terminal-Bench repository.

Written by

Mike Merrill