Benchmarks before claims.

We publish benchmark results with their full methodology, including parameter count, training compute, energy use, and accuracy, before any product positioning.
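The rule "methodology first" can be enforced mechanically. Below is a minimal sketch assuming a hypothetical `BenchmarkResult` record; the field names and units are illustrative, not a published schema. An entry counts as publishable only when every methodology field is filled in and accuracy is a valid fraction.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical record for one published result; names and units are illustrative."""
    benchmark: str
    parameter_count: Optional[int]            # total model parameters
    training_compute_flops: Optional[float]   # total training compute, in FLOPs
    energy_joules: Optional[float]            # measured energy for the evaluation run
    accuracy: Optional[float]                 # fraction of tasks solved, in [0, 1]

    def missing_fields(self) -> List[str]:
        """Names of fields that have not been filled in (still None)."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_publishable(self) -> bool:
        """True only when every field is present and accuracy lies in [0, 1]."""
        if self.missing_fields():
            return False
        return 0.0 <= self.accuracy <= 1.0
```

For example, `BenchmarkResult("Lean", None, 1e18, 10.0, 0.5).missing_fields()` returns `["parameter_count"]`, so that entry would be held back until the parameter count is reported.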

Evaluation suite

Primary benchmarks: ARC-AGI for abstract reasoning, Lean theorem proving for formal verification, and planning and maze tasks for sequential decision making.
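ARC-AGI is scored by exact match: a predicted output grid must equal the expected grid cell for cell. Here is a minimal sketch of that scoring, assuming each test input is keyed by a hypothetical task id and that a task counts as solved if any of the first `max_attempts` predictions matches. The default of 2 attempts is an assumption; check the current ARC Prize rules for the allowed number.

```python
from typing import Dict, List

Grid = List[List[int]]  # an ARC grid: rows of integer color codes 0-9


def attempt_correct(attempts: List[Grid], expected: Grid, max_attempts: int = 2) -> bool:
    """True if any of the first `max_attempts` predicted grids matches exactly."""
    return any(pred == expected for pred in attempts[:max_attempts])


def arc_score(predictions: Dict[str, List[Grid]],
              solutions: Dict[str, Grid],
              max_attempts: int = 2) -> float:
    """Fraction of test inputs solved; a missing prediction counts as a miss."""
    if not solutions:
        raise ValueError("no solutions to score against")
    solved = sum(
        attempt_correct(predictions.get(task_id, []), expected, max_attempts)
        for task_id, expected in solutions.items()
    )
    return solved / len(solutions)
```

Exact match is deliberately strict: a grid with a single wrong cell, or with the wrong dimensions, scores the same as no answer at all.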

Secondary metrics track sample efficiency, inference latency on consumer and datacenter GPUs, and energy per correct answer.
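Energy per correct answer is total measured energy divided by the number of correct answers. Below is a minimal sketch, assuming power is sampled in watts at known timestamps in seconds and integrated with the trapezoidal rule. The sampling source and whether it covers the GPU alone or the whole system are assumptions left to the harness.

```python
import math
from typing import List


def energy_joules(timestamps_s: List[float], power_w: List[float]) -> float:
    """Integrate sampled power (W) over time (s) with the trapezoidal rule; returns joules."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two paired (time, power) samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_per_correct(total_energy_j: float, correct: List[bool]) -> float:
    """Joules spent per correct answer; infinite if nothing was answered correctly."""
    n_correct = sum(correct)
    if n_correct == 0:
        return math.inf
    return total_energy_j / n_correct
```

On NVIDIA GPUs, NVML's `nvmlDeviceGetPowerUsage` is one possible power source. It reports milliwatts, so its readings would need dividing by 1000 before being passed in. Dividing by correct answers rather than total answers means a model that answers quickly but wrongly gets no credit for its low energy use.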

Shipped capabilities

  • ARC-AGI public leaderboard submissions
  • Lean proof success rate tracking
  • Planning task accuracy and step efficiency
  • RTX 4090 and datacenter GPU baselines
  • Open-source, reproducible evaluation harness
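Step efficiency is not defined above. One plausible definition, used here purely as an assumption, is the shortest possible path length divided by the number of steps the agent actually took, counted only on solved tasks. The following minimal sketch targets grid mazes with a hypothetical encoding (`#` wall, `S` start, `G` goal) and move alphabet `U/D/L/R`, and uses breadth-first search to find the optimal length.

```python
from collections import deque
from typing import List, Optional, Tuple

Maze = List[str]  # hypothetical encoding: '#' wall, '.' free, 'S' start, 'G' goal
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def find(maze: Maze, ch: str) -> Tuple[int, int]:
    for r, row in enumerate(maze):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in maze")


def is_open(maze: Maze, r: int, c: int) -> bool:
    return 0 <= r < len(maze) and 0 <= c < len(maze[r]) and maze[r][c] != "#"


def shortest_path_len(maze: Maze) -> Optional[int]:
    """Breadth-first search from S; returns the optimal step count to G, or None."""
    start, goal = find(maze, "S"), find(maze, "G")
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in MOVES.values():
            nxt = (r + dr, c + dc)
            if is_open(maze, *nxt) and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return None


def evaluate_plan(maze: Maze, plan: str) -> Tuple[bool, float]:
    """Return (solved, step_efficiency); efficiency is 0.0 for unsolved plans."""
    r, c = find(maze, "S")
    for move in plan:
        if move not in MOVES:
            return False, 0.0  # unknown action: count as failure
        dr, dc = MOVES[move]
        if not is_open(maze, r + dr, c + dc):
            return False, 0.0  # illegal move into a wall or off the grid
        r, c = r + dr, c + dc
    if (r, c) != find(maze, "G"):
        return False, 0.0
    # S and G are distinct cells, so a solving plan has at least one step.
    return True, shortest_path_len(maze) / len(plan)


def summarize(outcomes: List[Tuple[bool, float]]) -> Tuple[float, float]:
    """Accuracy over all tasks; mean step efficiency over solved tasks only."""
    solved = [eff for ok, eff in outcomes if ok]
    accuracy = len(solved) / len(outcomes) if outcomes else 0.0
    mean_eff = sum(solved) / len(solved) if solved else 0.0
    return accuracy, mean_eff
```

Reporting efficiency only over solved tasks keeps the two metrics separate. Accuracy says how often the agent reaches the goal, and efficiency says how directly it gets there when it does.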

The future of AI requires sovereign infrastructure, trustworthy reasoning, and enterprise governance.