Lightwheel Unveils RoboFinals
The Industrial-Grade Simulation Evaluation Platform that Finally Challenges Frontier Robotics Foundation Models
Lightwheel Team
Dec 4, 2025
Today, Lightwheel is proud to announce RoboFinals, the industry's first industrial-grade simulation evaluation platform challenging enough for frontier models, purpose-built to measure real improvements in robotics foundation models (vision-language-action, or VLA, models) at the cutting edge.
Coming soon, RoboFinals is designed for frontier labs: the teams pushing the limits of robotics foundation models and now facing their most urgent bottleneck, the lack of a sufficiently challenging, scalable, and trustworthy benchmark.
Why Frontier Labs Need RoboFinals
Many VLA labs now face the same pattern: their robotics foundation models have outgrown nearly all existing academic simulation benchmarks. Scores saturate quickly, yet teams still lack a reliable way to understand true capability, measure progress, or compare systems at the frontier.
In response, labs fall back to real-world testing, but this approach does not scale. Unlike autonomous driving, robotics has no “shadow mode” equivalent, and meaningful evaluation requires hundreds of physical setups, continuous equipment maintenance, and strict safety procedures. The result is slow, resource-intensive testing that cannot keep pace with the speed of model development.
Even where simulation benchmarks do exist, they suffer from a deeper structural flaw: tasks are either overly simplified or unrealistically designed. This misalignment prevents teams from treating benchmark performance as a meaningful indicator of real-world behavior, creating a widening trust gap between simulation and deployment.
RoboFinals is built to solve all of these problems, establishing a new industry standard for evaluating frontier-scale robotics models.
The Benchmark: RoboFinals-100
Examples from the RoboFinals-100 benchmark
At the core of the platform is RoboFinals-100, a 100-task benchmark built on top of Lightwheel’s SimReady Asset ecosystem. RoboFinals-100 spans progressive difficulty, high task diversity, and industry-aligned realism, ensuring that models are evaluated under conditions that closely reflect real-world challenges. The benchmark covers major application domains—including household tasks such as cleaning, organizing, storage, and object placement; factory tasks involving part handling, assembly, and machine interaction; and retail tasks such as restocking, sorting, and shelf operations.
A key differentiator of RoboFinals-100 is its comprehensive asset and interaction coverage, enabled by the Lightwheel SimReady Asset standard. The benchmark focuses on the hardest object classes and manipulation behaviors found in real environments, including rigid objects (tools, utensils, containers), articulated objects (appliances, cabinets, fridges, dials, knobs), and deformable materials such as cables, wires, cloth, and liquids. This breadth ensures that policy performance reflects real-world complexity rather than simplified simulation abstractions.
All tasks follow unified success criteria, enabling consistent, fair, and comparable evaluation across teams and organizations. The benchmark also supports cross-robot evaluation, measuring model performance across three major robot embodiments: tabletop arms, mobile manipulators, and full loco-manipulation systems. Together, these provide a unified, full-stack benchmark for today's most advanced VLA systems.
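To make the idea of unified success criteria concrete, here is a minimal Python sketch of how a task and its pass/fail rule could be declared. The schema and names below are illustrative assumptions, not RoboFinals' actual format:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: RoboFinals' actual task schema has not been published,
# so every name below is an illustrative assumption.
@dataclass
class TaskSpec:
    task_id: str
    domain: str       # "household", "factory", or "retail"
    embodiment: str   # "tabletop_arm", "mobile_manipulator", or "loco_manipulation"
    difficulty: int   # progressive difficulty tier
    # Unified success criterion: maps the final simulator state to pass/fail,
    # so every team is scored against the same rule.
    success_fn: Callable[[dict], bool] = field(default=lambda state: False)

def mug_in_cabinet(state: dict) -> bool:
    """Example criterion: the mug rests in the cabinet and the door is shut."""
    return state["mug_in_cabinet"] and state["cabinet_door_angle"] < 0.05

task = TaskSpec(
    task_id="household/place_mug_in_cabinet",
    domain="household",
    embodiment="tabletop_arm",
    difficulty=2,
    success_fn=mug_in_cabinet,
)
```

Because every task exposes the same pass/fail interface, results from different teams, robots, and physics backends can be compared directly.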
The Platform: Scalable, Reproducible, Industry-Grade
The RoboFinals Platform is built directly on NVIDIA Isaac Lab Arena, the upcoming unified robotics evaluation framework co-developed by Lightwheel and NVIDIA. It brings true large-scale evaluation to robotics by enabling massive batch execution under fully controlled, deterministic conditions. Tasks are automatically executed, logged, and analyzed, with integrated metrics covering performance across task types, difficulty levels, and domains. This allows teams to evaluate VLA and generalist robot models with unprecedented consistency, scale, and scientific rigor.
RoboFinals supports both cloud-based and on-premise deployment, giving teams the flexibility to run evaluations in the environment that best fits their needs. The Cloud API is ideal for fast iteration, large-scale experimentation, and on-demand access to the full evaluation suite. For organizations requiring maximum security, customization, or tight integration with internal systems, RoboFinals can be deployed fully on-premise, ensuring complete control over data, workflows, and infrastructure.
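As an illustration of the fast-iteration workflow the Cloud API is meant to enable, the sketch below shows what submitting a policy for batch evaluation could look like. The endpoint, payload fields, and response shape are hypothetical placeholders, since the public API is still forthcoming:

```python
import requests

# Hypothetical sketch: the RoboFinals Cloud API is not yet public, so the
# endpoint, payload fields, and response shape here are assumptions.
API_URL = "https://api.example.com/robofinals/v1"  # placeholder endpoint

def submit_evaluation(policy_uri: str, api_key: str) -> str:
    """Submit a policy checkpoint for batch evaluation on RoboFinals-100."""
    resp = requests.post(
        f"{API_URL}/evaluations",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "benchmark": "robofinals-100",
            "policy_uri": policy_uri,     # e.g. a model-registry or storage URI
            "episodes_per_task": 50,      # batched rollouts per task
            "seed": 0,                    # fixed seed for deterministic runs
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["evaluation_id"]  # poll this id for logged results
```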
RoboFinals enables labs to benchmark their agents across several physics engines to ensure robustness and cross-simulator generalization. Supported backends include NVIDIA Isaac Lab with Newton physics as the primary industrial-grade solver, NVIDIA Isaac Lab with NVIDIA PhysX physics, MuJoCo, and Genesis. By consolidating results across these backends, RoboFinals provides teams with a unified, comparable scoreboard for evaluating Embodied AI systems.
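For a sense of how cross-backend results might roll up into one scoreboard, here is a minimal sketch. The backend identifiers and result format are assumptions for illustration:

```python
# Illustrative sketch (assumed result format): consolidate per-backend episode
# outcomes from the same task suite into a single comparable scoreboard.
BACKENDS = ["isaaclab-newton", "isaaclab-physx", "mujoco", "genesis"]

def consolidated_scoreboard(results: dict[str, dict[str, bool]]) -> dict[str, float]:
    """results[backend][episode_id] -> success; returns success rate per backend."""
    return {
        backend: sum(outcomes.values()) / len(outcomes)
        for backend, outcomes in results.items()
    }

# Toy usage, purely to show the aggregation:
results = {
    "isaaclab-newton": {"ep1": True, "ep2": False},
    "mujoco":          {"ep1": True, "ep2": True},
}
print(consolidated_scoreboard(results))  # {'isaaclab-newton': 0.5, 'mujoco': 1.0}
```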
Real2Sim & Sim2Real Validation
RoboFinals incorporates full Real2Sim calibration across its SimReady asset library, aligning simulated object dynamics with their real-world counterparts to ensure physically grounded evaluation. To complement this, Lightwheel is building a controlled real-world benchmark designed to validate RoboFinals outcomes and to establish the industry’s first rigorous Sim–Real correlation dataset for frontier VLA and generalist robotic models. This dual-track approach enables quantitative assessment of model transferability across both domains.
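One plausible way to quantify Sim-Real correlation, shown here purely as an illustration rather than Lightwheel's published methodology, is the Pearson correlation between per-task success rates measured in simulation and on real hardware:

```python
import statistics

# Hedged sketch: correlate per-task success rates in simulation with matching
# rates on physical setups. The numbers below are toy values for illustration.
def pearson(xs: list[float], ys: list[float]) -> float:
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

sim_rates  = [0.92, 0.64, 0.31, 0.78]   # per-task success rates in simulation
real_rates = [0.88, 0.55, 0.25, 0.70]   # matching rates on hardware (toy values)
print(f"Sim-Real correlation: {pearson(sim_rates, real_rates):.3f}")
```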
In Collaboration with Qwen
The Qwen Team is a partner in the development and adoption of RoboFinals. Together, we co-defined several of the industrial-grade scenes, task structures, and evaluation standards that power RoboFinals-100.
Qwen now uses RoboFinals for high-throughput, industry-aligned evaluation of their frontier Embodied AI models. RoboFinals enables Qwen to rapidly iterate, diagnose bottlenecks, and measure real capability gains beyond academic benchmarks. As one of the fastest-moving foundation model teams globally, Qwen plays a pivotal role in stress-testing RoboFinals and shaping its evolution into the industry standard for frontier-scale robotics evaluation.
How to Participate
Frontier labs interested in accessing RoboFinals can contact us directly.