Lightwheel Unveils RoboFinals
The Industrial-Grade Simulation Evaluation Platform that Finally Challenges Frontier Robotics Foundation Models
Lightwheel Team
Dec 4, 2025
Today, Lightwheel is proud to announce RoboFinals, the industry's first industrial-grade simulation evaluation platform challenging enough for frontier models, purpose-built to measure real improvements in robotics foundation models (vision-language-action, or VLA, models) at the cutting edge.
Coming soon, RoboFinals is designed for frontier labs: the teams pushing the limits of robotics foundation models and now facing their most urgent bottleneck, the lack of a sufficiently challenging, scalable, and trustworthy benchmark.
Why Frontier Labs Need RoboFinals
Many VLA labs now face the same pattern: their robotics foundation models have outgrown nearly all existing academic simulation benchmarks. Scores saturate quickly, yet teams still lack a reliable way to understand true capability, measure progress, or compare systems at the frontier.
In response, labs fall back to real-world testing, but this approach does not scale. Unlike autonomous driving, robotics has no “shadow mode” equivalent, and meaningful evaluation requires hundreds of physical setups, continuous equipment maintenance, and strict safety procedures. The result is slow, resource-intensive testing that cannot keep pace with the speed of model development.
Even where simulation benchmarks do exist, they suffer from a deeper structural flaw: tasks are either overly simplified or unrealistically designed. This misalignment prevents teams from treating benchmark performance as a meaningful indicator of real-world behavior, creating a widening trust gap between simulation and deployment.
RoboFinals is built to solve all of these problems, establishing a new industry standard for evaluating frontier-scale robotics models.
The Benchmark: RoboFinals-100
Examples from the RoboFinals-100 benchmark
At the core of the platform is RoboFinals-100, a 100-task benchmark built on top of Lightwheel’s SimReady Asset ecosystem. RoboFinals-100 spans progressive difficulty, high task diversity, and industry-aligned realism, ensuring that models are evaluated under conditions that closely reflect real-world challenges. The benchmark covers major application domains—including household tasks such as cleaning, organizing, storage, and object placement; factory tasks involving part handling, assembly, and machine interaction; and retail tasks such as restocking, sorting, and shelf operations.
A key differentiator of RoboFinals-100 is its comprehensive asset and interaction coverage, enabled by the Lightwheel SimReady Asset standard. The benchmark focuses on the hardest object classes and manipulation behaviors found in real environments, including rigid objects (tools, utensils, containers), articulated objects (appliances, cabinets, fridges, dials, knobs), and deformable materials such as cables, wires, cloth, and liquids. This breadth ensures that policy performance reflects real-world complexity rather than simplified simulation abstractions.
All tasks follow unified success criteria, enabling consistent, fair, and comparable evaluation across teams and organizations. The benchmark also supports cross-robot evaluation, measuring model performance across three major robot embodiments: tabletop arms, mobile manipulators, and full loco-manipulation systems. Together, these provide a unified, full-stack benchmark for today's most advanced VLA systems.
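To make the idea of unified success criteria concrete, here is a minimal Python sketch of how a task and its pass/fail rule could be declared. The schema and names below are illustrative assumptions, not RoboFinals' actual format:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: RoboFinals' actual task schema has not been published,
# so every name below is an illustrative assumption.
@dataclass
class TaskSpec:
    task_id: str
    domain: str       # "household", "factory", or "retail"
    embodiment: str   # "tabletop_arm", "mobile_manipulator", or "loco_manipulation"
    difficulty: int   # progressive difficulty tier
    # Unified success criterion: maps the final simulator state to pass/fail,
    # so every team is scored against the same rule.
    success_fn: Callable[[dict], bool] = field(default=lambda state: False)

def mug_in_cabinet(state: dict) -> bool:
    """Example criterion: the mug rests in the cabinet and the door is shut."""
    return state["mug_in_cabinet"] and state["cabinet_door_angle"] < 0.05

task = TaskSpec(
    task_id="household/place_mug_in_cabinet",
    domain="household",
    embodiment="tabletop_arm",
    difficulty=2,
    success_fn=mug_in_cabinet,
)
```

Because every task exposes the same pass/fail interface, results from different teams, robots, and physics backends can be compared directly.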
The Platform: Scalable, Reproducible, Industry-Grade
The RoboFinals Platform is built directly on NVIDIA Isaac Lab Arena, the upcoming unified robotics evaluation framework co-developed by Lightwheel and NVIDIA. It brings true large-scale evaluation to robotics by enabling massive batch execution under fully controlled, deterministic conditions. Tasks are automatically executed, logged, and analyzed, with integrated metrics covering performance across task types, difficulty levels, and domains. This allows teams to evaluate VLA and generalist robot models with unprecedented consistency, scale, and scientific rigor.
RoboFinals supports both cloud-based and on-premise deployment, giving teams the flexibility to run evaluations in the environment that best fits their needs. The Cloud API is ideal for fast iteration, large-scale experimentation, and on-demand access to the full evaluation suite. For organizations requiring maximum security, customization, or tight integration with internal systems, RoboFinals can be deployed fully on-premise, ensuring complete control over data, workflows, and infrastructure.
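As an illustration of the fast-iteration workflow the Cloud API is meant to enable, the sketch below shows what submitting a policy for batch evaluation could look like. The endpoint, payload fields, and response shape are hypothetical placeholders, since the public API is still forthcoming:

```python
import requests

# Hypothetical sketch: the RoboFinals Cloud API is not yet public, so the
# endpoint, payload fields, and response shape here are assumptions.
API_URL = "https://api.example.com/robofinals/v1"  # placeholder endpoint

def submit_evaluation(policy_uri: str, api_key: str) -> str:
    """Submit a policy checkpoint for batch evaluation on RoboFinals-100."""
    resp = requests.post(
        f"{API_URL}/evaluations",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "benchmark": "robofinals-100",
            "policy_uri": policy_uri,     # e.g. a model-registry or storage URI
            "episodes_per_task": 50,      # batched rollouts per task
            "seed": 0,                    # fixed seed for deterministic runs
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["evaluation_id"]  # poll this id for logged results
```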
RoboFinals enables labs to benchmark their agents across several physics engines to ensure robustness and cross-simulator generalization. Supported backends include NVIDIA Isaac Lab with Newton physics as the primary industrial-grade solver, NVIDIA Isaac Lab with NVIDIA PhysX physics, MuJoCo, and Genesis. By consolidating results across these backends, RoboFinals provides teams with a unified, comparable scoreboard for evaluating Embodied AI systems.
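For a sense of how cross-backend results might roll up into one scoreboard, here is a minimal sketch. The backend identifiers and result format are assumptions for illustration:

```python
# Illustrative sketch (assumed result format): consolidate per-backend episode
# outcomes from the same task suite into a single comparable scoreboard.
BACKENDS = ["isaaclab-newton", "isaaclab-physx", "mujoco", "genesis"]

def consolidated_scoreboard(results: dict[str, dict[str, bool]]) -> dict[str, float]:
    """results[backend][episode_id] -> success; returns success rate per backend."""
    return {
        backend: sum(outcomes.values()) / len(outcomes)
        for backend, outcomes in results.items()
    }

# Toy usage, purely to show the aggregation:
results = {
    "isaaclab-newton": {"ep1": True, "ep2": False},
    "mujoco":          {"ep1": True, "ep2": True},
}
print(consolidated_scoreboard(results))  # {'isaaclab-newton': 0.5, 'mujoco': 1.0}
```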
Real2Sim & Sim2Real Validation
RoboFinals incorporates full Real2Sim calibration across its SimReady asset library, aligning simulated object dynamics with their real-world counterparts to ensure physically grounded evaluation. To complement this, Lightwheel is building a controlled real-world benchmark designed to validate RoboFinals outcomes and to establish the industry’s first rigorous Sim–Real correlation dataset for frontier VLA and generalist robotic models. This dual-track approach enables quantitative assessment of model transferability across both domains.
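One plausible way to quantify Sim-Real correlation, shown here purely as an illustration rather than Lightwheel's published methodology, is the Pearson correlation between per-task success rates measured in simulation and on real hardware:

```python
import statistics

# Hedged sketch: correlate per-task success rates in simulation with matching
# rates on physical setups. The numbers below are toy values for illustration.
def pearson(xs: list[float], ys: list[float]) -> float:
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

sim_rates  = [0.92, 0.64, 0.31, 0.78]   # per-task success rates in simulation
real_rates = [0.88, 0.55, 0.25, 0.70]   # matching rates on hardware (toy values)
print(f"Sim-Real correlation: {pearson(sim_rates, real_rates):.3f}")
```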
In Collaboration with Qwen
The Qwen Team is a partner in the development and adoption of RoboFinals. Together, we co-defined several of the industrial-grade scenes, task structures, and evaluation standards that power RoboFinals-100.
Qwen now uses RoboFinals for high-throughput, industry-aligned evaluation of their frontier Embodied AI models. RoboFinals enables Qwen to rapidly iterate, diagnose bottlenecks, and measure real capability gains beyond academic benchmarks. As one of the fastest-moving foundation model teams globally, Qwen plays a pivotal role in stress-testing RoboFinals and shaping its evolution into the industry standard for frontier-scale robotics evaluation.
How to Participate
Frontier labs interested in accessing RoboFinals can contact us directly.