At General Robotics, our mission is bold: build general-purpose intelligence for every robot, across any form, task, or environment. A true generalist robot must extend to all form factors and morphologies, be easily programmable, and, most importantly, adhere to rigorous safety standards.
We hold a distinct perspective: true general-purpose intelligence in robotics emerges from a modular, composable set of AI capabilities, what we call robot AI skills. Unlike the prevalent foundation-model efforts, our methodology is data-efficient, interpretable, and verifiable for safety.
The ability to compose skills is fundamental in humans as well. When learning to drive a car or fly an aircraft, for example, we are taught to first master a set of distinct skills and then learn to combine them through practice and feedback.
"Robots must follow the same path to be generally intelligent: develop reliable atomic capabilities, and gradually learn how to compose them to solve more complex, long-horizon tasks."
Robot AI Skills: Building Blocks of Physical Intelligence
Each robot AI skill is a well-defined, interpretable unit of physical intelligence: a capability that enables a robot to reliably perform a task under uncertainty. Such robot AI skills (see Table 1), often, but not exclusively, implemented as neural networks, include the ability to comprehend language, perceive the world, plan, and take actions.

In Table 1, green skills are implemented as neural techniques, while blue skills can be implemented with either classical or neural techniques.
The AI-skills-based approach offers several key advantages over monolithic learning systems. Training individual skills is far more efficient in terms of both data and compute, and many skills can deliver real-world value immediately. For example, training a forklift to precisely position its forks is both tractable and immediately valuable. Similarly, having a manipulator arm handle a car part or a drone inspect a cell tower are capabilities that a skill-based approach can deliver rapidly, without the need for massive datasets.
Just as importantly, individual skills have well-defined behaviors and failure modes, making them interpretable, testable, and debuggable—essential traits for safety-critical systems. Many of these skills also transfer naturally across robot morphologies and hardware platforms, enabling scalable intelligence with minimal re-engineering.
Crucially, the field already offers a vast and growing library of AI and robotics skills—spanning perception, planning, control, and language—that can be adopted and composed immediately. Moreover, it is possible to continuously update individual skills as new capabilities are developed. A skill-based framework makes it possible to harness such an ecosystem while remaining flexible across different sensor configurations, unlike monolithic models often tied to narrow, fixed use cases and inputs.
Emergence through Skill Compositions
Skill composition is the mechanism by which robots combine skills in a variety of ways to accomplish non-trivial, long-horizon tasks. These compositions can be either reasoned or learned.
Reasoned composition relies on explicit structure and logic—often generated through program synthesis or large language models. In this paradigm, skills are composed in sequential, parallel, interleaved, or hierarchical configurations. This approach is interpretable by design and offers a natural integration point between modern neural capabilities and classical robotics systems [1].
Example of a reasoned composition:
A household humanoid robot receives the command "Go to the kitchen, and clean up the counter near the sink" and generates a structured plan (a code sketch follows the list):
- Navigate to the kitchen (semantic memory + path planning)
- Approach and align with the counter (segmentation + visual servoing)
- Pick up and move objects from the surface (grasping, inverse kinematics, collision avoidance, motion planning)
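To make this concrete, below is a minimal sketch of sequential, reasoned composition: a planner emits structured steps that an executor dispatches to a registry of skills. The skill names, the `skill` decorator, and the plan format are illustrative assumptions, not GRID's actual API.

```python
from typing import Callable, Dict, List, Tuple

# Registry mapping skill names to callables; real skills would wrap
# perception, planning, and control modules.
SKILLS: Dict[str, Callable[..., None]] = {}

def skill(fn: Callable[..., None]) -> Callable[..., None]:
    """Register a function as an atomic, individually testable skill."""
    SKILLS[fn.__name__] = fn
    return fn

@skill
def navigate_to(place: str) -> None:      # semantic memory + path planning
    print(f"navigating to {place}")

@skill
def align_with(surface: str) -> None:     # segmentation + visual servoing
    print(f"aligning with {surface}")

@skill
def clear_surface(surface: str) -> None:  # grasping, IK, motion planning
    print(f"clearing objects off {surface}")

# A reasoned plan, e.g. emitted as structured steps by an LLM planner.
Plan = List[Tuple[str, tuple]]

def execute(plan: Plan) -> None:
    for name, args in plan:
        SKILLS[name](*args)               # sequential composition

execute([
    ("navigate_to", ("kitchen",)),
    ("align_with", ("counter near the sink",)),
    ("clear_surface", ("counter",)),
])
```

Because each step is an explicit call to a named skill, the resulting plan is inspectable before execution, which is what makes reasoned composition interpretable by design.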


Learned compositions, on the other hand, use mechanisms such as reinforcement learning, imitation learning, and self-supervised learning to combine multiple skills. One possibility is for the basic skills to act as tokens or features feeding into a larger neural network; many existing foundation models in robotics follow such a design. A minimal sketch of a learned composition follows the example below.
Example of a learned composition:
- A quadruped robot that intelligently switches between locomotion policies based on terrain using vision (e.g., regular walking on flat terrain vs. parkour-like behavior on rocky, uneven terrain)
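As one hedged illustration, the sketch below shows a vision-conditioned gating network that learns to blend (or, with an argmax, hard-switch between) pretrained locomotion policies. The module names, feature dimensions, and the choice of soft blending are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GatedLocomotion(nn.Module):
    """Learned composition: a gate mixes frozen base locomotion skills."""

    def __init__(self, policies: nn.ModuleList, feat_dim: int = 128):
        super().__init__()
        self.policies = policies              # pretrained base skills
        self.gate = nn.Sequential(            # the learned composer
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, len(policies)),
        )

    def forward(self, vision_feat: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(vision_feat), dim=-1)        # (B, K)
        actions = torch.stack([p(obs) for p in self.policies], dim=-1)  # (B, A, K)
        # Soft blend of base-policy actions; taking an argmax over `weights`
        # instead would give hard terrain-dependent switching.
        return (actions * weights.unsqueeze(-2)).sum(dim=-1)           # (B, A)

# Toy usage: two stand-in policies (walk, parkour) as linear maps.
policies = nn.ModuleList([nn.Linear(32, 12) for _ in range(2)])
controller = GatedLocomotion(policies)
action = controller(torch.randn(1, 128), torch.randn(1, 32))           # (1, 12)
```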

Reasoned composition has a key advantage: it is interpretable and easily integrates neural skills with classical modules. It also benefits from mature, readily available tools and reasoning workflows, making it practical to deploy today [2]. However, its expressiveness is limited: some behaviors, like full-body humanoid control, may require altering an underlying skill itself. Learned composition addresses this by exploring a broader space of combinations, including the ability to adapt or reshape base skills, but often at the cost of interpretability. In practice, the two approaches are complementary.
Interpretability and Safety
At General Robotics, safety is not an afterthought; it is foundational to everything we build. We believe that deploying AI systems on physical machines without understanding their behavior is not just irresponsible, it is dangerous. For example, if we ask a monolithic VLA to "land the drone next to the blue landing pad in the backyard" and it accidentally lands in the swimming pool, it is in general non-trivial to backtrack and determine whether the model misunderstood the language, visually confused the landing pad with the pool, or simply predicted an incorrect path, whereas a modular approach lets us debug each skill directly. Our bottom-up methodology is safety-first: individual AI skills are amenable to rigorous component-wise testing, verification, and characterization of their limits.
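As a hedged illustration of component-wise testing, the toy example below exercises a stand-in blue-pad detector against synthetic fixtures, including a case where it must not fire. A real skill would be a learned model, but the testing pattern, pinning down behavior and failure modes per skill, is the same; all names here are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np

def detect_blue_pad(img: np.ndarray) -> Optional[Tuple[int, int]]:
    """Toy perception skill: centroid of strongly blue pixels, or None."""
    mask = (img[..., 2] > 200) & (img[..., 0] < 80) & (img[..., 1] < 80)
    if not mask.any():
        return None                        # explicit, testable failure mode
    rows, cols = np.nonzero(mask)
    return int(rows.mean()), int(cols.mean())

def test_detects_pad():
    img = np.zeros((100, 100, 3), dtype=np.uint8)
    img[40:60, 40:60] = (0, 0, 255)        # synthetic blue pad
    assert detect_blue_pad(img) == (49, 49)

def test_ignores_pool():
    img = np.full((100, 100, 3), (60, 150, 200), dtype=np.uint8)  # pool-like water
    assert detect_blue_pad(img) is None    # must NOT fire on the pool

test_detects_pad()
test_ignores_pool()
```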
But safety doesn't stop at the skill level. When skills are composed into larger behaviors, we apply integrative testing informed by frontier research, leveraging techniques such as simulation feedback, experimental design, and, finally, real-world validation. Critically, simulation is our primary sandbox: the most effective environment for both individual-skill validation and compositional stress testing. It enables structured experimentation, repeatable evaluation, and accelerated iteration, well before any code touches real hardware.
"The most effective safety measures come from a combination of proactive research and careful real-world testing. Thus, through GRID, we are already enabling rapid experimentation and controlled deployment of skills and compositions across all robots rapidly."
At the core of GRID is a broad library of AI Skills—modular, interpretable capabilities spanning perception, planning, control, and language. These skills are composable and deployable across both real and simulated robots, enabling the rapid construction of complex behaviors from well-defined parts.
These robot AI skills are cloud-native and accessible through a rich, unified API layer, unlocking advanced capabilities with minimal integration effort. But more than just a runtime, GRID is also an experimentation engine. It offers a vast library of simulation environments, robot morphologies, and application scenarios, enabling developers and robots alike to train, test, and iterate at scale.
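To give a feel for what a unified API layer enables, here is a hypothetical sketch of invoking cloud-hosted skills over HTTP. The endpoint, route structure, and skill names are assumptions for illustration; they are not GRID's published API.

```python
import requests

BASE = "https://api.example-grid.dev/v1"   # placeholder endpoint, not real

def call_skill(robot_id: str, skill: str, **params) -> dict:
    """Invoke a named skill on a (real or simulated) robot via the cloud."""
    resp = requests.post(
        f"{BASE}/robots/{robot_id}/skills/{skill}",
        json=params,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Compose two skills with minimal integration effort.
call_skill("quadruped-01", "detect_objects", classes=["person", "cone"])
call_skill("quadruped-01", "navigate_to", x=3.0, y=-1.5)
```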
Simulation plays a crucial role—not only for skill training, but for learning how to compose those skills effectively to solve higher-level tasks. It provides a fast, safe, and controllable environment where composition strategies can be explored and refined before deployment to the real world. This seamless integration of skills, APIs, and simulation enables truly agentic workflows—where robots can learn, compose, and improve autonomously.
GRID will continue to evolve: adding new skills, updating existing ones, and building on the momentum the field of robotics is experiencing.
Making your robots useful faster,