Intelligence is leaving the screen
For decades, making a robot do something useful meant writing thousands of lines of very specific code. Pick up this object at these coordinates, rotate 47 degrees, place it here. Every edge case manually handled. Every new task, basically a new program. A very expensive if-else statement bolted to a mechanical arm.
A new class of models, vision-language-action models (VLAs), collapses the entire robotics stack into a single neural network. The robot sees through a camera, receives an instruction in plain language, and the model outputs motor commands directly. No hand-coded perception pipeline. One model does the seeing, the understanding, and the moving. Google demonstrated this with RT-2, and what started as a research demo is now something companies are building real products around. This mirrors what happened in software. We went from writing assembly to writing Python. From managing servers to deploying on serverless. Each step removed a layer of complexity and let more people build useful things. Computers still execute instructions, but the abstraction layer rose high enough that you didn't need to understand the entire stack to get something done. You trust that your Python code gets translated down to machine instructions.
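To make the shape of that collapse concrete, here is a toy sketch of the interface a VLA exposes: one function from (camera frame, instruction) to a motor command. Everything here is hypothetical and illustrative — the class name, the 7-dimensional action, and the stub body are stand-ins, not any real model's API.

```python
import numpy as np

class ToyVLAPolicy:
    """Illustrative stub: one network maps (image, instruction) -> action.

    A real VLA runs a large vision-language transformer here; this toy
    version just returns a placeholder action of the right shape.
    """

    def __init__(self, action_dim: int = 7, seed: int = 0):
        self.action_dim = action_dim  # e.g. joint deltas for a 7-DoF arm
        self.rng = np.random.default_rng(seed)

    def act(self, image: np.ndarray, instruction: str) -> np.ndarray:
        assert image.ndim == 3  # H x W x C camera frame
        # Placeholder "inference": sample an action in [-1, 1] per joint.
        return self.rng.uniform(-1.0, 1.0, self.action_dim)

policy = ToyVLAPolicy()
frame = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy camera frame
action = policy.act(frame, "pick up the red block")
print(action.shape)  # (7,)
```

The point is the signature, not the body: perception, language grounding, and control all live behind a single call, which is exactly the abstraction jump the paragraph above describes.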
Physical AI is doing the same thing for the real world.
Instead of programming a robot for every specific task, you train a model that generalizes across tasks. Show it a few demonstrations of folding laundry, and it figures out how to fold a shirt it has never seen. The striking part is that these models learn to manipulate things whose shape changes as you touch them: cloth, dough, rope. You can't hard-code the geometry of a crumpled t-shirt because it's different every single time; you'd have to define every possible state of the fabric, and there are infinitely many. Traditional robotics has no answer for this.
These models skip the problem entirely. They build an internal sense of how flexible things behave (in latent space), and that understanding transfers to anything with similar properties.
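The training recipe behind "show it a few demonstrations" is, at its core, behavior cloning: supervised learning on (observation, action) pairs recorded from an expert. A minimal sketch, with synthetic data standing in for real teleoperation logs and a least-squares fit standing in for the neural network a real VLA would use:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "demonstrations": observation features -> expert actions.
# In a real system these would come from teleoperated folding episodes.
obs = rng.normal(size=(500, 16))   # 500 timesteps, 16-dim observations
true_W = rng.normal(size=(16, 7))  # hidden expert mapping
actions = obs @ true_W             # expert actions (7-DoF)

# Behavior cloning: fit a policy that imitates the expert.
# (Least squares here; a VLA fits a large network instead.)
W_hat, *_ = np.linalg.lstsq(obs, actions, rcond=None)

# The learned policy now produces sensible actions for an
# observation it never saw during training.
new_obs = rng.normal(size=(1, 16))
pred = new_obs @ W_hat
print(np.allclose(pred, new_obs @ true_W, atol=1e-6))  # True
```

The toy version recovers the mapping exactly because the data is noiseless and linear; the real case is neither, which is why it takes a large model and a latent representation of how deformable objects behave rather than a lookup over fabric states.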
NVIDIA open-sourced foundation models for this last year, which dropped the barrier from "you need a world-class robotics lab" to "you need a good team and enough data." And if we've learned anything from the LLM space, it's that when the tooling is open, adoption accelerates. Open weights, community fine-tuning, specialized models for specific industries.
I think about this from the consulting side constantly. We spend a lot of time helping companies adopt AI for their digital operations. All of that is intelligence applied to information on screens/databases.
Physical AI is intelligence applied to atoms.
Similar concepts applied to a different surface. And most companies haven't even started thinking about it, which points to a steep adoption curve once the technology matures.
Two years ago, a robot folding a towel was a headline. Now companies are training single architectures that handle dozens of manipulation tasks. The capability curve looks like LLMs circa 2023: moving fast enough that the gap between lab demos and production deployments is shrinking every quarter.
It's worth paying more attention. Companies figuring out how to deploy intelligence in the physical world are going to operate at a completely different level than the ones still programming robots line by line.
The if-else era built an entire industry. There's room for a new one.