🤖 AI’s True Power Isn’t Text or Images — It’s Robotics

Cyrus Kurd
8 min read · Feb 7, 2025

--

The Real AI Revolution is in Intelligent Machines

While everyone continues to fixate on AI writing essays and generating art, the real revolution is happening below the surface — in robotics. AI-powered machines are already making their way into factories, grocery stores, warehouses, hospitals, and homes.

Making robots truly intelligent is the next major frontier of AI. It’s not just about handing an LLM to a human- or dog-shaped set of screws and scrap metal — it’s about physics, perception, control, and decision-making, even in unseen or uncertain conditions.

Movement, something humans take for granted, is extraordinarily complex for machines. Unlike AI models that process static text, robots must function in real time, reacting to changes in a dynamic world. We've had robots working in isolated cells or alone on factory floors for years, but it's much harder to have a robot working alongside a human or in an uncontrolled environment. A robot must first perceive its environment through sensors such as LiDAR, cameras, and IMUs, then interpret that data accurately, often making inferences when information is incomplete. The difficulty of perception extends to fundamental tasks such as route planning, object detection, and depth estimation, where even small miscalculations can be catastrophic.
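
To make the sensing problem concrete, here is a minimal sketch of one of the simplest fusion techniques: a complementary filter that blends a drifting gyroscope with a noisy accelerometer to estimate pitch. The variable names and the 0.98 blend weight are illustrative assumptions, not taken from any particular robot's stack.

```python
import math

def complementary_filter(pitch_prev, gyro_rate, accel_x, accel_z, dt, alpha=0.98):
    """Fuse a gyroscope rate with an accelerometer tilt estimate.

    The gyro integrates smoothly but drifts over time; the accelerometer is
    noisy but drift-free. Blending the two gives a stable pitch estimate.
    """
    # Integrate angular velocity (rad/s) from the gyro over this timestep.
    pitch_gyro = pitch_prev + gyro_rate * dt
    # Estimate pitch directly from gravity as seen by the accelerometer.
    pitch_accel = math.atan2(accel_x, accel_z)
    # Weighted blend: trust the gyro short-term, the accelerometer long-term.
    return alpha * pitch_gyro + (1 - alpha) * pitch_accel
```

Real perception stacks fuse far more than two sensors, but the principle is the same: no single sensor is trustworthy on its own, so estimates are always a weighted reconciliation.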

Source: Multi-Camera Light Field Capture (Elijs Dima)

Nor does a single image contain the information needed for depth; it is a 2D projection of a 3D world. Without cues like shading, texture gradients, occlusion, or stereoscopic vision, depth cannot be inferred directly from a single image. Just getting a computer to see the world in a way that bears some relationship to how we understand it is a massive undertaking, one that took researchers decades and only became tractable with the advent of deep neural networks.
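
One classical way around the missing third dimension is stereo vision: with two calibrated cameras, depth follows directly from the disparity between matched pixels. A minimal sketch of that relationship (the camera parameters below are placeholder values):

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulate depth from stereo disparity.

    depth = f * B / d, where f is the focal length in pixels, B is the
    distance between the two cameras in meters, and d is the horizontal
    shift (in pixels) of the same point between the two images.
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: point is effectively at infinity
    return focal_length_px * baseline_m / disparity_px

# Example: a 700 px focal length, 12 cm baseline, and 35 px disparity
# place the point roughly 2.4 m away.
print(depth_from_disparity(35, 700, 0.12))
```

The hard part in practice is not this formula but finding reliable pixel correspondences in the first place, which is exactly where learned methods have taken over.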

Once a robot has built a model of its surroundings, it faces the challenge of control: how to interact with the world. Precision and adaptability are crucial, whether a robot is assembling microchips or moving warehouse inventory. Yet control isn't simply about executing programmed motions; it requires responding to unexpected conditions. Objects slip, deform, or shift unpredictably, forcing AI-driven manipulation models to adjust in real time. Motion planning becomes exponentially harder in complex environments, where a robot must dodge moving obstacles or coordinate with human workers. Unlike humans, robots struggle with generalization; a model trained to handle a coffee mug might fail when presented with a wine glass or a banana. Transfer learning is still a significant hurdle, and the gap between knowing and doing is one of the toughest frontiers in robotics today. We have developed strong models in specific domains over the years, but integration and generalization remain unsolved.
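
Much of low-level control still comes down to feedback loops. The workhorse is the PID controller, which corrects toward a target while damping overshoot, letting a joint or gripper track a setpoint even as conditions shift. A minimal, generic sketch follows; the gains are placeholders that would be tuned per joint, not values from any real system.

```python
class PIDController:
    """Proportional-integral-derivative feedback controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt                  # accumulated error over time
        derivative = (error - self.prev_error) / dt  # rate of change of error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: drive a joint toward 1.2 rad, reading its angle every control tick.
controller = PIDController(kp=2.0, ki=0.1, kd=0.05)
command = controller.update(setpoint=1.2, measurement=0.9, dt=0.01)
```

Loops like this handle the "keep tracking the target" part well; what they cannot do on their own is decide what the target should be when the object slips or the scene changes, which is where learned policies come in.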

Beyond the difficulty of grasping, what happens next? A human can glance at a knife and immediately know whether it's for cutting vegetables, opening a package, or spreading butter, depending on the situation. Robots, however, require explicit training to recognize an object's function and how to interact with it correctly in context. Multi-modal AI is now being used to pair object recognition with context so that robots can dynamically adjust their approach, but many models still run on preprogrammed or pretrained heuristics. The deeper issue with robotic perception is not just identifying objects but semantic understanding: knowing their purpose and how they can be interacted with.
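
To see why those heuristics break down, here is a toy sketch of the pattern: a lookup from (object, context) to an action, layered on top of a detector. The labels and actions are hypothetical; the point is that the hard problem is learning this mapping and everything it leaves out, not writing it down.

```python
# Hypothetical affordance table pairing an object with its situational use.
AFFORDANCES = {
    ("knife", "kitchen_prep"): "slice",
    ("knife", "package_on_table"): "cut_tape",
    ("knife", "bread_and_butter"): "spread",
}

def choose_action(detected_object: str, scene_context: str) -> str:
    """Pick an action for an object given the surrounding context."""
    # Fall back to a safe default when the (object, context) pair is unseen --
    # exactly the generalization failure described above.
    return AFFORDANCES.get((detected_object, scene_context), "ask_for_guidance")

print(choose_action("knife", "kitchen_prep"))       # slice
print(choose_action("knife", "unknown_workbench"))  # ask_for_guidance
```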

Or what about using that object while working alongside humans in less controlled environments? Most AI-powered robots today operate in isolation. Whether in fulfillment warehouses or on self-driving test tracks, they are optimized for controlled environments where human unpredictability is minimized. But for robots to become truly integrated into daily life, they must not only perceive and act but also interpret human intent in real time. Imagine a robot working in a restaurant kitchen: if a human chef reaches for a pan that the robot is about to grab, the robot must immediately recognize the priority shift and change course. AI-driven humanoids will have to process implicit social hierarchies, gesture-based cues, and speech intonation in environments where communication isn't always verbal or clear. It's no easy task. Traditional rule-based approaches fail here because they lack adaptability. Instead, reinforcement learning from human feedback (RLHF) is being explored to help robots better navigate collaborative spaces.

Despite these challenges and many more, AI is pushing robotics into a new era, where machines are learning to operate in unstructured environments, manipulate objects with dexterity, and make split-second decisions in unpredictable conditions. In humanoid robotics, reinforcement learning is helping Tesla's Optimus and Figure AI's robots refine their locomotion and manipulation capabilities. The idea is that robots will be set loose in the world to learn physics, movement, and control on their own through rewards and punishments, much like how human children learn to interact with the world.
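
The core loop behind that idea fits in a few lines. Below is a tabular Q-learning sketch on a deliberately tiny "walk to the goal" task; real humanoid training uses large-scale deep RL in simulation, but the reward-driven update is the same in spirit.

```python
import random
from collections import defaultdict

ACTIONS = [-1, +1]          # step left or right along a 1-D corridor
GOAL, N_STATES = 5, 6

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == GOAL else -0.01   # reward progress, punish dawdling
    return next_state, reward, next_state == GOAL

q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Explore occasionally, otherwise act greedily on current estimates.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
```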

Boston Dynamics’ humanoid robot ‘Atlas’: https://www.youtube.com/watch?v=F_7IPm7f1vI (I highly recommend going down this rabbit-hole)

Google's RT-1 demonstrated how a single model can generalize across many robotic tasks, OpenAI's Dactyl showed that dexterous in-hand manipulation can be learned in simulation and transferred to real hardware, and Google DeepMind's Robotics Transformer 2 (RT-2) has shown that vision-language models can interpret instructions and translate them into physical actions. Autonomous vehicles are evolving fast as well. Companies like Waymo and Tesla are shifting from rule-based autonomy to deep learning-based decision-making, allowing self-driving vehicles to adapt dynamically to their environments. Quadrupedal (dog-like) robots such as Boston Dynamics' Spot, Unitree's models, and ANYbotics' ANYmal are being used for industrial inspections, disaster response, and even military applications, navigating environments too dangerous for humans.

In other words, one of the problems at hand is that we need to bridge the gap between weak generalization and rigid specificity — the classic challenge of balancing overfitting and underfitting in machine learning.

Recall how much worse some of the first mainstream LLMs were at math and current events; the breakthrough was augmenting them with code execution and search. Once they could hand off specific subroutines or tasks to dedicated tools, they were able to generalize more effectively, moving from isolated, pretrained test cases toward greater autonomy.

A major shift in AI is happening, not just in robotics but across all intelligent systems: the ability to call upon specialized functions or subroutines when needed is becoming the next frontier. A robot should be able to recognize when a task requires reinforcement learning, when it needs a physics simulation, and when it should call a specialized function that already solves the problem. It should be able to determine the right type of grip for a delicate object like a glass versus a tool like a hammer. Many of these individual skills already exist; the challenge is making them work together in a self-directed, context-aware system. Just as humans break down complex tasks into sequences of smaller, callable functions, robots need the ability to plan multi-step actions dynamically. The shift from static, end-to-end training to function-based, adaptable learning is what I believe will finally allow AI-powered robots to move beyond controlled test environments and into real-world autonomy.
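
A rough sketch of what that function-based layer might look like: a thin planner that routes a request to whichever specialized skill applies, rather than forcing one end-to-end model to do everything. The skill names and selection rule here are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical library of already-solved skills.
def precision_pinch_grasp(obj: str) -> str:
    return f"pinch-grasping {obj} with low force"

def power_grasp(obj: str) -> str:
    return f"power-grasping {obj} with a full-hand wrap"

def run_physics_check(obj: str) -> str:
    return f"simulating contact forces for {obj} before acting"

SKILLS: Dict[str, Callable[[str], str]] = {
    "fragile": precision_pinch_grasp,
    "heavy_tool": power_grasp,
    "unknown": run_physics_check,
}

def plan_and_execute(obj: str, category: str) -> str:
    """Route a task to a specialized subroutine instead of one monolithic policy."""
    skill = SKILLS.get(category, run_physics_check)  # fall back to simulation when unsure
    return skill(obj)

print(plan_and_execute("wine glass", "fragile"))   # pinch-grasping wine glass ...
print(plan_and_execute("hammer", "heavy_tool"))    # power-grasping hammer ...
```

The dispatcher itself is trivial; the open research problem is getting the robot to choose the category, sequence the calls, and notice when none of the existing skills fit.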

Intuitively, one way to begin integrating multiple specializations with some cohesion is to use LLMs as the coordinating layer. And indeed, OpenAI appears to be planning a jump into humanoid robots. Unfortunately, the security of LLMs in robots remains unresolved. Researchers at the University of Pennsylvania have demonstrated how LLM-powered robotic systems can be manipulated through adversarial inputs, raising serious security concerns. Subtle prompt injections and unauthorized function calls have been shown to cause robots to execute unintended actions, a serious risk in industrial, military, and consumer applications alike. Ensuring AI security requires hardware-level safeguards, stronger access controls, and adversarial testing to prevent misuse. As robots become more autonomous, securing them against exploitation will have to advance alongside their capabilities.
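
One common mitigation pattern is to never let a language model's output trigger an action directly: every proposed call is checked against an allowlist and parameter bounds before anything moves. A toy sketch of that gate, with illustrative function names and limits that are assumptions, not any real robot's API:

```python
# Allowlisted actions and the bounds their parameters must satisfy.
ALLOWED_CALLS = {
    "move_arm": lambda p: 0.0 <= p.get("speed", 0.0) <= 0.5,  # cap speed near humans
    "open_gripper": lambda p: True,
    "stop": lambda p: True,
}

def validate_call(proposed: dict) -> bool:
    """Reject any model-proposed action that is not allowlisted and in bounds."""
    check = ALLOWED_CALLS.get(proposed.get("name"))
    return check is not None and check(proposed.get("params", {}))

# A prompt-injected request for an unlisted or out-of-bounds action is dropped.
print(validate_call({"name": "move_arm", "params": {"speed": 0.3}}))          # True
print(validate_call({"name": "move_arm", "params": {"speed": 3.0}}))          # False
print(validate_call({"name": "disable_safety_interlock", "params": {}}))      # False
```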

The shift from software to hardware will be what truly determines which companies and nations lead the coming technological era. Industrial AI-powered robots won't just improve efficiency; they'll redefine production and costs altogether. Humanoid robots will go from clunky prototypes to integrated assistants in the workforce, forcing industries to rethink labor models. Autonomous machines will soon handle critical infrastructure, from transportation to healthcare to security, with or without human assistance.

As robotics becomes more advanced, there is growing concern that AI-powered autonomous weapons and security forces will be the first large-scale deployment of humanoid machines. Militaries worldwide are already investing heavily in drone swarms, autonomous defense systems, and AI-assisted battlefield strategy (though human-operated systems still significantly outperform autonomous ones in nearly all cases, particularly in dynamic and complex engagements). The leap from remote-controlled drones to fully autonomous robotic soldiers could fundamentally reshape warfare, raising serious ethical and strategic dilemmas. As Pope Francis stated, lethal autonomous weapons could "irreversibly alter the nature of warfare, detaching it further from human agency." Despite such concerns, U.S. governmental oversight has remained minimal, with no binding regulations limiting the deployment of AI-driven military systems. If left unchecked, this shift could accelerate an arms race in autonomous warfare, lowering the threshold for conflict and increasing the risk of unintended escalation.

The challenges of implementation as well as security, governance, and ethical deployment are real, but they should not overshadow the massive potential of AI-powered robotics. Instead of fearing autonomy, we need to actively shape its trajectory — investing not just in regulation, but in fundamental breakthroughs that allow robots to learn, adapt, and integrate dynamically. The real future of robotics isn’t just a machine blindly executing preprogrammed rules — it’s a system that can intelligently call upon subroutines, apply reasoning (and ethics) to complex tasks, and refine its own behavior through learning to benefit humans.

Right now, robotics lags behind other AI fields in its ability to generalize and self-correct, but this doesn’t have to be the case. A truly intelligent robotic system wouldn’t just execute a task — it would recognize when its approach is failing and adjust accordingly. Whether it’s choosing the right grasp for a delicate object, selecting a navigation strategy in an unpredictable environment, or refining its approach to collaborative work with humans, robots should be able to modularly assemble their own solutions rather than relying on brittle, one-size-fits-all models.
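
The shape of that self-correction can be sketched in a few lines: try a strategy, check the outcome with actual sensing, and switch rather than retry blindly. The strategy names and the success check below are hypothetical stand-ins, not a real control API.

```python
from typing import Optional

GRASP_STRATEGIES = ["pinch_grasp", "suction_grasp", "two_handed_grasp"]

def attempt(strategy: str, obj: str) -> bool:
    """Stand-in for executing a grasp and reading force/tactile sensors."""
    ...

def grasp_with_recovery(obj: str) -> Optional[str]:
    for strategy in GRASP_STRATEGIES:
        # The success signal must come from sensing, not from assuming the plan worked.
        if attempt(strategy, obj):
            return strategy
        # Failed: fall through to the next strategy instead of repeating the same one.
    return None  # every strategy failed: escalate to a human or re-plan
```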

Investing in cohesive, modular AI architectures — where robots dynamically select and optimize their own action sets — is how we move past today’s rigid automation. This is what will transform humanoid robots from mechanical tools into truly intelligent systems. The goal isn’t just to eliminate human labor — it’s to expand human capability, unlock new forms of collaboration between humans and machines, and ensure that robotics is built with both resilience and responsibility in mind.

What happens when we succeed? What does a world look like when AI-powered machines can handle everything — freeing humanity from economic necessity? If virtual and physical machines take over all production, all service jobs, and even intellectual labor, then what remains for us to do?

Upcoming: A deep dive into the post-labor world and what it means for our future.

--


Written by Cyrus Kurd

M.S. Data Science Student at Columbia University | linkedin.com/in/cykurd/
