The sense that robotics often forgets
If you watch a skilled human pick up a fragile object, the
interesting action is not in what their eyes do. It is in what their fingertips
do. They make contact, detect the surface texture, sense the object's
resistance and weight, and continuously adjust the pressure of their grip in
response to what they feel. The whole process happens in a fraction of a
second, largely without conscious thought.
Robots are getting very good at seeing. Computer vision
has advanced enormously, and modern physical AI systems can identify and locate
objects with impressive accuracy. But seeing where an object is and knowing how
to physically handle it are two different problems. The second one requires
touch data, and touch data is one of the most underrepresented inputs in
physical AI training datasets.
This is why you can watch a robot pick up a foam block
flawlessly and then crush a slightly different foam block because it applied
the same grip pressure. The robot did not see the difference. It did not feel
the difference either, because nobody taught it to.
What force and tactile sensors actually capture
Force and torque sensors measure the forces and moments
acting at a specific point, typically at the robot's wrist or gripper. When a
robot gripper closes around an object, force sensors measure how much pressure
is being applied. When the robot moves its arm, force sensors measure the
resistance the object provides.
Tactile sensors go further. Distributed across the surface
of a gripper or robotic hand, they measure the distribution of pressure across
contact points, the texture of the surface being touched, the degree of slip
between the gripper and the object, and the deformation of compliant objects
under grip pressure.
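The quantities a tactile array reports can be sketched as a minimal per-frame data record. The field names and units below are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class TactileFrame:
    """One snapshot from a distributed tactile array (hypothetical schema)."""
    timestamp_s: float
    # Per-taxel normal pressure, in pascals, one value per sensing element.
    pressure_pa: list
    # Estimated tangential slip speed at the contact, in mm/s.
    slip_mm_s: float
    # Indentation depth of a compliant object under the grip, in mm.
    deformation_mm: float

frame = TactileFrame(
    timestamp_s=0.01,
    pressure_pa=[1200.0, 950.0, 0.0, 1310.0],
    slip_mm_s=0.0,
    deformation_mm=0.4,
)
# Contact area can be approximated as the count of taxels above a noise floor.
contact_taxels = sum(1 for p in frame.pressure_pa if p > 100.0)
```

Even this toy record makes the contrast with vision concrete: pressure distribution, slip, and deformation are first-class fields rather than something to be inferred from pixels.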
Together, these sensors provide the kind of information
that hands provide to humans: How hard is the object? Is the grip secure or
starting to slip? Is the material deforming under pressure in a way that
suggests it might break? Is this object heavier than it looks based on how much
it resists movement?
This information is not visible in camera images. It is
not present in lidar point clouds. A physical AI model trained only on visual
sensor data will never encounter this information during training and will
never be able to use it during deployment, even if force sensors are physically
present on the robot.
A robot that can see an egg but cannot feel it does not
know how to hold it. Vision tells you where to reach. Touch tells you how to
grip.
Why tactile data is so hard to collect
Collecting useful tactile training data is significantly
more challenging than collecting visual data, and the difficulty is worth
understanding clearly because it explains why most programs skip it.
The volume challenge: a camera captures millions of pixels
per frame. A force sensor captures a handful of values per measurement.
Building a training dataset of tactile interactions with genuine diversity
requires many, many physical interactions with many different objects under
many different conditions. You cannot aggregate internet-scale tactile data the
way you can aggregate visual data.
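The volume gap is easy to quantify with rough, assumed sensor specs (an uncompressed 1080p RGB camera versus a six-axis force/torque sensor emitting six 32-bit floats):

```python
# Rough per-sample data sizes (assumed sensor specs, for illustration only).
camera_bytes = 1920 * 1080 * 3   # one uncompressed 1080p RGB frame
ft_bytes = 6 * 4                 # six 32-bit floats: Fx, Fy, Fz, Tx, Ty, Tz

ratio = camera_bytes // ft_bytes
print(camera_bytes, ft_bytes, ratio)
```

Each camera frame carries hundreds of thousands of times more raw bytes than a force sample, which is why tactile diversity has to come from many distinct physical interactions rather than from data volume per interaction.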
The consistency challenge: the same object gripped by two
different robots with the same command will produce different force sensor
readings because of small differences in the gripper surface, the calibration
state of the sensor, and the exact positioning of the robot. Normalizing and
annotating across this physical variation is harder than normalizing across
camera calibration differences.
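One common way to reduce this cross-robot variation is per-sensor standardization against a baseline recorded with the gripper empty, so sensor-specific offset and gain are absorbed before training. This is a sketch of that idea under those assumptions, not a prescribed pipeline:

```python
import statistics

def normalize_readings(readings, baseline):
    """Standardize force readings against a per-sensor baseline recording.

    `baseline` is a list of readings taken with the gripper empty, so the
    offset and scale absorb sensor-specific calibration drift.
    """
    offset = statistics.mean(baseline)
    scale = statistics.stdev(baseline) or 1.0
    return [(r - offset) / scale for r in readings]

# Two sensors reporting the same physical grasp with different offsets/gains.
sensor_a = normalize_readings([5.2, 5.4, 5.3], baseline=[5.0, 5.1, 4.9])
sensor_b = normalize_readings([12.4, 12.8, 12.6], baseline=[12.0, 12.2, 11.8])
# After normalization, the two sensors' profiles become directly comparable.
```

Real pipelines typically add temperature compensation and periodic recalibration on top of this, but the principle is the same: make readings comparable before annotating them.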
The annotation challenge: labeling a force sensor
recording requires understanding what a successful grasp feels like, what a
near-miss feels like, what an impending slip looks like in the force data
before it becomes an actual slip. This is specialized knowledge, and annotation
guidelines for tactile data require more domain understanding than guidelines
for visual data.
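To make "impending slip before actual slip" concrete: as tangential load approaches the friction limit, the ratio of tangential to normal force climbs toward the friction coefficient. A labeling rule can key off that ratio. The friction coefficient and thresholds below are hypothetical values for illustration:

```python
import math

def slip_risk(fx, fy, fz, mu=0.6):
    """Ratio of tangential to normal force, relative to an assumed friction
    coefficient mu. Values near 1.0 mean slip is imminent."""
    tangential = math.hypot(fx, fy)
    normal = abs(fz)
    if normal < 1e-9:
        return float("inf")  # no normal force: contact is already lost
    return (tangential / normal) / mu

def label_sample(fx, fy, fz):
    """Hypothetical annotation rule for a single force sample."""
    risk = slip_risk(fx, fy, fz)
    if risk >= 1.0:
        return "slipping"
    if risk >= 0.8:
        return "impending_slip"
    return "stable"

label_sample(0.5, 0.0, 5.0)  # low tangential load relative to grip
label_sample(2.9, 0.0, 5.0)  # tangential load nearing the friction cone
```

Encoding the rule this explicitly is exactly the domain knowledge the annotation guidelines have to capture; an annotator without it cannot tell a stable sample from one a few milliseconds before slip.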
What good tactile training data looks like
A useful tactile training dataset for a manipulation task
needs to cover several types of variation that often get ignored in more
straightforward collection programs.
Object variation: the same grasp task should be
demonstrated across objects with different weights, textures, rigidity, and
fragility. A model that has only felt rigid objects will not know what to do
with a compliant one. A model that has only felt lightweight objects will not
adapt its grip appropriately to a heavy one.
Condition variation: grip contact events should be
recorded across different states of the gripper surface, different calibration
states of the force sensor, different speeds of approach and closure. Real
operations will include all of these variations; the training data should too.
Failure variation: near-miss grasps, slipping events, and
unsuccessful grasp attempts are some of the most valuable training examples.
They teach the model what impending failure looks like in the force data, which
is the information it needs to avoid failure in deployment.
Success variation: not all successful grasps look the
same. A precise, controlled grasp of a fragile object produces a different
force profile than a robust, fast grasp of a durable one. Both should be
represented in training data with labels that reflect the different task
requirements.
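One way to make these four variation axes actionable is to track coverage explicitly in the dataset metadata, so gaps show up as skewed counts rather than as deployment surprises. The axis names and bucket labels here are illustrative:

```python
from collections import Counter

# Illustrative per-episode metadata; bucket labels are hypothetical.
episodes = [
    {"rigidity": "rigid", "weight": "light", "outcome": "success"},
    {"rigidity": "rigid", "weight": "heavy", "outcome": "success"},
    {"rigidity": "compliant", "weight": "light", "outcome": "slip"},
    {"rigidity": "rigid", "weight": "light", "outcome": "success"},
]

def coverage(episodes, axis):
    """Count how many episodes fall in each bucket along one variation axis."""
    return Counter(e[axis] for e in episodes)

rigidity_cov = coverage(episodes, "rigidity")
outcome_cov = coverage(episodes, "outcome")
# A skewed Counter (e.g. almost no failure episodes) flags a collection gap.
```

Running this kind of audit during collection, rather than after, is what keeps the failure and compliant-object buckets from being discovered empty at training time.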
The connection to real-world manipulation performance
The manipulation tasks that physical AI systems currently
handle well are generally the ones where visual information is sufficient: pick
and place operations with rigid objects of known weight, object identification
and sorting by visual category, navigation and workspace traversal. These tasks
are well-served by camera and lidar data alone.
The manipulation tasks that remain genuinely difficult are
almost universally ones where tactile information is critical: handling fragile
objects, working with soft or deformable materials, performing assembly tasks
that require precise insertion under contact uncertainty, detecting slip and
adjusting grip in real time. These tasks are hard not because the visual models
are not good enough but because the tactile models do not have the training
data to support them.
As physical AI applications push into more demanding
manipulation domains, the investment in tactile data collection and annotation
will become increasingly central to system capability. The teams that build
this expertise now will have capabilities that are genuinely difficult for
later entrants to replicate quickly.
Starting with force data before tactile data
For teams beginning to incorporate contact sensing into
their physical AI programs, wrist-mounted force and torque sensors are a more
accessible starting point than distributed tactile sensing arrays. The hardware
is more mature, the data volumes are more manageable, and the annotation
requirements are somewhat more straightforward.
Even basic force sensor data, if collected systematically
and annotated thoughtfully, can substantially improve manipulation performance
for tasks that involve contact uncertainty. Detecting when a grasp is secure
versus marginal, adapting grip force to object weight, recognizing when an
insertion is meeting unexpected resistance: all of these capabilities can be
built with force data before investing in the more complex infrastructure of
distributed tactile sensing.
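As a concrete illustration of the secure-versus-marginal distinction, a minimal check on wrist force data might compare the measured grip force against what the object's apparent weight demands. The two-finger friction model and coefficient below are simplifying assumptions, not a general grasp criterion:

```python
def grasp_margin(grip_force_n, object_weight_n, mu=0.5):
    """Safety factor of a two-finger grasp against gravity.

    With two opposing fingers and friction coefficient mu, the minimum grip
    force to resist the object's weight is weight / (2 * mu). Returns the
    ratio of actual to minimum grip force; values near 1.0 are marginal.
    """
    required = object_weight_n / (2.0 * mu)
    return grip_force_n / required

# A 2 N object held with 10 N of grip force: comfortably secure.
secure = grasp_margin(10.0, 2.0)
# The same object held with 2.2 N of grip force: barely above the minimum.
marginal = grasp_margin(2.2, 2.0)
```

Even this crude model, fed by a single wrist sensor, supports the secure/marginal labels described above; distributed tactile arrays refine where on the fingers that margin is being lost, not whether it exists.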
Touch is a sense. Robots need to learn it the same way
they learn to see: through data that faithfully represents the physical
experience of contact with the real world.