- 50–100 annotations per video
- 97–100% accuracy achieved
- 60-day turnaround
- 60 annotators deployed
1. The Challenge
Uber’s robotics division required pixel-level video segmentation of robotic arm actions to train physical AI models for autonomous manipulation tasks. Each video contained complex, multi-step robotic sequences that demanded precise segmentation of every component — joints, grippers, objects, and interaction zones across every frame. The volume was significant: 10–15 videos daily, each requiring 50–100 segmentation annotations.
- Achieving pixel-level segmentation accuracy on fast-moving robotic arm sequences
- Maintaining annotation consistency across 10–15 videos per day
- Coordinating a 60-person team to deliver uniform quality at high throughput
- Handling complex multi-step action sequences with overlapping object boundaries
Client: Uber
Industry: Robotics / Physical AI
Timeline: 60 days
2. Our Approach
- Deployed a dedicated 60-person annotation team trained specifically on robotic action segmentation
- Built a structured workflow optimized for daily throughput of 10–15 videos
- Created detailed segmentation guidelines covering joint articulation, gripper states, and object interaction zones
- Implemented frame-by-frame consistency checks to prevent annotation drift across sequences
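The frame-by-frame consistency check described above can be sketched as a simple temporal-overlap test. This is an illustrative assumption, not the actual pipeline: masks are modeled as sets of pixel coordinates, and a frame is flagged for re-review when its mask overlaps the previous frame's mask less than a threshold, which usually signals annotation drift rather than real motion.

```python
# Hypothetical sketch of a temporal-consistency check for segmentation
# masks. Masks are represented as sets of (row, col) pixel coordinates.

def mask_iou(mask_a: set, mask_b: set) -> float:
    """Intersection-over-union of two pixel-coordinate masks."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks agree perfectly
    return len(mask_a & mask_b) / len(mask_a | mask_b)

def flag_drift(frames: list, min_iou: float = 0.7) -> list:
    """Return indices of frames whose mask overlaps the previous
    frame's mask by less than `min_iou` -- candidates for re-review."""
    flagged = []
    for i in range(1, len(frames)):
        if mask_iou(frames[i - 1], frames[i]) < min_iou:
            flagged.append(i)
    return flagged
```

The 0.7 threshold is illustrative; in practice it would be tuned per action type, since fast gripper motions legitimately produce lower frame-to-frame overlap than slow joint articulation.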
Quality Assurance
- Every annotation was independently reviewed by two different annotators
- Disagreements on segmentation boundaries were escalated to senior QA reviewers
- Cross-frame consistency was validated to ensure temporal coherence
- Only consensus-approved, double-verified annotations were delivered
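The double-review rule above amounts to a small decision function. A minimal sketch, assuming (not taken from the source) that reviewer agreement is measured by mask overlap: an annotation is approved only when both independent reviewers' masks agree within a tolerance, and escalates to senior QA otherwise.

```python
# Minimal sketch (assumed logic, not the actual QA pipeline) of the
# double-review rule: approve only on close agreement between two
# independent reviewers' masks, otherwise escalate to senior QA.

def review_status(mask_1: set, mask_2: set, agreement: float = 0.95) -> str:
    """Return 'approved' if the two reviewers' pixel masks overlap
    by at least `agreement` IoU, 'escalate' otherwise."""
    union = mask_1 | mask_2
    if not union:
        return "approved"  # both reviewers marked the region empty
    iou = len(mask_1 & mask_2) / len(union)
    return "approved" if iou >= agreement else "escalate"
```

Only annotations that come back "approved" from this gate, plus the cross-frame temporal check, would reach the delivered dataset.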
3. The Results
- Sustained 97–100% annotation accuracy across all video segmentation tasks
- Delivered 10–15 fully annotated videos daily without quality degradation
- Completed the full project within the 60-day timeline
- Enabled Uber's robotics team to accelerate model training cycles significantly
OUTCOME
Delivered production-grade segmentation datasets that directly improved Uber's robotic manipulation model performance, enabling faster iteration on physical AI systems and reducing time-to-deployment for autonomous robotics applications.