Learning Vision-Based Dribbling for Humanoid Soccer via Privileged Representation Learning

Submitted to the RoboCup International Symposium 2026

Flavio Maiorana1, Valerio Spagnoli1, Eugenio Bugli1, Flavio Volpi2, Daniele Affinita3, Vincenzo Suriani1, Daniele Nardi1 and Luca Iocchi1
1 Dept. of Computer, Control, and Management Engineering, Sapienza University of Rome, Rome, Italy; 2 Institut de Robòtica i Informàtica Industrial (CSIC-UPC), C Llorens i Artigas 4-6, 08028 Barcelona, Spain; 3 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Abstract

Recent advances in humanoid robotics have highlighted the importance of deployable loco-manipulation skills. Dribbling a soccer ball while evading active opponents requires simultaneous balance, precise ball control, and awareness of a dynamic adversary under onboard sensing and real-time constraints. Existing approaches typically separate perception and motion, which can be effective in controlled settings but may fail under occlusions, fast ball movements, and complex opponent interactions, since perception is not directly optimized for control. We propose an integrated approach in which a temporal depth encoder is embedded into a reinforcement learning policy through a task-specific projection layer. We apply this framework to a simulated Booster T1 humanoid robot and show that it is possible to learn vision-based, opponent-aware dribbling directly from depth observations, without explicit state estimation or privileged scene information. The learned policy achieves 100% success in nominal target-driven dribbling and 96% success with a single static obstacle, while reaching 46% success against an actively moving ball-attacker opponent. These results demonstrate that the proposed framework supports robust vision-based dribbling in nominal and moderately dynamic settings, and provides a strong foundation for handling more challenging moving-adversary scenarios.

Methodology

Figure. Curriculum-based training schedule. The policy is first trained through progressively harder stages, from ball approach to dynamic opponent interaction; a visual encoder is then trained to reproduce the learned latent from depth observations while the policy is kept frozen.
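The staged schedule can be viewed as a small configuration that the trainer steps through. The sketch below illustrates this idea; the stage names, obstacle/opponent parameters, and promotion thresholds are illustrative assumptions, not the values used in the paper.

```python
# Illustrative curriculum configuration; stage names, parameters, and
# promotion thresholds are assumptions, not the paper's exact values.
from dataclasses import dataclass

@dataclass
class CurriculumStage:
    name: str
    spawn_obstacle: bool     # place a static obstacle near the ball path
    opponent_speed: float    # max ball-attacker speed [m/s]; 0.0 = no active opponent
    min_success_rate: float  # success rate required before advancing

CURRICULUM = [
    CurriculumStage("ball_approach",    spawn_obstacle=False, opponent_speed=0.0, min_success_rate=0.8),
    CurriculumStage("target_dribbling", spawn_obstacle=False, opponent_speed=0.0, min_success_rate=0.8),
    CurriculumStage("static_obstacle",  spawn_obstacle=True,  opponent_speed=0.0, min_success_rate=0.7),
    CurriculumStage("dynamic_opponent", spawn_obstacle=True,  opponent_speed=1.0, min_success_rate=0.5),
]

def next_stage(current: int, recent_success_rate: float) -> int:
    """Advance to the next stage once the current one is reliably solved."""
    if current < len(CURRICULUM) - 1 and recent_success_rate >= CURRICULUM[current].min_success_rate:
        return current + 1
    return current
```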
Figure. Overview of the two-phase framework. In Phase 1, a privileged encoder maps ground-truth ball and obstacle states to a latent representation used by the policy. In Phase 2, a visual encoder predicts the same latent from depth observations, enabling deployment with onboard sensing only.
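A minimal PyTorch sketch of the two-phase idea described above: in Phase 1 a privileged encoder produces the task latent from ground-truth ball and obstacle states, and in Phase 2 a temporal depth encoder is trained to regress that same latent while the policy (and privileged encoder) stay frozen. Module names, layer sizes, and the latent dimension are assumptions made for illustration.

```python
# Illustrative two-phase latent distillation (layer sizes and names are assumptions).
import torch
import torch.nn as nn

LATENT_DIM = 32  # assumed latent size

class PrivilegedEncoder(nn.Module):
    """Phase 1: ground-truth ball/obstacle state -> task latent used by the policy."""
    def __init__(self, state_dim: int = 12):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ELU(), nn.Linear(128, LATENT_DIM))

    def forward(self, priv_state):
        return self.net(priv_state)

class TemporalDepthEncoder(nn.Module):
    """Phase 2: short history of depth frames -> prediction of the same latent."""
    def __init__(self, frame_feat: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, frame_feat),
        )
        self.gru = nn.GRU(frame_feat, 64, batch_first=True)
        self.proj = nn.Linear(64, LATENT_DIM)  # task-specific projection layer

    def forward(self, depth_seq):  # depth_seq: (B, T, 1, H, W)
        b, t = depth_seq.shape[:2]
        feats = self.cnn(depth_seq.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)
        return self.proj(out[:, -1])

# Phase 2 training step: regress the frozen privileged latent from depth only.
priv_enc, vis_enc = PrivilegedEncoder(), TemporalDepthEncoder()
priv_enc.requires_grad_(False)  # the policy and privileged encoder are not updated
optimizer = torch.optim.Adam(vis_enc.parameters(), lr=1e-3)

def distill_step(depth_seq, priv_state):
    with torch.no_grad():
        target = priv_enc(priv_state)
    loss = nn.functional.mse_loss(vis_enc(depth_seq), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```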

Experimental Results

Reward Term               Weight     Reward Term                 Weight
Ball velocity tracking     3.0       Ball-target reached          50.0
Ball speed tracking        2.0       Robot-obstacle collision    -10.0
Ball heading tracking      2.0       Ball-obstacle collision     -10.0
Robot-ball distance        0.05      Foot slip                    -0.1
Robot-ball yaw             2.0       Upright                       1.0
Ball-target progress       2.0       Body angular velocity        -0.05
Swing phase                3.0       Angular momentum             -0.5
Stance phase               2.0       DoF position limits          -1.0
Pose arms                  1.0       Action rate (L2)             -0.1
Pose legs                  1.0       Foot swing height            -0.25
Feet distance             -6.0       Soft landing                 -1e-5
Foot-foot contact         -5.0       Self-collisions              -1.0
Nonfoot-ball contact      -2.0

Table. Reward terms and weights for training the dribbling policy.
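The per-step reward is obtained as a weighted sum of such terms. The snippet below is a minimal sketch of that combination, mirroring only a few rows of the table above; the dictionary keys are illustrative names, not the actual implementation.

```python
# Weighted combination of per-step reward terms (illustrative subset of the table).
REWARD_WEIGHTS = {
    "ball_velocity_tracking": 3.0,
    "ball_target_reached": 50.0,
    "robot_obstacle_collision": -10.0,
    "action_rate_l2": -0.1,
}

def total_reward(terms: dict) -> float:
    """terms maps each reward name to its unweighted per-step value."""
    return sum(REWARD_WEIGHTS[name] * value for name, value in terms.items())

# Example step: good velocity tracking, no collision, small action change.
step = {
    "ball_velocity_tracking": 0.8,
    "ball_target_reached": 0.0,
    "robot_obstacle_collision": 0.0,
    "action_rate_l2": 0.05,
}
print(total_reward(step))  # 3.0*0.8 + 50.0*0.0 - 10.0*0.0 - 0.1*0.05 = 2.395
```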

Condition        SR [%]   T2T [s]  T2T-C [s]  FR [%]  LR [%]  RCR [%]  RC/t   BCR [%]  BC/t   MBC [m]
No obstacles     100.00   11.45    11.45      0.00    0.00    --       --     --       --     --
Static obstacle   96.00   13.29    13.95      4.00    0.00    8.00     0.08   4.00     0.10   1.73
Ball attacker     46.00   11.82    21.64      52.00   2.00    68.00    0.72   40.00    0.70   1.36

Table. Final-policy main task evaluation.

Condition        Segment        e_vec [m/s]  e_spd [m/s]  e_ang [deg]
No obstacles     All timesteps  0.71         0.62         28.39
Static obstacle  Unblocked      0.74         0.62         35.62
Static obstacle  Blocked        0.74         0.60         33.96
Ball attacker    Unblocked      0.74         0.61         34.27
Ball attacker    Blocked        0.79         0.60         43.00

Table. Velocity-tracking diagnostic for the final policy.
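The table does not restate how e_vec, e_spd, and e_ang are computed; one plausible reading, assumed here, is that they are the mean vector, speed, and heading errors between the commanded and measured planar ball velocity.

```python
# Plausible velocity-tracking diagnostics (assumed definitions, not the paper's exact ones).
import numpy as np

def tracking_errors(v_cmd: np.ndarray, v_ball: np.ndarray):
    """v_cmd, v_ball: (T, 2) commanded vs measured planar ball velocity."""
    e_vec = np.linalg.norm(v_ball - v_cmd, axis=1).mean()  # mean vector error [m/s]
    e_spd = np.abs(np.linalg.norm(v_ball, axis=1) - np.linalg.norm(v_cmd, axis=1)).mean()  # [m/s]
    diff = np.arctan2(v_ball[:, 1], v_ball[:, 0]) - np.arctan2(v_cmd[:, 1], v_cmd[:, 0])
    e_ang = np.degrees(np.abs(np.arctan2(np.sin(diff), np.cos(diff)))).mean()  # wrapped heading error [deg]
    return e_vec, e_spd, e_ang
```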

Condition        e_ball,pos [m]  e_ball,vel [m/s]  e_obs,pos [m]  e_obs,vel [m/s]  c_fov [%]
No obstacles     0.05            0.25              --             --               75.43
Static obstacle  0.05            0.24              1.26           0.21             76.74
Ball attacker    0.08            0.26              1.26           0.13             78.39

Table. Perception metrics for the final policy.
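Under a similarly assumed reading, the perception metrics would be mean Euclidean errors of the ball and obstacle estimates against ground truth, with c_fov the percentage of timesteps in which the ball falls inside the camera field of view.

```python
# Plausible perception diagnostics (assumed definitions).
import numpy as np

def perception_metrics(p_est, p_gt, v_est, v_gt, ball_in_fov):
    """p_*, v_*: (T, 2) estimated vs ground-truth positions/velocities; ball_in_fov: (T,) bool."""
    e_pos = np.linalg.norm(p_est - p_gt, axis=1).mean()  # [m]
    e_vel = np.linalg.norm(v_est - v_gt, axis=1).mean()  # [m/s]
    c_fov = 100.0 * ball_in_fov.mean()                   # [% of timesteps]
    return e_pos, e_vel, c_fov
```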

Condition        Stage 1  Stage 2  Stage 3
No obstacles     100      68       90
Static obstacle  24       42       88
Ball attacker    2        2        46

Table. Success-rate ablation across curriculum stages (%).

Conclusions

This paper presented a visual temporal policy for autonomous humanoid soccer dribbling. The experimental results show that the proposed policy reliably solves nominal target-driven dribbling, achieving 100% success in the no-obstacle condition, and also performs well in the single static-obstacle setting, where success remains high at 96%. Performance degrades substantially in the ball-attacker setting, where success drops to 46% and both fall and collision rates increase markedly. This indicates that the main remaining challenge is robust closed-loop control against an actively moving adversary, rather than nominal dribbling itself. The velocity and perception diagnostics support this interpretation: deviations from the commanded ball motion increase during blocked attacker interactions, while perception errors remain relatively stable across conditions. Overall, these results suggest that combining temporal visual adaptation with reinforcement learning is a promising approach for humanoid loco-manipulation tasks. However, the current evidence is limited by the small number of independently trained policies and by the fact that all experiments are conducted in simulation. Beyond humanoid soccer, the same framework could support other embodied robotic tasks requiring perception-driven control under partial observability. Future work will focus on harder and more diverse opponent behaviors, stronger robustness against moving adversaries, and sim-to-real transfer on the Booster T1 humanoid platform.

References (BibTeX)

@article{maiorana2026learning,
  title={Learning Vision-Based Dribbling for Humanoid Soccer via Privileged Representation Learning},
  author={Maiorana, Flavio and Spagnoli, Valerio and Bugli, Eugenio and Volpi, Flavio and Affinita, Daniele and Suriani, Vincenzo and Nardi, Daniele and Iocchi, Luca},
  year={2026}
}