Learning Vision-Based Dribbling for Humanoid Soccer via Privileged Representation Learning

Submitted to the RoboCup International Symposium 2026

Flavio Maiorana1, Valerio Spagnoli1, Eugenio Bugli1, Flavio Volpi2, Daniele Affinita3, Vincenzo Suriani1, Daniele Nardi1 and Luca Iocchi1
1 Dept. of Computer, Control, and Management Engineering, Sapienza University of Rome, Rome, Italy; 2 Institut de Robòtica i Informàtica Industrial (CSIC-UPC), C Llorens i Artigas 4-6, 08028 Barcelona, Spain; 3 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Abstract

Recent advances in humanoid robotics have highlighted the importance of deployable loco-manipulation skills. Dribbling a soccer ball while evading active opponents requires simultaneous balance, precise ball control, and awareness of a dynamic adversary under onboard sensing and real-time constraints. Existing approaches typically separate perception and motion, which can be effective in controlled settings but may fail under occlusions, fast ball movements, and complex opponent interactions, since perception is not directly optimized for control. We propose an integrated approach in which a temporal depth encoder is embedded into a reinforcement learning policy through a task-specific projection layer. We apply this framework to a simulated Booster T1 humanoid robot and show that it is possible to learn vision-based, opponent-aware dribbling directly from depth observations, without explicit state estimation or privileged scene information. The learned policy achieves 100% success in nominal target-driven dribbling and 96% success with a single static obstacle, while reaching 46% success against an actively moving ball-attacker opponent. These results demonstrate that the proposed framework supports robust vision-based dribbling in nominal and moderately dynamic settings, and provides a strong foundation for handling more challenging moving-adversary scenarios.

Methodology

Figure. Curriculum-based training schedule. The policy is first trained through progressively harder stages, from ball approach to dynamic opponent interaction; a visual encoder is then trained to reproduce the learned latent from depth observations while the policy is kept frozen.
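The staged schedule can be viewed as a small configuration that the trainer steps through. The sketch below illustrates this idea; the stage names, obstacle/opponent parameters, and promotion thresholds are illustrative assumptions, not the values used in the paper.

```python
# Illustrative curriculum configuration; stage names, parameters, and
# promotion thresholds are assumptions, not the paper's exact values.
from dataclasses import dataclass

@dataclass
class CurriculumStage:
    name: str
    spawn_obstacle: bool     # place a static obstacle near the ball path
    opponent_speed: float    # max ball-attacker speed [m/s]; 0.0 = no active opponent
    min_success_rate: float  # success rate required before advancing

CURRICULUM = [
    CurriculumStage("ball_approach",    spawn_obstacle=False, opponent_speed=0.0, min_success_rate=0.8),
    CurriculumStage("target_dribbling", spawn_obstacle=False, opponent_speed=0.0, min_success_rate=0.8),
    CurriculumStage("static_obstacle",  spawn_obstacle=True,  opponent_speed=0.0, min_success_rate=0.7),
    CurriculumStage("dynamic_opponent", spawn_obstacle=True,  opponent_speed=1.0, min_success_rate=0.5),
]

def next_stage(current: int, recent_success_rate: float) -> int:
    """Advance to the next stage once the current one is reliably solved."""
    if current < len(CURRICULUM) - 1 and recent_success_rate >= CURRICULUM[current].min_success_rate:
        return current + 1
    return current
```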
Figure. Overview of the two-phase framework. In Phase 1, a privileged encoder maps ground-truth ball and obstacle states to a latent representation used by the policy. In Phase 2, a visual encoder predicts the same latent from depth observations, enabling deployment with onboard sensing only.
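A minimal PyTorch sketch of the two-phase idea described above: in Phase 1 a privileged encoder produces the task latent from ground-truth ball and obstacle states, and in Phase 2 a temporal depth encoder is trained to regress that same latent while the policy (and privileged encoder) stay frozen. Module names, layer sizes, and the latent dimension are assumptions made for illustration.

```python
# Illustrative two-phase latent distillation (layer sizes and names are assumptions).
import torch
import torch.nn as nn

LATENT_DIM = 32  # assumed latent size

class PrivilegedEncoder(nn.Module):
    """Phase 1: ground-truth ball/obstacle state -> task latent used by the policy."""
    def __init__(self, state_dim: int = 12):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ELU(), nn.Linear(128, LATENT_DIM))

    def forward(self, priv_state):
        return self.net(priv_state)

class TemporalDepthEncoder(nn.Module):
    """Phase 2: short history of depth frames -> prediction of the same latent."""
    def __init__(self, frame_feat: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, frame_feat),
        )
        self.gru = nn.GRU(frame_feat, 64, batch_first=True)
        self.proj = nn.Linear(64, LATENT_DIM)  # task-specific projection layer

    def forward(self, depth_seq):  # depth_seq: (B, T, 1, H, W)
        b, t = depth_seq.shape[:2]
        feats = self.cnn(depth_seq.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)
        return self.proj(out[:, -1])

# Phase 2 training step: regress the frozen privileged latent from depth only.
priv_enc, vis_enc = PrivilegedEncoder(), TemporalDepthEncoder()
priv_enc.requires_grad_(False)  # the policy and privileged encoder are not updated
optimizer = torch.optim.Adam(vis_enc.parameters(), lr=1e-3)

def distill_step(depth_seq, priv_state):
    with torch.no_grad():
        target = priv_enc(priv_state)
    loss = nn.functional.mse_loss(vis_enc(depth_seq), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```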

Experimental Results

Reward Term               Weight     Reward Term                 Weight
Ball velocity tracking     3.0       Ball-target reached          50.0
Ball speed tracking        2.0       Robot-obstacle collision    -10.0
Ball heading tracking      2.0       Ball-obstacle collision     -10.0
Robot-ball distance        0.05      Foot slip                    -0.1
Robot-ball yaw             2.0       Upright                       1.0
Ball-target progress       2.0       Body angular velocity        -0.05
Swing phase                3.0       Angular momentum             -0.5
Stance phase               2.0       DoF position limits          -1.0
Pose arms                  1.0       Action rate (L2)             -0.1
Pose legs                  1.0       Foot swing height            -0.25
Feet distance             -6.0       Soft landing                 -1e-5
Foot-foot contact         -5.0       Self-collisions              -1.0
Nonfoot-ball contact      -2.0

Table. Reward terms and weights for training the dribbling policy.
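The per-step reward is obtained as a weighted sum of such terms. The snippet below is a minimal sketch of that combination, mirroring only a few rows of the table above; the dictionary keys are illustrative names, not the actual implementation.

```python
# Weighted combination of per-step reward terms (illustrative subset of the table).
REWARD_WEIGHTS = {
    "ball_velocity_tracking": 3.0,
    "ball_target_reached": 50.0,
    "robot_obstacle_collision": -10.0,
    "action_rate_l2": -0.1,
}

def total_reward(terms: dict) -> float:
    """terms maps each reward name to its unweighted per-step value."""
    return sum(REWARD_WEIGHTS[name] * value for name, value in terms.items())

# Example step: good velocity tracking, no collision, small action change.
step = {
    "ball_velocity_tracking": 0.8,
    "ball_target_reached": 0.0,
    "robot_obstacle_collision": 0.0,
    "action_rate_l2": 0.05,
}
print(total_reward(step))  # 3.0*0.8 + 50.0*0.0 - 10.0*0.0 - 0.1*0.05 = 2.395
```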

Condition        SR [%]   T2T [s]  T2T-C [s]  FR [%]  LR [%]  RCR [%]  RC/t   BCR [%]  BC/t   MBC [m]
No obstacles     100.00   11.45    11.45      0.00    0.00    --       --     --       --     --
Static obstacle   96.00   13.29    13.95      4.00    0.00    8.00     0.08   4.00     0.10   1.73
Ball attacker     46.00   11.82    21.64      52.00   2.00    68.00    0.72   40.00    0.70   1.36

Table. Final-policy main task evaluation.

Condition        Segment        e_vec [m/s]  e_spd [m/s]  e_ang [deg]
No obstacles     All timesteps  0.71         0.62         28.39
Static obstacle  Unblocked      0.74         0.62         35.62
Static obstacle  Blocked        0.74         0.60         33.96
Ball attacker    Unblocked      0.74         0.61         34.27
Ball attacker    Blocked        0.79         0.60         43.00

Table. Velocity-tracking diagnostic for the final policy.
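The table does not restate how e_vec, e_spd, and e_ang are computed; one plausible reading, assumed here, is that they are the mean vector, speed, and heading errors between the commanded and measured planar ball velocity.

```python
# Plausible velocity-tracking diagnostics (assumed definitions, not the paper's exact ones).
import numpy as np

def tracking_errors(v_cmd: np.ndarray, v_ball: np.ndarray):
    """v_cmd, v_ball: (T, 2) commanded vs measured planar ball velocity."""
    e_vec = np.linalg.norm(v_ball - v_cmd, axis=1).mean()  # mean vector error [m/s]
    e_spd = np.abs(np.linalg.norm(v_ball, axis=1) - np.linalg.norm(v_cmd, axis=1)).mean()  # [m/s]
    diff = np.arctan2(v_ball[:, 1], v_ball[:, 0]) - np.arctan2(v_cmd[:, 1], v_cmd[:, 0])
    e_ang = np.degrees(np.abs(np.arctan2(np.sin(diff), np.cos(diff)))).mean()  # wrapped heading error [deg]
    return e_vec, e_spd, e_ang
```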

Condition        e_ball,pos [m]  e_ball,vel [m/s]  e_obs,pos [m]  e_obs,vel [m/s]  c_fov [%]
No obstacles     0.05            0.25              --             --               75.43
Static obstacle  0.05            0.24              1.26           0.21             76.74
Ball attacker    0.08            0.26              1.26           0.13             78.39

Table. Perception metrics for the final policy.
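Under a similarly assumed reading, the perception metrics would be mean Euclidean errors of the ball and obstacle estimates against ground truth, with c_fov the percentage of timesteps in which the ball falls inside the camera field of view.

```python
# Plausible perception diagnostics (assumed definitions).
import numpy as np

def perception_metrics(p_est, p_gt, v_est, v_gt, ball_in_fov):
    """p_*, v_*: (T, 2) estimated vs ground-truth positions/velocities; ball_in_fov: (T,) bool."""
    e_pos = np.linalg.norm(p_est - p_gt, axis=1).mean()  # [m]
    e_vel = np.linalg.norm(v_est - v_gt, axis=1).mean()  # [m/s]
    c_fov = 100.0 * ball_in_fov.mean()                   # [% of timesteps]
    return e_pos, e_vel, c_fov
```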

Condition        Stage 1  Stage 2  Stage 3
No obstacles     100      68       90
Static obstacle  24       42       88
Ball attacker    2        2        46

Table. Success-rate ablation across curriculum stages (%).

Conclusions

This paper presented a visual temporal policy for autonomous humanoid soccer dribbling. The experimental results show that the proposed policy reliably solves nominal target-driven dribbling, achieving 100% success in the no-obstacle condition, and also performs well in the single static-obstacle setting, where success remains high at 96%. Performance degrades substantially in the ball-attacker setting, where success drops to 46% and both fall and collision rates increase markedly. This indicates that the main remaining challenge is robust closed-loop control against an actively moving adversary, rather than nominal dribbling itself. The velocity and perception diagnostics support this interpretation: deviations from the commanded ball motion increase during blocked attacker interactions, while perception errors remain relatively stable across conditions. Overall, these results suggest that combining temporal visual adaptation with reinforcement learning is a promising approach for humanoid loco-manipulation tasks. However, the current evidence is limited by the small number of independently trained policies and by the fact that all experiments are conducted in simulation. Beyond humanoid soccer, the same framework could support other embodied robotic tasks requiring perception-driven control under partial observability. Future work will focus on harder and more diverse opponent behaviors, stronger robustness against moving adversaries, and sim-to-real transfer on the Booster T1 humanoid platform.

References (BibTeX)

@article{maiorana2026learning,
  title={Learning Vision-Based Dribbling for Humanoid Soccer via Privileged Representation Learning},
  author={Maiorana, Flavio and Spagnoli, Valerio and Bugli, Eugenio and Volpi, Flavio and Affinita, Daniele and Suriani, Vincenzo and Nardi, Daniele and Iocchi, Luca},
  year={2026}
}