The central and concrete objective of this PhD thesis is to develop and deploy cutting-edge locomotion controllers on the bipedal robot Kangaroo [Kangaroo22], developed by PAL Robotics, using reinforcement learning techniques [Hwangbo19]. The Kangaroo platform presents a particularly high degree of mechanical complexity and nonlinearity, which makes it extremely challenging to model accurately in simulation [KangarooRL25]. As a result, standard sim-to-real approaches are difficult to apply, with the Kangaroo robot exhibiting a larger-than-usual gap between simulation and physical reality. The thesis aims to study and characterize this gap, and to propose novel learning-based control strategies capable of overcoming or mitigating it, either by improving transferability or by learning directly on the real system.
This work will rely on strong experimental foundations and a tight collaboration between Gepetto at LAAS-CNRS and PAL Robotics. Gepetto brings extensive expertise in locomotion control and has successfully deployed reinforcement learning–based policies on multiple quadruped and biped robots [SoloParkour2024]. PAL Robotics, on the other hand, has designed and developed the Kangaroo platform and has already demonstrated a partial sim-to-real deployment of locomotion policies on real hardware.
The project will leverage the forthcoming Kangaroo prototype, currently being assembled at PAL Robotics and expected to be delivered in Spring 2026. In addition, the thesis will benefit from the experimental facilities at LAAS-CNRS, including a large motion capture room equipped with a safety crane, a complete fab lab and mechatronics workshop, and several additional humanoid platforms such as Unitree H1 and R1. This environment will provide a unique opportunity to carry out extensive experimental validation and benchmarking of the developed learning-based controllers, in both simulated and real-world conditions.
The PhD position is complemented by a 10-month engineering contract designed to provide technical and experimental support. This preliminary position is primarily intended to precede the start of the PhD, allowing the selected candidate to become familiar with the underlying technologies and to contribute to the preparation of the Kangaroo platform at LAAS-CNRS before the scientific work begins. In this configuration, the recruitment process would cover both stages: an engineering position at LAAS-CNRS (for example, from January to October 2026) followed by the PhD contract under PAL France (from November 2026 to October 2029). Alternatively, the engineering support contract could run in parallel with the PhD to reinforce the experimental aspects of the project and assist with maintenance, testing, and data collection on the Kangaroo robot throughout the thesis. This flexibility lets the candidate and the research team make the most effective use of the available resources, depending on the project’s development timeline.
The proposed research follows a progressive methodology combining simulation-based pretraining, real-world learning, and iterative adaptation. The first phase will consist of training locomotion policies on an existing bipedal simulation model, using modern physics engines such as MuJoCo or Isaac Gym. The simulation parameters will be carefully identified from real robot data [IdRL2025], following methodologies successfully applied to the Bipetto robot at LAAS-CNRS. The specific actuation transmission of Kangaroo will be explicitly modeled to capture its unique mechanical characteristics [Kangaroo24, KangarooRL25]. The main objective of this phase is to obtain a baseline policy capable of generating simple yet stable locomotion behaviors that can be safely transferred to the real hardware.
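As a rough illustration of the identification step described above, the sketch below fits a one-joint actuator model to logged torque data by linear least squares. The model form (inertia, viscous friction, Coulomb friction), the function name identify_joint_params, and the synthetic data are assumptions made for illustration only; they are not taken from [IdRL2025] or from the Kangaroo model, whose transmission is considerably more complex.

```python
import numpy as np


def identify_joint_params(qd, qdd, tau):
    """Least-squares fit of tau = I*qdd + b*qd + c*sign(qd).

    Hypothetical 1-DoF model; returns (inertia, viscous, coulomb).
    """
    # Regressor matrix: one column per unknown parameter.
    A = np.column_stack([qdd, qd, np.sign(qd)])
    params, *_ = np.linalg.lstsq(A, tau, rcond=None)
    return params


# Synthetic "logged" data standing in for real robot measurements.
rng = np.random.default_rng(0)
qd = rng.uniform(-2.0, 2.0, size=500)     # joint velocity [rad/s]
qdd = rng.uniform(-10.0, 10.0, size=500)  # joint acceleration [rad/s^2]
true_params = np.array([0.05, 0.3, 0.8])  # assumed ground truth
tau = (true_params[0] * qdd
       + true_params[1] * qd
       + true_params[2] * np.sign(qd))
tau += rng.normal(0.0, 0.01, size=500)    # measurement noise

estimated = identify_joint_params(qd, qdd, tau)
print("estimated (inertia, viscous, coulomb):", estimated)
```

In practice the identified values would be written back into the simulator model before policy training; the model structure is the part that would need the most care for a nonlinear transmission.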
Building on the initial deployment, the research will then focus on residual learning approaches, where small, task-specific neural networks are trained directly on the real robot to refine the pretrained policy [ResidualRL19, RLPT24]. These residual models will adapt the baseline controller to compensate for unmodeled dynamics, sensor drift, and contact uncertainties. The workflow will follow an iterative cycle alternating between simulation and hardware: simulation phases will enable massive data collection and large-scale policy optimization, while short and safe experimental sessions on the robot will provide sparse but high-value data to progressively refine the model and improve transferability. If relevant or necessary, we may also consider iterating on a pretrained world model, initialized in simulation and then fine-tuned on the real robot while running the latest iteration of the policy [FineTune25, WM25].
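The residual structure described above can be sketched as a frozen baseline action plus a small, bounded learned correction. This is a minimal illustration under stated assumptions: the class name ResidualPolicy, the linear-plus-tanh correction, and the max_residual bound are hypothetical choices, not the formulation of [ResidualRL19] or [RLPT24].

```python
import numpy as np


class ResidualPolicy:
    """Frozen baseline action plus a bounded learned correction (sketch)."""

    def __init__(self, baseline, obs_dim, act_dim, max_residual=0.1):
        self.baseline = baseline          # pretrained policy, kept fixed
        self.max_residual = max_residual  # hard bound on the correction
        # Zero initialization: the combined policy starts out identical
        # to the baseline, which keeps the first hardware rollouts safe.
        self.W = np.zeros((act_dim, obs_dim))
        self.b = np.zeros(act_dim)

    def residual(self, obs):
        # tanh keeps each component within [-max_residual, max_residual].
        return self.max_residual * np.tanh(self.W @ obs + self.b)

    def act(self, obs):
        return self.baseline(obs) + self.residual(obs)


# Stand-in baseline: a constant action, assumed purely for illustration.
baseline = lambda obs: np.array([0.5, -0.5])
policy = ResidualPolicy(baseline, obs_dim=3, act_dim=2, max_residual=0.1)

obs = np.ones(3)
action_before = policy.act(obs)   # equals the baseline action
policy.W[:] = 100.0               # emulate an arbitrarily large update
action_after = policy.act(obs)    # still within the residual bound
```

Bounding the correction is one simple way to limit how far real-world training can move the robot away from a baseline that is already known to be stable.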
Alternatively, the project will explore direct training on the robot using off-policy reinforcement learning algorithms [20min22, 8min25]. These methods target real-world learning without relying on simulation or explicit modeling, and have recently shown promising results on quadruped platforms. The key scientific challenge will be to adapt these approaches to bipedal locomotion, where data collection is inherently riskier and safety constraints are more stringent. This part of the study will provide insights into how to efficiently and safely gather real-world data for humanoid control, contributing to the broader understanding of reinforcement learning on complex robotic systems.
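Off-policy algorithms can reuse each transition collected on hardware many times, which matters when robot time is scarce. The sketch below shows the kind of fixed-size replay buffer such methods typically rely on. The class name, field layout, and uniform sampling are illustrative assumptions, not details taken from [20min22] or [8min25].

```python
import numpy as np


class ReplayBuffer:
    """Fixed-capacity ring buffer of (obs, act, rew, next_obs, done)."""

    def __init__(self, capacity, obs_dim, act_dim, seed=0):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim))
        self.act = np.zeros((capacity, act_dim))
        self.rew = np.zeros(capacity)
        self.next_obs = np.zeros((capacity, obs_dim))
        self.done = np.zeros(capacity, dtype=bool)
        self.ptr = 0    # next write position
        self.size = 0   # number of valid entries
        self.rng = np.random.default_rng(seed)

    def add(self, obs, act, rew, next_obs, done):
        i = self.ptr
        self.obs[i], self.act[i], self.rew[i] = obs, act, rew
        self.next_obs[i], self.done[i] = next_obs, done
        # Once full, the oldest transition is overwritten.
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # Uniform sampling with replacement over the valid entries.
        idx = self.rng.integers(0, self.size, size=batch_size)
        return {"obs": self.obs[idx], "act": self.act[idx],
                "rew": self.rew[idx], "next_obs": self.next_obs[idx],
                "done": self.done[idx]}


buffer = ReplayBuffer(capacity=3, obs_dim=2, act_dim=1)
for t in range(5):
    buffer.add(obs=np.full(2, t), act=np.zeros(1), rew=float(t),
               next_obs=np.full(2, t + 1), done=False)
batch = buffer.sample(batch_size=4)
```

A learner would draw many such batches per control step collected on the robot, which is where the sample efficiency of off-policy training comes from.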
Overall, the project aims to deliver both theoretical and practical advances: a framework for reinforcement learning that goes beyond the traditional sim-to-real paradigm, validated on an industrial-grade bipedal platform. The work will produce fundamental methodological contributions suitable for publication in top-tier robotics and machine learning venues, supported by open-source software developments. On the industrial side, the project will demonstrate the capabilities of Kangaroo through highly visible experimental achievements, potentially extending beyond standard bipedal walking toward dynamic, whole-body movements such as jumps or parkour-like motions. These demonstrations will enhance both scientific impact and public visibility, strengthening the collaboration between PAL Robotics and the academic partners.
PAL Robotics is a leading robotics company based in sunny Barcelona. Our goal is to enhance people’s quality of life through robotics and automation technologies. We have over 15 years of experience in the robotics field and offer daily challenges to everyone in our team to help them grow.