RK-MPC: Residual Koopman Model Predictive Control for Quadruped Locomotion in Offroad Environments

Sriram S. K. S. Narayanan¹, Umesh Vaidya¹

¹ Department of Mechanical Engineering, Clemson University

Key Contributions

Residual Koopman model: A Koopman-based data-driven framework that learns a compact linear residual correction on top of a nominal template model, along with formal guarantees on multi-step prediction error through bounded residual dynamics.
Convex RK-MPC: An MPC formulation that embeds the learned residual predictor within a receding-horizon controller and runs onboard in real time at 500 Hz.
Validation in simulation and hardware: Extensive Gazebo simulations and Unitree Go1 experiments demonstrating reliable tracking of planar velocity commands and robust blind locomotion across contact disturbances, multiple gait schedules, and off-road terrains (grass, gravel, snow, and ice).


Abstract

This paper presents Residual Koopman MPC (RK-MPC), a Koopman-based, data-driven model predictive control framework for quadruped locomotion that improves prediction fidelity while preserving real-time tractability. RK-MPC augments a nominal single-rigid-body template model with a compact linear residual predictor learned from data in lifted coordinates, enabling systematic correction of model mismatch caused by contact variability and rough terrain.

The learned residual model is embedded within a convex quadratic program MPC formulation, yielding a receding-horizon controller that runs onboard at $500\,\mathrm{Hz}$ and retains optimization-based constraints. We evaluate RK-MPC in Gazebo simulation and Unitree Go1 hardware experiments, demonstrating reliable blind locomotion across disturbances, gait schedules, and off-road terrain (grass, gravel, snow, ice).

We further compare against Koopman/EDMD baselines using monomial and SE(3)-structured observables, showing the residual correction improves multi-step prediction and closed-loop performance while reducing sensitivity to observable choice.

Residual Koopman Model Identification

Residual Koopman model framework, data preprocessing, and prediction performance results.

Key points

Nominal model: $(A^{\mathrm{nom}}, B^{\mathrm{nom}})$
Residual predictor in lifted state: $C^{\mathrm{res}}(A^{\mathrm{res}} z + B^{\mathrm{res}} u)$
Convex MPC runs onboard at $500\,\mathrm{Hz}$ on an NVIDIA Jetson Xavier NX
Improves multi-step tracking for off-road locomotion

Dataset Generation

Training data were generated in Gazebo with a nominal locomotion controller over $10$ independent episodes (≈2 min each) at $100\,\mathrm{Hz}$, yielding $N=121{,}753$ synchronized samples after transient removal. Each episode used randomized terrain and friction ($\mu\in[0.5,1.0]$) and uniformly randomized planar velocity commands $v_x,v_y\in[-0.7,0.7]~\mathrm{m/s}$, $\omega_z\in[-0.5,0.5]~\mathrm{rad/s}$, filtered for smooth references.
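The command randomization described above can be illustrated with a minimal sketch. It assumes piecewise-constant targets drawn uniformly from the stated ranges and smoothed by a first-order low-pass filter; the hold time `hold_s`, the filter constant `tau_s`, and the function name are illustrative assumptions, not settings reported for the dataset.

```python
import numpy as np

def sample_smooth_commands(duration_s=120.0, rate_hz=100.0, hold_s=4.0,
                           tau_s=0.5, seed=0):
    """Return a (T, 3) array of filtered [v_x, v_y, omega_z] references.

    hold_s and tau_s are hypothetical choices for illustration only.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / rate_hz
    n_steps = int(round(duration_s * rate_hz))
    hold_steps = int(round(hold_s * rate_hz))

    # Command ranges from the text: v_x, v_y in [-0.7, 0.7] m/s, omega_z in [-0.5, 0.5] rad/s.
    lo = np.array([-0.7, -0.7, -0.5])
    hi = np.array([0.7, 0.7, 0.5])

    # Draw one uniform target per hold segment, then repeat it over the segment.
    n_segments = -(-n_steps // hold_steps)  # ceiling division
    targets = rng.uniform(lo, hi, size=(n_segments, 3))
    raw = np.repeat(targets, hold_steps, axis=0)[:n_steps]

    # First-order low-pass filter to obtain smooth references, starting from rest.
    alpha = dt / (tau_s + dt)
    cmds = np.zeros_like(raw)
    for k in range(1, n_steps):
        cmds[k] = cmds[k - 1] + alpha * (raw[k] - cmds[k - 1])
    return cmds

cmds = sample_smooth_commands()
```

Because each filtered sample is a convex combination of the previous sample and the current target, the smoothed references stay inside the sampled command ranges.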

The dataset coverage is shown in the figure below with sampled velocities, reference commands, GRFs, and trot gait schedule; it supports robust residual Koopman fitting and sim-to-real generalization.

Training dataset coverage for residual Koopman training and evaluation.

Prediction Performance

For residual Koopman model identification, we construct residual targets $e_k=[\Delta v_k^\top\;\Delta\omega_k^\top]^\top\in\mathbb{R}^6$, lift them via degree-2 monomials $z_k=\psi(e_k)$, and use EDMDc to estimate $(A^{\mathrm{res}},B^{\mathrm{res}})$, with output regression for $C^{\mathrm{res}}$. The learned correction is projected back to physical residual space as $e_k\approx C^{\mathrm{res}} z_k$, preserving a lightweight structure around the SRB predictor.
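The identification step can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the ridge term `reg`, the inclusion of a constant monomial, the 12-dimensional input, and the toy linear residual dynamics are all assumptions made for the example.

```python
import numpy as np
from itertools import combinations_with_replacement

def lift_monomials(e, degree=2):
    """Monomials of e up to `degree`, including the constant term.

    e has shape (T, n); the result has shape (T, p).
    """
    e = np.atleast_2d(e)
    T, n = e.shape
    feats = [np.ones(T)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n), d):
            feats.append(np.prod(e[:, list(idx)], axis=1))
    return np.stack(feats, axis=1)

def fit_edmdc(E, U, degree=2, reg=1e-8):
    """EDMD with control: least-squares fit of z+ ~ A z + B u and e ~ C z.

    E: (T, n) residual samples, U: (T-1, m) inputs. reg is an assumed ridge term.
    """
    Z = lift_monomials(E, degree)
    Z0, Z1 = Z[:-1], Z[1:]
    X = np.hstack([Z0, U])                       # regressors [z_k, u_k]
    G = X.T @ X + reg * np.eye(X.shape[1])       # regularized Gram matrix
    AB = np.linalg.solve(G, X.T @ Z1).T          # stacked [A B], shape (p, p + m)
    p = Z.shape[1]
    A, B = AB[:, :p], AB[:, p:]
    C = np.linalg.lstsq(Z, E, rcond=None)[0].T   # output regression, shape (n, p)
    return A, B, C

# Synthetic stand-in for the residual data (assumed linear toy dynamics).
rng = np.random.default_rng(0)
T, n, m = 400, 6, 12
U = rng.normal(size=(T - 1, m))
Bu = 0.1 * rng.normal(size=(n, m))
E = np.zeros((T, n))
for k in range(T - 1):
    E[k + 1] = 0.9 * E[k] + Bu @ U[k]

A_res, B_res, C_res = fit_edmdc(E, U)
Z = lift_monomials(E)
E_pred = (Z[:-1] @ A_res.T + U @ B_res.T) @ C_res.T  # one-step residual prediction
```

With six residual channels and degree-2 monomials, the lifted state in this sketch has 1 + 6 + 21 = 28 coordinates, so the learned matrices remain small.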

The residual Koopman model achieves low multi-step prediction error: open-loop RMSE of $0.033\pm0.0028$ m/s for linear velocity and $0.033\pm0.0027$ rad/s for angular velocity.
Residual Koopman model is highly sample efficient: RMSE improves from $0.3759\pm0.1385$ at 100 samples to $0.0403\pm0.0046$ at 1000, saturating near $0.033$ at 10,000 samples.
Monomial EDMD (EDMD-mono) shows large error growth, while EDMD with an SE(3) basis (EDMD-SE3) and the Residual Koopman (res-koopman) model outperform the SRB template model in open-loop prediction.
With the SE(3) basis, small orientation errors perturb the lifted coordinates and compound over multi-step rollouts, leading to horizon-dependent attitude drift in EDMD-SE3.
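For reference, an open-loop multi-step RMSE of the kind reported above can be computed as in the sketch below. It assumes the lifted state is initialized from data once per rollout and then propagated purely through the learned linear model; the horizon, the averaging over all start indices, and the toy system (identity lifting, so $z = e$) are assumptions for illustration.

```python
import numpy as np

def open_loop_rmse(A, B, C, Z, U, E, horizon):
    """RMSE between rolled-out predictions C z and measured residuals E.

    Z: (T, p) lifted samples used only to initialize each rollout,
    U: (T-1, m) inputs, E: (T, n) measured residuals.
    """
    T = E.shape[0]
    sq_err, count = 0.0, 0
    for k0 in range(T - horizon):
        z = Z[k0].copy()                  # lift once at the start of the rollout
        for i in range(horizon):
            z = A @ z + B @ U[k0 + i]     # propagate without re-lifting
            err = C @ z - E[k0 + i + 1]
            sq_err += float(err @ err)
            count += err.size
    return np.sqrt(sq_err / count)

# Toy check with identity lifting (z = e) on an assumed linear system.
rng = np.random.default_rng(0)
T, n, m = 200, 2, 1
A_true = 0.9 * np.eye(n)
B_true = rng.normal(size=(n, m))
C_id = np.eye(n)
U = rng.normal(size=(T - 1, m))
E = np.zeros((T, n))
E[0] = rng.normal(size=n)
for k in range(T - 1):
    E[k + 1] = A_true @ E[k] + B_true @ U[k]

rmse_exact = open_loop_rmse(A_true, B_true, C_id, E, U, E, horizon=20)
rmse_biased = open_loop_rmse(0.8 * np.eye(n), B_true, C_id, E, U, E, horizon=20)
```

The exact model reproduces the data up to floating-point error, while the deliberately biased model accumulates error over the horizon, which is the effect a multi-step metric is meant to expose.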

The figure below compares trajectory predictions and per-channel RMSE, and shows sample-count and polynomial-degree sweep behaviors for the residual Koopman correction.

Residual Koopman vs EDMD & SRB models prediction performance and RMSE analysis (open loop).

Residual Koopman MPC Formulation

Given lifted residual state $z_k = \psi(e_k)$, residual correction $C^{\mathrm{res}}$, and nominal dynamics $(A^{\mathrm{nom}},B^{\mathrm{nom}})$, the MPC problem is:

\[ \begin{aligned} \min_{\{u_{k+i}\}_{i=0}^{N_h-1}} \quad & \sum_{i=0}^{N_h-1} (x_{k+i}-x^\star_{k+i})^\top Q (x_{k+i}-x^\star_{k+i}) + u_{k+i}^\top R u_{k+i} \\ &\quad + (x_{k+N_h}-x^\star_{k+N_h})^\top Q_f (x_{k+N_h}-x^\star_{k+N_h}) \\ \text{s.t.} \quad & x_{k+i+1} = A^{\mathrm{nom}}_{k+i} x_{k+i} + B^{\mathrm{nom}}_{k+i} u_{k+i} + C^{\mathrm{res}}(A^{\mathrm{res}} z_{k+i} + B^{\mathrm{res}} u_{k+i}) \\ & z_{k+i} = \psi(e_{k+i}),\qquad e_{k+i} = \begin{bmatrix}v_{k+i}-v^{\mathrm{nom}}_{k+i}\\ \omega_{k+i}-\omega^{\mathrm{nom}}_{k+i} \end{bmatrix} \\ & A_{ineq}(\sigma_{k+i}) u_{k+i} \le b_{ineq}(\sigma_{k+i}) \end{aligned} \]
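One way to see how the dynamics constraint can stay linear in the inputs is sketched below. It assumes, beyond what is stated here, that the lifted state is computed once at the current step and then propagated as $z_{k+i+1} = A^{\mathrm{res}} z_{k+i} + B^{\mathrm{res}} u_{k+i}$, that $C^{\mathrm{res}}$ is zero-padded to the state dimension, that the nominal matrices are time-invariant, and that $Q_f = Q$. The inequality constraints are omitted, so the condensed problem reduces to a linear solve; a QP solver would be needed once $A_{ineq} u \le b_{ineq}$ is included. All function names and toy dimensions are hypothetical.

```python
import numpy as np

def augment(A_nom, B_nom, A_res, B_res, C_res):
    """Stack xi = [x; z] so that x+ = A_nom x + B_nom u + C_res (A_res z + B_res u)."""
    nx, nz = A_nom.shape[0], A_res.shape[0]
    A_aug = np.block([[A_nom, C_res @ A_res],
                      [np.zeros((nz, nx)), A_res]])
    B_aug = np.vstack([B_nom + C_res @ B_res, B_res])
    return A_aug, B_aug

def condense(A, B, N):
    """Prediction matrices: stacked states (steps 1..N) = Sx @ xi0 + Su @ stacked inputs."""
    n, m = B.shape
    powers = [np.eye(n)]
    for _ in range(N):
        powers.append(A @ powers[-1])
    Sx = np.zeros((N * n, n))
    Su = np.zeros((N * n, N * m))
    for i in range(N):
        Sx[i * n:(i + 1) * n] = powers[i + 1]
        for j in range(i + 1):
            Su[i * n:(i + 1) * n, j * m:(j + 1) * m] = powers[i - j] @ B
    return Sx, Su

def solve_unconstrained(A_aug, B_aug, xi0, x_ref, Q, R, N):
    """Minimize the tracking cost over the horizon, ignoring inequality constraints."""
    nx = Q.shape[0]
    n, m = B_aug.shape
    Sx, Su = condense(A_aug, B_aug, N)
    Qa = np.zeros((n, n))
    Qa[:nx, :nx] = Q                      # penalize only the physical state x
    Qbar = np.kron(np.eye(N), Qa)
    Rbar = np.kron(np.eye(N), R)
    ref = np.zeros((N, n))
    ref[:, :nx] = x_ref
    H = Su.T @ Qbar @ Su + Rbar           # positive definite because R is
    g = Su.T @ Qbar @ (Sx @ xi0 - ref.ravel())
    return np.linalg.solve(H, -g).reshape(N, m)

# Toy dimensions and matrices (assumed, not the robot model).
rng = np.random.default_rng(1)
nx, nz, m, N = 4, 5, 2, 6
A_nom = np.eye(nx) + 0.01 * rng.normal(size=(nx, nx))
B_nom = 0.1 * rng.normal(size=(nx, m))
A_res = 0.5 * np.eye(nz)
B_res = 0.01 * rng.normal(size=(nz, m))
C_res = 0.1 * rng.normal(size=(nx, nz))
A_aug, B_aug = augment(A_nom, B_nom, A_res, B_res, C_res)
xi0 = rng.normal(size=nx + nz)
x_ref = np.zeros((N, nx))
U_opt = solve_unconstrained(A_aug, B_aug, xi0, x_ref, np.eye(nx), 1e-2 * np.eye(m), N)
```

In a receding-horizon loop only the first row of `U_opt` would be applied before re-solving from the next measured state.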

RK-MPC framework diagram used in the control architecture.

We validate Residual Koopman MPC (RK-MPC) on Unitree Go1 hardware in blind-locomotion conditions (no exteroceptive sensing), using only proprioception and commanded planar references.

Compute platform: full control stack runs onboard an NVIDIA Jetson Xavier NX.
Control law: at each cycle, RK-MPC solves the convex QP with the residual Koopman model, then applies the first optimal command $u_k^\star$.
Reference generation: user commands $(v_x^{\mathrm{cmd}},\,v_y^{\mathrm{cmd}},\,\omega_z^{\mathrm{cmd}})$ are converted by a smooth global planner into a reference sequence $\{x_{k+i}^\star\}_{i=0}^{N_h}$ for the robot's center of mass.
Estimation and locomotion stack: SRB state $x_k$ from a linear Kalman filter; contact timing from a wave-based gait scheduler; swing footholds from a Raibert-style heuristic.
Rates: the estimator and RK-MPC run at $500\,\mathrm{Hz}$; stance and swing torque control also run at $500\,\mathrm{Hz}$; the MPC discretization step is $0.01\,\mathrm{s}$.