Back to Journal
2026-04-08
Research Entry

MuJoCo Robotics Lab — Part 12: Whole-Body Loco-Manipulation

MuJoCoWhole-Body ControlLoco-ManipulationQPHumanoid

Lab 8 of 9 in the MuJoCo Robotics Lab series. A Unitree G1 humanoid walks up to a table, reaches down, grasps an object, and carries it while continuing to walk — all driven by a single task-priority quadratic program running at 500 Hz. This is the capstone that ties every earlier lab together: kinematics, dynamics, contacts, gait generation, and manipulation collapse into one controller.


Problem

Lab 7 gave me a stable walking gait on the G1. Labs 2–5 gave me end-effector control on a fixed-base arm. The obvious (wrong) idea is to run both controllers in parallel: let the gait controller own the legs and the pelvis, let the arm controller own the hands, and hope the two don't fight.

They fight.

Walking and manipulation share the same robot. The arm hangs off a torso that is already being accelerated to track a CoM reference. Reaching forward shifts the CoM outside the support polygon unless the legs compensate. Lifting a 0.5 kg object changes the mass distribution and pushes the ZMP toward the toes. And every one of these tasks ultimately resolves to torques on the same 23 actuated joints, subject to the same contact constraints and the same equation of motion:

M(q) · q̈ + h(q, q̇) = Sᵀ · τ + J_cᵀ · f_c

You cannot split this equation into "leg part" and "arm part." The mass matrix M(q) is dense — arm motion produces reaction torques at the hips, and pushing down with the feet moves the shoulders. Contact forces f_c must satisfy friction cones that depend on the entire body configuration. The only honest answer is a single controller that resolves all tasks jointly, with an explicit priority order, against the full dynamics.

Theory

The whole-body controller is formulated in task space. Every objective — CoM tracking, swing-foot tracking, hand tracking, posture regularization — is written as a desired acceleration in the space it lives in, and related to the generalized acceleration through a task Jacobian J_i:

ẍ_i_des = J_i · q̈ + J̇_i · v

Given a desired task-space acceleration a_d_i (produced by a PD controller on task error), the controller wants J_i · q̈ - (a_d_i - J̇_i · v) ≈ 0 for every task simultaneously. With more tasks than DOFs — which is always the case here — the system is over-constrained and must be resolved by priority.

There are two classical approaches. Strict hierarchy (HQP) solves a cascade of QPs, projecting each lower-priority task into the null space of all higher-priority ones. It is mathematically clean but expensive. Weighted QP puts every task into a single least-squares cost with large weight gaps to approximate priority:

min   Σ_i w_i · ‖ J_i · q̈ - (a_d_i - J̇_i · v) ‖²
 q̈,τ,f_c

s.t.  M(q) · q̈ + h(q, q̇) = Sᵀ · τ + J_cᵀ · f_c    (equation of motion)
      A_fric · f_c ≤ 0                              (linearized friction cones)
      τ_min ≤ τ ≤ τ_max                             (torque limits)
      q̈_min ≤ q̈ ≤ q̈_max                           (acceleration limits)

With weights w = {1000, 100, 10, 1} for {CoM, feet, hands, posture}, balance is essentially never sacrificed for manipulation — a hand-tracking error of 5 cm only costs as much as a CoM error of 1.5 mm. One OSQP solve per tick replaces a cascade of solves, and at 500 Hz control rate that matters.

The decision vector is [q̈; τ; f_c]. Generalized accelerations, actuated torques, and contact wrenches are all primary unknowns. This is what makes the formulation honest: the dynamics equality is a hard constraint, friction cones are hard constraints, and the solver is free to distribute effort across contacts however it pleases as long as the tasks are satisfied.

Implementation

The controller is built in four layers: a Pinocchio-backed kinematics/dynamics model, a task library, the QP solver, and a finite-state machine that schedules which tasks are active at each phase of the demo.

G1WholeBodyModel wraps Pinocchio with a free-flyer root joint. It exposes mass_matrix(q) (CRBA), nle(q, v) (RNEA for Coriolis + gravity), jacobian_frame(q, name) for feet/hands/pelvis, and jacobian_com(q). MuJoCo never does analytical computation — it only integrates physics. Pinocchio is the brain, MuJoCo is the muscles, and a joint-index map converts state between the two every tick.

Tasks all share an interface: compute_error, compute_jacobian, and PD gains that turn the error into a desired acceleration. Each concrete task is a few lines:

class HandPoseTask(Task):
    def __init__(self, model, frame_name, target_se3,
                 kp=100.0, kd=20.0, weight=10.0):
        self.model = model
        self.frame = frame_name
        self.T_des = target_se3
        self.kp, self.kd, self.weight = kp, kd, weight

    @property
    def dim(self):
        return 6

    def compute_error(self, q, v):
        T = self.model.fk_frame(q, self.frame)
        # SE3 log error in body frame (twist)
        return pin.log6(T.inverse() * self.T_des).vector

    def compute_jacobian(self, q):
        return self.model.jacobian_frame(q, self.frame)  # 6 x nv

    def compute_desired_acceleration(self, q, v):
        e = self.compute_error(q, v)
        J = self.compute_jacobian(q)
        ed = -J @ v  # body twist error rate
        return self.kp * e + self.kd * ed

CoMTask, FootPoseTask, and PostureTask follow the same pattern with different Jacobians and gains. PostureTask regularizes the null space so the redundant DOFs do not drift — without it the controller is free to let the arms flop because the cost is zero.

The WholeBodyQP builds an OSQP problem every tick from the active task set and the current contact schedule. It warm-starts from the previous solution, which is what keeps solve times in the low milliseconds:

def solve(self, q, v, tasks, contacts):
    nv, na, nc = self.model.nv, self.model.na, len(contacts)
    n_vars = nv + na + 3 * nc   # [qdd; tau; f_c]

    # Cost: sum of weighted task residuals
    H = np.zeros((n_vars, n_vars))
    g = np.zeros(n_vars)
    for task in tasks:
        J = task.compute_jacobian(q)                     # dim x nv
        a_d = task.compute_desired_acceleration(q, v)
        Jqdd = np.zeros((task.dim, n_vars))
        Jqdd[:, :nv] = J
        H += task.weight * Jqdd.T @ Jqdd
        g += -task.weight * Jqdd.T @ a_d

    # Equality: M qdd + h = S^T tau + J_c^T f_c
    M  = self.model.mass_matrix(q)
    h  = self.model.nle(q, v)
    Jc = self._stack_contact_jacobians(q, contacts)      # (3nc) x nv
    A_eq = np.hstack([M, -self.model.S.T, -Jc.T])
    b_eq = -h

    # Friction + torque + acceleration bounds
    A_in, l_in, u_in = self._build_inequalities(contacts)

    res = self._osqp_solve(H, g, A_eq, b_eq, A_in, l_in, u_in)
    return QPResult(qdd=res.x[:nv], tau=res.x[nv:nv+na],
                    f_contacts=self._unpack_contacts(res.x, nv, na, nc),
                    solve_time_ms=res.info.solve_time * 1e3,
                    status=res.info.status)

The LocoManipStateMachine decides what is active at each phase. Its job is to activate/deactivate tasks, to tell the gait generator when to start and stop, and to toggle the MuJoCo weld equality that represents the rigid grasp:

IDLE → WALK_TO_TABLE → STOP_AND_STABILIZE → REACH → GRASP →
LIFT → WALK_WITH_OBJECT → STOP_AND_STABILIZE_2 → PLACE →
RELEASE → RETREAT → DONE

During WALK_* states, the gait generator drives the CoM and swing-foot targets from a LIPM/ZMP plan and publishes a contact schedule (which foot is in stance). During REACH and LIFT, the feet are double-support and the hand task is injected with a 5th-order polynomial reference toward the object. During WALK_WITH_OBJECT, both the gait CoM reference and the hand tracking are active at once — this is the moment where the priorities earn their keep. When the weld engages, the CoM task target is shifted to the combined robot+object CoM (compute_com_with_load) so the balance controller plans for the extra 0.5 kg hanging off the left hand.

Grasping itself is intentionally cheap: a MuJoCo equality constraint welds the hand to the object when the FSM says so. The point of this lab is whole-body coordination, not contact-rich grasping — that lives in Lab 5.

Results

The capstone demo runs the full sequence end-to-end: 1 m approach, stabilize, reach, grasp, lift 10 cm, walk backward 0.5 m with the object, stabilize again, and release. No falls, no dropped objects across repeated runs.

Metric Result
QP solve time (median) ~1.2 ms
QP solve time (worst case) Under 2 ms budget at 500 Hz
FK cross-validation (Pinocchio vs MuJoCo) Sub-millimetre
Standing CoM recovery (5 N lateral push) Recovers within 2 s
Hand reach error (standing, double support) Within 2 cm of target
Arm pose drift during walking Well under 5 cm
Walking steps without falling 10+ consecutive steps
Capstone pickup success Reliable across runs
Dynamics equality residual Near solver tolerance (~1e-6)

The QP solve time number is what makes 500 Hz control feasible — OSQP's sparse ADMM solver, warm-started from the previous tick, stays comfortably inside the control budget even with contact forces as decision variables. The dynamics residual being near solver tolerance is the sanity check that matters: it confirms the torques the QP commits to are physically consistent with the accelerations it plans, not an approximation that drifts over a step.

Qualitatively, the most satisfying moment is watching the CoM compensation kick in when the weld engages. The instant the object is "grasped," the target CoM shifts a few centimetres toward the held hand, the pelvis leans to counterbalance, and walking resumes without a stumble. Balance was never tuned to know about the object — the task framework just receives a different CoM target and the QP does the rest.

Key Takeaways

  • Weighted QP is enough. With four-decade priority gaps, strict HQP buys you very little over a single OSQP solve, and you get warm-starting for free. The expensive solver is not the one I needed.
  • Contacts belong in the decision vector. Making f_c a primary unknown, bounded by friction cones, means the solution is physically consistent by construction. No projection, no post-hoc fixes.
  • One model, one truth. Pinocchio does all analytical computation (FK, Jacobians, M, h, CoM); MuJoCo only integrates. Mixing the two is a recipe for subtle bugs around quaternion conventions and joint ordering.
  • The posture task is not optional. With large null spaces, regularization is what keeps the redundant DOFs from drifting into useless configurations. Weight 1 against 1000, but still load-bearing.
  • CoM compensation is a target change, not a controller change. The balance loop is identical whether you are empty-handed or carrying a brick — you only have to tell it where the true combined CoM should go.
  • The FSM is the glue, not the controller. Scheduling tasks and contact modes belongs in a state machine; continuous reasoning about balance and tracking belongs in the QP. Keeping those concerns separate is what makes the capstone debuggable.

Resources

End of Protocol — mujoco-lab-12-whole-body.md