Robot arm kinematics and manipulation learning in MuJoCo physics simulation, featuring Pinocchio-based FK/IK, Catmull-Rom trajectory planning, and a multi-robot RL/IL training framework supporting PPO, SAC, TD3, DDPG and Behavior Cloning.
| Trajectory | Impedance Control |
|---|---|
manipulation_mujoco/
├── models/
│   ├── franka_emika_panda/         # Franka Panda MJCF model + meshes
│   └── trs_so_arm100/              # TRS SO-ARM100 MJCF model + meshes
├── kinematic/
│   ├── panda_kinematics.py         # PandaKinematics: FK / IK / Jacobian (Pinocchio)
│   ├── trajectory.py               # TrajectoryGenerator: cubic, Catmull-Rom, Cartesian arc
│   ├── run_fk.py                   # Forward kinematics demo
│   ├── run_ik.py                   # Inverse kinematics demo
│   └── run_trajectory.py           # Figure-8 Lissajous trajectory demo
├── dynamics/
│   ├── impedance_controller.py     # Task-space impedance control (τ = J^T K e + τ_bias)
│   └── admittance_controller.py    # Task-space admittance control (virtual ODE + inner PD)
├── learning/
│   ├── envs/
│   │   ├── base_env.py             # MuJocoRobotEnv: robot-agnostic Gymnasium base class
│   │   ├── registry.py             # make_env("panda"/"so_arm", "reach"/"push"/"pick_place")
│   │   └── tasks/
│   │       ├── base_task.py        # BaseTask interface
│   │       ├── reach.py
│   │       ├── push.py
│   │       └── pick_place.py
│   ├── robots/
│   │   ├── panda.py                # FrankaPandaEnv (4D Cartesian delta + gripper)
│   │   └── so_arm.py               # SoArm100Env (6D joint velocity)
│   ├── algos/
│   │   ├── rl_trainer.py           # train_rl: PPO/SAC/TD3/DDPG + SuccessRateCallback
│   │   └── il/
│   │       └── bc.py               # Behavior Cloning (MLP + CosineAnnealingLR)
│   ├── utils/
│   │   └── visualize.py            # MuJoCo overlay helpers
│   ├── train.py                    # Unified training entry point
│   └── play.py                     # Real-time policy playback
├── requirements.txt
└── pyproject.toml
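
The tree lists Catmull-Rom interpolation in `kinematic/trajectory.py` and a figure-8 Lissajous demo in `kinematic/run_trajectory.py`. The following is a minimal, dependency-free sketch of both ideas, not the repository's `TrajectoryGenerator`. The function names, the default center, and the choice of the y-z plane are assumptions made for illustration.

```python
import math


def catmull_rom_point(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom point on the segment p1 -> p2 for t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2.0 * b + (c - a) * t
               + (2.0 * a - 5.0 * b + 4.0 * c - d) * t2
               + (3.0 * b - a - 3.0 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def catmull_rom_path(waypoints, samples_per_segment=10):
    """Sample a spline that passes through every waypoint.

    The first and last waypoints are duplicated so the end segments
    have the four control points the formula needs.
    """
    pts = [waypoints[0], *waypoints, waypoints[-1]]
    path = []
    for i in range(len(pts) - 3):
        for k in range(samples_per_segment):
            path.append(catmull_rom_point(*pts[i:i + 4], k / samples_per_segment))
    path.append(tuple(waypoints[-1]))
    return path


def figure8_waypoints(n=16, center=(0.5, 0.0, 0.4), width=0.2, height=0.1):
    """Closed Lissajous figure-8: y follows sin(s), z follows sin(2s).

    The plane, center, and size are illustrative assumptions.
    """
    waypoints = []
    for k in range(n + 1):
        s = 2.0 * math.pi * k / n
        waypoints.append((
            center[0],
            center[1] + 0.5 * width * math.sin(s),
            center[2] + 0.5 * height * math.sin(2.0 * s),
        ))
    return waypoints
```

Calling `catmull_rom_path(figure8_waypoints())` yields a dense list of Cartesian targets that could be fed to an IK solver one point at a time.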
cd manipulation_mujoco
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

source .venv/bin/activate
# Forward kinematics
python kinematic/run_fk.py
# Inverse kinematics
python kinematic/run_ik.py
# Figure-8 continuous trajectory with live EE trail
python kinematic/run_trajectory.py

Both dynamics demos (impedance and admittance control) use torque-controlled actuators (`panda_motor.xml` + `scene_torque.xml`).
Interact via Ctrl + drag in the MuJoCo viewer.
source .venv/bin/activate
# Impedance control: drag and release, arm springs back
python dynamics/impedance_controller.py
# Admittance control: arm compliantly follows your applied force
python dynamics/admittance_controller.py

Control law summary:
| | Impedance | Admittance |
|---|---|---|
| Input | Displacement e | External force F_ext |
| Output | Joint torques directly | Virtual reference → inner PD |
| Feel | Stiff spring | Soft, mass-damper compliant |
| Equation | `τ = J^T(Kp·e − Kd·ẋ) + τ_bias` | `M_d·ẍ_v = F_ext − D_d·ẋ_v − K_d·(x_v − x_eq)` |
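
As a rough illustration of the impedance column, the sketch below evaluates `τ = J^T(Kp·e − Kd·ẋ) + τ_bias` with plain Python lists. It is not the code in `impedance_controller.py`: the function name, the scalar gains, and the default gain values are assumptions, and a real controller would take the Jacobian and bias torques from the simulator.

```python
def impedance_torque(jacobian, e, xdot, tau_bias, kp=100.0, kd=10.0):
    """Return tau = J^T (kp * e - kd * xdot) + tau_bias.

    jacobian: nested list, one row per task dimension, one column per joint.
    e:        task-space position error (target minus current).
    xdot:     task-space end-effector velocity.
    tau_bias: per-joint bias torques (gravity, Coriolis).
    kp, kd:   scalar gains; the defaults here are illustrative only.
    """
    # Virtual spring-damper force at the end effector.
    force = [kp * ei - kd * vi for ei, vi in zip(e, xdot)]
    # Map the task-space force to joint torques with the Jacobian transpose.
    n_joints = len(jacobian[0])
    return [
        sum(jacobian[i][j] * force[i] for i in range(len(force))) + tau_bias[j]
        for j in range(n_joints)
    ]
```

With a zero error and zero velocity the function returns only `tau_bias`, which is why the arm holds its pose until it is dragged away.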
Tune the parameters at the top of each file:
| Parameter | File | Effect |
|---|---|---|
| `KP` / `KD` | `impedance_controller.py` | Spring stiffness / damping |
| `M_D` | `admittance_controller.py` | Lower → faster response |
| `D_D` | `admittance_controller.py` | Higher → smoother motion |
| `K_D` | `admittance_controller.py` | 0 = free drift, >0 = spring to equilibrium |
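
To make the effect of `K_D` concrete, here is a one-axis sketch of the virtual dynamics `M_d·ẍ_v = F_ext − D_d·ẋ_v − K_d·(x_v − x_eq)`, integrated with semi-implicit Euler. The function names, the integrator choice, and every numeric value are assumptions for illustration. The actual controller also runs an inner PD loop that tracks `x_v`, which is omitted here.

```python
def admittance_step(x_v, xd_v, f_ext, dt, m_d, d_d, k_d, x_eq=0.0):
    """Advance the virtual mass-damper-spring reference by one time step."""
    xdd_v = (f_ext - d_d * xd_v - k_d * (x_v - x_eq)) / m_d
    xd_v = xd_v + xdd_v * dt   # update velocity first (semi-implicit Euler)
    x_v = x_v + xd_v * dt      # then position, using the new velocity
    return x_v, xd_v


def push_and_release(k_d, m_d=2.0, d_d=20.0, force=5.0, dt=0.001):
    """Push with a constant force for 1 s, release for 5 s, return final x_v."""
    x_v, xd_v = 0.0, 0.0
    for step in range(6000):
        f_ext = force if step < 1000 else 0.0
        x_v, xd_v = admittance_step(x_v, xd_v, f_ext, dt, m_d, d_d, k_d)
    return x_v
```

In this sketch `push_and_release(0.0)` ends displaced from the start (free drift), while `push_and_release(100.0)` settles back near the equilibrium at zero.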
source .venv/bin/activate
# Headless (recommended for speed)
python learning/train.py --robot panda --task reach --algo sac --timesteps 200000
python learning/train.py --robot panda --task push --algo td3 --timesteps 300000
python learning/train.py --robot panda --task pick_place --algo sac --timesteps 500000
# Enable real-time viewer
python learning/train.py --robot panda --task reach --algo sac --timesteps 200000 --render
# Parallel environments (headless only)
python learning/train.py --robot panda --task reach --algo sac --timesteps 200000 --n-envs 4
# TensorBoard logging
python learning/train.py --robot panda --task reach --algo sac --timesteps 200000 --tensorboard
tensorboard --logdir learning/runs/tb
# SO-ARM100
python learning/train.py --robot so_arm --task reach --algo sac --timesteps 100000

source .venv/bin/activate
# Collect heuristic demos + train BC policy
python learning/train.py --robot panda --task reach --algo bc
# Visualize demo collection
python learning/train.py --robot panda --task reach --algo bc --render

source .venv/bin/activate
# RL policy (SAC reach)
python learning/play.py --robot panda --task reach --algo sac \
--model learning/runs/panda_reach_sac.zip --episodes 5
# BC policy
python learning/play.py --robot panda --task reach --algo bc \
--model learning/runs/panda_reach_bc.pt --episodes 5
# Push / Pick-Place
python learning/play.py --robot panda --task push --algo sac \
--model learning/runs/panda_push_sac.zip
python learning/play.py --robot panda --task pick_place --algo sac \
--model learning/runs/panda_pick_place_sac.zip
# SO-ARM100
python learning/play.py --robot so_arm --task reach --algo sac \
--model learning/runs/so_arm_reach_sac.zip

| Argument | Default | Description |
|---|---|---|
| `--robot` | `panda` | `panda` / `so_arm` |
| `--task` | `reach` | `reach` / `push` / `pick_place` |
| `--algo` | `sac` | `ppo` / `sac` / `td3` / `ddpg` / `bc` |
| `--timesteps` | `200000` | Total RL training steps |
| `--n-envs` | `1` | Number of parallel environments |
| `--render` | `False` | Enable MuJoCo real-time viewer |
| `--tensorboard` | `False` | Enable TensorBoard logging |
| `--save-dir` | `learning/runs` | Model save directory |
| `--model` | (none) | Path to saved model (play only) |
| `--episodes` | `10` | Rollout episodes (play only) |
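
For readers who want to script against these flags, the table can be mirrored with `argparse` roughly as follows. This is a reconstruction from the table only, not the parser in `train.py` or `play.py`. The `build_parser` name and the `play` switch are hypothetical.

```python
import argparse


def build_parser(play=False):
    """Build a parser matching the argument table (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train or play a manipulation policy")
    parser.add_argument("--robot", default="panda", choices=["panda", "so_arm"])
    parser.add_argument("--task", default="reach", choices=["reach", "push", "pick_place"])
    parser.add_argument("--algo", default="sac", choices=["ppo", "sac", "td3", "ddpg", "bc"])
    parser.add_argument("--timesteps", type=int, default=200000, help="Total RL training steps")
    parser.add_argument("--n-envs", type=int, default=1, help="Number of parallel environments")
    parser.add_argument("--render", action="store_true", help="Enable MuJoCo real-time viewer")
    parser.add_argument("--tensorboard", action="store_true", help="Enable TensorBoard logging")
    parser.add_argument("--save-dir", default="learning/runs", help="Model save directory")
    if play:
        # The table marks these two arguments as play-only.
        parser.add_argument("--model", default=None, help="Path to saved model")
        parser.add_argument("--episodes", type=int, default=10, help="Rollout episodes")
    return parser
```

For example, `build_parser().parse_args(["--robot", "so_arm", "--n-envs", "4"])` yields a namespace with `robot="so_arm"`, `n_envs=4`, and the table defaults for everything else.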
learning/runs/{robot}_{task}_{algo}.zip   ← RL (Stable-Baselines3)
learning/runs/{robot}_{task}_{algo}.pt    ← IL (Behavior Cloning)
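
The naming pattern above can be expressed as a small helper, for example when locating a checkpoint from a script. `model_path` is a hypothetical name, and the rule that only `bc` uses the `.pt` extension is inferred from the two patterns shown.

```python
from pathlib import Path


def model_path(robot, task, algo, save_dir="learning/runs"):
    """Return the expected checkpoint path for a robot/task/algorithm triple."""
    # Behavior Cloning checkpoints end in .pt; RL checkpoints end in .zip.
    extension = ".pt" if algo == "bc" else ".zip"
    return Path(save_dir) / f"{robot}_{task}_{algo}{extension}"
```

`model_path("panda", "reach", "sac")` points at `learning/runs/panda_reach_sac.zip`, matching the path used in the playback commands.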
- Create `learning/robots/my_robot.py` inheriting `MuJocoRobotEnv`
- Implement `_build_spaces`, `_get_obs`, `_apply_action`, `get_ee_pos`, `get_ee_body`
- Register it in `learning/envs/registry.py` under `make_env`
- Use the same `train.py` / `play.py` commands with `--robot my_robot`
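
The steps above can be sketched as the skeleton below. To keep it runnable on its own, `MuJocoRobotEnv` is replaced by a tiny stand-in class, and the robot state is a plain list rather than MuJoCo data. The method signatures, the `ROBOTS` dictionary, the body name, and the action scaling are all assumptions, so check `base_env.py` and `registry.py` for the real interfaces.

```python
class MuJocoRobotEnv:
    """Stand-in for the repository's base class (assumption, not the real API)."""

    def __init__(self):
        self.qpos = [0.0] * self.N_JOINTS  # fake joint state instead of MuJoCo data
        self._build_spaces()

    def step(self, action):
        self._apply_action(action)
        return self._get_obs()


class MyRobotEnv(MuJocoRobotEnv):
    N_JOINTS = 3
    EE_BODY = "my_robot_ee"  # hypothetical end-effector body name in the MJCF

    def _build_spaces(self):
        # The real env would presumably define Gymnasium spaces here.
        self.action_low = [-1.0] * self.N_JOINTS
        self.action_high = [1.0] * self.N_JOINTS
        self.obs_dim = self.N_JOINTS + 3

    def _get_obs(self):
        # Observation = joint positions followed by end-effector position.
        return list(self.qpos) + list(self.get_ee_pos())

    def _apply_action(self, action):
        # Clip to the action bounds, then treat the action as a joint delta.
        for i, a in enumerate(action):
            a = max(self.action_low[i], min(self.action_high[i], a))
            self.qpos[i] += 0.05 * a

    def get_ee_pos(self):
        # Placeholder kinematics; real code would read the body position.
        return (sum(self.qpos), 0.0, 0.0)

    def get_ee_body(self):
        return self.EE_BODY


# Hypothetical registry entry, following the make_env(robot, task) call shape.
ROBOTS = {"my_robot": MyRobotEnv}


def make_env(robot, task="reach"):
    # The task argument is accepted but unused in this sketch.
    return ROBOTS[robot]()
```

After registration, `make_env("my_robot")` returns an environment whose `step` call routes through the five hooks listed above.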