MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning

In submission.

Robotics Institute, School of Computer Science, Carnegie Mellon University


Abstract

Planning long-horizon motions using a set of predefined skills is a key challenge in robotics and AI. Addressing this challenge requires methods that systematically explore skill combinations to uncover task-solving sequences, harness generic, easy-to-learn skills (e.g., pushing, grasping) to generalize across unseen tasks, and bypass reliance on symbolic world representations that demand extensive domain- and task-specific knowledge. Despite significant progress, these elements remain largely disjoint in existing approaches, leaving a critical gap in achieving robust, scalable solutions for complex, long-horizon problems. In this work, we present MOSAIC, a skill-centric framework that unifies these elements by using the skills themselves to guide the planning process. MOSAIC uses two families of skills: Generators compute executable trajectories and world configurations, and Connectors link these independently generated skill trajectories by solving boundary value problems, enabling progress toward completing the overall task. By breaking away from the conventional paradigm of incrementally discovering skills from predefined start or goal states (a limitation that significantly restricts exploration), MOSAIC focuses planning efforts on regions where skills are inherently effective. We demonstrate the efficacy of MOSAIC in both simulated and real-world robotic manipulation tasks, showcasing its ability to solve complex long-horizon planning problems using a diverse set of skills incorporating generative diffusion models, motion planning algorithms, and manipulation-specific models.
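To make the Generator/Connector split concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a 2-D point as the world configuration, two hard-coded hypothetical generators (`push_generator`, `grasp_generator`), a straight-line connector standing in for a boundary value problem solver, and a breadth-first search over the resulting segments. All names, the distance threshold, and the choice of search are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[float, float]  # toy 2-D world configuration (assumption)


@dataclass
class Segment:
    """A skill trajectory: an ordered list of states produced by one skill."""
    skill: str
    states: List[State]

    @property
    def start(self) -> State:
        return self.states[0]

    @property
    def end(self) -> State:
        return self.states[-1]


def push_generator() -> Segment:
    # Hypothetical generator: proposes a push where the skill is effective,
    # independently of the task's start and goal states.
    return Segment("push", [(1.0, 0.0), (2.0, 0.0)])


def grasp_generator() -> Segment:
    # Hypothetical generator for a grasp-and-lift trajectory.
    return Segment("grasp", [(2.5, 0.5), (3.0, 1.0)])


def straight_line_connector(a: State, b: State,
                            max_dist: float = 1.5) -> Optional[Segment]:
    # Toy stand-in for a boundary value problem: join a to b with a straight
    # line, and fail when the two states are farther apart than max_dist.
    dist = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    if dist > max_dist:
        return None
    return Segment("connect", [a, b])


def plan(start: State, goal: State,
         generators: List[Callable[[], Segment]],
         connector: Callable[[State, State], Optional[Segment]]
         ) -> Optional[List[str]]:
    # Nodes are the start, every generated segment, and the goal.
    nodes = ([Segment("start", [start])]
             + [g() for g in generators]
             + [Segment("goal", [goal])])
    goal_idx = len(nodes) - 1
    parent: Dict[int, Optional[int]] = {0: None}
    links: Dict[int, Segment] = {}
    queue = deque([0])
    # Breadth-first search: an edge i -> j exists when the connector can
    # link the end of segment i to the start of segment j.
    while queue:
        i = queue.popleft()
        if i == goal_idx:
            break
        for j in range(len(nodes)):
            if j in parent:
                continue
            link = connector(nodes[i].end, nodes[j].start)
            if link is not None:
                parent[j] = i
                links[j] = link
                queue.append(j)
    if goal_idx not in parent:
        return None
    # Walk back from the goal to recover the order of segments.
    order: List[int] = []
    j: Optional[int] = goal_idx
    while j is not None:
        order.append(j)
        j = parent[j]
    order.reverse()
    sequence: List[str] = []
    for j in order:
        if j in links:
            sequence.append(links[j].skill)
        sequence.append(nodes[j].skill)
    return sequence


if __name__ == "__main__":
    print(plan((0.0, 0.0), (3.5, 1.0),
               [push_generator, grasp_generator], straight_line_connector))
    # ['start', 'connect', 'push', 'connect', 'grasp', 'connect', 'goal']
```

In this toy setting the start cannot be connected to the grasp segment directly, so the search only reaches the goal by chaining the push and grasp segments. A real system would presumably sample generators repeatedly and use far richer connectors; the sketch only illustrates how independently generated trajectories can be stitched into a task-solving sequence.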

Experimentation

Simulation setups and their real-world counterparts. Across all scenarios, the robot must place the plate into the bin. In Transport (a), the robot must push the plate to the edge in order to pick it up. In Transport in Clutter (b), the robot must move the plate to the edge without displacing other objects. In Transport Among Movable Objects (c), the robot discovers that it must clear space for manipulating the plate by moving the chips can elsewhere.

Transport

[Figure: Transport, real-world and simulation]

Transport in Clutter

[Figure: Transport in Clutter, real-world and simulation]

Transport Among Movable Objects

[Figure: Transport Among Movable Objects, real-world and simulation]

Planner Comparison

Algorithm comparison across the experimental scenarios. Left: Success rates. Middle: Planning time density with median, IQR, and average. Right: Head-to-head comparison on the tests that both algorithms solved. Upper-right cells show relative planning times; lower-left cells show relative sequence lengths. Each cell compares the “row algorithm” to the “column algorithm.” For MOSAIC, lower values are better in the first row, and higher values are better in the first column.

[Figures: planner comparison with no obstacles, with obstacles, and with movable objects]
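One plausible way to compute a head-to-head cell like those described in the caption above is sketched below. The caption does not define "relative", so this sketch assumes it means the ratio of the row algorithm's mean to the column algorithm's mean, restricted to tests both solved. That reading is consistent with lower values being better for the row algorithm, but it remains an assumption. The function name and the sample numbers are hypothetical and are not results from the paper.

```python
from statistics import mean
from typing import Dict, List, Optional

# Per-test measurements for each algorithm; None marks a test it failed.
Results = Dict[str, List[Optional[float]]]


def relative_value(results: Results, row: str, col: str) -> Optional[float]:
    # Keep only the tests that both algorithms solved.
    both = [(r, c) for r, c in zip(results[row], results[col])
            if r is not None and c is not None]
    if not both:
        return None
    # Assumed definition: mean of the row algorithm over mean of the column one.
    return mean(r for r, _ in both) / mean(c for _, c in both)


if __name__ == "__main__":
    # Made-up planning times in seconds, for illustration only.
    times: Results = {
        "A": [1.0, 2.0, None, 3.0],
        "B": [2.0, None, 5.0, 6.0],
    }
    print(relative_value(times, "A", "B"))  # 0.5
    print(relative_value(times, "B", "A"))  # 2.0
```

Restricting the comparison to commonly solved tests keeps failed runs, which have no meaningful planning time or sequence length, from distorting the ratio.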