

Planning long-horizon motions using a set of predefined skills is a key challenge in robotics and AI. Addressing this challenge requires methods that systematically explore skill combinations to uncover task-solving sequences, harness generic, easy-to-learn skills (e.g., pushing, grasping) to generalize across unseen tasks, and bypass reliance on symbolic world representations that demand extensive domain- and task-specific knowledge. Despite significant progress, existing approaches address these elements largely in isolation, leaving a critical gap in achieving robust, scalable solutions for complex, long-horizon problems. In this work, we present \mosaic{}, a skill-centric framework that unifies these elements by using the skills themselves to guide the planning process. \mosaic{} uses two families of skills: \textit{Generators} compute executable trajectories and world configurations, and \textit{Connectors} link these independently generated skill trajectories by solving boundary value problems, enabling progress toward completing the overall task. By breaking away from the conventional paradigm of incrementally discovering skills from predefined start or goal states, a limitation that significantly restricts exploration, \mosaic{} focuses planning efforts on regions where skills are inherently effective. We demonstrate the efficacy of \mosaic{} in both simulated and real-world robotic manipulation tasks, showcasing its ability to solve complex long-horizon planning problems using a diverse set of skills incorporating generative diffusion models, motion planning algorithms, and manipulation-specific models.
Simulation setups and their real-world counterparts. Across all scenarios, the robot must place the plate into the bin. In Transport (a), the robot must push the plate to the edge in order to pick it up. In Transport in Clutter (b), the robot must move the plate to the edge without displacing other objects. In Transport Among Movable Objects (c), the robot discovers that it must first move the chips can elsewhere to clear space for manipulating the plate.
Algorithm comparison across experimental scenarios. Left: Success rates. Middle: Planning time density with median, IQR, and average. Right: Head-to-head comparison on the tests that both algorithms solved. The upper-right cells show relative planning times; the lower-left cells show relative sequence lengths. Each cell compares the “row algorithm” to the “column algorithm.” For \mosaic{}, lower values are better in the first row, and higher values are better in the first column.