Recent advances in machine learning have started a paradigm shift from task-specific models towards large general-purpose architectures. In the domains of language and vision, large models such as GPT-3, BERT, and CLIP have opened avenues towards addressing several applications and continue to drive an explosion of new ideas and possibilities. What does it take to bring the same level of advancement to the field of robotics, so that we can build versatile agents that can be deployed in challenging environments? The goal of this workshop is to analyze how we can scale robotics towards the complexity of the real world by leveraging pretrained models. We will discuss how to apply the concept of large-scale pretraining to robotics, so as to enable models to learn how to process diverse, multimodal perception inputs, connect perception with action, and generalize across scenarios and form factors. In particular, we are interested in analyzing the domain of pretraining for robotics from several angles, including but not limited to:
- How do we build pretrained, reusable feature representations from complex inputs?
- How do we learn world models that combine perception and actions?
- How can we combine pretrained representations from multiple modalities such as language, vision, and geometry into robotics systems?
- What are the right kinds of priors that are helpful for optimization and task planning?
- How do we leverage, in robotics, the architectures and training methods that have been successful in other domains?
- How do we efficiently fine-tune pretrained models for new downstream tasks?
- How do we best deal with the specificities of robotics, such as expensive data collection and safety constraints?