Recent advances in machine learning have started a paradigm shift from task-specific models towards large general-purpose architectures. In the domains of language and vision, large models such as GPT-3, BERT, and CLIP have opened avenues towards solving many applications and continue to drive an explosion of new ideas and possibilities. What does it take to bring the same level of advancement to the field of robotics, in order to build versatile agents that can be deployed in challenging environments? The goal of this workshop is to analyze how we can scale robotics towards the complexity of the real world by leveraging pretrained models. We will discuss how to apply the concept of large-scale pretraining to robotics, so as to enable models to learn how to process diverse, multimodal perception inputs, connect perception with action, and generalize across scenarios and form factors. In particular, we are interested in analyzing the domain of pretraining for robotics from several angles, including but not limited to:

  • How do we build pretrained, reusable feature representations from complex inputs?
  • How do we learn world models that combine perception and actions?
  • How can we combine pretrained representations from multiple modalities such as language, vision, and geometry into robotics systems?
  • What are the right kinds of priors that are helpful for optimization and task planning?
  • How do we bring architectures and training methods that have been successful in other domains to robotics?
  • How do we efficiently fine-tune pretrained models for new downstream tasks?
  • How do we best deal with the specificities of robotics, such as expensive data collection and safety constraints?
We hope to connect researchers from the deep learning, representation learning, and classical robotics communities and to foster collaborations in this exciting new domain, while providing a platform to discuss recent developments, challenges, and tradeoffs.

Speakers and panelists


May 29th 2023

  • 8:30am - 8:50am: Breakfast
  • 8:50am - 9:00am: Introduction and opening remarks
  • 9:00am - 9:30am: Yuke Zhu
  • 9:30am - 10:00am: Sanjiban Choudhury
  • 10:00am - 10:20am: Poster lightning talks (20 x 1min talk each)
  • 10:20am - 11:00am: Coffee break and poster session I
  • 11:00am - 11:30am: Jitendra Malik
  • 11:30am - 12:00pm: Ashish Kapoor
  • 12:00pm - 1:00pm: Lunch break
  • 1:00pm - 1:30pm: Dieter Fox
  • 1:30pm - 2:00pm: Kristen Grauman
  • 2:00pm - 2:30pm: Spotlight talks (4x 5min talk and 2min Q&A each)
  • 2:30pm - 3:15pm: Coffee break and poster session II
  • 3:15pm - 3:45pm: Mac Schwager
  • 3:45pm - 4:15pm: Dorsa Sadigh
  • 4:15pm - 5:00pm: Panel discussion
  • 5:00pm - 5:05pm: Closing remarks

Call for papers

Important dates (all times AoE)

  • Submissions open: Feb 15th 2023
  • Submission deadline: Apr 14th 2023
  • Decision notification: Apr 30th 2023
  • Camera-ready deadline: May 14th 2023
  • Workshop: May 29th 2023

Submission link: https://openreview.net/group?id=ICRA.org/2023/Workshop/Pretraining4Robotics

In this workshop, we aim to bring together machine learning and robotics researchers who work at the intersection of these fields. We invite researchers to submit work in the following or related areas (non-exhaustive list):

  • Multi-modal pretrained models (images, text, depth, point clouds, action information)
  • Pretraining for perception and control
  • How can pretraining take advantage of both perception and action?
  • How can pretraining be useful to robots with different form factors, latencies, and distinct time and physical scales?
  • Large dataset collection and data management techniques for robot pretraining
  • Pretraining with simulation vs real-world data
  • Theoretical guarantees and performance bounds for pretraining
  • How much supervision is required? - learning from labeled vs unlabeled data
  • What will robotics architectures look like in 10 years? Which components should or should not be pretrained?
  • How much finetuning do pretrained models need?
  • What can we pretrain? - Skills discovery, perception representations, perception-action loops, etc.
  • Human bottleneck: how to pretrain when humans are involved in the decision-making process?
  • Any other related topics we might have forgotten in the list above 😄

Accepted Talks and Posters

Accepted papers will be presented in the form of posters (with lightning talks) or spotlight talks at the workshop. We encourage submissions of work in progress, as well as work that is not yet published.

Submission instructions

  • Submissions should be short papers up to 4 pages in PDF format (not counting references and an optional appendix, which can go over the limit)
  • This workshop will not provide formal proceedings; accepted papers will be available on the workshop website.

Accepted Papers

Spotlight talks (top 15%)

Lightning talks



For questions and comments, please contact us.