Pretraining for Robotics Workshop, ICRA 2023

About

Recent advances in machine learning have started a paradigm shift from task-specific models towards large general-purpose architectures. In the domains of language and vision, large models such as GPT-3, BERT, and CLIP have opened avenues towards a wide range of applications and continue to spark an explosion of new ideas and possibilities. What does it take to bring the same level of advancement to the field of robotics, in order to build versatile agents that can be deployed in challenging environments? The goal of this workshop is to analyze how we can scale robotics towards the complexity of the real world by leveraging pretrained models. We will discuss how to apply the concept of large-scale pretraining to robotics, so as to enable models to learn how to process diverse, multimodal perception inputs, connect perception with action, and generalize across scenarios and form factors. In particular, we are interested in analyzing pretraining for robotics from several angles, including but not limited to:

How do we build reusable pretrained feature representations from complex inputs?
How do we learn world models that combine perception and actions?
How can we combine pretrained representations from multiple modalities such as language, vision, and geometry into robotics systems?
What are the right kinds of priors that are helpful for optimization and task planning?
How do we bring to robotics the architectures and training methods that have been successful in other domains?
How do we efficiently fine-tune pretrained models for new downstream tasks?
How do we best deal with the specific challenges of robotics, such as expensive data collection and safety constraints?

We hope to connect researchers from the deep learning, representation learning, and classical robotics communities, and to foster collaborations in this exciting new domain, while providing a platform to discuss recent developments, challenges, and tradeoffs.

Speakers and panelists

Dieter Fox
University of Washington / NVIDIA
Mac Schwager
Stanford University
Dorsa Sadigh
Stanford University

Sanjiban Choudhury
Cornell University
Ashish Kapoor
Scaled Foundations
Kristen Grauman
University of Texas at Austin

Jitendra Malik
UC Berkeley
Yuke Zhu
University of Texas at Austin

Schedule

May 29th 2023

8:30am - 8:50am: Breakfast
8:50am - 9:00am: Introduction and opening remarks
9:00am - 9:30am: Yuke Zhu
9:30am - 10:00am: Sanjiban Choudhury
10:00am - 10:20am: Poster lightning talks (20 talks, 1 min each)
10:20am - 11:00am: Coffee break and poster session I
11:00am - 11:30am: Jitendra Malik
11:30am - 12:00pm: Ashish Kapoor
12:00pm - 1:00pm: Lunch break
1:00pm - 1:30pm: Dieter Fox
1:30pm - 2:00pm: Kristen Grauman
2:00pm - 2:30pm: Spotlight talks (4 talks, 5 min talk and 2 min Q&A each)
2:30pm - 3:15pm: Coffee break and poster session II
3:15pm - 3:45pm: Mac Schwager
3:45pm - 4:15pm: Dorsa Sadigh
4:15pm - 5:00pm: Panel discussion
5:00pm - 5:05pm: Closing remarks

Call for papers

Important dates (all times AoE)

Submissions open: Feb 15th 2023
Submission deadline: Apr 14th 2023
Decision notification: Apr 30th 2023
Camera ready deadline: May 14th 2023
Workshop: May 29th 2023

Submission link: https://openreview.net/group?id=ICRA.org/2023/Workshop/Pretraining4Robotics

In this workshop, we aim to bring together machine learning and robotics researchers who work at the intersection of these fields. We invite researchers to submit work in the following or related areas (non-exhaustive list):

Multi-modal pretrained models (images, text, depth, point clouds, action information)
Pretraining for perception and control
How can pretraining take advantage of both perception and action?
How can pretraining be useful to robots with different form factors, latencies, and distinct time and physical scales?
Large dataset collection and data management techniques for robot pretraining
Pretraining with simulation vs real-world data
Theoretical guarantees and performance bounds for pretraining
How much supervision is required? - learning from labeled vs. unlabeled data
What will robotics architectures look like in 10 years? Which components should or should not be pretrained?
How much finetuning do pretrained models need?
What can we pretrain? - Skill discovery, perception representations, perception-action loops, etc.
Human bottleneck: how to pretrain when humans are involved in the decision-making process?
Any other related topics we might have forgotten in the list above 😄

Accepted Talks and Posters

Accepted papers will be presented in the form of posters (with lightning talks) or spotlight talks at the workshop. We encourage submissions of work in progress, as well as work that is not yet published.

Submission instructions

Submissions should be short papers of up to 4 pages in PDF format (not counting references and an optional appendix, which may exceed the limit).
This workshop will not have formal proceedings; accepted papers will be made available on the workshop website.

Accepted Papers

Spotlight talks (top 15%)

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier
ViSaRL: Visual Reinforcement Learning Guided by Human Saliency

Anthony Liang, Jesse Thomason, Erdem Biyik
Building Long-term Spatial Temporal Semantic Map

Ifrah Idrees, Trevor Wiedmann, Huda Abdulrasool, George Konidaris, Stefanie Tellex
Zero-Shot Robot Manipulation from Passive Human Videos

Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar

Lightning talks

Road Barlow Twins: Redundancy Reduction for Motion Prediction

Royden Wagner, Marvin Klemp, Carlos Fernandez Lopez, Omer Sahin Tas
ConceptFusion: Open-set Multimodal 3D Mapping

Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Subramanian Iyer, Soroush Saryazdi, Nikhil Varma Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso M de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba
CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam
Fast Traversability Estimation for Wild Visual Navigation

Jonas Frey, Matias Mattamala, Nived Chebrolu, Cesar Cadena, Maurice Fallon, Marco Hutter
Pretraining Neural-Networks with Neural-Fly for Rapid Online Learning

Michael O'Connell, Guanya Shi, Xichen Shi, Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, Soon-Jo Chung
Self-Supervised Terrain Representation Learning from Unconstrained Robot Experience

Haresh Karnan, Elvin Yang, Daniel Farkash, Garrett Warnell, Joydeep Biswas, Peter Stone
FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

Dhruv Shah, Kyle Stachowicz, Arjun Bhorkar, Ilya Kostrikov, Sergey Levine
Contrastive Language, Action, and State Pre-training for Robot Learning

Krishan Rana, Andrew Melnik, Niko Suenderhauf
Learning from synthetic data generated with GRADE

Elia Bonetto, Chenghao Xu, Aamir Ahmad
Fine-Grained Object Detection and Manipulation with Segmentation-Conditioned Perceiver-Actor

Shogo Akiyama, Dan Ogawa Lillrank, Kai Arulkumaran
CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities

Ayush Agrawal, Raghav Arora, Ahana Datta, Snehasis Banerjee, Brojeshwar Bhowmick, Krishna Murthy Jatavallabhula, Mohan Sridharan, Madhava Krishna
Text2Motion: From Natural Language Instructions to Feasible Plans

Kevin Lin, Christopher Agia, Toki Migimatsu, Marco Pavone, Jeannette Bohg
Masked Trajectory Models for Prediction, Representation, and Control

Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran
FLIP-TD: Free Lunch Inpainting on Top-Down Images for Robotic Tasks

Anukriti Singh, Vishnu Dutt Sharma, Pratap Tokekar
Self-Supervised 3D Representation Learning for Robotics

Ishika Singh, Anthony Liang, Mohit Shridhar, Jesse Thomason
Improved Zero-Shot Object Localization using Contextualized Prompts and Objects in Context

Gertjan J. Burghouts, Wouter Meijer, Fieke Hillerström, Jelle van Mil, Michael van Bekkum, Marianne Schaaphok, Frank Ruis
Digital Twin of a Multi-Arm Robot Platform based on Isaac Sim for Synthetic Data Generation

Juan Jose Quiroz Omana, Murilo Marques Marinho, Kanako Harada
Grounding Pretrained Features in 3D Representations

Kenneth Tor Blomqvist, Francesco Milano, Jen Jen Chung, Lionel Ott, Roland Siegwart
TartanDrive 1.5: Improving Large Multimodal Robotics Dataset Collection and Distribution

Matthew Sivaprakasam, Samuel Triest, Mateo Guaman Castro, Micah Nye, Mukhtar Maulimov, Cherie Ho, Parv Maheshwari, Wenshan Wang, Sebastian Scherer
Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference-Aligned Path Planning

Elvin Yang, Haresh Karnan, Garrett Warnell, Peter Stone, Joydeep Biswas

Organizers

Rogerio Bonatti
Microsoft
Sai Vemprala
Scaled Foundations
Mustafa Mukadam
Facebook AI Research

Luis Figueredo
Technical University of Munich
Antonio Loquercio
UC Berkeley
Xingyu Liu
Carnegie Mellon University
Valts Blukis
NVIDIA
Huang (Raven) Huang
UC Berkeley

Contact

For questions and comments, please contact us.

Pretraining for Robotics (PT4R)

Workshop at the 2023 International Conference on Robotics and Automation - ICRA
London, May 29 2023, full-day workshop