Benchmarking Affordance Generalization with BusyBox

Modular 3D-printable robotic manipulation benchmark for evaluating affordance generalization in robot foundation models.

Figure: BusyBox modular hardware

Overview

Robot Foundation Models (RFMs), also referred to as Vision-Language-Action models (VLAs), have been attracting the attention of researchers and practitioners with the promise of generalizing robot behaviors across tasks, objects, and environments. The community has extensively studied RFMs' generalization capabilities in the vision and language space. However, affordance generalization, i.e., RFMs' ability to manipulate new objects with familiar physical features, remains largely unexplored. Yet this meta-skill plays a critical role in a person's ability to quickly figure out how to handle hitherto unseen objects. In fact, basic physical interface elements like buttons and switches are designed to look and function similarly across different devices to facilitate affordance generalization in environments inhabited by people. Whether robots can capitalize on these design aids remains unknown: researchers currently lack a benchmark for systematically studying affordance generalization in RFMs.

BusyBox is a physical 3D-printable kit for systematically evaluating how well RFMs generalize their knowledge of basic affordances (pressing buttons, flipping switches, turning knobs, etc.). BusyBox can be assembled into any of a multitude of distinct objects having the same set of affordances. Paired with the carefully designed protocols for experiments and data collection that we present in this work, BusyBox can provide valuable insights into RFMs' ability to recognize and exploit ubiquitous affordance classes. In our experiments, BusyBox highlights affordance generalization as a major improvement area for RFMs.

Contributions

BusyBox Hardware

Figure: Disassembled BusyBox modular hardware

BusyBox consists of six 3D-printable modules that can be easily swapped and rotated to create multiple configurations with the same set of affordances:

  • Display module: E Ink display with LED indicators and main electronics
  • Buttons module: Four colored, illuminated buttons
  • Sliders module: Two adjustable sliders
  • Knob module: Rotatable knob with 6 positions and handle
  • Switches module: Two switches with on/off positions
  • Wire module: Colored wires with pluggable connectors

The modular design enables rapid reconfiguration using snap connectors, allowing researchers to systematically vary the spatial layout while maintaining consistent affordances.
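
One way to make "systematically vary the spatial layout" concrete is to treat a configuration as an assignment of the six modules to slots plus a rotation per module. The sketch below is illustrative only: the slot model, the 90-degree rotation steps, and the names `Placement`, `sample_configuration`, and `same_affordances` are assumptions made for this example, not part of the BusyBox release.

```python
import random
from dataclasses import dataclass

# Module names follow the list above.
MODULES = ("display", "buttons", "sliders", "knob", "switches", "wire")

# Assumption: modules can be rotated in 90-degree steps.
ROTATIONS_DEG = (0, 90, 180, 270)


@dataclass(frozen=True)
class Placement:
    module: str
    slot: int          # hypothetical slot index, 0..5
    rotation_deg: int


def sample_configuration(rng: random.Random) -> tuple:
    """Sample one layout: a random slot order and a random rotation per module."""
    order = list(MODULES)
    rng.shuffle(order)
    return tuple(
        Placement(module=m, slot=i, rotation_deg=rng.choice(ROTATIONS_DEG))
        for i, m in enumerate(order)
    )


def same_affordances(a, b) -> bool:
    """Two layouts expose the same affordances if they use the same modules."""
    return {p.module for p in a} == {p.module for p in b}


if __name__ == "__main__":
    rng = random.Random(0)
    canonical = tuple(Placement(m, i, 0) for i, m in enumerate(MODULES))
    variant = sample_configuration(rng)
    print(same_affordances(canonical, variant))  # True
```

Seeding the generator keeps a sampled test layout reproducible, which matters when several labs want to evaluate on the same reconfigured instance.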

Benchmarking protocol

We use BusyBox to evaluate affordance generalization by finetuning RFMs on affordance demonstration data from a "canonical" BusyBox configuration and testing the resulting models on their ability to perform the demonstrated tasks on reconfigured BusyBox instances.
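
The protocol above can be sketched as a small evaluation loop: run the finetuned policy on each configuration, then compare canonical success against success on reconfigured instances. This is a minimal sketch under stated assumptions; the `rollout` callable, the configuration names, and the "generalization gap" summary are hypothetical and do not describe the authors' evaluation code.

```python
from typing import Callable, Dict, Iterable

# Hypothetical signature: run one trial of `task` on `config`,
# return True if the trial succeeded.
Rollout = Callable[[str, str], bool]


def success_rates(
    rollout: Rollout,
    configs: Iterable[str],
    tasks: Iterable[str],
    trials_per_task: int,
) -> Dict[str, float]:
    """Return success rate (%) per configuration, pooled over tasks and trials."""
    tasks = list(tasks)
    if not tasks or trials_per_task <= 0:
        raise ValueError("need at least one task and one trial per task")
    rates = {}
    for config in configs:
        outcomes = [
            rollout(config, task)
            for task in tasks
            for _ in range(trials_per_task)
        ]
        rates[config] = 100.0 * sum(outcomes) / len(outcomes)
    return rates


def generalization_gap(rates: Dict[str, float], canonical: str = "canonical") -> float:
    """Canonical success minus mean success on all other configurations."""
    others = [r for name, r in rates.items() if name != canonical]
    return rates[canonical] - sum(others) / len(others)


if __name__ == "__main__":
    # Stub policy that only ever succeeds on the training layout.
    stub = lambda config, task: config == "canonical"
    rates = success_rates(stub, ["canonical", "config-1", "config-2"], ["push button"], 10)
    print(rates, generalization_gap(rates))
```

A policy that has memorized module positions would show a large gap, while one that has learned the affordances themselves would show a gap near zero.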

Task Families

Results

Current Robot Foundation Models Fail at Affordance Generalization

Despite high success on conventional pick-and-place tasks, leading VLAs fail at BusyBox's affordance generalization challenge. Zero-shot, π0 achieves 0% success on every configuration. Even after finetuning on our dataset, π0 achieves only 30% success on the training (canonical) configuration and 0% on both novel configurations.

Quantitative Results

Model  Configuration  Canonical Success (%)  Config-1 Success (%)  Config-2 Success (%)
π0     Zero-shot      0.0                    0.0                   0.0
π0     Finetuned      30.0                   0.0                   0.0

Performance by Task Type (π0 Finetuned on Canonical)

Task Type    Success Rate (%)  Key Failure Mode
Turn Knob    50.0              Overshooting target position
Pull Wire    40.0              Insufficient grip/force
Push Button  40.0              Missing button location
Move Slider  30.0              Imprecise positioning
Flip Switch  20.0              Lack of bimanual coordination
Insert Wire  0.0               Failed alignment/insertion

Note: The finetuned model's failures on non-canonical configurations were primarily due to reaching for wrong module positions, indicating memorization rather than affordance understanding.
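
As a quick consistency check, the per-task rates in the table above average to the 30% canonical success reported for finetuned π0. The calculation below assumes every task type was evaluated with the same number of trials, which the page does not state explicitly.

```python
# Per-task success rates (%) for finetuned π0 on the canonical configuration,
# copied from the per-task table.
task_success = {
    "Turn Knob": 50.0,
    "Pull Wire": 40.0,
    "Push Button": 40.0,
    "Move Slider": 30.0,
    "Flip Switch": 20.0,
    "Insert Wire": 0.0,
}

# Assumption: equal trial counts per task, so the overall rate is the
# unweighted mean of the per-task rates.
overall = sum(task_success.values()) / len(task_success)
print(overall)  # 30.0
```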

Assembly

BusyBox's modular design enables rapid reconfiguration through snap connectors. Modules can be swapped and rotated in under 2 minutes, allowing researchers to quickly create new test configurations. The 3D-printed components are designed for easy replication in any robotics lab with access to a standard 3D printer. Two-filament printing enhances visibility of position markers, though single-color prints with manual highlighting work as an alternative.

Teleoperation Data Collection

We collected 1000+ demonstrations by teleoperating a Mobile ALOHA bimanual robot on the canonical BusyBox configuration. The dataset covers all task families with systematic variation in:

  • Initial states (slider/knob positions, switch states)
  • BusyBox position and orientation
  • Language instruction (color vs position references)
  • Robot starting poses
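
The variation axes listed above could be captured as a per-episode specification that is sampled before each demonstration. The sketch below is a hypothetical illustration: the field names, the normalized slider range, the yaw range, and the named start poses are assumptions, not the authors' data collection schema.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeSpec:
    knob_position: int        # 1..6; the knob module has six positions
    slider_positions: tuple   # two sliders, normalized to [0, 1] (assumption)
    switch_states: tuple      # two switches, True = on
    box_yaw_deg: float        # BusyBox orientation; the range is an assumption
    instruction_style: str    # "color" or "position" reference in the prompt
    robot_start_pose: str     # hypothetical named start pose


def sample_episode(rng: random.Random) -> EpisodeSpec:
    """Draw one initial condition covering each variation axis."""
    return EpisodeSpec(
        knob_position=rng.randint(1, 6),
        slider_positions=(rng.random(), rng.random()),
        switch_states=(rng.random() < 0.5, rng.random() < 0.5),
        box_yaw_deg=rng.uniform(-30.0, 30.0),
        instruction_style=rng.choice(["color", "position"]),
        robot_start_pose=rng.choice(["home", "left-biased", "right-biased"]),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_episode(rng))
```

Logging the sampled specification alongside each demonstration makes it possible to check afterwards that the dataset covers each axis evenly.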

Teleoperators followed strict protocols for consistency: efficient movements, active demonstration without unnecessary pauses, and ensuring wrist camera visibility of task-relevant areas.

BibTeX

@inproceedings{busybox2025,
  title={Benchmarking Affordance Generalization with BusyBox},
  author={Fortier, Dean and Adamson, Timothy and Hellebrekers, Tess and LaScala, Teresa and Ennin, Kofi and Murray, Michael and Kolobov, Andrey and Mullins, Galen},
  booktitle={Eval\&Deploy Workshop at CoRL-2025: Evaluation and Deployment Across the Robot Learning Lifecycle},
  year={2025}
}