Benchmarking Affordance Generalization with BusyBox

Modular 3D-printable robotic manipulation benchmark for evaluating affordance generalization in robot foundation models.

BusyBox modular hardware

Overview

Robot Foundation Models (RFMs), also referred to as Vision-Language-Action models (VLAs), have attracted the attention of researchers and practitioners with the promise of generalizing robot behaviors across tasks, objects, and environments. The community has extensively studied RFMs' generalization capabilities in the vision and language space. However, affordance generalization, i.e., RFMs' ability to manipulate new objects with familiar physical features, remains largely unexplored. Yet this meta-skill plays a critical role in a person's ability to quickly figure out how to handle hitherto unseen objects. In fact, basic physical interface elements like buttons and switches are designed to look and function similarly across different devices to facilitate affordance generalization in environments inhabited by people. Whether robots can capitalize on these design aids remains unknown: researchers currently lack a benchmark for systematically studying affordance generalization in RFMs.

BusyBox is a physical 3D-printable kit for systematically evaluating how well RFMs generalize their knowledge of basic affordances (pressing buttons, flipping switches, turning knobs, etc.). BusyBox can be assembled into any of a multitude of distinct objects that share the same set of affordances. Paired with the carefully designed protocols for experiments and data collection that we present in this work, BusyBox can provide valuable insights into RFMs' ability to recognize and exploit ubiquitous affordance classes. In our experiments, BusyBox highlights affordance generalization as a major improvement area for RFMs.

Contributions

BusyBox Hardware

Disassembled BusyBox modular hardware

BusyBox consists of six 3D-printable modules that can be easily swapped and rotated to create multiple configurations with the same set of affordances:

  • Display module: E Ink display with LED indicators and main electronics
  • Buttons module: Four colored, illuminated buttons
  • Sliders module: Two adjustable sliders
  • Knob module: Rotatable knob with 6 positions and handle
  • Switches module: Two switches with on/off positions
  • Wire module: Colored wires with pluggable connectors

The modular design enables rapid reconfiguration using snap connectors, allowing researchers to systematically vary the spatial layout while maintaining consistent affordances.
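To make the idea of "same affordances, different layout" concrete, here is a minimal Python sketch of how a configuration could be represented and sampled. It is an illustration only, not part of the BusyBox release: the names `Placement`, `sample_configuration`, and `affordance_modules` are hypothetical, and the sketch assumes that each module occupies one of six interchangeable slots and rotates in 90-degree steps, neither of which is stated above.

```python
import math
import random
from dataclasses import dataclass

# The six modules named in the list above.
MODULES = ("display", "buttons", "sliders", "knob", "switches", "wire")

# ASSUMPTION: modules can be mounted in 90-degree increments.
ROTATIONS_DEG = (0, 90, 180, 270)


@dataclass(frozen=True)
class Placement:
    """One module mounted in one slot at one rotation (hypothetical model)."""
    module: str
    slot: int
    rotation_deg: int


def sample_configuration(rng: random.Random) -> tuple[Placement, ...]:
    """Shuffle the modules across slots and pick a rotation for each."""
    order = list(MODULES)
    rng.shuffle(order)
    return tuple(
        Placement(module=m, slot=i, rotation_deg=rng.choice(ROTATIONS_DEG))
        for i, m in enumerate(order)
    )


def affordance_modules(config: tuple[Placement, ...]) -> frozenset[str]:
    """The set of modules present, which is what stays fixed across layouts."""
    return frozenset(p.module for p in config)


def count_configurations() -> int:
    """Number of distinct layouts under the slot and rotation assumptions."""
    return math.factorial(len(MODULES)) * len(ROTATIONS_DEG) ** len(MODULES)


if __name__ == "__main__":
    config = sample_configuration(random.Random(0))
    for placement in config:
        print(placement)
    # Every sampled layout exposes the same modules, only arranged differently.
    assert affordance_modules(config) == frozenset(MODULES)
```

Keeping the layout (slot and rotation) separate from the set of modules mirrors the design goal described above: the spatial arrangement varies while the affordances stay fixed.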

Benchmarking protocol

We use BusyBox to evaluate affordance generalization by finetuning RFMs on affordance demonstration data from a "canonical" BusyBox configuration and testing the resulting models on their ability to perform the demonstrated tasks on reconfigured BusyBox instances.
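One way to summarize results from such a protocol is to compare the success rate on the canonical configuration with the success rates on reconfigured instances. The sketch below is a hypothetical bookkeeping helper, not the scoring code used in this work: the function names, the `"canonical"` label, and the "generalization gap" summary are assumptions made for illustration, and the trial outcomes in the usage example are placeholders rather than measured results.

```python
from collections import defaultdict
from statistics import mean


def success_rates(trials):
    """Aggregate (configuration_name, succeeded) pairs into per-configuration rates."""
    counts = defaultdict(lambda: [0, 0])  # name -> [successes, total trials]
    for name, succeeded in trials:
        counts[name][0] += int(succeeded)
        counts[name][1] += 1
    return {name: wins / total for name, (wins, total) in counts.items()}


def generalization_gap(rates, canonical="canonical"):
    """Canonical success rate minus the mean rate over all other configurations.

    ASSUMPTION: this is an illustrative summary statistic, not a metric
    defined by the benchmark.
    """
    others = [rate for name, rate in rates.items() if name != canonical]
    if canonical not in rates or not others:
        raise ValueError("need the canonical and at least one other configuration")
    return rates[canonical] - mean(others)


if __name__ == "__main__":
    # Placeholder outcomes for illustration only; not experimental data.
    placeholder_trials = [
        ("canonical", True), ("canonical", True),
        ("canonical", True), ("canonical", False),
        ("reconfigured_a", True), ("reconfigured_a", False),
        ("reconfigured_b", False), ("reconfigured_b", False),
    ]
    rates = success_rates(placeholder_trials)
    print(rates)
    print(generalization_gap(rates))
```

A positive gap under this definition would indicate that a model performs better on the layout it was finetuned on than on rearranged layouts with the same affordances.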

BusyBox configurations

Results

Current Robot Foundation Models Struggle with Affordance Generalization

Our experiment shows that affordance generalization is challenging even for some of the strongest existing VLAs, and even in in-distribution settings, because it can require out-of-distribution generalization in the visual space.

Affordance generalization experiment results

Assembly

BusyBox's modular design enables rapid reconfiguration through snap connectors. Modules can be swapped and rotated in under 2 minutes, allowing researchers to quickly create new test configurations. The 3D-printed components are designed for easy replication in any robotics lab with access to a standard 3D printer. Two-filament printing enhances visibility of position markers, though single-color prints with manual highlighting work as an alternative.

Teleoperation Data Collection

We collected 1000+ demonstrations by teleoperating a Mobile ALOHA bimanual robot on the canonical BusyBox configuration. The dataset covers all task families with systematic variation in:

  • Initial states (slider/knob positions, switch states)
  • BusyBox position and orientation
  • Language instruction (color vs position references)
  • Robot starting poses

Teleoperators followed strict protocols for consistency: efficient movements, active demonstration without unnecessary pauses, and wrist-camera visibility of task-relevant areas.
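The variation factors listed above could be drawn per episode with a small sampler like the one below. This is a sketch under stated assumptions, not the procedure used to collect the dataset: the `EpisodeSpec` fields, the normalized slider range, the pose offsets and yaw range, and the named starting poses are all hypothetical. Only the counts (two sliders, two switches, six knob positions) and the two instruction styles come from the text.

```python
import random
from dataclasses import dataclass

NUM_SLIDERS = 2          # from the module list
NUM_SWITCHES = 2         # from the module list
KNOB_POSITIONS = 6       # from the module list
INSTRUCTION_STYLES = ("color", "position")  # from the variation list

# ASSUMPTION: hypothetical named starting poses for the robot.
START_POSES = ("home", "left_offset", "right_offset")


@dataclass(frozen=True)
class EpisodeSpec:
    """Initial conditions for one demonstration episode (hypothetical schema)."""
    slider_positions: tuple[float, ...]  # ASSUMPTION: normalized to [0, 1]
    knob_position: int                   # index in [0, KNOB_POSITIONS)
    switch_states: tuple[bool, ...]      # True means "on"
    box_xy_m: tuple[float, float]        # ASSUMPTION: offset from nominal, meters
    box_yaw_deg: float                   # ASSUMPTION: yaw offset, degrees
    instruction_style: str
    start_pose: str


def sample_episode(rng: random.Random) -> EpisodeSpec:
    """Draw one set of initial conditions, varying each factor independently."""
    return EpisodeSpec(
        slider_positions=tuple(round(rng.random(), 2) for _ in range(NUM_SLIDERS)),
        knob_position=rng.randrange(KNOB_POSITIONS),
        switch_states=tuple(rng.random() < 0.5 for _ in range(NUM_SWITCHES)),
        box_xy_m=(rng.uniform(-0.05, 0.05), rng.uniform(-0.05, 0.05)),
        box_yaw_deg=rng.uniform(-15.0, 15.0),
        instruction_style=rng.choice(INSTRUCTION_STYLES),
        start_pose=rng.choice(START_POSES),
    )


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a collection plan is reproducible
    for spec in (sample_episode(rng) for _ in range(3)):
        print(spec)
```

Seeding the generator makes a collection plan reproducible, so the same set of initial conditions can be handed to different teleoperators.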

BibTeX

@misc{fortier2026benchmarkingaffordancegeneralizationbusybox,
      title={Benchmarking Affordance Generalization with BusyBox}, 
      author={Dean Fortier and Timothy Adamson and Tess Hellebrekers and Teresa LaScala and Kofi Ennin and Michael Murray and Andrey Kolobov and Galen Mullins},
      year={2026},
      eprint={2602.05441},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.05441}, 
}