What is LLF-Bench? LLF-Bench (Learning from Language Feedback Benchmark; pronounced “elf bench”) is a new benchmark
for evaluating the ability of AI agents to learn interactively from
language feedback alone. In LLF-Bench, the agent interacts with an environment, takes actions, and receives language feedback instead of
rewards or expert actions. LLF-Bench consists of 8 diverse benchmarks.
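The interaction described above can be sketched as a simple loop. This is a toy illustration only: `ToyCorridorEnv`, its `reset` and `step` methods, and the feedback sentences are hypothetical and are not the LLF-Bench API; the repository documents the actual interface.

```python
# Toy illustration of learning from language feedback.
# ToyCorridorEnv and its methods are hypothetical, NOT the LLF-Bench API.

class ToyCorridorEnv:
    """The agent walks along a line and must reach a goal it cannot see."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        # The task is described in natural language.
        self.position = 0
        return "You are in a corridor. Reach the goal by moving 'left' or 'right'."

    def step(self, action):
        # Return (language feedback, done). No reward is shown to the agent.
        before = abs(self.goal - self.position)
        self.position += 1 if action == "right" else -1
        after = abs(self.goal - self.position)
        if after == 0:
            return "You reached the goal.", True
        if after < before:
            return "You moved closer to the goal.", False
        return "You moved away from the goal. Try the other direction.", False


def run_episode(env, max_steps=20):
    """A simple agent that changes direction when the feedback says so."""
    instruction = env.reset()
    action = "left"  # arbitrary initial choice
    transcript = [instruction]
    for _ in range(max_steps):
        feedback, done = env.step(action)
        transcript.append(f"{action}: {feedback}")
        if done:
            break
        if "away" in feedback:
            # Use the content of the feedback to revise the behavior.
            action = "right" if action == "left" else "left"
    return transcript


transcript = run_episode(ToyCorridorEnv(goal=3))
print("\n".join(transcript))
```

The agent never observes a numeric reward; it corrects its first wrong move purely from the wording of the feedback.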
# Clone the LLF-Bench code and enter the repository
git clone https://github.com/microsoft/LLF-Bench.git
cd LLF-Bench
# Optional but recommended: create a conda environment.
conda create -n LLF-Bench python=3.8 -y
conda activate LLF-Bench
# Install LLF-Bench (run from the repository root)
pip install -e .
# Installing Alfworld and Metaworld requires additional resources. See the GitHub repository for details.
How is it different from RL? Reinforcement learning (RL) is another commonly studied
interactive learning setting. The key difference is that in RL the agent is trained using
rewards, whereas in LLF (the paradigm upon which LLF-Bench is based) the agent learns from language feedback instead.
Why language feedback? Language feedback has two main advantages over rewards and expert actions (the two
most commonly used forms of feedback). First, unlike rewards, language feedback is very expressive and
can therefore convey much more information, which can help the agent learn faster; and unlike expert actions,
language feedback can more easily be provided by non-expert humans. Second, language feedback is
closer to how humans learn, which makes it more natural in many settings.
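To make the expressiveness argument concrete, here is a toy comparison, not an LLF-Bench experiment. It counts how many guesses two hypothetical agents need to find a hidden integer in [0, 100]: one sees only a 0/1 reward, the other is told in words whether its guess was too low or too high. The function names and the task are assumptions made for illustration.

```python
# Toy comparison of two hypothetical feedback signals on a guessing task.
# Nothing here uses or measures LLF-Bench itself.

def guesses_with_reward(target, low=0, high=100):
    """Feedback is a scalar reward: 1 if correct, else 0.
    The reward carries no direction, so this agent tries candidates in order."""
    count = 0
    for guess in range(low, high + 1):
        count += 1
        reward = 1 if guess == target else 0
        if reward == 1:
            break
    return count


def guesses_with_language(target, low=0, high=100):
    """Feedback is a phrase saying whether the guess is too low or too high,
    so each reply lets the agent halve its search range."""
    count = 0
    while low <= high:
        guess = (low + high) // 2
        count += 1
        if guess == target:
            break
        feedback = "too low" if guess < target else "too high"
        if feedback == "too low":
            low = guess + 1
        else:
            high = guess - 1
    return count


targets = range(0, 101)
avg_reward = sum(guesses_with_reward(t) for t in targets) / len(targets)
worst_language = max(guesses_with_language(t) for t in targets)
print(f"reward only: {avg_reward:.1f} guesses on average")
print(f"language feedback: at most {worst_language} guesses")
```

The gap arises only because the verbal reply tells the agent which way to move, while the reward says nothing beyond right or wrong. Real tasks are far less tidy, so this should be read as intuition rather than evidence.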
Can LLF-Bench be used to evaluate LLMs? Yes! In fact, that is one of the main purposes of LLF-Bench: to robustly evaluate LLM-based agents.
There are two reasons to prefer LLF-Bench for such evaluation. First, LLF-Bench provides a diverse set of environments, each of which provides sampled verbalizations of the problem, making it harder
to prompt hack. Second, LLF-Bench includes environments that require learning, so that no matter how good an LLM is, it cannot solve those environments
zero-shot. Therefore, an agent must show signs of learning new information to be able to solve them.