Introduction


How does the brain process language? We’ve been studying how to scalably answer this question using LLMs and large-scale brain-imaging datasets. Together, these let us automatically generate and test scientific hypotheses about language processing in the brain, potentially enabling a new paradigm for scientific research.

Our starting point is the work of Huth et al. (2016), who built predictive models of voxel responses to natural language stimuli (called encoding models) using interpretable word features (called Eng1000). They found that different brain areas are selective for different semantic categories, and that these categories are organized in a complex, overlapping manner across cortex.

QA encoding models

A lot has happened since 2016. Deep learning flourished and brought encoding models based on black-box text embeddings that predict brain responses far better. However, these modern models struggle to explain the underlying phenomena: what features of the stimulus actually drive the response?

To address this, we used QA encoding models, a method for converting qualitative language questions into highly accurate, interpretable models of brain responses. QA encoding models annotate a language stimulus by using an LLM to answer yes-no questions corresponding to qualitative theories. We can then fit a simple linear model to predict brain responses from these answers.
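
As a rough illustration of the pipeline, here is a minimal sketch assuming synthetic data; `ask_llm` is a hypothetical stand-in for a real LLM call, and the paper's exact prompting and hemodynamic feature delays are omitted:

```python
import numpy as np
from sklearn.linear_model import Ridge

def annotate(chunks, questions, ask_llm):
    """Binary feature matrix: one row per stimulus chunk, one column per yes-no question."""
    return np.array([[float(ask_llm(q, c)) for q in questions] for c in chunks])

# Synthetic stand-ins: on real data, X would come from `annotate` and
# Y would be the recorded brain responses.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 35)).astype(float)  # 35 yes/no answers per timepoint
Y = rng.standard_normal((200, 1000))                  # responses for 1000 voxels

# Regularized linear fit from question answers to voxel responses.
model = Ridge(alpha=10.0).fit(X, Y)
# model.coef_ has shape (n_voxels, n_questions): one interpretable weight
# per question for every voxel.
```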


This works surprisingly well. With just 35 questions, a QA encoding model outperforms existing baselines, including black-box models, at predicting brain responses in both fMRI and ECoG data. The model weights also provide easily interpretable maps of language selectivity across cortex. The 35-question model is small enough that we can visualize the whole thing: no feature importances or post-hoc summaries, just 35 questions and a map showing their linear weights for each brain voxel.
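
Continuing the sketch above, one simple way to read these weights is to find the dominant question at each voxel (the paper's flatmaps are richer, but this conveys the idea):

```python
W = model.coef_                    # (n_voxels, n_questions), from the sketch above
best = np.abs(W).argmax(axis=1)    # index of the dominant question per voxel
# Projecting `best`, or any single column of W, onto the cortical surface
# (e.g. with a tool like pycortex) yields a selectivity map for that question.
```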

This opens the door to much faster iteration on scientific hypotheses, since LLMs can quickly annotate and test any verbalizable theory. We tested many more hypotheses in the paper and made many more comparisons, e.g. to maps from the literature. In the next section, we explore how we can further verify these hypotheses in follow-up experiments.

Generative causal testing

To help test what features of a language stimulus drive the response in each brain area, we present generative causal testing (GCT). GCT is a framework for generating concise explanations of language selectivity in the brain from predictive models and then testing those explanations in follow-up experiments using LLM-generated stimuli.

Specifically, GCT tests explanations by building a story in which each paragraph is designed to align with a particular explanation while maintaining overall coherence. We can then average the responses to all paragraphs targeting a particular explanation to estimate which brain areas are selective for it. This recovers known selectivity maps and also suggests some new ones.
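
Here is a schematic of that averaging step, again with synthetic stand-ins (a real analysis would also account for hemodynamics and baseline contrasts):

```python
import numpy as np

def selectivity_maps(Y, labels):
    """Mean response per voxel for each explanation label."""
    labels = np.asarray(labels)
    return {lab: Y[labels == lab].mean(axis=0) for lab in np.unique(labels)}

# labels[t] is the explanation targeted by the paragraph heard at timepoint t.
rng = np.random.default_rng(1)
Y = rng.standard_normal((200, 1000))                       # synthetic responses
labels = rng.choice(["places", "numbers", "social"], 200)  # synthetic paragraph labels
maps = selectivity_maps(Y, labels)  # one (n_voxels,) map per explanation
```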

We can now compare these GCT-derived maps to the original QA encoding model maps to see how well they align; it turns out the correlations are generally quite positive (a simple way to compute such an alignment score is sketched below).

We did a lot more with GCT, including explaining selectivity both in individual voxels and in cortical regions of interest (ROIs), among them newly identified microROIs in prefrontal cortex. We found that explanatory accuracy is closely related to the predictive power and stability of the underlying predictive models. Finally, we found that GCT can dissect fine-grained differences between brain areas with similar functional selectivity.
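
One natural alignment score (our illustration here, not necessarily the exact statistic used in the paper) is the Pearson correlation across voxels between a GCT map and the corresponding encoding-model weight column:

```python
import numpy as np

def map_correlation(gct_map, weight_map):
    """Pearson correlation across voxels between two selectivity maps."""
    return float(np.corrcoef(gct_map, weight_map)[0, 1])

# Synthetic example: a weight map that partially tracks a GCT map.
rng = np.random.default_rng(2)
gct_map = rng.standard_normal(1000)
weight_map = 0.5 * gct_map + rng.standard_normal(1000)
print(map_correlation(gct_map, weight_map))  # comes out positive, as on real data
```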

There is a lot more to be done! We're just at the beginning of an exciting journey of LLM-powered scientific discovery. If you're interested in building on this work, take a look at the code below and feel free to reach out!

Code and reproducibility