How does the brain process language?
We’ve been studying how to scalably answer this question using LLMs and large-scale brain-imaging datasets. Together, these let us automatically generate and test scientific hypotheses about language processing in the brain, potentially enabling a new paradigm for scientific research.
Our starting point is the work of Huth et al. 2016, who built predictive models of voxel responses to natural language stimuli (called encoding models) using interpretable word features (called Eng1000). They found that different brain areas are selective for different semantic categories, and that these categories are organized in a complex, overlapping manner across cortex.
QA encoding models
A lot has happened since 2016. Deep learning flourished and brought encoding models based on black-box text embeddings that predicted brain responses far more accurately. However, these modern models struggle to explain the underlying phenomenon: what features of the stimulus drive the response?
To solve this, we used QA encoding models, a method for converting qualitative language questions into highly accurate, interpretable models of brain responses. QA encoding models annotate a language stimulus by using an LLM to answer yes-no questions corresponding to qualitative theories. Then we can simply fit a linear model to predict brain responses from these answers.
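To make this concrete, here is a rough sketch of the pipeline. Everything here is a placeholder: `query_llm` stands in for whatever LLM interface you use, and the example questions are illustrative rather than the paper's actual 35.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

QUESTIONS = [
    "Does the input mention a number?",
    "Does the input describe a physical action?",
    # ... one yes-no question per qualitative theory
]

def query_llm(prompt: str) -> str:
    """Placeholder for your LLM interface (e.g. an API call); returns raw text."""
    raise NotImplementedError

def ask_llm(question: str, text: str) -> bool:
    """Ask the LLM a yes-no question about one stimulus chunk."""
    prompt = f'Text: "{text}"\nQuestion: {question}\nAnswer yes or no.'
    return query_llm(prompt).strip().lower().startswith("yes")

def annotate(chunks):
    """Binary feature matrix: one row per stimulus chunk, one column per question."""
    return np.array([[ask_llm(q, c) for q in QUESTIONS] for c in chunks], dtype=float)

def fit_qa_encoding_model(train_chunks, train_responses):
    """train_chunks: list of text chunks; train_responses: (n_chunks, n_voxels)."""
    X = annotate(train_chunks)
    model = RidgeCV(alphas=np.logspace(-2, 4, 10))
    model.fit(X, train_responses)  # one linear readout per voxel
    return model
```

The appeal of this setup is that the whole feature space is human-readable: each column of `X` is the answer to a plain-English question.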
This works surprisingly well. With just 35 questions, a QA encoding model outperforms existing baselines (even black-box models) at predicting brain responses in both fMRI and ECoG data. The model weights also provide easily interpretable maps of language selectivity across cortex.
The 35-question model is small enough that we can visualize the whole thing: no feature importances or post-hoc summaries, just 35 questions and a map showing their linear weights for each brain voxel.
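As a toy illustration of how compact this is, one could render the whole model as a question-by-voxel heatmap (reusing names from the sketch above; the actual figures project each question's weights onto a cortical flatmap, e.g. with pycortex, rather than a raw voxel axis):

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumes: encoding_model = fit_qa_encoding_model(train_chunks, train_responses)
weights = encoding_model.coef_.T          # (n_questions, n_voxels)
vmax = np.abs(weights).max()

fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(weights, aspect="auto", cmap="RdBu_r", vmin=-vmax, vmax=vmax)
ax.set_yticks(range(len(QUESTIONS)))
ax.set_yticklabels(QUESTIONS, fontsize=7)
ax.set_xlabel("voxel")
fig.colorbar(im, ax=ax, label="linear weight")
plt.show()
```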
This opens the door to much faster iteration on scientific hypotheses, as LLMs can quickly annotate and test any verbalizable theory. We tested a lot more hypotheses in the paper, and made a lot more comparisons, e.g. to maps from the literature.
In the next section, we explore how we can further verify these hypotheses in follow-up experiments.
Generative causal testing
To help test what features of a language stimulus drive the response in each brain area, we present generative causal testing (GCT). GCT is a framework for generating concise explanations of language selectivity in the brain from predictive models, and then testing those explanations in follow-up experiments using LLM-generated stimuli. Specifically, GCT tests explanations by building a story where each paragraph is designed to align with a particular explanation while maintaining overall coherence.
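A minimal sketch of what that stimulus generation might look like, assuming a generic text-in/text-out `query_llm` interface (the actual prompting used in the paper may differ):

```python
def generate_gct_story(explanations, query_llm):
    """Build a story with one paragraph targeted at each explanation."""
    paragraphs = []
    for explanation in explanations:
        prompt = (
            "Continue the story below with exactly one new paragraph that "
            f"prominently features {explanation}, while keeping the overall "
            "story coherent.\n\nStory so far:\n" + "\n\n".join(paragraphs)
        )
        paragraphs.append(query_llm(prompt))
    return paragraphs

# e.g. generate_gct_story(["numbers and counting", "physical movement"], query_llm)
```

Feeding the story so far back into each prompt is what keeps the paragraphs reading as one coherent narrative rather than a list of disconnected probes.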

We can then average the responses to paragraphs with a particular explanation to estimate which brain areas are selective for that explanation. This recovers known selectivity maps and also suggests some new ones.
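In code, the averaging step is simple. Here is a sketch assuming responses have already been preprocessed and aligned to paragraphs (a real fMRI analysis would also account for hemodynamic delay):

```python
import numpy as np

def gct_selectivity_maps(responses, labels, n_explanations):
    """Average voxel responses over timepoints assigned to each explanation.

    responses: (n_timepoints, n_voxels) preprocessed BOLD responses
    labels:    (n_timepoints,) index of the explanation each timepoint belongs to
    """
    return np.stack([responses[labels == i].mean(axis=0)
                     for i in range(n_explanations)])  # (n_explanations, n_voxels)
```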
We can now compare these GCT-derived maps to the original QA encoding model maps to see how well they align. Turns out the correlations are generally quite positive.
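That comparison amounts to correlating matched rows of the two map matrices (hypothetical variable names, assuming row i refers to the same concept in both):

```python
import numpy as np

# gct_maps, qa_maps: (n_explanations, n_voxels), rows matched by concept
correlations = [np.corrcoef(gct_maps[i], qa_maps[i])[0, 1]
                for i in range(len(gct_maps))]
```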

We did a lot more with GCT, including explaining selectivity both in individual voxels and in cortical regions of interest (ROIs), among them newly identified microROIs in prefrontal cortex. We found that explanatory accuracy is closely tied to the predictive power and stability of the underlying predictive models. Finally, we found that GCT can dissect fine-grained differences between brain areas with similar functional selectivity.
There is a lot more to be done! We're just at the beginning of an exciting journey of LLM-powered scientific discovery. If you're interested in building off of this work, take a look at the code below and feel free to reach out!
Code and reproducibility