
Commit af48645

add mindllm
1 parent 5dd20df commit af48645

File tree

10 files changed (+88, -0 lines changed)
app/projects/mindllm/assets/adapt.png


app/projects/mindllm/assets/bit.png


app/projects/mindllm/assets/ood.png


app/projects/mindllm/assets/vis.png


app/projects/mindllm/page.mdx

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
import { Authors, Badges } from '@/components/utils'

# MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

<Authors
  authors="Weikang Qiu, Yale University; Zheng Huang, Dartmouth College; Haoyu Hu, University of Cambridge; Aosong Feng, Yale University; Yujun Yan, Dartmouth College; Rex Ying, Yale University"
/>

<Badges
  venue="ICML 2025"
  github="https://github.com/Graph-and-Geometric-Learning/MindLLM"
  arxiv="https://arxiv.org/abs/2502.15786"
  pdf="https://arxiv.org/pdf/2502.15786"
/>
## Introduction

Decoding functional magnetic resonance imaging (fMRI) signals into text has been a key challenge in the neuroscience community, with the potential to advance brain-computer interfaces and uncover deeper insights into brain mechanisms. However, existing approaches often struggle with suboptimal predictive performance, limited task variety, and poor generalization across subjects. We posit two key challenges:

1. **Subject-Agnostic Decoding**: Existing fMRI-to-text decoding models often require extensive subject-specific training data, which limits their generalization to new subjects and prevents out-of-the-box application, a prerequisite for real-world use.
2. **Versatile Decoding**: Most existing models are designed for specific tasks, which restricts their applicability to a narrow range of tasks and limits their ability to capture the rich semantic information encoded in fMRI signals.

In response, we propose MindLLM, a model designed for subject-agnostic and versatile fMRI-to-text decoding. MindLLM consists of an fMRI encoder and an off-the-shelf LLM. The fMRI encoder employs a neuroscience-informed attention mechanism that accommodates subjects with varying input shapes and thus achieves high-performance subject-agnostic decoding. Moreover, we introduce Brain Instruction Tuning (BIT), a novel approach that enhances the model's ability to capture diverse semantic representations from fMRI signals, enabling more versatile decoding.

## Method

### fMRI Encoder

The fMRI encoder consists of a neuroscience-informed attention layer followed by an MLP. The attention layer is designed to handle varying input shapes across subjects, which is crucial for subject-agnostic decoding. The MLP projects the output of the attention layer into fMRI tokens aligned with the LLM's input space.

Three design choices differentiate the neuroscience-informed attention layer from the standard attention mechanism (a code sketch follows the list):

1. **Exclude fMRI values from the keys.** Unlike images or text, which are usually treated as translation-invariant, the positions of voxels carry specific functional information, since voxels in different areas are associated with distinct brain functions. Consequently, a voxel's position alone can serve as an effective key for computing attention weights.
2. **Incorporate brain parcellation information.** While positional encoding alone improves performance, it lacks inherent neuroscientific grounding, potentially making it harder for the model to learn representations aligned with established principles of brain function. To overcome this, we incorporate existing brain-region information into the attention keys.
3. **Combine multiple parcellation schemes.** Different parcellation schemes capture different aspects of brain function, and combining them provides a more comprehensive representation of the brain's functional organization.
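
The sketch below illustrates how these choices might translate into code. It is a minimal PyTorch approximation of the ideas above, not the released MindLLM implementation: the tensor shapes, the learnable query tokens, the precomputed positional features, the hidden sizes (e.g., `d_model`, `d_llm`), and the single shared region count per atlas are all illustrative assumptions.

```python
# Minimal PyTorch sketch of a neuroscience-informed cross-attention encoder.
# Shapes, hidden sizes, and the shared region count per atlas are illustrative
# assumptions, not the released MindLLM implementation.
import torch
import torch.nn as nn


class NeuroInformedEncoder(nn.Module):
    def __init__(self, num_queries=256, d_model=512, num_atlases=3,
                 regions_per_atlas=400, pos_dim=96, d_llm=4096):
        super().__init__()
        # Learnable query tokens shared across subjects; the voxel axis is only
        # ever attended over, so its length may differ from subject to subject.
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        # One embedding table per parcellation scheme (design choices 2 and 3).
        self.region_emb = nn.ModuleList(
            [nn.Embedding(regions_per_atlas, d_model) for _ in range(num_atlases)]
        )
        self.pos_proj = nn.Linear(pos_dim, d_model)       # voxel position -> key space
        self.val_proj = nn.Linear(1 + pos_dim, d_model)   # fMRI value (+ position) -> value space
        self.to_llm = nn.Sequential(                      # MLP into the LLM's input space
            nn.Linear(d_model, 2 * d_model), nn.GELU(), nn.Linear(2 * d_model, d_llm)
        )

    def forward(self, fmri, pos_feat, region_ids):
        # fmri:       (B, V)           voxel activations; V varies across subjects
        # pos_feat:   (B, V, pos_dim)  precomputed positional features per voxel
        # region_ids: (B, V, A)        integer parcel label of each voxel under A atlases
        keys = self.pos_proj(pos_feat)
        for a, emb in enumerate(self.region_emb):
            keys = keys + emb(region_ids[..., a])         # keys exclude fMRI values (choice 1)
        values = self.val_proj(torch.cat([fmri.unsqueeze(-1), pos_feat], dim=-1))

        q = self.queries.unsqueeze(0).expand(fmri.size(0), -1, -1)
        attn = torch.softmax(q @ keys.transpose(1, 2) / keys.size(-1) ** 0.5, dim=-1)
        tokens = attn @ values                            # (B, num_queries, d_model)
        return self.to_llm(tokens)                        # fMRI tokens for the frozen LLM
```

Because voxels only appear along the attended-over axis, the same query set handles subjects with different voxel counts, which is what makes the encoder subject-agnostic; the resulting tokens are then placed alongside the text embeddings of the off-the-shelf LLM.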
![Architecture of MindLLM.|scale=0.8](./assets/encoder.png)
### Brain Instruction Tuning (BIT)

To enable versatile fMRI-to-text decoding, an appropriate BIT dataset is required, yet no such dataset currently exists. To bridge this gap, we construct one based on the images the subject is currently viewing or has previously seen. The BIT dataset consists of 980,610 conversations and covers the following four aspects, which are crucial for capturing important semantics in fMRI signals (an example conversation is sketched after the list):

- **Perception & Scene Understanding**. This aspect captures the subject's basic perception and understanding of the current scene.
- **Memory & Knowledge Retrieval**. This aspect captures the subject's memory and knowledge. It also relates to the *signifying chain* in Lacan's theory, which is closely related to human cognition.
- **Language & Symbolic Processing**. This aspect captures the subject's language and symbolic processing, achieved by including tasks related to text recognition and numerical reasoning.
- **Complex Reasoning**. This aspect aims to emulate the reasoning processes that occur in the human brain.
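
To make the conversation format concrete, a single BIT sample might look like the following. This is a hypothetical sketch: the field names, file path, and answer text are invented for illustration and do not reflect the released schema.

```python
# Hypothetical structure of one BIT conversation sample (illustrative only;
# the field names, path, and answer text are not the released schema).
bit_sample = {
    "fmri": "subj01/session12/trial_0393.npy",      # flattened voxel activations for one trial
    "aspect": "Perception & Scene Understanding",   # one of the four aspects above
    "conversations": [
        {"role": "user",
         "content": "<fmri> Describe the scene the subject is currently viewing."},
        {"role": "assistant",
         "content": "A group of surfers waiting for a wave on an overcast beach."},
    ],
}
```

During Brain Instruction Tuning, the `<fmri>` placeholder would be replaced by the encoder's fMRI tokens, and the model would be trained to produce the assistant turn.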
![Overview of Brain Instruction Tuning (BIT) dataset.|scale=0.8](./assets/bit.png)
## Experiments
### Brain Captioning
MindLLM outperforms the state-of-the-art model by an average of 12.0%.
![Results on brain captioning benchmark.|scale=0.7](./assets/brain-cap.png)
### Versatile Decoding
MindLLM outperforms the state-of-the-art model by an average of 28.0%, demonstrating its ability to handle a wide range of tasks.
![Results on versatile decoding benchmark.|scale=0.7](./assets/versatile.png)
### Unseen Subject Generalization
MindLLM outperforms the state-of-the-art model by an average of 24.5%, demonstrating its out-of-the-box generalization ability.
![Model generalization when training on subjects 1-7 and evaluating on subject 8.|scale=0.4](./assets/ood.png)
### Adapting to New Tasks
MindLLM outperforms the state-of-the-art model by an average of 25.0%, suggesting that it can readily be repurposed for new tasks of interest without extensive retraining.
![Model adaptation to new tasks.|scale=0.5](./assets/adapt.png)
### Visualizations and Interpretations
By visualizing the attention maps, we can observe the spatial focus that each query places on the brain: some queries concentrate on specific brain regions, while others bridge regions associated with different functions.
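
As a rough sketch of how such maps can be produced, the snippet below assumes access to the encoder's attention matrix (one weight per query-voxel pair) and each voxel's spatial coordinates; both the variable names and the simple scatter-plot projections are assumptions for illustration, not an interface exposed by the repository.

```python
# Sketch: visualize which voxels a given query token attends to.
# Assumes `attn` has shape (num_queries, num_voxels) and `coords` has shape
# (num_voxels, 3) with each voxel's (x, y, z) position -- both are assumed
# inputs, not a documented interface of the MindLLM code base.
import matplotlib.pyplot as plt
import numpy as np


def plot_query_attention(attn: np.ndarray, coords: np.ndarray, query_idx: int):
    w = attn[query_idx]
    w = (w - w.min()) / (w.max() - w.min() + 1e-8)   # normalize weights to [0, 1]
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    views = [(0, 1, "axial"), (0, 2, "coronal"), (1, 2, "sagittal")]
    for ax, (i, j, name) in zip(axes, views):
        ax.scatter(coords[:, i], coords[:, j], c=w, s=2, cmap="viridis")
        ax.set_title(f"query {query_idx}: {name} view")
        ax.set_aspect("equal")
    fig.tight_layout()
    return fig
```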
![Each subfigure corresponds to a specific query of subject 1. We visualize the attention map between that query token and all voxels. Here we randomly select 6 query tokens, each of which exhibits a distinct spatial focus.|scale=0.5](./assets/vis.png)

config/publications.ts

Lines changed: 11 additions & 0 deletions
@@ -19,6 +19,17 @@ export interface Publication {
}

export const publications: Publication[] = [
  {
    title: "MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding",
    authors: "Weikang Qiu, Zheng Huang, Haoyu Hu, Aosong Feng, Yujun Yan, Rex Ying",
    venue: "ICML 2025",
    page: "mindllm",
    code: "https://github.com/Graph-and-Geometric-Learning/MindLLM",
    paper: "https://arxiv.org/abs/2502.15786",
    abstract: "We introduce MindLLM, a subject-agnostic and versatile model for fMRI-to-text decoding. MindLLM is equipped with a novel encoder that employs neuroscience-informed attention, and is trained on a large-scale Brain Instruction Tuning (BIT) dataset, enabling it to decode fMRI signals into natural language descriptions across various tasks in a subject-agnostic manner.",
    impact: "MindLLM achieves state-of-the-art performance on a wide range of fMRI-to-text decoding tasks, and demonstrates strong generalization ability to unseen subjects and tasks. This work paves the way for future research on high-quality fMRI-to-text decoding.",
    tags: [Tag.MultiModalFoundationModel],
  },
  {
    title: "MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering",
    authors: "Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying",
