Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires (ICLR 2025)
This repository contains the PyTorch code to replicate the experiments in our paper, Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires, accepted at the International Conference on Learning Representations (ICLR 2025):
@inproceedings{
chapfuwa2025scalable,
title={Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires},
author={Paidamoyo Chapfuwa and Ilker Demirel and Lorenzo Pisani and Javier Zazo and Elon Portugaly and H. Jabran Zahid and Julia Greissl},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=wyF5vNIsO7}
}
- Model type: Unsupervised representation learning
- License: MIT
Model
Prerequisites
The code is implemented with the following dependencies:
- Python 3.10.16
- Additional Python packages can be installed by running:
Data
We consider the following public datasets:
- Synthetic: for validating the proposed JL-GloVe algorithm
- ImmuneCODE: for training the publicly available JL-GloVe TCR embeddings
- Emerson: for evaluating the trained public TCR embeddings
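As a rough illustration of the kind of statistics JL-GloVe learns from, the sketch below counts how many repertoires each pair of TCRs co-occurs in, turns those counts into one sparse row per TCR, and applies a Gaussian random projection to such a row. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes, hypothetically, that two TCRs co-occur when they appear in the same repertoire, and that "JL" refers to the Johnson-Lindenstrauss lemma, as the name suggests. The function names (`cooccurrence_counts`, `cooccurrence_rows`, `jl_project`) and the toy TCR identifiers are illustrative only.

```python
import math
import random
from collections import Counter
from itertools import combinations


def cooccurrence_counts(repertoires):
    """Count how many repertoires contain each unordered pair of distinct TCRs.

    Assumption (hypothetical): co-occurrence means presence in the same
    repertoire; duplicates within one repertoire are counted once.
    """
    counts = Counter()
    for repertoire in repertoires:
        unique = sorted(set(repertoire))
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


def cooccurrence_rows(counts, vocab):
    """Turn pair counts into one sparse row {column_index: count} per TCR."""
    index = {tcr: i for i, tcr in enumerate(vocab)}
    rows = {tcr: {} for tcr in vocab}
    for (a, b), c in counts.items():
        rows[a][index[b]] = c
        rows[b][index[a]] = c
    return rows


def jl_project(sparse_row, k, seed=0):
    """Gaussian random projection of a sparse vector to k dimensions.

    Column j of the implicit k x N Gaussian matrix is regenerated from a seed
    derived from (seed, j), so every row is projected with the same matrix
    while the dense N-column matrix is never stored. Scaling by 1/sqrt(k)
    preserves squared norms in expectation.
    """
    out = [0.0] * k
    for j, value in sparse_row.items():
        rng = random.Random(seed * 1_000_003 + j)
        for i in range(k):
            out[i] += rng.gauss(0.0, 1.0) * value
    scale = 1.0 / math.sqrt(k)
    return [x * scale for x in out]


if __name__ == "__main__":
    # Toy repertoires of made-up TCR identifiers.
    repertoires = [["tcrA", "tcrB", "tcrC"], ["tcrA", "tcrB"], ["tcrC", "tcrD"]]
    vocab = ["tcrA", "tcrB", "tcrC", "tcrD"]
    counts = cooccurrence_counts(repertoires)
    print(counts[("tcrA", "tcrB")])  # 2
    rows = cooccurrence_rows(counts, vocab)
    print(len(jl_project(rows["tcrA"], k=8, seed=0)))  # 8
```

The projection regenerates columns on the fly to avoid materializing an N-column matrix when N is large; a real implementation over hundreds of thousands of TCRs would more likely use sparse matrix libraries.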
Model Training
- To train JL-GloVe embeddings using synthetic data, run:
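Independently of the repository's training script, the sketch below shows the standard GloVe weighted least-squares objective, J = sum over nonzero X_ij of f(X_ij) (w_i . w~_j + b_i + b~_j - log X_ij)^2, with f(x) = (x / x_max)^alpha for x < x_max and 1 otherwise, together with one pass of plain SGD over the nonzero counts. The defaults x_max = 100 and alpha = 0.75 are those of the original GloVe paper, which trained with AdaGrad rather than plain SGD. Whether this repository uses the same weighting, optimizer, or hyperparameters is an assumption, and the helper names (`glove_weight`, `glove_loss`, `sgd_epoch`) are hypothetical.

```python
import math


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): (x / x_max) ** alpha below x_max, else 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(counts, w, w_ctx, b, b_ctx, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe objective over nonzero co-occurrences.

    counts: dict mapping (i, j) index pairs to co-occurrence counts X_ij > 0.
    w, w_ctx: lists of embedding vectors; b, b_ctx: lists of scalar biases.
    """
    loss = 0.0
    for (i, j), x in counts.items():
        dot = sum(p * q for p, q in zip(w[i], w_ctx[j]))
        residual = dot + b[i] + b_ctx[j] - math.log(x)
        loss += glove_weight(x, x_max, alpha) * residual ** 2
    return loss


def sgd_epoch(counts, w, w_ctx, b, b_ctx, lr=0.05, x_max=100.0, alpha=0.75):
    """One pass of plain SGD over the nonzero entries (updates in place)."""
    for (i, j), x in counts.items():
        dot = sum(p * q for p, q in zip(w[i], w_ctx[j]))
        residual = dot + b[i] + b_ctx[j] - math.log(x)
        g = 2.0 * glove_weight(x, x_max, alpha) * residual
        # Compute both gradients from the old values before updating.
        grad_wi = [g * q for q in w_ctx[j]]
        grad_wj = [g * p for p in w[i]]
        w[i] = [p - lr * d for p, d in zip(w[i], grad_wi)]
        w_ctx[j] = [q - lr * d for q, d in zip(w_ctx[j], grad_wj)]
        b[i] -= lr * g
        b_ctx[j] -= lr * g


if __name__ == "__main__":
    # Toy symmetric co-occurrence counts over 3 items (made-up numbers).
    counts = {(0, 1): 4.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 1.0}
    w = [[0.1, 0.2, -0.1], [0.05, -0.1, 0.2], [-0.2, 0.1, 0.05]]
    w_ctx = [[0.0, 0.1, 0.1], [0.2, -0.05, 0.0], [0.1, 0.1, -0.1]]
    b, b_ctx = [0.0] * 3, [0.0] * 3
    before = glove_loss(counts, w, w_ctx, b, b_ctx)
    for _ in range(200):
        sgd_epoch(counts, w, w_ctx, b, b_ctx, lr=0.05)
    print(before, glove_loss(counts, w, w_ctx, b, b_ctx))
```

The weighting f down-weights rare pairs and caps the influence of very frequent ones, which is the main design choice that distinguishes GloVe from an unweighted log-count factorization.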
Metrics and Visualizations
- We provide the 535,186 JL-GloVe TCR embeddings derived from the 3,991 ImmuneCODE repertoires here:
- 100 dimensions
- 300 dimensions
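The layout of the downloadable 100- and 300-dimensional embedding files is not described here. Assuming, hypothetically, a plain-text GloVe-style layout (one TCR identifier followed by its whitespace-separated vector per line), the sketch below parses such lines, finds the nearest neighbours of a TCR by cosine similarity, and mean-pools member TCR embeddings into a single repertoire vector. The function names are illustrative, and mean pooling is just one simple downstream choice, not necessarily how embeddings are aggregated in the paper's Emerson evaluation.

```python
import heapq
import math


def load_embeddings(lines):
    """Parse GloVe-style text lines: '<tcr_id> <v1> <v2> ... <vd>'.

    Assumption (hypothetical): the released files use this layout.
    """
    table = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        table[parts[0]] = [float(v) for v in parts[1:]]
    return table


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest(table, query_id, k=5):
    """Return the k TCR ids most cosine-similar to query_id (excluding itself)."""
    q = table[query_id]
    scored = ((cosine(q, vec), tcr) for tcr, vec in table.items() if tcr != query_id)
    return [tcr for _, tcr in heapq.nlargest(k, scored)]


def repertoire_embedding(table, tcr_ids):
    """Mean of the embeddings of a repertoire's TCRs that are in the table.

    Hypothetical aggregation: TCRs without an embedding are skipped;
    returns None if none of the repertoire's TCRs are covered.
    """
    vecs = [table[t] for t in tcr_ids if t in table]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


if __name__ == "__main__":
    emb = load_embeddings(["tcrA 1.0 0.0", "tcrB 0.9 0.1", "tcrC 0.0 1.0"])
    print(nearest(emb, "tcrA", k=1))  # ['tcrB']
    print(repertoire_embedding(emb, ["tcrA", "tcrC"]))  # [0.5, 0.5]
```

For all 535,186 embeddings, a vectorized NumPy or PyTorch matrix product would be far faster than these Python loops; the logic stays the same.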
Direct intended uses
JL-GloVe is shared for research purposes only, namely benchmarking and inference on downstream tasks. It is not meant to be used in clinical practice. JL-GloVe has not been extensively tested for its capabilities and properties, including its accuracy and reliability in application settings, fairness across different demographics and uses, and security and privacy.
Out-of-scope uses
This is a research model which should not be used in any real clinical or production scenario.
Risks and limitations
JL-GloVe TCR embeddings reflect the co-occurrence statistics of the data used for training.
License and Usage Notices
The data, code, and model checkpoints described in this repository are provided for research use only. The data, code, and model checkpoints are not intended for use in clinical decision-making or for any other clinical use, and the performance of the model for clinical use has not been established. You bear sole responsibility for any use of these data, code, and model checkpoints, including incorporation into any product intended for clinical use.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.