# FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training

Anjia Cao, Xing Wei, Zhiheng Ma
## News

- [2025/03/09] Released training code and scripts.
- [2025/03/08] Released recaptioned CC3M and recaptioned YFCC15M on Hugging Face.
- [2025/02/27] Accepted to CVPR 2025.
- [2024/11/28] Released model on Hugging Face.
- [2024/11/28] Released evaluation code and scripts.
- [2024/11/18] Paper released on arXiv.
## Highlights

- 🔥 Leveraging frozen LLMs to naturally process long text inputs (a minimal sketch follows this list).
- 🔥 Generalizing from monolingual training to multilingual evaluation.
- 🔥 Strong improvements on long/short-context image-text retrieval, image classification, and multilingual scenarios.
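To make the first highlight concrete, below is a minimal sketch of embedding text with a frozen decoder-only LLM via PromptEOL-style last-token pooling. The model id, prompt template, and pooling choice are illustrative assumptions rather than FLAME's exact configuration; see the paper and Training.md for the actual recipe.

```python
# Minimal sketch: sentence embedding from a frozen LLM (PromptEOL-style).
# The model id and prompt template are assumptions for illustration only.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "mistralai/Mistral-Nemo-Base-2407"  # assumed; any decoder-only LLM works similarly
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16).eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    # The LLM stays frozen; the hidden state of the final prompt token
    # serves as a sentence-level embedding. Long captions need no truncation
    # beyond the LLM's (large) context window.
    prompt = f'This sentence: "{text}" means in one word: "'
    inputs = tokenizer(prompt, return_tensors="pt")
    hidden = llm(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
    return hidden[0, -1]  # last-token embedding

print(embed("A dog chasing a frisbee on a sunny beach.").shape)
```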
## TODO

- [x] Release training code and data.
- [x] Release evaluation code.
- [x] Release pre-trained checkpoints.
## Installation

```bash
git clone https://github.com/MIV-XJTU/FLAME.git
cd FLAME
conda create -n flame python=3.10 -y
conda activate flame
make install
make install-training
make install-test
```
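As a quick sanity check of the environment (assuming the codebase keeps open_clip's importable package layout, which it inherits from open_clip):

```python
# Verify the core dependencies are importable; `open_clip` availability is an
# assumption based on the project being built on open_clip.
import torch
import open_clip

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("open_clip:", open_clip.__version__)
```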
## Training

See Training.md.
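For orientation before diving into Training.md: language-image pre-training optimizes a CLIP-style contrastive objective. The sketch below is the generic symmetric InfoNCE loss, not FLAME's exact loss or hyper-parameters.

```python
# Generic CLIP-style symmetric contrastive loss (illustrative, not FLAME-specific).
import torch
import torch.nn.functional as F

def clip_loss(image_feats: torch.Tensor, text_feats: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    # Normalize both modalities, then contrast matched pairs against the batch.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0))               # matched pairs sit on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Dummy batch of 8 paired features with dimension 512.
print(clip_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```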
## Evaluation

See Evaluation.md.
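As a rough illustration of the zero-shot protocol, here is classification with a stock open_clip model. The model tag and pretrained weights are placeholders, and whether FLAME checkpoints load through this exact call is an assumption; follow Evaluation.md and CLIP_benchmark for the real setup.

```python
# Illustrative zero-shot classification with a stock open_clip model.
import torch
import torch.nn.functional as F
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="laion2b_s34b_b88k")  # placeholder weights
tokenizer = open_clip.get_tokenizer("ViT-B-16")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # any local image
text = tokenizer(["a photo of a dog", "a photo of a cat"])

with torch.no_grad():
    img = F.normalize(model.encode_image(image), dim=-1)
    txt = F.normalize(model.encode_text(text), dim=-1)
    probs = (100.0 * img @ txt.t()).softmax(dim=-1)
print(probs)  # per-class probabilities for the image
```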
## Datasets

| Dataset | Link |
|---|---|
| CC3M-ReCap | Hugging Face |
| YFCC15M-ReCap | Hugging Face |
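The recaptioned datasets should be loadable with the datasets library once you have the repo id from the links above; the id below is a placeholder, not the real one.

```python
# Hypothetical dataset loading; replace the placeholder repo id with the one
# behind the Hugging Face link in the table above.
from datasets import load_dataset

ds = load_dataset("MIV-XJTU/CC3M-ReCap", split="train")  # placeholder repo id
print(ds[0])  # one image-caption record
```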
## Pre-trained Models

| Dataset | Model | Link |
|---|---|---|
| CC3M | Mistral-Nemo-ViT-B/16 | Hugging Face |
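Checkpoints can likewise be fetched with huggingface_hub; the repo id and filename below are placeholders, so follow the link in the table for the actual artifact.

```python
# Hypothetical checkpoint download; repo id and filename are placeholders.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="MIV-XJTU/FLAME",      # placeholder
    filename="flame_vit_b_16.pt",  # placeholder
)
print("checkpoint saved to", ckpt_path)
```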
## License

The project is released under a Creative Commons CC-BY-4.0 license.
## Citation

If you find our work helpful for your research, please consider giving us a star and citing it.

```bibtex
@inproceedings{cao2025flame,
  title={FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training},
  author={Cao, Anjia and Wei, Xing and Ma, Zhiheng},
  booktitle={CVPR},
  year={2025}
}
```
## Acknowledgements

This project is based on open_clip; thanks for the nice work! We also thank CLIP_benchmark, DreamLIP, Long-CLIP, PromptEOL, and MiniCPM-V for their code.