UniFER: Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models

🌟 Official repository for the paper "Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models"

[📖 Paper] [🤗 Dataset] [🤗 Model]

👀 About UniFER

Multimodal Large Language Models (MLLMs) have revolutionized numerous research fields, including computer vision and affective computing. As a pivotal challenge in this interdisciplinary domain, facial expression recognition (FER) has evolved from separate, domain-specific models to more unified approaches. One promising avenue to unify FER tasks is converting conventional FER datasets into visual question-answering (VQA) formats, enabling the direct application of powerful generalist MLLMs for inference. However, despite the success of cutting-edge MLLMs in various tasks, their performance on FER tasks remains largely unexplored. To address this gap, we introduce FERBench, a systematic benchmark that evaluates 20 state-of-the-art MLLMs on four widely used FER datasets. Our results reveal that, while MLLMs exhibit good classification performance, they still face significant limitations in reasoning and interpretability.

To this end, we introduce post-training strategies aimed at enhancing the facial expression reasoning capabilities of MLLMs. Specifically, we curate two high-quality, large-scale datasets: UniFER-CoT-230K for cold-start initialization and UniFER-RLVR-360K for reinforcement learning with verifiable rewards (RLVR). Building on them, we develop a unified and interpretable FER foundation model, UniFER-7B, which outperforms many open-source and closed-source generalist MLLMs (e.g., Gemini-2.5-Pro and Qwen2.5-VL-72B).

🔥 Datasets

Our curated datasets are built from four widely used FER datasets: RAF-DB, FERPlus, AffectNet, and SFEW 2.0. Please download the corresponding images from their official websites before use.

Installation

Clone the repository:

git clone https://github.com/zfkarl/UniFER.git
cd UniFER

Create a conda environment:

conda create -n r1-v python=3.11
conda activate r1-v

Please follow the official instructions here to install both PyTorch and additional dependencies.

FERBench

The four FERBench subsets are stored in the following JSON files:

eval_rafdb/data/rafdb_qa.json
eval_ferplus/data/ferplus_qa.json
eval_affectnet/data/affectnet_qa.json
eval_sfew_2.0/data/sfew_2.0_qa.json
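FERBench casts each FER sample as a VQA item. As a rough illustration of that conversion, the sketch below wraps one labeled image as a question-answer record. The field names (`image`, `question`, `answer`), the prompt wording, the example path, and the seven-class label list are all assumptions made for illustration; they are not taken from the FERBench JSON files, whose actual schema may differ.

```python
# Hypothetical label set; the real class list varies by dataset.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]


def to_vqa_record(image_path: str, label: str, emotions=EMOTIONS) -> dict:
    """Wrap one (image, label) pair as a multiple-choice VQA item."""
    if label not in emotions:
        raise ValueError(f"unknown label: {label}")
    options = ", ".join(emotions)
    question = (
        "What is the facial expression of the person in the image? "
        f"Choose one of: {options}."
    )
    # Assumed record layout: image path, question text, ground-truth answer.
    return {"image": image_path, "question": question, "answer": label}


# Hypothetical image path, used only to show the call.
record = to_vqa_record("images/example_0001.jpg", "happiness")
```

Listing the candidate labels in the question keeps the answer space closed, which makes predictions straightforward to score against the ground truth.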

UniFER-CoT-230K

Download the dataset and place the JSON file UniFER_CoT_230K.json at:

data/UniFER_CoT_230K.json

UniFER-RLVR-360K

Download the dataset and place the JSON file UniFER_RLVR_360K.json at:

data/UniFER_RLVR_360K.json
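Before training or evaluation, it can help to confirm that the JSON files are where the scripts expect them. The helper below is a small sketch that reports which of the files named in this section are missing. The function name is hypothetical, and the sketch assumes every path is relative to the repository root, which this README does not state explicitly for `data/`.

```python
from pathlib import Path

# Paths as listed in this README; assumed to be relative to the repository root.
EXPECTED_FILES = [
    "eval_rafdb/data/rafdb_qa.json",
    "eval_ferplus/data/ferplus_qa.json",
    "eval_affectnet/data/affectnet_qa.json",
    "eval_sfew_2.0/data/sfew_2.0_qa.json",
    "data/UniFER_CoT_230K.json",
    "data/UniFER_RLVR_360K.json",
]


def find_missing_files(repo_root: str = ".", expected=EXPECTED_FILES) -> list:
    """Return the expected paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `find_missing_files()` from the repository root returns an empty list once every file is in place.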

🚀 Training

Stage 1: Cold Start SFT

cd train_unifer/src/scripts
bash run_sft_fer.sh

Stage 2: RLVR GRPO Training

cd train_unifer/src/scripts
bash run_grpo_vllm.sh
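RLVR relies on rewards that can be checked automatically against a ground-truth label. To make that idea concrete, here is a minimal sketch of a rule-based accuracy reward: it extracts the predicted label from the model output and compares it with the ground truth. The `<answer>...</answer>` tag convention and the function name are assumptions borrowed from common R1-style setups; this is not a description of the reward implemented in `run_grpo_vllm.sh`.

```python
import re

# Assumed output convention: the final label is wrapped in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the tagged answer matches the label, else 0.0."""
    match = ANSWER_PATTERN.search(completion)
    if match is None:
        # No parsable answer: the completion earns no reward.
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == ground_truth.strip().lower() else 0.0
```

Because the check is an exact string comparison after normalization, the reward needs no learned reward model, which is what makes it verifiable.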

💫 Evaluation

After the two-stage post-training, the resulting model, UniFER-7B, can be used for inference and evaluation. You may rename the output directory Qwen2.5-VL-7B-FER-GRPO-VLLM-8GPU to UniFER-7B for inference. Alternatively, you can download our released checkpoints directly.

Inference and Evaluation

On RAF-DB:

cd eval_rafdb/code
python infer_unifer.py 
python eval_unifer.py

On FERPlus:

cd eval_ferplus/code
python infer_unifer.py 
python eval_unifer.py

On AffectNet:

cd eval_affectnet/code
python infer_unifer.py 
python eval_unifer.py

On SFEW 2.0:

cd eval_sfew_2.0/code
python infer_unifer.py 
python eval_unifer.py

Overall Performance:

cd eval_total/code
python eval_unifer.py
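The per-dataset commands above can also be chained from a single driver. The following is a sketch only: the directory and script names come from this README, while `build_steps` and `run_all` are hypothetical helpers, and the code assumes each script is meant to be run from inside its own `code` directory, as the `cd` commands suggest.

```python
import subprocess
import sys
from pathlib import Path

DATASET_DIRS = ["eval_rafdb", "eval_ferplus", "eval_affectnet", "eval_sfew_2.0"]


def build_steps(repo_root: str = ".") -> list:
    """List (working directory, script) pairs in the order given in this README."""
    root = Path(repo_root)
    steps = []
    for name in DATASET_DIRS:
        code_dir = root / name / "code"
        steps.append((code_dir, "infer_unifer.py"))
        steps.append((code_dir, "eval_unifer.py"))
    # The overall evaluation runs last, after all per-dataset results exist.
    steps.append((root / "eval_total" / "code", "eval_unifer.py"))
    return steps


def run_all(repo_root: str = ".") -> None:
    """Run every step in order; check=True stops at the first failing script."""
    for workdir, script in build_steps(repo_root):
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
```

Calling `run_all()` from the repository root executes inference and evaluation for each dataset, then the overall evaluation.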

🥳 Acknowledgements

We thank R1-V and Video-R1, which served as the foundation for our repository.

✅ Citation

If you find UniFER useful for your research or applications, please cite it using this BibTeX entry:

@misc{zhang2025rethinkingfacialexpressionrecognition,
      title={Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond}, 
      author={Fan Zhang and Haoxuan Li and Shengju Qian and Xin Wang and Zheng Lian and Hao Wu and Zhihong Zhu and Yuan Gao and Qiankun Li and Yefeng Zheng and Zhouchen Lin and Pheng-Ann Heng},
      year={2025},
      eprint={2511.00389},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.00389}, 
}

🔥 Please contact fzhang@link.cuhk.edu.hk if you would like to contribute to the leaderboard or have any questions.
