Algorithm Engineer @ dots, Xiaohongshu (RED) · Ph.D. from MAC Lab, Xiamen University
Multimodal Large Language Models 🤖 · Text-to-Image Pretraining 🎨
- 🔬 I'm an Algorithm Engineer on the dots team at Xiaohongshu (RED), working on Multimodal Large Language Models and Text-to-Image Pretraining.
- 🎓 I received my Ph.D. from the Department of Artificial Intelligence, Xiamen University (MAC Lab), advised by Prof. Rongrong Ji and Prof. Xiaoshuai Sun.
- 📚 27 papers in CCF-A/B venues (17 as first/co-first author, 3 Orals), with 1500+ Google Scholar citations.
- ⭐ Core developer of External-Attention-pytorch (12k+ stars).
- 📫 Reach me at mayiwei1998@163.com — feel free to chat!
- 2026 — Two papers accepted by IJCV; one by ACL 2026 (Findings); one by Pattern Recognition.
- 2025 — One paper accepted by IEEE TPAMI; one by ACM MM 2025.
- 🥇 2026 Top-Talent Program Offers (9): Xiaohongshu Red Star · Tencent Qingyun · Tongyi Alibaba Star · ByteDance Jindouyun · Ant Star · Huawei Genius Youth · Meituan Beidou · Xiaomi Top Talent · JD TGT
- 🧪 NSFC Youth Student Basic Research Project — Principal Investigator (国自然青基), 2024
- 🚀 CAST Young Talent Support Project for Ph.D. Students (青托), 2025
- 🎖️ Baidu Scholarship — Global Top 40, 2024
- 🏅 National Scholarship ×3 (2019 · 2022 · 2024)
Full list on my homepage →
- An Extensive Benchmark for Single-Round and Multi-Round Instruction-Based Image Editing — IJCV 2026 [Code]
- CoP: Chain of Perception for Referring 3D Instance Segmentation — IJCV 2026 [Code]
- Boosting Multi-Modal Large Language Model with Enhanced Visual Features — TPAMI 2025 [Code]
- I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing — NeurIPS 2024 [Code]
- X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation — ICML 2024 [Project]
- X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance — ICCV 2023 [Project]
- Towards Local Visual Modeling for Image Captioning — Pattern Recognition 2023 🏆 ESI Highly Cited [Code]
- X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval — ACM MM 2022 🔥 500+ citations [Code]
- 🤖 dots.vlm1.inst — Instruction-tuned multimodal LLM from the dots series (Xiaohongshu · dots)
- 📄 dots.mocr — Multilingual document layout parsing & OCR model (Xiaohongshu · dots)
- ⭐ External-Attention-pytorch — PyTorch implementations of Attention / MLP / Re-param / Conv modules (12k+ stars)
I share paper-reading notes and tutorials on 知乎 (Zhihu) and my WeChat official account, FightingCV.