About Me

I am Xize Cheng (成曦泽), a fourth-year doctoral candidate at the School of Computer Science and Technology, Zhejiang University, expecting to graduate in June 2026. I am advised by Professor Zhou Zhao. I am actively looking for academic collaborations; feel free to drop me an email.

In 2024, I led or participated in the following research topics:

  • Large Language Models (LLMs): Spoken Dialogue Systems / Audio Large Language Models
  • Audio Understanding: Sound Separation Models / Audio-Visual Speech

🔥 News

  • 2025.01: 🎉🎉 4 papers (2 as first author) are accepted by ICLR 2025!
  • 2024.09: 🎉🎉 3 papers are accepted by NeurIPS 2024!
  • 2024.09: 🎉🎉 1 paper (as co-first author) is accepted by EMNLP 2024!
  • 2024.07: 🎉🎉 2 papers (1 as first author, 1 as corresponding author) are accepted by ACL 2024!
  • 2024.05: 🎉🎉 3 papers (2 as co-first author, 1 as corresponding author, 1 oral presentation) are accepted by ACMMM 2024!
  • 2024.03: 🎉🎉 2 papers are accepted by ICML 2024!
  • 2024.01: I started my internship at Alibaba DAMO Academy, Tongyi Lab.
  • 2023.10: 🎉🎉 I was awarded the National Scholarship (2023, Graduate student), Top 0.1% at Zhejiang University.
  • 2023.09: 🎉🎉 1 paper is accepted by EMNLP 2023!
  • 2023.09: 🎉🎉 1 paper is accepted by NeurIPS 2023!
  • 2023.07: 🎉🎉 1 paper (as co-first author) is accepted by ACMMM 2023!
  • 2023.05: 🎉🎉 3 papers (1 as first author) are accepted by ICCV 2023!
  • 2023.06: AV-TranSpeech is released! Media coverage: PaperWeekly and ByteDance.
  • 2023.05: OpenSR is selected for an oral presentation at ACL 2023!
  • 2023.05: 🎉🎉 7 papers (1 as first author, 2 as co-first author, 2 oral presentations) are accepted by ACL 2023!
  • 2023.03: We created AVMuST-TED, the first Audio-Visual Multilingual Speech Translation dataset!
  • 2022.10: I was named Outstanding Graduate Student and Triple Excellence Graduate Student of Zhejiang University!
  • 2021.03: I started my internship at Taobao as an algorithm intern, conducting multi-modality research.

📝 Publications

  • Under Review
  • ICLR 2025
  • ACL 2023 Oral
  • ICCV 2023
  • ACL 2023

Full Publication List

[*] denotes co-first authors, [#] denotes co-supervised students, and [✉] denotes corresponding author.

Spoken Dialogue System & Audio-Visual Speech Understanding

Multi-Modal Alignment

📖 Education

  • 2021.09 - 2026.06 (expected), Ph.D. student, Zhejiang University, Hangzhou.

  • 2017.09 - 2021.06, Undergraduate, Shandong University, Jinan.

🎖 Honors and Awards

  • National Scholarship (2023, Graduate student), Top 0.1% at Zhejiang University.
  • Excellent Graduate, Shandong Province (2021), Top 1%.
  • Outstanding Student Cadre (2017-2021 at Shandong University and 2021-2023 at Zhejiang University), Top 1%.
  • Academic Scholarship (2017-2021 at Shandong University and 2021-2023 at Zhejiang University), Top 3%.
  • Outstanding Graduate Student & Triple Excellence Graduate Student (2022), Zhejiang University.
  • First Prize (Meritorious Winner) in the Mathematical Contest in Modeling (MCM/ICM, 2019), Top 7% worldwide.
  • First Prize in Shandong Province, National Mathematical Modeling Competition (2018).

💬 Professional Services

  • Conference Reviewer: ARR 2023, ICCV 2023, ACL 2023
  • Assistant Reviewer: KDD 2022, TNNLS 2022, TMM 2022, TMM 2023

💻 Internships & Projects

  • 2024.01 - 2024.09: Research Intern, Alibaba DAMO Academy, Tongyi Lab, Hangzhou, China.
    Research on Audio-Visual Sound Separation and Spoken Dialogue Systems.
  • 2021.02 - 2021.08: Algorithm Engineer Intern, Taobao (China) Software.
    Research on Multi-modality Interaction.