CN-Celeb-AV is a multi-genre audio-visual person recognition dataset covering 11 different genres in the real world, collected from multiple Chinese open media sources.
1,136 Speakers
CN-Celeb-AV contains speech from Chinese celebrities.
419,000+ Utterances
CN-Celeb-AV covers multiple genres of speech, including entertainment, interview, singing, play, movie, vlog, live broadcast, speech, drama, recitation, and advertisement.
660+ Hours
CN-Celeb-AV consists of both full-modality and partial-modality challenges, which match the scenarios of most real applications.
Dev-F: a development set with full-modality information, containing both audio and visual information.
Eval-F: an evaluation set with full-modality information, containing both audio and visual information.
Eval-P: an evaluation set with partial-modality information, containing some segments whose audio or visual information is corrupted or fully lost.
The dataset consists of three subsets, Dev-F, Eval-F and Eval-P. For each subset, we provide video and audio files and speaker meta-data. There is no overlap among the three subsets. Dev-F contains more than 93,000 segments from 689 Chinese celebrities, Eval-F contains more than 17,000 segments from 197 Chinese celebrities, and Eval-P contains more than 307,900 segments from 250 Chinese celebrities.
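As a quick sanity check on the figures above, the following minimal sketch sums the per-subset counts. It assumes that "no overlap" means the subsets share no speakers and that a segment corresponds to one utterance; the dictionary layout and function names are illustrative only and are not part of the dataset release.

```python
# Per-subset figures as stated in the description above.
# Segment counts are lower bounds ("more than ...").
SUBSETS = {
    "Dev-F": {"speakers": 689, "min_segments": 93_000},
    "Eval-F": {"speakers": 197, "min_segments": 17_000},
    "Eval-P": {"speakers": 250, "min_segments": 307_900},
}


def total_speakers(subsets):
    """Sum speaker counts, assuming the subsets are speaker-disjoint."""
    return sum(s["speakers"] for s in subsets.values())


def min_total_segments(subsets):
    """Sum the stated lower bounds on segment counts."""
    return sum(s["min_segments"] for s in subsets.values())


if __name__ == "__main__":
    print(total_speakers(SUBSETS))      # 1136
    print(min_total_segments(SUBSETS))  # 417900
```

Under these assumptions, the speaker total matches the 1,136 speakers reported for the whole dataset, and the lower bound of 417,900 segments is consistent with the reported 419,000+ utterances.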
All the resources contained in the dataset are free for research institutes and individuals. The copyright remains with the original owners of the audio/video.
No commercial usage is permitted.
Please register and log in to the CN-Celeb system, and then submit the data license to request the data.
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 62171250.