Nearest neighborhood-based deep clustering for source data-absent unsupervised domain adaptation

S Tang, Y Yang, Z Ma, N Hendrich, F Zeng, SS Ge, C Zhang, J Zhang
arXiv preprint arXiv:2107.12585, 2021
In the classic setting of unsupervised domain adaptation (UDA), labeled source data are available during training. In many real-world scenarios, however, the source data are inaccessible for reasons such as privacy protection and information security, and only a model trained on the source domain is available. This paper proposes a novel deep clustering method for this challenging task. To support dynamic clustering at the feature level, we introduce extra constraints hidden in the geometric structure between data. Concretely, we propose a geometry-based constraint, named semantic consistency on the nearest neighborhood (SCNNH), and use it to encourage robust clustering. To this end, we construct the nearest neighborhood of every target sample and take it as the fundamental clustering unit, building our objective on this geometry. We also develop a more SCNNH-compliant structure with an additional semantic credibility constraint, named semantic hyper-nearest neighborhood (SHNNH), and extend our method to this new geometry. Extensive experiments on three challenging UDA datasets indicate that our method achieves state-of-the-art results. The proposed method improves significantly on all datasets (when we adopt SHNNH, the average accuracy increases by over 3.0% on the large-scale dataset). Code is available at https://github.com/tntek/N2DCX.
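As a rough illustration of the idea described in the abstract, the sketch below pairs each target sample with its nearest neighbor in feature space and measures how often the two share a pseudo-label. It is not the authors' implementation: the use of cosine similarity, a single nearest neighbor per sample, the toy features and pseudo-labels, and the function names `nearest_neighborhoods` and `consistency_rate` are all assumptions made for this example. The released code at the URL above is the authoritative reference.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighborhoods(features):
    """For each sample i, return the index of its most similar other sample.

    Hypothetical helper: the pair (i, nn[i]) stands in for the
    "nearest neighborhood" treated as a clustering unit.
    """
    nn = []
    for i, fi in enumerate(features):
        best_j, best_sim = None, -2.0  # cosine similarity is always >= -1
        for j, fj in enumerate(features):
            if j == i:
                continue
            sim = cosine(fi, fj)
            if sim > best_sim:
                best_j, best_sim = j, sim
        nn.append(best_j)
    return nn


def consistency_rate(pseudo_labels, nn):
    """Fraction of samples whose pseudo-label matches their neighbor's."""
    agree = sum(1 for i, j in enumerate(nn) if pseudo_labels[i] == pseudo_labels[j])
    return agree / len(nn)


# Toy target-domain features and pseudo-labels (made up for illustration).
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [0.6, 0.5]]
pseudo_labels = [0, 0, 1, 1, 1]

nn = nearest_neighborhoods(features)
rate = consistency_rate(pseudo_labels, nn)
print(nn, rate)
```

In this toy data the last sample sits between the two groups and disagrees with its nearest neighbor; a constraint in the spirit of SCNNH would encourage the model to resolve such disagreements during adaptation.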