Robust Structure-Aware Graph-based Semi-Supervised Learning: Batch and Recursive Processing

Published: 18 June 2024

Abstract

Graph-based semi-supervised learning plays an important role in large-scale image classification tasks. However, the problem becomes very challenging in the presence of noisy labels and outliers. Moreover, traditional robust semi-supervised learning solutions suffer from prohibitive computational burdens and therefore cannot be computed for streaming data. Motivated by this, we present a novel unified framework for robust structure-aware semi-supervised learning, called Unified RSSL (URSSL), which supports both batch and recursive processing and is robust to both outliers and noisy labels. In particular, URSSL iteratively applies joint semi-supervised dimensionality reduction with robust estimators and network sparse regularization on the graph Laplacian matrix, preserving the intrinsic graph structure and ensuring robustness to the compound noise. First, to relieve the influence of outliers, a novel semi-supervised robust dimensionality reduction is applied, relying on robust estimators to suppress outliers. Meanwhile, to tackle noisy labels, denoised graph similarity information is encoded into the network regularization. Moreover, by identifying the strong relevance between dimensionality reduction and network regularization in the context of robust semi-supervised learning (RSSL), a two-step alternating optimization is derived to compute optimal solutions with guaranteed convergence. We further adapt our framework to large-scale semi-supervised learning, making it particularly suitable for large-scale image classification, and demonstrate the model's robustness under different adversarial attacks. For recursive processing, we rely on reparameterization to transform the formulation and thereby unlock the challenging problem of robust streaming-based semi-supervised learning. Last but not least, we extend our solution to distributed settings to resolve the challenging issue of distributed robust semi-supervised learning when images are captured by multiple cameras at different locations. Extensive experimental results demonstrate the promising performance of this framework on multiple benchmark datasets with respect to state-of-the-art approaches for important applications in image classification and spam data analysis.
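To make the batch setting concrete, the core idea described above — propagating labels over a graph Laplacian while a robust estimator downweights unreliable edges — can be sketched as follows. This is a minimal illustration under our own simplifying assumptions (dense matrices, a Huber-type weight, a single regularization parameter `mu`), not the paper's URSSL algorithm; all function and variable names are hypothetical.

```python
import numpy as np

def huber_weight(r, delta=1.0):
    """Huber-type robust weight: 1 for small residuals, delta/|r| for large
    ones, so edges with large label disagreement (likely noise/outliers)
    are downweighted rather than trusted fully."""
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))

def robust_label_propagation(W, Y, labeled, mu=1.0, iters=20, delta=1.0):
    """Iteratively reweighted label propagation on a similarity graph.

    W       : (n, n) symmetric nonnegative similarity matrix
    Y       : (n, c) one-hot labels; rows of unlabeled points are all zero
    labeled : (n,) boolean mask of labeled points
    """
    n = W.shape[0]
    F = Y.astype(float).copy()
    for _ in range(iters):
        # Pairwise disagreement of current label estimates along each edge
        diff = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
        # Robustly reweighted graph and its Laplacian
        Wr = W * huber_weight(diff, delta)
        L = np.diag(Wr.sum(axis=1)) - Wr
        # Closed-form minimizer of tr(F^T L F) + mu * sum_labeled ||F_i - Y_i||^2
        M = np.diag(mu * labeled.astype(float))
        F = np.linalg.solve(L + M + 1e-6 * np.eye(n), M @ Y)
    return F.argmax(axis=1)
```

In the full framework this robust reweighting is further coupled with semi-supervised dimensionality reduction and sparse network regularization in the two-step alternating scheme; the sketch shows only the robust-propagation half.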


Cited By

  • (2025) "Local interpretable spammer detection model with multi-head graph channel attention network." Neural Networks 184, 107069. DOI: 10.1016/j.neunet.2024.107069. Online publication date: Apr 2025.
  • (2024) "CCFD: Efficient Credit Card Fraud Detection Using Meta-Heuristic Techniques and Machine Learning Algorithms." Mathematics 12, 14, 2250. DOI: 10.3390/math12142250. Online publication date: 19 Jul 2024.

    Published In

    ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 4
    August 2024
    563 pages
    EISSN: 2157-6912
    DOI: 10.1145/3613644
    Editor: Huan Liu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 June 2024
    Online AM: 26 March 2024
    Accepted: 12 March 2024
    Revised: 18 February 2024
    Received: 07 August 2023
    Published in TIST Volume 15, Issue 4


    Author Tags

    1. Embedding
    2. uncertainty
    3. robustness
    4. network regularization
    5. streaming processing

    Qualifiers

    • Research-article

