WC-SBERT: Zero-Shot Topic Classification Using SBERT and Light Self-Training on Wikipedia Categories

Published: 26 October 2024

Abstract

In natural language processing (NLP), zero-shot topic classification requires machines to understand the contextual meaning of texts in a downstream task without using the corresponding labeled texts for training, which is highly desirable for many applications. In this article, we propose a novel approach for constructing a zero-shot task-specific model, called WC-SBERT, that achieves satisfactory performance. The approach is highly efficient because it uses light self-training that requires only the target labels (the class names of the downstream task), in contrast to other work that trains on both the target labels and the unlabeled target texts. In the pre-training stage, WC-SBERT uses contrastive learning with a multiple negatives ranking loss to build the pre-trained model from the similarity between Wiki categories. In the self-training stage, an online contrastive loss is used to reduce the distance between a target label and the Wiki categories of Wiki pages similar to that label. Experimental results indicate that, compared with existing self-training models, WC-SBERT achieves rapid inference over approximately 6.45 million Wiki text entries by using pre-stored Wikipedia text embeddings, reducing the inference time per sample by a factor of 2,746 to 16,746. During the fine-tuning step, the time required per sample is reduced by a factor of 23–67. Overall, the total training time is reduced by up to 27.5 times across the different datasets. Most importantly, our model achieves state-of-the-art (SOTA) accuracy on two of the three datasets commonly used to evaluate zero-shot classification, namely AG News (0.84) and Yahoo! Answers (0.64). The code for WC-SBERT is publicly available on GitHub, and the dataset can also be accessed on Hugging Face.
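
To make the two training stages described above concrete, the following is a minimal sketch using the sentence-transformers (SBERT) library. The base checkpoint, the Wiki category pairs, the label-to-category pairs, and all hyperparameters below are illustrative placeholders rather than the authors' actual configuration; the released GitHub code should be consulted for the real pipeline.

```python
# Minimal sketch of the two-stage training and the zero-shot prediction step,
# using the sentence-transformers library. All data, the checkpoint name, and
# the hyperparameters are placeholders, not the paper's actual configuration.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

model = SentenceTransformer("all-mpnet-base-v2")  # assumed SBERT base model

# Stage 1 -- pre-training on Wiki categories: pairs of related categories
# (e.g., categories that co-occur on the same Wiki page) are trained with
# Multiple Negatives Ranking Loss, which treats other in-batch pairs as negatives.
category_pairs = [
    ("Machine learning", "Artificial intelligence"),  # placeholder pairs
    ("Baseball", "Team sports"),
]
pretrain_examples = [InputExample(texts=[a, b]) for a, b in category_pairs]
pretrain_loader = DataLoader(pretrain_examples, shuffle=True, batch_size=64)
model.fit(
    train_objectives=[(pretrain_loader, losses.MultipleNegativesRankingLoss(model))],
    epochs=1,
)

# Stage 2 -- light self-training: each target label of the downstream task is
# pulled toward the categories of Wiki pages retrieved as similar to that label,
# using Online Contrastive Loss over (label, category, similar/dissimilar) pairs.
target_labels = ["World", "Sports", "Business", "Sci/Tech"]  # e.g., AG News
selftrain_examples = [
    InputExample(texts=["Sports", "Ball games"], label=1),         # placeholder positive
    InputExample(texts=["Sports", "Corporate finance"], label=0),  # placeholder negative
]
selftrain_loader = DataLoader(selftrain_examples, shuffle=True, batch_size=64)
model.fit(
    train_objectives=[(selftrain_loader, losses.OnlineContrastiveLoss(model))],
    epochs=1,
)

# Zero-shot prediction: embed the test text and pick the most similar label.
# Label (and Wiki) embeddings can be pre-computed and stored, so only the test
# text needs to be encoded at inference time.
label_embeddings = model.encode(target_labels, convert_to_tensor=True)
text_embedding = model.encode(
    "The home team clinched the pennant with a walk-off homer.",
    convert_to_tensor=True,
)
predicted = target_labels[util.cos_sim(text_embedding, label_embeddings).argmax().item()]
print(predicted)  # expected: "Sports"
```

Because the Wikipedia-side embeddings do not depend on the downstream task, they can be computed once and stored, which is consistent with the abstract's point that pre-stored Wikipedia text embeddings are what make inference fast.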



          Published In

          ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 5
          October 2024
          719 pages
          EISSN: 2157-6912
          DOI: 10.1145/3613688
          Editor: Huan Liu

          Publisher

          Association for Computing Machinery, New York, NY, United States

          Publication History

          Published: 26 October 2024
          Online AM: 18 July 2024
          Accepted: 06 July 2024
          Revised: 14 June 2024
          Received: 04 January 2024
          Published in TIST Volume 15, Issue 5


          Author Tags

          1. Zero-shot topic classification
          2. SBERT
          3. Wikipedia
          4. Self-training
          5. Contrastive learning
          6. Knowledge graph
          7. LLM

          Qualifiers

          • Research-article

          Funding Sources

          • Higher Education Sprout Project by the Ministry of Education
