Enabling Graph Neural Networks for Semi-Supervised Risk Prediction in Online Credit Loan Services

Published: 16 January 2024

Abstract

Graph neural networks (GNNs) are well suited to application scenarios in which the informative features are hidden in the associations among pieces of information. Fraud prediction in online credit loan services (OCLSs) is a typical such scenario, but it poses a further critical challenge: labeled data are scarce. GNNs can partly cope with this problem because they support semi-supervised learning by mining the structure and feature information within graphs. Nevertheless, the gain from this internal information is often too limited for GNNs to overcome an extreme shortage of labels and reach the performance that fraud prediction in OCLSs requires. Adding labels from experts, for example by labeling data manually through rules, has therefore become a logical practice. However, the existing rule engines for OCLSs suffer from conflicts among continuously accumulated rules. To address this issue, we propose a Snorkel-based Semi-Supervised GNN (S3GNN). Within S3GNN, we design an upgraded rule engine called Graph-Oriented Snorkel (GOS), a graph-specific extension of Snorkel, a widely used weakly supervised learning framework, with which subject matter experts (SMEs) design rules and conflicts among those rules are resolved. Because each node pair in the graph of an anti-fraud scenario may be connected by multiple types of edges, we propose the Multiple Edge-Types Based Attention mechanism. More generally, to handle the heterogeneous information and multiple relations in the graph, we first obtain the embedding of each applicant node by aggregating the representations of its attribute nodes, and then use the attention mechanism to aggregate neighbor nodes along multiple meta-paths to obtain the final applicant node embedding. We conduct experiments on real-life data from a large financial platform. The results demonstrate that S3GNN outperforms state-of-the-art methods, including the method of the pilot platform.
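
To make the rule-conflict problem concrete, the following is a minimal sketch of how expert rules can be written as labeling functions whose disagreeing votes are then resolved. It is not the authors' GOS implementation and it does not use the Snorkel library: the three rules, the applicant fields (`devices`, `account_age_days`, `shared_phone_count`), the thresholds, and the fixed rule weights are all hypothetical. Snorkel itself learns rule accuracies with a generative label model; a fixed weighted vote is swapped in here only to keep the example short.

```python
from typing import Callable, Dict, List

# Label values, following the common weak-supervision convention.
FRAUD, LEGIT, ABSTAIN = 1, 0, -1

Applicant = Dict[str, float]
LabelingFunction = Callable[[Applicant], int]


# Hypothetical expert rules; field names and thresholds are assumptions.
def lf_many_devices(a: Applicant) -> int:
    return FRAUD if a["devices"] >= 5 else ABSTAIN


def lf_long_history(a: Applicant) -> int:
    return LEGIT if a["account_age_days"] >= 365 else ABSTAIN


def lf_shared_phone(a: Applicant) -> int:
    return FRAUD if a["shared_phone_count"] >= 3 else ABSTAIN


def apply_rules(applicant: Applicant, rules: List[LabelingFunction]) -> List[int]:
    """Collect one vote per rule; a rule may abstain."""
    return [rule(applicant) for rule in rules]


def resolve(votes: List[int], weights: List[float]) -> int:
    """Resolve conflicting votes with a fixed weighted vote.

    Returns ABSTAIN when no rule fires or when the weighted scores tie.
    """
    score = {FRAUD: 0.0, LEGIT: 0.0}
    for vote, weight in zip(votes, weights):
        if vote != ABSTAIN:
            score[vote] += weight
    if score[FRAUD] == score[LEGIT]:
        return ABSTAIN
    return FRAUD if score[FRAUD] > score[LEGIT] else LEGIT


RULES = [lf_many_devices, lf_long_history, lf_shared_phone]

# Two rules fire and disagree: many devices (fraud) vs. long history (legit).
applicant = {"devices": 6, "account_age_days": 400, "shared_phone_count": 0}
votes = apply_rules(applicant, RULES)
label = resolve(votes, weights=[0.9, 0.6, 0.8])
```

In this sketch the two rules that fire disagree, and the higher-weighted rule decides the label. With equal weights the same applicant would be left unlabeled, which shows why some estimate of rule reliability is needed once rules accumulate.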

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 1
February 2024
533 pages
EISSN: 2157-6912
DOI: 10.1145/3613503
Editor: Huan Liu

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 16 January 2024
Online AM: 21 September 2023
Accepted: 20 July 2023
Revised: 16 July 2023
Received: 20 March 2022
Published in TIST Volume 15, Issue 1

Author Tags

1. Fraud prediction
2. graph neural networks
3. weak supervision
4. online credit loan services

Qualifiers

• Research-article

Funding Sources

• National Natural Science Foundation of China (NSFC)
• Program of Shanghai Academic Research Leader
• National Key Research and Development Program of China
• Shanghai Science and Technology Innovation Action Plan Project
• Fundamental Research Funds for the Central Universities
• Open Fund of Key Laboratory of Industrial Internet of Things and Networked Control, Ministry of Education
