A Systematic Survey of Sparse Matrix Vector Multiplication

Jianhua Gao1, Bingjie Liu2, Weixing Ji1, Hua Huang1
1School of Artificial Intelligence, Beijing Normal University
2School of Computer Science and Technology, Beijing Institute of Technology

Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to a wide range of optimization techniques. However, a comprehensive and systematic survey that introduces, analyzes, discusses, and summarizes recent SpMV work is currently lacking. Aiming to fill this gap, this paper compares existing techniques and analyzes their strengths and weaknesses. We conduct an in-depth overview of the important techniques that optimize SpMV on modern architectures, which we classify into classic, auto-tuning, machine-learning, and mixed-precision-based optimization. We also elaborate on architecture-oriented optimization, covering CPU, GPU, and FPGA. We present an experimental evaluation that compares the performance of state-of-the-art SpMV implementations. Based on our findings, we identify several challenges and point out future research directions. This survey intends to provide researchers with a comprehensive understanding of SpMV optimization on modern architectures and to guide future work.

Citation

@misc{gao2024spmvsurvey,
     title={A Systematic Literature Survey of Sparse Matrix-Vector Multiplication},
     author={Jianhua Gao and Bingjie Liu and Weixing Ji and Hua Huang},
     year={2024},
     eprint={2404.06047},
     archivePrefix={arXiv},
     primaryClass={cs.DC},
     url={https://arxiv.org/abs/2404.06047},
}

Papers

SpMV Related Surveys

  • A Survey on Performance Modelling and Optimization Techniques for SpMV on GPUs. Aditi V. Kulkarni, C. R. Barde. International Journal of Computer Science and Information Technologies (IJCSIT), 2014 [DOI] [pdf]
  • A Survey of Sparse Matrix-Vector Multiplication Performance on Large Matrices. Max Grossman, Christopher Thiele, Mauricio Araya-Polo, Florian Frank, Faruk O. Alpak, Vivek Sarkar. arXiv, 2016 [DOI] [pdf]
  • Sparse Matrix-Vector Multiplication on GPGPUs. Salvatore Filippone, Valeria Cardellini, Davide Barbieri, Alessandro Fanfarillo, ACM Transactions on Mathematical Software (TOMS), 2017 [DOI]
  • Research on Performance Optimization for Sparse Matrix-Vector Multiplication in Multi/Many-Core Architecture. Qihan Wang, Mingliang Li, Jianming Pang, Di Zhu, International Conference on Information Technology and Computer Application (ITCA). IEEE, 2020 [DOI] [pdf]
  • A Survey of Accelerating Parallel Sparse Linear Algebra. Guoqing Xiao, Chuanghui Yin, Tao Zhou, Xueqi Li, Yuedan Chen, Kenli Li, ACM Computing Surveys (CSUR), 2023 [DOI]

Sparse Compression Formats

Basic Compression Formats

  • Data Structures to Vectorize CG Algorithms for General Sparsity Patterns. Gaia Valeria Paolini, Giuseppe Radicati Di Brozolo, BIT Numerical Mathematics (BITNM), 1989 [DOI]
  • Sparse Matrix Vector Multiplication Techniques on the IBM 3090 VF. Alexander Peters, Parallel Computing (PC), 1991 [DOI]
  • SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations, Version 2. Youcef Saad, 1994 [DOI] [pdf]
  • Scan Primitives for GPU Computing. Shubhabrata Sengupta, Mark Harris, Yao Zhang, John D. Owens, in Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium (SCA), 2007 [DOI] [pdf]
  • Sparse Matrix Computations on Manycore GPU’s. Michael Garland, in Proceedings of the 45th annual Design Automation Conference (DAC), 2008 [DOI] [pdf]

New Compression Formats

Regular Slicing

  • The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-Enabled GPUs. Hoang-Vu Dang, Bertil Schmidt, Proceedings of the International Conference on Computational Science (ICCS), 2012 [DOI] [pdf]
  • Vectorized Sparse Matrix Multiply for Compressed Row Storage Format. Eduardo F. D’Azevedo, Mark R. Fahey, Richard T. Mills, Computational Science (ICCS), 2005 [DOI]
  • Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures. Alexander Monakov, Anton Lokhmotov, Arutyun Avetisyan, High Performance Embedded Architectures and Compilers (HiPEAC), 2010 [DOI]
  • Three Storage Formats for Sparse Matrices on GPGPUs. Davide Barbieri, Alessandro Fanfarillo, Valeria Cardellini, Salvatore Filippone, Technical Report, 2015 [DOI] [pdf]
  • Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation. Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, Achim Basermann, Alan R. Bishop, 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2012 [DOI] [pdf]
  • A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units. Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, Alan R. Bishop, SIAM Journal on Scientific Computing (SJSC), 2014 [DOI]
  • Optimizing Sparse Matrix Vector Multiplication Using Diagonal Storage Matrix Format. Liang Yuan, Yunquan Zhang, Xiangzheng Sun, Ting Wang, The International Conference on High Performance Computing (HPCC), 2010 [DOI] [pdf]
  • CRSD: Application Specific Auto-tuning of SpMV for Diagonal Sparse Matrices. Xiangzheng Sun, Yunquan Zhang, Ting Wang, Guoping Long, Xianyi Zhang, Yan Li, Euro-Par 2011 Parallel Processing (Euro-Par), 2011 [DOI]

Regular Blocking

  • SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations, Version 2. Y. Saad, 1994 [DOI] [pdf]
  • Improving Performance of Sparse Matrix-Vector Multiplication. Ali Pınar, Michael T. Heath, Proceedings of the 1999 ACM/IEEE Conference on Supercomputing (SC), 1999 [DOI] [pdf]
  • VBSF: A New Storage Format for SIMD Sparse Matrix–Vector Multiplication on Modern Processors. Yishui Li, Peizhen Xie, Xinhai Chen, Jie Liu, Bo Yang, Shengguo Li, Chunye Gong, Xinbiao Gan, Han Xu, The Journal of Supercomputing (TJSC), 2020 [DOI]
  • Automatic Performance Tuning of Sparse Matrix Kernels. Richard Wilson Vuduc, A dissertation submitted for the degree of Doctor of Philosophy, 2003 [DOI] [pdf]
  • An Efficient Two-Dimensional Blocking Strategy for Sparse Matrix-Vector Multiplication on GPUs. Arash Ashari, Naser Sedaghati, John Eisenlohr, P. Sadayappan, Proceedings of the 28th ACM international conference on Supercomputing (ICS), 2014 [DOI]
  • A Case Study of Streaming Storage Format for Sparse Matrices. Shweta Jain-Mendon, Ron Sass, 2012 International Conference on Reconfigurable Computing and FPGAs (ReConFig), 2012 [DOI] [pdf]
  • yaSpMV: Yet Another SpMV Framework on GPUs. Shengen Yan, Chao Li, Yunquan Zhang, Huiyang Zhou, ACM SIGPLAN Notices (ACM SIGPLAN Not), 2014 [DOI]
  • High-Performance Sparse Matrix-Vector Multiplication on GPUs for Structured Grid Computations. Jeswin Godwin, Justin Holewinski, P. Sadayappan, Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units (GPGPU-5), 2012 [DOI]
  • Model-Driven Autotuning of Sparse Matrix-Vector Multiply on GPUs. Jee W. Choi, Amik Singh, Richard W. Vuduc, ACM SIGPLAN Notices (ACM SIGPLAN Not), 2010 [DOI] [pdf]

Irregular Compressing

  • CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. Weifeng Liu, Brian Vinter, Proceedings of the 29th ACM on International Conference on Supercomputing (ICS), 2015 [DOI] [pdf] [code]
  • CSR2: A New Format for SIMD-Accelerated SpMV. Haodong Bian, Jianqiang Huang, Runting Dong, Lingbin Liu, Xiaoying Wang, IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID), 2020 [DOI] [pdf] [code]
  • CSX: An Extended Compression Format for SpMV on Shared Memory Systems. Kornilios Kourtis, Vasileios Karakasis, Georgios Goumas, Nectarios Koziris, ACM SIGPLAN Notices (ACM SIGPLAN Not), 2011 [DOI]
  • clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs. Bor-Yiing Su, Kurt Keutzer, Proceedings of the 26th ACM international conference on Supercomputing (ICS), 2012 [DOI] [pdf]

Bit or Byte Compressing

  • Accelerating Sparse Matrix Computations Via Data Compression. Jeremiah Willcock, Andrew Lumsdaine, Proceedings of the 20th annual international conference on Supercomputing (ICS), 2006 [DOI]
  • Optimizing Sparse Matrix-Vector Multiplication Using Index and Value Compression. Kornilios Kourtis, Georgios Goumas, Nectarios Koziris, Proceedings of the 5th conference on Computing frontiers (CF), 2008 [DOI] [pdf]
  • A Family of Bit-Representation-Optimized Formats for Fast Sparse Matrix-Vector Multiplication on the GPU. Wai Teng Tang, Wen Jun Tan, Rick Siow Mong Goh, Stephen John Turner, Weng-Fai Wong, IEEE Transactions on Parallel and Distributed Systems (TPDS), 2015 [DOI] [pdf]
  • Towards a Universal FPGA Matrix-Vector Multiplication Architecture. Srinidhi Kestur, John D. Davis, Eric S. Chung, Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2012 [DOI] [pdf]

Hybrid Encoding

  • Efficient Sparse Matrix-Vector Multiplication on CUDA. Nathan Bell, Michael Garland, NVIDIA Technical Report (NVR), 2008 [DOI]
  • Load-Balancing Sparse Matrix Vector Product Kernels on GPUs. Hartwig Anzt, Terry Cojean, Yen-Chen Chen, Jack Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yuhsiang M. Tsai, Weichung Wang, ACM Transactions on Parallel Computing (TOPC), 2020 [DOI] [pdf] [ginkgo-project] [ginkgo-data]
  • Acceleration of Conjugate Gradient Method for Circuit Simulation Using CUDA. Anirudh Maringanti, Viraj Athavale, Sachin B. Patkar, International Conference on High Performance Computing (HPCC), 2009 [DOI] [pdf]
  • A Parallel Computing Method Using Blocked Format with Optimal Partitioning for SpMV on GPU. Wangdong Yang, Kenli Li, Keqin Li, Journal of Computer and System Sciences (JCSS), 2018 [DOI]
  • Adaptive Multi-Level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU. Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka, Procedia Computer Science (PCS), 2016 [DOI]
  • Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Jeff Bolz, Ian Farmer, Eitan Grinspun, Peter Schröder, ACM Transactions on Graphics (TOG), 2003 [DOI] [pdf]
  • Optimization of Quasi-Diagonal Matrix-Vector Multiplication on GPU. Wangdong Yang, Kenli Li, Yan Liu, Lin Shi, Lanjun Wan, International Journal of High Performance Computing Applications (IJHPCA), 2014 [DOI]
  • TaiChi: A Hybrid Compression Format for Binary Sparse Matrix-Vector Multiplication on GPU. Jianhua Gao, Weixing Ji, Zhaonian Tan, Yizhuo Wang, Feng Shi, IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022 [DOI] [pdf] [code]
  • Optimizing and Auto-Tuning Scale-Free Sparse Matrix-Vector Multiplication on Intel Xeon Phi. Wai Teng Tang, Ruizhe Zhao, Mian Lu, Yun Liang, Huynh Phung Huynh, Xibai Li, International Symposium on Code Generation and Optimization (CGO), 2015 [DOI] [pdf]
  • Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs. Alexander Monakov, Arutyun Avetisyan, Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2009 [DOI]
  • MMSparse: 2D Partitioning of Sparse Matrix Based on Mathematical Morphology. Zhaonian Tan, Weixing Ji, Jianhua Gao, Yueyan Zhao, Akrem Benatia, Yizhuo Wang, Feng Shi, Future Generation Computer Systems (FGCS), 2020 [DOI] [code]
  • TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs. Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, Guangming Tan, International Symposium on Parallel and Distributed Processing (IPDPS), 2021 [DOI] [pdf]

Other variants

  • Optimizing Sparse Matrix-Vector Product Computations Using Unroll and Jam. John Mellor-Crummey, John Garvin, International Journal of High Performance Computing Applications (IJHPCA), 2004 [DOI] [pdf]
  • A New Approach for Sparse Matrix Vector Product on NVIDIA GPUs. Francisco Vázquez, José-Jesús Fernández, Ester M. Garzón, Concurrency Computation Practice and Experience (CCPE), 2011 [DOI] [pdf]
  • Parallel Solution of Linear Systems with Striped Sparse Matrices. Rami Melhem, Parallel Computing (PC), 1988 [DOI]
  • An Optimal Storage Format for Sparse Matrices. Eurípides Montagne, Anand Ekambaram, Information Processing Letters (IPL), 2004 [DOI]

Auto-Tuning Based Algorithm

Offline Auto-Tuning

  • Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY. Eun-Jin Im, Katherine Yelick, Proceedings of the International Conference on Computational Sciences-Part I (ICCS), 2001 [DOI] [pdf]
  • SPARSITY: Optimization Framework for Sparse Matrix Kernels. Eun-Jin Im, Katherine Yelick, Richard Vuduc, International Journal of High Performance Computing Applications (IJHPCA), 2004 [DOI] [pdf]
  • Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply. Rich Vuduc, James Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, Benjamin C. Lee, Supercomputing Conference (SC), 2002 [DOI]
  • Automatic Performance Tuning of Sparse Matrix Kernels. Richard Wilson Vuduc, A dissertation submitted for the degree of Doctor of Philosophy, 2003 [DOI] [pdf]
  • OSKI: A Library of Automatically Tuned Sparse Matrix Kernels. Richard Vuduc, James W Demmel and Katherine A Yelick, Journal of Physics: Conference Series (JPCS), 2005 [DOI] [pdf]
  • When Cache Blocking of Sparse Matrix Vector Multiply Works and Why. Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, Katherine A. Yelick, Applicable Algebra in Engineering, Communication and Computing (AAECC), 2007 [DOI] [pdf]
  • Automatic Performance Tuning of SpMV on GPGPU. Xianyi Zhang, Yunquan Zhang, Xiangzheng Sun, Fangfang Liu, Shengfei Liu, Yuxin Tang, Yucheng Li, Computer Systems Science and Engineering (CSSE), 2009 [DOI] [pdf]
  • Model-Driven Autotuning of Sparse Matrix-Vector Multiply on GPUs. Jee Whan Choi, Amik Singh, Richard W. Vuduc, ACM SIGPLAN Notices (ACM SIGPLAN Not), 2010 [DOI] [pdf]
  • A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs. Ping Guo, Liqiang Wang, Po Chen, IEEE Transactions on Parallel and Distributed Systems (TPDS), 2014 [DOI] [pdf]
  • Performance Analysis and Optimization for SpMV on GPU Using Probabilistic Modeling. Kenli Li, Wangdong Yang, Keqin Li, IEEE Transactions on Parallel and Distributed Systems (TPDS), 2015 [DOI] [pdf]
  • Automatic Tuning of the Sparse Matrix Vector Product on GPUs Based on the ELLR-T Approach. Francisco Vázquez, José Jesús Fernández, Ester M. Garzón, Parallel Computing (PC), 2012 [DOI] [pdf]
  • Improving the Performance of the Sparse Matrix Vector Product with GPUs. Francisco Vázquez, Gloria Ortega, José Jesús Fernández, Ester M. Garzón, IEEE International Conference on Computer and Information Technology (CIT), 2010 [DOI] [pdf]
  • Automatic Tuning of Sparse Matrix-Vector Multiplication on Multicore Clusters. ShiGang Li, ChangJun Hu, JunChao Zhang, YunQuan Zhang, Science China Information Sciences (SCIS), 2015 [DOI]
  • hpSpMV: A Heterogeneous Parallel Computing Scheme for SpMV on the Sunway TaihuLight Supercomputer. Yuedan Chen, Guoqing Xiao, Zheng Xiao, Wangdong Yang, IEEE International Conference on High Performance Computing and Communications (HPCC), 2019 [DOI]

Online Auto-Tuning

  • Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs. Ping Guo, Liqiang Wang, International Conference on Computational and Information Sciences (ICCIS), 2010 [DOI] [pdf]
  • Efficient Sparse Matrix-Vector Multiplication on Cache-Based GPUs. István Reguly, Mike Giles, Innovative Parallel Computing (InPar), 2012 [DOI] [pdf]
  • Auto-Tuning of Sparse Matrix-Vector Multiplication on Graphics Processors. Walid Abu-Sufah, Asma Abdel Karim, Supercomputing (ISC), 2013 [DOI] [pdf]
  • An Effective Approach for Implementing Sparse Matrix-Vector Multiplication on Graphics Processing Units. Walid Abu-Sufah, Asma Abdel Karim, IEEE International Conference on High Performance Computing and Communications (HPCC), 2012 [DOI] [pdf]
  • CRSD: Application Specific Auto-tuning of SpMV for Diagonal Sparse Matrices. Xiangzheng Sun, Yunquan Zhang, Ting Wang, Guoping Long, Xianyi Zhang, Yan Li, Euro-Par 2011 Parallel Processing (Euro-Par), 2011 [DOI] [pdf]
  • yaSpMV: Yet Another SpMV Framework on GPUs. Shengen Yan, Chao Li, Yunquan Zhang, Huiyang Zhou, ACM SIGPLAN Notices (ACM SIGPLAN Not), 2014 [DOI] [pdf]

Machine-Learning Based Algorithm

Format or Algorithm Selection

  • Reinforcement Learning for Automated Performance Tuning: Initial Evaluation for Sparse Matrix Format Selection. Warren Armstrong, Alistair P. Rendell, IEEE International Conference on Cluster Computing (CLUSTER), 2008 [DOI] [pdf]
  • SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication. Jiajia Li, Guangming Tan, Mingyu Chen, Ninghui Sun, ACM SIGPLAN Notices (ACM SIGPLAN Not), 2013 [DOI]
  • Automatic Selection of Sparse Matrix Representation on GPUs. Naser Sedaghati, Te Mu, Louis-Noel Pouchet, Srinivasan Parthasarathy, P. Sadayappan, Proceedings of the 29th ACM on International Conference on Supercomputing (ICS), 2015 [DOI] [pdf]
  • Adaptive Optimization of Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures. Shizhao Chen, Jianbin Fang, Donglin Chen, Chuanfu Xu, Zheng Wang, IEEE International Conference on High Performance Computing and Communications (HPCC), 2018 [DOI] [pdf]
  • Sparse Matrix Format Selection with Multiclass SVM for SpMV on GPU. Akrem Benatia, Weixing Ji, Yizhuo Wang, Feng Shi, International Conference on Parallel Processing (ICPP), 2016 [DOI] [pdf]
  • Machine Learning for Optimal Compression Format Prediction on Multiprocessor Platform. Ichrak Mehrez, Olfa Hamdi-Larbi, Thomas Dufaud, Nahid Emad, International Conference on High Performance Computing & Simulation (HPCS), 2018 [DOI] [pdf]
  • Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors. Kaixi Hou, Wu-chun Feng, Shuai Che, IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2017 [DOI] [pdf]
  • BestSF: A Sparse Meta-Format for Optimizing SpMV on GPU. Akrem Benatia, Weixing Ji, Yizhuo Wang, Feng Shi, ACM Transactions on Architecture and Code Optimization (TACO), 2018 [DOI] [pdf] [code]
  • Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs. Israt Nisa, Charles Siegel, Aravind Sukumaran Rajam, Abhinav Vishnu, P. Sadayappan, IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2018 [DOI] [pdf]
  • A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats. Hang Cui, Shoichi Hirasawa, Hiroaki Kobayashi, Hiroyuki Takizawa, IEICE Transactions on Information and Systems (ITIS), 2018 [DOI] [pdf]
  • Bridging the Gap Between Deep Learning and Sparse Matrix Format Selection. Yue Zhao, Jiajia Li, Chunhua Liao, Xipeng Shen, Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2018 [DOI] [pdf]
  • Machine Learning to Design an Auto-tuning System for the Best Compressed Format Detection for Parallel Sparse Computations. Olfa Hamdi-Larbi, Ichrak Mehrez, Thomas Dufaud, Parallel Processing Letters (PPL), 2021 [DOI]
  • Enabling Runtime SpMV Format Selection through an Overhead Conscious Method. Weijie Zhou, Yue Zhao, Xipeng Shen, Wang Chen, IEEE Transactions on Parallel and Distributed Systems (TPDS), 2019 [DOI] [pdf]
  • BASMAT: Bottleneck-Aware Sparse Matrix-Vector Multiplication Auto-Tuning on GPGPUs. Athena Elafrou, Georgios Goumas, Nectarios Koziris, Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP), 2019 [DOI]
  • Selecting Optimal SpMV Realizations for GPUs via Machine Learning. Ernesto Dufrechou, Pablo Ezzatti, Enrique S. Quintana-Ortí, The International Journal of High Performance Computing Applications (IJHPCA), 2021 [DOI] [pdf]
  • DTSpMV: An Adaptive SpMV Framework for Graph Analysis on GPUs. Guoqing Xiao, Tao Zhou, Yuedan Chen, Yikun Hu, Kenli Li, IEEE International Conference on High Performance Computing and Communications (HPCC), 2022 [DOI] [pdf]

Parameter Prediction

  • Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors. Kaixi Hou, Wu-chun Feng, Shuai Che, IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2017 [DOI] [pdf]
  • ZAKI: A Smart Method and Tool for Automatic Performance Optimization of Parallel SpMV Computations on Distributed Memory Machines. Sardar Usman, Rashid Mehmood, Iyad Katib, Aiiad Albeshri, Saleh M. Altowaijri, Mobile Networks and Applications (MNA), 2019 [DOI]
  • ZAKI+: A Machine Learning Based Process Mapping Tool for SpMV Computations on Distributed Memory Architectures. Sardar Usman, Rashid Mehmood, Iyad Katib, Aiiad Albeshri, IEEE Access, 2019 [DOI] [pdf]
  • AAQAL: A Machine Learning-Based Tool for Performance Optimization of Parallel SpMV Computations Using Block CSR. Muhammad Ahmed, Sardar Usman, Nehad Ali Shah, M. Usman Ashraf, Ahmed Mohammed Alghamdi, Adel A. Bahadded, Khalid Ali Almarhabi, Applied Sciences (AS), 2022 [DOI]
  • Revisiting Thread Configuration of SpMV Kernels on GPU: A Machine Learning Based Approach. Jianhua Gao, Weixing Ji, Jie Liu, Yizhuo Wang, Feng Shi, Journal of Parallel and Distributed Computing (JPDC), 2024 [DOI]

Performance Prediction

  • Machine Learning Approach for the Predicting Performance of SpMV on GPU. Akrem Benatia, Weixing Ji, Yizhuo Wang, Feng Shi, International Conference on Parallel and Distributed Systems (ICPADS), 2016 [DOI] [pdf]
  • Sparse Matrix Partitioning for Optimizing SpMV on CPU-GPU Heterogeneous Platforms. Akrem Benatia, Weixing Ji, Yizhuo Wang, Feng Shi, The International Journal of High Performance Computing Applications (IJHPCA), 2019 [DOI]
  • Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs. Israt Nisa, Charles Siegel, Aravind Sukumaran Rajam, Abhinav Vishnu, P. Sadayappan, IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018 [DOI] [pdf]
  • Performance Modeling of the Sparse Matrix-Vector Product via Convolutional Neural Networks. Maria Barreda, Manuel F. Dolz, M. Asunción Castaño, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí, The Journal of Supercomputing (TJSC), 2020 [DOI]
  • Convolutional Neural Nets for Estimating the Run Time and Energy Consumption of the Sparse Matrix-Vector Product. Maria Barreda, Manuel F Dolz, M Asunción Castaño, The International Journal of High Performance Computing Applications (IJHPCA), 2021 [DOI]

Mixed Precision Based Optimization

Mixed-Precision Iterative Solving Algorithms

  • Exploiting Variable Precision in GMRES. Serge Gratton, Ehouarn Simon, David Titley-Peloquin, Philippe Toint, arXiv, 2019 [DOI] [pdf]
  • Compressed Basis GMRES on High-Performance Graphics Processing Units. José I Aliaga, Hartwig Anzt, Thomas Grützmacher, Enrique S. Quintana-Ortí, Andrés E. Tomás, The International Journal of High Performance Computing Applications (IJHPCA), 2023 [DOI]
  • Improving the Performance of the GMRES Method Using Mixed-Precision Techniques. Neil Lindquist, Piotr Luszczek, Jack Dongarra, Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI (SMC), 2020 [DOI]
  • A Study of Mixed Precision Strategies for GMRES on GPUs. Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam, arXiv, 2021 [pdf]
  • Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions. Erin Carson, Nicholas J. Higham, SIAM Journal on Scientific Computing (SJSC), 2018 [DOI] [pdf] [code]
  • Mixed-Precision In-Memory Computing. Manuel Le Gallo, Abu Sebastian, Roland Mathis, Matteo Manica, Heiner Giefers, Tomas Tuma, Costas Bekas, Alessandro Curioni, Evangelos Eleftheriou, Nature Electronics (NE), 2018 [DOI]

Mixed-Precision SpMV

  • Data-Driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs. Khalid Ahmad, Hari Sundar, Mary Hall, ACM Transactions on Architecture and Code Optimization (TACO), 2020 [DOI] [pdf]
  • Performance and Energy Consumption of Accurate and Mixed-Precision Linear Algebra Kernels on GPUs. Daichi Mukunoki, Takeshi Ogita, Journal of Computational and Applied Mathematics (JCAM), 2020 [DOI]
  • Mixed and Multi-Precision SpMV for GPUs with Row-wise Precision Selection. Erhan Tezcan, Tugba Torun, Fahrican Koşar, Kamer Kaya, Didem Unat, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022 [DOI] [pdf]
  • Adaptive Precision Matrix-Vector Product. Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina, HAL, 2022 [DOI] [pdf]
  • Multiple-Precision Sparse Matrix-Vector Multiplication on GPUs. Konstantin Isupov, Journal of Computational Science (JCS), 2022 [DOI]
  • A Highly Efficient Implementation of Multiple Precision Sparse Matrix-Vector Multiplication and Its Application to Product-type Krylov Subspace Methods. Tomonori Kouya, arXiv, 2014 [DOI] [pdf]

Architecture Oriented Optimization

CPU

  • Performance of a Structure-Detecting SpMV Using the CSR Matrix Representation. Hans Pabst, Bev Bachmayer, Michael Klemm, International Symposium on Parallel and Distributed Computing (ISPDC), 2012 [DOI] [pdf]
  • CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. Weifeng Liu, Brian Vinter, Proceedings of the 29th ACM on International Conference on Supercomputing (ICS), 2015 [DOI] [pdf] [code]
  • CSR2: A New Format for SIMD-Accelerated SpMV. Haodong Bian, Jianqiang Huang, Runting Dong, Lingbin Liu, Xiaoying Wang, IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID), 2020 [DOI] [pdf] [code]
  • Performance Analysis and Optimization for SpMV Based on Aligned Storage Formats on an ARM Processor. Yufeng Zhang, Wangdong Yang, Kenli Li, Dahai Tang, Keqin Li, Journal of Parallel and Distributed Computing (JPDC), 2021 [DOI] [pdf]
  • Sparsity: Optimization Framework for Sparse Matrix Kernels. Eun-Jin Im, Katherine Yelick, Richard Vuduc, International Journal of High Performance Computing Applications (IJHPCA), 2004 [DOI]
  • When Cache Blocking of Sparse Matrix Vector Multiply Works and Why. Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, Katherine A. Yelick, Applicable Algebra in Engineering, Communication and Computing (AAECC), 2007 [DOI]
  • Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms. Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel, Supercomputing Conference (SC), 2007 [DOI] [pdf]
  • Optimization of Sparse Matrix–Vector Multiplication on Emerging Multicore Platforms. Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel, Parallel Computing (PC), 2009 [DOI] [pdf]
  • Optimizing Sparse Matrix Vector Multiplication on Emerging Multicores. Orhan Kislal, Wei Ding, Mahmut Kandemir, Ilteris Demirkiran, IEEE International Workshop on Multi-/Many-core Computing Systems (MuCoCoS), 2013 [DOI] [pdf]
  • Memory Access Complexity Analysis of SpMV in RAM (h) Model. E. Yuan, Yun-quan Zhang, Xiangzheng Sun, IEEE International Conference on High Performance Computing and Communications (HPCC), 2008 [DOI] [pdf]
  • Performance Optimizations and Bounds for Sparse Symmetric Matrix-Multiple Vector Multiply. Benjamin C. Lee, Katherine A. Yelick, Richard W. Vuduc, James W. Demmel, Michael de Lorimier, Lijue Zhong, Supercomputing Conference (SC), 2003 [DOI] [pdf]
  • Performance Evaluation of Multithreaded Sparse Matrix-Vector Multiplication Using OpenMP. Shengfei Liu, Yunquan Zhang, Xiangzheng Sun, RongRong Qiu, IEEE International Conference on High Performance Computing and Communications (HPCC), 2009 [DOI] [pdf]
  • Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors. Athena Elafrou, Georgios Goumas, Nectarios Koziris, International Conference on Parallel Processing (ICPP), 2017 [DOI] [pdf]
  • Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs. James D. Trotter, Sinan Ekmekçibaşi, Johannes Langguth, Tugba Torun, Emre Düzakın, Aleksandar Ilic, Supercomputing Conference (SC), 2023 [DOI] [pdf]
  • NUMA-Aware Optimization of Sparse Matrix-Vector Multiplication on ARMv8-Based Many-Core Architectures. Xiaosong Yu, Huihui Ma, Zhengyu Qu, Jianbin Fang, Weifeng Liu, Network and Parallel Computing (NPC), 2020 [DOI] [pdf]

GPU

Single GPU

  • Scalable Parallel Programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for? John Nickolls, Ian Buck, Michael Garland, Kevin Skadron, ACM Queue (Queue), 2008 [DOI] [pdf]
  • Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications. Arash Ashari, Naser Sedaghati, John Eisenlohr, Srinivasan Parthasarathy, P. Sadayappan, Supercomputing Conference (SC), 2014 [DOI] [pdf]
  • Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format. Joseph L. Greathouse, Mayank Daga, Supercomputing Conference (SC), 2014 [DOI] [pdf]
  • Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices. Mayank Daga, Joseph L. Greathouse, International Conference on High Performance Computing (HPCC), 2015 [DOI] [pdf] [code]
  • CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. Weifeng Liu, Brian Vinter, Proceedings of the 29th ACM on International Conference on Supercomputing (ICS), 2015 [DOI] [pdf] [code]
  • Merge-Based Parallel Sparse Matrix-Vector Multiplication. Duane Merrill, Michael Garland, Supercomputing Conference (SC), 2016 [DOI] [pdf] [code]
  • Overcoming Load Imbalance for Irregular Sparse Matrices. Goran Flegar, Hartwig Anzt, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2017 [DOI]
  • Load-Balancing Sparse Matrix Vector Product Kernels on GPUs. Hartwig Anzt, Terry Cojean, Yen-Chen Chen, Jack Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yuhsiang M. Tsai, Weichung Wang, ACM Transactions on Parallel Computing (TOPC), 2020 [DOI] [pdf] [ginkgo-project] [ginkgo-data]
  • AMF-CSR: Adaptive Multi-Row Folding of CSR for SpMV on GPU. Jianhua Gao, Weixing Ji, Jie Liu, Senhao Shao, Yizhuo Wang, Feng Shi, International Conference on Parallel and Distributed Systems (ICPADS), 2021 [DOI] [pdf]
  • Compressed Multirow Storage Format for Sparse Matrices on Graphics Processing Units. Zbigniew Koza, Maciej Matyka, Sebastian Szkoda, Łukasz Mirosław, SIAM Journal on Scientific Computing (SJSC), 2014 [DOI] [pdf]
  • Globally Homogeneous, Locally Adaptive Sparse Matrix-Vector Multiplication on the GPU. Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel, Proceedings of the International Conference on Supercomputing (ICS), 2017 [DOI] [pdf]
  • Optimizing Sparse Matrix-Vector Multiplication on GPUs. Muthu Manikandan Baskaran, Rajesh Bordawekar, Computer Science, Engineering (CSE), 2009 [DOI]
  • Efficient Sparse Matrix-Vector Multiplication on CUDA. Nathan Bell, Michael Garland, NVIDIA Technical Report (NVR), 2008 [DOI]
  • Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors. Nathan Bell, Michael Garland, Supercomputing Conference (SC), 2009 [DOI] [pdf]
  • Efficient Sparse Matrix-Vector Multiplication on Cache-Based GPUs. István Reguly, Mike Giles, Innovative Parallel Computing (InPar), 2012 [DOI] [pdf]
  • Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS Format on GPUs. Hiroki Yoshizawa, Daisuke Takahashi, IEEE International Conference on Computational Science and Engineering (CSE), 2012 [DOI] [pdf]
  • Revisiting Thread Configuration of SpMV Kernels on GPU: A Machine Learning Based Approach. Jianhua Gao, Weixing Ji, Jie Liu, Yizhuo Wang, Feng Shi, Journal of Parallel and Distributed Computing (JPDC), 2024 [DOI] [pdf]
  • Taming Irregular EDA Applications on GPUs. Yangdong Deng, Bo David Wang, Shuai Mu, IEEE International Conference on Computer-Aided Design (ICCAD), 2009 [DOI] [pdf]
  • Parallel GMRES Solver for Fast Analysis of Large Linear Dynamic Systems on GPU Platforms. Kai He, Sheldon X.-D. Tan, Hengyang Zhao, Xue-Xin Liu, Hai Wang, Guoyong Shi, Integration, 2016 [DOI] [pdf]
  • Sparse Matrix Computations on Manycore GPU’s. Michael Garland, Design Automation Conference (DAC), 2008 [DOI] [pdf]
  • VCSR: An Efficient GPU Memory-Aware Sparse Format. Elmira Karimi, Nicolas Bohm Agostini, Shi Dong, David Kaeli, IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022 [DOI] [pdf]
  • Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs. Daichi Mukunoki, Daisuke Takahashi, Computational Science and Its Applications (ICCSA), 2013 [DOI]
  • DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. Yuechen Lu, Weifeng Liu, Supercomputing Conference (SC), 2023 [DOI] [pdf]
  • Model-Driven Autotuning of Sparse Matrix-Vector Multiply on GPUs. Jee Whan Choi, Amik Singh, Richard W. Vuduc, ACM SIGPLAN Notices (ACM SIGPLAN Not), 2010 [DOI] [pdf]
  • LightSpMV: Faster CSR-Based Sparse Matrix-Vector Multiplication on CUDA-Enabled GPUs. Yongchao Liu, Bertil Schmidt, International Conference on Application Specific Systems, Architectures and Processors (ASAP), 2015 [DOI] [pdf]
  • LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows. Yongchao Liu, Bertil Schmidt, Journal of Signal Processing Systems (JSPS), 2018 [DOI]
  • TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs. Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, Guangming Tan, International Symposium on Parallel and Distributed Processing (IPDPS), 2021 [DOI] [pdf]
  • Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs. Ping Guo, Liqiang Wang, International Conference on Computational and Information Sciences (ICCIS), 2010 [DOI] [pdf]
  • Automatically Generating and Tuning GPU Code for Sparse Matrix-Vector Multiplication from a High-Level Representation. Dominik Grewe, Anton Lokhmotov, Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units (GPGPU), 2011 [DOI] [pdf]

Multiple GPUs

  • Fast Conjugate Gradients with Multiple GPUs. Ali Cevahir, Akira Nukada, Satoshi Matsuoka, Computational Science (ICCS), 2009 [DOI] [pdf]
  • Performance Optimization for SpMV on Multi-GPU Systems Using Threads and Multiple Streams. Ping Guo, Changjiang Zhang, International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2016 [DOI] [pdf]
  • Multi-GPU Implementation of the Uniformization Method for Solving Markov Models. Marek Karwacki, Beata Bylina, Jarosław Bylina, Federated Conference on Computer Science and Information Systems (FedCSIS), 2012 [DOI] [pdf]
  • Analysis and Performance Estimation of the Conjugate Gradient Method on Multiple GPUs. Mickeal Verschoor, Andrei C. Jalba, Parallel Computing (PC), 2012 [DOI] [pdf]
  • Preconditioned GMRES Solver on Multiple-GPU Architecture. Bo Yang, Hui Liu, Zhangxin Chen, Computers & Mathematics with Applications (CMA), 2016 [DOI] [pdf]
  • A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. George Karypis, Vipin Kumar, SIAM Journal on Scientific Computing (SJSC), 1998 [DOI] [pdf]
  • Adaptive Optimization Modeling of Preconditioned Conjugate Gradient on Multi-GPUs. Jiaquan Gao, Yu Wang, Jun Wang, Ronghua Liang, ACM Transactions on Parallel Computing (TOPC), 2016 [DOI]
  • A Multi-GPU Parallel Optimization Model for the Preconditioned Conjugate Gradient Algorithm. Jiaquan Gao, Yuanshen Zhou, Guixia He, Yifei Xia, Parallel Computing (PC), 2017 [DOI] [pdf]
  • A Novel Multi-Graphics Processing Unit Parallel Optimization Framework for the Sparse Matrix-Vector Multiplication. Jiaquan Gao, Yu Wang, Jun Wang, Concurrency and Computation: Practice and Experience (CCPE), 2017 [DOI]
  • P-cloth: Interactive Complex Cloth Simulation on Multi-GPU Systems Using Dynamic Matrix Assembly and Pipelined Implicit Integrators. Cheng Li, Min Tang, Ruofeng Tong, Ming Cai, Jieyi Zhao, Dinesh Manocha, ACM Transactions on Graphics (TOG), 2020 [DOI]
  • MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems. Jieyang Chen, Chenhao Xie, Jesun S Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li, arXiv, 2022 [DOI] [pdf]
  • Exploring the Multiple-GPU Design Space. Dana Schaa, David Kaeli, International Symposium on Parallel and Distributed Processing (IPDPS), 2009 [DOI] [pdf]
  • High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications. Ahmad Abdelfattah, Hatem Ltaief, David Keyes, Euro-Par: Parallel Processing (Euro-Par), 2015 [DOI] [pdf]

FPGA

  • FPGA and GPU Implementation of Large Scale SpMV. Yi Shan, Tianji Wu, Yu Wang, Bo Wang, Zilong Wang, Ningyi Xu, Symposium on Application Specific Processors (SASP), 2010 [DOI] [pdf]
  • A Case Study of Streaming Storage Format for Sparse Matrices. Shweta Jain-Mendon, Ron Sass, International Conference on Reconfigurable Computing and FPGAs (ReConFig), 2012 [DOI] [pdf]
  • Towards a Universal FPGA Matrix-Vector Multiplication Architecture. Srinidhi Kestur, John D. Davis, Eric S. Chung, Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2012 [DOI] [pdf]
  • An Energy Efficient Column-Major Backend for FPGA SpMV Accelerators. Yaman Umuroglu, Magnus Jahre, IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD), 2014 [DOI] [pdf]
  • A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication. Jeremy Fowers, Kalin Ovtcharov, Karin Strauss, Eric S. Chung, Greg Stitt, Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2014 [DOI] [pdf]
  • Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based Matrix-Vector Multiplication (MVM). Jannatun Naher, Clay Gloster, Christopher C. Doss, Shrikant S. Jadhav, IEEE Annual Computing and Communication Workshop and Conference (CCWC), 2020 [DOI] [pdf]
  • A Vector Caching Scheme for Streaming FPGA SpMV Accelerators. Yaman Umuroglu, Magnus Jahre, Applied Reconfigurable Computing (ARC), 2015 [DOI]
  • Random Access Schemes for Efficient FPGA SpMV Acceleration. Yaman Umuroglu, Magnus Jahre, Microprocessors and Microsystems (MICPRO), 2016 [DOI] [pdf]
  • Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization. Fazle Sadi, Joe Sweeney, Tze Meng Low, James C. Hoe, Larry Pileggi, Franz Franchetti, Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019 [DOI]
  • A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis. Mohammad Hosseinabady, Jose Luis Nunez-Yanez, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2020 [DOI] [pdf]
  • An FPGA Cached Sparse Matrix Vector Product (SpMV) for Unstructured Computational Fluid Dynamics Simulations. Guillermo Oyarzun, Daniel Peyrolon, Carlos Alvarez, Xavier Martorell, arXiv, 2021 [DOI] [pdf]
  • Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs. Alberto Parravicini, Luca Giuseppe Cellamare, Marco Siracusa, Marco D. Santambrogio, Design Automation Conference (DAC), 2021 [DOI] [pdf] [code]
  • Towards High-Bandwidth-Utilization SpMV on FPGAs via Partial Vector Duplication. Bowen Liu, Dajiang Liu, Asia and South Pacific Design Automation Conference (ASP-DAC), 2023 [DOI] [pdf]
  • FPGA-Based HPC Accelerators: An Evaluation on Performance and Energy Efficiency. Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, Concurrency and Computation: Practice and Experience (CCPE), 2022 [DOI]
  • Optimizing the Performance of the Sparse Matrix–Vector Multiplication Kernel in FPGA Guided by the Roofline Model. Federico Favaro, Ernesto Dufrechou, Juan P. Oliver, Pablo Ezzatti, Micromachines, 2023 [DOI]
  • Hardware Acceleration of SpMV Multiplier for Deep Learning. Mahadurkar Mahesh, Nalesh Sivanandan, Kala S, International Symposium on VLSI Design and Test (VDAT), 2021 [DOI] [pdf]

Processing in Memory

  • SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, IEEE Symposium on High-Performance Computer Architecture (HPCA), 2021 [DOI] [pdf]
  • ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast. Weiyi Sun, Zhaoshi Li, Shouyi Yin, Shaojun Wei, Leibo Liu, Annual International Symposium on Computer Architecture (ISCA), 2021 [DOI] [pdf]
  • SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures. Christina Giannoula, Ivan Fernandez, Juan Gómez Luna, Nectarios Koziris, Georgios Goumas, Onur Mutlu, Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2022 [DOI] [code]

Heterogeneous Platform

  • Architecture- and Workload- Aware Heterogeneous Algorithms for Sparse Matrix Vector Multiplication. Sivaramakrishna Bharadwaj Indarapu, Manoj Maramreddy, Kishore Kothapalli, Proceedings of the 7th ACM India Computing Conference (COMPUTE), 2014 [DOI]
  • Heterogeneous Sparse Matrix Computations on Hybrid GPU/CPU Platforms. Valeria Cardellini, Alessandro Fanfarillo, Salvatore Filippone, International Conference on Parallel Computing (ICPC), 2013 [DOI] [pdf]
  • Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs. Wangdong Yang, Kenli Li, Zeyao Mo, Keqin Li, IEEE Transactions on Computers (ITC), 2015 [DOI] [pdf]
  • A Hybrid Computing Method of SpMV on CPU–GPU Heterogeneous Computing Systems. Wangdong Yang, Kenli Li, Keqin Li, Journal of Parallel and Distributed Computing (JPDC), 2017 [DOI] [pdf]
  • Sparse Matrix Partitioning for Optimizing SpMV on CPU-GPU Heterogeneous Platforms. Akrem Benatia, Weixing Ji, Yizhuo Wang, Feng Shi, The International Journal of High Performance Computing Applications (IJHPCA), 2019 [DOI]
  • A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems. Tracy D. Braun, Howard Jay Siegel, Noah Beck, Ladislau L. Bölöni, Muthucumaru Maheswaran, Albert I. Reuther, James P. Robertson, Mitchell D. Theys, Bin Yao, Debra Hensgen, Richard F. Freund, Journal of Parallel and Distributed Computing (JPDC), 2001 [DOI] [pdf]
  • Speculative Segmented Sum for Sparse Matrix-vector Multiplication on Heterogeneous Processors. Weifeng Liu, Brian Vinter, Parallel Computing (PC), 2015 [DOI] [pdf]
  • Efficient Sparse Matrix-Vector Multiplication on x86-Based Many-Core Processors. Xing Liu, Mikhail Smelyanskiy, Edmond Chow, Pradeep Dubey, Proceedings of the 27th international ACM conference on International conference on supercomputing (ICS), 2013 [DOI] [pdf]
  • An Efficient SIMD Compression Format for Sparse Matrix-Vector Multiplication. Xinhai Chen, Peizhen Xie, Lihua Chi, Jie Liu, Chunye Gong, Concurrency and Computation: Practice and Experience (CCPE), 2018 [DOI]
  • Optimizing and Auto-Tuning Scale-Free Sparse Matrix-Vector Multiplication on Intel Xeon Phi. Wai Teng Tang, Ruizhe Zhao, Mian Lu, Yun Liang, Huynh Phung Huynh, Xibai Li, International Symposium on Code Generation and Optimization (CGO), 2015 [DOI] [pdf]
  • CVR: Efficient Vectorization of SpMV on x86 Processors. Biwei Xie, Jianfeng Zhan, Xu Liu, Wanling Gao, Zhen Jia, Xiwen He, Lixin Zhang, Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO), 2018 [DOI] [pdf] [code]
  • VBSF: a New Storage Format for SIMD Sparse Matrix–Vector Multiplication on Modern Processors. Yishui Li, Peizhen Xie, Xinhai Chen, Jie Liu, Bo Yang, Shengguo Li, Chunye Gong, Xinbiao Gan, Han Xu, The Journal of Supercomputing (JS), 2020 [DOI]
  • Towards Efficient SpMV on Sunway Manycore Architectures. Changxi Liu, Biwei Xie, Xin Liu, Wei Xue, Hailong Yang, Xu Liu, Proceedings of the 2018 International Conference on Supercomputing (ICS), 2018 [DOI]
  • hpSpMV: A Heterogeneous Parallel Computing Scheme for SpMV on the Sunway TaihuLight Supercomputer. Yuedan Chen, Guoqing Xiao, Zheng Xiao, Wangdong Yang, IEEE International Conference on High Performance Computing and Communications (HPCC), 2019 [DOI] [pdf]
  • tpSpMV: A Two-Phase Large-Scale Sparse Matrix-Vector Multiplication Kernel for Manycore Architectures. Yuedan Chen, Guoqing Xiao, Fan Wu, Zhuo Tang, Keqin Li, Information Sciences (IS), 2020 [DOI] [pdf]
  • ahSpMV: An Autotuning Hybrid Computing Scheme for SpMV on the Sunway Architecture. Guoqing Xiao, Yuedan Chen, Chubo Liu, Xu Zhou, IEEE Internet of Things Journal (IOT), 2020 [DOI] [pdf]
  • CASpMV: A Customized and Accelerative SpMV Framework for the Sunway TaihuLight. Guoqing Xiao, Kenli Li, Yuedan Chen, Wangquan He, Albert Y. Zomaya, Tao Li, IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021 [DOI] [pdf]
  • HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors. Wenxuan Li, Helin Cheng, Zhengyang Lu, Yuechen Lu, Weifeng Liu, IEEE International Conference on Cluster Computing (CLUSTER), 2023 [DOI] [pdf]

Distributed Platform

  • SMVP Distribution Using Hypergraph Model and S-GBNZ Algorithm. Ichrak Mehrez, Olfa Hamdi-Larbi, International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2013 [DOI] [pdf]
  • Balancing Computation and Communication in Distributed Sparse Matrix-Vector Multiplication. Hongli Mi, Xiangrui Yu, Xiaosong Yu, Shuangyuan Wu, Weifeng Liu, IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID), 2023 [DOI] [pdf]
  • Scalability of Hybrid SpMV with Hypergraph Partitioning and Vertex Delegation for Communication Avoidance. Brian A. Page, Peter M. Kogge, International Conference on High Performance Computing & Simulation (HPCS), 2020 [DOI] [pdf]
  • HYPE: Massive Hypergraph Partitioning with Neighborhood Expansion. Christian Mayer, Ruben Mayer, Sukanya Bhowmik, Lukas Epple, Kurt Rothermel, IEEE International Conference on Big Data (BigData), 2018 [DOI] [pdf] [code]
  • A Jacobi_PCG Solver for Sparse Linear Systems on Multi-GPU Cluster. Shaozhong Lin, Zhiqiang Xie, The Journal of Supercomputing (JS), 2017 [DOI]
  • Parallel Sparse Linear Solver with GMRES Method Using Minimization Techniques of Communications for GPU Clusters. Lilia Ziane Khodja, Raphaël Couturier, Arnaud Giersch, Jacques M. Bahi, The Journal of Supercomputing (JS), 2014 [DOI]
  • Performance Analysis of Multicore and Multinodal Implementation of SpMV Operation. Beata Bylina, Jarosław Bylina, Przemysław Stpiczyński, Dominik Szałkowski, Federated Conference on Computer Science and Information Systems (FedCSIS), 2014 [DOI] [pdf]
  • Adaptive Runtime Tuning of Parallel Sparse Matrix-Vector Multiplication on Distributed Memory Systems. Seyong Lee, Rudolf Eigenmann, Proceedings of the 22nd annual international conference on Supercomputing (ICS), 2008 [DOI] [pdf]
  • ZAKI: A Smart Method and Tool for Automatic Performance Optimization of Parallel SpMV Computations on Distributed Memory Machines. Sardar Usman, Rashid Mehmood, Iyad Katib, Aiiad Albeshri, Saleh M. Altowaijri, Mobile Networks and Applications (MONET), 2019 [DOI]
  • ZAKI+: A Machine Learning Based Process Mapping Tool for SpMV Computations on Distributed Memory Architectures. Sardar Usman, Rashid Mehmood, Iyad Katib, Aiiad Albeshri, IEEE Access, 2019 [DOI] [pdf]
  • Developing a Multi-GPU-Enabled Preconditioned GMRES with Inexact Triangular Solves for Block Sparse Matrices. Wenpeng Ma, Yiwen Hu, Wu Yuan, Xiazhen Liu, Mathematical Problems in Engineering (MPE), 2021 [DOI] [pdf]

Sparse Libraries

  • CUSP
  • cuSPARSE
  • MAGMA
  • GINKGO
  • hiSPARSE
  • MKL
