{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T14:42:58Z","timestamp":1775054578448,"version":"3.50.1"},"reference-count":26,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2019,5,27]],"date-time":"2019-05-27T00:00:00Z","timestamp":1558915200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61801469"],"award-info":[{"award-number":["61801469"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Convolutional neural networks (CNNs) have achieved great success in image processing. However, the heavy computational burden it imposes makes it difficult for use in embedded applications that have limited power consumption and performance. Although there are many fast convolution algorithms that can reduce the computational complexity, they increase the difficulty of practical implementation. To overcome these difficulties, this paper proposes several convolution accelerator designs using fast algorithms. The designs are based on the field programmable gate array (FPGA) and display a better balance between the digital signal processor (DSP) and the logic resource, while also requiring lower power consumption. The implementation results show that the power consumption of the accelerator design based on the Strassen\u2013Winograd algorithm is 21.3% less than that of conventional accelerators.<\/jats:p>","DOI":"10.3390\/a12050112","type":"journal-article","created":{"date-parts":[[2019,5,27]],"date-time":"2019-05-27T11:19:27Z","timestamp":1558955967000},"page":"112","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Convolution Accelerator Designs Using Fast Algorithms"],"prefix":"10.3390","volume":"12","author":[{"given":"Yulin","family":"Zhao","sequence":"first","affiliation":[{"name":"Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Information Technology for Autonomous Underwater Vehicles, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Donghui","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Information Technology for Autonomous Underwater Vehicles, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0563-9079","authenticated-orcid":false,"given":"Leiou","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Information Technology for Autonomous Underwater Vehicles, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,5,27]]},"reference":[{"key":"ref_1","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv, Available online: https:\/\/arxiv.org\/pdf\/1409.1556.pdf."},{"key":"ref_2","unstructured":"Szegedy, C., Liu, W., and Jia, Y. (2014). Going deeper with convolutions. arXiv, Available online: https:\/\/arxiv.org\/pdf\/1409.4842.pdf."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"11215","DOI":"10.1109\/ACCESS.2018.2798799","article-title":"Exploiting Convolutional Neural Networks With Deeply Local Description For Remote Sensing Image Classification","volume":"6","author":"Liu","year":"2018","journal-title":"IEEE Access"},{"key":"ref_4","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). ImageNet classification with deep convolutional neural networks. Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_5","unstructured":"Le, N.M., Granger, E., and Kiran, M. (December, January 28). A comparison of CNN-based face and head detectors for real-time video surveillance applications. Proceedings of the Seventh International Conference on Image Processing Theory, Tools and Applications, Montreal, QC, Canada."},{"key":"ref_6","unstructured":"Ren, S., He, K., and Girshick, R. (2014, January 8\u201313). Faster R-CNN: Towards real-time object detection with region proposal networks. Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_7","unstructured":"Denil, M., Shakibi, B., and Dinh, L. (2013, January 5\u201310). Predicting Parameters in Deep Learning. Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_8","unstructured":"Han, S., Pool, J., and Tran, J. (2015, January 9\u201312). Learning Both Weights and Connections for Efficient Neural Networks. Proceedings of the International Conference on Neural Information Processing Systems, Istanbul, Turkey."},{"key":"ref_9","unstructured":"Guo, Y., Yao, A., and Chen, Y. (2016). Dynamic Network Surgery for Efficient DNNs. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_10","unstructured":"Colangelo, P., Nasiri, N., Nurvitadhi, E., Mishra, A., Margala, M., and Nealis, K. (May, January 29). Exploration of Low Numeric Precision Deep Learning Inference Using Intel\u00ae FPGAs. Proceedings of the IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines, Boulder, CO, USA."},{"key":"ref_11","unstructured":"Gupta, S., Agrawal, A., Gopalakrishnan, K., and Narayanan, P. (2015, January 7\u20139). Deep Learning with Limited Numerical Precision. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Rastegari, M., Ordonez, V., and Redmon, J. (2016, January 11\u201314). XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"ref_13","unstructured":"Zhu, C., Han, S., and Mao, H. (2017, January 24\u201326). Trained Ternary Quantization. Proceedings of the International Conference on Learning Representations, Toulon, France."},{"key":"ref_14","unstructured":"Vasilache, N., Johnson, J., and Mathieu, M. (2015, January 7\u20139). Fast convolutional nets with fbfft: A GPU performance evaluation. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Qiu, J., Wang, J., and Yao, S. (2016, January 21\u201323). Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays ACM, Monterey, CA, USA.","DOI":"10.1145\/2847263.2847265"},{"key":"ref_16","unstructured":"Courbariaux, M., Bengio, Y., and David, J. (2015). Training deep neural networks with low precision multiplications. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.neucom.2017.03.044","article-title":"High accuracy FPGA activation function implementation for neural networks","volume":"247","author":"Hajduk","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_18","first-page":"1415","article-title":"A CNN Accelerator on FPGA Using Depthwise Separable Convolution","volume":"65","author":"Bai","year":"2018","journal-title":"IEEE Trans. Circuits Syst. II Express Briefs"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cong, J., and Xiao, B. (2014, January 15\u201319). Minimizing Computation in Convolutional Neural Networks. Proceedings of the Artificial Neural Networks and Machine Learning\u2014ICANN 2014, Hamburg, Germany.","DOI":"10.1007\/978-3-319-11179-7_36"},{"key":"ref_20","unstructured":"Lavin, A., and Gray, S. (July, January 26). Fast Algorithms for Convolutional Neural Networks. Proceedings of the Computer Vision and Pattern Recognition, Caesars Palace, NV, USA."},{"key":"ref_21","unstructured":"Lu, L., Liang, Y., Xiao, Q., and Yan, S. (June, January 30). Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs. Proceedings of the IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines, Napa, CA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Wang, D., Wang, L., and Liu, P. (2018). A Faster Algorithm for Reducing the Computational Complexity of Convolutional Neural Networks. Algorithms, 11.","DOI":"10.3390\/a11100159"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1016\/S0747-7171(08)80013-2","article-title":"Matrix multiplication via arithmetic progressions","volume":"9","author":"Coppersmith","year":"1990","journal-title":"J. Symb. Comput."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lai, P., Arafat, H., Elango, V., and Sadayappan, P. (2013, January 18\u201321). Accelerating Strassen-Winograd\u2019s matrix multiplication algorithm on GPUs. Proceedings of the International Conference on High Performance Computing, data, and analytics, Karnataka, India.","DOI":"10.1109\/HiPC.2013.6799109"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1007\/BF02165411","article-title":"Gaussian elimination is not optimal","volume":"13","author":"Strassen","year":"1969","journal-title":"Numer. Math."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Winograd, S. (1980). Arithmetic Complexity of Computations, SIAM.","DOI":"10.1137\/1.9781611970364"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/12\/5\/112\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:55:23Z","timestamp":1760187323000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/12\/5\/112"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,27]]},"references-count":26,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2019,5]]}},"alternative-id":["a12050112"],"URL":"https:\/\/doi.org\/10.3390\/a12050112","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,5,27]]}}}