ICS 2021: Virtual Event, USA
- Huiyang Zhou, Jose Moreira, Frank Mueller, Yoav Etsion:
ICS '21: 2021 International Conference on Supercomputing, Virtual Event, USA, June 14-17, 2021. ACM 2021, ISBN 978-1-4503-8335-6
Loop optimizations
- Brandon Neth, Thomas R. W. Scogland, Bronis R. de Supinski, Michelle Mills Strout:
Inter-loop optimization in RAJA using loop chains. 1-12
- Khaled Abdelaal, Martin Kong:
Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. 13-26
- Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, Uday Bondhugula:
A practical tile size selection model for affine loop nests. 27-39
Program analysis and benchmarking
- Wenwen Wang, Pei-Hung Lin:
Does it matter?: OMPSanitizer: an impact analyzer of reported data races in OpenMP programs. 40-51
- Xin Zhao, Jin Zhou, Hui Guan, Wei Wang, Xu Liu, Tongping Liu:
NumaPerf: predictive NUMA profiling. 52-62
- Alexandros Nikolaos Ziogas, Tal Ben-Nun, Timo Schneider, Torsten Hoefler:
NPBench: a benchmarking suite for high-performance NumPy. 63-74
- Xiaofan Sun, Rajiv Gupta:
DSGEN: concolic testing GPU implementations of concurrent dynamic data structures. 75-87
Managing parallelism
- Seonmyeong Bak, Oscar R. Hernandez, Mark Gates, Piotr Luszczek, Vivek Sarkar:
Task-graph scheduling extensions for efficient synchronization and communication. 88-101
- Amirhossein Mirhosseini, Thomas F. Wenisch:
μSteal: a theory-backed framework for preemptive work and resource stealing in mixed-criticality microservices. 102-114
- Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong:
ThundeRiNG: generating multiple independent random number sequences on FPGAs. 115-126
Resilience and security
- Yujia Zhai, Elisabeth Giem, Quan Fan, Kai Zhao, Jinyang Liu, Zizhong Chen:
FT-BLAS: a high performance BLAS implementation with online fault tolerance. 127-138
- Shougang Yuan, Yan Solihin, Huiyang Zhou:
PSSM: achieving secure memory for GPUs with partitioned and sectored security metadata. 139-151
New architectures for HPC
- Yaoyang Zhou, Zihao Yu, Chuanqi Zhang, Yinan Xu, Huizhe Wang, Sa Wang, Ninghui Sun, Yungang Bao:
Omegaflow: a high-performance dependency-based architecture. 152-163
- Adrián Barredo, Adrià Armejach, Jonathan C. Beard, Miquel Moretó:
PLANAR: a programmable accelerator for near-memory data rearrangement. 164-176
- Markos Kynigos, Jose Antonio Pascual, Javier Navaridas, John Goodacre, Mikel Luján:
Power and energy efficient routing for Mach-Zehnder interferometer based photonic switches. 177-189
Exploiting non-volatile memory
- Jiawen Liu, Dong Li, Roberto Gioiosa, Jiajia Li:
Athena: high-performance sparse tensor contraction sequence on heterogeneous memory. 190-202
- Jie Ren, Jiaolin Luo, Ivy Bo Peng, Kai Wu, Dong Li:
Optimizing large-scale plasma simulations on persistent memory-based heterogeneous memory with effective data placement across memory hierarchy. 203-214
- Zhen Xie, Wenqian Dong, Jie Liu, Ivy Bo Peng, Yanbao Ma, Dong Li:
MD-HM: memoization-based molecular dynamics simulations on big memory system. 215-226
Machine learning
- Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, Dong Li:
Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators. 227-241
- Yuliana Zamora, Logan T. Ward, Ganesh Sivaraman, Ian T. Foster, Henry Hoffmann:
Proxima: accelerating the integration of machine learning in atomistic simulations. 242-253
- Gunduz Vehbi Demirci, Hakan Ferhatosmanoglu:
Partitioning sparse deep neural networks for scalable training and inference. 254-265
- Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao:
ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning. 266-278
- Rohan Baskar Prabhakar, Sachit Kuhar, Rohit Agrawal, Christopher J. Hughes, Christopher W. Fletcher:
SumMerge: an efficient algorithm and implementation for weight repetition-aware DNN inference. 279-290
- MohammadHossein Olyaiy, Christopher Ng, Mieszko Lis:
Accelerating DNNs inference with predictive layer fusion. 291-303
- Siling Yang, Weijian Chen, Xuechen Zhang, Shuibing He, Yanlong Yin, Xian-He Sun:
AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator. 304-315
Data locality and vectorization
- Peng Chen, Mohamed Wahib, Xiao Wang, Shin'ichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka:
Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations. 316-328
- Doru-Thom Popovici, Andrew Canning, Zhengji Zhao, Lin-Wang Wang, John Shalf:
A systematic approach to improving data locality across Fourier transforms and linear algebra operations. 329-341
Algorithms adapting to high-performance networks
- Archit Patke, Saurabh Jha, Haoran Qiu, Jim M. Brandt, Ann C. Gentile, Joe Greenseid, Zbigniew Kalbarczyk, Ravishankar K. Iyer:
Delay sensitivity-driven congestion mitigation for HPC systems. 342-353
- Xiaodong Yu, Tekin Bicer, Rajkumar Kettimuthu, Ian T. Foster:
Topology-aware optimizations for multi-GPU ptychographic image reconstruction. 354-366
Graph data structures and algorithms
- Xuan Huang, Pavol Klacansky, Steve Petruzza, Attila Gyulassy, Peer-Timo Bremer, Valerio Pascucci:
Distributed merge forest: a new fast and scalable approach for topological analysis at scale. 367-377
- Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, Keshav Pingali:
Sandslash: a two-level framework for efficient graph pattern mining. 378-391
Parallelization constrained by data dependencies
- Akshay Bhosale, Rudolf Eigenmann:
On the automatic parallelization of subscripted subscript patterns using array property analysis. 392-403
- Ahmed E. Helal, Jan Laukemann, Fabio Checconi, Jesmin Jahan Tithi, Teresa M. Ranadive, Fabrizio Petrini, Jeewhan Choi:
ALTO: adaptive linearized storage of sparse tensors. 404-416
- Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, Depei Qian:
An optimized tensor completion library for multiple GPUs. 417-430
- Oguz Selvitopi, Benjamin Brock, Israt Nisa, Alok Tripathy, Katherine A. Yelick, Aydin Buluç:
Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication. 431-442
Best paper candidates
- Chen Zhang, Zeyu Song, Haojie Wang, Kaiyuan Rong, Jidong Zhai:
HyQuas: hybrid partitioner based quantum circuit simulation system on GPU. 443-454
- Thomas Randall, Tyler N. Allen, Rong Ge:
FULL-W2V: fully exploiting data reuse for W2V on GPU-accelerated systems. 455-466
- Nader Al Awar, Steven Zhu, George Biros, Milos Gligoric:
A performance portability framework for Python. 467-478
- Mazen Al-Wadi, Aziz Mohaisen, Amro Awad:
ProMT: optimizing integrity tree updates for write-intensive pages in secure NVMs. 479-490