35th IPDPS 2021: Portland, OR, USA
- 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021, Portland, OR, USA, May 17-21, 2021. IEEE 2021, ISBN 978-1-6654-4066-0
- Ilkay Altintas:
A Tale of Two C's: Convergence and Composability. 1
- Alessio Netti, Daniele Tafani, Michael Ott, Martin Schulz:
Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring Data. 2-12
- Jinyoung Choi, Sergey Blagodurov, Hung-Wei Tseng:
Dancing in the Dark: Profiling for Tiered Memory. 13-22
- Marcus Ritter, Alexander Geiß, Johannes Wehrstein, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, Felix Wolf:
Noise-Resilient Empirical Performance Modeling with Deep Neural Networks. 23-34
- Srinivasan Ramesh, Allen D. Malony, Philip H. Carns, Robert B. Ross, Matthieu Dorier, Jérome Soumagne, Shane Snyder:
SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services. 35-45
- Edward Hutter, Edgar Solomonik:
Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths. 46-57
- Thomas B. Rolinger, Christopher D. Krieger, Alan Sussman:
Optimizing Memory-Compute Colocation for Irregular Applications on a Migratory Thread Architecture. 58-67
- Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, Guangming Tan:
TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs. 68-78
- Qinglei Cao, Yu Pei, Kadir Akbudak, George Bosilca, Hatem Ltaief, David E. Keyes, Jack J. Dongarra:
Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems. 79-89
- Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad:
Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale. 90-100
- Weiling Yang, Jianbin Fang, Dezun Dong:
Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures. 101-110
- Alberto Parravicini, Arnaud Delamare, Marco Arnaboldi, Marco D. Santambrogio:
DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime. 111-120
- Zhuoran Ji, Cho-Li Wang:
CTXBack: Enabling Low Latency GPU Context Switching via Context Flashback. 121-130
- Nelson Mimura Gonzalez, Tonia Elengikal:
Transparent I/O-Aware GPU Virtualization for Efficient Resource Consolidation. 131-140
- Tyler N. Allen, Rong Ge:
Demystifying GPU UVM Cost with Deep Runtime and Workload Analysis. 141-150
- Minjia Zhang, Zehua Hu, Mingqin Li:
DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU Architecture. 151-161
- Suzhen Wu, Chunfeng Du, Haijun Li, Hong Jiang, Zhirong Shen, Bo Mao:
CAGC: A Content-aware Garbage Collection Scheme for Ultra-Low Latency Flash-based SSDs. 162-171
- Shashank Gugnani, Tianxi Li, Xiaoyi Lu:
NVMe-CR: A Scalable Ephemeral Storage Runtime for Checkpoint/Restart with NVMe-over-Fabrics. 172-181
- Qinzhe Wu, Jonathan Beard, Ashen Ekanayake, Andreas Gerstlauer, Lizy K. John:
Virtual-Link: A Scalable Multi-Producer Multi-Consumer Message Queue Architecture for Cross-Core Communication. 182-191
- Vito Giovanni Castellana, Antonino Tumeo, Fabrizio Ferrandi:
High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic Controllers. 192-202
- Ryan E. Grant, Michael J. Levenhagen, Matthew G. F. Dosanjh, Patrick M. Widener:
RVMA: Remote Virtual Memory Access. 203-212
- Michael S. Gilbert, Seher Acer, Erik G. Boman, Kamesh Madduri, Sivasankaran Rajamanickam:
Performance-Portable Graph Coarsening for Efficient Multilevel Graph Analysis. 213-222
- John Augustine, Kishore Kothapalli, Gopal Pandurangan:
Efficient Distributed Algorithms in the k-machine model via PRAM Simulations. 223-232
- Adam Polak, Adrian Siwiec, Michal Stobierski:
Euler Meets GPU: Practical Graph Algorithms with Theoretical Guarantees. 233-244
- Kiran Kumar Matam, Hanieh Hashemi, Murali Annavaram:
MultiLogVC: Efficient Out-of-Core Graph Processing Framework for Flash Storage. 245-255
- Md. Khaledur Rahman, Majedul Haque Sujon, Ariful Azad:
FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks. 256-266
- Anwesha Das, Frank Mueller, Barry Rountree:
Systemic Assessment of Node Failures in HPC Production Platforms. 267-276
- Masoud Gholami, Florian Schintke:
Combining XOR and Partner Checkpointing for Resilient Multilevel Checkpoint/Restart. 277-288
- Fernando Fernandes dos Santos, Siva Kumar Sastry Hari, Pedro Martins Basso, Luigi Carro, Paolo Rech:
Demystifying GPU Reliability: Comparing and Combining Beam Experiments, Fault Simulation, and Profiling. 289-298
- Alvaro Frank, Manuel Baumgartner, Reza Salkhordeh, André Brinkmann:
Improving checkpointing intervals by considering individual job failure probabilities. 299-309
- Nicholas Gordon, John R. Lange:
Covirt: Lightweight Fault Isolation and Resource Protection for Co-Kernels. 310-319
- Daniel Curtis Wilson, Siddhartha Jana, Aniruddha Marathe, Stephanie Brink, Christopher M. Cantalupo, Diana R. Guttman, Brad Geltz, Lowren H. Lawson, Asma H. Al-Rawi, Ali Mohammad, Fuat Keceli, Federico Ardanaz, Jonathan M. Eastep, Ayse K. Coskun:
Introducing Application Awareness Into a Unified Power Management Stack. 320-329
- Jinsu Park, Seongbeom Park, Myeonggyun Han, Woongki Baek:
PALM: Progress- and Locality-Aware Adaptive Task Migration for Efficient Thread Packing. 330-339
- Sudheer Chunduri, Kevin Harms, Taylor L. Groves, Peter Mendygral, Justs Zarins, Michèle Weiland, Yasaman Ghadar:
Performance Evaluation of Adaptive Routing on Dragonfly-based Production Systems. 340-349
- Thaleia Dimitra Doudali, Daniel Zahka, Ada Gavrilovska:
Cori: Dancing to the Right Beat of Periodic Data Movements over Hybrid Memory Systems. 350-359
- Florian Schmaus, Nicolas Pfeiffer, Wolfgang Schröder-Preikschat, Timo Hönig, Jörg Nolte:
Nowa: A Wait-Free Continuation-Stealing Concurrency Platform. 360-371
- Mehran Sadeghi Lahijani, Abu Naser, Cong Wu, Mohsen Gavahi, Viet Tung Hoang, Zhi Wang, Xin Yuan:
Efficient Algorithms for Encrypted All-gather Operation. 372-381
- Otávio Augusto de Oliviera Souza, Olga Goussevskaia, Stefan Schmid:
CBNet: Minimizing Adjustments in Concurrent Demand-Aware Tree Networks. 382-391
- Yang Xia, Peng Jiang, Gagan Agrawal, Rajiv Ramnath:
Scaling Sparse Matrix Multiplication on CPU-GPU Nodes. 392-401
- Huizhang Luo, Junqi Wang, Qing Liu, Jieyang Chen, Scott Klasky, Norbert Podhorszki:
zMesh: Exploring Application Characteristics to Improve Lossy Compression Ratio for Adaptive Mesh Refinement. 402-411
- Linjian Ma, Edgar Solomonik:
Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree. 412-421
- Lorena A. Barba:
12 Ways to Fool the Masses with Irreproducible Results. 422
- Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, Philippas Tsigas:
Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence. 423-432
- Xinyuan Li, Huang Ye, Jian Zhang:
Redesigning Peridigm on SIMT Accelerators for High-performance Peridynamics Simulations. 433-443
- Q. Zhou, C. Chu, N. S. Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K. Panda:
Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters*. 444-453
- Xi Wang, John D. Leidel, Brody Williams, Alan Ehret, Miguel Mark, Michel A. Kinsy, Yong Chen:
xBGAS: A Global Address Space Extension on RISC-V for High Performance Computing. 454-463
- Lechen Yu, Joachim Protze, Oscar R. Hernandez, Vivek Sarkar:
ARBALEST: Dynamic Detection of Data Mapping Issues in Heterogeneous OpenMP Applications. 464-474
- Jan Hückelheim, Johannes Doerfert:
Spray: Sparse Reductions of Arrays in OPENMP. 475-484
- Larisa Stoltzfus, Brian Hamilton, Michel Steuwer, Lu Li, Christophe Dubach:
Code Generation for Room Acoustics Simulations with Complex Boundary Conditions. 485-496
- George Bisbas, Fabio Luporini, Mathias Louboutin, Rhodri Nelson, Gerard J. Gorman, Paul H. J. Kelly:
Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources. 497-506
- Louis Pisha, Lukasz Ligowski:
Accelerating non-power-of-2 size Fourier transforms with GPU Tensor Cores. 507-516
- Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine A. Yelick, Aydin Buluç:
Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly. 517-526
- Israt Nisa, Prashant Pandey, Marquita Ellis, Leonid Oliker, Aydin Buluç, Katherine A. Yelick:
Distributed-Memory k-mer Counting on GPUs. 527-536
- Thomas Hérault, Yves Robert, George Bosilca, Robert J. Harrison, Cannada A. Lewis, Edward F. Valeev, Jack J. Dongarra:
Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure. 537-546
- Will Usher, Xuan Huang, Steve Petruzza, Sidharth Kumar, Stuart R. Slattery, Samuel Temple Reeve, Feng Wang, Chris R. Johnson, Valerio Pascucci:
Adaptive Spatially Aware I/O for Multiresolution Particle Data Layouts. 547-556
- Bing Xie, Zilong Tan, Philip H. Carns, Jeffrey S. Chase, Kevin Harms, Jay F. Lofstead, Sarp Oral, Sudharshan S. Vazhkudai, Feiyi Wang:
Interpreting Write Performance of Supercomputer I/O Systems with Regression Models. 557-566
- Jiwoo Bang, Chungyong Kim, Sunggon Kim, Qichen Chen, Cheongjun Lee, Eun-Kyu Byun, Jaehwan Lee, Hyeonsang Eom:
Finer-LRU: A Scalable Page Management Scheme for HPC Manycore Architectures. 567-576
- Jean Luca Bez, Alberto Miranda, Ramon Nou, Francieli Zanon Boito, Toni Cortes, Philippe O. A. Navaux:
Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms. 577-586
- Aaron Handleman, Arthur G. Rattew, I-Ting Angelina Lee, Tao B. Schardl:
A Hybrid Scheduling Scheme for Parallel Loops. 587-598
- Hao Lan, Li Chen, Baochun Li:
EAGLE: Expedited Device Placement with Automatic Grouping for Large Models. 599-608
- Qiming Zheng, Quan Chen, Kaihao Bai, Huifeng Guo, Yong Gao, Xiuqiang He, Minyi Guo:
BiPS: Hotness-aware Bi-tier Parameter Synchronization for Recommendation Models. 609-618
- Yuke Wang, Boyuan Feng, Yufei Ding:
DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions. 619-628
- Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Adé Jacobs, Dhabaleswar K. Panda, Brian Van Essen:
SUPER: SUb-Graph Parallelism for TransformERs. 629-638
- Dustin Machi, Parantapa Bhattacharya, Stefan Hoops, Jiangzhuo Chen, Henning S. Mortveit, Srinivasan Venkatramanan, Bryan L. Lewis, Mandy L. Wilson, Arindam Fadikar, Tom Maiden, Christopher L. Barrett, Madhav V. Marathe:
Scalable Epidemiological Workflows to Support COVID-19 Planning and Response. 639-650
- Yubo Qin, Ivan Rodero, Manish Parashar:
Facilitating Data Discovery for Large-scale Science Facilities using Knowledge Networks. 651-660
- Laércio Lima Pilla:
Optimal Task Assignment for Heterogeneous Federated Learning Devices. 661-670
- Zhipin Gu, Yuexiang Yang:
Detecting Malicious Model Updates from Federated Learning on Conditional Variational Autoencoder. 671-680
- Guy E. Blelloch:
Is Asymptotic Cost Analysis Useful in Developing Practical Parallel Algorithms. 681
- Jason Cong:
From Parallelization to Customization - Challenges and Opportunities. 682
- Yongseok Soh, Patrick Flick, Xing Liu, Shaden Smith, Fabio Checconi, Fabrizio Petrini, Jee W. Choi:
High Performance Streaming Tensor Decomposition. 683-692
- Le Li, Shigeyuki Sato, Qiheng Liu, Kenjiro Taura:
Plex: Scaling Parallel Lexing with Backtrack-Free Prescanning. 693-702
- Daniel Mlakar, Martin Winter, Mathias Parger, Markus Steinberger:
Speculative Parallel Reverse Cuthill-McKee Reordering on Multi- and Many-core Architectures. 703-713
- Brendan L. West, Jeffrey A. Fessler, Thomas F. Wenisch:
Jigsaw: A Slice-and-Dice Approach to Non-uniform FFT Acceleration for MRI Image Reconstruction. 714-723
- Bo Peng, Jiayu Li, Selahattin Akkas, Takuya Araki, Ohno Yoshiyuki, Judy Qiu:
Rank Position Forecasting in Car Racing. 724-733
- Yuan Xu, Tianwei Zhang, Jimin Han, Sa Wang, Yungang Bao:
Towards Practical Cloud Offloading for Low-cost Ground Vehicle Workloads. 734-745
- Loïck Bonniot, Christoph Neumann, François Taïani:
Towards Internet-Scale Convolutional Root-Cause Analysis with DIAGNET. 746-755
- Jananie Jarachanthan, Li Chen, Fei Xu, Bo Li:
Astra: Autonomous Serverless Analytics with Cost-Efficiency and QoS-Awareness. 756-765
- Anne Benoit, Redouane Elghazi, Yves Robert:
Max-Stretch Minimization on an Edge-Cloud Platform. 766-775
- Janick Edinger, Martin Breitbach, Niklas Gabrisch, Dominik Schäfer, Christian Becker, Amr Rizk:
Decentralized Low-Latency Task Scheduling for Ad-Hoc Computing. 776-785
- Tim Shaffer, Zhuozhao Li, Ben Tovar, Yadu N. Babuji, T. J. Dasso, Zoe Surma, Kyle Chard, Ian T. Foster, Douglas Thain:
Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications. 786-796
- Xiaofeng Hou, Chao Li, Jiacheng Liu, Lu Zhang, Shaolei Ren, Jingwen Leng, Quan Chen, Minyi Guo:
AlphaR: Learning-Powered Resource Management for Irregular, Dynamic Microservice Graph. 797-806
- Yuping Fan, Zhiling Lan, J. Taylor Childers, Paul Rich, William E. Allcock, Michael E. Papka:
Deep Reinforcement Agent for Scheduling in HPC. 807-816
- Bin Xu, Jianzhong Huang, Qiang Cao, Xiao Qin, Ping Xie:
F-Write: Fast RDMA-supported Writes in Erasure-coded In-memory Clusters. 817-826
- Sijie Wu, Hanhua Chen, Yonghui Wang, Hai Jin:
Argus: Efficient Job Scheduling in RDMA-assisted Big Data Processing. 827-836
- Sajal Dash, Qais Al-Hajri, Wu-chun Feng, Harold R. Garner, Ramu Anandakrishnan:
Scaling Out a Combinatorial Algorithm for Discovering Carcinogenic Gene Combinations to Thousands of GPUs. 837-846
- Zihao Wang, Xiaohua Wan, Zhiyong Liu, Qianshuo Fan, Fa Zhang, Guangming Tan:
A Multi-GPU Design for Large Size Cryo-EM 3D Reconstruction. 847-858
- Jieyang Chen, Lipeng Wan, Xin Liang, Ben Whitney, Qing Liu, David Pugmire, Nicholas Thompson, Jong Youl Choi, Matthew Wolf, Todd S. Munson, Ian T. Foster, Scott Klasky:
Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUs. 859-868
- Long Qu, Loris Lucido, Marie Bonnasse-Gahot, Pascal Vezolle, Diego Klahr:
Extremely Fast and Energy Efficient One-way Wave Equation Migration on GPU-based heterogeneous architecture. 869-880
- Jiannan Tian, Cody Rivera, Sheng Di, Jieyang Chen, Xin Liang, Dingwen Tao, Franck Cappello:
Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU Architectures. 881-891
- Zhehan Lin, Hanchen Guo, Chentao Wu, Jie Li, Guangtao Xue, Minyi Guo:
Rack-Scaling: An efficient rack-based redistribution method to accelerate the scaling of cloud disk arrays. 892-901
- Xiaoyi Zhang, Feng Zhu, Shu Li, Kun Wang, Wei Xu, Dengcai Xu:
Optimizing Performance for Open-Channel SSDs in Cloud Storage System. 902-911
- Liang Zhang, Wenli Zheng, Chao Li, Yao Shen, Minyi Guo:
AuTraScale: An Automated and Transfer Learning Solution for Streaming System Auto-Scaling. 912-921
- Kishori M. Konwar, Wyatt Lloyd, Haonan Lu, Nancy A. Lynch:
SNOW Revisited: Understanding When Ideal READ Transactions Are Possible. 922-931
- Kaihua Fu, Wei Zhang, Quan Chen, Deze Zeng, Xin Peng, Wenli Zheng, Minyi Guo:
QoS-Aware and Resource Efficient Microservice Deployment in Cloud-Edge Continuum. 932-941
- Anisur Rahaman Molla, Kaushik Mondal, William K. Moses Jr.:
Byzantine Dispersion on Graphs. 942-951
- Pankaj Khanchandani, Roger Wattenhofer:
Byzantine Agreement with Unknown Participants and Failures. 952-961
- Abdullah T. Mughrabi, Mohannad Ibrahim, Gregory T. Byrd:
QPR: Quantizing PageRank with Coherent Shared Memory Accelerators. 962-972
- Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi:
Distributed Training of Embeddings using Graph Analytics. 973-983
- Joseph Renzullo, Westley Weimer, Stephanie Forrest:
Multiplicative Weights Algorithms for Parallel Automated Software Repair. 984-993
- Yun-Yong Ko, Kibong Choi, Jiwon Seo, Sang-Wook Kim:
An In-Depth Analysis of Distributed Training of Deep Neural Networks. 994-1003
- Masahiro Tanaka, Kenjiro Taura, Toshihiro Hanawa, Kentaro Torisawa:
Automatic Graph Partitioning for Very Large-scale Deep Learning. 1004-1013
- Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon Euhyun Moon, Sivasankaran Rajamanickam, Tushar Krishna:
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats. 1014-1024
- Venmugil Elango:
Pase: Parallelization Strategies for Efficient DNN Training. 1025-1034
- Horng-Ruey Huang, Ding-Yong Hong, Jan-Jan Wu, Pangfeng Liu, Wei-Chung Hsu:
Efficient Video Captioning on Heterogeneous System Architectures. 1035-1045
- George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Dilip Vasudevan, Anastasiia Butko:
SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC. 1046-1055
- Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka:
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? 1056-1065
- Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert:
Performance Analysis of Scientific Computing Workloads on General Purpose TEEs. 1066-1076
- Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis:
High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection. 1077-1086
- Kamalakkannan Kamalavasan, Gihan R. Mudalige, István Z. Reguly, Suhaib A. Fahmy:
High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers. 1087-1096