48th ISCA 2021: Virtual Event / Valencia, Spain
- 48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021, Virtual Event / Valencia, Spain, June 14-18, 2021. IEEE 2021, ISBN 978-1-6654-3333-4
- Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter C. Ma, Xiaoyu Ma, Thomas Norrie, Nishant Patil, Sushma Prasad, Cliff Young, Zongwei Zhou, David A. Patterson:
Ten Lessons From Three Generations Shaped Google's TPUv4i : Industrial Product. 1-14
- Jun-Woo Jang, Sehwan Lee, Dongyoung Kim, Hyunsun Park, Ali Shafiee Ardestani, Yeongjae Choi, Channoh Kim, Yoojin Kim, Hyeongseok Yu, Hamzah Abdel-Aziz, Jun-Seok Park, Heonsoo Lee, Dongwoo Lee, Myeong Woo Kim, Hanwoong Jung, Heewoo Nam, Dongguen Lim, Seungwon Lee, Joon-Ho Song, Suknam Kwon, Joseph Hassoun, Sukhwan Lim, Changkyu Choi:
Sparsity-Aware and Re-configurable NPU Architecture for Samsung Flagship Mobile SoC. 15-28
- Brian W. Thompto, Dung Q. Nguyen, José E. Moreira, Ramon Bertran, Hans M. Jacobson, Richard J. Eickemeyer, Rahul M. Rao, Michael Goulet, Marcy Byers, Christopher J. Gonzalez, Karthik Swaminathan, Nagu R. Dhanwada, Silvia M. Müller, Andreas Wagner, Satish Kumar Sadasivam, Robert K. Montoye, William J. Starke, Christian G. Zoellin, Michael S. Floyd, Jeffrey Stuecheli, Nandhini Chandramoorthy, John-David Wellman, Alper Buyuktosunoglu, Matthias Pflanz, Balaram Sinharoy, Pradip Bose:
Energy Efficiency Boost in the AI-Infused POWER10 Processor. 29-42
- Suk Han Lee, Shinhaeng Kang, Jaehoon Lee, Hyeonsu Kim, Eojin Lee, Seungwoo Seo, Hosang Yoon, Seungwon Lee, Kyounghwan Lim, Hyunsung Shin, Jinhyun Kim, Seongil O, Anand Iyer, David Wang, Kyomin Sohn, Nam Sung Kim:
Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product. 43-56
- Samuel Naffziger, Noah Beck, Thomas Burd, Kevin Lepak, Gabriel H. Loh, Mahesh Subramony, Sean White:
Pioneering Chiplet Technology and Design for the AMD EPYC™ and Ryzen™ Processor Families : Industrial Product. 57-70
- Mainak Chaudhuri:
Zero Inclusion Victim: Isolating Core Caches from Inclusive Last-level Cache Evictions. 71-84
- Georgios Vavouliotis, Lluc Alvarez, Vasileios Karakostas, Konstantinos Nikas, Nectarios Koziris, Daniel A. Jiménez, Marc Casas:
Exploiting Page Table Locality for Agile TLB Prefetching. 85-98
- Alberto Ros, Alexandra Jimborean:
A Cost-Effective Entangling Prefetcher for Instructions. 99-111
- Yifan Yuan, Mohammad Alian, Yipeng Wang, Ren Wang, Ilia Kurakin, Charlie Tai, Nam Sung Kim:
Don't Forget the I/O When Allocating Your LLC. 112-125
- Nezam Rohbani, Sina Darabi, Hamid Sarbazi-Azad:
PF-DRAM: A Precharge-Free DRAM Structure. 126-138
- Harini Muthukrishnan, David W. Nellans, Daniel Lustig, Jeffrey A. Fessler, Thomas F. Wenisch:
Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers. 139-152
- Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Wang, Sanchari Sen, Jintao Zhang, Ankur Agrawal, Monodeep Kar, Shubham Jain, Alberto Mannari, Hoang Tran, Yulong Li, Eri Ogawa, Kazuaki Ishizaki, Hiroshi Inoue, Marcel Schaal, Mauricio J. Serrano, Jungwook Choi, Xiao Sun, Naigang Wang, Chia-Yu Chen, Allison Allain, James Bonanno, Nianzheng Cao, Robert Casatuta, Matthew Cohen, Bruce M. Fleischer, Michael Guillorn, Howard Haynie, Jinwook Jung, Mingu Kang, Kyu-Hyoun Kim, Siyu Koswatta, Sae Kyu Lee, Martin Lutz, Silvia M. Mueller, Jinwook Oh, Ashish Ranjan, Zhibin Ren, Scot Rider, Kerstin Schelm, Michael Scheuermann, Joel Silberman, Jie Yang, Vidhi Zalani, Xin Zhang, Ching Zhou, Matthew M. Ziegler, Vinay Shah, Moriyoshi Ohara, Pong-Fei Lu, Brian W. Curran, Sunil Shukla, Leland Chang, Kailash Gopalakrishnan:
RaPiD: AI Accelerator for Ultra-low Precision Training and Inference. 153-166
- Anant V. Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om Ji Omer, Avishaii Abuhatzera, Belliappa Kuttanna, Sreenivas Subramoney:
REDUCT: Keep it Close, Keep it Cool! : Efficient Scaling of DNN Inference on Multi-core CPUs with Near-Cache Compute. 167-180
- Jiayi Huang, Pritam Majumder, Sungkeun Kim, Abdullah Muzahid, Ki Hwan Yum, Eun Jung Kim:
Communication Algorithm-Architecture Co-Design for Distributed Deep Learning. 181-194
- Ajeya Naithani, Sam Ainsworth, Timothy M. Jones, Lieven Eeckhout:
Vector Runahead. 195-208
- Joao Mario Domingos, Nuno Neves, Nuno Roma, Pedro Tomás:
Unlimited Vector Extension with Data Streaming Support. 209-222
- Peng Sun, Giacomo Gabrielli, Timothy M. Jones:
Speculative Vectorisation with Selective Replay. 223-236
- Weiyi Sun, Zhaoshi Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast. 237-250
- Lingxi Wu, Rasool Sharifi, Marzieh Lenjani, Kevin Skadron, Ashish Venkat:
Sieve: Scalable In-situ DRAM-based Accelerator Designs for Massively Parallel k-mer Matching. 251-264
- Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin, Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding:
FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator. 265-278
- Jun Heo, Seung Yul Lee, Sunhong Min, Yeonhong Park, Sungjun Jung, Tae Jun Ham, Jae W. Lee:
BOSS: Bandwidth-Optimized Search Accelerator for Storage-Class Memory. 279-291
- Rohan Basu Roy, Tirthak Patel, Devesh Tiwari:
SATORI: Efficient and Fair Resource Partitioning by Sacrificing Short-Term Benefits for Long-Term Gains*. 292-305
- Mingyu Li, Yubin Xia, Haibo Chen:
Confidential Serverless Made Efficient with Plug-In Enclaves. 306-318
- Chaojie Zhang, Alok Gautam Kumbhare, Ioannis Manousakis, Deli Zhang, Pulkit A. Misra, Rod Assis, Kyle Woolcock, Nithish Mahalingam, Brijesh Warrier, David Gauthier, Lalu Kunnath, Steve Solomon, Osvaldo Morales, Marcus Fontoura, Ricardo Bianchini:
Flex: High-Availability Datacenters With Zero Reserved Power. 319-332
- AmirAli Abdolrashidi, Hodjat Asghari Esfeden, Ali Jahanshahi, Kaustubh Singh, Nael B. Abu-Ghazaleh, Daniel Wong:
BlockMaestro: Enabling Programmer-Transparent Task-based Execution in GPU Systems. 333-346
- Jose Rodrigo Sanchez Vicarte, Pradyumna Shome, Nandeeka Nayak, Caroline Trippel, Adam Morrison, David Kohlbrenner, Christopher W. Fletcher:
Opening Pandora's Box: A Systematic Study of New Ways Microarchitecture Can Leak Private Data. 347-360
- Xida Ren, Logan Moody, Mohammadkazem Taram, Matthew Jordan, Dean M. Tullsen, Ashish Venkat:
I See Dead µops: Leaking Secrets via Intel/AMD Micro-Op Caches. 361-374
- Divya Ojha, Sandhya Dwarkadas:
TimeCache: Using Time to Eliminate Cache Side Channels when Sharing Software. 375-387
- Arun Subramaniyan, Jack Wadden, Kush Goliya, Nathan Ozog, Xiao Wu, Satish Narayanasamy, David T. Blaauw, Reetuparna Das:
Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees. 388-401
- Matthew Vilim, Alexander Rucker, Kunle Olukotun:
Aurochs: An Architecture for Dataflow Threads. 402-415
- Ye Zhang, Shuo Wang, Xian Zhang, Jiangbin Dong, Xingzhong Mao, Fan Long, Cong Wang, Dong Zhou, Mingyu Gao, Guangyu Sun:
PipeZK: Accelerating Zero-Knowledge Proof with a Pipelined Architecture. 416-428
- Ajay Brahmakshatriya, Emily Furst, Victor A. Ying, Claire Hsu, Changwan Hong, Max Ruttenberg, Yunming Zhang, Dai Cheol Jung, Dustin Richmond, Michael B. Taylor, Julian Shun, Mark Oskin, Daniel Sánchez, Saman P. Amarasinghe:
Taming the Zoo: The Unified GraphIt Compiler Framework for Novel Architectures. 429-442
- Chencheng Ye, Yuanchao Xu, Xipeng Shen, Xiaofei Liao, Hai Jin, Yan Solihin:
Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach. 443-455
- Thomas Shull, Ilias Vougioukas, Nikos Nikoleris, Wendy Elsasser, Josep Torrellas:
Execution Dependence Extension (EDE): ISA Support for Eliminating Fences. 456-469
- Yue Zha, Jing Li:
Hetero-ViTAL: A Virtualization Stack for Heterogeneous FPGA Clusters. 470-483
- Lois Orosa, Yaohua Wang, Mohammad Sadrosadati, Jeremie S. Kim, Minesh Patel, Ivan Puddu, Haocong Luo, Kaveh Razavi, Juan Gómez-Luna, Hasan Hassan, Nika Mansouri-Ghiasi, Saugata Ghose, Onur Mutlu:
CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations. 484-497
- Ziqi Wang, Chul-Hwan Choo, Michael A. Kozuch, Todd C. Mowry, Gennady Pekhimenko, Vivek Seshadri, Dimitrios Skarlatos:
NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM. 498-511
- Siddharth Gupta, Atri Bhattacharyya, Yunho Oh, Abhishek Bhattacharjee, Babak Falsafi, Mathias Payer:
Rebooting Virtual Memory with Midgard. 512-525
- Adarsh Patil, Vijay Nagarajan, Rajeev Balasubramonian, Nicolai Oswald:
Dvé: Improving DRAM Reliability and Performance On-Demand via Coherent Replication. 526-539
- Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Nie, Tushar Krishna:
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms. 540-553
- Qijing Huang, Aravind Kalaiah, Minwoo Kang, James Demmel, Grace Dinh, John Wawrzynek, Thomas Norell, Yakun Sophia Shao:
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators. 554-566
- Xingyao Zhang, Haojun Xia, Donglin Zhuang, Hao Sun, Xin Fu, Michael B. Taylor, Shuaiwen Leon Song:
η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities. 567-580
- Xuhao Chen, Tianhao Huang, Shuotao Xu, Thomas Bourgeat, Chanwoo Chung, Arvind:
FlexMiner: A Pattern-Aware Accelerator for Graph Pattern Mining. 581-594
- Vidushi Dadu, Sihao Liu, Tony Nowatzki:
PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators. 595-608
- Mikhail Asiatici, Paolo Ienne:
Large-Scale Graph Processing on FPGAs with Caches for Thousands of Simultaneous Misses. 609-622
- Majid Jalili, Ioannis Manousakis, Iñigo Goiri, Pulkit A. Misra, Ashish Raniwala, Husam Alissa, Bharath Ramakrishnan, Phillip Tuma, Christian Belady, Marcus Fontoura, Ricardo Bianchini:
Cost-Efficient Overclocking in Immersion-Cooled Datacenters. 623-636
- Gyu-hyeon Lee, Seongmin Na, Ilkwon Byun, Dongmoon Min, Jangwoo Kim:
CryoGuard: A Near Refresh-Free Robust DRAM Design for Cryogenic Computing. 637-650
- Georgios Tzimpragos, Jennifer Volk, Alex Wynn, James E. Smith, Timothy Sherwood:
Superconducting Computing with Alternating Logic Elements. 651-664
- Harrison Williams, Michael Moukarzel, Matthew Hicks:
Failure Sentinels: Ubiquitous Just-in-time Intermittent Computation via Low-cost Hardware Support for Voltage Monitoring. 665-678
- Hongju Kal, Seokmin Lee, Gun Ko, Won Woo Ro:
SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations. 679-691
- Tae Jun Ham, Yejin Lee, Seong Hoon Seo, Soosung Kim, Hyunji Choi, Sung Jun Jung, Jae W. Lee:
ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks. 692-705
- Yongwei Zhao, Chang Liu, Zidong Du, Qi Guo, Xing Hu, Yimin Zhuang, Zhenxing Zhang, Xinkai Song, Wei Li, Xishan Zhang, Ling Li, Zhiwei Xu, Tianshi Chen:
Cambricon-Q: A Hybrid Architecture for Efficient Training. 706-719
- Liqiang Lu, Naiqing Guan, Yuyue Wang, Liancheng Jia, Zizhang Luo, Jieming Yin, Jason Cong, Yun Liang:
TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation. 720-733
- Tanvir Ahmed Khan, Dexin Zhang, Akshitha Sriraman, Joseph Devietti, Gilles Pokam, Heiner Litz, Baris Kasikci:
Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications. 734-747
- Da Zhang, Gagandeep Panwar, Jagadish B. Kotra, Nathan DeBardeleben, Sean Blanchard, Xun Jian:
Quantifying Server Memory Frequency Margin and Using It to Improve Performance in HPC Systems. 748-761
- Jie Zhang, Miryeong Kwon, Donghyun Gouk, Sungjoon Koh, Nam Sung Kim, Mahmut Taylan Kandemir, Myoungsoo Jung:
Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution. 762-775
- Xingbin Wang, Boyan Zhao, Rui Hou, Amro Awad, Zhihong Tian, Dan Meng:
NASGuard: A Novel Accelerator Architecture for Robust Neural Architecture Search (NAS) Networks. 776-789
- Xiaohan Ma, Chang Si, Ying Wang, Cheng Liu, Lei Zhang:
NASA: Accelerating Neural Network Design with a NAS Processor. 790-803
- Korakit Seemakhupt, Sihang Liu, Yasas Senevirathne, Muhammad Shahbaz, Samira Manabi Khan:
PMNet: In-Network Data Persistence. 804-817
- Jonathan M. Baker, Andrew Litteken, Casey Duckering, Henry Hoffmann, Hannes Bernien, Frederic T. Chong:
Exploiting Long-Distance Interactions and Tolerating Atom Loss in Neutral Atom Quantum Architectures. 818-831
- Gushu Li, Yunong Shi, Ali Javadi-Abhari:
Software-Hardware Co-Optimization for Computational Chemistry on Superconducting Quantum Processors. 832-845
- Lingling Lao, Prakash Murali, Margaret Martonosi, Dan E. Browne:
Designing Calibration and Expressivity-Efficient Instruction Sets for Quantum Computing. 846-859
- Kyle Shiflett, Avinash Karanth, Razvan C. Bunescu, Ahmed Louri:
Albireo: Energy-Efficient Acceleration of Convolutional Neural Networks via Silicon Photonics. 860-873
- Moein Ghaniyoun, Kristin Barber, Yinqian Zhang, Radu Teodorescu:
INTROSPECTRE: A Pre-Silicon Framework for Discovery and Analysis of Transient Execution Vulnerabilities. 874-887
- Raghavendra Pradyumna Pothukuchi, Sweta Yamini Pothukuchi, Petros G. Voulgaris, Alexander G. Schwing, Josep Torrellas:
Maya: Using Formal Control to Obfuscate Power Side Channels. 888-901
- George Papadimitriou, Dimitris Gizopoulos:
Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers. 902-915
- Mohamed Tarek Ibn Ziad, Miguel A. Arroyo, Evgeny Manzhosov, Ryan Piersma, Simha Sethumadhavan:
No-FAT: Architectural Support for Low Overhead Memory Safety Checks. 916-929
- Yeonju Ro, Seongwook Jin, Jaehyuk Huh, John Kim:
Ghost Routing to Enable Oblivious Computation on Memory-centric Networks. 930-943
- Ataberk Olgun, Minesh Patel, Abdullah Giray Yaglikçi, Haocong Luo, Jeremie S. Kim, Nisa Bostanci, Nandita Vijaykumar, Oguz Ergin, Onur Mutlu:
QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips. 944-957
- Salvatore Di Girolamo, Andreas Kurth, Alexandru Calotoiu, Thomas Benz, Timo Schneider, Jakub Beránek, Luca Benini, Torsten Hoefler:
A RISC-V in-network accelerator for flexible high-performance low-power packet processing. 958-971
- Sankha Baran Dutta, Hoda Naghibijouybari, Nael B. Abu-Ghazaleh, Andres Marquez, Kevin J. Barker:
Leaky Buddies: Cross-Component Covert Channels on Integrated CPU-GPU Systems. 972-984
- Jawad Haj-Yahya, Lois Orosa, Jeremie S. Kim, Juan Gómez-Luna, Abdullah Giray Yaglikçi, Mohammed Alser, Ivan Puddu, Onur Mutlu:
IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors. 985-998
- Mohamed Tarek Ibn Ziad, Miguel A. Arroyo, Evgeny Manzhosov, Simha Sethumadhavan:
ZeRØ: Zero-Overhead Resilient Operation Under Pointer Integrity Attacks. 999-1012
- Zhanhong Tan, Hongyu Cai, Runpei Dong, Kaisheng Ma:
NN-Baton: DNN Workload Orchestration and Chiplet Granularity Exploration for Multichip Accelerators. 1013-1026
- Graham Gobieski, Ahmet Oguz Atli, Kenneth Mai, Brandon Lucia, Nathan Beckmann:
Snafu: An Ultra-Low-Power, Energy-Minimal CGRA-Generation Framework and Architecture. 1027-1040
- Yaqi Zhang, Nathan Zhang, Tian Zhao, Matt Vilim, Muhammad Shahbaz, Kunle Olukotun:
SARA: Scaling a Reconfigurable Dataflow Accelerator. 1041-1054
- Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang:
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation. 1055-1068
- Yifan Yang, Joel S. Emer, Daniel Sánchez:
SpZip: Architectural Support for Effective Data Compression In Irregular Applications. 1069-1082
- Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng:
Dual-side Sparse Tensor Core. 1083-1095
- Chao-Tsung Huang:
RingCNN: Exploiting Algebraically-Sparse Ring Tensors for Energy-Efficient CNN-Based Computational Imaging. 1096-1109
- Chunhua Deng, Yang Sui, Siyu Liao, Xuehai Qian, Bo Yuan:
GoSPA: An Energy-efficient High-performance Globally Optimized SParse Convolutional Neural Network Accelerator. 1110-1123