52nd ISCA 2025: Tokyo, Japan
- Proceedings of the 52nd Annual International Symposium on Computer Architecture, ISCA 2025, Tokyo, Japan, June 21-25, 2025. ACM 2025, ISBN 979-8-4007-1261-6
Session 1A: ML Accelerators I
- Zheng Xu, Dehao Kong, Jiaxin Liu, Jinxi Li, Jingxiang Hou, Xu Dai, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin: WSC-LLM: Efficient LLM Service and Architecture Co-exploration for Wafer-scale Chips. 1-17
- Liang Liu, Sadra Rahimi Kari, Xin Xin, Nathan Youngblood, Youtao Zhang, Jun Yang: LightML: A Photonic Accelerator for Efficient General Purpose Machine Learning. 18-33
- Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna: FRED: A Wafer-scale Fabric for 3D Parallel DNN Training. 34-48
- Qize Yang, Taiquan Wei, Sihan Guan, Chengran Li, Haoran Shang, Jinyi Deng, Huizheng Wang, Chao Li, Lei Wang, Yan Zhang, Shouyi Yin, Yang Hu: PD Constraint-aware Physical/Logical Topology Co-Design for Network on Wafer. 49-64
Session 1B: Crypto & Fully Homomorphic Encryption
- Tianwei Pan, Tianao Dai, Jianlei Yang, Hongbin Jing, Yang Su, Zeyu Hao, Xiaotao Jia, Chunming Hu, Weisheng Zhao: Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design. 65-77
- Ali Hajiabadi, Trevor E. Carlson: Cassandra: Efficient Enforcement of Sequential Execution for Cryptographic Programs. 78-91
- Shengyu Fan, Xianglong Deng, Liang Kong, Guiming Shi, Guang Fan, Dan Meng, Rui Hou, Mingzhe Zhang: FAST: An FHE Accelerator for Scalable-parallelism with Tunable-bit. 92-106
- Dian Jiao, Xianglong Deng, Zhiwei Wang, Shengyu Fan, Yi Chen, Dan Meng, Rui Hou, Mingzhe Zhang: Neo: Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core. 107-121
Session 1C: GPUs & Ray Tracing
- Yuan Feng, Yuke Li, Jiwon Lee, Won Woo Ro, Hyeran Jeon: Heliostat: Harnessing Ray Tracing Accelerators for Page Table Walks. 122-136
- Mao Lin, Yuan Feng, Guilherme Cox, Hyeran Jeon: Forest: Access-aware GPU UVM Management. 137-152
- Minseong Gil, Dongho Ha, Simla Burcu Harma, Myung Kuk Yoon, Babak Falsafi, Won Woo Ro, Yunho Oh: Avant-Garde: Empowering GPUs with Scaled Numeric Formats. 153-165
- Yavuz Selim Tozlu, Huiyang Zhou: CoopRT: Accelerating BVH Traversal for Ray Tracing via Cooperative Threads. 166-179
Session 2A: Best Paper Nominees
- Zhewen Pan, Joshua San Miguel: The XOR Cache: A Catalyst for Compression. 180-193
- Cong Li, Yihan Yin, Xintong Wu, Jingchen Zhu, Zhutianya Gao, Dimin Niu, Qiang Wu, Xin Si, Yuan Xie, Chen Zhang, Guangyu Sun: H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference. 194-210
- Ben Simner, Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Ohad Kammar, Jean Pichon-Pharabod, Peter Sewell: Precise exceptions in relaxed architectures. 211-224
- Gan Fang, Jianping Zeng, Aditya Gupta, Changhee Jung: Rethinking Prefetching for Intermittent Computing. 225-240
Session 3A: Quantum I
- Yuchen Zhu, Jinglei Cheng, Boxi Li, Kecheng Liu, Yidong Zhou, Hanrui Wang, Yufei Ding, Zhiding Liang: Hardware-aware Calibration Protocol for Quantum Computers. 241-256
- Christopher A. Pattison, Gefen Baranes, Juan Pablo Bonilla Ataides, Mikhail D. Lukin, Hengyun Zhou: Constant-Rate Entanglement Distillation for Fast Quantum Interconnects. 257-270
- Chenghong Zhu, Xian Wu, Jingbo Wang, Xin Wang: S-SYNC: Shuttle and Swap Co-Optimization in Quantum Charge-Coupled Devices. 271-284
- Wuwei Tian, Liqiang Lu, Siwei Tan, Yun Liang, Tingting Li, Kaiwen Zhou, Xinghui Jia, Jianwei Yin: ARTERY: Fast Quantum Feedback using Branch Prediction. 285-298
- Chenning Tao, Liqiang Lu, Size Zheng, Li-Wen Chang, Minghua Shen, Hanyu Zhang, Fangxin Liu, Kaiwen Zhou, Jianwei Yin: Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing. 299-312
Session 3B: Domain Specific Accelerators I
- Justin Ting, Minsik Kim, Junkang Zhu, Haotian Sheng, Zhengya Zhang: HiPER: Hierarchically-Composed Processing for Efficient Robot Learning-Based Control. 313-326
- Yiyang Huang, Yuhui Hao, Bo Yu, Feng Yan, Yuxin Yang, Feng Min, Yinhe Han, Lin Ma, Shaoshan Liu, Qiang Liu, Yiming Gan: Dadu-Corki: Algorithm-Architecture Co-Design for Embodied AI-powered Robotic Manipulation. 327-343
- Haiyu Wang, Wenxuan Liu, Kenneth Chen, Qi Sun, Sai Qian Zhang: Process Only Where You Look: Hardware and Algorithm Co-optimization for Efficient Gaze-Tracked Foveated Rendering in Virtual Reality. 344-358
- Hongrui Zhang, Yunan Zhang, Hung-Wei Tseng: RTSpMSpM: Harnessing Ray Tracing for Efficient Sparse Matrix Computations. 359-373
- Yen-Chieh Huang, Chen-Pin Yang, Tsung Tai Yeh: AQB8: Energy-Efficient Ray Tracing Accelerator through Multi-Level Quantization. 374-387
Session 3C: Storage
- Ryan Wong, Nikita Kim, Aniket Das, Kevin Higgs, Engin Ipek, Sapan Agarwal, Saugata Ghose, Ben Feinberg: ANVIL: An In-Storage Accelerator for Name-Value Data Stores. 388-404
- Xinyue Yi, Hongchao Du, Yu Wang, Jie Zhang, Qiao Li, Chun Jason Xue: ArtMem: Adaptive Migration in Reinforcement Learning-Enabled Tiered Memory. 405-418
- Ipoom Jeong, Jinghan Huang, Chuxuan Hu, Dohyun Park, Jaeyoung Kang, Nam Sung Kim, Yongjoo Park: UPP: Universal Predicate Pushdown to Smart Storage. 419-433
- Li Peng, Wenbo Wu, Shushu Yi, Xianzhang Chen, Chenxi Wang, Shengwen Liang, Zhe Wang, Nong Xiao, Qiao Li, Mingzhe Zhang, Jie Zhang: XHarvest: Rethinking High-Performance and Cost-Efficient SSD Architecture with CXL-Driven Harvesting. 434-449
- Rohan Mahapatra, Harsha Santhanam, Christopher Priebe, Hanyang Xu, Hadi Esmaeilzadeh: In-Storage Acceleration of Retrieval Augmented Generation as a Service. 450-466
Session 4A: LLMs
- Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai: SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting. 467-481
- Minsu Kim, Seongmin Hong, Ryeowook Ko, Soongyu Choi, Hunjong Lee, Junsoo Kim, Joo-Young Kim, Jongse Park: Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization. 482-497
- Le Qin, Junwei Cui, Weilin Cai, Jiayi Huang: Chimera: Communication Fusion for Hybrid Parallelism in Large Language Models. 498-513
- Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang: LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference. 514-528
- Jaeyong Lee, Hyeunjoo Kim, Sanghun Oh, Myoungjun Chun, Myungsuk Kim, Jihong Kim: AiF: Accelerating On-Device LLM Inference Using In-Flash Processing. 529-543
- Hyungyo Kim, Nachuan Wang, Qirong Xia, Jinghan Huang, Amir Yazdanbakhsh, Nam Sung Kim: LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading. 544-558
Session 4B: Microarchitecture I
- Lingzhe Chester Cai, Aniket Deshmukh, Yale N. Patt: Enabling Ahead Prediction with Practical Energy Constraints. 559-571
- Mengming Li, Qijun Zhang, Yichuan Gao, Wenji Fang, Yao Lu, Yongqing Ren, Zhiyao Xie: Profile-Guided Temporal Prefetching. 572-585
- Noureldin Hassan, Byounguk Min, Changhee Jung, Yan Solihin, Jongouk Choi: WarmCache: Exploiting STT-RAM Cache for Low-Power Intermittent Systems. 586-600
- Gelin Fu, Tian Xia, Mingzhuo Yin, Prashant J. Nair, Mieszko Lis, Pengju Ren: Magellan: A High-Performance Loop-Guided Prefetcher for Indirect Memory Access. 601-615
- Haris Volos, Stylianos Vassiliou, Georgia Antoniou, Davide Basilio Bartolini, Yiannakis Sazeides: Leveraging control-flow similarity to reduce branch predictor cold effects in microservices. 616-630
Session 4C: Datacenter & Cloud
- Xingmao Yu, Dingcheng Jiang, Jinyi Deng, Jingyao Liu, Chao Li, Shouyi Yin, Yang Hu: Cramming a Data Center into One Cabinet, a Co-Exploration of Computing and Hardware Architecture of Waferscale Chip. 631-645
- Leo Han, Jash Kakadia, Benjamin C. Lee, Udit Gupta: Fair-CO2: Fair Attribution for Cloud Carbon Emissions. 646-663
- Jiaqi Lou, Srikar Vanavasam, Yifan Yuan, Ren Wang, Nam Sung Kim: Dynamic Load Balancer in Intel Xeon Scalable Processor: Performance Analyses, Enhancements, and Guidelines. 664-678
- Haneul Park, Jiaqi Lou, Sangjin Lee, Yifan Yuan, KyoungSoo Park, Yongseok Son, Ipoom Jeong, Nam Sung Kim: A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices. 679-693
- Yuanlong Li, Atri Bhattacharyya, Madhur Kumar, Abhishek Bhattacharjee, Yoav Etsion, Babak Falsafi, Sanidhya Kashyap, Mathias Payer: Single-Address-Space FaaS with Jord. 694-707
- Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas: HardHarvest: Hardware-Supported Core Harvesting for Microservices. 708-722
Session 5A: RowHammer
- Suhas Vittal, Salman Qazi, Poulami Das, Moinuddin Qureshi: MoPAC: Efficiently Mitigating Rowhammer with Probabilistic Activation Counting. 723-738
- Jeonghyun Woo, Joyce Qu, Gururaj Saileshwar, Prashant Jayaprakash Nair: When Mitigations Backfire: Timing Channel Attacks and Defense for PRAC-Based RowHammer Mitigations. 739-756
- Ismail Emir Yuksel, Akash Sood, Ataberk Olgun, Oguzhan Canpolat, Haocong Luo, Nisa Bostanci, Mohammad Sadrosadati, A. Giray Yaglikçi, Onur Mutlu: PuDHammer: Experimental Analysis of Read Disturbance Effects of Processing-using-DRAM in Real DRAM Chips. 757-775
- Hritvik Taneja, Moinuddin K. Qureshi: DREAM: Enabling Low-Overhead Rowhammer Mitigation via Directed Refresh Management. 776-792
Session 5B: HPC for ML/AI
- Feng Cheng, Cong Guo, Chiyue Wei, Junyao Zhang, Changchun Zhou, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu, Hai Li, Yiran Chen: Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-Aware Cache Compression. 793-807
- Seungjae Moon, Junseo Cha, Hyunjun Park, Joo-Young Kim: Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window. 808-820
- Hyoungwook Nam, Gerasimos Gerogiannis, Josep Torrellas: MeshSlice: Efficient 2D Tensor Parallelism for Distributed DNN Training. 821-834
- Dezun Dong, Ziyu Wang, Fei Lei: Zettafly: A Network Topology with Flexible Non-blocking Regions for Large-scale AI and HPC Systems. 835-848
Session 5C: Processing-in-Memory
- Yuanpeng Zhang, Xing Hu, Xi Chen, Zhihang Yuan, Cong Li, Jingchen Zhu, Zhao Wang, Chenguang Zhang, Xin Si, Wei Gao, Qiang Wu, Runsheng Wang, Guangyu Sun: AIM: Software and Hardware Co-design for Architecture-level IR-drop Mitigation in High-performance PIM. 849-866
- Jiantao Liu, Minxuan Zhou, Yue Pan, Chien-Yi Yang, Lana Josipovic, Tajana Rosing: OptiPIM: Optimizing Processing-in-Memory Acceleration Using Integer Linear Programming. 867-883
- Chaoqiang Liu, Haifeng Liu, Dan Chen, Yu Huang, Yi Zhang, Wenjing Xiao, Xiaofei Liao, Hai Jin: HeterRAG: Heterogeneous Processing-in-Memory Acceleration for Retrieval-augmented Generation. 884-898
- Yongwon Shin, Dookyung Kang, Hyojin Sung: ATiM: Autotuning Tensor Programs for Processing-in-DRAM. 899-915
Session 6A: ML Accelerators II
- Rhys Gretsch, Michael Beyeler, Jeremy Lau, Timothy Sherwood: Single Spike Artificial Neural Networks. 916-929
- Chiyue Wei, Bowen Duan, Cong Guo, Jingyang Zhang, Qingyue Song, Hai Li, Yiran Chen: Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks. 930-943
- Boxun Xu, Yuxuan Yin, Vikram Iyer, Peng Li: Bishop: Sparsified Bundling Spiking Transformers on Heterogeneous Cores with Error-constrained Pruning. 944-957
- Michael Shen, Muhammad Umar, Kiwan Maeng, G. Edward Suh, Udit Gupta: Hermes: Algorithm-System Co-design for Efficient Retrieval-Augmented Generation At-Scale. 958-973
- Wenqi Jiang, Suvinay Subramanian, Cat Graves, Gustavo Alonso, Amir Yazdanbakhsh, Vidushi Dadu: RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving. 974-989
- Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai Li, Yiran Chen: Transitive Array: An Efficient GEMM Accelerator with Result Reuse. 990-1004
Session 6B: Microarchitecture II
- Saba Mostofi, Setu Gupta, Ahmad Hassani, Krishnam Tibrewala, Elvira Teran, Paul V. Gratz, Daniel A. Jiménez: Light-weight Cache Replacement for Instruction Heavy Workloads. 1005-1019
- Changxi Liu, Miao Yu, Yifan Sun, Trevor E. Carlson: The Sparsity-Aware LazyGPU Architecture. 1020-1034
- Dai Cheol Jung, Michael B. Taylor: Evaluating Ruche Networks: Physically Scalable, Cost-Effective, Bandwidth-Flexible NoCs. 1035-1048
- Jaewon Kwon, Yongju Lee, Jiwan Kim, Enhyeok Jang, Hongju Kal, Won Woo Ro: Garibaldi: A Pairwise Instruction-Data Management for Enhancing Shared Last-Level Cache Performance in Server Workloads. 1049-1063
- Amel Fatima, Yang Yang, Yifan Sun, Rachata Ausavarungnirun, Adwait Jog: NetCrafter: Tailoring Network Traffic for Non-Uniform Bandwidth Multi-GPU Systems. 1064-1078
- Pedro Henrique Exenberger Becker, Franyell Silfa, José-María Arnau, Antonio González: Caravan: A Hardware/Software Co-Design for Efficient SIMD Neighbor Search on Point Clouds. 1079-1092
Session 6C: Memory Acceleration
- Yiwei Li, Yuxin Jin, Boyu Tian, Huanchen Zhang, Mingyu Gao: ANSMET: Approximate Nearest Neighbor Search with Near-Memory Processing and Hybrid Early Termination. 1093-1107
- Derrick Quinn, E. Ezgi Yücel, Martin Prammer, Zhenxing Fan, Kevin Skadron, Jignesh M. Patel, José F. Martínez, Mohammad Alian: DReX: Accurate and Scalable Dense Retrieval Acceleration via Algorithmic-Hardware Codesign. 1108-1124
- Taehwan Kim, Yunki Han, Seohye Ha, Jiwan Kim, Lee-Sup Kim: EOD: Enabling Low Latency GNN Inference via Near-Memory Concatenate Aggregation. 1125-1139
- Ziyuan Wen, Alexis Le Glaunec, Konstantinos Mamouras, Kaiyuan Yang: RAP: Reconfigurable Automata Processor. 1140-1154
- Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana Rosing, Mingu Kang: Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution. 1155-1170
- Kangqi Chen, Rakesh Nadig, Manos Frouzakis, Nika Mansouri-Ghiasi, Yu Liang, Haiyu Mao, Jisung Park, Mohammad Sadrosadati, Onur Mutlu: REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing. 1171-1192
Session 7A: ML Acceleration III
- Akshat Ramachandran, Souvik Kundu, Tushar Krishna: MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization. 1193-1209
- Dahu Feng, Erhu Feng, Dong Du, Pinjie Xu, Yubin Xia, Haibo Chen, Rong Zhao: Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units. 1210-1224
- August Ning, David Wentzlaff: Chip Architectures Under Advanced Computing Sanctions. 1225-1239
- Jiaqi Yang, Hao Zheng, Ahmed Louri: DiTile-DGNN: An Efficient Accelerator for Distributed Dynamic Graph Neural Network Inference. 1240-1253
- Tianbo Liu, Xinkai Song, Zhifei Yue, Rui Wen, Xing Hu, Zhuoran Song, Yuanbo Wen, Yifan Hao, Wei Li, Zidong Du, Rui Zhang, Jiaming Guo, Di Huang, Shaohui Peng, Guangzhong Sun, Qi Guo, Tianshi Chen: Cambricon-SR: An Accelerator for Neural Scene Representation with Sparse Encoding Table. 1254-1268
- Haomin Li, Fangxin Liu, Yichi Chen, Zongwu Wang, Shiyuan Huang, Ning Yang, Dongxu Lyu, Li Jiang: FATE: Boosting the Performance of Hyper-Dimensional Computing Intelligence with Flexible Numerical DAta TypE. 1269-1282
Session 7B: Systems
- Jingqi Feng, Yukai Huang, Rui Zhang, Sicheng Liang, Ming Yan, Jie Wu: WindServe: Efficient Phase-Disaggregated LLM Serving with Stream-based Dynamic Scheduling. 1283-1295
- Joseph Rogers, Lieven Eeckhout, Taha Soliman, Magnus Jahre: Neoscope: How Resilient Is My SoC to Workload Churn? 1296-1310
- Yanpeng Yu, Nicolai Oswald, Anurag Khandelwal: CORD: Low-Latency, Bandwidth-Efficient and Scalable Release Consistency via Directory Ordering. 1311-1326
- Panagiotis Miliadis, Dimitris Theodoropoulos, Nectarios Koziris, Dionisios N. Pnevmatikatos: Nyx: Virtualizing dataflow execution on shared FPGA platforms. 1327-1341
- Russel Arbore, Xavier Routh, Abdul Rafae Noor, Akash Kothari, Haichao Yang, Weihong Xu, Sumukh Pinge, Minxuan Zhou, Tajana Rosing, Vikram S. Adve: HPVM-HDC: A Heterogeneous Programming System for Accelerating Hyperdimensional Computing. 1342-1355
- Xia Zhao, Guangda Zhang, Lu Wang, Huadong Dai: UGPU: Dynamically Constructing Unbalanced GPUs for Enhanced Resource Efficiency. 1356-1369
Session 7C: Quantum II
- Satvik Maurya, Swamit Tannu: Synchronization for Fault-Tolerant Quantum Computers. 1370-1385
- Joshua Viszlai, Jason D. Chadwick, Sarang Joshi, Gokul Subramanian Ravi, Yanjing Li, Frederic T. Chong: SWIPER: Minimizing Fault-Tolerant Quantum Program Latency via Speculative Window Decoding. 1386-1401
- Xiang Fang, Keyi Yin, Yuchen Zhu, Jixuan Ruan, Dean Tullsen, Zhiding Liang, Andrew Sornborger, Ang Li, Travis S. Humble, Yufei Ding, Yunong Shi: CaliQEC: In-situ Qubit Calibration for Surface Code Quantum Error Correction. 1402-1416
- Siddharth Dangwal, Suhas Vittal, Lennart Maximilian Seifert, Frederic T. Chong, Gokul Subramanian Ravi: Variational Quantum Algorithms in the era of Early Fault Tolerance. 1417-1431
- Hengyun Zhou, Casey Duckering, Chen Zhao, Dolev Bluvstein, Madelyn Cain, Aleksander Kubica, Sheng-Tao Wang, Mikhail D. Lukin: Resource Analysis of Low-Overhead Transversal Architectures for Reconfigurable Atom Arrays. 1432-1448
- Hezi Zhang, Yiran Xu, Haotian Hu, Keyi Yin, Hassan Shapourian, Jiapeng Zhao, Ramana Rao Kompella, Reza Nejabati, Yufei Ding: SwitchQNet: Optimizing Distributed Quantum Computing for Quantum Data Centers with Switch Networks. 1449-1463
Session 8A: Performance and Modeling
- Jian Weng, Boyang Han, Derui Gao, Ruijie Gao, Wanning Zhang, An Zhong, Ceyu Xu, Jihao Xin, Yangzhixin Luo, Lisa Wu Wills, Marco Canini: Assassyn: A Unified Abstraction for Architectural Simulation and Implementation. 1464-1479
- Arash Nasr-Esfahany, Mohammad Alizadeh, Victor Lee, Hanna Alam, Brett W. Coon, David E. Culler, Vidushi Dadu, Martin Dixon, Henry M. Levy, Santosh Pandey, Parthasarathy Ranganathan, Amir Yazdanbakhsh: Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion. 1480-1494
- Shiheng Cao, Junmin Wu, Junshi Chen, Hong An, Zhibin Yu: AMALI: An Analytical Model for Accurately Modeling LLM Inference on Modern GPUs. 1495-1508
- Hanna Cha, Sungchul Lee, Jounghoo Lee, Yeonan Ha, Joonsung Kim, Youngsok Kim: GCStack+GCScaler: Fast and Accurate GPU Performance Analyses Using Fine-Grained Stall Cycle Accounting and Interval Analysis. 1509-1523
- Ying Li, Yuhui Bao, Gongyu Wang, Xinxin Mei, Pranav Vaid, Anandaroop Ghosh, Adwait Jog, Darius Bunandar, Ajay Joshi, Yifan Sun: TrioSim: A Lightweight Simulator for Large-Scale DNN Workloads on Multi-GPU Systems. 1524-1538
Session 8B: Quantum III
- Meng Wang, Swamit Tannu, Prashant J. Nair: Accelerating Simulation of Quantum Circuits under Noise via Computational Reuse. 1539-1553
- Junyao Zhang, Hanrui Wang, Qi Ding, Jiaqi Gu, Reouven Assouly, William D. Oliver, Song Han, Kenneth R. Brown, Hai Li, Yiran Chen: QPlacer: Frequency-Aware Component Placement for Superconducting Quantum Computers. 1554-1567
- Hyungseok Kim, Enhyeok Jang, Seungwoo Choi, Youngmin Kim, Won Woo Ro: QR-Map: A Map-Based Approach to Quantum Circuit Abstraction for Qubit Reuse Optimization. 1568-1582
- Zihan Chen, Jiakang Li, Minghao Guo, Henry Chen, Zirui Li, Joel Bierman, Yipeng Huang, Huiyang Zhou, Yuan Liu, Eddy Z. Zhang: Genesis: A Compiler for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers. 1583-1597
- Yingheng Li, Yue Dai, Aditya Pawar, Rongchao Dong, Jun Yang, Youtao Zhang, Xulong Tang: Reinforcement Learning-Guided Graph State Generation in Photonic Quantum Computers. 1598-1612
Session 8C: Domain Specific Accelerators II
- Xintong Li, Zhiyao Li, Mingyu Gao: HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches. 1613-1626
- Souradip Ghosh, Graham Gobieski, Keyi Zhang, Brandon Lucia, Nathan Beckmann, Tony Nowatzki: NUPEA: Optimizing Critical Loads on Spatial Dataflow Architectures via Non-Uniform Processing-Element Access. 1627-1640
- Alireza Khadem, Kamalavasan Kamalakkannan, Zhenyan Zhu, Akash Poptani, Yufeng Gu, Jered Benjamin Dominguez-Trujillo, Nishil Talati, Daichi Fujiki, Scott A. Mahlke, Galen M. Shipman, Reetuparna Das: DX100: Programmable Data Access Accelerator for Indirection. 1641-1658
- Ryan Hou, Thomas Twomey, Vasileios Milionis, Evangelos Dikopoulos, Tianrui Ma, Yuhao Zhu, Georgios Tzimpragos: SEAL: A Single-Event Architecture for In-Sensor Visual Localization. 1659-1674
- Suquan Zhang, Yu Hu, Yunfei Xiang, Dawei Zhao, Yuanfan Xu, Qingmin Liao, Jincheng Yu, Yu Wang: IDEA-GP: Instruction-Driven Architecture with Efficient Online Workload Allocation for Geometric Perception. 1675-1688
Session 9A: Industry Track
- Joel Coburn, Chunqiang Tang, Sameer Abu Asal, Neeraj Agrawal, Raviteja Chinta, Harish Dattatraya Dixit, Brian Dodds, Saritha Dwarakapuram, Amin Firoozshahian, Cao Gao, Kaustubh Gondkar, Tyler Graf, Junhan Hu, Jian Huang, Sterling Hughes, Adam Hutchin, Bhasker Jakka, Guoqiang Jerry Chen, Indu Kalyanaraman, Ashwin Kamath, Pankaj Kansal, Erum Kazi, Roman Levenstein, Mahesh Maddury, Alex Mastro, Siji Medaiyese, Pritesh Modi, Jack Montgomery, Nadathur Satish, Amit Nagpal, Ashwin Narasimha, Maxim Naumov, Eleanor Ozer, Jongsoo Park, Poorvaja Ramani, Harikrishna Reddy, David Reiss, Deboleena Roy, Sathish Sekar, Arushi Sharma, Pavan Shetty, Aravind Sukumaran-Rajam, Eran Tal, Mike Tsai, Shreya Varshini, Richard Wareing, Olívia Wu, Xiaolong Xie, Jinghan Yang, Hangchen Yu, Tanmay Zargar, Zitong Zeng, Feixiong Zhang, Ajit Mathews, Xun Jiao, Jiyuan Zhang, Emmanuel Menage, Truls Edvard Stokke, Mohammed Sourouri: Meta's Second Generation AI Chip: Model-Chip Co-Design and Productionization Experiences. 1689-1702
- Weiwei Chu, Xinfeng Xie, Jiecao Yu, Jie Wang, Amar Phanishayee, Chunqiang Tang, Yuchen Hao, Jianyu Huang, Mustafa Ozdal, Jun Wang, Vedanuj Goswami, Naman Goyal, Abhishek Kadian, Andrew Gu, Chris Cai, Feng Tian, Xiaodong Wang, Min Si, Pavan Balaji, Ching-Hsiang Chu, Jongsoo Park: Scaling Llama 3 Training with Efficient Parallelism Strategies. 1703-1716
- Wei Su, Abhishek Dhanotia, Carlos Torres, Jayneel Gandhi, Neha Gholkar, Shobhit O. Kanaujia, Maxim Naumov, Kalyan Subramanian, Valentin Andrei, Yifan Yuan, Chunqiang Tang: DCPerf: An Open-Source, Battle-Tested Performance Benchmark Suite for Datacenter Workloads. 1717-1730
- Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei: Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures. 1731-1745
Session 9B: HPC
- Gwangeun Byeon, Seongwook Kim, Hyungjin Kim, Sukhyun Han, Jinkwon Kim, Prashant J. Nair, Taewook Kang, Seokin Hong: Avalanche: Optimizing Cache Utilization via Matrix Reordering for Sparse Matrix Multiplication Accelerator. 1746-1759
- Yunjae Lee, Juntaek Lim, Jehyeon Bang, Eunyeong Cho, Huijong Jeong, Taesu Kim, Hyungjun Kim, Joonhyung Lee, Jinseop Im, Ranggi Hwang, Se Jung Kwon, Dongsoo Lee, Minsoo Rhu: Debunking the CUDA Myth Towards GPU-based AI Systems: Evaluation of the Performance and Programmability of Intel's Gaudi NPU for AI Model Serving. 1760-1776
- Fabian Wildgrube, Pete Ehrett, Paul Trojahn, Richard Membarth, Bradford M. Beckmann, Dominik Baumeister, Matthäus G. Chajdas: GPUs All Grown-Up: Fully Device-Driven SpMV Using GPU Work Graphs. 1777-1791
- Xiaochen Hao, Hao Luo, Chu Wang, Chao Yang, Yun Liang: Telos: A Dataflow Accelerator for Sparse Triangular Solver of Partial Differential Equations. 1792-1805
Session 9C: Memory Technology
- Renhao Fan, Yikai Cui, Weike Li, Mingyu Wang, Zhaolin Li: MagiCache: A Virtual In-Cache Computing Engine. 1806-1818
- Vignesh Adhinarayanan, Bradford M. Beckmann, Wantong Li, Mohammad Seyedzadeh, Sergey Blagodurov, Derrick Aguren, Hayden Hyungdong Lee: Folded Banks: 3D-Stacked HBM Design for Fine-Grained Random-Access Bandwidth. 1819-1833
- Heewoo Kim, Sanjay Sri Vallabh Singapuram, Haojie Ye, Joseph Izraelevitz, Trevor N. Mudge, Ronald G. Dreslinski, Nishil Talati: NMP-PaK: Near-Memory Processing Acceleration of Scalable De Novo Genome Assembly. 1834-1847
Session 10A: ML Accelerators IV
- Chengyue Wang, Xiaofan Zhang, Jason Cong, James C. Hoe: Reconfigurable Stream Network Architecture. 1848-1866
- Chunshu Wu, Ruibing Song, Chuan Liu, Pouya Haghi, Ang Li, Michael Huang, Tony Tong Geng: DS-TPU: Dynamical System for on-Device Lifelong Graph Learning with Nonlinear Node Interaction. 1867-1879
- Guyue Huang, Hao Li, Le Qin, Jiayi Huang, Yangwook Kang, Yufei Ding, Yuan Xie: TRACI: Network Acceleration of Input-Dynamic Communication for Large-Scale Deep Learning Recommendation Model. 1880-1893
- Seock-Hwan Noh, Banseok Shin, Jeik Choi, Seungpyo Lee, Jaeha Kung, Yeseong Kim: FlexNeRFer: A Multi-Dataflow, Adaptive Sparsity-Aware Accelerator for On-Device NeRF Rendering. 1894-1909
- Jiale Yan, Hiroaki Ito, Yuta Nagahara, Kazushi Kawamura, Masato Motomura, Thiem Van Chu, Daichi Fujiki: BingoGCN: Towards Scalable and Efficient GNN Acceleration with Fine-Grained Partitioning and SLT. 1910-1924
Session 10B: Domain Specific Accelerators III
- Yu Feng, Weikai Lin, Yuge Cheng, Zihan Liu, Jingwen Leng, Minyi Guo, Chen Chen, Shixuan Sun, Yuhao Zhu: Lumina: Real-Time Neural Rendering by Exploiting Computational Redundancy. 1925-1939
- Seunghee Han, Soongyu Choi, Joo-Young Kim: LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization. 1940-1955
- Ning Kang, Guojun Yuan, Zihan Yan, Beining Zhang, Boyang Li, Zeyu Li, Shuo Wang, Guanglei Chen, Jiayi Rao, Zhan Wang, Weile Jia, Ninghui Sun, Guangming Tan: MD-pipe: A Strong Scaling Enhanced Pipeline Architecture for Ab Initio Accuracy Molecular Dynamics. 1956-1968
- Yeongwoo Jang, Daye Jung, Seunghyun Song, Hunjun Lee, Jangwoo Kim: InfiniMind: A Learning-Optimized Large-Scale Brain-Computer Interface. 1969-1985
Session 10C: Security
- Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Ramesh Karri, Siddharth Garg, Brandon Reagen: Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs. 1986-2001
- Jianyi Cheng, A. Theodore Markettos, Alexandre Joannou, Paul Metzger, Matthew Naylor, Peter Rugg, Timothy M. Jones: Adaptive CHERI Compartmentalization for Heterogeneous Accelerators. 2002-2016
- Sunho Lee, Seonjin Na, Jeongwon Choi, Jinwon Pyo, Jaehyuk Huh: Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors. 2017-2031
- Saber Ganjisaffar, Esmaeil Mohmmadian Koruyeh, Jason Zellmer, Hodjat Asghari Esfeden, Chengyu Song, Nael B. Abu-Ghazaleh: SpecASan: Mitigating Transient Execution Attacks Using Speculative Address Sanitization. 2032-2045