ACM Transactions on Architecture and Code Optimization, Volume 21
Volume 21, Number 1, March 2024
- Longfei Luo, Dingcui Yu, Yina Lv, Liang Shi: Critical Data Backup with Hybrid Flash-Based Consumer Devices. 1:1-1:23
- Peng Chen, Hui Chen, Weichen Liu, Linbo Long, Wanli Chang, Nan Guan: DAG-Order: An Order-Based Dynamic DAG Scheduling for Real-Time Networks-on-Chip. 2:1-2:24
- Zhang Jiang, Ying Chen, Xiaoli Gong, Jin Zhang, Wenwen Wang, Pen-Chung Yew: JiuJITsu: Removing Gadgets with Safe Register Allocation for JIT Code Generation. 3:1-3:26
- Hayfa Tayeb, Ludovic Paillat, Bérenger Bramas: Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations. 4:1-4:25
- Xueying Wang, Guangli Li, Zhen Jia, Xiaobing Feng, Yida Wang: Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs. 5:1-5:26
- Hao Fan, Yiliang Ye, Shadi Ibrahim, Zhuo Huang, Xingru Li, Weibin Xue, Song Wu, Chen Yu, Xuanhua Shi, Hai Jin: QoS-pro: A QoS-enhanced Transaction Processing Framework for Shared SSDs. 6:1-6:25
- Yunping Zhao, Sheng Ma, Hengzhu Liu, Libo Huang, Yi Dai: SAC: An Ultra-Efficient Spin-based Architecture for Compressed DNNs. 7:1-7:26
- Tong-Yu Liu, Jianmei Guo, Bo Huang: Efficient Cross-platform Multiplexing of Hardware Performance Counters via Adaptive Grouping. 8:1-8:26
- Lei Liu, Xinglei Dou: QuCloud+: A Holistic Qubit Mapping Scheme for Single/Multi-programming on 2D/3D NISQ Quantum Computers. 9:1-9:27
- Lingxi Wu, Minxuan Zhou, Weihong Xu, Ashish Venkat, Tajana Rosing, Kevin Skadron: Abakus: Accelerating k-mer Counting with Storage Technology. 10:1-10:26
- Seokwon Kang, Jongbin Kim, Gyeongyong Lee, Jeongmyung Lee, Jiwon Seo, Hyungsoo Jung, Yong Ho Song, Yongjun Park: ISP Agent: A Generalized In-storage-processing Workload Offloading Framework by Providing Multiple Optimization Opportunities. 11:1-11:24
- Prasoon Mishra, V. Krishna Nandivada: COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loop. 12:1-12:26
- Joongun Park, Seunghyo Kang, Sanghyeon Lee, Taehoon Kim, Jongse Park, Youngjin Kwon, Jaehyuk Huh: Hardware-hardened Sandbox Enclaves for Trusted Serverless Computing. 13:1-13:25
- Tyler N. Allen, Bennett Cooper, Rong Ge: Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory. 14:1-14:24
- Zhonghua Wang, Yixing Guo, Kai Lu, Jiguang Wan, Daohui Wang, Ting Yao, Huatao Wu: Rcmp: Reconstructing RDMA-Based Memory Disaggregation via CXL. 15:1-15:26
- Linbo Long, Shuiyong He, Jingcheng Shen, Renping Liu, Zhenhua Tan, Congming Gao, Duo Liu, Kan Zhong, Yi Jiang: WA-Zone: Wear-Aware Zone Management Optimization for LSM-Tree on ZNS SSDs. 16:1-16:23
- Zhihua Fan, Wenming Li, Zhen Wang, Yu Yang, Xiaochun Ye, Dongrui Fan, Ninghui Sun, Xuejun An: Improving Utilization of Dataflow Unit for Multi-Batch Processing. 17:1-17:26
- Dunbo Zhang, Qingjie Lang, Ruoxi Wang, Li Shen: Extension VM: Interleaved Data Layout in Vector Memory. 18:1-18:23
- Can Firtina, Kamlesh R. Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, Jeremie S. Kim, Taha Shahroodi, Meryem Banu Cavlak, Joël Lindegger, Mohammed Alser, Juan Gómez-Luna, Sreenivas Subramoney, Onur Mutlu: ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis. 19:1-19:29
- Khalid Ahmad, Cris Cecka, Michael Garland, Mary W. Hall: Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs. 20:1-20:20
Volume 21, Number 2, June 2024
- Chandra Sekhar Mummidi, Victor da Cruz Ferreira, Sudarshan Srinivasan, Sandip Kundu: Highly Efficient Self-checking Matrix Multiplication on Tiled AMX Accelerators. 21
- Zhonghua Wang, Chen Ding, Fengguang Song, Kai Lu, Jiguang Wan, Zhihu Tan, Changsheng Xie, Guokuan Li: WIPE: A Write-Optimized Learned Index for Persistent Memory. 22
- Gino A. Chacon, Charles Williams, Johann Knechtel, Ozgur Sinanoglu, Paul V. Gratz, Vassos Soteriou: Coherence Attacks and Countermeasures in Interposer-based Chiplet Systems. 23
- Yan Wei, Xingjun Zhang: A Concise Concurrent B+-Tree for Persistent Memory. 24
- Fareed Qararyah, Muhammad Waqar Azhar, Pedro Trancoso: An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs. 25
- Fernando Fernandes dos Santos, Luigi Carro, Flavio Vella, Paolo Rech: Assessing the Impact of Compiler Optimizations on GPUs Reliability. 26
- Valentin Isaac-Chassande, Adrian Evans, Yves Durand, Frédéric Rousseau: Dedicated Hardware Accelerators for Processing of Sparse Matrices and Vectors: A Survey. 27
- Benyi Xie, Yue Yan, Chenghao Yan, Sicheng Tao, Zhuangzhuang Zhang, Xinyu Li, Yanzhi Lan, Xiang Wu, Tianyi Liu, Tingting Zhang, Fuxin Zhang: An Instruction Inflation Analyzing Framework for Dynamic Binary Translators. 28
- Samuel Rac, Mats Brorsson: Cost-aware Service Placement and Scheduling in the Edge-Cloud Continuum. 29
- Feng Xue, Chenji Han, Xinyu Li, Junliang Wu, Tingting Zhang, Tianyi Liu, Yifan Hao, Zidong Du, Qi Guo, Fuxin Zhang: Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses. 30
- Kunpeng Xie, Ye Lu, Xinyu He, Dezhi Yi, Huijuan Dong, Yao Chen: Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs. 31
- Ke Liu, Kan Wu, Hua Wang, Ke Zhou, Peng Wang, Ji Zhang, Cong Li: SLAP: Segmented Reuse-Time-Label Based Admission Policy for Content Delivery Network Caching. 32
- Panagiotis Miliadis, Dimitris Theodoropoulos, Dionisios N. Pnevmatikatos, Nectarios Koziris: Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources. 33
- Haitao Du, Yuhan Qin, Song Chen, Yi Kang: FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration. 34
- Michael Canesche, Vanderson Martins do Rosário, Edson Borin, Fernando Magno Quintão Pereira: The Droplet Search Algorithm for Kernel Scheduling. 35
- Asmita Pal, Keerthana Desai, Rahul Chatterjee, Joshua San Miguel: Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces. 36
- Chengying Huan, Yongchao Liu, Heng Zhang, Shuaiwen Song, Santosh Pandey, Shiyang Chen, Xiangfei Fang, Yue Jin, Baptiste Lepers, Yanjun Wu, Hang Liu: TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture. 37
- Soojin Hwang, Daehyeon Baek, Jongse Park, Jaehyuk Huh: Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication. 38
- Siddhartha Raman Sundara Raman, Lizy Kurian John, Jaydeep P. Kulkarni: NEM-GNN: DAC/ADC-less, Scalable, Reconfigurable, Graph and Sparsity-Aware Near-Memory Accelerator for Graph Neural Networks. 39
- Yan Chen, Qiwen Ke, Huiba Li, Yongwei Wu, Yiming Zhang: xMeta: SSD-HDD-hybrid Optimization for Metadata Maintenance of Cloud-scale Object Storage. 40
- Vidush Singhal, Laith Sakka, Kirshanthan Sundararajah, Ryan Newton, Milind Kulkarni: Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals. 41
Volume 21, Number 3, September 2024
- Hajar Falahati, Mohammad Sadrosadati, Qiumin Xu, Juan Gómez-Luna, Banafsheh Saber Latibari, Hyeran Jeon, Shaahin Hessabi, Hamid Sarbazi-Azad, Onur Mutlu, Murali Annavaram, Massoud Pedram: Cross-core Data Sharing for Energy-efficient GPUs. 42:1-42:32
- Ching-Jui Lee, Tsung Tai Yeh: ReSA: Reconfigurable Systolic Array for Multiple Tiny DNN Tensors. 43:1-43:24
- Ziheng Wang, Xiaoshe Dong, Yan Kang, Heng Chen, Qiang Wang: An Example of Parallel Merkle Tree Traversal: Post-Quantum Leighton-Micali Signature on the GPU. 44:1-44:25
- Jiang Wu, Zhuo Zhang, Deheng Yang, Jianjun Xu, Jiayu He, Xiaoguang Mao: Knowledge-Augmented Mutation-Based Bug Localization for Hardware Design Code. 45:1-45:26
- Chen Ding, Jian Zhou, Kai Lu, Sicen Li, Yiqin Xiong, Jiguang Wan, Ling Zhan: D2Comp: Efficient Offload of LSM-tree Compaction with Data Processing Units on Disaggregated Storage. 46:1-46:22
- Zhuohao Wang, Lei Liu, Limin Xiao: iSwap: A New Memory Page Swap Mechanism for Reducing Ineffective I/O Operations in Cloud Environments. 47:1-47:24
- Junkaixuan Li, Yi Kang: GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core Systems. 48:1-48:25
- Ke Wu, Dezun Dong, Weixia Xu: COER: A Network Interface Offloading Architecture for RDMA and Congestion Control Protocol Codesign. 49:1-49:26
- Qunyou Liu, Darong Huang, Luis Costero, Marina Zapater, David Atienza: Intermediate Address Space: virtual memory optimization of heterogeneous architectures for cache-resident workloads. 50:1-50:23
- Dongmoon Min, Ilkwon Byun, Gyu-hyeon Lee, Jangwoo Kim: CoolDC: A Cost-Effective Immersion-Cooled Datacenter with Workload-Aware Temperature Scaling. 51:1-51:27
- Hai Zhou, Dan Feng: Stripe-schedule Aware Repair in Erasure-coded Clusters with Heterogeneous Star Networks. 52:1-52:24
- Bobin Deng, Bhargava Nadendla, Kun Suo, Chloe Yixin Xie, Dan Chia-Tien Lo: Fixed-point Encoding and Architecture Exploration for Residue Number Systems. 53:1-53:27
- Yizhuo Wang, Fangli Chang, Bingxin Wei, Jianhua Gao, Weixing Ji: Optimization of Sparse Matrix Computation for Algebraic Multigrid on GPUs. 54:1-54:27
- Luming Wang, Xu Zhang, Songyue Wang, Zhuolun Jiang, Tianyue Lu, Mingyu Chen, Siwei Luo, Keji Huang: Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access. 55:1-55:28
- Yunping Zhao, Sheng Ma, Hengzhu Liu, Dongsheng Li: SAL: Optimizing the Dataflow of Spin-based Architectures for Lightweight Neural Networks. 56:1-56:27
- Kai Lu, Siqi Zhao, Haikang Shan, Qiang Wei, Guokuan Li, Jiguang Wan, Ting Yao, Huatao Wu, Daohui Wang: Scythe: A Low-latency RDMA-enabled Distributed Transaction System for Disaggregated Memory. 57:1-57:26
- Wangqi Peng, Yusen Li, Xiaoguang Liu, Gang Wang: Lavender: An Efficient Resource Partitioning Framework for Large-Scale Job Colocation. 58:1-58:23
- Feng Zhang, Fulin Nan, Binbin Xu, Zhirong Shen, Jiebin Zhai, Dmitrii I. Kaplun, Jiwu Shu: Achieving Tunable Erasure Coding with Cluster-Aware Redundancy Transitioning. 59:1-59:24
- Ataberk Olgun, F. Nisa Bostanci, Geraldo Francisco de Oliveira Junior, Yahya Can Tugrul, Rahul Bera, Abdullah Giray Yaglikçi, Hasan Hassan, Oguz Ergin, Onur Mutlu: Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture. 60:1-60:29
- Xiaohui Wei, Chenyang Wang, Hengshan Yue, Jingweijia Tan, Zeyu Guan, Nan Jiang, Xinyang Zheng, Jianpeng Zhao, Meikang Qiu: ReIPE: Recycling Idle PEs in CNN Accelerator for Vulnerable Filters Soft-Error Detection. 61:1-61:26
- Qiao Li, Yu Chen, Guanyu Wu, Yajuan Du, Min Ye, Xinbiao Gan, Jie Zhang, Zhirong Shen, Jiwu Shu, Chun Xue: Characterizing and Optimizing LDPC Performance on 3D NAND Flash Memories. 62:1-62:26
- Jiahong Xu, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, Xiaokang Yang, Huize Li, Cong Liu, Fubing Mao, Yu Zhang: ReHarvest: An ADC Resource-Harvesting Crossbar Architecture for ReRAM-Based DNN Accelerators. 63:1-63:26
- Jiang Wu, Zhuo Zhang, Deheng Yang, Jianjun Xu, Jiayu He, Xiaoguang Mao: Time-Aware Spectrum-Based Bug Localization for Hardware Design Code with Data Purification. 64:1-64:25
Volume 21, Number 4, December 2024
- Zhuoran Song, Zhongkai Yu, Xinkai Song, Yifan Hao, Li Jiang, Naifeng Jing, Xiaoyao Liang: Environmental Condition Aware Super-Resolution Acceleration Framework in Server-Client Hierarchies. 65:1-65:26
- Georgia Antoniou, Davide B. Bartolini, Haris Volos, Marios Kleanthous, Zhe Wang, Kleovoulos Kalaitzidis, Tom Rollet, Ziwei Li, Onur Mutlu, Yiannakis Sazeides, Jawad Haj-Yahya: Agile C-states: A Core C-state Architecture for Latency Critical Applications Optimizing both Transition and Cold-Start Latency. 66:1-66:26
- Xinbiao Gan, Tiejun Li, Feng Xiong, Bo Yang, Xinhai Chen, Chunye Gong, Shijie Li, Kai Lu, Qiao Li, Yiming Zhang: MST: Topology-Aware Message Aggregation for Exascale Graph Processing of Traversal-Centric Algorithms. 67:1-67:22
- Yujie Cui, Wei Chen, Xu Cheng, Jiangfang Yi: Hyperion: A Highly Effective Page and PC Based Delta Prefetcher. 68:1-68:27
- Jianhua Gao, Weixing Ji, Yizhuo Wang: Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems. 69:1-69:24
- Zhengding Hu, Jingwei Sun, Zhongyang Li, Guangzhong Sun: AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs. 70:1-70:25
- Wenbo Zhang, Yiqi Liu, Tianhao Zang, Zhenshan Bao: EA4RCA: Efficient AIE accelerator design framework for regular Communication-Avoiding Algorithm. 71:1-71:24
- Arun Thangamani, Vincent Loechner, Stéphane Genaud: A Survey of General-purpose Polyhedral Compilers. 72:1-72:26
- Junqing Lin, Jingwei Sun, Xiaolong Shi, Honghe Zhang, Xianzhi Yu, Xinzhi Wang, Jun Yao, Guangzhong Sun: LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs. 73:1-73:25
- Chenglong Yi, Jintong Liu, Shenggang Wan, Juntao Fang, Bin Sun, Liqiang Zhang: Data Deduplication Based on Content Locality of Transactions to Enhance Blockchain Scalability. 74:1-74:24
- Joshua Dennis Booth, Phillip Allen Lane: A NUMA-Aware Version of an Adaptive Self-Scheduling Loop Scheduler. 75:1-75:22
- Yu Tang, Qiao Li, Lujia Yin, Dongsheng Li, Yiming Zhang, Chenyu Wang, Xingcheng Zhang, Linbo Qiao, Zhaoning Zhang, Kai Lu: DELTA: Memory-Efficient Training via Dynamic Fine-Grained Recomputation and Swapping. 76:1-76:25
- Zhenhua Tan, Linbo Long, Jingcheng Shen, Renping Liu, Congming Gao, Kan Zhong, Yi Jiang: Optimizing Garbage Collection for ZNS SSDs via In-storage Data Migration and Address Remapping. 77:1-77:25
- Xiang Li, Qiong Chang, Aolong Zha, Shijie Chang, Yun Li, Jun Miyazaki: An Optimized GPU Implementation for GIST Descriptor. 78:1-78:24
- Xiaobo Lu, Jianbin Fang, Lin Peng, Chun Huang, Zidong Du, Yongwei Zhao, Zheng Wang: Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product. 79:1-79:25
- Yu Feng, Weikai Lin, Zihan Liu, Jingwen Leng, Minyi Guo, Han Zhao, Xiaofeng Hou, Jieru Zhao, Yuhao Zhu: Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture. 80:1-80:25
- Changxi Liu, Alen Sabu, Akanksha Chaudhari, Qingxuan Kang, Trevor E. Carlson: Pac-Sim: Simulation of Multi-threaded Workloads using Intelligent, Live Sampling. 81:1-81:26
- Saurabh Raje, Yufan Xu, Atanas Rountev, Edward F. Valeev, P. Sadayappan: CoNST: Code Generator for Sparse Tensor Networks. 82:1-82:24
- Danlin Jia, Geng Yuan, Yiming Xie, Xue Lin, Ningfang Mi: A Data-Loader Tunable Knob to Shorten GPU Idleness for Distributed Deep Learning. 83:1-83:25
- Shaobu Wang, Guangyan Zhang, Junyu Wei, Yang Wang, Jiesheng Wu, Qingchao Luo: Understanding Silent Data Corruption in Processors for Mitigating its Effects. 84:1-84:27
- Yen-Yu Lu, Chin-Hsien Wu, Shih-Jen Li, Cheng-Tze Lee, Cheng-Yen Wu: A Stable Idle Time Detection Platform for Real I/O Workloads. 85:1-85:23
- Lingyu Sun, Xiaofeng Hou, Chao Li, Jiacheng Liu, Xinkai Wang, Quan Chen, Minyi Guo: A2: Towards Accelerator Level Parallelism for Autonomous Micromobility Systems. 86:1-86:20
- Manojna Sistla, Yiding Liu, Xin Fu: Towards High Performance QNNs via Distribution-Based CNOT Gate Reduction. 87:1-87:22
- Fubing Mao, Xu Liu, Yu Zhang, Haikun Liu, Xiaofei Liao, Hai Jin, Wei Zhang, Jian Zhou, Yufei Wu, Longyu Nie, Yapu Guo, Zihan Jiang, Jingkang Liu: PMGraph: Accelerating Concurrent Graph Queries over Streaming Graphs. 88:1-88:25
- Wentong Li, Yina Lv, Longfei Luo, Yunpeng Song, Liang Shi: Access Characteristic-Guided Remote Swapping Across Mobile Devices. 89:1-89:25
- Yinan Zhang, Shun Yang, Huiqi Hu, Chengcheng Yang, Peng Cai, Xuan Zhou: SuccinctKV: a CPU-efficient LSM-tree Based KV Store with Scan-based Compaction. 90:1-90:26
- Siyuan Ma, Kaustubh Mhatre, Jian Weng, Bagus Hanindhito, Zhengrong Wang, Tony Nowatzki, Lizy K. John, Aman Arora: PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation. 91:1-91:27