Shuaiwen Song
Person information
- affiliation: University of Sydney, Australia
2020 – today
- 2024
- [j23]Chengying Huan, Yongchao Liu, Heng Zhang, Shuaiwen Song, Santosh Pandey, Shiyang Chen, Xiangfei Fang, Yue Jin, Baptiste Lepers, Yanjun Wu, Hang Liu:
TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture. ACM Trans. Archit. Code Optim. 21(2): 37 (2024) - [j22]Fangtian Zhong, Xiuzhen Cheng, Dongxiao Yu, Bei Gong, Shuaiwen Song, Jiguo Yu:
MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors. IEEE Trans. Computers 73(4): 980-993 (2024) - [j21]Yufei Yang, Chenhao Xie, Liansheng Liu, Philip H. W. Leong, Shuaiwen Leon Song:
Efficient Radius Search for Adaptive Foveal Sizing Mechanism in Collaborative Foveated Rendering Framework. IEEE Trans. Mob. Comput. 23(5): 3620-3632 (2024) - [j20]Chengying Huan, Yongchao Liu, Heng Zhang, Hang Liu, Shiyang Chen, Shuaiwen Leon Song, Yanjun Wu:
TeGraph+: Scalable Temporal Graph Processing Enabling Flexible Edge Modifications. IEEE Trans. Parallel Distributed Syst. 35(8): 1469-1487 (2024) - [c79]Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He:
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. IPDPS (Workshops) 2024: 1206-1208 - [c78]Donglin Zhuang, Zhen Zheng, Haojun Xia, Xiafei Qiu, Junjie Bai, Wei Lin, Shuaiwen Leon Song:
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures. OSDI 2024: 989-1005 - [c77]Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He:
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. PODC 2024: 121-130 - [c76]Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song:
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs. USENIX ATC 2024: 699-713 - [i30]Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song:
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design. CoRR abs/2401.14112 (2024) - [i29]Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou:
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model. CoRR abs/2406.00977 (2024) - [i28]Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem:
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models. CoRR abs/2406.05223 (2024)
- 2023
- [j19]Jianda Wang, Zhendong Wang, Bo Yu, Jie Tang, Shuaiwen Leon Song, Cong Liu, Yang Hu:
Data Fusion in Infrastructure-Augmented Autonomous Driving System: Why? Where? and How? IEEE Internet Things J. 10(18): 15857-15871 (2023) - [j18]Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song:
Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity. Proc. VLDB Endow. 17(2): 211-224 (2023) - [j17]Lening Wang, Qiyu Wan, Peixun Ma, Jing Wang, Mingsong Chen, Shuaiwen Leon Song, Xin Fu:
Enabling High-Efficient ReRAM-Based CNN Training Via Exploiting Crossbar-Level Insignificant Writing Elimination. IEEE Trans. Computers 72(11): 3218-3230 (2023) - [c75]Yue Jin, Chengying Huan, Heng Zhang, Yongchao Liu, Shuaiwen Leon Song, Rui Zhao, Yao Zhang, Changhua He, Wenguang Chen:
G-Sparse: Compiler-Driven Acceleration for Generalized Sparse Computation for Graph Neural Networks on Modern GPUs. PACT 2023: 137-149 - [c74]Chengying Huan, Shuaiwen Leon Song, Santosh Pandey, Hang Liu, Yongchao Liu, Baptiste Lepers, Changhua He, Kang Chen, Jinlei Jiang, Yongwei Wu:
TEA: A General-Purpose Temporal Graph Random Walk Engine. EuroSys 2023: 182-198 - [c73]Yu Wen, Chenhao Xie, Shuaiwen Leon Song, Xin Fu:
Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing. HPCA 2023: 390-402 - [c72]Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao:
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. ICS 2023: 324-335 - [c71]Qiyu Wan, Lening Wang, Jing Wang, Shuaiwen Leon Song, Xin Fu:
NAS-SE: Designing A Highly-Efficient In-Situ Neural Architecture Search Engine for Large-Scale Deployment. MICRO 2023: 756-768 - [c70]Alan Robertson, Shuaiwen Song:
Mitigating Coupling Map Constrained Correlated Measurement Errors on Quantum Devices. SC 2023: 62:1-62:13 - [i27]Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao:
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. CoRR abs/2304.07334 (2023) - [i26]Huwan Peng, Scott Davidson, Richard Shi, Shuaiwen Leon Song, Michael B. Taylor:
Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models. CoRR abs/2307.02666 (2023) - [i25]Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He:
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales. CoRR abs/2308.01320 (2023) - [i24]Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Ameneh Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song:
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model. CoRR abs/2309.00810 (2023) - [i23]Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song:
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity. CoRR abs/2309.10285 (2023) - [i22]Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He:
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. CoRR abs/2309.14509 (2023) - [i21]Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan A. Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael W. Irvin, J. Gregory Pauloski, Logan T. Ward, Valérie Hayot-Sasson, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian T. Foster, James J. Davis, Michael E. Papka, Thomas S. Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi A. Hanson, Thomas E. Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton D. Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin M. Aji, Angela Dalton, Michael J. Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens:
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies. CoRR abs/2310.04610 (2023)
- 2022
- [j16]Yiding Liu, Xingyao Zhang, Donglin Zhuang, Xin Fu, Shuaiwen Song:
DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata Processor. ACM Trans. Archit. Code Optim. 19(4): 60:1-60:26 (2022) - [j15]Jidong Zhai, Liyan Zheng, Feng Zhang, Xiongchao Tang, Haojie Wang, Teng Yu, Yuyang Jin, Shuaiwen Leon Song, Wenguang Chen:
Detecting Performance Variance for Parallel Applications Without Source Code. IEEE Trans. Parallel Distributed Syst. 33(10): 4239-4255 (2022) - [c69]Chengying Huan, Shuaiwen Leon Song, Yongchao Liu, Heng Zhang, Hang Liu, Charles He, Kang Chen, Jinlei Jiang, Yongwei Wu:
T-GCN: A Sampling Based Streaming Graph Neural Network System with Hybrid Architecture. PACT 2022: 69-82 - [c68]Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, Wei Lin:
AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures. ASPLOS 2022: 359-373 - [c67]Chengying Huan, Hang Liu, Mengxing Liu, Yongchao Liu, Changhua He, Kang Chen, Jinlei Jiang, Yongwei Wu, Shuaiwen Leon Song:
TeGraph: A Novel General-Purpose Temporal Graph Computing Engine. ICDE 2022: 578-592 - [c66]Heng Zhang, Lingda Li, Hang Liu, Donglin Zhuang, Rui Liu, Chengying Huan, Shuang Song, Dingwen Tao, Yongchao Liu, Charles He, Yanjun Wu, Shuaiwen Leon Song:
Bring orders into uncertainty: enabling efficient uncertain graph processing via novel path sampling on multi-accelerator systems. ICS 2022: 11:1-11:14 - [c65]Donglin Zhuang, Xingyao Zhang, Shuaiwen Song, Sara Hooker:
Randomness in Neural Network Training: Characterizing the Impact of Tooling. MLSys 2022 - [c64]Liyan Zheng, Jidong Zhai, Xiongchao Tang, Haojie Wang, Teng Yu, Yuyang Jin, Shuaiwen Leon Song, Wenguang Chen:
Vapro: performance variance detection and diagnosis for production-run parallel applications. PPoPP 2022: 150-162 - [c63]Shaoshan Liu, Jianda Wang, Zhendong Wang, Bo Yu, Wei Hu, Yahui Liu, Jie Tang, Shuaiwen Leon Song, Cong Liu, Yang Hu:
Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System. RTAS 2022: 293-296 - [i20]Shaoshan Liu, Jianda Wang, Zhendong Wang, Bo Yu, Wei Hu, Yahui Liu, Jie Tang, Shuaiwen Leon Song, Cong Liu, Yang Hu:
Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System. CoRR abs/2207.00737 (2022) - [i19]Zhendong Wang, Xiaoming Zeng, Shuaiwen Leon Song, Yang Hu:
Towards Efficient Architecture and Algorithms for Sensor Fusion. CoRR abs/2209.06272 (2022) - [i18]Jieyang Chen, Chenhao Xie, Jesun Sahariar Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li:
MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems. CoRR abs/2209.07552 (2022)
- 2021
- [j14]Cody Rivera, Jieyang Chen, Nan Xiong, Jing Zhang, Shuaiwen Leon Song, Dingwen Tao:
TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs. J. Parallel Distributed Comput. 151: 70-85 (2021) - [j13]Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Song, Dingwen Tao:
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression. Proc. VLDB Endow. 15(4): 886-899 (2021) - [j12]Xingyao Zhang, Xin Fu, Donglin Zhuang, Chenhao Xie, Shuaiwen Leon Song:
Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design. IEEE Trans. Computers 70(4): 495-510 (2021) - [c62]Chenhao Xie, Xie Li, Yang Hu, Huwan Peng, Michael B. Taylor, Shuaiwen Leon Song:
Q-VR: system-level design for future mobile collaborative virtual reality. ASPLOS 2021: 587-599 - [c61]Chenhao Xie, Jieyang Chen, Jesun Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li:
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures. ICPP 2021: 53:1-53:11 - [c60]Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao:
ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning. ICS 2021: 266-278 - [c59]Xingyao Zhang, Haojun Xia, Donglin Zhuang, Hao Sun, Xin Fu, Michael B. Taylor, Shuaiwen Leon Song:
η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities. ISCA 2021: 567-580 - [c58]Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu:
Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving. MICRO 2021: 885-897 - [c57]Heng Zhang, Lingda Li, Donglin Zhuang, Rui Liu, Shuang Song, Dingwen Tao, Yanjun Wu, Shuaiwen Leon Song:
An efficient uncertain graph processing framework for heterogeneous architectures. PPoPP 2021: 477-479 - [c56]Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao:
A novel memory-efficient deep learning training framework via error-bounded lossy compression. PPoPP 2021: 485-487 - [c55]Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, Hang Liu:
Dr. Top-k: delegate-centric Top-k on GPUs. SC 2021: 39 - [c54]Kiran Ranganath, Joshua D. Suetterlein, Joseph B. Manzano, Shuaiwen Leon Song, Daniel Wong:
MAPA: multi-accelerator pattern allocation policy for multi-tenant GPU servers. SC 2021: 99 - [c53]Jialiang Tan, Yu Chen, Zhenming Liu, Bin Ren, Shuaiwen Leon Song, Xipeng Shen, Xu Liu:
Toward efficient interactions between Python and native libraries. ESEC/SIGSOFT FSE 2021: 1117-1128 - [i17]Chenhao Xie, Xie Li, Yang Hu, Huwan Peng, Michael B. Taylor, Shuaiwen Leon Song:
Q-VR: System-Level Design for Future Mobile Collaborative Virtual Reality. CoRR abs/2102.13191 (2021) - [i16]Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker:
Randomness In Neural Network Training: Characterizing The Impact of Tooling. CoRR abs/2106.11872 (2021) - [i15]Jialiang Tan, Yu Chen, Zhenming Liu, Bin Ren, Shuaiwen Leon Song, Xipeng Shen, Xu Liu:
Toward Efficient Interactions between Python and Native Libraries. CoRR abs/2107.00064 (2021) - [i14]Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, Hang Liu:
Dr. Top-k: Delegate-Centric Top-k on GPUs. CoRR abs/2109.08219 (2021) - [i13]Kiran Ranganath, Joshua D. Suetterlein, Joseph B. Manzano, Shuaiwen Leon Song, Daniel Wong:
MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers. CoRR abs/2110.03214 (2021) - [i12]Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu:
Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving. CoRR abs/2110.03553 (2021) - [i11]Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao:
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression. CoRR abs/2111.09562 (2021)
- 2020
- [j11]Jingweijia Tan, Kaige Yan, Shuaiwen Leon Song, Xin Fu:
Energy-Efficient GPU L2 Cache Design Using Instruction-Level Data Locality Similarity. ACM Trans. Design Autom. Electr. Syst. 25(6): 52:1-52:18 (2020) - [j10]Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker:
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect. IEEE Trans. Parallel Distributed Syst. 31(1): 94-110 (2020) - [c52]Xingyao Zhang, Shuaiwen Leon Song, Chenhao Xie, Jing Wang, Weigong Zhang, Xin Fu:
Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design. HPCA 2020: 542-555 - [i10]Chenhao Xie, Xin Fu, Mingsong Chen, Shuaiwen Leon Song:
OO-VR: NUMA Friendly Object-Oriented VR Rendering Framework For Future NUMA-Based Multi-GPU Systems. CoRR abs/2001.03537 (2020) - [i9]Cody Rivera, Jieyang Chen, Nan Xiong, Shuaiwen Leon Song, Dingwen Tao:
ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs. CoRR abs/2002.03258 (2020) - [i8]Fangtian Zhong, Xiuzhen Cheng, Dongxiao Yu, Bei Gong, Shuaiwen Song, Jiguo Yu:
MalFox: Camouflaged Adversarial Malware Example Generation Based on C-GANs Against Black-Box Detectors. CoRR abs/2011.01509 (2020) - [i7]Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao:
A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression. CoRR abs/2011.09017 (2020) - [i6]Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao:
An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning. CoRR abs/2011.10170 (2020) - [i5]Chenhao Xie, Jieyang Chen, Jesun Sahariar Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li:
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures. CoRR abs/2012.06959 (2020)
2010 – 2019
- 2019
- [j9]Kiran Ranganath, AmirAli Abdolrashidi, Shuaiwen Leon Song, Daniel Wong:
Speeding up Collective Communications Through Inter-GPU Re-Routing. IEEE Comput. Archit. Lett. 18(2): 128-131 (2019) - [c51]Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Shuaiwen Leon Song, Ang Li, Martin C. Herbordt:
LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism. ASAP 2019: 9-16 - [c50]Jingweijia Tan, Kaige Yan, Shuaiwen Leon Song, Xin Fu:
LoSCache: Leveraging Locality Similarity to Build Energy-Efficient GPU L2 Cache. DATE 2019: 1190-1195 - [c49]Chenhao Xie, Xingyao Zhang, Ang Li, Xin Fu, Shuaiwen Song:
PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customized Memory Cube. HPCA 2019: 609-622 - [c48]Chenhao Xie, Xin Fu, Mingsong Chen, Shuaiwen Leon Song:
OO-VR: NUMA friendly object-oriented VR rendering framework for future NUMA-based multi-GPU systems. ISCA 2019: 53-65 - [c47]Ang Li, Tong Geng, Tianqi Wang, Martin C. Herbordt, Shuaiwen Leon Song, Kevin J. Barker:
BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets. SC 2019: 38:1-38:30 - [i4]Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker:
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect. CoRR abs/1903.04611 (2019) - [i3]Xingyao Zhang, Shuaiwen Leon Song, Chenhao Xie, Jing Wang, Weigong Zhang, Xin Fu:
Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design. CoRR abs/1911.03451 (2019)
- 2018
- [j8]Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Abhinav Vishnu, Dipanjan Sengupta, Xu Liu:
NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks. ACM Trans. Archit. Code Optim. 15(2): 24:1-24:26 (2018) - [c46]Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Xu Liu:
Lightweight detection of cache conflicts. CGO 2018: 200-213 - [c45]Du Shen, Shuaiwen Leon Song, Ang Li, Xu Liu:
CUDAAdvisor: LLVM-based runtime profiling for modern GPUs. CGO 2018: 214-227 - [c44]Chenhao Xie, Xin Fu, Shuaiwen Song:
Perception-Oriented 3D Rendering Approximation for Modern Graphics Processors. HPCA 2018: 362-374 - [c43]Ang Li, Weifeng Liu, Linnan Wang, Kevin J. Barker, Shuaiwen Leon Song:
Warp-Consolidation: A Novel Execution Model for GPUs. ICS 2018: 53-64 - [c42]Ang Li, Shuaiwen Leon Song, Jieyang Chen, Xu Liu, Nathan R. Tallent, Kevin J. Barker:
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite. IISWC 2018: 191-202 - [c41]Shuaiwen Leon Song, Natalie J. Bates, Ang Li:
Introduction to HPPAC 2018. IPDPS Workshops 2018: 674 - [c40]Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska:
Superneurons: dynamic GPU memory management for training deep neural networks. PPoPP 2018: 41-53 - [i2]Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska:
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks. CoRR abs/1801.04380 (2018)
- 2017
- [c39]Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar, Henk Corporaal:
Locality-Aware CTA Clustering for Modern GPUs. ASPLOS 2017: 297-311 - [c38]Chenhao Xie, Shuaiwen Leon Song, Jing Wang, Weigong Zhang, Xin Fu:
Processing-in-Memory Enabled Graphics Processors for 3D Rendering. HPCA 2017: 637-648 - [c37]Junqiao Qiu, Zhijia Zhao, Bo Wu, Abhinav Vishnu, Shuaiwen Leon Song:
Enabling scalability-sensitive speculative parallelization for FSM computations. ICS 2017: 2:1-2:10 - [c36]Shuaiwen Leon Song, Richard W. Vuduc:
HPPAC Workshop Introduction. IPDPS Workshops 2017: 952 - [c35]Shuaiwen Leon Song, Torsten Hoefler:
IPDRM Workshop Introduction. IPDPS Workshops 2017: 1284 - [c34]Ang Li, Wenfeng Zhao, Shuaiwen Leon Song:
BVF: enabling significant on-chip power savings via bit-value-favor for throughput processors. MICRO 2017: 532-545 - [c33]Ang Li, Weifeng Liu, Mads Ruben Burgdorff Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez, Shuaiwen Leon Song:
Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels. SC 2017: 26 - [c32]Ning Zhang, Chuntao Jiang, Xian-He Sun, Shuaiwen Leon Song:
Evaluating GPGPU Memory Performance Through the C-AMAT Model. MCHPC@SC 2017: 35-39 - [c31]Dipanjan Sengupta, Shuaiwen Leon Song:
EvoGraph: On-the-Fly Efficient Mining of Evolving Graphs on GPU. ISC 2017: 97-119
- 2016
- [j7]Li Tan, Zizhong Chen, Shuaiwen Leon Song:
Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology. ACM Trans. Archit. Code Optim. 12(4): 35:1-35:27 (2016) - [c30]Jingweijia Tan, Shuaiwen Leon Song, Kaige Yan, Xin Fu, Andrès Márquez, Darren J. Kerbyson:
Combating the Reliability Challenge of GPU Register File at Low Supply Voltage. PACT 2016: 3-15 - [c29]Ang Li, Shuaiwen Leon Song, Akash Kumar, Eddy Z. Zhang, Daniel G. Chavarría-Miranda, Henk Corporaal:
Critical points based register-concurrency autotuning for GPUs. DATE 2016: 1273-1278 - [c28]Dingwen Tao, Shuaiwen Leon Song, Sriram Krishnamoorthy, Panruo Wu, Xin Liang, Eddy Z. Zhang, Darren J. Kerbyson, Zizhong Chen:
New-Sum: A Novel Online ABFT Scheme For General Iterative Methods. HPDC 2016: 43-55 - [c27]Probir Roy, Xu Liu, Shuaiwen Leon Song:
SMT-Aware Instantaneous Footprint Optimization. HPDC 2016: 267-279 - [c26]Ang Li, Shuaiwen Leon Song, Mark Wijtvliet, Akash Kumar, Henk Corporaal:
SFU-Driven Transparent Approximation Acceleration on GPUs. ICS 2016: 15:1-15:14 - [c25]Lingda Li, Ari B. Hayes, Shuaiwen Leon Song, Eddy Z. Zhang:
Tag-Split Cache for Efficient GPGPU Cache Utilization. ICS 2016: 43:1-43:12 - [c24]Ang Li, Shuaiwen Leon Song, Eric Brugel, Akash Kumar, Daniel G. Chavarría-Miranda, Henk Corporaal:
X: A Comprehensive Analytic Model for Parallel Machines. IPDPS 2016: 242-252 - [c23]Barry Rountree, Shuaiwen Leon Song:
HPPAC Introduction and Committees. IPDPS Workshops 2016: 1089 - [c22]Shuaiwen Leon Song, Todd Gamblin:
IPDRM Introduction and Committees. IPDPS Workshops 2016: 1726 - [c21]Ari B. Hayes, Lingda Li, Daniel G. Chavarría-Miranda, Shuaiwen Leon Song, Eddy Z. Zhang:
Orion: A Framework for GPU Occupancy Tuning. Middleware 2016: 18 - [i1]Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song:
A Graph-based Model for GPU Caching Problems. CoRR abs/1605.02043 (2016)
- 2015
- [j6]Yang You, Haohuan Fu, Shuaiwen Leon Song, Amanda Peters Randles, Darren J. Kerbyson, Andres Marquez, Guangwen Yang, Adolfy Hoisie:
Scaling Support Vector Machines on modern HPC platforms. J. Parallel Distributed Comput. 76: 16-31 (2015) - [c20]Sunil Shrestha, Joseph B. Manzano, Andrès Márquez, Stéphane Zuckerman, Shuaiwen Song, Guang R. Gao:
Gregarious Data Re-structuring in a Many Core Architecture. HPCC/CSS/ICESS 2015: 712-720 - [c19]Chao Li, Shuaiwen Leon Song, Hongwen Dai, Albert Sidelnik, Siva Kumar Sastry Hari, Huiyang Zhou:
Locality-Driven Dynamic GPU Cache Bypassing. ICS 2015: 67-77 - [c18]Dipanjan Sengupta, Kapil Agarwal, Shuaiwen Leon Song, Karsten Schwan:
GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems. IPDPS Workshops 2015: 604-609 - [c17]Li Tan, Shuaiwen Leon Song, Panruo Wu, Zizhong Chen, Rong Ge, Darren J. Kerbyson:
Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing. IPDPS 2015: 786-796 - [c16]Dipanjan Sengupta, Shuaiwen Leon Song, Kapil Agarwal, Karsten Schwan:
GraphReduce: processing large-scale graphs on accelerator-based systems. SC 2015: 28:1-28:12
- 2014
- [j5]Yang You, Haohuan Fu, Shuaiwen Leon Song, Maryam Mehri Dehnavi, Lin Gan, Xiaomeng Huang, Guangwen Yang:
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil. Int. J. High Perform. Comput. Appl. 28(3): 301-318 (2014) - [j4]Bo Li, Hung-Ching Chang, Shuaiwen Song, Chun-Yi Su, Timmy Meyer, John Mooring, Kirk W. Cameron:
Extending PowerPack for Profiling and Analysis of High-Performance Accelerator-Based Systems. Parallel Process. Lett. 24(4) (2014) - [c15]Andres Marquez, Joseph B. Manzano, Shuaiwen Leon Song, Benoît Meister, Sunil Shrestha, Thomas St. John, Guang R. Gao:
ACDT: Architected Composite Data Types trading-in unfettered data access for improved execution. ICPADS 2014: 289-297 - [c14]Yang You, Shuaiwen Leon Song, Darren J. Kerbyson:
An adaptive cross-architecture combination method for graph traversal. ICS 2014: 169 - [c13]Yang You, Shuaiwen Leon Song, Haohuan Fu, Andres Marquez, Maryam Mehri Dehnavi, Kevin J. Barker, Kirk W. Cameron, Amanda Peters Randles, Guangwen Yang:
MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures. IPDPS 2014: 809-818 - [c12]Bo Li, Hung-Ching Chang, Shuaiwen Song, Chun-Yi Su, Timmy Meyer, John Mooring, Kirk W. Cameron:
The Power-Performance Tradeoffs of the Intel Xeon Phi on HPC Applications. IPDPS Workshops 2014: 1448-1456
- 2013
- [j3]Abhinav Vishnu, Shuaiwen Song, Andres Marquez, Kevin J. Barker, Darren J. Kerbyson, Kirk W. Cameron, Pavan Balaji:
Designing energy efficient communication runtime systems: a view from PGAS models. J. Supercomput. 63(3): 691-709 (2013) - [c11]Bo Li, Shuaiwen Leon Song, Ivona Bezáková, Kirk W. Cameron:
EDR: An energy-aware runtime load distribution system for data-intensive applications in the cloud. CLUSTER 2013: 1-8 - [c10]Shuaiwen Song, Chun-Yi Su, Barry Rountree, Kirk W. Cameron:
A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures. IPDPS 2013: 673-686 - [c9]Shuaiwen Leon Song, Kevin J. Barker, Darren J. Kerbyson:
Unified performance and power modeling of scientific workloads. E2SC@SC 2013: 4:1-4:8
- 2012
- [c8]Shuaiwen Song, Kirk W. Cameron:
System-level power-performance efficiency modeling for emergent GPU architectures. PACT 2012: 473-474 - [c7]Bo Li, Shuaiwen Song, Ivona Bezáková, Kirk W. Cameron:
Energy-Aware Replica Selection for Data-Intensive Services in Cloud. MASCOTS 2012: 504-506 - [c6]Shuaiwen Leon Song, Chun-Yi Su, Barry Rountree, Kirk W. Cameron:
Abstract: Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems. SC Companion 2012: 1344-1345 - [c5]Shuaiwen Leon Song:
Poster: Three Steps to Model Power-Performance Efficiency for Emergent GPU-Based Parallel Systems. SC Companion 2012: 1346
- 2011
- [c4]Shuaiwen Song, Matthew Grove, Kirk W. Cameron:
An ISO-Energy-Efficient Approach to Scalable System Power-Performance Optimization. CLUSTER 2011: 262-271 - [c3]Shuaiwen Song, Chun-Yi Su, Rong Ge, Abhinav Vishnu, Kirk W. Cameron:
Iso-Energy-Efficiency: An Approach to Power-Constrained Parallel Computation. IPDPS 2011: 128-139
- 2010
- [j2]Rong Ge, Xizhou Feng, Shuaiwen Song, Hung-Ching Chang, Dong Li, Kirk W. Cameron:
PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications. IEEE Trans. Parallel Distributed Syst. 21(5): 658-671 (2010) - [c2]Abhinav Vishnu, Shuaiwen Song, Andres Marquez, Kevin J. Barker, Darren J. Kerbyson, Kirk W. Cameron, Pavan Balaji:
Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models. GreenCom/CPSCom 2010: 229-236 - [c1]Abhinav Vishnu, Huub J. J. Van Dam, Wibe de Jong, Pavan Balaji, Shuaiwen Song:
Fault-tolerant communication runtime support for data-centric programming models. HiPC 2010: 1-9
2000 – 2009
- 2009
- [j1]Shuaiwen Song, Rong Ge, Xizhou Feng, Kirk W. Cameron:
Energy Profiling and Analysis of the HPC Challenge Benchmarks. Int. J. High Perform. Comput. Appl. 23(3): 265-276 (2009)
last updated on 2024-12-26 01:52 CET by the dblp team
all metadata released as open data under CC0 1.0 license