SpiceC ~ Software Speculation for Irregular Parallelism

Rajiv Gupta
Prof. Keval Vora (2017)
Dr. Amlan Kusum (2016)
Dr. Farzad Khorasani (2016)
Dr. Sai Charan Koduru (2015)
Dr. Changhui Lin (2013)
Dr. Min Feng (2012)
Dr. Kishore Kumar Pusukuri (2012)
Dr. Chen Tian (2010)
Prof. Vijayanand Nagarajan (2009)

Description

The advent of multicore processors has brought new opportunities for achieving increased performance on a wide variety of applications. We are developing the SpiceC programming system, which makes it easy to parallelize applications. This system incorporates a novel computation model that supports software-managed memory isolation and data transfer between threads. SpiceC supports speculative [PPoPP'12a,HIPS'13] and asynchronous [OOPSLA'14] parallelism in the presence of dynamic data structures [TACO'12a,PLDI'10] and I/O operations [PLDI'12]. Efficient runtime support has been developed for detecting misspeculations [ISMM'10,MICRO'08], recovering from misspeculations [PPoPP'12b], and adapting OS scheduling [TACO'16,TACO'15,PACT'14,TACO'13a,TACO'12b,PACT'11] decisions for enhanced performance. Architectural support is being incorporated to efficiently enforce sequential consistency [SC'14,ICS'13,ASPLOS'12,PACT'10] and to efficiently perform runtime monitoring [ISCA'09,VEE'09,SIGOPS'09] in a sequentially consistent system. We have also extended speculation to GPGPU-based heterogeneous systems [ICS'16,ICS'15,HPDC'14,LCPC'14,DIDC'14,TACO'13b] and distributed-memory clusters [CLUSTER'15,OOPSLA'14,HIPS'14,PGAS'13].
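The copy-in, isolate, and commit idea behind this computation model can be illustrated with a small sketch. This is not the SpiceC API or runtime: the names `SharedSpace`, `PrivateSpace`, `try_commit`, and `run_speculative` are hypothetical, version counters stand in for whatever conflict detection a real system uses, and concurrency is simulated sequentially so the behavior is deterministic. Each task reads through a private copy, buffers its writes, and commits in program order; a task whose reads have gone stale is treated as a misspeculation, discarded, and re-executed.

```python
from dataclasses import dataclass, field


@dataclass
class SharedSpace:
    """Shared memory; each variable has a version bumped on every commit."""
    values: dict
    versions: dict = field(default_factory=dict)

    def version(self, key):
        return self.versions.get(key, 0)


class PrivateSpace:
    """Isolated per-task view: reads copy in from shared space, writes stay local."""

    def __init__(self, shared):
        self.shared = shared
        self.read_versions = {}  # key -> shared version seen when first copied in
        self.local = {}          # private copies, including buffered writes
        self.written = set()     # keys to publish at commit time

    def read(self, key):
        if key not in self.local:
            self.read_versions[key] = self.shared.version(key)
            self.local[key] = self.shared.values[key]
        return self.local[key]

    def write(self, key, value):
        self.local[key] = value
        self.written.add(key)

    def try_commit(self):
        # Misspeculation check: has anything we copied in changed since then?
        for key, seen in self.read_versions.items():
            if self.shared.version(key) != seen:
                return False
        # No conflict: publish buffered writes and bump their versions.
        for key in self.written:
            self.shared.values[key] = self.local[key]
            self.shared.versions[key] = self.shared.version(key) + 1
        return True


def run_speculative(shared, tasks):
    """Run tasks as if concurrently, commit in program order, return retry count."""
    spaces = []
    for task in tasks:  # simulated parallel phase: all tasks see the pre-batch state
        space = PrivateSpace(shared)
        task(space)
        spaces.append(space)

    retries = 0
    for task, space in zip(tasks, spaces):  # in-order commit phase
        while not space.try_commit():
            retries += 1
            space = PrivateSpace(shared)  # discard the stale private state
            task(space)                   # re-execute against current shared state
    return retries


def add_to_total(key):
    """Hypothetical loop body: total += values[key]."""
    def task(space):
        space.write("total", space.read("total") + space.read(key))
    return task


shared = SharedSpace({"total": 0, "a": 1, "b": 2, "c": 3})
retries = run_speculative(shared, [add_to_total(k) for k in ("a", "b", "c")])
print(shared.values["total"], retries)  # prints "6 2"
```

In this example all three tasks update `total`, so the second and third fail validation once and are re-executed, yet the final value matches sequential execution. Tasks that touch disjoint data would commit with no retries, which is the case speculation is betting on.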

Publications

Software Speculation on Caching DSMs
S-C. Koduru, K. Vora, and R. Gupta. International Journal of Parallel Programming (IJPP), Volume 46, Issue 2, pages 313-332, April 2018.
CuMAS: Data Transfer Aware Multi-Application Scheduling for Shared GPUs
M. Belviranli, F. Khorasani, L.N. Bhuyan, and R. Gupta. ACM 30th International Conference on Supercomputing (ICS), 12 pages, Istanbul, Turkey, June 2016.
Tumbler: An Effective Load Balancing Technique for MultiCPU Multicore Systems
K.K. Pusukuri, R. Gupta, and L.N. Bhuyan. ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 4, Article No. 36, 24 pages, January 2016.
Optimizing Caching DSM for Distributed Software Speculation
S-C. Koduru, K. Vora, and R. Gupta. IEEE International Conference on Cluster Computing (CLUSTER), pages 452-455, Chicago, Illinois, Sept. 2015.
PeerWave: Exploiting Wavefront Parallelism on GPUs with Peer-SM Synchronization
M. Belviranli, P. Deng, L.N. Bhuyan, R. Gupta, and Q. Zhu. ACM 29th International Conference on Supercomputing (ICS), pages 25-35, Newport Beach, California, June 2015.
Fence Scoping
C. Lin, V. Nagarajan, and R. Gupta. ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 105-116, New Orleans, Louisiana, November 2014.
ASPIRE: Exploiting Asynchronous Parallelism in Iterative Algorithms using a Relaxed Consistency based DSM
K. Vora, S-C. Koduru, and R. Gupta. ACM SIGPLAN International Conference on Object Oriented Programming Systems, Languages and Applications (OOPSLA), pages 861-878, Portland, Oregon, October 2014.
Optimistic Parallelism on GPUs
M. Feng, R. Gupta, and L.N. Bhuyan. 27th International Workshop on Languages and Compilers for Parallel Computing (LCPC), LNCS 8967, Chapter 1, pages 3-18, Hillsboro, Oregon, September 2014.
Shuffling: A Framework for Lock Contention Aware Thread Scheduling for Multicore Multiprocessor Systems
K.K. Pusukuri, R. Gupta, and L.N. Bhuyan. International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 289-300, Edmonton, Alberta, Canada, August 2014.
CuSha: Vertex-Centric Graph Processing on GPUs
F. Khorasani, K. Vora, R. Gupta, and L.N. Bhuyan. 23rd International ACM Symposium on High Performance Parallel and Distributed Computing (HPDC), pages 239-252, Vancouver, Canada, June 2014. Download: http://farkhor.github.io/CuSha/.
A Paradigm Shift in GP-GPU Computing: Task Based Execution of Applications with Dynamic Data Dependencies
M.E. Belviranli, C.H. Chou, L.N. Bhuyan, and R. Gupta. International Workshop on Data Intensive Distributed Computing (DIDC), held in conjunction with HPDC, pages 29-34, Vancouver, Canada, June 2014.
ABC²: Adaptively Balancing Computation & Communication in a DSM cluster of Multicores for Irregular Applications
S-C. Koduru, K. Vora, and R. Gupta. Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), pages 391-400, in IEEE IPDPSW Proceedings, Phoenix, May 2014.
Software Based Speculative Parallelization For Multicore/Manycore Architecture
C. Tian, M. Feng, and R. Gupta. In Programming Multi-core and Many-core Computing Systems, John Wiley & Sons, Chapter 10, pages 205-225, Edited by S. Pllana and F. Xhafa, January 2017.
Programming Large Dynamic Data Structures on a DSM Cluster of Multicores
S-C. Koduru, M. Feng, and R. Gupta. 7th International Conference on PGAS Programming Models (PGAS), pages 126-141, Edinburgh, Scotland, October 2013.
Address-aware Fences
C. Lin, V. Nagarajan, and R. Gupta. 27th International Conference on Supercomputing (ICS), pages 313-324, Eugene, Oregon, June 2013.
Programming Support for Speculative Execution with Software Transactional Memory
M. Feng, R. Gupta, and I. Neamtiu. Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), pages 394-403, in IEEE IPDPSW Proceedings, January 2013.
ADAPT: A Framework for Coscheduling Multithreaded Programs
K.K. Pusukuri, R. Gupta, and L.N. Bhuyan. ACM Transactions on Architecture and Code Optimization (TACO), special issue of papers presented at HiPEAC, Volume 9, Issue 4, Article No. 45, 25 pages, January 2013a.
A Dynamic Self Scheduling Scheme for Heterogeneous Multiprocessor Architectures
M.E. Belviranli, L.N. Bhuyan, and R. Gupta. ACM Transactions on Architecture and Code Optimization (TACO), special issue of papers presented at HiPEAC, Volume 9, Issue 4, Article No. 57, 20 pages, January 2013b.
Effective Parallelization of Loops in the Presence of I/O Operations
M. Feng, R. Gupta, and I. Neamtiu. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 487-498, Beijing, China, June 2012.
Efficient Sequential Consistency via Conflict Ordering
C. Lin, V. Nagarajan, R. Gupta, and B. Rajaram. ACM 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 273-286, London, UK, March 2012.
Efficient Sequential Consistency Using Conditional Fences
C. Lin, V. Nagarajan, and R. Gupta. International Journal of Parallel Programming (IJPP), Vol. 40, No. 1, pages 84-117, Feb. 2012. Special issue of best papers from PACT 2010.
PLDS: Partitioning Linked Data Structures for Parallelism
M. Feng, C. Lin, and R. Gupta. ACM Transactions on Architecture and Code Optimization (TACO), special issue of papers presented at HiPEAC, Volume 8, Issue 4, Article No. 38, 21 pages, January 2012a.
Thread Tranquilizer: Dynamically Reducing Performance Variation
K.K. Pusukuri, R. Gupta, and L.N. Bhuyan. ACM Transactions on Architecture and Code Optimization (TACO), special issue of papers presented at HiPEAC, Volume 8, Issue 4, Article No. 46, 21 pages, January 2012b.
Thread Reinforcer: Dynamically Determining Number of Threads via OS Level Monitoring
K.K. Pusukuri, R. Gupta, and L.N. Bhuyan. IEEE International Symposium on Workload Characterization (IISWC), pages 116-125, Austin, Texas, November 2011.
No More Backstabbing... A Faithful Scheduling Policy for Multithreaded Programs
K.K. Pusukuri, R. Gupta, and L.N. Bhuyan. International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 12-21, Galveston Island, Texas, October 2011.
Efficient Sequential Consistency Using Conditional Fences
C. Lin, V. Nagarajan, and R. Gupta. International Journal of Parallel Programming (IJPP), published online June 2011.
SpiceC: Scalable Parallelism via implicit copying and explicit Commit
M. Feng, R. Gupta, and Y. Hu. 16th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming (PPoPP), pages 69-80, San Antonio, Texas, February 2011a.
Enhanced Speculative Parallelization Via Incremental Recovery
C. Tian, C. Lin, M. Feng, and R. Gupta. 16th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming (PPoPP), pages 189-200, San Antonio, Texas, February 2011b.
Efficient Sequential Consistency Using Conditional Fences
C. Lin, V. Nagarajan, and R. Gupta. 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 295-306, Vienna, Austria, September 2010.
Supporting Speculative Parallelization in the Presence of Dynamic Data Structures
C. Tian, M. Feng, and R. Gupta. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 62-73, Toronto, Canada, June 2010.
Speculative Parallelization Using State Separation and Multiple Value Prediction
C. Tian, M. Feng, and R. Gupta. Ninth International Symposium on Memory Management (ISMM), pages 63-72, Toronto, Canada, June 2010.
Speculative Optimizations for Parallel Programs on Multicores
V. Nagarajan and R. Gupta. 22nd International Workshop on Languages and Compilers for Parallel Computing (LCPC), LNCS 5898/2010, pages 323-337, Newark, Delaware, October 2009.
ECMon: Exposing Cache Events for Monitoring
V. Nagarajan and R. Gupta. ACM/IEEE 36th International Symposium on Computer Architecture (ISCA), pages 349-360, Austin, Texas, June 2009.
Runtime Monitoring on Multicores via OASES
V. Nagarajan and R. Gupta. ACM SIGOPS Operating Systems Review (SIGOPS), special issue on the interaction among the OS, Compilers, and Multicore Processors, pages 15-24, Vol. 43, No. 2, April 2009 (Invited Paper).
Architectural Support for Shadow Memory in Multiprocessors
V. Nagarajan and R. Gupta. ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), pages 1-10, Washington DC, March 2009.
Speculative Parallelization of Sequential Loops On Multicores
C. Tian, M. Feng, V. Nagarajan, and R. Gupta. International Journal of Parallel Programming (IJPP), Vol. 37, No. 5, pages 508-535, 2009.
Copy Or Discard Execution Model For Speculative Parallelization On Multicores
C. Tian, M. Feng, V. Nagarajan, and R. Gupta. IEEE/ACM 41st International Symposium on Microarchitecture (MICRO), pages 330-341, Lake Como, Italy, Nov. 2008.
SENSS: Security Enhancement to Symmetric Shared Memory Multiprocessors
Y. Zhang, L. Gao, J. Yang, X. Zhang and R. Gupta. IEEE 11th International Symposium on High Performance Computer Architecture (HPCA), pages 352-362, San Francisco, California, February 2005.
Distributed Path Reservation Algorithms for Multiplexed All-Optical Interconnection Networks
X. Yuan, R. Melhem, and R. Gupta. IEEE 3rd International Symposium on High-Performance Computer Architecture (HPCA), pages 38-47, San Antonio, Texas, February 1997.
SPMD Execution of Programs with Pointer-based Data Structures on Distributed-Memory Machines
R. Gupta. Journal of Parallel and Distributed Computing (JPDC), special issue on Multicomputer Programming and Application, Vol. 16, No. 2, pages 92-107, October 1992.
SPMD Execution of Programs with Dynamic Data Structures on Distributed Memory Machines
R. Gupta. IEEE 4th International Conference on Computer Languages (ICCL), pages 232-241, Oakland, California, April 1992.
A Shape Matching Approach for Scheduling Fine-Grained Parallelism
B. Malloy, R. Gupta, and M.L. Soffa. IEEE/ACM 25th International Symposium on Microarchitecture (MICRO), pages 264-267, Portland, Oregon, December 1992.
Executing Loops on a Fine-Grained MIMD Architecture
S. Lee and R. Gupta. IEEE/ACM 24th International Symposium on Microarchitecture (MICRO), pages 199-205, Albuquerque, New Mexico, November 1991.
The Design of a RISC based Multiprocessor Chip
R. Gupta, M. Epstein, and M. Whelan. Supercomputing'90 (SC), pages 920-929, New York, November 1990.
A Fine-grained MIMD Architecture based upon Register Channels
R. Gupta. IEEE/ACM 23rd Workshop on Microprogramming and Microarchitecture (MICRO), pages 28-37, Orlando, Florida, December 1990.
Employing Register Channels for the Exploitation of Instruction Level Parallelism
R. Gupta. ACM SIGPLAN 2nd Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 118-127, Seattle, Washington, March 1990.
The Fuzzy Barrier: A Mechanism for High-Speed Synchronization of Processors
R. Gupta. ACM 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 54-64, Boston, April 1989.

Funding

Size Oblivious Programming for Large Dynamic Data Structures,
Google Research Award, 3/2013-2/2014.
SHF: Small: Memory Consistency - Hardware, Compiler, and Programming Support,
National Science Foundation, CCF-1318103, 9/2013-8/2017.
EAGER: Developing a Programming Environment for Heterogeneous Multiprocessors,
National Science Foundation, CNS-1157377, 9/2012-8/2015.
SHF: Medium: Programmable Monitoring Framework for Multicore Systems,
National Science Foundation, CCF-0963996, 9/2010-8/2014.
SHF: Medium: Hardware/Software Partitioning for Hybrid Shared Memory Multiprocessors,
National Science Foundation, CCF-0905509, 9/2009-8/2015.