Chapter 7
Multicores, Multiprocessors, and Clusters
7.1 Introduction
Introduction
Goal: connecting multiple computers to get higher performance
  Multiprocessors
  Scalability, availability, power efficiency
Job-level (process-level) parallelism
  High throughput for independent jobs
Parallel processing program
  Single program run on multiple processors
Multicore microprocessors
  Chips with multiple processors (cores)
Hardware and Software
Hardware
  Serial: e.g., Pentium 4
  Parallel: e.g., quad-core Xeon e5345
Software
  Sequential: e.g., matrix multiplication
  Concurrent: e.g., operating system
Sequential/concurrent software can run on serial/parallel hardware
  Challenge: making effective use of parallel hardware
What We've Already Covered
2.11: Parallelism and Instructions
  Synchronization
3.6: Parallelism and Computer Arithmetic
  Associativity
4.10: Parallelism and Advanced Instruction-Level Parallelism
5.8: Parallelism and Memory Hierarchies
  Cache Coherence
6.9: Parallelism and I/O
  Redundant Arrays of Inexpensive Disks
7.2 The Difficulty of Creating Parallel Processing Programs
Parallel Programming
Parallel software is the problem
Need to get significant performance improvement
  Otherwise, just use a faster uniprocessor, since it's easier!
Difficulties
  Partitioning
  Coordination
  Communications overhead
Amdahl's Law
Sequential part can limit speedup
Example: 100 processors, 90× speedup?
  Tnew = Tparallelizable/100 + Tsequential
  Speedup = 1 / ((1 − Fparallelizable) + Fparallelizable/100) = 90
  Solving: Fparallelizable = 0.999
Need sequential part to be 0.1% of original time (worked through in the sketch below)
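A minimal C sketch of this calculation (the 100-processor count and 90× target come from the slide; the code simply inverts Amdahl's Law for the required parallel fraction):

#include <stdio.h>

/* Amdahl's Law: speedup = 1 / ((1 - f) + f/p), where f is the
   parallelizable fraction and p is the processor count. */
static double speedup(double f, int p) {
    return 1.0 / ((1.0 - f) + f / p);
}

/* Invert Amdahl's Law: fraction f needed to reach speedup s on p processors. */
static double fraction_needed(double s, int p) {
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / p);
}

int main(void) {
    int p = 100;         /* processors (from the slide) */
    double s = 90.0;     /* target speedup (from the slide) */
    double f = fraction_needed(s, p);
    printf("Fparallelizable needed = %.3f\n", f);                 /* 0.999 */
    printf("check: speedup with that f = %.0f\n", speedup(f, p)); /* 90 */
    return 0;
}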
Scaling Example
Workload: sum of 10 scalars, and 10 × 10 matrix sum
  Speedup from 10 to 100 processors
Single processor: Time = (10 + 100) × tadd
10 processors
  Time = 10 × tadd + 100/10 × tadd = 20 × tadd
  Speedup = 110/20 = 5.5 (55% of potential)
100 processors
  Time = 10 × tadd + 100/100 × tadd = 11 × tadd
  Speedup = 110/11 = 10 (10% of potential)
Assumes load can be balanced across processors
Scaling Example (cont)
What if matrix size is 100 × 100?
Single processor: Time = (10 + 10000) × tadd
10 processors
  Time = 10 × tadd + 10000/10 × tadd = 1010 × tadd
  Speedup = 10010/1010 = 9.9 (99% of potential)
100 processors
  Time = 10 × tadd + 10000/100 × tadd = 110 × tadd
  Speedup = 10010/110 = 91 (91% of potential)
Assuming load balanced
Strong vs Weak Scaling
Strong scaling: problem size fixed
As in example
Weak scaling: problem size proportional to number of processors
  10 processors, 10 × 10 matrix
    Time = 20 × tadd
  100 processors, 32 × 32 matrix
    Time = 10 × tadd + 1000/100 × tadd = 20 × tadd
  Constant performance in this example (see the sketch below)
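The strong- and weak-scaling arithmetic on the last three slides can be checked with a short C sketch; it assumes one time unit per add and perfect load balance, as the slides do:

#include <stdio.h>

/* Time, in units of t_add, to sum 10 scalars plus an n x n matrix on p
   processors: the 10 scalar adds stay sequential, the n*n matrix adds
   are divided evenly across the p processors. */
static double time_units(int n, int p) {
    return 10.0 + (double)(n * n) / p;
}

int main(void) {
    /* Strong scaling: problem size fixed. */
    printf("10x10,    10 procs: speedup = %.1f\n",
           time_units(10, 1) / time_units(10, 10));    /* 110/20     = 5.5 */
    printf("10x10,   100 procs: speedup = %.1f\n",
           time_units(10, 1) / time_units(10, 100));   /* 110/11     = 10  */
    printf("100x100,  10 procs: speedup = %.1f\n",
           time_units(100, 1) / time_units(100, 10));  /* 10010/1010 = 9.9 */
    printf("100x100, 100 procs: speedup = %.1f\n",
           time_units(100, 1) / time_units(100, 100)); /* 10010/110  = 91  */

    /* Weak scaling: grow the matrix with the processor count. */
    printf("weak:  10 procs, 10x10: %.0f t_add\n", time_units(10, 10));  /* 20 */
    printf("weak: 100 procs, 32x32: %.0f t_add\n", time_units(32, 100)); /* ~20; the slide
                                                     approximates 32x32 as 1000 adds */
    return 0;
}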
7.3 Shared Memory Multiprocessors
Shared Memory
SMP: shared memory multiprocessor
  Hardware provides single physical address space for all processors
  Synchronize shared variables using locks (illustrated below)
Memory access time
  UMA (uniform) vs. NUMA (nonuniform)
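To illustrate "synchronize shared variables using locks", a minimal pthreads sketch; the shared counter, thread count, and iteration count are invented for the example:

#include <pthread.h>
#include <stdio.h>

static long shared_count = 0;   /* variable in the single shared address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* serialize updates to the shared variable */
        shared_count++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("count = %ld\n", shared_count);   /* 400000 with locking */
    return 0;
}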
Example: Sum Reduction
Sum 100,000 numbers on 100 processor UMA
  Each processor has ID: 0 ≤ Pn ≤ 99
  Partition 1000 numbers per processor
  Initial summation on each processor
    sum[Pn] = 0;
    for (i = 1000*Pn; i < 1000*(Pn+1); i = i + 1)
      sum[Pn] = sum[Pn] + A[i];
Now need to add these partial sums
  Reduction: divide and conquer
  Half the processors add pairs, then quarter, ...
  Need to synchronize between reduction steps
Example: Sum Reduction
half = 100;
repeat
  synch();
  if (half%2 != 0 && Pn == 0)
    sum[0] = sum[0] + sum[half-1];
    /* Conditional sum needed when half is odd;
       Processor0 gets missing element */
  half = half/2; /* dividing line on who sums */
  if (Pn < half)
    sum[Pn] = sum[Pn] + sum[Pn+half];
until (half == 1);
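The code above assumes 100 processors running in lockstep between synch() barriers. As a self-contained illustration, the C sketch below simulates the same divide-and-conquer reduction on one processor, with a loop over Pn standing in for the parallel step and the barrier:

#include <stdio.h>

#define P 100        /* simulated processors */
#define N 100000     /* numbers to sum */

static double A[N];
static double sum[P];   /* one partial sum per "processor" */

int main(void) {
    for (int i = 0; i < N; i++) A[i] = 1.0;   /* example data */

    /* Phase 1: each processor Pn sums its 1000-element slice. */
    for (int Pn = 0; Pn < P; Pn++) {
        sum[Pn] = 0.0;
        for (int i = 1000 * Pn; i < 1000 * (Pn + 1); i++)
            sum[Pn] += A[i];
    }

    /* Phase 2: tree reduction; each pass of this loop corresponds to one
       step between synch() barriers on the slide. */
    int half = P;
    do {
        if (half % 2 != 0)               /* odd count: processor 0 picks up */
            sum[0] += sum[half - 1];     /* the element with no partner     */
        half = half / 2;
        for (int Pn = 0; Pn < half; Pn++)
            sum[Pn] += sum[Pn + half];
    } while (half > 1);

    printf("total = %.0f\n", sum[0]);    /* 100000 for this example data */
    return 0;
}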
7.4 Clusters and Other Message-Passing Multiprocessors
Message Passing
Each processor has private physical address space
Hardware sends/receives messages between processors
Loosely Coupled Clusters
Network of independent computers
  Each has private memory and OS
  Connected using I/O system
    E.g., Ethernet/switch, Internet
Suitable for applications with independent tasks
  Web servers, databases, simulations, ...
High availability, scalable, affordable
Problems
  Administration cost (prefer virtual machines)
  Low interconnect bandwidth
    c.f. processor/memory bandwidth on an SMP
Sum Reduction (Again)
Sum 100,000 on 100 processors
First distribute 1000 numbers to each
  Then do partial sums
    sum = 0;
    for (i = 0; i < 1000; i = i + 1)
      sum = sum + AN[i];
Reduction
  Half the processors send, other half receive and add
  Then a quarter send and a quarter receive and add, ...
Sum Reduction (Again)
Given send() and receive() operations
limit = 100; half = 100; /* 100 processors */
repeat
  half = (half+1)/2;   /* send vs. receive dividing line */
  if (Pn >= half && Pn < limit)
    send(Pn - half, sum);
  if (Pn < (limit/2))
    sum = sum + receive();
  limit = half;        /* upper limit of senders */
until (half == 1);     /* exit with final sum */
Send/receive also provide synchronization
Assumes send/receive take similar time to addition
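One concrete (though not the book's) way to realize send() and receive() is MPI. In the sketch below, rank plays the role of Pn, each process starts from a stand-in partial sum, and MPI_Send/MPI_Recv provide both the data transfer and the synchronization noted above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int Pn, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &Pn);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double sum = Pn + 1.0;      /* stand-in for the local partial sum */

    int limit = nprocs;         /* upper limit of senders */
    int half  = nprocs;
    do {
        half = (half + 1) / 2;  /* send vs. receive dividing line */
        if (Pn >= half && Pn < limit) {
            /* upper half sends its partial sum to a partner in the lower half */
            MPI_Send(&sum, 1, MPI_DOUBLE, Pn - half, 0, MPI_COMM_WORLD);
        }
        if (Pn < limit / 2) {
            double incoming;
            MPI_Recv(&incoming, 1, MPI_DOUBLE, Pn + half, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += incoming;
        }
        limit = half;
    } while (half > 1);

    if (Pn == 0)
        printf("final sum = %g\n", sum);  /* nprocs*(nprocs+1)/2 for this stand-in data */

    MPI_Finalize();
    return 0;
}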
Grid Computing
Separate computers interconnected by long-haul networks
  E.g., Internet connections
Work units farmed out, results sent back
  E.g., SETI@home, World Community Grid
Can make use of idle time on PCs
7.5 Hardware Multithreading
Multithreading
Performing multiple threads of execution in parallel
  Replicate registers, PC, etc.
  Fast switching between threads
Fine-grain multithreading
  Switch threads after each cycle
  Interleave instruction execution
  If one thread stalls, others are executed
Coarse-grain multithreading
  Only switch on long stall (e.g., L2-cache miss)
  Simplifies hardware, but doesn't hide short stalls (e.g., data hazards)
Simultaneous Multithreading
In multiple-issue dynamically scheduled processor
  Schedule instructions from multiple threads
  Instructions from independent threads execute when function units are available
  Within threads, dependencies handled by scheduling and register renaming
Example: Intel Pentium-4 HT
  Two threads: duplicated registers, shared function units and caches
Multithreading Example
Future of Multithreading
Will it survive? In what form?
Power considerations → simplified microarchitectures
  Simpler forms of multithreading
Tolerating cache-miss latency
  Thread switch may be most effective
Multiple simple cores might share resources more effectively
7.6 SISD, MIMD, SIMD, SPMD, and Vector
Instruction and Data Streams
An alternate classification
                              Data Streams
                              Single                     Multiple
  Instruction     Single      SISD: Intel Pentium 4      SIMD: SSE instructions of x86
  Streams         Multiple    MISD: No examples today    MIMD: Intel Xeon e5345
SPMD: Single Program Multiple Data
  A parallel program on a MIMD computer
  Conditional code for different processors
SIMD
Operate elementwise on vectors of data
  E.g., MMX and SSE instructions in x86
    Multiple data elements in 128-bit wide registers
All processors execute the same instruction at the same time
  Each with different data address, etc.
Simplifies synchronization
Reduced instruction control hardware
Works best for highly data-parallel applications (see the SSE sketch below)
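A minimal C sketch using SSE intrinsics (array length and values invented for illustration): one packed-add instruction operates on four 32-bit floats held in a 128-bit register.

#include <xmmintrin.h>   /* SSE intrinsics */
#include <stdio.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    float c[8];

    /* Each iteration issues one SIMD add over four packed floats. */
    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);   /* load 4 floats into a 128-bit register */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);    /* same instruction, 4 data elements */
        _mm_storeu_ps(&c[i], vc);
    }

    for (int i = 0; i < 8; i++) printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}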
Vector Processors
Highly pipelined function units
Stream data from/to vector registers to units
  Data collected from memory into registers
  Results stored from registers to memory
Example: Vector extension to MIPS
  32 × 64-element registers (64-bit elements)
  Vector instructions
    lv, sv: load/store vector
    addv.d: add vectors of double
    addvs.d: add scalar to each element of vector of double
Significantly reduces instruction-fetch bandwidth
Example: DAXPY (Y = a × X + Y)
Conventional MIPS code:
      l.d    $f0,a($sp)      ;load scalar a
      addiu  r4,$s0,#512     ;upper bound of what to load
loop: l.d    $f2,0($s0)      ;load x(i)
      mul.d  $f2,$f2,$f0     ;a × x(i)
      l.d    $f4,0($s1)      ;load y(i)
      add.d  $f4,$f4,$f2     ;a × x(i) + y(i)
      s.d    $f4,0($s1)      ;store into y(i)
      addiu  $s0,$s0,#8      ;increment index to x
      addiu  $s1,$s1,#8      ;increment index to y
      subu   $t0,r4,$s0      ;compute bound
      bne    $t0,$zero,loop  ;check if done

Vector MIPS code:
      l.d     $f0,a($sp)     ;load scalar a
      lv      $v1,0($s0)     ;load vector x
      mulvs.d $v2,$v1,$f0    ;vector-scalar multiply
      lv      $v3,0($s1)     ;load vector y
      addv.d  $v4,$v2,$v3    ;add y to product
      sv      $v4,0($s1)     ;store the result
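For reference, a plain-C version of the same DAXPY loop, assuming 64-element double arrays to match the 64-element vector registers above; both code sequences on this slide implement this loop:

/* DAXPY: y[i] = a * x[i] + y[i] for a 64-element vector. */
void daxpy(double a, const double x[64], double y[64]) {
    for (int i = 0; i < 64; i++)
        y[i] = a * x[i] + y[i];
}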
Vector vs. Scalar
Vector architectures and compilers
  Simplify data-parallel programming
Explicit statement of absence of loop-carried dependences
  Reduced checking in hardware
Regular access patterns benefit from interleaved and burst memory
Avoid control hazards by avoiding loops
More general than ad-hoc media extensions (such as MMX, SSE)
  Better match with compiler technology
7.7 Introduction to Graphics Processing Units
History of GPUs
Early video cards
  Frame buffer memory with address generation for video output
3D graphics processing
  Originally high-end computers (e.g., SGI)
  Moore's Law → lower cost, higher density
  3D graphics cards for PCs and game consoles
Graphics Processing Units
  Processors oriented to 3D graphics tasks
  Vertex/pixel processing, shading, texture mapping, rasterization
Graphics in the System
GPU Architectures
Processing is highly data-parallel
  GPUs are highly multithreaded
  Use thread switching to hide memory latency
    Less reliance on multi-level caches
  Graphics memory is wide and high-bandwidth
Trend toward general purpose GPUs
  Heterogeneous CPU/GPU systems
  CPU for sequential code, GPU for parallel code
Programming languages/APIs
  DirectX, OpenGL
  C for Graphics (Cg), High Level Shader Language (HLSL)
  Compute Unified Device Architecture (CUDA)
Example: NVIDIA Tesla
Streaming multiprocessor
  8 × Streaming processors
Example: NVIDIA Tesla
Streaming Processors
  Single-precision FP and integer units
  Each SP is fine-grained multithreaded
Warp: group of 32 threads
  Executed in parallel, SIMD style
    8 SPs × 4 clock cycles
  Hardware contexts for 24 warps
    Registers, PCs, ...
Classifying GPUs
Don't fit nicely into SIMD/MIMD model
  Conditional execution in a thread allows an illusion of MIMD
    But with performance degradation
    Need to write general-purpose code with care

                                  Static: Discovered      Dynamic: Discovered
                                  at Compile Time         at Runtime
  Instruction-Level Parallelism   VLIW                    Superscalar
  Data-Level Parallelism          SIMD or Vector          Tesla Multiprocessor
7.8 Introduction to Multiprocessor Network Topologies
Interconnection Networks
Network topologies
Arrangements of processors, switches, and links
  Bus
  Ring
  2D Mesh
  N-cube (N = 3)
  Fully connected
Multistage Networks
Network Characteristics
Performance
  Latency per message (unloaded network)
  Throughput
    Link bandwidth
    Total network bandwidth
    Bisection bandwidth
  Congestion delays (depending on traffic)
Cost
Power
Routability in silicon
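A small C sketch of two of the throughput figures of merit, using the usual definitions (total network bandwidth = sum of the bandwidths of all links; bisection bandwidth = bandwidth of the links cut when the machine is split in half), for a ring and a fully connected network; the node count is a placeholder:

#include <stdio.h>

/* Total and bisection bandwidth, in multiples of the per-link bandwidth,
   for two of the topologies on the previous slide. */
static void ring(int p, double *total, double *bisection) {
    *total = p;             /* p links around the ring */
    *bisection = 2;         /* cutting the ring in half severs 2 links */
}

static void fully_connected(int p, double *total, double *bisection) {
    *total = p * (p - 1) / 2.0;          /* one link per pair of nodes */
    *bisection = (p / 2.0) * (p / 2.0);  /* links crossing the middle cut */
}

int main(void) {
    double t, b;
    int p = 64;                          /* example node count */
    ring(p, &t, &b);
    printf("ring:            total = %6.0f links, bisection = %6.0f links\n", t, b);
    fully_connected(p, &t, &b);
    printf("fully connected: total = %6.0f links, bisection = %6.0f links\n", t, b);
    return 0;
}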
7.9 Multiprocessor Benchmarks
Parallel Benchmarks
Linpack: matrix linear algebra
SPECrate: parallel run of SPEC CPU programs
  Job-level parallelism
SPLASH: Stanford Parallel Applications for Shared Memory
  Mix of kernels and applications, strong scaling
NAS (NASA Advanced Supercomputing) suite
  Computational fluid dynamics kernels
PARSEC (Princeton Application Repository for Shared Memory Computers) suite
  Multithreaded applications using Pthreads and OpenMP
Code or Applications?
Traditional benchmarks
  Fixed code and data sets
Parallel programming is evolving
  Should algorithms, programming languages, and tools be part of the system?
  Compare systems, provided they implement a given application
  E.g., Linpack, Berkeley Design Patterns
Would foster innovation in approaches to parallelism
7.10 Roofline: A Simple Performance Model
Modeling Performance
Assume performance metric of interest is achievable GFLOPs/sec
  Measured using computational kernels from Berkeley Design Patterns
Arithmetic intensity of a kernel
  FLOPs per byte of memory accessed
For a given computer, determine
  Peak GFLOPS (from data sheet)
  Peak memory bytes/sec (using Stream benchmark)
Roofline Diagram
Attainable GFLOPs/sec = Min(Peak Memory BW × Arithmetic Intensity, Peak FP Performance)
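A direct C transcription of the formula; the peak-GFLOPS and bandwidth numbers below are placeholders to be replaced by the data-sheet and Stream values for a real machine:

#include <stdio.h>

/* Roofline: attainable GFLOPs/sec is capped by either the compute peak or
   by memory bandwidth times the kernel's arithmetic intensity. */
static double attainable_gflops(double peak_gflops, double peak_bw_gbs,
                                double arithmetic_intensity) {
    double memory_bound = peak_bw_gbs * arithmetic_intensity;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

int main(void) {
    double peak_gflops = 16.0;   /* placeholder: data-sheet peak */
    double peak_bw     = 10.0;   /* placeholder: Stream bandwidth, GB/sec */
    for (double ai = 0.125; ai <= 8.0; ai *= 2)
        printf("AI = %5.3f FLOPs/byte -> %5.2f GFLOPs/sec\n",
               ai, attainable_gflops(peak_gflops, peak_bw, ai));
    return 0;
}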
Comparing Systems
Example: Opteron X2 vs. Opteron X4
  2-core vs. 4-core, 2× FP performance/core, 2.2GHz vs. 2.3GHz
  Same memory system
To get higher performance on X4 than X2
  Need high arithmetic intensity
  Or working set must fit in X4's 2MB L-3 cache
Optimizing Performance
Optimize FP performance
  Balance adds & multiplies
  Improve superscalar ILP and use of SIMD instructions
Optimize memory usage
  Software prefetch
    Avoid load stalls
  Memory affinity
    Avoid non-local data accesses
Optimizing Performance
Choice of optimization depends on arithmetic intensity of code
Arithmetic intensity is not always fixed
  May scale with problem size
  Caching reduces memory accesses
    Increases arithmetic intensity
7.11 Real Stuff: Benchmarking Four Multicores
Four Example Systems
2 × quad-core Intel Xeon e5345 (Clovertown)
2 × quad-core AMD Opteron X4 2356 (Barcelona)
Four Example Systems
2 × oct-core Sun UltraSPARC T2 5140 (Niagara 2)
2 × oct-core IBM Cell QS20
And Their Rooflines
Kernels
  SpMV (left)
  LBMHD (right)
Some optimizations change arithmetic intensity
x86 systems have higher peak GFLOPs
  But harder to achieve, given memory bandwidth
Performance on SpMV
Sparse matrix/vector multiply
  Irregular memory accesses, memory bound
Arithmetic intensity
  0.166 before memory optimization, 0.25 after
Xeon vs. Opteron
  Similar peak FLOPS
  Xeon limited by shared FSBs and chipset
UltraSPARC/Cell vs. x86
  20–30 vs. 75 peak GFLOPs
  More cores and memory bandwidth
Performance on LBMHD
Fluid dynamics: structured grid over time steps
  Each point: 75 FP read/write, 1300 FP ops
Arithmetic intensity
  0.70 before optimization, 1.07 after
Opteron vs. UltraSPARC
  More powerful cores, not limited by memory bandwidth
Xeon vs. others
  Still suffers from memory bottlenecks
Achieving Performance
Compare naïve vs. optimized code
  If naïve code performs well, it's easier to write high-performance code for the system

  System              Kernel   Naïve GFLOPs/sec          Optimized GFLOPs/sec   Naïve as % of optimized
  Intel Xeon          SpMV     1.0                       1.5                    64%
                      LBMHD    4.6                       5.6                    82%
  AMD Opteron X4      SpMV     1.4                       3.6                    38%
                      LBMHD    7.1                       14.1                   50%
  Sun UltraSPARC T2   SpMV     3.5                       4.1                    86%
                      LBMHD    9.7                       10.5                   93%
  IBM Cell QS20       SpMV     Naïve code not feasible   6.4                    0%
                      LBMHD    Naïve code not feasible   16.7                   0%
7.12 Fallacies and Pitfalls
Fallacies
Amdahl's Law doesn't apply to parallel computers
  Since we can achieve linear speedup
  But only on applications with weak scaling
Peak performance tracks observed performance
  Marketers like this approach!
  But compare Xeon with others in example
  Need to be aware of bottlenecks
Pitfalls
Not developing the software to take account of a multiprocessor architecture
  Example: using a single lock for a shared composite resource
    Serializes accesses, even if they could be done in parallel
    Use finer-granularity locking (sketch below)
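A hedged C sketch of the contrast (the hash-table layout and names are invented for illustration): one global lock serializes every update, while per-bucket locks let updates to different buckets proceed in parallel.

#include <pthread.h>

#define NBUCKETS 64

/* Coarse-grained: a single lock for the whole table serializes every update,
   even when the updates touch different buckets. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static long table[NBUCKETS];

void coarse_increment(int key) {
    pthread_mutex_lock(&table_lock);
    table[key % NBUCKETS]++;
    pthread_mutex_unlock(&table_lock);
}

/* Finer-grained: one lock per bucket, so threads touching different
   buckets do not serialize against each other. */
static pthread_mutex_t bucket_lock[NBUCKETS];

void locks_init(void) {
    for (int i = 0; i < NBUCKETS; i++)
        pthread_mutex_init(&bucket_lock[i], NULL);
}

void fine_increment(int key) {
    int b = key % NBUCKETS;
    pthread_mutex_lock(&bucket_lock[b]);
    table[b]++;
    pthread_mutex_unlock(&bucket_lock[b]);
}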
7.13 Concluding Remarks
Concluding Remarks
Goal: higher performance by using multiple processors
Difficulties
  Developing parallel software
  Devising appropriate architectures
Many reasons for optimism
  Changing software and application environment
  Chip-level multiprocessors with lower latency, higher bandwidth interconnect
An ongoing challenge for computer architects!