CS6303-COMPUTER ARCHITECTURE
UNIT I OVERVIEW & INSTRUCTIONS 
Eight ideas - Components of a computer system - Technology - Performance - Power wall - Uniprocessors to multiprocessors; Instructions - operations and operands - representing instructions - Logical operations - control operations - Addressing and addressing modes.
Machine Structures
Casses !" C!#$utin% A$$icati!ns an& Their Characteristics
Pers!na c!#$uters 'PCs(
Personal   computers  emphasie  deli!ery  of  good  performance  to  single  users  at  low  cost   and
usually  e"ecute  third#party  soft  ware.  A computer  designed  for  use  $y  an  indi!idual%   usually
incorporating a graphics display% a &ey$oard% and a mouse.
Servers
A computer used for running larger programs for multiple users, often simultaneously, and typically accessed only via a network. Servers are the modern form of what were once much larger computers, and are usually accessed only via a network. Servers are oriented to carrying large workloads, which may consist of either single complex applications, usually a scientific or engineering application, or handling many small jobs, such as would occur in building a large web server.
These low-end servers are typically used for file storage, small business applications, or simple web serving. At the other extreme are supercomputers, which at the present consist of tens of thousands of processors and many terabytes of memory, and cost tens to hundreds of millions of dollars.
Applications of Supercomputers:
High-end scientific and engineering calculations, such as weather forecasting, oil exploration, protein structure determination, and other large-scale problems.
Embedded computers
A computer inside another device used for running one predetermined application or collection of software.
EIGHT GREAT IDEAS IN COMPUTER ARCHITECTURE
1. Design for Moore's Law
The number of transistors incorporated in a chip will approximately double every 24 months.
Moore's Law resulted from a 1965 prediction of such growth in IC capacity made by Gordon Moore, co-founder of Intel.
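The doubling rule is easy to turn into arithmetic. The sketch below is only an illustration of the growth law; the starting count of 2,300 transistors (the Intel 4004 of 1971) and the ten-year span are illustrative inputs, not part of the text above.

    # Moore's Law as arithmetic: transistor count doubles every 24 months.
    def transistors(initial_count, months, doubling_period_months=24):
        """Projected transistor count after `months` of doubling growth."""
        return initial_count * 2 ** (months / doubling_period_months)

    # Starting from 2,300 transistors (Intel 4004, 1971), ten years of growth:
    print(f"{transistors(2300, 120):,.0f}")  # 73,600 transistors (2,300 x 2^5)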
2. Use Abstractions to Simplify Design
A major productivity technique for hardware and software is to use abstractions to represent the design at different levels of representation; lower-level details are hidden to offer a simpler model at higher levels.
3. Make the common case fast
Making the common case fast will tend to enhance performance better than optimizing the rare case. Ironically, the common case is often simpler than the rare case and hence is often easier to enhance.
90 Per"!r#ance )ia Paraeis#
Computer architects ha!e offered designs that get more performance $y performing operations in
parallel. 
 Parallel -e+uests Assigned to computer    e.g. search ./arcia0
 Parallel Threads Assigned to core    e.g. loo&up% ads
 Parallel Instructions 1 2 instruction 3 one time e.g. 4 pipelined instructions
 Parallel 5ata 1 2 data item 3 one time e.g. add of 6 pairs of words
50 Per"!r#ance )ia Pi$einin% 
Pipelining   is   an   implementation   techni+ue   where   multiple   instructions   are   o!erlapped   in
e"ecution.   The   computer   pipeline   is   di!ided  in  stages.   Each  stage   completes   a   part   of   an
instruction in parallel. The stages are connected one to the ne"t to form a pipe # instructions enter
at one end% progress through the stages% and e"it at the other end.
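The payoff of overlapping is easy to see in the timing arithmetic. A minimal sketch, assuming an ideal k-stage pipeline with one-cycle stages and no stalls or hazards (the instruction count and stage count are illustrative):

    # Ideal pipeline timing: n instructions through a k-stage pipeline.
    def unpipelined_cycles(n, k):
        return n * k          # each instruction uses all k stages serially

    def pipelined_cycles(n, k):
        return k + (n - 1)    # fill the pipe once, then one completion per cycle

    n, k = 1000, 5
    print(unpipelined_cycles(n, k))   # 5000 cycles
    print(pipelined_cycles(n, k))     # 1004 cycles
    print(unpipelined_cycles(n, k) / pipelined_cycles(n, k))  # ~4.98x

With many instructions the speedup approaches the number of stages, which is why the stage count matters.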
60 Per"!r#ance )ia Pre&icti!n
In some cases it can $e faster on a!erage to guess and start wor&ing rather than wait until you
&now for sure% assuming that the mechanism to reco!er from a misprediction is not too e"pensi!e
and your prediction is relati!ely accurate.
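One concrete instance of this idea is branch prediction. The sketch below is a hypothetical 1-bit predictor, not described in the text above: it simply guesses that a branch will do whatever it did last time, and "recovers" by remembering the real outcome.

    # A 1-bit branch predictor: guess the branch repeats its last outcome.
    class OneBitPredictor:
        def __init__(self):
            self.last_taken = False         # remembered outcome of the branch

        def predict(self):
            return self.last_taken          # the guess we start working from

        def update(self, actual_taken):
            self.last_taken = actual_taken  # learn the real outcome afterwards

    predictor = OneBitPredictor()
    hits = 0
    for actual in [True, True, True, False, True, True]:  # loop-like pattern
        if predictor.predict() == actual:
            hits += 1
        predictor.update(actual)
    print(f"{hits}/6 correct")  # 3/6 here; accuracy rises on longer runs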
7. Hierarchy of memories
Programmers want memory to be fast, large, and cheap, as memory speed often shapes performance, capacity limits the size of problems that can be solved, and the cost of memory today is often the majority of computer cost. Architects have found that they can address these conflicting demands with a hierarchy of memories, with the fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest, largest, and cheapest per bit at the bottom. Caches give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy and nearly as big and cheap as the bottom of the hierarchy. A layered triangle icon is used to represent the memory hierarchy. The shape indicates speed, cost, and size: the closer to the top, the faster and more expensive per bit the memory; the wider the base of the layer, the bigger the memory.
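The standard way to quantify this illusion is average memory access time (AMAT), a formula that is consistent with but not stated in the text above; the timings below are assumed for illustration.

    # AMAT = hit time + miss rate x miss penalty, for a two-level hierarchy.
    def amat(hit_time_ns, miss_rate, miss_penalty_ns):
        return hit_time_ns + miss_rate * miss_penalty_ns

    # A 1 ns cache backed by 60 ns main memory, with 95% of accesses hitting:
    print(amat(hit_time_ns=1.0, miss_rate=0.05, miss_penalty_ns=60.0))  # 4.0 ns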
8. Dependability via Redundancy
Computers not only need to be fast; they need to be dependable. Since any physical device can fail, we make systems dependable by including redundant components that can take over when a failure occurs and help detect failures.
Components of a computer system
The underlying hardware in any computer performs the same basic functions: inputting data, outputting data, processing data, and storing data.
Input device
A mechanism through which the computer is fed information, such as a keyboard.
Output device
A mechanism that conveys the result of a computation to a user, such as a display, or to another computer.
Central processor unit (CPU): Also called processor. The active part of the computer, which contains the datapath and control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.
Datapath
The component of the processor that performs arithmetic operations.
Control: The component of the processor that commands the datapath, memory, and I/O devices according to the instructions of the program.
Memory
The storage area in which programs are kept when they are running and that contains the data needed by the running programs.
The memory is built from DRAM chips. DRAM stands for dynamic random access memory. Multiple DRAMs are used together to contain the instructions and data of a program. In contrast to sequential access memories, such as magnetic tapes, the RAM portion of the term DRAM means that memory accesses take basically the same amount of time no matter what portion of the memory is read.
Dynamic random access memory (DRAM)
Memory built as an integrated circuit; it provides random access to any location. Access times are 50 nanoseconds, and the cost per gigabyte in 2012 was $5 to $10.
            ""#####################################################################################################""
Hierarchical layers of hardware and software
The figure shows that the layers of software are organized primarily in a hierarchical fashion, with applications being the outermost ring and a variety of systems software sitting between the hardware and applications software.
System software: Software that provides services that are commonly useful, including operating systems, compilers, loaders, and assemblers.
There are many types of systems software, but two types are central to every computer system today: an operating system and a compiler.
An operating system interfaces between a user's program and the hardware and provides a variety of services and supervisory functions. Among the most important functions are:
• Handling basic input and output operations
• Allocating storage and memory
• Providing for protected sharing of the computer among multiple applications using it simultaneously.
Examples of operating systems in use today are Linux, iOS, and Windows.
Compiler: A program that translates high-level language statements into assembly language statements.
Instruction: A command that computer hardware understands and obeys.
Assembler: A program that translates a symbolic version of instructions into the binary version.
Assembly language: A symbolic representation of machine instructions.
Machine language: A binary representation of machine instructions.
Technologies for Building Processors and Memory
Processors and memory have improved at an incredible rate, because computer designers have long embraced the latest in electronic technology to try to win the race to design a better computer.
A transistor is simply an on/off switch controlled by electricity. The integrated circuit (IC) combined dozens to hundreds of transistors into a single chip. To describe the tremendous increase in the number of transistors from hundreds to millions, the adjective very large scale is added to the term, creating the abbreviation VLSI, for very large-scale integrated circuit.
The process starts with a silicon crystal ingot, which looks like a giant sausage. An ingot is finely sliced into wafers no more than 0.1 inches thick. These wafers then go through a series of processing steps, during which patterns of chemicals are placed on each wafer, creating the transistors, conductors, and insulators.
"anufacturin! #rocess of $nte!rated Circuits
The patterned wafer is then chopped up% or diced% into these components called dies and more
informally &nown as chi$s. 
Dicing enables you to discard only those dies that were unlucky enough to contain the flaws, rather than the whole wafer. This concept is quantified by the yield of a process, which is defined as the percentage of good dies from the total number of dies on the wafer.
The cost of an integrated circuit rises quickly as the die size increases, due both to the lower yield and to the smaller number of dies that fit on a wafer. To reduce the cost, a large die is shrunk by using the next-generation process, which uses smaller sizes for both transistors and wires. This improves both the yield and the die count per wafer.
The cost of an IC can be expressed in three simple equations:
Cost per die = Cost per wafer / (Dies per wafer × Yield)
Dies per wafer ≈ Wafer area / Die area
Yield = 1 / (1 + (Defects per area × Die area / 2))^2
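A minimal sketch that simply evaluates the three equations above; the wafer cost, areas, and defect density are made-up inputs for illustration, not real process data.

    # Evaluate the IC cost equations for one hypothetical process.
    def dies_per_wafer(wafer_area_mm2, die_area_mm2):
        return wafer_area_mm2 / die_area_mm2        # first-order approximation

    def die_yield(defects_per_mm2, die_area_mm2):
        return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2

    def cost_per_die(wafer_cost, wafer_area_mm2, die_area_mm2, defects_per_mm2):
        n = dies_per_wafer(wafer_area_mm2, die_area_mm2)
        y = die_yield(defects_per_mm2, die_area_mm2)
        return wafer_cost / (n * y)                 # only good dies count

    # $5000 wafer, 70,000 mm^2 usable area, 100 mm^2 dies, 0.02 defects/mm^2:
    print(f"${cost_per_die(5000, 70000, 100, 0.02):.2f}")  # $28.57 per good die

Note how the quadratic yield term punishes large dies: doubling the die area here would cut yield from 0.25 to 0.11 while also halving the dies per wafer.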
Per"!r#ance
Accurately   measuring   and   comparing   different   computers   is   critical.   Performance   can   $e
determined $y different ways.
Response time: Also called execution time. The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.
Datacenter managers are often interested in increasing throughput or bandwidth: the total amount of work done in a given time.
Throughput and Response Time
Do the following changes to a computer system increase throughput, decrease response time, or both?
1. Replacing the processor in a computer with a faster version
2. Adding additional processors to a system that uses multiple processors for separate tasks, for example, searching the web
Decreasing response time almost always improves throughput. Hence, in case 1, both response time and throughput are improved. In case 2, no one task gets work done faster, so only throughput increases.
The performance of a computer is primarily a matter of response time. To maximize performance, we minimize the response time or execution time for some task. Thus, performance and execution time can be related for a computer X as:
Performance_X = 1 / Execution time_X
The performance of two different computers can be related quantitatively: "X is n times faster than Y" (or equivalently "X is n times as fast as Y") means
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
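A worked example with assumed numbers: if computer X runs a program in 10 seconds and computer Y runs the same program in 15 seconds, then Performance_X / Performance_Y = Execution time_Y / Execution time_X = 15 / 10 = 1.5, so X is 1.5 times as fast as Y.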
Measurin% Per"!r#ance
Time  is  the  measure  of  computer   performance7   the  computer   that   performs  the  same
amount of wor& in the least time is the fastest. Program execution time is measured in seconds
per program.
Th e most straightforward defi nition of time is called wall clock time% response time% or elapsed
time.  Th  ese  terms  mean  the  total   time  to  complete  a  tas&%   including  dis&  accesses%   memory
accesses% input/output BI9:C acti!ities% operating system o!erhead(e!erything.
CPU execution time: Also called CPU time. The actual time the CPU spends computing for a specific task.
User CPU time: The CPU time spent in a program itself.
System CPU time: The CPU time spent in the operating system performing tasks on behalf of the program.
Almost all computers are constructed using a clock that determines when events take place in the hardware. These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods, clocks, cycles). Designers refer to the length of a clock period both as the time for a complete clock cycle (e.g., 250 picoseconds, or 250 ps) and as the clock rate (e.g., 4 gigahertz, or 4 GHz), which is the inverse of the clock period.
CPU Per"!r#ance an& Its >act!rs
Users  and  designers  oft   en  e"amine  performance  using  different   metrics.   A  simple  formula
relates the most $asic metrics Bcloc& cycles and cloc& cycle timeC to CPU time7
This formula ma&es it clear that the hardware designer can impro!e performance $y reducing the
num$er of cloc& cycles re+uired for a program or the length of the cloc& cycle.
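A worked example with assumed numbers: a program that needs 10 × 10^9 clock cycles on a 4 GHz processor (clock cycle time 0.25 ns) has CPU time = 10 × 10^9 × 0.25 ns = 2.5 seconds; halving the cycle count or doubling the clock rate would halve that time.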
Instruction Performance
The number of clock cycles required for a program can be written as:
CPU clock cycles = Instructions for a program × Average clock cycles per instruction
The term clock cycles per instruction, which is the average number of clock cycles each instruction takes to execute, is often abbreviated as CPI. Since different instructions may take different amounts of time depending on what they do, CPI is an average over all the instructions executed in the program. CPI provides one way of comparing two different implementations of the same instruction set architecture, since the number of instructions executed for a program will, of course, be the same.
The performance of a program depends on the algorithm, the language, the compiler, the architecture, and the actual hardware. The following table summarizes how these components affect the factors in the CPU performance equation.
Hardware or software component      Affects what?
Algorithm                           Instruction count, possibly CPI
Programming language                Instruction count, CPI
Compiler                            Instruction count, CPI
Instruction set architecture        Instruction count, clock rate, CPI
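Combining the two equations above gives CPU time = Instruction count × CPI × Clock cycle time. The sketch below uses that product to compare two hypothetical implementations of the same ISA; all numbers are assumed for illustration.

    # CPU time = instruction count x CPI x clock cycle time (in picoseconds).
    def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
        return instruction_count * cpi * cycle_time_ps

    count = 1_000_000                # same program and ISA, same instruction count
    time_a = cpu_time_ps(count, cpi=2.0, cycle_time_ps=250)  # 5.0e8 ps
    time_b = cpu_time_ps(count, cpi=1.2, cycle_time_ps=500)  # 6.0e8 ps
    print(f"A is {time_b / time_a:.1f}x as fast as B")       # 1.2x

Note that A wins despite its higher CPI, because its clock cycle is half as long: no single factor decides performance on its own.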
Amdahl's Law
Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
The Power Wall
Both clock rate and power increased rapidly and grew together, since they are correlated. Battery life can trump performance in the personal mobile device, and the architects of warehouse-scale computers try to reduce the costs of powering and cooling 100,000 servers, as the costs are high at this scale. Just as measuring time in seconds is a safer measure of program performance than a rate like MIPS, the energy metric joules is a better measure than a power rate like watts, which is just joules/second.
The dominant technology for integrated circuits is called CMOS (complementary metal oxide semiconductor). For CMOS, the primary source of energy consumption is so-called dynamic energy, that is, energy that is consumed when transistors switch states from 0 to 1 and vice versa. The dynamic energy depends on the capacitive loading of each transistor and the voltage applied:
Energy ∝ Capacitive load × Voltage^2
The power required per transistor is this energy of a transition multiplied by how often transitions occur:
Power ∝ 1/2 × Capacitive load × Voltage^2 × Frequency switched
Frequency switched is a function of the clock rate. The capacitive load per transistor is a function of both the number of transistors connected to an output (called the fanout) and the technology, which determines the capacitance of both wires and transistors.
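A worked example with assumed numbers: because dynamic power scales with the square of the voltage, lowering the supply voltage by 15% (to 0.85 of its old value) at the same capacitance and frequency cuts dynamic power to 0.85^2 ≈ 0.72 of its old value, roughly a 28% saving.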
The Switch from Uniprocessors to Multiprocessors
Reasons for switching from single-core processors to multicore processors:
• Difficult to make single-core clock frequencies even higher
• Deeply pipelined circuits:
  - heat problems
  - speed-of-light problems
  - difficult design and verification
  - large design teams necessary
  - server farms need expensive air-conditioning
• Many new applications are multithreaded
• General trend in computer architecture (shift towards more parallelism)