Lecture Notes on
COMPUTER ORGANIZATION AND ARCHITECTURE-17EC13
(B. Tech V Semester)
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING
(Autonomous), Mylavaram-521230
UNIT-I Notes
Definition of Computer
A programmable electronic device designed to accept data, perform prescribed mathematical and
logical operations at high speed, and display the results of these operations.
Functional Units
A computer consists of five functionally independent main parts: input, memory, arithmetic and logic, output, and control units.
Fig 1: Basic Functional Units of a Computer
Input Unit:
Computers accept coded information through input units.
The input unit accepts coded information from human operators using devices such as
keyboards, or from other computers over digital communication lines.
The most common input device is the keyboard. Whenever a key is pressed, the
corresponding letter or digit is automatically translated into its corresponding binary code
and transmitted to the processor.
Ex: Keyboard, Mouse, Scanner, Microphone etc.
Memory Unit:
The function of the memory unit is to store programs and data.
There are two classes of storage, called primary and secondary.
Primary memory, also called main memory, is a fast memory that operates at electronic
speeds. Programs must be stored in this memory while they are being executed.
The memory consists of a large number of semiconductor storage cells, each capable of
storing one bit of information.
These cells are rarely read or written individually. Instead, they are handled in groups of
fixed size called words.
The memory is organized so that one word can be stored or retrieved in one basic
operation.
The number of bits in each word is referred to as the word length of the computer,
typically 16, 32, or 64 bits.
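For example, a computer with a word length of 32 bits stores or retrieves 32 bits (four 8-bit bytes) of information in each basic memory operation.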
A memory in which any location can be accessed in a short and fixed amount of time
after specifying its address is called a Random-access memory (RAM).
The time required to access one word is called the memory access time.
Secondary storage is used when large amounts of data and many programs have to be
stored, particularly for information that is accessed infrequently.
Access times for secondary storage are longer than for primary memory. A wide
selection of secondary storage devices is available, including magnetic disks, optical
disks (DVD and CD), and flash memory devices.
Arithmetic and Logic Unit:
Most computer operations are executed in the arithmetic and logic unit (ALU) of the
processor.
Any arithmetic or logic operation, such as addition, subtraction, multiplication, division,
or comparison of numbers, is initiated by bringing the required operands into the
processor, where the operation is performed by the ALU.
Ex: if two numbers located in the memory are to be added, they are brought into the
processor, and the addition is carried out by the ALU. The sum may then be stored in the
memory or retained in the processor for immediate use.
Output Unit:
The function of the output unit is to send processed results to the outside world.
Ex: Printer, Monitor, Speakers, Projector etc.
Control Unit:
It coordinates the operation of the memory, arithmetic and logic, and I/O units.
Control circuits are responsible for generating the timing signals that govern the transfers
and determine when a given action is to take place.
Stored Program Concept
The simplest way to organize a computer is to have one processor register and an
instruction code format with two parts.
The first part specifies the operation to be performed and the second specifies an address.
The memory address tells the control where to find an operand in memory.
This operand is read from memory and used as the data to be operated on together with the
data stored in the processor register.
Fig 2: Memory organization
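As an illustration of this two-part instruction format, the following C sketch (not from the notes; the 4-bit operation field and 12-bit address field are assumed sizes) extracts the two fields from a 16-bit instruction code.

#include <stdio.h>
#include <stdint.h>

/* Assumed layout: upper 4 bits = operation code, lower 12 bits = address. */
#define OPCODE(instr)  ((uint16_t)(instr) >> 12)
#define ADDRESS(instr) ((uint16_t)(instr) & 0x0FFF)

int main(void) {
    uint16_t instr = 0x2045;  /* hypothetical instruction code */
    printf("operation = %u\n", (unsigned)OPCODE(instr));       /* what to do: operation 2 */
    printf("address   = 0x%03X\n", (unsigned)ADDRESS(instr));  /* operand location 0x045  */
    return 0;
}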
Basic Operational Concepts
To perform a given task, an appropriate program consisting of a list of instructions is stored
in the memory. Individual instructions are brought from the memory into the processor, which
executes the specified operations. Data to be used as instruction operands are also stored in the
memory.
Ex: Load R2, LOC
This instruction reads the contents of a memory location whose address is represented
symbolically by the label LOC and loads them into processor register R2.
The original contents of location LOC are preserved, whereas those of register R2 are
overwritten.
Execution of this instruction requires several steps.
First, the instruction is fetched from the memory into the processor.
Next, the operation to be performed is determined by the control unit.
The operand at LOC is then fetched from the memory into the processor.
Finally, the operand is stored in register R2.
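These four steps can be mirrored in a small C sketch (purely illustrative; the memory size, register file, and instruction field layout are assumptions, not the notes' definitions):

#include <stdint.h>

uint32_t memory[4096];  /* main memory, one word per location       */
uint32_t R[16];         /* general-purpose registers R0 through R15 */
uint32_t PC, IR;        /* program counter and instruction register */

void fetch_and_execute_load(void) {
    IR = memory[PC];                 /* step 1: fetch the instruction into IR */
    PC = PC + 1;                     /*         and point PC at the next one  */
    uint32_t reg = (IR >> 12) & 0xF; /* step 2: control circuits decode the   */
    uint32_t loc = IR & 0xFFF;       /*         register and address fields   */
    uint32_t operand = memory[loc];  /* step 3: fetch the operand at LOC      */
    R[reg] = operand;                /* step 4: overwrite the register; the   */
                                     /*         memory at LOC is unchanged    */
}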
Fig-3 shows how the memory and the processor can be connected.
In addition to the ALU and the control circuitry, the processor contains a number of
registers used for several different purposes.
The instruction register (IR) holds the instruction that is currently being executed. Its
output is available to the control circuits, which generate the timing signals that control the
various processing elements involved in executing the instruction.
The Program Counter (PC) contains the memory address of the next instruction to be
fetched and executed. During the execution of an instruction, the contents of the PC are
updated to correspond to the address of the next instruction to be executed.
The general-purpose registers R0 through Rn−1 are often called processor registers. They serve a variety of functions, including holding operands that have been loaded from the memory for processing.
The processor-memory interface is a circuit which manages the transfer of data between
the main memory and the processor.
If a word is to be read from the memory, the interface sends the address of that word to the
memory along with a Read control signal.
The interface waits for the word to be retrieved, then transfers it to the appropriate
processor register.
If a word is to be written into memory, the interface transfers both the address and the
word to the memory along with a Write control signal.
Finally, two registers facilitate communication with the memory. These are the memory
address register (MAR) and the memory data register (MDR). The MAR holds the address
of the location to be accessed. The MDR contains the data to be written into or read out of
the addressed location.
Fig 3: Connection between the processor and the main memory.
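A simplified C sketch of these two transfers (illustrative only; the memory array and word size are assumptions) might look as follows:

#include <stdint.h>

#define MEM_WORDS 4096

uint32_t memory[MEM_WORDS]; /* main memory                                       */
uint32_t MAR;               /* memory address register: location to be accessed  */
uint32_t MDR;               /* memory data register: data read out or written in */

/* Read: the interface sends the address in MAR to the memory along with
   a Read signal; the retrieved word is placed in MDR. */
void mem_read(void) {
    MDR = memory[MAR];
}

/* Write: the interface transfers the address in MAR and the word in MDR
   to the memory along with a Write signal. */
void mem_write(void) {
    memory[MAR] = MDR;
}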
Bus Structures
To achieve a reasonable speed of operation, a computer must be organized so that all
its units can handle one full word of data at a given time.
When a word of data is transferred between units, all its bits are transferred in parallel,
that is, the bits are transferred simultaneously over many wires, or lines, one bit per
line.
A group of lines that serves as a connecting path for several devices is called a bus. In
addition to the lines that carry the data, the bus must have lines for address and control
purposes. The simplest way to interconnect functional units is to use a single bus, as
shown in Fig-4. All units are connected to this bus. Because the bus can be used for
only one transfer at a time, only two units can actively use the bus at any given time.
Bus control lines are used to arbitrate multiple requests for use of the bus. The main
virtue of the single-bus structure is its low cost and its flexibility for attaching
peripheral devices.
Systems that contain multiple buses achieve more concurrency in operations by
allowing two or more transfers to be carried out at the same time. This leads to better
performance but at an increased cost.
The devices connected to a bus vary widely in their speed of operation. Some
electromechanical devices, such as keyboards and printers, are relatively slow. Others,
like magnetic or optical disks, are considerably faster.
Memory and processor units operate at electronic speeds, making them the fastest parts
of a computer. Because all these devices must communicate with each other over a bus,
an efficient transfer mechanism that is not constrained by the slow devices and that can
be used to smooth out the differences in timing among processors, memories, and
external devices is necessary.
A common approach is to include buffer registers with the devices to hold the
information during transfers. To illustrate this technique, consider the transfer of an
encoded character from a processor to a character printer.
The processor sends the character over the bus to the printer buffer. Since the buffer is
an electronic register, this transfer requires relatively little time. Once the buffer is
loaded, the printer can start printing without further intervention by the processor.
The bus and the processor are no longer needed and can be released for other activity.
The printer continues printing the character in its buffer and is not available for further
transfers until this process is completed.
Thus, buffer registers smooth out timing differences among processors, memories, and
I/O devices. They prevent a high-speed processor from being locked to a slow I/O
device during a sequence of data transfers.
This allows the processor to switch rapidly from one device to another, interweaving
its processing activity with data transfers involving several I/O devices.
Fig-4: Single Bus Structure
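The buffer-register technique just described can be sketched in C (an illustration under assumed names; real printer interfaces differ):

#include <stdbool.h>

typedef struct {
    char buffer; /* buffer register holding the character to be printed */
    bool busy;   /* true while the printer is emptying its buffer       */
} printer_t;

/* The processor's side of the transfer. Loading the buffer takes little
   time because it is an electronic register; once it is loaded, the bus
   and the processor are released for other activity. Returns false if
   the printer is still printing and cannot accept a new character. */
bool send_to_printer(printer_t *p, char c) {
    if (p->busy)
        return false;
    p->buffer = c;
    p->busy = true; /* the printer now prints on its own */
    return true;
}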
Software:
System software is a collection of programs that are executed as needed to perform such functions as
Receiving and Interpreting User Commands
Entering and editing application programs and storing them as files in secondary
storage devices
Managing the storage and retrieval of files in secondary storage devices
Running standard application programs such as word processors, spreadsheets, or
games, with data supplied by the user
Controlling I/O units to receive input information and produce output results
Translating programs from source form prepared by the user into object form
consisting of machine instructions
Linking and running user-written application programs with existing standard
library routines, such as numerical computation packages.
System software is thus responsible for the coordination of all activities in a computing system.
The purpose of this section is to introduce some basic aspects of system software.
A programmer using a high-level language need not know the details of machine program
instructions. A system software program called a compiler translates the high-level language
program into a suitable machine language program containing instructions.
Another important system program that all programmers use is a text editor. It is used for entering
and editing application programs. The user of this program interactively executes commands that
allow statements of a source program entered at a keyboard to be accumulated in a file.
A file is simply a sequence of alphanumeric characters or binary data that is stored in memory or
in secondary storage. A file can be referred to by a name chosen by the user.
A key system software component is the operating system (OS). This is a large program, or
actually a collection of routines, that is used to control the sharing of and interaction among
various computer units as they execute application programs.
In order to understand the basics of operating systems, let us consider a system with one
processor, one disk, and one printer.
Assume that the application program has been compiled from a high-level language form into a
machine language form and stored on the disk.
The first step is to transfer this file into the memory. When the transfer is complete, execution of
the program is started. Assume that part of the program's task involves reading a data file from
the disk into the memory, performing some computation on the data, and printing the results.
When execution of the program reaches the point where the data file is needed, the program
requests the operating system to transfer the data file from the disk to the memory.
The OS performs this task and passes execution control back to the application program, which
then proceeds to perform the required computation. When the computation is completed and the
results are ready to be printed, the application program again sends a request to the operating
system. An OS routine is then executed to cause the printer to print the results.
We have seen how execution control passes back and forth between the application program and
the OS routines. A convenient way to illustrate this sharing of the processor execution time is by
a time-line diagram, such as that shown in Figure 5.
During the time period t0 to t1, an OS routine initiates loading the application program from disk
to memory, waits until the transfer is completed, and then passes execution control to the
application program.
Fig-5: User Program and OS routine sharing.
A similar pattern of activity occurs during period t2 to t3 and period t4 to t5, when the operating
system transfers the data file from the disk and prints the results.
At t5, the operating system may load and execute another application program. Now, let us point
out a way that computer resources can be used more efficiently if several application programs
are to be processed. Notice that the disk and the processor are idle during most of the time period
t4 to t5. The operating system can load the next program to be executed into the memory from the
disk while the printer is operating.
Similarly, during t0 to t1, the operating system can arrange to print the previous program's results
while the current program is being loaded from the disk. Thus, the operating system manages the
concurrent execution of several application programs to make the best possible use of computer
resources. This pattern of concurrent execution is called multiprogramming or multitasking.
Performance
The speed with which a computer executes programs is affected by the design of its
hardware and its machine language instructions
Because programs are usually written in a high-level language, performance is also
affected by the compiler that translates programs into machine languages
For best performance, the following factors must be considered
Compiler
Instruction set
Hardware design
The elapsed time is a measure of the performance of the entire computer system. Just as the
elapsed time for the execution of a program depends on all units in a computer system, the
processor time depends on the hardware involved in the execution of individual machine
instructions. This hardware comprises the processor and the memory, which are usually connected by a bus, with the cache memory included as part of the processor unit.
Let us examine the flow of program instructions and data between the memory and the processor.
At the start of execution, all program instructions and the required data are stored in the main
memory.
As execution proceeds, instructions are fetched one by one over the bus into the processor, and
a copy is placed in the cache. When the execution of an instruction calls for data located in the
main memory, the data are fetched and a copy is placed in the cache.
Later, if the same instruction or data item is needed a second time, it is read directly from the
cache. The processor and a relatively small cache memory can be fabricated on a single integrated
circuit chip.
The internal speed of performing the basic steps of instruction processing on such chips is very
high and is considerably faster than the speed at which instructions and data can be fetched from
the main memory.
A program will be executed faster if the movement of instructions and data between the main
memory and the processor is minimized, which is achieved by using the cache, as illustrated in Fig 6.
Fig 6: The Processor Cache
Processor Clock:
Processor circuits are controlled by a timing signal called a clock. The clock defines regular
time intervals, called clock cycles.
To execute a machine instruction, the processor divides the action to be performed into a
sequence of basic steps, such that each step can be completed in one clock cycle.
The length P of one clock cycle is an important parameter that affects processor performance.
Its inverse is the clock rate, R = 1/P, which is measured in cycles per second.
Processors used in today's personal computers and workstations have clock rates that range
from a few hundred million to over a billion cycles per second.
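For example, a clock cycle of length P = 0.5 ns corresponds to a clock rate of R = 1/P = 1/(0.5 × 10^-9 s) = 2 × 10^9 cycles per second, i.e., 2 GHz.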
Basic Performance Equation:
Let T be the processor time required to execute a program that has been prepared in some
high-level language.
The compiler generates a machine language object program that corresponds to the source
program. Assume that complete execution of the program requires the execution of N machine
language instructions. The number N is the actual number of instruction executions and is not
necessarily equal to the number of machine instructions in the object program.
Some instructions may be executed more than once, which is the case for instructions
inside a program loop. Others may not be executed at all, depending on the input data used.
Suppose that the average number of basic steps needed to execute one machine instruction is S,
where each basic step is completed in one clock cycle. If the clock rate is R cycles per second,
the program execution time is
T = (N × S) / R
This is often referred to as “Basic Performance Equation”.
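As an illustration with assumed numbers: if a program requires N = 500 × 10^6 instruction executions, each instruction needs an average of S = 4 basic steps, and the clock rate is R = 2 × 10^9 cycles per second, then T = (500 × 10^6 × 4) / (2 × 10^9) = 1 second.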
Pipelining and superscalar operation:
Instructions are not necessarily executed one after another.
The value of S does not have to be the number of clock cycles to execute one instruction.
Pipelining – overlapping the execution of successive instructions.
Ex: while an instruction such as Add R1, R2, R3 is being executed, the processor can simultaneously fetch the next instruction from memory.
Superscalar operation – different instructions are executed concurrently in multiple instruction pipelines. This means that multiple functional units are needed.
Goal – reduce S (could become <1!)
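With an ideal pipeline, a new instruction can complete in every clock cycle, so the effective value of S approaches 1; a superscalar processor that completes several instructions per cycle can push the effective S below 1.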
Clock rate:
Improving the integrated-circuit technology makes logic circuits faster, which reduces the time needed to complete a basic step.
CISC and RISC:
Simple instructions require a small number of basic steps to execute. Complex instructions
involve many steps. For a processor that has only simple instructions, a large number of
instructions may be needed to perform a given programming task. This could lead to a large value
for N and a small value for S. On the other hand, if individual instructions perform more complex
operations, fewer instructions will be needed, leading to a lower value of N and a larger value of
S. It is not obvious if one choice is better than the other.
The former are called Reduced Instruction Set Computers (RISC), and the latter are referred to
as Complex Instruction Set Computers (CISC).
Compiler:
A compiler translates a high-level language program into a sequence of machine instructions. To
reduce N, we need to have a suitable machine instruction set and a compiler that makes good use
of it.
An optimizing compiler takes advantage of various features of the target processor to reduce the
product N × S, which is the total number of clock cycles needed to execute a program.
The compiler may rearrange program instructions to achieve better performance. Of course, such
changes must not affect the result of the computation.
Performance Measurement
The performance measure is the time it takes a computer to execute a given benchmark program.
The accepted practice today is to use an agreed-upon selection of real application programs to
evaluate performance.
A non-profit organization called Standard Performance Evaluation Corporation (SPEC) selects
and publishes representative application programs for different application domains, together
with test results for many commercially available computers.
For general-purpose computers, a suite of benchmark programs was selected in 1989. It was
modified somewhat and published in 1995 and again in 2000.
The programs selected range from game playing, compiler, and database applications to
numerically intensive programs in astrophysics and quantum chemistry.
In each case, the program is compiled for the computer under test, and the running time on a real
computer is measured. (Simulation is not allowed.) The same program is also compiled and run
on one computer selected as a reference. For SPEC95, the reference is the SUN SPARCstation 10/40. For SPEC2000, the reference computer is an UltraSPARC10 workstation with a 300-MHz UltraSPARC-IIi processor. The SPEC rating is computed as follows:

SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)
Thus, a SPEC rating of 50 means that the computer under test is 50 times as fast as the
UltraSPARC10 for this particular benchmark. The test is repeated for all the programs in the
SPEC suite, and the geometric mean of the results is computed. Let SPECi be the rating for program i in the suite. The overall SPEC rating for the computer is given by

SPEC rating = (SPEC1 × SPEC2 × ... × SPECn)^(1/n)

where n is the number of programs in the suite.
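A short C sketch of this geometric-mean computation (illustrative; the per-program ratings are made-up numbers; compile with the math library, e.g. gcc file.c -lm):

#include <math.h>
#include <stdio.h>

/* Overall SPEC rating = geometric mean of the n per-program ratings. */
double overall_spec_rating(const double ratings[], int n) {
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(ratings[i]); /* summing logs avoids overflow */
    return exp(log_sum / n);
}

int main(void) {
    double ratings[] = {40.0, 50.0, 62.5}; /* hypothetical per-program ratings */
    int n = sizeof ratings / sizeof ratings[0];
    printf("Overall SPEC rating = %.1f\n", overall_spec_rating(ratings, n));
    return 0;
}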
Multiprocessors and Multicomputers
Large computer systems may contain a number of processor units, in which case they are called
multiprocessor systems.
These systems either execute a number of different application tasks in parallel, or they execute
subtasks of a single large task in parallel.
All processors usually have access to all of the memory in such systems, and the term shared-
memory multiprocessor systems is often used to make this clear. The high performance of these
systems comes with much increased complexity and cost. In addition to multiple processors and
memory units, cost is increased because of the need for more complex interconnection networks.
In contrast to multiprocessor systems, it is also possible to use an interconnected group of
complete computers to achieve high total computational power. The computers normally have
access only to their own memory units. When the tasks they are executing need to communicate
data, they do so by exchanging messages over a communication network. This property
distinguishes them from shared-memory multiprocessors, leading to the name message-passing
multicomputers.