Embedded System Design
Concepts
Module-2
Contents
• Characteristics and Quality Attributes of Embedded Systems
• Operational and non-operational quality attributes
• Embedded Systems-Application and Domain specific
• Hardware Software Co-Design and Program Modelling (excluding
UML)
• Embedded firmware design and development (excluding C language)
Characteristics and Quality
Attributes of Embedded Systems
CHARACTERISTICS OF AN EMBEDDED SYSTEM
Some of the important characteristics of an embedded system are as
follows:
(1) Application and domain specific
(2) Reactive and Real Time
(3) Operates in harsh environments
(4) Distributed
(5) Small size and weight
(6) Power concerns
(1) Application and domain specific
• Each embedded system is designed to perform a set of defined
functions and is developed in such a manner as to do only those
intended functions.
• It cannot be used for any other purpose.
• It is the major criterion which distinguishes an embedded system
from a general purpose computing system.
• For example, you cannot replace the embedded control unit of your
microwave oven with your air conditioner’s embedded control unit,
because the embedded control units of microwave oven and air
conditioner are specifically designed to perform certain specific tasks.
(2) Reactive and Real Time
• As mentioned earlier, embedded systems are in constant interaction
with the Real world through sensors and user-defined input devices
which are connected to the input port of the system.
• Any changes happening in the Real world are captured by the sensors
or input devices in Real Time and the control algorithm running inside
the unit reacts in a designed manner to bring the controlled output
variables to the desired level.
• Real Time System operation means the timing behaviour of the
system should be deterministic; meaning the system should respond
to requests or tasks in a known amount of time.
(3) Operates in harsh environments
• The environment in which the embedded system is deployed may be
a dusty one or a high temperature zone or an area subject to
vibrations and shock.
• Systems placed in such areas should be capable of withstanding all
these adverse operating conditions.
• The design should take care of the operating conditions of the area
where the system is going to be deployed.
• For example, if the system needs to be deployed in a high
temperature zone, then all the components used in the system
should be of high temperature grade.
(4) Distributed
• The term distributed means that embedded systems may be a part of
larger systems.
• Many such distributed embedded units working together form a single
large embedded control system.
• An Automatic Teller Machine (ATM) is a typical example for this.
• An ATM contains a card reader embedded unit, responsible for reading and
validating the user’s ATM card, transaction unit for performing
transactions, a currency counter for dispatching/vending currency to the
authorised person and a printer unit for printing the transaction details.
We can visualise these as independent embedded systems. But they work
together to achieve a common goal.
(5) Small size and weight
• Product aesthetics is an important factor in choosing a product.
• For example, when you plan to buy a new mobile phone, you may
make a comparative study on the pros and cons of the products
available in the market.
• Definitely the product aesthetics (size, weight, shape, style, etc.) will
be one of the deciding factors to choose a product.
• People believe in the phrase “Small is beautiful”.
• In embedded domain also compactness is a significant deciding
factor.
(6) Power concerns
• Embedded systems should be designed in such a way as to minimise
the heat dissipation by the system.
• The production of a high amount of heat demands cooling
requirements like cooling fans, which in turn occupy additional space
and make the system bulky.
• Nowadays ultra low power components are available in the market.
• The design should make use of such low power components, like low
dropout regulators, and controllers/processors with power saving
modes.
Operational and non-operational
quality attributes
Quality Attributes of Embedded Systems
• Quality attributes are the non-functional requirements that need to
be documented properly in any system design.
• If the quality attributes are more concrete and measurable, they will
have a positive impact on the system development process and the end
product.
• The various quality attributes that need to be addressed in any
Embedded System development are broadly classified into two,
namely ‘Operational Quality Attributes’ and ‘Non-Operational Quality
Attributes’.
Operational Quality Attributes
• The operational quality attributes represent the relevant quality
attributes related to the Embedded System when it is in the
operational mode or ‘online’ mode.
• The important quality attributes coming under this category are listed
below:
(1) Response
(2) Throughput
(3) Reliability
(4) Maintainability
(5) Security
(6) Safety
(1) Response
• Response is a measure of quickness of the system. It gives you an idea
about how fast your system is tracking the changes in input variables.
• Most of the embedded systems demand fast response which should be
almost Real Time.
• For example, an embedded system deployed in flight control application
should respond in a Real Time manner.
• Any response delay in the system will create potential impact to the safety
of the flight as well as the passengers.
• It is not necessary that all embedded systems should be Real Time in
response. For example, the response time requirement for an electronic
toy is not at all time-critical.
(2) Throughput
• Throughput deals with the efficiency of a system.
• In general it can be defined as the rate of production or operation of a
defined process over a stated period of time.
• The rates can be expressed in terms of units of products, batches
produced, or any other meaningful measurements.
• In the case of a Card Reader, throughput means how many
transactions the Reader can perform in a minute or in an hour or in a
day.
• Throughput is generally measured in terms of ‘Benchmark’.
(3) Reliability
• Reliability is a measure of how much you can rely upon the proper
functioning of the system, or the percentage susceptibility of the
system to failures.
• Mean Time Between Failures (MTBF) and Mean Time To Repair
(MTTR) are the terms used in defining system reliability.
• MTBF gives the average time between failures, expressed in
hours/weeks/months.
• MTTR specifies how long the system is allowed to be out of order
following a failure.
• For an embedded system with critical application need, it should be
of the order of minutes.
(4) Maintainability
• Maintainability deals with support and maintenance to the end user or client in
case of technical issues and product failures or on the basis of a routine system
check-up.
• Reliability and maintainability are considered as two complementary disciplines.
• Maintainability can be broadly classified into two categories, namely, ‘Scheduled
or Periodic Maintenance (preventive maintenance)’ and ‘Maintenance to
unexpected failures (corrective maintenance)’.
• The period may be based on the total hours of the system usage or the total
output the system delivered.
• A printer is a typical example for illustrating the two types of maintainability.
• An inkjet printer uses ink cartridges, which are consumable components and as
per the printer manufacturer the end user should replace the cartridge after each
‘n’ number of printouts to get quality prints.
(5) Security
• ‘Confidentiality’, ‘Integrity’, and ‘Availability’ are the three major
measures of information security.
• Confidentiality deals with the protection of data and application from
unauthorised disclosure.
• Integrity deals with the protection of data and application from
unauthorised modification.
• Availability deals with ensuring that the data and application are
available to authorised users whenever they are required.
(6) Safety
• Safety deals with the possible damages that can happen to the
operators, public and the environment due to the breakdown of an
embedded system or due to the emission of radioactive or hazardous
materials from the embedded products.
• The breakdown of an embedded system may occur due to a hardware
failure or a firmware failure.
• Safety analysis is a must in product engineering to evaluate the
anticipated damages and determine the best course of action to bring
down the consequences of the damages to an acceptable level.
Non-Operational Quality Attributes
The quality attributes that need to be addressed for the product ‘not’
on the basis of operational aspects are grouped under this category.
The important quality attributes coming under this category are listed
below.
(1) Testability & Debug-ability
(2) Evolvability
(3) Portability
(4) Time to prototype and market
(5) Per unit and total cost
(1) Testability & Debug-ability
• Testability deals with how easily one can test the design and
application, and by what means it can be tested.
• For an embedded product, testability is applicable to both the embedded
hardware and firmware.
• Embedded hardware testing ensures that the peripherals and the total
hardware functions in the desired manner, whereas firmware testing
ensures that the firmware is functioning in the expected way.
• Debug-ability is a means of debugging the product as such for figuring out
the probable sources that create unexpected behaviour in the total system.
• Debug-ability has two aspects in the embedded system development
context, namely, hardware level debugging and firmware level debugging.
(2) Evolvability
• Evolvability is a term which is closely related to Biology.
• In biology, evolvability refers to an organism’s ability to generate
heritable variation.
• For an embedded system, the quality attribute ‘Evolvability’ refers to
the ease with which the embedded product (including firmware and
hardware) can be modified to take advantage of new firmware or
hardware technologies.
• The evolution of the mobile phones from 1G to 5G and the associated
hardware and firmware is the perfect example for Evolvability.
(3) Portability
• Portability is a measure of ‘system independence’.
• An embedded product is said to be portable if the product is capable of
functioning ‘as such’ in various environments, target processors/
controllers and embedded operating systems.
• The ease with which an embedded product can be ported on to a new
platform is a direct measure of the re-work required. A standard
embedded product should always be flexible and portable.
• In embedded products, the term ‘porting’ represents the migration of the
embedded firmware written for one target processor (e.g. Intel x86) to a
different target processor (say an ARM Cortex M3 processor from
Freescale).
(4) Time to prototype and market
• Time-to-market is the time elapsed between the conceptualisation of a
product and the time at which the product is ready for selling (for
commercial product) or use (for non-commercial products).
• The commercial embedded product market is highly competitive and time
to market the product is a critical factor in the success of a commercial
embedded product.
• There may be multiple players in the embedded industry who develop
products of the same category (like mobile phone, portable media players,
etc.).
• If you come up with a new design and it takes a long time to develop
and market it, a competitor may take advantage with their own
product in the meantime.
(5) Per unit and total cost
• Cost is a factor which is closely monitored by both end user (those
who buy the product) and product manufacturer (those who build
the product).
• Cost is a highly sensitive factor for commercial products. Any failure
to position the cost of a commercial product at a nominal rate, may
lead to the failure of the product in the market.
• Proper market study and cost benefit analysis should be carried out
before taking a decision on the per-unit cost of the embedded
product.
• From a designer/product development company perspective the
ultimate aim of a product is to generate marginal profit.
Embedded Systems-Application
and Domain specific
WASHING MACHINE—APPLICATION-SPECIFIC
EMBEDDED SYSTEM
• Washing machine is a typical example of an
embedded system providing extensive support in
home automation applications.
• As mentioned earlier, an embedded system contains
sensors, actuators, control unit and application-
specific user interfaces like keyboards, display units,
etc.
• The actuator part of the washing machine consists of
a motorised agitator, tumble tub, water drawing
pump and inlet valve to control the flow of water
into the unit.
WASHING MACHINE—APPLICATION-SPECIFIC
EMBEDDED SYSTEM
• The sensor part consists of the water temperature sensor, level sensor, etc.
• The control part contains a microprocessor/controller based board with
interfaces to the sensors and actuators.
• The sensor data is fed back to the control unit and the control unit
generates the necessary actuator outputs.
• The control unit also provides connectivity to user interfaces like keypad for
setting the washing time, selecting the type of material to be washed like
light, medium, heavy duty, etc.
• User feedback is reflected through the display unit and LEDs connected to
the control board.
WASHING MACHINE—APPLICATION-SPECIFIC
EMBEDDED SYSTEM
• Washing machines come in different designs, like top loading and front
loading machines.
• In top loading models the agitator of the machine twists back and forth
and pulls the cloth down to the bottom of the tub.
• On reaching the bottom of the tub the clothes work their way back up to
the top of the tub where the agitator grabs them again and repeats the
mechanism.
• In the front loading machines, the clothes are tumbled and plunged into
the water over and over again. This is the first phase of washing.
• In the second phase of washing, water is pumped out from the tub and the
inner tub uses centrifugal force to wring out more water from the clothes
by spinning at several hundred Rotations Per Minute (RPM). This is called a
‘Spin Phase’.
AUTOMOTIVE–DOMAIN SPECIFIC EXAMPLES
OF EMBEDDED SYSTEM
Inner Workings of Automotive Embedded
Systems
• Automotive embedded systems are the ones where electronics take control
over the mechanical and electrical systems.
• The presence of automotive embedded system in a vehicle varies from
simple mirror and wiper controls to complex air bag controller and Anti-
lock Brake Systems (ABS).
• Automotive embedded systems are normally built around microcontrollers
or DSPs or a hybrid of the two and are generally known as Electronic
Control Units (ECUs).
• The various types of electronic control units (ECUs) used in the automotive
embedded industry can be broadly classified into two–High-speed
embedded control units and Low-speed embedded control units.
Automotive Communication Buses
• Automotive applications make use of serial buses for communication,
which greatly reduces the amount of wiring required inside a vehicle.
• Controller Area Network (CAN): It supports medium speed (ISO11519-
class B with data rates up to 125 Kbps) and high speed (ISO11898 class C
with data rates up to 1Mbps) data transfer.
• Local Interconnect Network (LIN): LIN bus is a single master multiple slave
(up to 16 independent slave nodes) communication interface. LIN is a low
speed, single wire communication interface with support for data rates up
to 20 Kbps and is used for sensor/actuator interfacing.
• Media Oriented System Transport (MOST) Bus: The Media Oriented
System Transport (MOST) is targeted for high-bandwidth automotive
multimedia networking (e.g. audio/video, infotainment system interfacing),
used primarily in European cars.
Hardware Software Co-Design
and Program Modelling
(excluding UML)
FUNDAMENTAL ISSUES IN HARDWARE
SOFTWARE CO-DESIGN
• Selecting the model
• Selecting the Architecture
• Selecting the language
• Partitioning System Requirements into hardware and software
Selecting the model
• In hardware software co-design, models are used for capturing and
describing the system characteristics.
• A model is a formal system consisting of objects and composition rules.
• It is hard to make a decision on which model should be followed in a
particular system design.
• Most often designers switch between a variety of models from the
requirements specification to the implementation aspect of the system
design.
• The reason being, the objective varies with each phase; for example at the
specification stage, only the functionality of the system is in focus and not
the implementation information.
Selecting the Architecture
• A model only captures the system characteristics and does not
provide information on ‘how the system can be manufactured?’.
• The architecture specifies how a system is going to be implemented in
terms of the number and types of different components and the
interconnection among them.
• Controller Architecture, Datapath Architecture, Complex Instruction
Set Computing (CISC), Reduced Instruction Set Computing (RISC), Very
Long Instruction Word Computing (VLIW), Single Instruction Multiple
Data (SIMD), Multiple Instruction Multiple Data (MIMD), etc. are the
commonly used architectures in system design.
Selecting the language
• A programming language captures a ‘Computational Model’ and maps
it into architecture.
• There is no hard and fast rule specifying which language should be
used for capturing a given model.
• A model can be captured using multiple programming languages like
C, C++, C#, Java, etc. for software implementations and languages like
VHDL, System C, Verilog, etc. for hardware implementations.
• The only pre-requisite in selecting a programming language for
capturing a model is that the language should capture the model
easily.
Partitioning System Requirements into hardware
and software
• So far we discussed about the models for capturing the system
requirements and the architecture for implementing the system.
• From an implementation perspective, it may be possible to implement the
system requirements in either hardware or software (firmware).
• It is a tough decision making task to figure out which one to opt.
• Various hardware software trade-offs are used for making a decision on the
hardware-software partitioning.
COMPUTATIONAL MODELS IN EMBEDDED
DESIGN
• Data Flow Graph/Diagram (DFG) Model
• The Data Flow Graph ( DFG) model translates the data
processing requirements into a data flow graph.
• The Data Flow Graph (DFG) model is a data driven model in
which the program execution is determined by data.
• This model emphasises on the data and operations on the
data which transforms the input data to output data.
• Indeed Data Flow Graph (DFG) is a visual model in which
the operation on the data (process) is represented using a
block (circle) and data flow is represented using arrows.
• An inward arrow to the process (circle) represents input
data and an outward arrow from the process (circle)
represents output data in DFG notation.
• Embedded applications which are computationally intensive and data
driven are modelled using the DFG model.
• For example, the computational requirement x = a + b; y = x – c maps
to a DFG with two process nodes.
Control Data Flow Graph/Diagram (CDFG)
• The Control DFG ( CDFG) model is used for
modelling applications involving conditional
program execution.
• CDFG models contain both data operations
and control operations.
• The CDFG uses Data Flow Graph (DFG) as
element and conditional (constructs) as
decision makers.
• CDFG contains both data flow nodes and
decision nodes, whereas DFG contains only
data flow nodes.
• If flag = 1, x = a + b; else y = a – b;
• This requirement contains a decision making
process.
State Machine
• The State Machine model is used for modelling reactive or event-driven
embedded systems whose processing behaviour is dependent on state
transitions.
• Embedded systems used in the control and industrial applications are
typical examples for event driven systems.
• The State Machine model describes the system behaviour with ‘States’,
‘Events’, ‘Actions’ and ‘Transitions’.
• State is a representation of a current situation.
• An event is an input to the state. The event acts as stimuli for state
transition.
• Transition is the movement from one state to another.
• Action is an activity to be performed by the state machine.
Finite State Machine
• A Finite State Machine (FSM) model is one in which the number of
states is finite.
• As an example let us consider the design of an embedded system for
driver/passenger ‘Seat Belt Warning’ in an automotive using the FSM
model.
• The system requirements are captured as follows:
1. When the vehicle ignition is turned on and the seat belt is not fastened
within 10 seconds of ignition ON, the system generates an alarm signal for 5
seconds.
2. The Alarm is turned off when the alarm time (5 seconds) expires or if the
driver/passenger fastens the belt or if the ignition switch is turned off,
whichever happens first.
Finite State Machine
Here the states are ‘Alarm
Off’, ‘Waiting’ and ‘Alarm On’
and the events are ‘Ignition
Key ON’, ‘Ignition Key OFF’,
‘Timer Expire’, ‘Alarm Time
Expire’ and ‘Seat Belt ON’.
Sequential Program Model
• In the sequential programming model, the functions or processing
requirements are executed in sequence.
• It is the same as conventional procedural programming.
• Here the program instructions are iterated and executed conditionally
and the data gets transformed through a series of operations.
• FSMs are a good choice for sequential program modelling.
• Another important tool used for modelling sequential program is
Flow Charts.
• The FSM approach represents the states, events, transitions and
actions, whereas the Flow Chart models the execution flow.
Embedded firmware design and
development (excluding C language)
Introduction
• The embedded firmware is responsible for controlling the various
peripherals of the embedded hardware and generating response in
accordance with the functional requirements mentioned in the
requirements for the particular embedded product.
• Firmware is considered as the master brain of the embedded system.
• Imparting intelligence to an embedded system is a one-time process; it can
happen at any stage, either immediately after the fabrication of the
embedded hardware or at a later stage.
• Once intelligence is imparted to the embedded product, by embedding the
firmware in the hardware, the product starts functioning properly and will
continue serving the assigned task till hardware breakdown occurs or a
corruption in embedded firmware occurs.
Introduction
• Designing embedded firmware requires understanding of the particular
embedded product hardware, like various component interfacing, memory map
details, I/O port details, configuration and register details of various hardware
chips used and some programming language (either target processor/controller
specific low level assembly language or a high level language like C/C++/JAVA).
• Embedded firmware development process starts with the conversion of the
firmware requirements into a program model using modelling tools like UML or
flow chart based representation.
• The UML diagrams or flow chart gives a diagrammatic representation of the
decision items to be taken and the tasks to be performed.
• Once the program model is created, the next step is the implementation of the
tasks and actions by capturing the model using a language which is
understandable by the target processor/controller.
EMBEDDED FIRMWARE DESIGN APPROACHES
• The firmware design approach for an embedded product depends
purely on the complexity of the functions to be performed, the
speed of operation required, etc.
• Two basic approaches are used for Embedded firmware design.
• They are ‘Conventional Procedural Based Firmware Design’ and
‘Embedded Operating System (OS) Based Design’.
• The conventional procedural based design is also known as ‘Super
Loop Model’.
The Super Loop Based Approach
• The Super Loop based firmware development approach is adopted for applications that
are not time critical and where the response time is not so important (embedded
systems where missing deadlines are acceptable).
• It is very similar to a conventional procedural programming where the code is executed
task by task.
• The task listed at the top of the program code is executed first and the tasks just below
the top are executed after completing the first task.
• The ‘Super loop based design’ doesn’t require an operating system, since there is no
need for scheduling which task is to be executed and assigning priority to each task.
• This type of design is deployed in low-cost embedded products and products where
response time is not time critical.
• For example, reading/writing data to and from a card using a card reader
requires a sequence of operations like checking the presence of the card,
authenticating the operation, reading/writing, etc. These operations should
strictly follow a specified sequence, and the combination of this series of
operations constitutes a single task, namely data read/write.
The firmware execution flow for Super Loop
based approach
1. Configure the common parameters and perform initialisation for
various hardware components (memory, registers, etc.)
2. Start the first task and execute it
3. Execute the second task
4. Execute the next task
5. :
6. :
7. Execute the last defined task
8. Jump back to the first task and follow the same flow
The Super Loop based approach
• From the firmware execution sequence, it is
obvious that the order in which the tasks are to
be executed is fixed and hard coded in the code
itself.
• Also the operation is an infinite loop based
approach.
• We can visualise the operational sequence listed
above in terms of a ‘C’ program code.
• Since the tasks are running inside an infinite
loop, the only way to come out of the loop is
either a hardware reset or an interrupt assertion.
The Super Loop based approach
• A typical example of a ‘Super loop based’ product is an electronic video
game toy containing keypad and display unit.
• The program running inside the product may be designed in such a way
that it reads the keys to detect whether the user has given any input and if
any key press is detected the graphic display is updated.
• The keyboard scanning and display updating happens at a reasonably high
rate.
• Even if the application misses a key press, it won’t create any critical issues;
rather it will be treated as a bug in the firmware.
• It is not economical to embed an OS into low cost products and it is not
smart to do so if the response requirements are not crucial.
The Super Loop based approach
• The ‘Super loop based design’ is simple and straight forward without any
OS related overheads.
• The major drawback of this approach is that any failure in any part of a
single task may affect the total system.
• If the program hangs up at some point while executing a task, it may
remain there forever and ultimately the product stops functioning.
• Another major drawback of the ‘Super loop’ design approach is the lack of
real timeliness.
• If the number of tasks to be executed within an application increases, the
time at which each task is repeated also increases.
• This brings the probability of missing out some events.
The Embedded Operating System (OS) Based
Approach
• The Operating System (OS) based approach makes use of an operating
system, which can be either a General Purpose Operating System (GPOS)
or a Real Time Operating System (RTOS), to host the user-written
application firmware.
• The General Purpose OS (GPOS) based design is very similar to a
conventional PC based application development where the device contains
an operating system (Windows/Unix/ Linux, etc. for Desktop PCs) and you
will be creating and running user applications on top of it.
• The Real Time Operating System (RTOS) based design approach is employed
in embedded products demanding Real-Time response. An RTOS responds
in a timely and predictable manner to events.
EMBEDDED FIRMWARE DEVELOPMENT
LANGUAGES
Following are the options for the embedded firmware
development languages:
• Assembly language or low level language
• C, C++, JAVA or high level language
• A combination of Assembly and high level language.
Assembly Language based Development
• ‘Assembly language’ is the human readable notation of ‘machine
language’, whereas ‘machine language’ is a processor understandable
language.
• Assembly language and machine languages are processor/controller
dependent and an assembly program written for one
processor/controller family will not work with others.
• Assembly language programming is the task of writing processor
specific machine code in mnemonic form, converting the mnemonics
into actual processor instructions (machine language) and associated
data using an assembler.
High Level Language Based Development
• Assembly language based programming is highly time consuming, tedious
and requires skilled programmers with sound knowledge of the target
processor architecture.
• Applications developed in Assembly language are non-portable.
• Here comes the role of high level languages.
• Any high level language (like C, C++ or Java) with a supported cross
compiler (for converting the application developed in high level language
to target processor specific assembly code) for the target processor can be
used for embedded firmware development.
• The most commonly used high level language for embedded firmware
application development is ‘C’.
Mixing Assembly and High Level Language
• Certain embedded firmware development situations may demand the
mixing of high level language with Assembly and vice versa.
• High level language and assembly languages are usually mixed in
three ways;
• Mixing Assembly Language with High Level Language
• Mixing High Level Language with Assembly
• In-line Assembly programming
Mixing Assembly Language with High Level Language
• Assembly routines are mixed with ‘C’ in situations where the entire
program is written in ‘C’ and the cross compiler in use does not have
built-in support for implementing certain features like Interrupt
Service Routines (ISRs), or where the programmer wants to take
advantage of the speed and optimised code offered by machine code
generated from hand-written assembly rather than cross-compiler
generated machine code.
• Mixing ‘C’ and Assembly is a little complicated in the sense that the
programmer must be aware of how parameters are passed from the
‘C’ routine to Assembly, how values are returned from the assembly
routine to ‘C’, and how the ‘Assembly routine’ is invoked from the ‘C’
code.
Mixing High Level Language with Assembly
(e.g. ‘C’ with Assembly Language)
Mixing the code written in a high level language like ‘C’ and Assembly
language is useful in the following scenarios:
1. The source code is already available in Assembly language and a routine written in a
high level language like ‘C’ needs to be added to the existing code.
2. The entire source code is planned in Assembly code for various reasons like
optimised code, optimal performance, efficient code memory utilisation and proven
expertise in handling the Assembly, etc.
But some portions of the code may be very difficult and tedious to code in Assembly.
For example, 16-bit multiplication and division in 8051 Assembly Language.
3. To include built in library functions written in ‘C’ language provided by the cross
compiler. For example Built in Graphics library functions and String operations
supported by ‘C’.
Inline Assembly
• Inline assembly is another technique for inserting target
processor/controller specific Assembly instructions at any location of
a source code written in high level language ‘C’.
• This avoids the delay in calling an assembly routine from a ‘C’ code (If
the Assembly instructions to be inserted are put in a subroutine as
mentioned in the section mixing assembly with ‘C’).
• Special keywords are used to indicate the start and end of
Assembly instructions.
• The keywords are cross-compiler specific. C51 uses the keywords
#pragma asm and #pragma endasm to indicate a block of code
written in assembly.
PROGRAMMING IN EMBEDDED C
• Whenever the conventional ‘C’ language and its extensions are used for programming embedded systems, it is referred to as ‘Embedded C’ programming.
• Programming in ‘Embedded C’ is quite different from conventional Desktop application
development using ‘C’ language for a particular OS platform.
• Desktop computers contain working memory in the range of Giga bytes and storage
memory in the range of Giga/Tera bytes.
• For a desktop application developer, the resources available are surplus in quantity and
the developer is not restricted in terms of the memory usage.
• This is not the case for embedded application developers.
• Almost all embedded systems are limited in both storage and working memory
resources.
• Embedded application developers should be aware of this fact and should develop
applications in the best possible way which optimises the code memory and working
memory usage as well as performance.
C v/s. Embedded C
C:
• ‘C’ is a well structured, well defined and standardised general purpose programming language with extensive bit manipulation support.
• ‘C’ offers a combination of the features of a high level language and assembly, and helps in hardware access programming.
• The conventional ‘C’ language follows the ANSI standard and incorporates various library files for different operating systems.
• A platform (operating system) specific application, known as a compiler, is used for the conversion of programs written in ‘C’ to binary files specific to the target processor (on which the OS is running).
Embedded C:
• Embedded ‘C’ can be considered as a subset of the conventional ‘C’ language.
• Embedded ‘C’ supports all ‘C’ instructions and incorporates a few target processor specific functions/instructions.
• The standard ANSI ‘C’ library implementation is always tailored to the target processor/controller library files in Embedded ‘C’.
• The implementation of target processor/controller specific functions/instructions depends upon the processor/controller as well as the supported cross-compiler for the particular Embedded ‘C’ language.
Compiler vs. Cross-Compiler
Compiler:
• A compiler is a software tool that converts source code written in a high level language into machine code, on top of a particular operating system running on a specific target processor architecture (e.g. Intel x86/Pentium).
• Here the operating system, the compiler program and the application making use of the source code run on the same target processor.
• The source code is converted to target processor specific machine instructions.
• The development is platform specific.
• Such compilers are generally termed ‘Native Compilers’. A native compiler generates machine code for the same machine (processor) on which it is running.
Cross-Compiler:
• Cross-compilers are the software tools used in cross-platform development.
• In cross-platform development, the compiler running on a particular processor/OS converts the source code to machine code for a target processor whose architecture and instruction set is different from the processor on which the compiler is running, or for an operating system different from the current development environment OS.
• Keil C51 is an example of a cross-compiler.
RTOS and IDE for Embedded
System
Module-3
Contents
• Operating System basics
• Types of operating systems
• Task, process and threads
• Thread pre-emption
• Pre-emptive Task scheduling techniques
• Task Communication
• Task synchronization issues – Racing and Deadlock
• How to choose an RTOS
• Integration and testing of Embedded hardware and firmware
• Embedded system Development Environment – Block diagram (excluding
Keil)
Introduction
• In the previous chapter, we discussed the Super loop based task execution model for firmware execution.
• The super loop executes the tasks sequentially in the order in which the
tasks are listed within the loop.
• Here every task is repeated at regular intervals and the task execution is
non-real time.
• If some of the tasks involve waiting for external events or I/O device usage,
the task execution time also gets pushed off in accordance with the ‘wait’
time consumed by the task.
• The priority in which a task is to be executed is fixed and is determined by
the task placement within the loop, in a super loop based execution.
Introduction
• This type of firmware execution is suited for embedded devices where
response time for a task is not time critical.
• Typical examples are electronic toys and video gaming devices.
• Here any response delay is acceptable and it will not create any operational
issues or potential hazards.
• Whereas certain applications demand time critical response to
tasks/events and any delay in the response may become catastrophic.
• Flight Control systems, Air bag control and Anti-lock Brake System (ABS)
systems for vehicles, Nuclear monitoring devices, etc. are typical examples
of applications/devices demanding time critical task response.
Introduction
• How is the increasing need for time critical response to tasks/events addressed in embedded applications?
• Well, the answer is:
1. Assign priority to tasks and execute the high priority task when the task is
ready to execute.
2. Dynamically change the priorities of tasks if required on a need basis.
3. Schedule the execution of tasks based on the priorities.
4. Switch the execution of task when a task is waiting for an external event or a
system resource including I/O device operation.
• The introduction of operating system based firmware execution in
embedded devices can address these needs to a greater extent.
OPERATING SYSTEM BASICS
• The operating system acts as a bridge between the user
applications/tasks and the underlying system resources through a set
of system functionalities and services.
• The OS manages the system resources and makes them available to
the user applications/tasks on a need basis.
• A normal computing system is a collection of different I/O
subsystems, working, and storage memory.
• The primary functions of an operating system are:
Make the system convenient to use
Organise and manage the system resources efficiently and correctly
The Operating System Architecture
The Kernel
• The kernel is the core of the operating system and is responsible for
managing the system resources and the communication among the
hardware and other system services.
• Kernel acts as the abstraction layer between system resources and
user applications.
• Kernel contains a set of system libraries and services.
• For a general purpose OS, the kernel contains different services for
handling the following.
Kernel Services
• Process management
• Process management deals with managing the processes/tasks.
• Process management includes setting up the memory space for the process,
loading the process’s code into the memory space, allocating system
resources, scheduling and managing the execution of the process, setting up
and managing the Process Control Block (PCB), Inter Process Communication
and synchronisation, process termination/deletion, etc.
• Primary Memory Management
• The Memory Management Unit (MMU) of the kernel is responsible for
Keeping track of which part of the memory area is currently used by which process
Allocating and De-allocating memory space on a need basis (Dynamic memory
allocation).
Kernel Services
• File System Management
The file system management service of Kernel is responsible for
The creation, deletion and alteration of files
Creation, deletion and alteration of directories
Saving of files in the secondary storage memory (e.g. Hard disk storage)
Providing automatic allocation of file space based on the amount of free space available
Providing a flexible naming convention for the files
• I/O System (Device) Management
Kernel is responsible for routing the I/O requests coming from different user applications
to the appropriate I/O devices of the system.
In a well-structured OS, the direct accessing of I/O devices are not allowed and the
access to them are provided through a set of Application Programming Interfaces (APIs)
exposed by the kernel.
Kernel Services
• Secondary Storage Management
The secondary storage management service of kernel deals with
Disk storage allocation
Disk scheduling (Time interval at which the disk is activated to backup data)
Free Disk space management
• Protection Systems
• In multiuser supported operating systems, one user may not be allowed to view or modify the whole/portions of another user’s data or profile details.
• In addition, some applications may not be granted permission to make use of some of the system resources.
Kernel Services
• Interrupt Handler
• Kernel provides handler mechanism for all external/internal interrupts generated by
the system.
• Kernel Space and User Space
• The applications/services are classified into two categories, namely: user applications and kernel applications.
• The program code corresponding to the kernel applications/services are kept in a
contiguous area (OS dependent) of primary (working) memory and is protected from
the unauthorised access by user programs/applications.
• The memory space at which the kernel code is located is known as ‘Kernel Space’.
• Similarly, all user applications are loaded to a specific area of primary memory and this memory area is referred to as ‘User Space’.
Kernel Services
Monolithic Kernel
• In monolithic kernel architecture, all kernel services run in
the kernel space.
• The tight internal integration of kernel modules in
monolithic kernel architecture allows the effective
utilisation of the low-level features of the underlying
system.
• The major drawback of monolithic kernel is that any error
or failure in any one of the kernel modules leads to the
crashing of the entire kernel application.
• LINUX, SOLARIS, MS-DOS kernels are examples of
monolithic kernel.
Kernel Services
Micro Kernel
• The microkernel design incorporates only the
essential set of Operating System services into
the kernel.
• The rest of the Operating System services are
implemented in programs known as ‘Servers’
which runs in user space.
• Memory management, process management,
timer systems and interrupt handlers are the
essential services, which forms the part of the
microkernel.
• Mach, QNX, Minix 3 kernels are examples for
microkernel.
Types of operating systems
Depending on the type of kernel and kernel services, purpose and type of computing systems where the
OS is deployed and the responsiveness to applications, Operating Systems are classified into different
types.
• General Purpose Operating System (GPOS)
• The operating systems, which are deployed in general computing systems, are referred as
General Purpose Operating Systems ( GPOS).
• The kernel of such an OS is more generalised and it contains all kinds of services required for
executing generic applications.
• Windows 10/8.x/XP/MS-DOS etc. are examples
• Real-Time Operating System (RTOS)
• ‘Real-Time’ implies deterministic timing behaviour.
• Deterministic timing behaviour in the RTOS context means the OS services consume only known and expected amounts of time, regardless of the number of service requests.
• Windows Embedded Compact, QNX, VxWorks, MicroC/OS-II, etc. are examples.
The Real-Time Kernel
• The kernel of a Real-Time Operating System is referred as Real-Time
kernel.
• The basic functions of a Real-Time kernel are listed below:
• Task/Process management
• Task/Process scheduling
• Task/Process synchronisation
• Error/Exception handling
• Memory management
• Interrupt handling
• Time management
Task/ Process management
• Deals with setting up the memory space for the tasks, loading the task’s code into the
memory space, allocating system resources, setting up a Task Control Block (TCB) for the
task and task/process termination/deletion.
• A Task Control Block (TCB) is used for holding the information corresponding to a task.
• TCB usually contains the following set of information.
Task ID: Task Identification Number
Task State: The current state of the task (e.g. State = ‘Ready’ for a task which is ready to execute)
Task Type: Indicates the type of the task. The task can be a hard real time, soft real time or background task.
Task Priority: Task priority (e.g. Task priority = 1 for task with priority = 1)
Task Context Pointer: Context pointer. Pointer for context saving
Task Memory Pointers: Pointers to the code memory, data memory and stack memory for the task
Task System Resource Pointers: Pointers to system resources (semaphores, mutex, etc.) used by
the task
Task Pointers: Pointers to other TCBs (TCBs for preceding, next and waiting tasks)
Other Parameters: Other relevant task parameters
The Real-Time Kernel
• Task/ Process Scheduling
• Deals with sharing the CPU among various tasks/processes.
• A kernel application called ‘Scheduler’ handles the task scheduling.
• Scheduler is nothing but an algorithm implementation, which performs the
efficient and optimal scheduling of tasks to provide a deterministic behaviour.
• Task/ Process Synchronisation
• Deals with synchronising the concurrent access of a resource, which is shared
across multiple tasks and the communication between various tasks.
The Real-Time Kernel
• Error/ Exception Handling
• Deals with registering and handling the errors occurred/exceptions raised
during the execution of tasks.
• Insufficient memory, timeouts, deadlocks, deadline missing, bus error, divide
by zero, unknown instruction execution, etc. are examples of
errors/exceptions. Errors/Exceptions can happen at the kernel level services
or at task level.
• Memory Management
• An RTOS makes use of a ‘block’ based memory allocation technique, instead of the usual dynamic memory allocation techniques used by a GPOS.
• The RTOS kernel uses fixed-size blocks of dynamic memory, and a block is allocated to a task on a need basis.
The Real-Time Kernel
• Interrupt Handling
• Deals with the handling of various types of interrupts.
• Interrupts provide Real-Time behaviour to systems.
• Interrupts inform the processor that an external device or an associated task
requires immediate attention of the CPU.
• Priority levels can be assigned to the interrupts and each interrupts can be
enabled or disabled individually.
• Most of the RTOS kernel implements ‘Nested Interrupts’ architecture.
• Interrupt nesting allows the pre-emption (interruption) of an Interrupt Service
Routine (ISR), servicing an interrupt, by a high priority interrupt.
The Real-Time Kernel
Time management
• Accurate time management is essential for providing precise time reference
for all applications.
• The time reference to kernel is provided by a high-resolution Real-Time Clock
(RTC) hardware chip (hardware timer).
• The hardware timer is programmed to interrupt the processor/controller at a
fixed rate. This timer interrupt is referred as ‘ Timer tick’.
• The ‘Timer tick’ is taken as the timing reference by the kernel.
• The ‘Timer tick’ interval may vary depending on the hardware timer. Usually
the ‘Timer tick’ varies in the microseconds range.
The Real-Time Kernel
Hard Real-Time
• Real-Time Operating Systems that strictly adhere to the timing constraints for
a task is referred as ‘Hard Real-Time’ systems.
• A Hard Real-Time system must meet the deadlines for a task without any
slippage.
• Missing a deadline may produce catastrophic results for Hard Real-Time Systems, including permanent data loss and irrecoverable damage to the system/users.
• A system can have several such tasks and the key to their correct operation
lies in scheduling them so that they meet their time constraints.
• Air bag control systems and Anti-lock Brake Systems (ABS) of vehicles are
typical examples for Hard Real-Time Systems.
The Real-Time Kernel
Soft Real-Time
• Real-Time Operating System that does not guarantee meeting deadlines, but
offer the best effort to meet the deadline are referred as ‘Soft Real-Time’
systems.
• Missing deadlines for tasks are acceptable for a Soft Real-time system if the
frequency of deadline missing is within the compliance limit of the Quality of
Service (QoS).
• A Soft Real-Time system emphasises the principle ‘A late answer is an acceptable answer, but it could have been done a bit faster’.
• Automatic Teller Machine (ATM) is a typical example for Soft-Real-Time
System.
Task, process and threads
• The term ‘ task’ refers to something that needs to be done.
• In our day-to-day life, we are bound to the execution of a number of tasks.
• In addition, we will have an order of priority and schedule/timeline for
executing these tasks.
• In the operating system context, a task is defined as the program in
execution and the related information maintained by the operating system
for the program.
• The terms ‘Task’, ‘Job’ and ‘Process’ refer to the same entity in the
operating system context and most often they are used interchangeably.
Process
• A ‘Process’ is a program, or part of it, in execution.
• Process is also known as an instance of a program in execution.
• Multiple instances of the same program can execute simultaneously.
• A process requires various system resources like CPU for executing
the process, memory for storing the code corresponding to the
process and associated variables, I/O devices for information
exchange, etc.
• A process is sequential in execution.
The structure of a process
• The concept of ‘Process’ leads to concurrent execution (pseudo
parallelism) of tasks and thereby the efficient utilisation of the CPU
and other system resources.
• Concurrent execution is achieved through the sharing of CPU among
the processes.
• A process mimics a processor in properties and holds a set of
registers, process status, a Program Counter (PC) to point to the next
executable instruction of the process, a stack for holding the local
variables associated with the process and the code corresponding to
the process.
The structure of a process
• A process which inherits all the
properties of the CPU can be
considered as a virtual processor,
awaiting its turn to have its
properties switched into the
physical processor.
• When the process gets its turn,
its registers and the program
counter register becomes
mapped to the physical registers
of the CPU.
Memory organization of a process
• From a memory perspective, the memory
occupied by the process is segregated into
three regions, namely, Stack memory, Data
memory and Code memory.
• The ‘ Stack’ memory holds all temporary
data such as variables local to the process.
• Data memory holds all global data for the
process.
• The code memory contains the program
code (instructions) corresponding to the
process.
• On loading a process into the main
memory, a specific area of memory is
allocated for the process.
Process States and State Transition
• The creation of a process to its
termination is not a single step
operation.
• The cycle through which a
process changes its state from
‘newly created’ to ‘execution
completed’ is known as ‘
Process Life Cycle’.
• The various states through
which a process traverses
through during a Process Life
Cycle indicates the current
status of the process with
respect to time and also
provides information on what it
is allowed to do next.
Process States and State Transition
• The state at which a process is being created is referred as ‘Created State’.
• The Operating System recognises a process in the ‘Created State’ but no
resources are allocated to the process.
• The state, where a process is incepted into the memory and awaiting the processor
time for execution, is known as ‘Ready State’.
• At this stage, the process is placed in the ‘Ready list’ queue maintained by the OS.
• The state where in the source code instructions corresponding to the process is
being executed is called ‘Running State’.
• ‘Blocked State/Wait State’ refers to a state where a running process is temporarily
suspended from execution and does not have immediate access to resources.
• A state where the process completes its execution is known as ‘Completed State’.
Threads
• A thread is the primitive that can execute code.
• A thread is a single sequential flow of control
within a process.
• ‘Thread’ is also known as lightweight process.
• A process can have many threads of execution.
• Different threads, which are part of a process, share the same address space; meaning they share the data memory, code memory and heap memory area.
• Threads maintain their own thread status (CPU register values), Program Counter (PC) and stack.
Figure: The memory model for a process and its associated threads
The Concept of Multithreading
• Instead of this single sequential execution of the whole process, if the
task/process is split into different threads carrying out the different
sub functionalities of the process, the CPU can be effectively utilised.
• When the thread corresponding to the I/O operation enters the wait state, other threads which do not require the I/O event for their operation can be switched into execution.
• This leads to more speedy execution of the process and the efficient
utilisation of the processor time and resources.
The multithreaded architecture of a process
• If the process is split into multiple threads, which executes a portion
of the process, there will be a main thread and rest of the threads will
be created within the main thread.
• Use of multiple threads to execute a process brings the following advantages.
• Better memory utilisation. Multiple threads of the same process
share the address space for data memory. This also reduces the
complexity of inter thread communication since variables can be
shared across the threads.
• Since the process is split into different threads, when one thread
enters a wait state, the CPU can be utilised by other threads of
the process that do not require the event, which the other thread
is waiting, for processing. This speeds up the execution of the
process.
• Efficient CPU utilisation. The CPU is engaged all the time.
Thread Standards
• Thread standards deal with the different standards available for
thread creation and management.
• These standards are utilised by the operating systems for thread
creation and thread management.
• It is a set of thread class libraries.
• POSIX Threads
• POSIX stands for Portable Operating System Interface.
• The POSIX.4 standard deals with the Real-Time extensions and
POSIX.4a standard deals with thread extensions.
POSIX Threads
• The POSIX standard library for thread creation and management is ‘ Pthreads’.
• ‘Pthreads’ library defines the set of POSIX thread creation and management functions in
‘C’ language.
• The primitive
int pthread_create(pthread_t *new_thread_ID, const pthread_attr_t
*attribute, void * (*start_function)(void *), void *arguments);
• creates a new thread for running the function start_function. Here pthread_t is the handle to the newly created thread and pthread_attr_t is the data type for holding the thread attributes.
• ‘start_function’ is the function the thread is going to execute and ‘arguments’ holds the arguments for ‘start_function’ (a void * in the above example).
• On successful creation of a Pthread, pthread_create() associates the Thread Control
Block (TCB) corresponding to the newly created thread to the variable of type pthread_t
(new_thread_ID in our example).
Thread Pre-emption
• Thread pre-emption is the act of pre-empting the currently running
thread (stopping the currently running thread temporarily).
• Thread pre-emption ability is solely dependent on the Operating
System.
• Thread pre-emption is performed for sharing the CPU time among all
the threads.
• The execution switching among threads are known as ‘Thread context
switching’.
• Thread context switching is dependent on the Operating system’s
scheduler and the type of the thread.
Thread v/s Process
Thread:
• A thread is a single unit of execution and is part of a process.
• A thread does not have its own data memory and heap memory. It shares the data memory and heap memory with other threads of the same process.
• A thread cannot live independently; it lives within the process.
• There can be multiple threads in a process. The first thread (main thread) calls the main function and occupies the start of the stack memory of the process.
• Threads are very inexpensive to create.
• Context switching is inexpensive and fast.
• If a thread expires, its stack is reclaimed by the process.
Process:
• A process is a program in execution and contains one or more threads.
• A process has its own code memory, data memory and stack memory.
• A process contains at least one thread.
• Threads within a process share the code, data and heap memory. Each thread holds a separate memory area for its stack (shared out of the total stack memory of the process).
• Processes are very expensive to create and involve many OS overheads.
• Context switching is complex, involves a lot of OS overhead and is comparatively slower.
• If a process dies, the resources allocated to it are reclaimed by the OS and all the associated threads of the process also die.
Pre-emptive Task scheduling techniques
• In pre-emptive scheduling, every task in the ‘Ready’ queue gets a chance to execute.
• When and how often each process gets a chance to execute (gets the CPU time) is
dependent on the type of pre-emptive scheduling algorithm used for scheduling the
processes.
• In this kind of scheduling, the scheduler can pre-empt (stop temporarily) the currently
executing task/process and select another task from the ‘Ready’ queue for execution.
• When to pre-empt a task and which task is to be picked up from the ‘Ready’ queue for
execution after pre-empting the current task is purely dependent on the scheduling
algorithm.
• A task which is pre-empted by the scheduler is moved to the ‘Ready’ queue.
• The act of moving a ‘Running’ process/task into the ‘Ready’ queue by the scheduler, without the process requesting it, is known as ‘Pre-emption’.
• The two important approaches adopted in pre-emptive scheduling are time-based pre-
emption and priority-based pre-emption.
Pre-emptive SJF Scheduling/ Shortest
Remaining Time (SRT)
• The pre-emptive SJF (Shortest Job First) scheduling algorithm sorts the ‘Ready’ queue when a new process enters it and checks whether the execution time of the new process is shorter than the remaining execution time of the currently executing process.
• If the execution time of the new process is less, the currently executing
process is pre-empted and the new process is scheduled for execution.
• Pre-emptive SJF scheduling always compares the execution completion
time (It is same as the remaining time for the new process) of a new
process entered the ‘Ready’ queue with the remaining time for completion
of the currently executing process and schedules the process with shortest
remaining time for execution.
• Pre-emptive SJF scheduling is also known as Shortest Remaining Time (
SRT) scheduling.
Round Robin (RR) Scheduling
• ‘Round Robin’ follows the philosophy “Equal chance to all”.
• In Round Robin scheduling, each process in the ‘Ready’ queue is
executed for a pre-defined time slot.
• The execution starts with picking up the first process in the
‘Ready’ queue.
• It is executed for a pre-defined time and when the pre-defined
time elapses or the process completes (before the pre-defined
time slice), the next process in the ‘Ready’ queue is selected for
execution.
• This is repeated for all the processes in the ‘Ready’ queue.
• Once each process in the ‘Ready’ queue is executed for the pre-
defined time period, the scheduler comes back and picks the
first process in the ‘Ready’ queue again for execution. The
sequence is repeated.
Task Communication
• In a multitasking system, multiple tasks/processes run concurrently and the processes may or may not interact with each other.
• Based on the degree of interaction, the processes running on an OS
are classified as
• Co-operating Processes: In the co-operating interaction model one
process requires the inputs from other processes to complete its
execution.
• Competing Processes: The competing processes do not share
anything among themselves but they share the system resources. The
competing processes compete for the system resources such as file,
display device, etc.
Task Communication
• Co-operating processes exchange information and communicate through
the following methods.
• Co-operation through Sharing: The co-operating process exchange data
through some shared resources.
• Co-operation through Communication: No data is shared between the
processes. But they communicate for synchronisation.
• The mechanism through which processes/tasks communicate with each other is known as Inter Process/Task Communication (IPC).
• Inter Process Communication is essential for process co-ordination.
• The various types of Inter Process Communication (IPC) mechanisms
adopted by process are kernel (Operating System) dependent.
Task Communication
Shared Memory
• Processes share some area of the memory to communicate among
them.
• Information to be communicated by the process is written to the
shared memory area.
• Other processes which require this information can read the same
from the shared memory area.
• It is similar to the real world ‘Notice Board’, which an organisation uses to publish public information among its employees.
Task synchronization issues – Racing and
Deadlock
• Imagine a situation where two processes try to access display hardware
connected to the system or two processes try to access a shared memory
area where one process tries to write to a memory location when the other
process is trying to read from this.
• What could be the result in these scenarios?
• How these issues can be addressed?
• The solution is, make each process aware of the access of a shared
resource either directly or indirectly.
• The act of making processes aware of the access of shared resources by
each process to avoid conflicts is known as ‘Task/ Process Synchronisation’.
Task synchronization issues – Racing and
Deadlock
Racing
• Racing or a Race condition is the situation in which multiple processes compete (race) with each other to access and manipulate shared data concurrently.
• In a Race condition the final value of the shared data depends on the process which acted on the data last.
Deadlock
• A race condition produces incorrect results whereas a deadlock condition
creates a situation where none of the processes are able to make any
progress in their execution, resulting in a set of deadlocked processes.
• A situation very similar to our traffic jam issues in a junction.
Deadlock
The different conditions favouring a deadlock situation are listed below:
• Mutual Exclusion: The criteria that only one process can hold a
resource at a time. Typical example is the accessing of display hardware
in an embedded device.
• Hold and Wait: The condition in which a process holds a shared resource by acquiring the lock controlling the shared access, while waiting for additional resources held by other processes.
• No Resource Pre-emption: The criteria that operating system cannot
take back a resource from a process which is currently holding it and
the resource can only be released voluntarily by the process holding it.
• Circular Wait: A process is waiting for a resource which is currently held
by another process which in turn is waiting for a resource held by the
first process.
Scenarios leading to deadlock
Deadlock Handling
Ignore Deadlocks:
• Always assume that the system design is deadlock free. This is acceptable
for the reason the cost of removing a deadlock is large compared to the
chance of happening a deadlock.
Detect and Recover:
• This approach suggests the detection of a deadlock situation and recovery
from it. This is similar to the deadlock condition that may arise at a traffic
junction.
• A deadlock condition can be detected by analysing the resource graph by
graph analyser algorithms. Once a deadlock condition is detected, the
system can terminate a process or pre-empt the resource to break the
deadlocking cycle.
Prevent Deadlocks
• Prevent the deadlock condition by negating one of the four conditions
favouring the deadlock situation.
1. Ensure that a process does not hold any other resources when it
requests a resource.
2. Ensure that resource pre-emption (resource releasing) is possible at
the operating system level. This can be achieved by implementing a
set of rules/guidelines for resource allocation and release.
Two related conditions are:
• Livelock: The Livelock condition is similar to the deadlock condition,
except that a process in a livelock condition changes its state with time
while still making no progress.
• Starvation: In the multitasking context, starvation is the condition in
which a process does not get the resources required to continue its
execution for a long time.
Integration and testing of Embedded
hardware and firmware
• Integration testing of the embedded hardware and firmware is the
immediate step following the embedded hardware and firmware
development.
• The final embedded hardware consists of a PCB with all the necessary
components affixed to it as per the original schematic diagram.
• Embedded firmware represents the control algorithm and
configuration data necessary to implement the product requirements
on the product.
• Embedded firmware will be in a target processor/controller
understandable format called machine language (sequence of 1s and
0s–Binary).
Integration of Embedded hardware and
firmware
• Integration of hardware and firmware deals with the embedding of firmware into
the target hardware board.
• It is the process of ‘Embedding Intelligence’ to the product.
• The embedded processors/controllers used in the target board may or may not
have built in code memory.
• For non-operating system based embedded products, if the processor/controller
contains internal code memory and the total size of the firmware fits into the
code memory area, the firmware is downloaded into the internal memory of the
target controller/processor.
• If the processor/controller does not have built-in code memory, or the size of
the firmware exceeds the memory size supported by the target
processor/controller, an external dedicated EPROM/FLASH memory chip is used
for holding the firmware.
Embedded system Development Environment
– Block diagram (excluding Keil)
• As illustrated in the figure, the development environment
consists of a Development Computer (PC) or Host, which acts
as the heart of the development environment; an Integrated
Development Environment (IDE) tool for embedded firmware
development and debugging; an Electronic Design Automation
(EDA) tool for Embedded Hardware design; emulator
hardware for debugging the target board; signal sources (like a
Function generator) for simulating the inputs to the target
board; target hardware debugging tools (Digital CRO,
Multimeter, Logic Analyser, etc.); and the target hardware.
• The Integrated Development Environment
(IDE) and Electronic Design Automation (EDA) tools are
selected based on the target hardware development
requirement and they are supplied as Installable files in
CDs/Online downloads by vendors.
Module-5
Introduction to the ARM Instruction set
Contents
• Introduction
• Data processing instructions
• Load – Store instruction
• Software interrupt instructions
• Program status register instructions
• Loading constants
• ARMv5E extensions
• Conditional Execution
Introduction
• ARM instructions process data held in registers and only access
memory with load and store instructions.
• ARM instructions commonly take two or three operands.
• For instance the ADD instruction below adds the two values stored in
registers r1 and r2 (the source registers).
• It writes the result to register r3 (the destination register).
Data processing instructions
• The data processing instructions manipulate data within registers.
• They are move instructions, arithmetic instructions, logical
instructions, comparison instructions, and multiply instructions.
• Most data processing instructions can process one of their operands
using the barrel shifter.
• If you use the S suffix on a data processing instruction, then it updates
the flags in the cpsr.
• Move and logical operations update the carry flag C, negative flag N,
and zero flag Z.
Move Instructions
• It copies N into a destination register Rd, where N is a register or
immediate value.
• This instruction is useful for setting initial values and transferring data
between registers.
Example
Barrel Shifter
• A unique and powerful feature of the ARM processor is the ability to
shift the 32-bit binary pattern in one of the source registers left or
right by a specific number of positions before it enters the ALU.
• This shift increases the power and flexibility of many data processing
operations.
• Pre-processing or shift occurs within the cycle time of the instruction.
• This is particularly useful for loading constants into a register and
achieving fast multiplication or division by a power of 2.
Barrel Shifter and ALU
Pre-processing or shift
The example multiplies register r5 by four and then places
the result into register r7.
Example : MOVS r0, r1, LSL #1
• This example of a MOVS
instruction shifts register r1
left by one bit.
• This multiplies register r1 by a
value of 2 (0x80000004 becomes
0x00000008).
• As you can see, the C flag is
updated in the cpsr because
the S suffix is present in the
instruction mnemonic.
Logical shift left by one
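The value and carry behaviour of MOVS with LSL can be mirrored in C. This is a sketch under the assumption that the shift amount is 1..31; on real hardware, LSL #0 leaves the carry unchanged, which this simplification does not model.

```c
#include <stdint.h>

/* Value produced by "MOV r0, r1, LSL #n": a plain left shift. */
uint32_t lsl_value(uint32_t x, unsigned n) {
    return x << n;
}

/* Carry produced by "MOVS ...": the last bit shifted out of the top.
 * Assumes n is 1..31 (a simplification of the real shifter rules). */
unsigned lsl_carry(uint32_t x, unsigned n) {
    return (x >> (32u - n)) & 1u;
}
```

For the example above, `lsl_value(0x80000004, 1)` gives 0x00000008 and `lsl_carry(0x80000004, 1)` gives 1, matching the updated C flag.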
Arithmetic Instructions
• The arithmetic instructions implement addition and subtraction of 32-
bit signed and unsigned values.
Example: SUB r0, r1, r2
Example: SUBS r1, r1, #1
Example: Using the Barrel Shifter with Arithmetic
Instructions
Logical Instructions
• Logical instructions perform bitwise logical operations on the two
source registers.
Comparison Instructions
• The comparison instructions are used to compare or test a register
with a 32-bit value.
• They update the cpsr flag bits according to the result, but do not
affect other registers.
• After the bits have been set, the information can then be used to
change program flow by using conditional execution.
• You do not need to apply the S suffix for comparison instructions to
update the flags.
Comparison Instructions
• The CMP is effectively a subtract instruction with the result discarded; similarly the
TST instruction is a logical AND operation, and TEQ is a logical exclusive OR
operation.
• For each, the results are discarded but the condition bits are updated in the cpsr.
• It is important to understand that comparison instructions only modify the condition
flags of the cpsr and do not affect the registers being compared.
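The discard-the-result, keep-the-flags behaviour can be sketched in C. These helper names are illustrative only; each computes the comparison, throws away the numeric result, and returns just one condition bit.

```c
#include <stdint.h>

/* CMP Rn, N: subtract, discard the difference, keep the Z flag. */
int cmp_sets_z(uint32_t rn, uint32_t n) {
    return (uint32_t)(rn - n) == 0u;
}

/* CMP Rn, N: the N flag is the sign bit of the discarded difference. */
int cmp_sets_n(uint32_t rn, uint32_t n) {
    return ((uint32_t)(rn - n) >> 31) & 1u;
}

/* TST Rn, N: bitwise AND, result discarded, Z flag kept. */
int tst_sets_z(uint32_t rn, uint32_t n) {
    return (rn & n) == 0u;
}
```

Comparing equal values sets Z; comparing a smaller against a larger value sets N, which conditional execution can then test.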
Multiply Instructions
• The multiply instructions multiply the contents of a pair of registers
and, depending upon the instruction, accumulate the result with
another register.
• The long multiplies accumulate onto a pair of registers representing a
64-bit value.
• The final result is placed in a destination register or a pair of registers.
Multiply Instructions
• The long multiply instructions (SMLAL, SMULL, UMLAL, and UMULL)
produce a 64-bit result.
• The result is too large to fit in a single 32-bit register, so the result is
placed in two registers labelled RdLo and RdHi.
• RdLo holds the lower 32 bits of the 64-bit result, and RdHi holds the
higher 32 bits of the 64-bit result.
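The RdLo/RdHi split of a long multiply can be sketched with a 64-bit intermediate in C. The two helper functions below are illustrative, showing how UMULL's result would be divided.

```c
#include <stdint.h>

/* UMULL RdLo, RdHi, Rm, Rs: RdLo gets the lower 32 bits of the
 * 64-bit unsigned product. */
uint32_t umull_lo(uint32_t rm, uint32_t rs) {
    return (uint32_t)((uint64_t)rm * rs);
}

/* ... and RdHi gets the upper 32 bits. */
uint32_t umull_hi(uint32_t rm, uint32_t rs) {
    return (uint32_t)(((uint64_t)rm * rs) >> 32);
}
```

For example, 0xFFFFFFFF * 2 = 0x1FFFFFFFE, so RdHi holds 1 and RdLo holds 0xFFFFFFFE.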
Branch Instructions
• A branch instruction changes the flow of execution or is used to call a
routine.
• This type of instruction allows programs to have subroutines, if-then-
else structures, and loops.
• The change of execution flow forces the program counter pc to point
to a new address.
• The address label is stored in the instruction as a signed pc-relative
offset and must be within approximately 32 MB of the branch
instruction.
Branch Instructions
Examples of branch instructions
• The forward branch skips three instructions.
• The backward branch creates an infinite loop.
• Branches are used to change execution flow.
• Most assemblers hide the details of a branch
instruction encoding by using labels.
• In this example, forward and backward are the
labels.
• The branch labels are placed at the beginning of
the line and are used to mark an address that can
be used later by the assembler to calculate the
branch offset.
Load-Store Instructions
• Load-store instructions transfer data between memory and processor
registers.
• There are three types of load-store instructions: single-register
transfer, multiple-register transfer, and swap.
• Single-Register Transfer
• These instructions are used for moving a single data item in and out
of a register.
• The datatypes supported are signed and unsigned words (32-bit),
half-words (16-bit), and bytes.
Single register transfer
• LDR and STR instructions can load and
store data on a boundary alignment that is
the same as the datatype size being
loaded or stored.
Single-Register Load-Store Addressing Modes
• The ARM instruction set provides different modes for addressing
memory.
• These modes incorporate one of the indexing methods: preindex with
writeback, preindex, and postindex
• Preindex with writeback calculates an
address from a base register plus address
offset and then updates that address base
register with the new address.
• In contrast, the preindex offset is the
same as the preindex with writeback but
does not update the address base register.
• Postindex only updates the address base
register after the address is used.
• The preindex mode is useful for accessing
an element in a data structure.
• The postindex and preindex with
writeback modes are useful for traversing
an array.
Multiple-Register Transfer
• Load-store multiple instructions can transfer multiple registers
between memory and the processor in a single instruction.
• The transfer occurs from a base address register Rn pointing into
memory.
• Multiple-register transfer instructions are more efficient than single-
register transfers for moving blocks of data around memory and for
saving and restoring context and stacks.
• Load-store multiple instructions can increase interrupt latency.
• ARM implementations do not usually interrupt instructions while they
are executing.
Stack Operations
• The ARM architecture uses the load-store multiple instructions to carry out
stack operations.
• The pop operation (removing data from a stack) uses a load multiple
instruction;
• The push operation (placing data onto the stack) uses a store multiple
instruction.
• Ascending (A) stacks grow towards higher memory addresses; in contrast,
descending (D) stacks grow towards lower memory addresses.
• When you use a full stack (F), the stack pointer sp points to an address that
is the last used or full location (i.e., sp points to the last item on the stack).
• If you use an empty stack (E) the sp points to an address that is the first
unused or empty location
Addressing modes for stack operations
Chapter 6
ARM Instruction Set
Hsung-Pin Chang
Department of Computer Science
National Chung Hsing University
Outline
o Data Processing Instructions
o Branch Instructions
o Load-store instructions
o Software interrupt instructions
o Program status register instructions
o Conditional Execution
ARM Instruction Set Format
6.1 Data Processing Instructions
o Manipulate data within registers
o Data processing instructions
n Move instructions
n Arithmetic instructions
n Logical instructions
n Comparison instructions
n Multiply instructions
6.1.1 Move Instruction
o Syntax: <instruction> {<cond>} {S} Rd, N
n N: a register or immediate value
o MOV : move
n MOV r0, r1; r0 = r1
n MOV r0, #5; r0 = 5
o MVN : move (negated)
n MVN r0, r1; r0 = NOT(r1) = ~r1
Preprocessed by Shifter
o Example 1
n PRE: r5 = 5, r7 = 8;
n MOV r7, r5, LSL #2; r7 = r5 << 2 = r5*4
n POST: r5 = 5, r7 = 20
6.1.2 Preprocessed by Shifter
o LSL: logical shift left
n x << y, the least significant bits are filled with zeroes
o LSR: logical shift right:
n (unsigned) x >> y, the most significant bits are filled with zeroes
o ASR: arithmetic shift right
n (signed) x >> y, copy the sign bit to the most significant bit
o ROR: rotate right
n ((unsigned) x >> y) | (x << (32-y))
o RRX: rotate right extended
n (c flag << 31) | ((unsigned) x >> 1)
n Performs a 33-bit rotate, with the CPSR’s C bit being inserted above
the sign bit of the word
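The five shift operations above map directly onto C expressions. A sketch, with two stated assumptions: shift amounts are 1..31 (ROR by 0 would be undefined behaviour in C), and the signed right shift in `asr` is arithmetic, as it is on common compilers.

```c
#include <stdint.h>

uint32_t lsl(uint32_t x, unsigned y) { return x << y; }               /* zeros in from the right */
uint32_t lsr(uint32_t x, unsigned y) { return x >> y; }               /* zeros in from the left  */
uint32_t asr(uint32_t x, unsigned y) { return (uint32_t)((int32_t)x >> y); } /* sign bit copied */
uint32_t ror(uint32_t x, unsigned y) { return (x >> y) | (x << (32u - y)); } /* y in 1..31      */
uint32_t rrx(uint32_t x, unsigned c) { return ((uint32_t)c << 31) | (x >> 1); } /* C flag in at top */
```

For instance, `ror(1, 1)` rotates the low bit around to bit 31, giving 0x80000000.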
Preprocessed by Shifter (Cont.)
o Example 2
n PRE: r0 = 0x00000000, r1 = 0x80000004
n MOV r0, r1, LSL #1 ; r0 = r1 *2
n POST r0 = 0x00000008, r1 = 0x80000004
6.1.3 Arithmetic Instructions
o Syntax: <instruction> {<cond>} {S} Rd, Rn, N
n N: a register or immediate value
o ADD : add
n ADD r0, r1, r2; r0 = r1 + r2
o ADC : add with carry
n ADC r0, r1, r2; r0 = r1 + r2 + C
o SUB : subtract
n SUB r0, r1, r2; r0 = r1 - r2
o SBC : subtract with carry
n SBC r0, r1, r2; r0 = r1 - r2 + C -1
6.1.3 Arithmetic Instructions (Cont.)
o RSB : reverse subtract
n RSB r0, r1, r2; r0 = r2 – r1
o RSC : reverse subtract with carry
n RSC r0, r1, r2; r0 = r2 – r1 + C -1
o MUL : multiply
n MUL r0, r1, r2; r0 = r1 x r2
o MLA : multiply and accumulate
n MLA r0, r1, r2, r3; r0 = r1 x r2 + r3
6.1.4 Logical Operations
o Syntax: <instruction> {<cond>} {S} Rd, Rn, N
n N: a register or immediate value
o AND : Bit-wise and
o ORR : Bit-wise or
o EOR : Bit-wise exclusive-or
o BIC : bit clear
n BIC r0, r1, r2; r0 = r1 & Not(r2)
Logical Operations (Cont)
o Example 3:
n PRE: r1 = 0b1111, r2 = 0b0101
n BIC r0, r1, r2 ; r0 = r1 AND (NOT(r2))
n POST: r0=0b1010
6.1.5 Comparison Instructions
o Compare or test a register with a 32-bit value
n Do not modify the registers being compared or
tested
n But only set the values of the NZCV bits of the
CPSR register
o Do not need to apply the S suffix for comparison
instructions to update the flags in the CPSR register
Comparison Instructions (Cont.)
o Syntax: <instruction>{<cond>} Rn, N
n N: a register or immediate value
o CMP : compare
n CMP r0, r1; compute (r0 - r1) and set NZCV
o CMN : negated compare
n CMN r0, r1; compute (r0 + r1) and set NZCV
o TST : bit-wise AND test
n TST r0, r1; compute (r0 AND r1) and set NZCV
o TEQ : bit-wise exclusive-or test
n TEQ r0, r1; compute (r0 EOR r1) and set NZCV
Comparison Instructions (Cont.)
o Example 4
n PRE: CPSR = nzcvqiFt_USER, r0 = 4, r9 = 4
n CMP r0, r9
n POST: CPSR = nZCvqiFt_USER
6.1.6 Multiply Instruction
o Syntax:
n MLA{<cond>} {S} Rd, Rm, Rs, Rn
n MUL{<cond>} {S} Rd, Rm, Rs
o MUL : multiply
n MUL r0, r1, r2; r0 = r1*r2
o MLA : multiply and accumulate
n MLA r0, r1, r2, r3; r0 = (r1*r2) + r3
Multiply Instruction (Cont.)
o Syntax: <instruction>{<cond>} {S} RdLo, RdHi, Rm, Rs
n Multiplies onto a pair of registers representing a 64-bit value
o UMULL : unsigned multiply long
n UMULL r0, r1, r2, r3; [r1,r0] = r2*r3
o UMLAL : unsigned multiply accumulate long
n UMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
o SMULL: signed multiply long
n SMULL r0, r1, r2, r3; [r1,r0] = r2*r3
o SMLAL : signed multiply accumulate long
n SMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
6.2 Branch Instructions
o Branch instruction
n Change the flow of execution
n Used to call a routine
o Allow applications to
n Have subroutines
n Implement if-then-else structure
n Implement loop structure
Branch Instructions (Cont.)
o Syntax
n B{<cond>} label
n BL{<cond>} label
o B : branch
n B label; pc (program counter) = label
n Used to change execution flow
o BL : branch and link
n BL label; pc = label, lr = address of the
instruction following the BL
n Similar to the B instruction but can be used for subroutine
call
o Overwrite the link register (lr) with a return address
Branch Instructions (Cont.)
o Example 5
B forward
ADD r1, r2, #4
ADD r0, r6, #2
ADD r3, r7, #4
forward
SUB r1, r2, #4
backward
SUB r1, r2, #4
B backward
Branch Instructions (Cont.)
o Example 6:
BL subroutine
CMP r1, #5
MOVEQ r1, #0
…
subroutine
<subroutine code>
MOV pc, lr ; return by moving pc = lr
6.3 Load-Store Instructions
o Transfer data between memory and processor
registers
o Three types
n Single-register transfer
n Multiple-register transfer
n Swap
6.3.1 Single-Register Transfer
o Moving a single data item in and out of a
register
o Data item can be
n A word (32-bits)
n Halfword (16-bits)
n Bytes (8-bits)
Single-Register Transfer (Cont.)
o Syntax
n <LDR|STR>{<cond>}{B} Rd, addressing1
n LDR{<cond>}SB|H|SH Rd, addressing2
n STR{<cond>}H Rd, addressing2
o LDR : load word into a register from memory
o LDRB : load byte
o LDRSB : load signed byte
o LDRH : load half-word
o LDRSH : load signed halfword
o STR: store word from a register to memory
o STRB : store byte
o STRH : store half-word
Single-Register Transfer (Cont.)
o Example 7
LDR r0, [r1] ;= LDR r0, [r1, #0]
;r0 = mem32[r1]
STR r0, [r1] ;= STR r0, [r1, #0]
;mem32[r1]= r0
n Register r1 is called the base address register
6.3.2 Single-Register Load-Store
Addressing Mode
o Index method, also called Base-Plus-Offset
Addressing
n Base register
o r0 – r15
n Offset, add or subtract an unsigned number
o Immediate
o Register (not PC)
o Scaled register
Single-Register Load-Store Addressing
Mode (Cont.)
o Preindex:
n data: mem[base+offset]
n Base address register: not updated
n Ex: LDR r0,[r1,#4] ; r0:=mem32[r1+4]
o Postindex:
n data: mem[base]
n Base address register: base + offset
n Ex: LDR r0,[r1],#4 ; r0:=mem32[r1], then r1:=r1+4
o Preindex with writeback (also called auto-indexing)
n Data: mem[base+offset]
n Base address register: base + offset
n Ex: LDR r0, [r1,#4]! ; r0:=mem32[r1+4], then r1:=r1+4
Single-Register Load-Store Addressing
Mode (Cont.)
o Example 8
n r0 = 0x00000000, r1 = 0x00009000,
mem32[0x00009000] = 0x01010101,
mem32[0x00009004] = 0x02020202
n Preindexing: LDR r0, [r1, #4]
o r0 = 0x02020202, r1=0x00009000
n Postindexing: LDR r0, [r1], #4
o r0 = 0x01010101, r1=0x00009004
n Preindexing with writeback: LDR r0, [r1, #4]!
o r0 = 0x02020202, r1=0x00009004
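The three index modes of Example 8 map naturally onto C pointer idioms. A sketch (one array element stands in for one 4-byte word); the `demo_` helpers are illustrative names only.

```c
#include <stdint.h>

static uint32_t mem[2] = { 0x01010101u, 0x02020202u };

/* Preindex, LDR r0, [r1, #4]: read base+offset, base unchanged. */
uint32_t demo_preindex(void)  { uint32_t *p = mem; return *(p + 1); }

/* Postindex, LDR r0, [r1], #4: read base, then advance the base. */
uint32_t demo_postindex(void) { uint32_t *p = mem; return *p++; }

/* Preindex with writeback, LDR r0, [r1, #4]!: advance, then read. */
uint32_t demo_pre_wb(void)    { uint32_t *p = mem; return *++p; }
```

Preindex and preindex-with-writeback both fetch the second word; postindex fetches the first word and only then moves the pointer, exactly as in Example 8.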
Single-Register Load-Store Addressing
Mode (Cont.)
Addressing mode and index method                 Addressing syntax
Preindex with immediate offset                   [Rn, #+/-offset_12]
Preindex with register offset                    [Rn, +/-Rm]
Preindex with scaled register offset             [Rn, +/-Rm, shift #shift_imm]
Preindex writeback with immediate offset         [Rn, #+/-offset_12]!
Preindex writeback with register offset          [Rn, +/-Rm]!
Preindex writeback with scaled register offset   [Rn, +/-Rm, shift #shift_imm]!
Immediate postindexed                            [Rn], #+/-offset_12
Register postindexed                             [Rn], +/-Rm
Scaled register postindexed                      [Rn], +/-Rm, shift #shift_imm
Examples of LDR Using Different
Addressing Modes
Instruction                      r0 =                      r1 +=
Preindex with writeback:
LDR r0, [r1, #0x4]!              mem32[r1+0x4]             0x4
LDR r0, [r1, r2]!                mem32[r1+r2]              r2
LDR r0, [r1, r2, LSR #0x4]!      mem32[r1+(r2 LSR 0x4)]    (r2 LSR 0x4)
Preindex:
LDR r0, [r1, #0x4]               mem32[r1+0x4]             not updated
LDR r0, [r1, r2]                 mem32[r1+r2]              not updated
LDR r0, [r1, -r2, LSR #0x4]      mem32[r1-(r2 LSR 0x4)]    not updated
Postindex:
LDR r0, [r1], #0x4               mem32[r1]                 0x4
LDR r0, [r1], r2                 mem32[r1]                 r2
LDR r0, [r1], r2, LSR #0x4       mem32[r1]                 (r2 LSR 0x4)
6.3.3 Multiple-Register Transfer
o Transfer multiple registers between memory
and the processor in a single instruction
o More efficient than single-register transfer
n Moving blocks of data around memory
n Saving and restoring context and stack
Multiple-Register Transfer (Cont.)
o Load-store multiple instruction can increase interrupt
latency
n An interrupt can only be taken after the current instruction has
completed
n Each load multiple instruction takes 2 + N*t cycles
o N: the number of registers to load
o t: the number of cycles required for each sequential access to memory
n Compilers provide a switch to control the maximum number of
registers transferred
o This limits the maximum interrupt latency
Multiple-Register Transfer (Cont.)
o Syntax:
n <LDM|STM>{<cond>} <mode> Rn{!}, <registers>{^}
n Address mode: See the next page
n ^: optional
o Cannot be used in User mode and System mode
o If op is LDM and reglist contains the pc (r15)
n SPSR is also copied into the CPSR.
o Otherwise, data is transferred into or out of the User mode
registers instead of the current mode registers.
Addressing Mode
Addressing   Description                              Start        End        Rn!
mode                                                  address      address
IA           increment address after each transfer    Rn           Rn+4*N-4   Rn+4*N
IB           increment address before each transfer   Rn+4         Rn+4*N     Rn+4*N
DA           decrement address after each transfer    Rn-4*N+4     Rn         Rn-4*N
DB           decrement address before each transfer   Rn-4*N       Rn-4       Rn-4*N
Multiple-Register Transfer (Cont.)
o Example 9
n PRE:
mem32[0x80018] = 0x03,
mem32[0x80014] = 0x02,
mem32[0x80010] = 0x01,
r0 = 0x00080010,
r1 = r2 = r3= 0x00000000
n LDMIA r0!, {r1-r3}, or LDMIA r0!, {r1, r2, r3}
o Register can be explicitly listed or use the “-” character
Pre-Condition for LDMIA Instruction
Memory Address   Data
0x80020          0x00000005
0x8001c          0x00000004
0x80018          0x00000003       r3 = 0x00000000
0x80014          0x00000002       r2 = 0x00000000
0x80010          0x00000001  <--  r0 = 0x80010, r1 = 0x00000000
0x8000c          0x00000000
Figure 1
Post-Condition for LDMIA Instruction
Memory Address   Data
0x80020          0x00000005
0x8001c          0x00000004  <--  r0 = 0x8001c
0x80018          0x00000003       r3 = 0x00000003
0x80014          0x00000002       r2 = 0x00000002
0x80010          0x00000001       r1 = 0x00000001
0x8000c          0x00000000
Figure 2
Multiple-Register Transfer (Cont.)
o Example 9 (Cont.)
n POST:
r0 = 0x0008001c,
r1 = 0x00000001,
r2 = 0x00000002,
r3 = 0x00000003
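The LDMIA semantics of Example 9 can be simulated in C: N words are loaded starting at the base address, lowest register from the lowest address, and the incremented base is written back. Array indices stand in for byte addresses in this sketch.

```c
#include <stdint.h>

uint32_t regs[3];                        /* stands in for r1, r2, r3 */

/* Simulates "LDMIA r0!, {r1-r3}" against a small memory image and
 * returns the written-back base index (the r0! effect). */
unsigned ldmia_demo(void) {
    const uint32_t memory[4] = { 0x01u, 0x02u, 0x03u, 0x04u };
    unsigned base = 0;                   /* r0 before the transfer */
    for (unsigned i = 0; i < 3; i++)
        regs[i] = memory[base + i];      /* increment after each transfer */
    base += 3;                           /* writeback past the block */
    return base;
}
```

After the call, the three registers hold the three successive words and the base points just past them, matching the POST state of Example 9.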
Multiple-Register Transfer (Cont.)
o Example 10
n PRE: as shown in Fig. 1
n LDMIB r0!, {r1-r3}
n POST:
r0 = 0x0008001c
r1 = 0x00000004
r2 = 0x00000003
r3 = 0x00000002
Post-Condition for LDMIB Instruction
Memory Address   Data
0x80020          0x00000005
0x8001c          0x00000004  <--  r0 = 0x8001c, r3 = 0x00000004
0x80018          0x00000003       r2 = 0x00000003
0x80014          0x00000002       r1 = 0x00000002
0x80010          0x00000001
0x8000c          0x00000000
Figure 3
Multiple-Register Transfer (Cont.)
o Load-store multiple pairs when base update is used (!)
n Useful for saving a group of registers and restoring them later
Store multiple Load multiple
STMIA LDMDB
STMIB LDMDA
STMDA LDMIB
STMDB LDMIA
Multiple-Register Transfer (Cont.)
o Example 11
n PRE:
r0 = 0x00009000
r1 = 0x00000009,
r2 = 0x00000008
r3 = 0x00000007
n STMIB r0!, {r1-r3}
MOV r1, #1
MOV r2, #2
MOV r3, #3
Multiple-Register Transfer (Cont.)
o Example 11 (Cont.)
n PRE (2):
r0 = 0x0000900c
r1 = 0x00000001,
r2 = 0x00000002
r3 = 0x00000003
n LDMDA r0!, {r1-r3}
n POST:
r0 = 0x00009000
r1 = 0x00000009,
r2 = 0x00000008
r3 = 0x00000007
Multiple-Register Transfer (Cont.)
o Example 11 (Cont.)
n The STMIB stores the values 7, 8, 9 to memory
n The MOV instructions then corrupt registers r1 to r3
n Finally, the LDMDA
o Reloads the original values, and
o Restores the base pointer r0
Multiple-Register Transfer (Cont.)
o Example 12: the use of the load-store multiple
instructions with a block memory copy
;r9 points to start of source data
;r10 points to start of destination data
;r11 points to end of the source
loop
LDMIA r9!, {r0-r7} ;load 32 bytes from source and update r9
STMIA r10!, {r0-r7} ;store 32 bytes to desti. and update r10
CMP r9, r11 ;have we reached the end
BNE loop
Multiple-Register Transfer (Cont.)
[Figure: block memory copy. r9 walks the Source block in high memory
(bounded by r11) while r10 walks the Destination block in low memory;
each loop iteration transfers 32 bytes using two instructions]
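The LDMIA/STMIA copy loop of Example 12 has a direct C equivalent: move eight words (32 bytes) per iteration until the source pointer reaches the end marker. A sketch, assuming the block length is a multiple of eight words.

```c
#include <stdint.h>
#include <string.h>

/* C analogue of the Example 12 loop: src plays r9, dst plays r10,
 * src_end plays r11. Each iteration is one LDMIA + STMIA pair. */
void block_copy(uint32_t *dst, const uint32_t *src, const uint32_t *src_end) {
    while (src != src_end) {                     /* CMP r9, r11 / BNE loop */
        memcpy(dst, src, 8 * sizeof(uint32_t));  /* 8 registers per pass   */
        src += 8;                                /* r9 writeback           */
        dst += 8;                                /* r10 writeback          */
    }
}

/* Returns 1 if a 16-word buffer is copied intact. */
int block_copy_demo(void) {
    uint32_t src[16], dst[16];
    for (int i = 0; i < 16; i++) { src[i] = (uint32_t)i; dst[i] = 0; }
    block_copy(dst, src, src + 16);
    for (int i = 0; i < 16; i++)
        if (dst[i] != (uint32_t)i) return 0;
    return 1;
}
```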
6.3.4 Stack Operations
o ARM architecture uses the load-store multiple
instruction to carry out stack operations
n PUSH: use a store multiple instruction
n POP: use a load multiple instruction
o Stack
n Ascending (A): stack grows towards higher
memory addresses
n Descending (D): stack grows towards lower
memory addresses
6.3.4 Stack Operations (Cont.)
o Stack
n Full stack (F): stack pointer sp points to the last
valid item pushed onto the stack
n Empty stack (E): sp points after the last item on
the stack
o The free slot where the next data item will be placed
o There are a number of aliases available to
support stack operations
n See next page
6.3.4 Stack Operations (Cont.)
o ARM support all four forms of stacks
n Full ascending (FA): grows up; base register points to
the highest address containing a valid item
n Empty ascending (EA): grows up; base register points to
the first empty location
n Full descending (FD): grows down; base register points
to the lowest address containing a valid data
n Empty descending (ED): grows down; base register
points to the first empty location below the stack
Addressing Methods for Stack
Operations
Addressing   Description        Pop = LDM         Push = STM
mode
FA           Full ascending     LDMFA (LDMDA)     STMFA (STMIB)
FD           Full descending    LDMFD (LDMIA)     STMFD (STMDB)
EA           Empty ascending    LDMEA (LDMDB)     STMEA (STMIA)
ED           Empty descending   LDMED (LDMIB)     STMED (STMDA)
6.3.4 Stack Operations (Cont.)
o Example 13
n PRE:
o r1 = 0x00000002
o r4 = 0x00000003
o sp = 0x00080014
n STMFD sp!, {r1, r4}
n POST:
o r1 = 0x00000002
o r4 = 0x00000003
o sp = 0x0008000c
6.3.4 Stack Operations (Cont.)
o Example 13 (Cont.)
n STMFD – full stack push operation
PRE                                POST
Address   Data                     Address   Data
0x80018   0x00000001               0x80018   0x00000001
0x80014   0x00000002  <-- sp       0x80014   0x00000002
0x80010   Empty                    0x80010   0x00000003
0x8000c   Empty                    0x8000c   0x00000002  <-- sp
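A full-descending push (STMFD, i.e. STMDB) can be sketched as: decrement the stack pointer by one word per register, then store with the lowest register at the lowest address. This C model uses word indices for addresses; the names are illustrative.

```c
#include <stdint.h>

#define STACK_WORDS 8
uint32_t stack_mem[STACK_WORDS];
unsigned sp = STACK_WORDS;          /* full-descending: sp is the last used slot */

/* STMFD sp!, {regs}: decrement before storing (the DB in STMDB),
 * lowest-numbered register to the lowest address. */
void push_fd(const uint32_t *regs, unsigned n) {
    sp -= n;
    for (unsigned i = 0; i < n; i++)
        stack_mem[sp + i] = regs[i];
}

/* Mirrors Example 13: push r1 = 2 and r4 = 3, return the new sp. */
unsigned push_demo(void) {
    uint32_t r[2] = { 0x2u, 0x3u };
    push_fd(r, 2);
    return sp;
}
```

After the push, sp has moved down two slots and r1's value sits below r4's, matching the POST diagram above.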
6.3.4 Stack Operations (Cont.)
o Example 14
n PRE:
o r1 = 0x00000002
o r4 = 0x00000003
o sp = 0x00080010
n STMED sp!, {r1, r4}
n POST:
o r1 = 0x00000002
o r4 = 0x00000003
o sp = 0x00080008
6.3.4 Stack Operations (Cont.)
o Example 14 (Cont.)
n STMED – empty stack push operation
PRE                                POST
Address   Data                     Address   Data
0x80018   0x00000001               0x80018   0x00000001
0x80014   0x00000002               0x80014   0x00000002
0x80010   Empty       <-- sp       0x80010   0x00000003
0x8000c   Empty                    0x8000c   0x00000002
0x80008   Empty                    0x80008   Empty       <-- sp
6.3.5 SWAP Instruction
o A special case of a load-store instruction
n Swap the contents of memory with the contents
of a register
n An atomic operation
o Cannot be interrupted by any other instruction or
any other bus access
o The system “holds the bus” until the transaction is
complete
o Useful when implementing semaphores and mutual
exclusion in an operating system
6.3.5 SWAP Instruction (Cont.)
o Syntax: SWP{B}{<cond>} Rd, Rm, [Rn]
n tmp = mem32[Rn]
n Mem32[Rn] = Rm
n Rd = tmp
o SWP: swap a word between memory and a
register
o SWPB: swap a byte between memory and a
register
6.3.5 SWAP Instruction (Cont.)
o Example 15
n PRE:
o Mem32[0x9000] = 0x12345678
o r0 = 0x00000000
o r1 = 0x11112222
o r2 = 0x00009000
n SWP r0, r1, [r2]
n POST:
o mem32[0x9000] = 0x11112222
o r0 = 0x12345678
o r1 = 0x11112222
o r2 = 0x00009000
6.3.5 SWAP Instruction (Cont.)
o Example 15 (Cont.)
spin
LDR r1, =semaphore
MOV r2, #1
SWP r3, r2, [r1] ;hold the bus until complete
CMP r3, #1
BEQ spin
o The address pointed to by the semaphore contains either the
value 1 or 0
o When the semaphore value == 1, loop until the semaphore becomes
0 (updated by the holding process)
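The SWP-based lock above relies on reading the old value and storing the new one in a single uninterruptible step. C11's `atomic_exchange` has the same read-and-replace behaviour, so the lock can be sketched as follows (illustrative names, not from the slides).

```c
#include <stdatomic.h>

atomic_int semaphore = 0;        /* 0 = free, 1 = held */

/* SWP r3, r2, [r1]: the old value comes back while 1 is stored.
 * The acquire succeeds only if the lock was free (old value 0). */
int try_acquire(void) {
    return atomic_exchange(&semaphore, 1) == 0;
}

void release(void) {
    atomic_store(&semaphore, 0);
}

/* Returns 1 if the lock behaves as expected: first acquire succeeds,
 * a second attempt fails while held, and it succeeds again after release. */
int spinlock_demo(void) {
    int first  = try_acquire();
    int second = try_acquire();
    release();
    int third  = try_acquire();
    return first == 1 && second == 0 && third == 1;
}
```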
6.4 Software Interrupt Instruction
o SWI: software interrupt instruction
n Cause a software interrupt exception
n Provide a mechanism for applications to call
operating system routines
n Each SWI instruction has an associated SWI
number
o Used to represent a particular function call or routines
6.4 Software Interrupt Instruction
(Cont.)
o Syntax: SWI{<cond>} SWI_number
n lr_svc = address of instruction following the SWI
n spsr_svc = cpsr
n pc = vector table + 0x8 ; jump to the swi
handling
n cpsr mode = SVC
n cpsr I = 1 (mask IRQ interrupt)
6.4 Software Interrupt Instruction
(Cont.)
o Example 16
n PRE:
o cpsr = nzcVqift_USER
o pc = 0x00008000
o lr = r14 = 0x003fffff
n 0x00008000 SWI 0x123456
n POST:
o cpsr = nzcVqIft_SVC
o spsr = nzcVqift_USER
o pc = 0x00000008
o lr = 0x00008004
6.5 Program Status Register
Instructions
o MRS
n Transfer the contents of either the cpsr or spsr
into a register
o MSR
n Transfer the contents of a register into the cpsr or
spsr
6.5 Program Status Register
Instructions (Cont.)
o Syntax
n MRS{<cond>} Rd, <cpsr|spsr>
n MSR{<cond>} <cpsr|spsr>_<fields>, Rm
n MSR{<cond>} <cpsr|spsr>_<fields>, #immediate
o Field: any combination of
n Flags: [24:31]
n Status: [16:23]
n eXtension[8:15]
n Control[0:7]
PSR Registers
6.5 Program Status Register
Instructions (Cont.)
o Note: You cannot access the SPSR in User or
System Mode
n The assembler cannot warn you because it does not
know which mode the code will execute in
6.5 Program Status Register
Instructions (Cont.)
o Example 17
n PRE:
o cpsr = nzcvqIFt_SVC
n MRS r1, cpsr
n BIC r1, r1, #0x80 ;0b10000000, clear bit 7
n MSR cpsr_c, r1 ;enable IRQ interrupts
n POST:
o cpsr = nzcvqiFt_SVC
n Note that, this example must be in SVC mode
o In user mode, you can read all cpsr bits but can only update
the condition flag field f, i.e., cpsr[24:31]
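The MRS / BIC / MSR sequence of Example 17 is a read-modify-write on the status word: read it, clear the IRQ disable bit (bit 7, the I bit), and write it back. The bit manipulation step can be sketched in C (0xD3 below is the usual SVC-mode cpsr value with IRQ and FIQ masked, used purely as sample data).

```c
#include <stdint.h>

/* The BIC r1, r1, #0x80 step: clear bit 7 (IRQ disable) and leave
 * every other bit of the status word untouched. */
uint32_t enable_irq(uint32_t cpsr) {
    return cpsr & ~0x80u;
}
```

Applying it to 0xD3 gives 0x53: the I bit is cleared while the mode bits and F bit survive, which is exactly what the MSR write-back then commits.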
6.6 Conditional Execution
o Almost all ARM instructions can include an
optional condition code
n Instruction is only executed if the condition code
flags in the CPSR meet the specified condition
n The default is AL, or always execute
o Conditional execution depends on two
components
n The condition field: located in the instruction
n The condition flags: located in the cpsr
Conditional Execution (Cont.)
o Example 18
ADDEQ r0, r1, r2
; r0 = r1 + r2 if zero flag is set
Condition Codes
6.6 Conditional Execution (Cont.)
o Thus, before using conditional execution
n There must be an instruction that updates the
condition code flags according to the result
n If not specified, instructions will not update the
flags
o To make an instruction update the flags
n Include the S suffix
n Example: ADDS r0, r1,r2
6.6 Conditional Execution (Cont.)
o However, some instructions always update the flags
n Do not require the S suffix
n CMP, CMN, TST, TEQ
o Flags are preserved until updated
o Thus, you can execute an instruction conditionally,
based upon the flags set in another instruction, either:
n Immediately after the instruction which updated the flags
n After any number of intervening instructions that have not
updated the flags.
6.6 Conditional Execution (Cont.)
o Example 19
n Translate the following code into assembly
language
n Assume r1 = a, r2 = b
while ( a!= b )
{
if (a > b) a -= b; else b -= a;
}
6.6 Conditional Execution (Cont.)
o Example 19: Solution 1
gcd
CMP r1, r2
BEQ complete
BLT lessthan
SUB r1, r1, r2
B gcd
lessthan
SUB r2, r2, r1
B gcd
complete
6.6 Conditional Execution (Cont.)
o Example 19: Solution 2
gcd
CMP r1, r2
SUBGT r1, r1, r2
SUBLT r2, r2, r1
BNE gcd
o Solution 2 dramatically reduces the number of
instructions !!!
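Both assembly solutions implement the same subtract-based GCD. The C version below is the reference loop the conditional-execution code compresses into four instructions; it assumes both inputs are positive.

```c
/* Subtraction-based GCD: the loop body mirrors Solution 2, where
 * SUBGT and SUBLT are the two conditionally executed subtractions. */
unsigned gcd(unsigned a, unsigned b) {
    while (a != b) {         /* CMP r1, r2 / BNE gcd */
        if (a > b) a -= b;   /* SUBGT r1, r1, r2     */
        else       b -= a;   /* SUBLT r2, r2, r1     */
    }
    return a;
}
```

Because CMP sets the flags once per iteration, both conditional subtractions test them without any branch, which is why Solution 2 needs no explicit if-then-else structure.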
References
o Andrew N. Sloss, “ARM System Developer’s
Guide: Designing and Optimizing System
Software,” Morgan Kaufmann Publishers,
2004
n Chapter 3: Introduction to the ARM Instruction
Set