
Unit 3 ARM

ARM, or Advanced RISC Machine, is a leading low-power microprocessor architecture widely used in portable devices due to its efficiency and performance. Developed in the 1980s, ARM has evolved through various architecture versions, enhancing features like memory management and instruction sets, and is now dominant in the embedded and mobile markets. The architecture supports various extensions, including Jazelle for Java execution and NEON for multimedia processing, making it versatile for a range of applications.

ARM

What Is ARM?
• Advanced RISC Machine
• One of the first RISC microprocessors developed for commercial use
• Market leader for low-power and cost-sensitive embedded applications
Why ARM is so popular:
• ARM processors are among the most popular processors, particularly
in portable devices, because of their low power
consumption and reasonable performance.
• ARM processors offer good performance for their power budget compared
with many other processors.
• ARM processors combine low power consumption with low cost.
• ARM makes quick and efficient application development easy,
which is a main reason for its popularity.
History of ARM Processor
• ARM processor: a 32-bit processor
• The RISC (Reduced Instruction Set Computer) concept was
introduced in 1980 at Stanford and Berkeley
• ARM was developed by Acorn Computers Limited
of Cambridge, England, between 1983 and 1985
• ARM Limited was founded in 1990
• ARM cores
– Licensed to partners to develop and fabricate new
microcontrollers
– Soft core
History of ARM
Historical remarks
• ARM’s parent company is Acorn Computers (UK).
• Acorn Computers started their Acorn RISC Machine
project in October 1983 (two years after the introduction
of the IBM PC) to develop its own powerful processor for
a line of business computers.
• The acronym ARM was coined originally at this time
(1983) from the designation Acorn RISC Machine.
• In 1990 the company Advanced RISC Machines Ltd. (ARM
Ltd.) was founded as a joint venture of Acorn Computers,
Apple Computer and VLSI Technology.
• Accordingly, the expansion of the acronym ARM was changed
to “Advanced RISC Machines”.
History of ARM
• ARM (ARM Holdings plc) is a British multinational
semiconductor company with its head office in Cambridge.
• The company designs and licenses low power embedded and
mobile ARM processors along with the appropriate design tools
but does not fabricate semiconductors.
• ARM designs have come to dominate the embedded and mobile
markets (including smartphones and tablets).
• As of 2014, more than 50 billion ARM-based processors had
been produced in total, up from 10 billion in 2008 [59], [19], as
indicated in the next figure.
Figure: ARM's first office, an 18th-century barn just outside Cambridge.
Figure: ARM's headquarters in Cambridge (UK).
Figure: ARM Connected Community – 900+ (Connect, Collaborate, Create – accelerating innovation).
Development of the ARM Architecture
• v4
– Halfword and signed halfword / byte support
– System mode
– Thumb instruction set (v4T)
• v5
– Improved interworking
– CLZ
– Saturated arithmetic
– DSP MAC instructions
– Extensions: Jazelle (5TEJ)
• v6
– SIMD instructions
– Multi-processing
– v6 memory architecture
– Unaligned data support
– Extensions: Thumb-2 (6T2), TrustZone® (6Z), Multicore (6K), Thumb only (6-M)
• v7
– Thumb-2
– Architecture profiles: 7-A (Applications), 7-R (Real-time), 7-M (Microcontroller)

▪ Note that implementations of the same architecture can be different:
▪ Cortex-A8: architecture v7-A, with a 13-stage pipeline
▪ Cortex-A9: architecture v7-A, with an 8-stage pipeline
Architecture Revisions
Figure: ARM architecture revisions over time (1994–2006), with example cores for each version:
• v4: ARM7TDMI-S, ARM720T, ARM92xT, StrongARM®, SC100
• v5: ARM9x6E, ARM926EJ-S, ARM102xE, XScale™, ARM1026EJ-S, SC200
• v6: ARM1136JF-S, ARM1176JZF-S, ARM1156T2F-S
XScale is a trademark of Intel Corporation
Features of Different ARM Versions:
• ARM Version 1:
– Software interrupts
– 26-bit address bus
– Relatively slow data processing
– Supports byte, word and multiword load and store operations
• ARM Version 2:
– 26-bit address bus
– Atomic (SWP) instructions for thread synchronization
– Coprocessor support
• ARM Version 3:
– 32-bit addressing
– Long multiply support in the M variant (32 × 32 bits → 64-bit result)
– Faster than ARM versions 1 and 2
• ARM Version 4:
– 32-bit address space
– T variant: 16-bit Thumb instruction set
– M variant: long multiply giving a 64-bit result
• ARM Version 5:
– Improved ARM/Thumb interworking
– CLZ (count leading zeros) instruction
– E variant: enhanced DSP instruction set
– J variant: acceleration of Java bytecode execution (Jazelle)
• ARM Version 6:
– Improved memory system
– SIMD (single instruction, multiple data) instructions
• ARMv7:
– Thumb-2: variable-length instruction set
– TrustZone
• provides system-wide hardware isolation for trusted
software.
– Jazelle RCT (Runtime Compilation Target)
• based on ThumbEE mode; supports ahead-of-time (AOT) and
just-in-time (JIT) compilation for Java and other execution
environments.
– Jazelle DBX (Direct Bytecode eXecution)
• an extension that allows some ARM processors to
execute Java bytecode in hardware as a third execution
state alongside the existing ARM and Thumb modes.
ARMv7 provides three profiles:
• The Application “A” profile
– Memory management support (MMU)
– Highest performance at low power
– Influenced by multi-tasking OS system requirements
• The Real-time “R” profile
– Protected memory (MPU)
– Low latency and predictability for ‘real-time’ needs
– Evolutionary path for traditional embedded business
• The Microcontroller “M” profile
– Lowest gate count entry point
– Deterministic behavior is a key priority
– Deeply embedded – strong synergies with the “R” profile
• ARMv8
– It adds a 64-bit architecture, named "AArch64", and a new
"A64" instruction set
– Compatibility with ARMv7-A ISA
– 64-bit general purpose registers, SP (stack pointer) and PC
(program counter)
– The execution states support three key instruction sets:
• A32 (or ARM): a 32-bit fixed length instruction set. Part of the 32-
bit architecture execution environment now referred to as
AArch32.
• T32 (Thumb) introduced as a 16-bit fixed-length instruction set,
subsequently enhanced to a mixed-length 16- and 32-bit
instruction set on the introduction of Thumb-2 technology.
• A64 is a 64-bit fixed-length instruction set that offers similar
functionality to the ARM and Thumb instruction sets. Introduced
with ARMv8-A, it is the AArch64 instruction set.
ARMv7: profiles & key features
ARM Processor Family
• ARM has devised a naming convention for its
processors
• Revisions: ARMv1, v2 … v6, v7, v8
• Core implementation:
– ARM1, ARM2, ARM7, StrongARM,
– ARM926EJ, ARM11, Cortex-A, -R, -M
• ARM11 is based on ARMv6
• Cortex cores are based on ARMv7 and later (the smallest Cortex-M cores implement ARMv6-M)
ARM Processor Family (2)
• Differences between cores
– Processor modes
– Pipeline
– Architecture
– Memory protection unit
– Memory management unit
– Cache
– Hardware accelerated Java
– … and others
ARM Processor Family (3)
• Examples:
– ARM7TDMI
• No MMU, No MPU, No cache, No Java, Thumb mode
– ARM922T
• MMU, No MPU, 8K+8K data and instruction cache, No
Java, Thumb mode
– ARM1136J-S
• MMU, No MPU, configurable caches, with accelerated
Java and Thumb mode
ARM Processor Family (4)
• Naming convention
• ARM [x][y][z][T][D][M][I][E][J][F][S]
– x – Family
– y – memory management/protection
– z – cache
– T – Thumb mode
– D – JTAG debugging
– M – fast multiplier
– I – Embedded ICE macrocell
– E – Enhanced instruction (implies TDMI)
– J – Jazelle, hardware accelerated Java
– F – Floating point unit
– S – Synthesizable version
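The naming convention can be read mechanically. The Python sketch below uses a hypothetical helper, `decode_core_name` (not an ARM tool), that splits a classic core name into the family and digit fields and the suffix letters listed above. It assumes a two-digit family only for ARM10 and ARM11, applies the rule that E implies T, D, M and I, and simply ignores letters outside this list (such as the Z in ARM1176JZF-S).

```python
import re

# Suffix letters from the ARM[x][y][z][T][D][M][I][E][J][F][S] convention.
SUFFIXES = {
    "T": "Thumb mode",
    "D": "JTAG debugging",
    "M": "fast multiplier",
    "I": "EmbeddedICE macrocell",
    "E": "enhanced instructions",
    "J": "Jazelle (hardware-accelerated Java)",
    "F": "floating-point unit",
    "S": "synthesizable version",
}

# Assumption: families 10 and 11 are the only two-digit family numbers.
NAME_RE = re.compile(r"ARM(10|11|\d)(\d?)(\d?)([A-Z]*)(?:-([A-Z]+))?")


def decode_core_name(name: str) -> dict:
    """Split a classic ARM core name (e.g. 'ARM926EJ-S') into its fields."""
    match = NAME_RE.fullmatch(name.upper())
    if match is None:
        raise ValueError(f"not a classic ARM core name: {name!r}")
    family, mem_digit, cache_digit, letters, tail = match.groups()
    letters += tail or ""                      # '-S' suffix counts as the S letter
    features = [SUFFIXES[ch] for ch in letters if ch in SUFFIXES]
    if "E" in letters:                         # E implies TDMI even if not spelled out
        for ch in "TDMI":
            if SUFFIXES[ch] not in features:
                features.append(SUFFIXES[ch])
    return {
        "family": int(family),
        "memory_management": mem_digit or None,
        "cache": cache_digit or None,
        "features": features,
    }


print(decode_core_name("ARM7TDMI"))
print(decode_core_name("ARM926EJ-S"))
```

The digits are only reported as strings, because the meaning of the y and z positions varies between families; the sketch does not try to interpret them further.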
ARM Core Extensions-(1)
• Hardware extensions are standard
components placed next to the ARM core.
• They improve performance, manage resources, and
provide extra functionality, and are designed
to provide flexibility in handling particular
applications.
What are ARM extensions
• Cache and TCM
• Memory management (MPU & MMU) - prevents
apps from inappropriate access to hardware
• Coprocessor interface
ARM Core Extensions-(2)
• co-processor:
• Coprocessors can be attached to the ARM processor.
• A coprocessor extends the processing features of a core by extending the
instruction set or by providing configuration registers.
• More than one coprocessor can be added to the ARM core
via the coprocessor interface.
• The coprocessor can be accessed through a group of
dedicated ARM instructions that provide a load-store type
interface. Consider, for example, coprocessor 15 (cp15):
– The ARM processor uses coprocessor 15(cp15) registers to control
the cache, TCMs, and memory management.
ARM Core Extensions-(3)
• Thumb:
• Thumb is a subset of the ARM instruction set encoded
in 16-bit wide instructions.
– Requires 70% of the space of ARM code.
– Uses 40% more instructions than equivalent ARM code.
• A CPU has Thumb support if it has a T in its name, or it
is architecture v6 or later.
– With 32-bit memory:
• ARM code is 40% faster than Thumb code.
– With 16-bit memory:
• Thumb code is 45% faster than ARM code.
• Uses 30% less external memory power than ARM code.
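The first two Thumb figures are consistent with each other once instruction width is taken into account. Taking the slide's approximate percentages as given, with N the number of instructions in the equivalent ARM code:

```latex
\frac{\text{Thumb code size}}{\text{ARM code size}}
  = \frac{1.4N \times 2\ \text{bytes}}{N \times 4\ \text{bytes}}
  = \frac{2.8}{4} = 0.7
```

So 40% more instructions at half the width per instruction gives roughly 70% of the space.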
ARM Core Extensions-(4)
• Thumb continued…
• The original 16-bit Thumb is not a complete architecture: you can't
have a CPU that executes only 16-bit Thumb code.
• Some of the limitations of Thumb mode include:
– Conditional execution only exists for branch
instructions.
– Data processing operations use a two-address format,
as opposed to ARM’s three-address format.
– Its instruction encodings are less regular than ARM’s.
• Thumb uses the same register set as ARM, but most Thumb
instructions can only access the low registers R0-R7
ARM Core Extensions-(5)
• Thumb-2:
• Thumb-2 is an enhancement to the 16-bit Thumb Instruction Set
Architecture (ISA).
• It adds 32-bit instructions that can be freely intermixed with 16-bit
instructions in a program. The additional 32-bit instructions enable
Thumb-2 to cover the functionality of the ARM instruction set.
• The 32-bit instructions enable Thumb-2 to deliver the code density
of earlier versions of Thumb, together with the performance of the
existing ARM instruction set, all within a single instruction set.
• It’s present in the Cortex CPU series (or any v7 or later versions).
• Now a complete architecture: you can have a Thumb-2-only CPU
(v7M).
• Mixed 16/32-bit instruction stream provides the economy of space
of Thumb combined with most of the speed of pure ARM code.
ARM Core Extensions-(6)
• Thumb-2 continued…
• The most important difference between the Thumb instruction set and
the ARM instruction set is that most 32-bit Thumb instructions are
unconditional, whereas most ARM instructions can be conditional.
• The main enhancements are:
• 32-bit instructions added to the Thumb instruction set to:
– provide support for exception handling in Thumb state
– provide access to coprocessors
– include Digital Signal Processing (DSP) and media instructions
– improve performance in cases where a single 16-bit instruction restricts
functions available to the compiler.
• addition of a 16-bit IT instruction that enables one to four following
Thumb instructions, the IT block, to be conditional
• addition of 16-bit Compare and Branch on Zero / Non-Zero (CBZ/CBNZ) instructions to
improve code density by replacing a two-instruction sequence with a single
instruction.
ARM Core Extensions-(7)
• Jazelle Extension
• Jazelle is an execution mode in ARM architecture
which "provides architectural support for
hardware acceleration of bytecode execution by a
Java Virtual Machine (JVM)" .
• Increasing demand from ARM customers for
better Java performance.
• ARM provided its own solution for executing Java
in hardware.
– Integrate Java execution into the core!
– Birth of Jazelle!
ARM Core Extensions-(8)
• Jazelle Extension continued…
• ARM Jazelle technology provides an extension to the world’s
leading 32-bit embedded RISC architecture, enabling ARM
processors to execute Java byte code directly in hardware and
delivering unparalleled Java performance on the ARM architecture.
• Platform developers now have the freedom to run Java applications
alongside established OS, middleware and application code — all on
a single processor.
• Jazelle DBX (Direct Bytecode eXecution) is an extension that allows
some ARM processors to execute Java bytecode in hardware as a
third execution state alongside the existing ARM and Thumb
modes.
– Jazelle functionality was specified in the ARMv5TEJ architecture[2] and
the first processor with Jazelle technology was the ARM926EJ-S.
• Jazelle RCT (Runtime Compilation Target) is a different technology
and is based on ThumbEE mode and supports ahead-of-time (AOT)
and just-in-time (JIT) compilation with Java and other execution
environments
ARM Core Extensions-(9)
• Vector Floating Point(VFP) Extension
• The ARM® architecture provides high-performance and high-
efficiency hardware support for floating-point operations in half-,
single-, and double-precision arithmetic.
• Many operations can take place in either scalar form or in vector
form.
• It is fully IEEE-754 compliant with full software library support.
• The floating-point data type is essential for a wide range of digital
signal processing (DSP) applications.
• Scalable Vector Extension (SVE) for ARMv8-A
– SVE is the next-generation SIMD instruction set for AArch64 that
introduces the architectural features for High Performance Computing
(HPC)
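The VFP formats are the standard IEEE-754 binary encodings. The Python sketch below only illustrates those encodings (it uses the host's `struct` module, not ARM hardware, and `ieee754_fields` is a hypothetical helper): it splits a value into sign, biased exponent and fraction fields for the half-, single- and double-precision formats mentioned above.

```python
import struct

# precision -> (struct format code, exponent bits, fraction bits)
FORMATS = {
    "half": ("e", 5, 10),
    "single": ("f", 8, 23),
    "double": ("d", 11, 52),
}


def ieee754_fields(value: float, precision: str = "single") -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of value in the given IEEE-754 format."""
    fmt, exp_bits, frac_bits = FORMATS[precision]
    raw = int.from_bytes(struct.pack(">" + fmt, value), "big")   # big-endian bit pattern
    sign = raw >> (exp_bits + frac_bits)
    exponent = (raw >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = raw & ((1 << frac_bits) - 1)
    return sign, exponent, fraction


for precision in FORMATS:
    print(precision, ieee754_fields(-2.5, precision))
```

For example, -2.5 is -1.25 × 2^1, so every format stores sign 1, an exponent of 1 plus that format's bias, and a fraction encoding 0.25.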
ARM Core Extensions-(10)
• NEON (SIMD) Extension
• The implementation of the Advanced SIMD extension used
in ARM processors is called NEON.
• The NEON technology is a packed SIMD architecture. NEON
registers are considered as vectors of elements of the same
data type. Multiple data types are supported by the
technology.
• NEON technology is intended to improve the multimedia
user experience by accelerating audio and video
encoding/decoding, user interface, 2D/3D graphics or
gaming.
• NEON can also accelerate signal processing algorithms and
functions to speed up applications such as audio and video
processing, voice and facial recognition, computer vision
and deep learning.
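A packed-SIMD register can be pictured as one wide value split into independent lanes. The Python sketch below is only a model: `pack`, `unpack`, `lanewise_add` and `lanewise_sat_add` are hypothetical helpers, not NEON intrinsics. It treats a 128-bit value as 16 unsigned 8-bit lanes and contrasts a wrapping lane-wise add with a saturating one, the kind of operation that is useful for pixel data.

```python
LANE_BITS = 8
LANES = 128 // LANE_BITS            # a 128-bit register viewed as 16 x 8-bit lanes
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes: list[int]) -> int:
    """Pack 16 unsigned 8-bit lanes into one 128-bit integer (lane 0 = lowest byte)."""
    if len(lanes) != LANES:
        raise ValueError(f"expected {LANES} lanes, got {len(lanes)}")
    reg = 0
    for i, lane in enumerate(lanes):
        reg |= (lane & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack(reg: int) -> list[int]:
    """Split a 128-bit integer back into its 16 unsigned 8-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def lanewise_add(a: int, b: int) -> int:
    """Add lane by lane with wrap-around; a carry never spills into the next lane."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


def lanewise_sat_add(a: int, b: int) -> int:
    """Unsigned saturating add: each lane clamps at 255 instead of wrapping."""
    return pack([min(x + y, LANE_MASK) for x, y in zip(unpack(a), unpack(b))])


pixels = pack([250, 10, 128] + [0] * 13)
brighten = pack([10] * LANES)
print(unpack(lanewise_add(pixels, brighten))[:3])      # [4, 20, 138]
print(unpack(lanewise_sat_add(pixels, brighten))[:3])  # [255, 20, 138]
```

Saturation is what makes this useful for brightening an image: 250 + 10 becomes 255 (white) rather than wrapping around to a dark 4.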
ARM Chips
• ARM Ltd
– Provides ARM cores
– Intellectual property
• Analog Devices
– ADuC7019, ADuC7020, ADuC7021, ADuC7022, ADuC7024, ADuC7025, ADuC7026,
ADuC7027, ADuC7128, ADuC7129
• Atmel
– AT91C140, AT91F40416, AT91F40816, AT91FR40162, SAM3N4A, SAMR21E18A
• Freescale
– MAC7101, MAC7104, MAC7105, MAC7106, MAC7125, MAC7144
• Samsung
– S3C44B0X, S3C4510B
• Sharp
– LH75400, LH75401, LH75410, LH75411
• Texas Instruments
– TMS470R1A128, TMS470R1A256, TMS470R1A288
• And others…
Recommended Text
• “ARM System Developer’s Guide”
– Andrew Sloss, et al.
– ISBN 1-55860-874-5
• “ARM Architecture Reference Manual”
– David Seal
– ISBN 0-201-73719-1
– Softcopy available at www.arm.com
• “ARM System-on-Chip Architecture”
– Steve Furber
– ISBN 0-201-67519-6
ARM Design Philosophy
• ARM core uses RISC architecture
– Reduced instruction set
– Load store architecture
– Large number of general purpose registers
– Parallel execution using pipelines
• But with some differences from pure RISC
– Enhanced instructions for
• Thumb mode
• DSP instructions
• Conditional execution of instructions
• 32-bit barrel shifter
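Conditional execution means an ARM instruction carries a condition suffix that is checked against the N, Z, C and V flags before it takes effect. The Python model below is an illustrative sketch, not ARM code: `cmp_flags` and `passes` are hypothetical helpers that compute the flags a `CMP a, b` would set (it subtracts b from a on 32-bit values) and evaluate the standard condition suffixes against them.

```python
MASK32 = 0xFFFF_FFFF


def cmp_flags(a: int, b: int) -> dict:
    """N, Z, C, V flags after CMP a, b (computes a - b on 32-bit values)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": result >> 31,                       # result is negative (bit 31 set)
        "Z": int(result == 0),                   # result is zero
        "C": int(a >= b),                        # no borrow, i.e. unsigned a >= b
        "V": ((a ^ b) & (a ^ result)) >> 31,     # signed overflow
    }


# Condition suffixes and the flag tests they perform.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,                 # also written HS (unsigned >=)
    "CC": lambda f: f["C"] == 0,                 # also written LO (unsigned <)
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
}


def passes(cond: str, flags: dict) -> bool:
    """True if an instruction with this condition suffix would execute."""
    return CONDITIONS[cond](flags)


# After CMP r0, r1 with r0 = -5 and r1 = 3: MOVGT is skipped, MOVLT executes,
# and MOVHI also executes because 0xFFFFFFFB > 3 when compared as unsigned.
flags = cmp_flags(-5, 3)
print(passes("GT", flags), passes("LT", flags), passes("HI", flags))  # False True True
```

The last example shows why ARM has separate signed (GT, LT, GE, LE) and unsigned (HI, LS, CS, CC) conditions: the same bit pattern compares differently depending on how it is interpreted.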
What is RISC?
• RISC?
RISC, or Reduced Instruction Set Computer, is a type of
microprocessor architecture that utilizes a small, highly optimized set
of instructions, rather than the more specialized set of instructions
often found in other types of architectures.
• History
The first RISC projects came from IBM, Stanford, and UC Berkeley in
the late 70s and early 80s. The IBM 801, Stanford MIPS, and Berkeley
RISC 1 and 2 were all designed with a similar philosophy which has
become known as RISC. Certain design features have been
characteristic of most RISC processors:
– one-cycle execution time: RISC processors aim for a CPI (cycles per instruction) of
about one. This is due to the optimization of each instruction on the CPU and a
technique called pipelining;
– pipelining: a technique that allows for simultaneous execution of parts, or stages,
of instructions to process instructions more efficiently;
– large number of registers: the RISC design philosophy generally incorporates a
larger number of registers to reduce the amount of interaction with
memory.
RISC Attributes
The main characteristics of CISC microprocessors are:
• Extensive instructions.
• Complex and efficient machine instructions.
• Microcoded (microprogrammed) machine instructions.
• Extensive addressing capabilities for memory operations.
• Relatively few registers.

In comparison, RISC processors are more or less the opposite of the above:
• Reduced instruction set.
• Less complex, simple instructions.
• Hardwired control unit and machine instructions.
• Few addressing schemes for memory operands with only two basic
instructions, LOAD and STORE
• Many symmetric registers which are organized into a register file.
Differences between RISC and CISC

RISC | CISC
Reduced Instruction Set Computer | Complex Instruction Set Computer
Contains fewer instructions | Contains more instructions
Instruction pipelining is straightforward, increasing execution speed | Pipelining is harder to apply because instructions vary in length and execution time
Orthogonal instruction set (each instruction can operate on any register and use any addressing mode) | Non-orthogonal instruction set (not all instructions can operate on any register or use any addressing mode)
Operations are performed on registers only; the only memory operations are load and store | Operations are performed on registers or memory, depending on the instruction
A larger number of registers is available | The number of general-purpose registers is very limited
Programmer needs to write more code to execute a task, since the instructions are simpler | Instructions are like macros in the C language
Single, fixed-length instructions | Variable-length instructions
Less silicon usage and pin count | More silicon usage, since additional decoder logic is required for complex instruction decoding
Often uses a Harvard architecture | Can be Harvard or Von Neumann architecture
RISC Design Principles(1)
• Simple operations
– Simple instructions that can execute in one cycle
• Register-to-register operations
– Only load and store operations access memory
– Rest of the operations on a register-to-register
basis
• Simple addressing modes
– A few addressing modes (1 or 2)
RISC Design Principles(2)
• Large number of registers
– Needed to support register-to-register operations
– Minimize the procedure call and return overhead
• Fixed-length instructions
– Facilitates efficient instruction execution
• Simple instruction format
– Fixed boundaries for various fields
ARM Processor Architecture
• The ARM architecture has evolved to include
architectural features to meet the growing
demand for new functionality, integrated security
features, high performance and the needs of new
and emerging markets.
• There are currently 3 ARMv8 profiles,
– the ARMv8-A architecture profile for high
performance markets such as mobile and enterprise,
– the ARMv8-R architecture profile for embedded
applications in automotive and industrial control,
– the ARMv8-M architecture profile for embedded and
IoT applications.
Difference between Harvard and Von Neumann Architectures
ARM processor features
• Load/store architecture.
• An orthogonal instruction set.
• Mostly single-cycle execution.
• Enhanced power-saving design.
• 64- and 32-bit execution states for scalable high performance.
• 32-bit RISC processor core (32-bit instructions)
• 37 registers, each 32 bits wide (16 general-purpose registers visible at a time)
• Pipelined (ARM7: 3 stages)
• Von Neumann-type bus structure (ARM7), Harvard (ARM9)
• 8/16/32-bit data types
• 7 modes of operation (usr, fiq, irq, svc, abt, sys, und)
• Simple structure, giving a reasonably good speed/power
consumption ratio
ARM7TDMI
• ARM7TDMI is a core processor module embedded in many
ARM7 microprocessors.
• It is the most complex processor core module in the ARM7
series.
– T: capable of executing the Thumb instruction set
– D: features an IEEE Std 1149.1 JTAG boundary-scan
debugging interface.
– M: features a Multiply-Accumulate (MAC) unit for
DSP applications.
– I: features support for the EmbeddedICE (embedded In-Circuit Emulator).
• Three pipeline stages: instruction fetch, decode, and
execute.
Features
• A 32-bit RISC processor core capable of executing
16-bit instructions (Von Neumann architecture)
– High-density code
• The Thumb set's 16-bit instruction length allows code to
reach about 65% of the size of standard ARM code while
retaining ARM 32-bit processor performance.
– Smaller die size
• About 72,000 transistors
• Occupies only about 4.8 mm² in a 0.6 µm semiconductor
technology.
– Lower power consumption
• Dissipates about 2 mW/MHz with 0.6 µm technology.
Features (2)
• Memory Access
– Data can be
• 8-bit (bytes)
• 16-bit (half words)
• 32-bit (words)
• Memory Interface
– Can interface to SRAM, ROM, DRAM
– Has four basic types of memory cycle
• idle cycle
• Non-sequential cycle
• sequential cycle
• coprocessor register cycle
Debug Extensions
• The Debug extensions to the core add scan chains
to monitor what is occurring on the data path of
the CPU.
• Signals were also added to the core so that
processor control can be handed to the debugger
when a breakpoint or watchpoint has been
reached.
• This stops the processor, enabling the user to
view characteristics such as register contents,
memory regions, and processor status.
Embedded ICE Logic
• In order to provide a powerful debugging environment for ARM-
based applications the EmbeddedICE logic was developed and
integrated into the ARM core architecture.
• It is a set of registers providing the ability to set hardware
breakpoints or watchpoints on code or data.
• The EmbeddedICE logic monitors the ARM core signals every cycle
to check if a breakpoint or watchpoint has been hit. Lastly, an
additional scan chain is used to establish contact between the user
and the EmbeddedICE logic.
• Communication with the EmbeddedICE logic from the external
world is provided via the test access port, or TAP, controller and a
standard IEEE 1149.1 JTAG connection.
• The advantage of on-chip debug solutions is the ability to rapidly
debug software, especially when the software resides in ROM.
Synthesizable
• Synthesizable (i.e., distributed as RTL rather than as a hardened layout)
• ARM7TDMI (without the "-S" extension) was initially designed as a
hard macro, meaning that the physical design at the transistor
layout level was done by ARM, and licensees took this fixed physical
block and placed it into their chip designs. This was the prevalent
design methodology at the time.
• Subsequently, demand increased for a more flexible and
configurable solution, so ARM moved towards delivering processor
designs as a behavioral description at the "register transfer level"
(RTL) written in a hardware description language (HDL), typically
Verilog HDL.
• The process of converting this behavioral description into a physical
network of logic gates is called "synthesis", and several major EDA
companies sell automated synthesis tools for this purpose.
• A processor design distributed to licensees as an RTL description
(such as ARM7TDMI-S) is therefore described as "synthesizable".
Instruction Pipeline
• The ARM processor uses an internal pipeline to increase
the rate of instruction flow to the processor, allowing
several operations to be undertaken simultaneously,
rather than serially.
• Pipelining breaks instruction execution into multiple
steps (stages) and overlaps the stages of successive instructions.
• In the simplest ARM cores (such as ARM7), the instruction pipeline
consists of 3 stages.
• Basic 3 stage pipeline
– Fetch – Load from memory
– Decode – Identify instruction to execute
– Execute – Process instruction and write back result
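The benefit of the 3-stage pipeline can be quantified with the usual ideal-pipeline count: N instructions need N × k cycles when each runs through all k stages on its own, but only k + (N - 1) cycles when the stages overlap. The sketch below assumes one cycle per stage and no stalls, branches or memory conflicts, which real ARM7 code does not always achieve; the helper names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: filling takes `stages` cycles, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


def timeline(n_instructions: int, stage_names: list[str]) -> list[str]:
    """Text chart of which stage (first letter) each instruction occupies in each cycle."""
    total = cycles_pipelined(n_instructions, len(stage_names))
    rows = []
    for i in range(n_instructions):
        cells = []
        for cycle in range(total):
            stage = cycle - i                      # instruction i enters the pipeline in cycle i
            cells.append(stage_names[stage][0] if 0 <= stage < len(stage_names) else ".")
        rows.append(f"I{i}: " + " ".join(cells))
    return rows


for row in timeline(4, ["Fetch", "Decode", "Execute"]):
    print(row)
print(cycles_unpipelined(4, 3), "cycles without pipelining vs",
      cycles_pipelined(4, 3), "with a 3-stage pipeline")
```

In the chart, each column is one clock cycle: while instruction I0 executes, I1 is being decoded and I2 fetched, which is exactly the overlap described above.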
Instruction Pipeline
• ARM7 has a 3-stage pipeline
– Fetch, Decode, Execute
• ARM9 has a 5-stage pipeline
– Fetch, Decode, Execute, Memory, Write
• ARM10 has a 6-stage pipeline
– Fetch, Issue, Decode, Execute, Memory, Write
Figure: ARM10 vs. ARM11 pipelines.
Figure: Instruction pipeline.
Figure: ARM7TDMI processor block diagram.
Figure: ARM7TDMI processor functional diagram.
32x8 Multiplier
• Earlier ARM processors (prior to ARM7TDMI) used a
smaller, simpler multiplier block which required more
clock cycles to complete a multiplication.
• Introduction of this more complex 32x8 multiplier
reduced the number of cycles required for a
multiplication of two registers (32-bit * 32-bit) to a few
cycles (data dependent).
• Modern ARM processors are generally capable of
calculating at least a 32-bit product in a single cycle,
although some of the smallest Cortex-M processors
provide an implementation choice of a faster (single-
cycle) or a smaller (32 cycle) 32-bit multiplier block.
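The data-dependent cycle count comes from processing the 32-bit multiplier operand a few bits per cycle and stopping once the remaining bits cannot change the result. The sketch below is a simplified software model of an 8-bits-per-step multiplier (`mul_32x8` is a hypothetical name): it handles unsigned operands only and terminates early only when the remaining upper bits are all zero, whereas the ARM7TDMI hardware also terminates early when they are all ones (small negative values).

```python
MASK32 = 0xFFFF_FFFF


def mul_32x8(a: int, b: int) -> tuple[int, int]:
    """Multiply two 32-bit values, consuming 8 multiplier bits per step.

    Returns (low 32 bits of the product, number of 8-bit steps used).
    """
    a &= MASK32
    b &= MASK32
    product, steps, shift = 0, 0, 0
    while True:
        chunk = b & 0xFF                                      # next 8 bits of the multiplier
        product = (product + ((a * chunk) << shift)) & MASK32  # add the shifted partial product
        steps += 1
        b >>= 8
        shift += 8
        if b == 0:                                            # early termination: nothing left
            break
    return product, steps


for multiplier in (3, 0x100, 0x12345, 0xFFFFFFFF):
    print(hex(multiplier), mul_32x8(1000, multiplier))
```

In this model a small multiplier such as 3 finishes in one step, while 0xFFFFFFFF needs all four, which is the "few cycles (data dependent)" behavior described above.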
The ARM's Barrel Shifter
• The ARM arithmetic logic unit has a 32-bit barrel shifter that is capable of
shift and rotate operations. The second operand to many ARM and Thumb
data-processing and single register data-transfer instructions can be
shifted, before the data-processing or data-transfer is executed, as part of
the instruction.
• This can be used by various classes of ARM instructions to perform
comparatively complex operations in a single instruction.
• The barrel shifter can perform the following types of operation:
• LSL - logical shift left by n bits
• LSR - logical shift right by n bits
• ASR - arithmetic shift right by n bits (the bits fed into the top end
of the operand are copies of the original top, or sign, bit)
• ROR - rotate right by n bits
• RRX - rotate right extended by 1 bit. This is a 33-bit rotate, where
the 33rd bit is the PSR C flag.
• The barrel shifter is a functional unit which
can be used in a number of different
circumstances.
• It provides five types of shifts and rotates
which can be applied to Operand2.
• LSL – Logical Shift Left
– Example: Logical Shift Left by 4.
• LSR – Logical Shift Right
– Example: Logical Shift Right by 4.
• ASR – Arithmetic Shift Right
– Example: Arithmetic Shift Right by 4, positive value.
– Example: Arithmetic Shift Right by 4, negative value.
• ROR – Rotate Right
– Example: Rotate Right by 4.
• Examples
– MOV r0, r0, LSL #1 -Multiply R0 by two.
– MOV r1, r1, LSR #2 -Divide R1 by four (unsigned).
– MOV r2, r2, ASR #2 -Divide R2 by four (signed).
– MOV r3, r3, ROR #16 -Swap the top and bottom halves
of R3.
– ADD r4, r4, r4, LSL #4 -Multiply R4 by 17. (N = N + N * 16)
– RSB r5, r5, r5, LSL #5 -Multiply R5 by 31. (N = N * 32 - N)
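The shift and rotate operations can be modelled on 32-bit values to check the examples above. This Python sketch is illustrative only: the function names are hypothetical, every result is masked back to 32 bits because Python integers are unbounded, and the carry-out that the hardware also produces is modelled only for RRX. The assertions replay the six example instructions.

```python
MASK32 = 0xFFFF_FFFF


def lsl(x: int, n: int) -> int:
    """Logical shift left: zeros enter at the bottom."""
    return (x << n) & MASK32


def lsr(x: int, n: int) -> int:
    """Logical shift right: zeros enter at the top."""
    return (x & MASK32) >> n


def asr(x: int, n: int) -> int:
    """Arithmetic shift right: copies of the sign bit enter at the top."""
    x &= MASK32
    if x & 0x8000_0000:          # reinterpret as a negative two's-complement value
        x -= 1 << 32
    return (x >> n) & MASK32     # Python's >> on a negative int is arithmetic


def ror(x: int, n: int) -> int:
    """Rotate right: bits shifted out at the bottom re-enter at the top."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x: int, carry: int) -> tuple[int, int]:
    """33-bit rotate right through the C flag: returns (result, new carry)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1


r = 0x1234_5678
assert lsl(r, 1) == (r * 2) & MASK32                      # MOV r0, r0, LSL #1
assert lsr(r, 2) == r // 4                                # MOV r1, r1, LSR #2
assert asr(-100 & MASK32, 2) == -25 & MASK32              # MOV r2, r2, ASR #2
assert ror(r, 16) == 0x5678_1234                          # MOV r3, r3, ROR #16
assert (r + lsl(r, 4)) & MASK32 == (r * 17) & MASK32      # ADD r4, r4, r4, LSL #4
assert (lsl(r, 5) - r) & MASK32 == (r * 31) & MASK32      # RSB r5, r5, r5, LSL #5
```

One subtlety: "divide by four (signed)" with ASR rounds towards minus infinity, so -7 ASR 1 gives -4, whereas C's integer division would give -3.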
The ARM Processor Families (I)
• The ARM7 Family
– 32-bit RISC Processor.
– Support three-stage pipeline
– Uses Von Neumann Architecture.
• Widely used in many applications such as
palmtop computers, portable instruments,
smart cards.
• Characteristics of ARM7 family
The ARM Processor Families (II)
• The ARM9 Family
• 32-bit RISC Processor with ARM and Thumb
instruction sets
• Supports five-stage pipeline.
• Uses Harvard architecture
• Widely used in mobile phones, PDAs, digital
cameras, automotive systems and industrial
control systems.
• Characteristics of ARM9 Thumb Family
• Characteristics of ARM9E Family
The ARM Processor Families (III)
• The ARM10 Family
• 32-bit RISC processor with ARM, Thumb and
DSP instruction sets.
• Supports six-stage Pipelines.
• Uses Harvard Architecture
• Widely used in videophones, PDAs, set-top
boxes, game consoles, digital video
cameras, automotive and industrial control
systems
• Characteristics of ARM10 family
The ARM Processor Families (IV)
• The ARM11 Family
• 32-bit RISC processor with ARM, Thumb and DSP
instruction sets.
• Uses Harvard Architecture.
• Supports an eight-stage pipeline, except the
ARM1156T2, which uses a nine-stage pipeline.
• Widely used in automotive and industrial control
systems, 3D graphics, security critical
applications.
• Characteristics of ARM11 family
what is AMBA?
• “The ARM AMBA (Advanced Microcontroller
Bus Architecture) protocol is an open
standard, on-chip interconnect specification
for the connection and management of
functional blocks in a System-on-Chip (SoC). It
facilitates right-first-time development of
multi-processor designs with large numbers of
controllers and peripherals. AMBA promotes
design re-use by defining common interface
standards for SoC modules.”
AMBA
• AMBA: Advanced Microcontroller Bus Architecture
– It is a specification for an on-chip bus, to enable
macrocells (such as a CPU, DSP, Peripherals, and memory
controllers) to be connected together to form a
microcontroller or complex peripheral chip.
– It defines
• A high-speed, high-bandwidth bus, the Advanced High
Performance Bus (AHB).
• A simple, low-power peripheral bus, the Advanced Peripheral Bus
(APB).
• Access for an external tester to permit modular testing and fast
test of cache RAM
• Essential housekeeping operations (reset/power-up, …)
AMBA protocol specifications
• The AMBA specification defines an on-chip
communications standard for designing high-performance
embedded microcontrollers. It is supported by ARM Limited
with wide cross-industry participation.
– The AMBA 5 specification defines the following
buses/interfaces:
• Advanced High-performance Bus (AHB5, AHB-Lite)
• CHI Coherent Hub Interface (CHI)
– The AMBA 4 specification defines the following buses/interfaces:
• AXI Coherency Extensions (ACE) - widely used on the latest ARM
Cortex-A processors including Cortex-A7 and Cortex-A15
• AXI Coherency Extensions Lite (ACE-Lite)
• Advanced Extensible Interface 4 (AXI4)
• Advanced Extensible Interface 4 Lite (AXI4-Lite)
• Advanced Extensible Interface 4 Stream (AXI4-Stream v1.0)
• Advanced Trace Bus (ATB v1.1)
• Advanced Peripheral Bus (APB4 v2.0)
AMBA protocol specifications
• AMBA 3 specification defines four buses/interfaces:
– Advanced Extensible Interface (AXI3 or AXI v1.0) - widely used
on ARM Cortex-A processors including Cortex-A9
– Advanced High-performance Bus Lite (AHB-Lite v1.0)
– Advanced Peripheral Bus (APB3 v1.0)
– Advanced Trace Bus (ATB v1.0)
• AMBA 2 specification defines three buses/interfaces:
– Advanced High-performance Bus (AHB) - widely used on ARM7,
ARM9 and ARM Cortex-M based designs
– Advanced System Bus (ASB)
– Advanced Peripheral Bus (APB2 or APB)
• AMBA specification (First version) defines two
buses/interfaces:
– Advanced System Bus (ASB)
– Advanced Peripheral Bus (APB)
ARM7 Processor Architecture
• Features (LPC2148)
– 16/32-bit ARM7TDMI-S microcontroller in a tiny LQFP64
package.
– 8 to 40 kB of on-chip static RAM and 32 to 512 kB of on-chip
flash program memory. 128 bit wide interface/accelerator
enables high speed 60 MHz operation.
– In-System/In-Application Programming (ISP/IAP) via on-chip
boot-loader software. Single flash sector or full chip erase in 400
ms and programming of 256 bytes in 1 ms.
– Embedded ICE RT and Embedded Trace interfaces offer real-
time debugging with the on-chip Real Monitor software and
high speed tracing of instruction execution.
– USB 2.0 Full Speed compliant Device Controller with 2 kB of
endpoint RAM. In addition, the LPC2146/8 provide 8 kB of on-
chip RAM accessible to USB by DMA.
ARM7 Processor Architecture(2)
• Features (LPC2148)
– One or two 10-bit A/D converters provide a total of 6/14 analog
inputs, with conversion times as low as 2.44 µs per channel.
– Single 10-bit D/A converter provides variable analog output.
– Two 32-bit timers/external event counters (with four capture and four
compare channels each), PWM unit (six outputs) and watchdog.
– Low power real-time clock with independent power and dedicated 32
kHz clock input.
– Multiple serial interfaces including two UARTs, two Fast I2C-bus (400
kbit/s), SPI and SSP with buffering and variable data length
capabilities.
– Vectored interrupt controller with configurable priorities and vector
addresses.
– Up to 45 of 5 V tolerant fast general purpose I/O pins in a tiny LQFP64
package.
ARM7 Processor Architecture(3)
• Features (LPC2148)
– Up to nine edge or level sensitive external interrupt pins
available.
– 60 MHz maximum CPU clock available from programmable
on-chip PLL with settling time of 100 µs.
– On-chip integrated oscillator operates with an external crystal in
range from 1 MHz to 30 MHz and with an external oscillator up
to 50 MHz.
– Power saving modes include Idle and Power-down.
– Individual enable/disable of peripheral functions as well as
peripheral clock scaling for additional power optimization.
– Processor wake-up from Power-down mode via external
interrupt, USB, Brown-Out Detect (BOD) or Real-Time Clock
(RTC).
– Single power supply chip with Power-On Reset (POR) and BOD
circuits: CPU operating voltage range of 3.0 V to 3.6 V
(3.3 V ± 10 %) with 5 V tolerant I/O pads.
LPC2148 Pin Configuration
NXP LPC214X - IC
ARM Registers
• ARM has a load store architecture
• General purpose registers can hold data or
address
• Total of 37 registers, each 32 bits wide
• Up to 18 registers are active in any one mode
– 16 data registers (r0-r15)
– 2 status registers (CPSR and, in exception modes, SPSR)
ARM Registers (2)
• Registers R0 - R12 are general purpose
registers
• R13 is used as stack pointer (SP)
• R14 is used as link register (LR)
• R15 is used as the program counter (PC)
• CPSR – Current Program Status Register
• SPSR – Saved Program Status Register
ARM Registers (3)
• Three of the 16 visible registers have special roles:
– Stack pointer: Software normally uses R13 as a Stack Pointer
(SP). R13 is used by the PUSH and POP instructions in T variants.
– Link register: Register 14 is the Link Register (LR). This register
holds the address of the next instruction after a Branch and Link
(BL or BLX) instruction, which is the instruction used to make a
subroutine call. It is also used for return address information on
entry to exception modes. At all other times, R14 can be used as
a general-purpose register.
– Program counter: Register 15 is the Program Counter (PC). It
can be used in most instructions as a pointer to the instruction
which is two instructions after the instruction being executed. In
ARM state, all ARM instructions are four bytes long (one 32-bit
word) and are always aligned on a word boundary. In Thumb and
Jazelle states, instructions are halfword (16-bit) aligned and byte
aligned respectively, so the PC is aligned accordingly in those states.
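The "two instructions ahead" rule for reads of R15 reduces to a fixed offset. The following is a minimal C sketch, assuming classic ARM7TDMI behaviour (+8 bytes in ARM state, +4 bytes in Thumb state, a consequence of the three-stage pipeline); the function name pc_read_value is hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* Value an instruction at insn_addr sees when it reads R15:
 * the address two instructions ahead of itself. */
static uint32_t pc_read_value(uint32_t insn_addr, bool thumb)
{
    return insn_addr + (thumb ? 4u : 8u);   /* 2 x 2 bytes, or 2 x 4 bytes */
}
```

For example, an instruction at 0x8000 in ARM state that reads the PC sees 0x8008, which is why PC-relative offsets in ARM code are measured from the current address plus 8.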
ARM Registers (4)
• Program status register
– The current operating processor status is in the
Current Program Status Register (CPSR).
– CPSR is used to control and store CPU states
– CPSR is divided into four 8-bit fields
• Flags (f, bits 31-24)
• Status (s, bits 23-16)
• Extension (x, bits 15-8)
• Control (c, bits 7-0)
Current Program Status Register (CPSR)
Program Status Registers: bit layout
Bits 31-28: N Z C V | 27: Q | 26-25: IT[de] | 24: J | 23-20: reserved | 19-16: GE[3:0] | 15-10: IT[abc] | 9: E | 8: A | 7: I | 6: F | 5: T | 4-0: mode
Fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
• Condition code flags
– N = Negative result from ALU
– Z = Zero result from ALU
– C = ALU operation Carried out
– V = ALU operation oVerflowed
• Sticky Overflow flag – Q flag
– Indicates if saturation has occurred
• SIMD Condition code bits – GE[3:0]
– Used by some SIMD instructions
• IF THEN status bits – IT[abcde]
– Controls conditional execution of Thumb instructions
• T bit
– T = 0: Processor in ARM state (when J = 0)
– T = 1: Processor in Thumb state
• J bit
– J = 1: Processor in Jazelle state
• Mode bits
– Specify the processor mode
• Interrupt Disable bits
– I = 1: Disables IRQ
– F = 1: Disables FIQ
• E bit
– E = 0: Data load/store is little-endian
– E = 1: Data load/store is big-endian
• A bit
– A = 1: Disable imprecise data aborts
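To make the bit layout concrete, here is a minimal C sketch that extracts the condition flags and the mode field from a 32-bit CPSR value. The mask names and helper functions are hypothetical; the mode numbers (0x10 User, 0x11 FIQ, 0x12 IRQ, 0x13 Supervisor, 0x17 Abort, 0x1B Undef, 0x1F System) are the standard ARM encodings:

```c
#include <stdint.h>

/* Masks taken from the CPSR bit layout above. */
#define CPSR_N    (1u << 31)
#define CPSR_Z    (1u << 30)
#define CPSR_C    (1u << 29)
#define CPSR_V    (1u << 28)
#define CPSR_I    (1u << 7)
#define CPSR_F    (1u << 6)
#define CPSR_T    (1u << 5)
#define CPSR_MODE 0x1Fu

/* Name of the processor mode selected by bits [4:0]. */
static const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & CPSR_MODE) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undef";
    case 0x1F: return "System";
    default:   return "reserved";
    }
}

/* Write the flags as "nzcv": uppercase = flag set, lowercase = flag clear. */
static void cpsr_flags(uint32_t cpsr, char out[5])
{
    out[0] = (cpsr & CPSR_N) ? 'N' : 'n';
    out[1] = (cpsr & CPSR_Z) ? 'Z' : 'z';
    out[2] = (cpsr & CPSR_C) ? 'C' : 'c';
    out[3] = (cpsr & CPSR_V) ? 'V' : 'v';
    out[4] = '\0';
}
```

For example, 0x600000D3 decodes as flags nZCv, IRQ and FIQ disabled, ARM state (T = 0), Supervisor mode.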
Current Program status register
• The Current Program Status Register (CPSR) is
accessible in all processor modes.
• Each exception mode also has a Saved
Program Status Register (SPSR), that is used to
preserve the value of the CPSR when the
associated exception occurs.
Saved Program Status Register (SPSR)
• Each privileged mode (except System mode) has an
associated Saved Program Status Register (SPSR).
• The SPSR is used to save the state of the CPSR
(Current Program Status Register) when the
privileged mode is entered, so that the
user state can be fully restored when the user
process is resumed.
Data Sizes and Instruction Sets
• ARM is a 32-bit load / store RISC architecture
– The only memory accesses allowed are loads and stores
– Most internal registers are 32 bits wide
– Most instructions execute in a single cycle
• When used in relation to ARM cores
– Halfword means 16 bits (two bytes)
– Word means 32 bits (four bytes)
– Doubleword means 64 bits (eight bytes)
• ARM cores implement two basic instruction sets
– ARM instruction set – instructions are all 32 bits long
– Thumb instruction set – instructions are a mix of 16 and 32 bits
• Thumb-2 technology added many extra 32- and 16-bit instructions to the original
16-bit Thumb instruction set
• Depending on the core, ARM processors may also implement other instruction sets
– VFP instruction set – 32-bit (vector) floating-point instructions
– NEON instruction set – 32-bit SIMD instructions
– Jazelle-DBX – provides acceleration for Java VMs (with additional software support)
– Jazelle-RCT – provides support for interpreted languages
Processor Modes
• ARM has seven basic operating modes
– Each mode has access to its own stack space and a different subset of registers
– Some operations can only be carried out in a privileged mode
Mode        Description                                                                  Category
Supervisor  Entered on reset and when a Supervisor Call (SVC) instruction is executed    Exception mode (privileged)
FIQ         Entered when a high-priority (fast) interrupt is raised                      Exception mode (privileged)
IRQ         Entered when a normal-priority interrupt is raised                           Exception mode (privileged)
Abort       Used to handle memory access violations                                      Exception mode (privileged)
Undef       Used to handle undefined instructions                                        Exception mode (privileged)
System      Privileged mode using the same registers as User mode                        Privileged mode
User        Mode under which most applications / OS tasks run                            Unprivileged mode
Processor Modes
• Processor modes determine
– Which registers are active, and
– Access rights to the CPSR itself
• Each processor mode is either,
– Privileged: Full read-write access to the CPSR
– Non-Privileged: Only read access to the control field of
CPSR but read-write access to the condition flags
• ARM has seven modes
– Privileged: Abort, Fast interrupt request, Interrupt
request, Supervisor, System and Undefined
– Non-Privileged: User (Programs and applications)
The ARM Register Set – Registers Visible in Each Mode
• ARM has 37 registers, all 32 bits long
– A subset of these registers is accessible in each mode
– Note: System mode uses the User mode register set
• User level: 15 GPRs, PC, CPSR (current program status register)
• Remaining registers are used for system-level programming and for handling exceptions
Mode      Visible registers (banked registers carry a mode suffix)
User      r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
FIQ       r0-r7, r8_fiq-r12_fiq, r13_fiq (sp), r14_fiq (lr), r15 (pc), cpsr, spsr_fiq
IRQ       r0-r12, r13_irq (sp), r14_irq (lr), r15 (pc), cpsr, spsr_irq
Undef     r0-r12, r13_und (sp), r14_und (lr), r15 (pc), cpsr, spsr_und
Abort     r0-r12, r13_abt (sp), r14_abt (lr), r15 (pc), cpsr, spsr_abt
SVC       r0-r12, r13_svc (sp), r14_svc (lr), r15 (pc), cpsr, spsr_svc
• In each exception mode, the banked registers replace (bank out) the corresponding User mode registers.
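The banking rules in the table can be expressed as a lookup. A minimal C sketch, assuming the classic seven-mode register file of ARM7-class cores; the enum and function names are hypothetical:

```c
#include <assert.h>

typedef enum { MODE_USR, MODE_FIQ, MODE_IRQ, MODE_SVC, MODE_ABT, MODE_UND, MODE_SYS } Mode;

/* Bank that supplies rN (n = 0..15) in mode m. "usr" means the
 * User-mode copy, shared by every mode that does not bank it. */
static const char *bank_of(Mode m, int n)
{
    static const char *const name[] = { "usr", "fiq", "irq", "svc", "abt", "und", "usr" };
    assert(n >= 0 && n <= 15);
    if (n == 15)
        return "usr";                    /* the PC is never banked */
    if (m == MODE_FIQ && n >= 8)
        return "fiq";                    /* FIQ banks r8-r14 */
    if (m != MODE_USR && m != MODE_SYS && n >= 13)
        return name[m];                  /* IRQ, SVC, Abort, Undef bank r13-r14 */
    return "usr";                        /* System mode uses the User registers */
}

/* 16 User registers + CPSR; FIQ: r8-r14 + SPSR;
 * four other exception modes: r13, r14 + SPSR each. */
static int total_registers(void)
{
    return (16 + 1) + (7 + 1) + 4 * (2 + 1);
}
```

The count reproduces the figure of 37 registers quoted above.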


Program Counter (r15)
• When the processor is executing in ARM state:
– All instructions are 32 bits wide
– All instructions must be word aligned
– Therefore the pc value is stored in bits [31:2] with bits [1:0]
undefined (as instructions cannot be halfword or byte aligned)
• When the processor is executing in Thumb state:
– All instructions are 16 bits wide
– All instructions must be halfword aligned
– Therefore the pc value is stored in bits [31:1] with bit [0]
undefined (as instructions cannot be byte aligned)
• When the processor is executing in Jazelle state:
– All instructions are 8 bits wide
– Processor performs a word access to read 4 instructions at once
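The alignment rules above reduce to simple mask tests on the PC. A minimal C sketch with hypothetical names:

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { STATE_ARM, STATE_THUMB, STATE_JAZELLE } ExecState;

/* ARM: word aligned, bits [1:0] = 0. Thumb: halfword aligned, bit [0] = 0.
 * Jazelle: byte aligned, so any address is acceptable. */
static bool pc_is_aligned(uint32_t pc, ExecState s)
{
    switch (s) {
    case STATE_ARM:   return (pc & 3u) == 0;
    case STATE_THUMB: return (pc & 1u) == 0;
    default:          return true;
    }
}

/* In Jazelle state one 32-bit word holds four 8-bit instructions;
 * this is the address of the word containing the byte at pc. */
static uint32_t jazelle_fetch_word(uint32_t pc)
{
    return pc & ~3u;
}
```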
Exceptions
• Exceptions are generated by internal and external
sources to cause the processor to handle an
event, such as an externally generated interrupt
or an attempt to execute an Undefined
instruction.
• The processor state just before handling the
exception is normally preserved so that the
original program can be resumed when the
exception routine has completed.
• More than one exception can arise at the same
time.
Exception handling
• Exception:
– Any condition that needs to halt normal
sequential execution of instructions
• ARM core is reset
• Instruction fetch or memory access fails
• Undefined instruction is encountered
• Software interrupt instruction is executed
• External interrupt has been raised
• The ARM architecture supports seven types of
exception.
• When an exception occurs, execution is forced
from a fixed memory address corresponding
to the type of exception. These fixed
addresses are called the exception vectors.
ARM Exception Types
• The ARM recognises seven different types of
exceptions.
– Reset
– Undefined instruction
– Software Interrupt (SWI)
– Prefetch Abort
– Data Abort
– IRQ
– FIQ
ARM Exceptions Types (Cont.)
• Reset
– Occurs when the processor reset pin is asserted
• For signalling Power-up
• For resetting as if the processor has just powered up
– Software reset
• Can be done by branching to the reset vector (0x00000000)
• Undefined instruction
– Occurs when neither the processor nor any coprocessor
can recognize the currently executing
instruction
ARM Exceptions Types (Cont.)
• Software Interrupt (SWI)
– User-defined interrupt instruction
– Allows a program running in User mode to request
privileged operations that run in Supervisor mode
• For example, RTOS functions
• Prefetch Abort
– Occurs when an instruction is fetched from an illegal address;
the instruction is flagged as invalid
– However, instructions already in the pipeline continue
to execute until the invalid instruction is reached, and
then a Prefetch Abort is generated.
ARM Exceptions Types (Cont.)
• Data Abort
– A data transfer instruction attempts to load or store
data at an illegal address
• IRQ
– The processor external interrupt request pin is
asserted (LOW) and the I bit in the CPSR is clear
(enable)
• FIQ
– The processor external fast interrupt request pin is
asserted (LOW) and the F bit in the CPSR is clear
(enable)
ARM processor exceptions and modes
ARM Vector Table
• Exception handling is controlled by a vector table.
• It is the area of memory that the ARM core branches to
when an exception is raised; each entry normally holds a branch
instruction that directs the core to the ISR.
• This is a reserved area of 32 bytes at the bottom of the
memory map, with one word of space allocated to each
exception type.
• The vector table starts at 0x00000000 (ARMx20 processors
can optionally relocate the vector table to
0xFFFF0000).
• A vector table consists of a set of ARM instructions that
manipulate the PC (i.e. B, MOV, and LDR). These
instructions cause the PC to jump to a specific location that
can handle a specific exception or interrupt.
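Because each vector occupies one 32-bit word, the vector address is the table base plus four times the exception's position in the table. A minimal C sketch, assuming the standard ordering of the vector table (Reset at offset 0x00 through FIQ at 0x1C) and the optional high-vector base 0xFFFF0000; the names are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* Position of each exception in the vector table (one 4-byte entry each). */
typedef enum {
    EXC_RESET = 0, EXC_UNDEF = 1, EXC_SWI = 2, EXC_PABORT = 3,
    EXC_DABORT = 4, EXC_RESERVED = 5, EXC_IRQ = 6, EXC_FIQ = 7
} Exception;

static uint32_t vector_address(Exception e, bool high_vectors)
{
    uint32_t base = high_vectors ? 0xFFFF0000u : 0x00000000u;
    return base + 4u * (uint32_t)e;
}
```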
ARM exception vector locations
Exception handling process
• When an exception occurs, control passes through an area
of memory called the vector table. This is a reserved area
usually at the bottom of the memory map.
• Figure shows the exception handling process.
ARM Exception Priorities
Response to an Exception Handler
• When an exception occurs, the ARM:
– Copies the CPSR into the SPSR for the mode
in which the exception is to be handled.
• Saves the current mode, interrupt mask, and
condition flags.
– Changes the appropriate CPSR mode bits
• Change to the appropriate mode
• Map in the appropriate banked registers for that
mode
– Disables interrupts
• IRQs are disabled when any exception occurs.
• FIQs are disabled when a FIQ occurs, and on
reset
– Sets lr_mode to the return address
– Sets the program counter (PC) to the vector
address for the exception
Vector Table
0x1C FIQ
0x18 IRQ
0x14 (Reserved)
0x10 Data Abort
0x0C Prefetch Abort
0x08 Software Interrupt
0x04 Undefined Instruction
0x00 Reset
Returning From an Exception Handler
• To return, exception handler needs to:
– Restore the CPSR from spsr_mode
– Restore the program counter using the return
address stored in lr_mode
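The entry and return steps can be modelled in a few lines of C. This is a simplified sketch, not a description of real hardware: the Cpu struct is hypothetical and holds only the registers touched here, and the caller passes the final return address directly, ignoring the per-exception offsets that real handlers correct for (for example, an ARM7 IRQ handler returns with SUBS pc, lr, #4). Clearing T reflects the fact that ARM7-class cores handle exceptions in ARM state:

```c
#include <stdint.h>

#define MODE_MASK 0x1Fu
#define I_BIT     0x80u
#define F_BIT     0x40u
#define T_BIT     0x20u

/* Hypothetical, simplified model: only the registers used on entry/return. */
typedef struct {
    uint32_t cpsr;
    uint32_t spsr_mode;   /* SPSR of the exception mode */
    uint32_t lr_mode;     /* r14 of the exception mode  */
    uint32_t pc;
} Cpu;

static void take_exception(Cpu *cpu, uint32_t new_mode, uint32_t vector,
                           uint32_t return_addr, int also_mask_fiq)
{
    cpu->spsr_mode = cpu->cpsr;                        /* copy CPSR into SPSR   */
    cpu->cpsr = (cpu->cpsr & ~MODE_MASK) | new_mode;   /* change the mode bits  */
    cpu->cpsr |= I_BIT;                                /* IRQs always disabled  */
    if (also_mask_fiq)
        cpu->cpsr |= F_BIT;                            /* on FIQ and on reset   */
    cpu->cpsr &= ~T_BIT;                               /* handler runs in ARM state */
    cpu->lr_mode = return_addr;                        /* set lr_mode           */
    cpu->pc = vector;                                  /* jump to the vector    */
}

static void return_from_exception(Cpu *cpu)
{
    cpu->cpsr = cpu->spsr_mode;    /* restore CPSR from spsr_mode */
    cpu->pc = cpu->lr_mode;        /* restore PC from lr_mode     */
}
```

For an IRQ taken in User mode, new_mode would be 0x12 and vector 0x18.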
Interrupt Handlers
• There are two types of interrupts available on the ARM processor.
– The first type is the interrupt caused by external events from hardware
peripherals
– The second type is the SWI instruction.
• The ARM processor has two levels of external interrupt, FIQ and
IRQ, both of which are level-sensitive active LOW signals into the
core.
• For an interrupt to be taken, the relevant input must be LOW and
the disable bit in the CPSR must be clear.
• FIQs have higher priority than IRQs in two ways:
– 1. FIQs are serviced first when multiple interrupts occur.
– 2. Servicing a FIQ causes IRQs to be disabled, preventing them from
being serviced until after the FIQ handler has re-enabled them (usually
by restoring the CPSR from the SPSR at the end of the handler).
Assigning interrupts
• How are interrupts assigned?
• It is up to the system designer who can decide
which hardware peripheral can produce which
interrupt request.
– Interrupt controller
• Multiplexes multiple external interrupts onto one of the two ARM
interrupt requests
– Standard design practice
• SWIs are reserved to call privileged operating system routines
• IRQs are assigned to general-purpose interrupts
– For example, a periodic timer
• FIQ is reserved for a single interrupt source that requires a
fast response time
– For example, direct memory access to move blocks of memory
– FIQ has a higher priority and shorter interrupt latency than IRQ
Interrupt Latency
• Interrupt latency is the interval between an
external interrupt signal being raised and the
first instruction fetch of the ISR for that
interrupt.
• System architects must balance two
things:
– first, handling multiple interrupts
simultaneously,
– second, minimizing the interrupt latency.
Interrupt Latency
• Software handlers minimize interrupt latency
by two main methods:
– The first is to allow nested interrupt handling, so
the system can respond to new interrupts while an
older interrupt is being handled.
• This is achieved by enabling interrupts immediately after the
interrupt source has been serviced but before the
interrupt handling has finished.
– The second is to give priorities to
different interrupt sources.
• This is achieved by programming the interrupt controller to
ignore interrupts of the same or lower priority than the
interrupt currently being handled, if there is one.
Enabling and disabling Interrupt
• This is done by modifying the CPSR, using three ARM
instructions: MRS, then BIC (to enable) or ORR (to disable), then MSR:
– MRS To read CPSR
– MSR To store in CPSR
– BIC Bit clear instruction
– ORR OR instruction
Enabling an IRQ/FIQ interrupt:
MRS r1, cpsr
BIC r1, r1, #0x80 ; use #0x40 for FIQ
MSR cpsr_c, r1
Disabling an IRQ/FIQ interrupt:
MRS r1, cpsr
ORR r1, r1, #0x80 ; use #0x40 for FIQ
MSR cpsr_c, r1
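The MRS/BIC/MSR and MRS/ORR/MSR sequences can be mirrored on a plain 32-bit value. A minimal C sketch with hypothetical function names; it models MSR cpsr_c as writing only the control field (bits 7:0):

```c
#include <stdint.h>

#define CPSR_I 0x80u   /* IRQ disable bit */
#define CPSR_F 0x40u   /* FIQ disable bit */

/* MRS r1, cpsr ; BIC r1, r1, #mask ; MSR cpsr_c, r1 */
static uint32_t enable_interrupt(uint32_t cpsr, uint32_t mask)
{
    uint32_t r1 = cpsr & ~mask;                 /* MRS + BIC */
    return (cpsr & ~0xFFu) | (r1 & 0xFFu);      /* MSR cpsr_c: control field only */
}

/* MRS r1, cpsr ; ORR r1, r1, #mask ; MSR cpsr_c, r1 */
static uint32_t disable_interrupt(uint32_t cpsr, uint32_t mask)
{
    uint32_t r1 = cpsr | mask;                  /* MRS + ORR */
    return (cpsr & ~0xFFu) | (r1 & 0xFFu);      /* MSR cpsr_c: control field only */
}
```

For example, enable_interrupt(0x600000D3, CPSR_I) yields 0x60000053: IRQs are unmasked, FIQs stay masked, and the condition flags are untouched.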
Interrupt stack
• Stacks are needed extensively for context
switching between different modes when
interrupts are raised.
• The design of the exception stack depends on
two factors:
– OS Requirements.
– Target hardware.
• A good stack design tries to avoid stack overflow,
because overflow causes instability in embedded systems.
Setting up the interrupt stacks
• Each system has its own
requirements for stack
design
– Stack pointers are initialized
after reset
• Where the interrupt stack is
placed depends upon the
RTOS requirements and the
specific hardware being used.
• Two design decisions need to
be made for the stacks:
– The location
– The size
• Figure 1.14 shows two
possible designs.
Setting up the interrupt stacks
• Design A is a standard design found on many ARM based
systems.
• If the interrupt stack expands into the interrupt vector table, the
target system will crash, unless some check is placed on the
extension of the stack and some means exists to handle that error
when it occurs.
• The example in figure 1.14 shows two possible stack layouts.
– The first (A) shows the traditional stack layout, with the interrupt
stack stored underneath the code segment.
– The second, layout (B), shows the interrupt stack at the top of
memory, above the user stack.
• One of the main advantages of layout (B) over layout
(A) is that the interrupt stack grows into the user stack and thus does
not corrupt the vector table.
• A stack has to be set up for each mode. This is carried out
every time the processor is reset.
Example to setup stacks
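USR_Stack EQU 0x20000 ; top of the User/System stack
IRQ_Stack EQU 0x8000 ; top of the IRQ stack
SVC_Stack EQU IRQ_Stack-128 ; SVC stack starts 128 bytes below the IRQ stack top (stacks grow down)

; CPSR mode-field values
Usr32md EQU 0x10 ; User
FIQ32md EQU 0x11 ; FIQ
IRQ32md EQU 0x12 ; IRQ
SVC32md EQU 0x13 ; Supervisor
Abt32md EQU 0x17 ; Abort
Und32md EQU 0x1b ; Undefined
Sys32md EQU 0x1f ; System
NoInt EQU 0xc0 ; Disable interrupts (I and F bits set)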
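A start-up routine typically switches into each mode in turn with MSR cpsr_c, #NoInt|mode and then loads that mode's r13; setting r13 in System mode also sets the User stack, because the two modes share registers. The C sketch below only reproduces the arithmetic of the listing (the values written to cpsr_c and the resulting SVC stack address), assuming the EQU values above; setup_control is a hypothetical helper:

```c
#include <stdint.h>

/* Constants copied from the EQU listing. */
enum {
    USR_Stack = 0x20000,
    IRQ_Stack = 0x8000,
    SVC_Stack = IRQ_Stack - 128,
    IRQ32md = 0x12, SVC32md = 0x13, Sys32md = 0x1f,
    NoInt = 0xc0
};

/* Control-field value written before loading r13 for a mode:
 * the mode number with both IRQ and FIQ masked. */
static uint32_t setup_control(uint32_t mode)
{
    return (uint32_t)NoInt | mode;
}
```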
Interrupt handling schemes
• Non-nested interrupt handler
• Nested interrupt handler
• Re-entrant nested interrupt handler
• Prioritized interrupt handler
Interrupt handling schemes
• Non-nested interrupt handling scheme
– This is the simplest interrupt handler.
– Interrupts are disabled until control is returned
to the interrupted task.
– One interrupt can be served at a time.
– Not suitable for complex embedded systems.
Interrupt handling schemes
• Each stage is explained in more detail
below:
1. External source (for example from an
interrupt controller) sets the
Interrupt flag. Processor masks
further external interrupts and
vectors to the interrupt handler via
an entry in the vector table.
2. Upon entry to the handler, the
handler code saves the current
context of the non-banked registers.
3. The handler then identifies the
interrupt source and executes the
appropriate interrupt service routine
(ISR).
4. ISR services the interrupt.
5. Upon return from the ISR the handler
restores the context.
6. Enables interrupts and returns.
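The six steps above can be sketched in C. This is a hypothetical model: irq_status stands in for an interrupt-controller status register, isr_table for a table of service routines, and the lowest-numbered pending source is served first. Masking, context save/restore and re-enabling (steps 1, 2, 5 and 6) are done by the core and an assembly wrapper on a real system, so the sketch covers steps 3 and 4 only:

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*Isr)(void);

/* Hypothetical stand-ins for a controller status register and ISR table. */
static volatile uint32_t irq_status;
static Isr isr_table[32];

/* Identify one pending source, run its ISR, then return.
 * Interrupts are assumed to stay masked for the whole call (non-nested).
 * Returns the source number served, or -1 if nothing was pending. */
static int non_nested_irq_handler(void)
{
    uint32_t pending = irq_status;
    for (int src = 0; src < 32; ++src) {
        uint32_t bit = 1u << src;
        if (pending & bit) {
            if (isr_table[src] != NULL)
                isr_table[src]();        /* step 4: service the interrupt */
            irq_status &= ~bit;          /* model of acknowledging the source */
            return src;
        }
    }
    return -1;
}
```

Because only one source is served per entry, a second pending interrupt waits until the handler returns and the core takes the IRQ again.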
• Nested interrupt handling scheme(1)
– Handling more than one interrupt at a time is possible by
enabling interrupts before the current interrupt has been fully served.
– Latency is improved.
– The system is more complex.
– Interrupts are not distinguished by priority, so normal
interrupts can block critical interrupts.
Nested interrupt handling scheme(2)
Re-entrant interrupt handler
• A re-entrant interrupt handler is a method of handling multiple
interrupts where interrupts are filtered by priority.
• This is important since there is a requirement that interrupts with
higher priority have a lower latency.
• This type of filtering cannot be achieved using the conventional
nested interrupt handler.
• The basic difference between a re-entrant interrupt handler and a
nested interrupt handler is that the interrupts are re-enabled early
on in the interrupt handler to achieve low interrupt latency.
Prioritized interrupt handler
• There are three types of prioritized interrupt handler,
each providing a different handling strategy:
– Simple prioritized interrupt handler
– Standard prioritized interrupt handler
– Grouped prioritized interrupt handler
Prioritized interrupt handler
• Simple prioritized interrupt handler:
– In this scheme the handler will associate a priority level
with a particular interrupt source.
– A higher priority interrupt will take precedence over a
lower priority interrupt.
– Handling prioritization can be done by means of software
or hardware.
– In case of hardware prioritization the handler is simpler to
design because the interrupt controller will give the
interrupt signal of the highest priority interrupt requiring
service.
– On the other hand, the system needs more initialization
code at start-up, since priority level tables have to be
constructed before the system starts handling interrupts.
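A simple software-prioritized handler can be sketched as a scan over a priority table built at start-up. The table contents, NUM_SOURCES and the function name are hypothetical; higher numbers mean higher priority in this sketch:

```c
#include <stdint.h>

#define NUM_SOURCES 8

/* Hypothetical priority table built at start-up (higher = more urgent). */
static const uint8_t priority_of[NUM_SOURCES] = { 1, 4, 2, 7, 0, 3, 6, 5 };

/* Pick the pending source with the highest priority.
 * Returns the source number, or -1 if nothing is pending. */
static int highest_priority_pending(uint32_t pending)
{
    int best = -1;
    for (int src = 0; src < NUM_SOURCES; ++src) {
        if ((pending & (1u << src)) &&
            (best < 0 || priority_of[src] > priority_of[best]))
            best = src;
    }
    return best;
}
```

With hardware prioritization this scan disappears, because the interrupt controller reports the highest-priority source directly.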
Prioritized interrupt handler
• Simple prioritized interrupt handler:
Prioritized interrupt handler
• Standard prioritized interrupt handler
– arranges priorities in a special way to reduce the
time needed to decide on which interrupt will be
handled.
• Grouped prioritized interrupt handler
– groups interrupts into subsets, each with its own
priority level; this suits systems with a large number of
interrupt sources.
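A grouped scheme can test a whole group with one comparison before looking at individual sources. The following C sketch assumes a hypothetical layout of 32 sources in four groups of eight, with group 0 highest priority and the lowest-numbered source winning inside a group:

```c
#include <stdint.h>

#define GROUP_SIZE 8u
#define NUM_GROUPS 4u

/* Returns the selected source number, or -1 if nothing is pending. */
static int grouped_select(uint32_t pending)
{
    for (uint32_t g = 0; g < NUM_GROUPS; ++g) {
        uint32_t bits = (pending >> (g * GROUP_SIZE)) & 0xFFu;  /* one group */
        if (bits == 0)
            continue;                   /* whole group idle: skip 8 sources at once */
        for (uint32_t i = 0; i < GROUP_SIZE; ++i)
            if (bits & (1u << i))
                return (int)(g * GROUP_SIZE + i);
    }
    return -1;
}
```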
Memory formats
• The ARM7TDMI processor views memory as a linear
collection of bytes numbered in ascending order from zero.
• For example:
– bytes zero to three hold the first stored word
– bytes four to seven hold the second stored word.
• The ARM7TDMI processor is bi-endian and can treat words
in memory as being stored in either:
– Little-endian.
– Big-endian
• Note
– Little-endian is traditionally the default format for ARM
processors.
• Little-endian
– In little-endian format, the lowest addressed byte in a
word is considered the least-significant byte of the
word and the highest addressed byte is the most
significant.
• Big-endian
– In big-endian format, the ARM7TDMI processor
stores the most significant byte of a word at the
lowest-numbered byte, and the least significant
byte at the highest-numbered byte.
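The two byte orders can be demonstrated by storing a word into a 4-byte array in either format. A minimal, host-independent C sketch (the function names are hypothetical):

```c
#include <stdint.h>

/* Store a 32-bit word into 4 bytes of "memory" at ascending addresses. */
static void store_word(uint8_t mem[4], uint32_t w, int big_endian)
{
    for (int i = 0; i < 4; ++i) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;   /* byte held at address i */
        mem[i] = (uint8_t)(w >> shift);
    }
}

/* Read the 4 bytes back as a word using the given byte order. */
static uint32_t load_word(const uint8_t mem[4], int big_endian)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; ++i) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        w |= (uint32_t)mem[i] << shift;
    }
    return w;
}
```

Storing 0x11223344 in little-endian format puts 0x44 at the lowest address; in big-endian format 0x11 goes there.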
ARM Instruction Set
• ARM instructions fall into one of the following
three categories:
• Data processing instructions.
• Data transfer instructions.
• Control flow instructions.
Features of the ARM Instruction Set
• Load-store architecture
– Process values which are in registers
– Load, store instructions for memory data accesses
• 3-address data processing instructions
• Conditional execution of every instruction
• The inclusion of very powerful load and store multiple
register instructions
• Single-cycle execution of most instructions
• Open coprocessor instruction set extension
• Very dense 16-bit compressed instruction set (Thumb)
Load-store architecture
• ARM employs a load-store architecture.
– This means that the instruction set will only process
(add, subtract, and so on) values which are in registers
(or specified directly within the instruction itself), and
will always place the results of such processing into a
register.
– The only operations which apply to memory state are
ones which copy memory values into registers (load
instructions) or copy register values into memory
(store instructions).
– ARM does not support such 'memory-to-memory'
operations.
Thumb
• Thumb is a 16-bit instruction set
– Optimized for code density from C code
– Improved performance from narrow memory
– Subset of the functionality of the ARM instruction set
• Core has two execution states – ARM and Thumb
– Switch between them using the BX instruction
• Thumb has characteristic features:
– Most Thumb instructions are executed unconditionally
– Many Thumb data processing instructions use a 2-address
format
– Thumb instruction formats are less regular than ARM
instruction formats, as a result of the dense encoding.
Conditional Execution (1)
• One of the ARM's most interesting features is that
every instruction can be conditionally executed.
• Most other instruction sets allow conditional
execution only of branch instructions, based on the
state of the condition flags.
• In ARM, almost all instructions can be
conditionally executed.
• If the corresponding condition is true, the instruction is
executed. If the condition is false, the instruction is
turned into a NOP.
Conditional Execution (2)
• The condition is specified by suffixing the instruction with a
condition code mnemonic.
• This improves code density and performance by reducing the
number of forward branch instructions.
• Using a branch:
CMP r3,#0
BEQ skip
ADD r0,r1,r2
skip
• Using conditional execution:
CMP r3,#0
ADDNE r0,r1,r2
• In the following example, the instruction moves r1 to r0
only if carry is set.
MOVCS r0, r1
The Condition Field
Bits 31-28 of every ARM instruction hold the condition (Cond); bits 27-0 encode the instruction itself.
0000 = EQ – Z set (equal)
0001 = NE – Z clear (not equal)
0010 = HS / CS – C set (unsigned higher or same)
0011 = LO / CC – C clear (unsigned lower)
0100 = MI – N set (negative)
0101 = PL – N clear (positive or zero)
0110 = VS – V set (overflow)
0111 = VC – V clear (no overflow)
1000 = HI – C set and Z clear (unsigned higher)
1001 = LS – C clear or Z set (unsigned lower or same)
1010 = GE – N set and V set, or N clear and V clear (>=)
1011 = LT – N set and V clear, or N clear and V set (<)
1100 = GT – Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE – Z set, or N set and V clear, or N clear and V set (<=)
1110 = AL – always
1111 = NV – reserved
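The table translates directly into a predicate on the four flags. A minimal C sketch; the function name is hypothetical, and 1111 is treated as "never" here, matching the "reserved" NV entry (later architectures reuse this encoding for unconditional instructions):

```c
#include <stdint.h>
#include <stdbool.h>

/* True if an instruction whose condition field (bits 31:28) is cond
 * would execute with the given N, Z, C and V flags. */
static bool condition_passed(uint32_t cond, bool n, bool z, bool c, bool v)
{
    switch (cond & 0xFu) {
    case 0x0: return z;                  /* EQ    */
    case 0x1: return !z;                 /* NE    */
    case 0x2: return c;                  /* CS/HS */
    case 0x3: return !c;                 /* CC/LO */
    case 0x4: return n;                  /* MI    */
    case 0x5: return !n;                 /* PL    */
    case 0x6: return v;                  /* VS    */
    case 0x7: return !v;                 /* VC    */
    case 0x8: return c && !z;            /* HI    */
    case 0x9: return !c || z;            /* LS    */
    case 0xA: return n == v;             /* GE    */
    case 0xB: return n != v;             /* LT    */
    case 0xC: return !z && (n == v);     /* GT    */
    case 0xD: return z || (n != v);      /* LE    */
    case 0xE: return true;               /* AL    */
    default:  return false;              /* NV: treated as never in this sketch */
    }
}
```

The condition of an instruction word insn is insn >> 28; for example, ADDNE has condition 0x1.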
Using and updating the Condition Field
• To execute an instruction conditionally, simply postfix it
with the appropriate condition:
– For example an add instruction takes the form:
• ADD r0,r1,r2 ; r0 = r1 + r2 (ADDAL)
– To execute this only if the zero flag is set:
• ADDEQ r0,r1,r2 ; If zero flag set then…
; ... r0 = r1 + r2
• By default, data processing operations do not affect the
condition flags (apart from the comparisons where this is
the only effect).
• To cause the condition flags to be updated, the S bit of the
instruction needs to be set by postfixing the instruction
(and any condition code) with an “S”.
– For example to add two numbers and set the condition flags:
• ADDS r0,r1,r2 ; r0 = r1 + r2
; ... and set flags
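To see what "set flags" means for ADDS, here is a minimal C sketch computing the result and N, Z, C, V for a 32-bit addition. The Flags struct and adds function are hypothetical names:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } Flags;

/* ADDS rd, rn, op2: return the 32-bit result and update the flags. */
static uint32_t adds(uint32_t a, uint32_t b, Flags *f)
{
    uint64_t wide = (uint64_t)a + b;
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) & 1u;                       /* bit 31 of the result */
    f->z = (r == 0);
    f->c = (wide >> 32) != 0;                    /* unsigned carry out   */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;    /* same-sign operands, result sign differs */
    return r;
}
```

For example, 0x7FFFFFFF + 1 sets N and V (signed overflow) but not C, while 0xFFFFFFFF + 1 sets Z and C but not V.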
Examples of conditional execution
• Use a sequence of several conditional instructions
if (a==0) func(1);
CMP r0,#0
MOVEQ r0,#1
BLEQ func
• Set the flags, then use various condition codes
if (a==0) x=0;
if (a>0) x=1;
CMP r0,#0
MOVEQ r1,#0
MOVGT r1,#1
• Use conditional compare instructions
if (a==4 || a==10) x=0;
CMP r0,#4
CMPNE r0,#10
MOVEQ r1,#0
Conditional Execution
• An unusual feature of the ARM instruction set is that conditional
execution applies not only to branches but to all ARM instructions
With a branch:
CMP r0,#5
BEQ Bypass ;if (r0!=5) {
ADD r1,r1,r0 ; r1=r1+r0
SUB r1,r1,r2 ; r1=r1-r2 }
Bypass …
With conditional execution:
CMP r0,#5
ADDNE r1,r1,r0
SUBNE r1,r1,r2
• Whenever the conditional sequence is 3 instructions or
fewer, it is better (smaller and faster) to exploit conditional
execution than to use a branch
if((a==b)&&(c==d)) e++;
CMP r0,r1
CMPEQ r2,r3
ADDEQ r4,r4,#1
ARM Instruction Set Format
Instruction type: fields from bit 31 (left) to bit 0 (right); Cond always occupies bits 31-28
Data processing / PSR transfer: Cond 0 0 I Opcode S Rn Rd Operand2
Multiply: Cond 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm
Long multiply (v3M / v4 only): Cond 0 0 0 0 1 U A S RdHi RdLo Rs 1 0 0 1 Rm
Swap: Cond 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm
Load/store byte/word: Cond 0 1 I P U B W L Rn Rd Offset
Load/store multiple: Cond 1 0 0 P U S W L Rn Register List
Halfword transfer, immediate offset (v4 only): Cond 0 0 0 P U 1 W L Rn Rd Offset1 1 S H 1 Offset2
Halfword transfer, register offset (v4 only): Cond 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 S H 1 Rm
Branch: Cond 1 0 1 L Offset
Branch exchange (v4T only): Cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn
Coprocessor data transfer: Cond 1 1 0 P U N W L Rn CRd CPNum Offset
Coprocessor data operation: Cond 1 1 1 0 Op1 CRn CRd CPNum Op2 0 CRm
Coprocessor register transfer: Cond 1 1 1 0 Op1 L CRn Rd CPNum Op2 1 CRm
Software interrupt: Cond 1 1 1 1 SWI Number
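The fixed bits in this table are enough to classify an instruction word. A minimal C sketch for the ARMv4T formats listed above; the enum and function names are hypothetical, and undefined encodings are not separated out. Multiply, swap, halfword-transfer and branch-exchange patterns sit inside the 00 data-processing space, so they are tested first:

```c
#include <stdint.h>

typedef enum {
    CLS_BRANCH_EXCHANGE, CLS_MULTIPLY, CLS_LONG_MULTIPLY, CLS_SWAP,
    CLS_HALFWORD, CLS_DATA_PROCESSING, CLS_LOAD_STORE, CLS_LOAD_STORE_MULTIPLE,
    CLS_BRANCH, CLS_COPROC_TRANSFER, CLS_COPROC_OPERATION, CLS_COPROC_REGISTER,
    CLS_SWI
} InsnClass;

static InsnClass arm_insn_class(uint32_t insn)
{
    if ((insn & 0x0FFFFFF0u) == 0x012FFF10u) return CLS_BRANCH_EXCHANGE;
    if ((insn & 0x0FC000F0u) == 0x00000090u) return CLS_MULTIPLY;
    if ((insn & 0x0F8000F0u) == 0x00800090u) return CLS_LONG_MULTIPLY;
    if ((insn & 0x0FB00FF0u) == 0x01000090u) return CLS_SWAP;
    if ((insn & 0x0E000090u) == 0x00000090u) return CLS_HALFWORD;        /* bits 7 and 4 set */
    if ((insn & 0x0C000000u) == 0x00000000u) return CLS_DATA_PROCESSING; /* bits 27-26 = 00 */
    if ((insn & 0x0C000000u) == 0x04000000u) return CLS_LOAD_STORE;      /* bits 27-26 = 01 */
    if ((insn & 0x0E000000u) == 0x08000000u) return CLS_LOAD_STORE_MULTIPLE;
    if ((insn & 0x0E000000u) == 0x0A000000u) return CLS_BRANCH;
    if ((insn & 0x0E000000u) == 0x0C000000u) return CLS_COPROC_TRANSFER;
    if ((insn & 0x0F000010u) == 0x0E000000u) return CLS_COPROC_OPERATION;
    if ((insn & 0x0F000010u) == 0x0E000010u) return CLS_COPROC_REGISTER;
    return CLS_SWI;                                                      /* bits 27-24 = 1111 */
}
```

For example, 0xE0810002 (ADD r0, r1, r2) classifies as data processing and 0xE12FFF1E (BX lr) as branch exchange.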
The ARM instruction set
• Data processing instructions.
• ARM data processing instructions enable the programmer
to perform arithmetic and logical operations on data values
in registers.
• They are
– Arithmetic instructions
– Logical instructions
– Comparison instructions
– Move instructions
– Multiply instructions.
• The data processing instructions are the only instructions
which modify data values.
• Most data processing instructions can process one of their
operands using the barrel shifter.
The ARM instruction set
• Data processing instructions.
• General rules:
– All operands are 32-bit, coming from registers or
literals.
– The result, if any, is 32-bit and placed in a register
(with the exception for long multiply which produces
a 64-bit result)
– 3-address format
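Data processing instruction binary encoding
31-28 cond | 27-26 00 | 25 # | 24-21 opcode | 20 S | 19-16 Rn | 15-12 Rd | 11-0 operand 2
– opcode: arithmetic/logic function
– S: set condition codes
– Rn: first operand register
– Rd: destination register
• Operand 2 when bit 25 = 1 (immediate):
11-8 #rot | 7-0 8-bit immediate
– #rot: immediate alignment (the 8-bit value is rotated right by 2 × #rot)
• Operand 2 when bit 25 = 0, immediate shift:
11-7 #shift | 6-5 Sh | 4 = 0 | 3-0 Rm
– #shift: immediate shift length; Sh: shift type; Rm: second operand register
• Operand 2 when bit 25 = 0, register shift:
11-8 Rs | 7 = 0 | 6-5 Sh | 4 = 1 | 3-0 Rm
– Rs: register shift length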
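The immediate form of operand 2 encodes a 32-bit constant as an 8-bit value rotated right by twice the 4-bit #rot field. A minimal C sketch of decoding, plus a search that checks whether a constant is encodable; the function names are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

/* Rotate right; r is reduced modulo 32 and r == 0 returns x unchanged. */
static uint32_t ror32(uint32_t x, unsigned r)
{
    r &= 31u;
    return r == 0 ? x : (x >> r) | (x << (32u - r));
}

/* Operand 2 with bit 25 = 1: 8-bit immediate rotated right by 2 * #rot. */
static uint32_t decode_imm(unsigned rot, unsigned imm8)
{
    return ror32(imm8 & 0xFFu, 2u * (rot & 0xFu));
}

/* Try all 16 rotations; returns false if the constant cannot be
 * expressed as an ARM data-processing immediate. */
static bool encode_imm(uint32_t value, unsigned *rot, unsigned *imm8)
{
    for (unsigned r = 0; r < 16; ++r) {
        uint32_t v = ror32(value, 32u - 2u * r);   /* rotate left by 2r */
        if (v <= 0xFFu) {
            *rot = r;
            *imm8 = v;
            return true;
        }
    }
    return false;
}
```

For example, 0x3FC is encodable (imm8 = 0xFF, #rot = 15), but 0x101 is not, because its set bits span nine bit positions.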


The ARM instruction set
Data processing instructions:
• Consist of :
– Arithmetic: ADD ADC SUB SBC RSB RSC
– Logical: AND ORR EOR BIC
– Comparisons: CMP CMN TST TEQ
– Data movement: MOV MVN
• These instructions only work on registers, NOT memory.
• Syntax:
<Operation>{<cond>}{S} Rd, Rn, Operand2
• Comparisons set flags only - they do not specify Rd
• Data movement does not specify Rn
• Second operand is sent to the ALU via barrel shifter.
The ARM instruction set
Data processing instructions:
• The arithmetic/logic instructions share a common
instruction format.
• These perform an arithmetic or logical operation on
up to two source operands, and write the result to a
destination register.
• They can also optionally update the condition code
flags, based on the result.
• Of the two source operands:
– one is always a register
– the other has two basic forms:
• an immediate value
• a register value, optionally shifted.
The ARM instruction set
Data processing instructions:
• If the operand is a shifted register, the shift
amount can be
– an immediate value or
– the register value.
• Five types of shift can be specified.
– LSL/ASL, LSR, ASR, ROR, RRX
• Every arithmetic/logic instruction can therefore
perform an arithmetic/logic operation and a
shift operation.
• ARM does not have dedicated shift instructions.
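The five shift types can be modelled as pure functions on 32-bit values. A minimal C sketch with hypothetical names; shift amounts are limited to 0-31, and encoding details such as LSR #32 being written as #0, or RRX taking the place of ROR #0, are ignored:

```c
#include <stdint.h>
#include <assert.h>

static uint32_t lsl(uint32_t x, unsigned n) { assert(n < 32); return x << n; }
static uint32_t lsr(uint32_t x, unsigned n) { assert(n < 32); return x >> n; }

/* Arithmetic shift right: copy the sign bit into the vacated positions
 * without relying on implementation-defined signed right shift. */
static uint32_t asr(uint32_t x, unsigned n)
{
    assert(n < 32);
    return (x & 0x80000000u) ? ~(~x >> n) : x >> n;
}

static uint32_t ror(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n == 0 ? x : (x >> n) | (x << (32u - n));
}

/* Rotate right by one bit through the carry flag (a 33-bit rotate). */
static uint32_t rrx(uint32_t x, unsigned carry_in)
{
    return (carry_in ? 0x80000000u : 0u) | (x >> 1);
}
```

With these helpers, the second operand of ADD r0, r2, r1, LSL #2 is lsl(r1, 2), so r0 = r2 + lsl(r1, 2).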
The ARM instruction set
Data processing instructions:
• Arithmetic operations.
– ADD, ADC : add (w. carry)
– SUB, SBC : subtract (w. carry)
– RSB, RSC : reverse subtract (w. carry)
– MUL, MLA : multiply (and accumulate)

Instruction Sets-169
The ARM instruction set
Data processing instructions:
• Arithmetic operations examples.
ADD r0, r1, r2 ;r0:= r1 + r2
ADC r0, r1, r2 ;r0:= r1 + r2 +C
SUB r0, r1, r2 ;r0:= r1 - r2
SBC r0, r1, r2 ;r0:= r1 - r2 + C - 1
RSB r0, r1, r2 ;r0:= r2 - r1
RSC r0, r1, r2 ;r0:= r2 - r1 + C - 1
• Some other Examples
– SUBGT r3, r3, #1
– RSBLES r4, r5, #5
– ADD r0, r2, r1, LSL #2
– RSB r4, r3, r2, LSL #3
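The carry-using variants are easiest to understand through the 64-bit subtraction idiom SUBS followed by SBC, where ARM's C flag means "no borrow". A minimal C sketch with hypothetical function names; unsigned C arithmetic reproduces the modulo 2^32 behaviour of the registers:

```c
#include <stdint.h>

/* c is the carry flag (0 or 1). */
static uint32_t adc(uint32_t rn, uint32_t op2, uint32_t c) { return rn + op2 + c; }
static uint32_t sbc(uint32_t rn, uint32_t op2, uint32_t c) { return rn - op2 + c - 1u; }
static uint32_t rsb(uint32_t rn, uint32_t op2)             { return op2 - rn; }
static uint32_t rsc(uint32_t rn, uint32_t op2, uint32_t c) { return op2 - rn + c - 1u; }

/* 64-bit a - b using SUBS on the low words and SBC on the high words. */
static uint64_t sub64(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a - (uint32_t)b;                          /* SUBS */
    uint32_t c  = (uint32_t)a >= (uint32_t)b;                         /* C = NOT borrow */
    uint32_t hi = sbc((uint32_t)(a >> 32), (uint32_t)(b >> 32), c);   /* SBC */
    return ((uint64_t)hi << 32) | lo;
}
```

RSB r4, r3, r2, LSL #3 corresponds to rsb(r3, r2 << 3), that is r4 = (r2 << 3) - r3.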
The ARM instruction set
Data processing instructions:
• Bit-wise logical operations.
– Perform the specified Boolean logic operation on each bit
pair of the input operands, so in the first case r0[i]:= r1[i]
AND r2[i] for each value of i from 0 to 31 inclusive, where
r0[i] is the ith bit of r0.
• AND, OR, XOR (here called EOR) logical operations
and BIC (which stands for 'bit clear').

The ARM instruction set
Data processing instructions:
• Bit-wise logical operations examples.

• bit clear (BIC): R2 is a mask identifying which bits of R1 will be cleared
to zero
• let us consider R1=0x11111111 R2=0x01100101
BIC R0, R1, R2
result in R0=0x10011010
• Examples:
– AND r0, r1, r2
– BICEQ r2, r3, #7
– EORS r1,r3,r0
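The logical operations, including BIC, map one-to-one onto C bitwise operators. A minimal sketch (function names hypothetical) that reproduces the BIC example above:

```c
#include <stdint.h>

static uint32_t and_op(uint32_t rn, uint32_t op2) { return rn & op2; }
static uint32_t orr_op(uint32_t rn, uint32_t op2) { return rn | op2; }
static uint32_t eor_op(uint32_t rn, uint32_t op2) { return rn ^ op2; }

/* BIC: every bit set in the mask is cleared in rn (rn AND NOT mask). */
static uint32_t bic_op(uint32_t rn, uint32_t mask) { return rn & ~mask; }
```

BICEQ r2, r3, #7, for instance, clears the low three bits of r3 when the Z flag is set.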
The ARM instruction set
Data processing instructions:
• Comparison operations.
– These instructions do not produce a result but just set the
condition code bits (N, Z, C and V) in the CPSR according
to the selected operation.

The ARM instruction set
Data processing instructions:
• Comparison operations examples.
PRE cpsr = nzcvqiFt_USER
r0 = 4 r9 = 4
CMP r0, r9
POST cpsr = nZCvqiFt_USER
• You can see that both registers, r0 and r9, are equal before
executing the instruction.
• Prior to execution
– The value of the z flag is 0 and is represented by a lowercase z.
• After execution
– the z flag changes to 1, shown as an uppercase Z.
• This change indicates equality. The C flag also becomes 1 (uppercase C),
because ARM sets C when a subtraction produces no borrow.
• The CMP is effectively a subtract instruction with the result
discarded.
The ARM instruction set
Data processing instructions:
• Comparison operations examples.
• compare
– CMP R1, R2 @ set cc on R1-R2
• compare negated
– CMN R1, R2 @ set cc on R1+R2
• bit test
– TST R1, R2 @ set cc on R1 and R2
• test equal
– TEQ R1, R2 @ set cc on R1 xor R2
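The four comparisons can be sketched as functions returning the flags they would set. The names are hypothetical. CMP and CMN follow the subtract/add flag rules (ARM sets C on a subtraction with no borrow, so CMP of two equal values sets both Z and C), while for TST and TEQ only N and Z are modelled, because on real hardware C comes from the barrel shifter and V is left unchanged:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } Flags;

/* CMP: flags of rn - op2, result discarded. */
static Flags cmp(uint32_t rn, uint32_t op2)
{
    uint32_t r = rn - op2;
    Flags f = { r >> 31, r == 0, rn >= op2, ((rn ^ op2) & (rn ^ r)) >> 31 };
    return f;
}

/* CMN: flags of rn + op2, result discarded. */
static Flags cmn(uint32_t rn, uint32_t op2)
{
    uint32_t r = rn + op2;
    Flags f = { r >> 31, r == 0, r < rn, (~(rn ^ op2) & (rn ^ r)) >> 31 };
    return f;
}

/* TST / TEQ: N and Z of rn AND op2 / rn EOR op2 (C and V not modelled). */
static Flags tst(uint32_t rn, uint32_t op2)
{
    uint32_t r = rn & op2;
    Flags f = { r >> 31, r == 0, false, false };
    return f;
}

static Flags teq(uint32_t rn, uint32_t op2)
{
    uint32_t r = rn ^ op2;
    Flags f = { r >> 31, r == 0, false, false };
    return f;
}
```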

The ARM instruction set
Data processing instructions:
• Multiplication operations.
– The multiply instructions multiply the contents of a pair
of registers and, depending upon the instruction,
accumulate the result with the contents of another register.
– The long multiplies accumulate onto a pair of registers
representing a 64-bit value. The final result is placed in a
destination register or a pair of registers.

The ARM instruction set
Data processing instructions:
• Multiplication operations.
• Multiply:
MUL R0, R1, R2 ; R0 = (R1xR2)[31:0]
• Multiply-accumulate:
MLA r4, r3, r2, r1 ; r4 := (r3 x r2 + r1)[31:0]

• Multiplying two 32-bit integers gives a 64-bit result, the least significant
32 bits of which are placed in the result register and the rest are ignored.
• This can be viewed as multiplication in modulo arithmetic and gives the
correct result whether the operands are viewed as signed or unsigned
integers.
• Operand restrictions
– Immediate second operands are not supported.
– The destination register Rd must not be the same as the first source register Rm.
– R15 must not be used as an operand or as the destination register.
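The claim that the low 32 bits are correct for both signed and unsigned operands can be checked with a short Python model. This is an illustration: `to_signed` is a helper introduced here, not an ARM operation.

```python
MASK32 = 0xFFFFFFFF

def mul(rm, rs):
    """MUL rd, rm, rs -> rd := (rm * rs)[31:0]."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """MLA rd, rm, rs, rn -> rd := (rm * rs + rn)[31:0]."""
    return (rm * rs + rn) & MASK32

def to_signed(x):
    """Read a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

minus_three = -3 & MASK32                  # 0xFFFFFFFD, or 4294967293 unsigned
product = mul(minus_three, 7)
print(hex(product), to_signed(product))    # 0xffffffeb -21
```

The same bit pattern 0xFFFFFFEB is the low word of 4294967293 × 7 (unsigned) and of -3 × 7 (signed), which is why a single MUL serves both interpretations.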
The ARM instruction set
Data processing instructions:
• Register movement operations.
– Move is the simplest ARM instruction.
– It copies N into a destination register Rd, where N is a
register or immediate value.
– This instruction is useful for setting initial values and
transferring data between registers.

The ARM instruction set
Data processing instructions:
• Register movement operations.
PRE r5 = 5 r7 = 8
MOV r7, r5 ;r7 = r5
POST r5 = 5 r7 = 5
• This example shows a simple move instruction.
• The MOV instruction takes the contents of
register r5 and copies them into register r7,
• in this case, taking the value 5, and overwriting
the value 8 in register r7.

The ARM instruction set
Data processing instructions:
• Register movement operations.
– MVN r0, r2 ;r0= not r2
• The 'MVN' mnemonic stands for 'move negated';
• it leaves the result register set to the value
obtained by inverting every bit in the source
operand.
• Examples:
– MOVS r2, #10
– MVNEQ r1,#0
• Use MVN to:
– form a bit mask
– take the ones complement of a value.
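The minimal Python model of MVN on 32-bit values below shows the two uses listed above. The function name is illustrative.

```python
MASK32 = 0xFFFFFFFF

def mvn(op2):
    """MVN rd, op2 -> rd := NOT op2, every bit inverted."""
    return ~op2 & MASK32

all_ones = mvn(0)            # MVN r1, #0 gives a mask of all ones
clear_low_byte = mvn(0xFF)   # mask for clearing bits 7..0 with AND
print(hex(all_ones), hex(clear_low_byte))  # 0xffffffff 0xffffff00
```

In two's complement, NOT x equals -x - 1. For example, `mvn(5)` is 0xFFFFFFFA, the bit pattern of -6.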
Data operation varieties
• Logical shift:
– fills with zeroes
• Arithmetic shift:
– fills with sign bit on shift right
• RRX performs a 33-bit rotate right, with the C bit
from the CPSR sitting above the sign bit.

ARM shift operations
• The available shift operations are:
– LSL: logical shift left by 0 to 31 places; fill the
vacated bits at the least significant end of the
word with zeros.
– LSR: logical shift right by 0 to 31 places; fill the
vacated bits at the most significant end of the
word with zeros.
– ASL: arithmetic shift left; this is a synonym for LSL.
– ASR: arithmetic shift right by 0 to 31 places;
• fill the vacated bits at the MSB end of the word with
zeros if the source operand was positive, or with ones if
the source operand was negative.
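– ROR: rotate right by 0 to 32 places; the bits that fall off the least
significant end re-enter at the most significant end.
– RRX: rotate right extended by 1 place; the old C flag fills bit 31 and,
if the instruction sets the flags, bit 0 moves into C.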
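The shift and rotate operations can be modelled on 32-bit values as shown below. This sketch assumes shift amounts are already in range (0 to 31 for the shifts). It returns only the shifted value, except for RRX, which also returns the bit shifted out so that the 33-bit rotation through C is visible. The function names mirror the mnemonics but are only illustrative.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """LSL: shift left, zeros enter at the least significant end."""
    return (x << n) & MASK32

def lsr(x, n):
    """LSR: shift right, zeros enter at the most significant end."""
    return (x & MASK32) >> n

def asr(x, n):
    """ASR: shift right, copies of the sign bit enter at the top."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32            # treat as a negative number
    return (x >> n) & MASK32    # Python's >> is arithmetic on negative ints

def ror(x, n):
    """ROR: bits leaving bit 0 re-enter at bit 31."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry_in):
    """RRX: 33-bit rotate right by one through C; returns (result, carry_out)."""
    x &= MASK32
    return (carry_in << 31) | (x >> 1), x & 1

print(hex(asr(0x80000000, 4)), hex(lsr(0x80000000, 4)))  # 0xf8000000 0x8000000
```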
Data transfer instructions
• Data transfer instructions move data between ARM
registers and memory.
• There are three basic forms of data transfer instruction in
the ARM instruction set:
– Single register load and store instructions.
• These instructions provide the most flexible way to transfer single
data items between an ARM register and memory.
• The data item may be a byte, a 32-bit word, or a 16-bit half-word.
– Multiple register load and store instructions.
• These instructions are less flexible than single register transfer
instructions, but enable large quantities of data to be transferred
more efficiently.
• They are used for procedure entry and exit, to save and restore
workspace registers, and to copy blocks of data around memory.
– Single register swap instructions.
• These instructions allow a value in a register to be exchanged with a
value in memory, effectively doing both a load and a store operation
in one instruction.
ARM load/store instructions
• The ARM is a Load / Store Architecture:
– Does not support memory to memory data processing
operations.
– Must move data values into registers before using them.
• This might sound inefficient, but in practice it isn’t:
– Load data values from memory into registers.
– Process data in registers using a number of data processing
instructions which are not slowed down by memory access.
– Store results from registers out to memory.
• The ARM has three sets of instructions which interact with
main memory. These are:
– Single register data transfer (LDR / STR).
– Block data transfer (LDM/STM).
– Single Data Swap (SWP).
ARM load/store instructions
• LDR, LDRH, LDRB : load (half-word, byte)
• STR, STRH, STRB : store (half-word, byte)
• Addressing modes:
– register indirect : LDR r0,[r1]
– with second register : LDR r0,[r1,-r2]
– with constant : LDR r0,[r1,#4]

Single register data transfer
• The basic load and store instructions are:
– Load and Store Word or Byte
• LDR / STR / LDRB / STRB
• ARM Architecture Version 4 also adds support for halfwords
and signed data.
– Load and Store Halfword
• LDRH / STRH
– Load Signed Byte or Halfword - load value and sign extend it to 32
bits.
• LDRSB / LDRSH
• All of these instructions can be conditionally executed by
inserting the appropriate condition code after STR / LDR.
– e.g. LDREQB
• Syntax:
– <LDR|STR>{<cond>}{<size>} Rd, <address>
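The difference between the zero-extending and sign-extending loads can be sketched in Python. The sketch takes the byte or halfword as already read from memory (the memory access itself is not modelled), and the function names simply mirror the mnemonics.

```python
def ldrb(byte):
    """LDRB: zero-extend an 8-bit value to 32 bits."""
    return byte & 0xFF

def ldrsb(byte):
    """LDRSB: copy bit 7 into bits 31..8."""
    byte &= 0xFF
    return byte | 0xFFFFFF00 if byte & 0x80 else byte

def ldrh(half):
    """LDRH: zero-extend a 16-bit value to 32 bits."""
    return half & 0xFFFF

def ldrsh(half):
    """LDRSH: copy bit 15 into bits 31..16."""
    half &= 0xFFFF
    return half | 0xFFFF0000 if half & 0x8000 else half

print(hex(ldrb(0x80)), hex(ldrsb(0x80)))  # 0x80 0xffffff80
```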
Addressing Modes
• Immediate Addressing
– The desired value is a binary value in the instruction
• Register Addressing
– The desired value is held in a register named in the instruction
• Indirect addressing
– The instruction contains the binary address of a memory
location containing the binary address
• Base relative addressing
– Plus offset
– Plus index
– Plus scaled index
• Stack addressing
Memory Addressing Modes
• Pre-indexed mode
– The effective address of the operand is the sum of the
contents of the base register Rn and an offset value
• Pre-indexed with writeback mode
– The effective address of the operand is generated in
the same way as in the Pre-indexed mode, and then
the effective address is written back into Rn
• Post-indexed mode
– The effective address of the operand is the contents
of Rn. The offset is then added to this address and the
result is written back into Rn.
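The three modes differ only in which address is used for the transfer and what is left in the base register afterwards. The Python sketch below models this. The mode strings 'pre', 'pre!' and 'post' are labels invented for the example.

```python
MASK32 = 0xFFFFFFFF

def transfer_address(base, offset, mode):
    """Return (address used for the transfer, value left in the base register).

    mode: 'pre'  -> [Rn, offset]      (base unchanged)
          'pre!' -> [Rn, offset]!     (base updated before the transfer)
          'post' -> [Rn], offset      (base updated after the transfer)
    A negative offset models a '-' offset.
    """
    updated = (base + offset) & MASK32
    if mode == "pre":
        return updated, base
    if mode == "pre!":
        return updated, updated
    if mode == "post":
        return base, updated
    raise ValueError(f"unknown addressing mode: {mode}")

# r1 = 0x200 with an offset of 12
for mode in ("pre", "pre!", "post"):
    print(mode, [hex(v) for v in transfer_address(0x200, 12, mode)])
```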
Register-indirect addressing
• The memory location to be accessed is held in a base register
– STR r0, [r1] ; Store contents of r0 to location pointed to
; by contents of r1.
– LDR r2, [r1] ; Load r2 with contents of memory location
; pointed to by contents of r1.

[Figure: base register r1 = 0x200. STR r0, [r1] copies the source register r0 (0x5) to memory location 0x200; LDR r2, [r1] loads 0x5 from location 0x200 into the destination register r2.]
Base plus offset addressing
• As well as accessing the actual location contained in the base
register, these instructions can access a location offset from
the base register pointer.
• This offset can be
– An unsigned 12-bit immediate value (i.e. 0 to 4095 bytes).
– A register, optionally shifted by an immediate value
• This can be either added or subtracted from the base
register:
– Prefix the offset value or register with ‘+’ (default) or ‘-’.
• This offset can be applied:
– before the transfer is made: Pre-indexed addressing
• optionally auto-incrementing the base register, by postfixing the
instruction with an ‘!’.
– after the transfer is made: Post-indexed addressing
• causing the base register to be auto-incremented.
Pre-indexed Addressing
• Example: STR r0, [r1,#12]
[Figure: with base register r1 = 0x200 and an offset of 12, the source register r0 (0x5) is stored to memory location 0x20c.]
• To store to location 0x1f4 instead use: STR r0, [r1,#-12]
• To auto-increment the base pointer to 0x20c use: STR r0, [r1, #12]!
• If r2 contains 3, access 0x20c by multiplying this by 4:
– STR r0, [r1, r2, LSL #2] ;address = r1 + r2*4 (r2 is unchanged)
Post-indexed Addressing
• Example: STR r0, [r1], #12
[Figure: r0 (0x5) is stored to the original base address 0x200; r1 is then updated to 0x200 + 12 = 0x20c.]
• To auto-decrement the base register to location 0x1f4 instead use:
– STR r0, [r1], #-12
• If r2 contains 3, auto-increment the base register to 0x20c by
multiplying this by 4:
– STR r0, [r1], r2, LSL #2
Block Data Transfer (1)
• The Load and Store Multiple instructions (LDM / STM) allow between 1 and
16 registers to be transferred to or from memory.
• The transferred registers can be either:
– Any subset of the current bank of registers (default).
– Any subset of the user mode bank of registers when in a privileged mode
(postfix instruction with a ‘^’).

• Instruction encoding:
– Bits 31-28: Cond, the condition field
– Bits 27-25: 100
– Bit 24: P, pre/post indexing bit (0 = post: add offset after transfer; 1 = pre: add offset before transfer)
– Bit 23: U, up/down bit (0 = down: subtract offset from base; 1 = up: add offset to base)
– Bit 22: S, PSR and force user bit (0 = don’t load PSR or force user mode; 1 = load PSR or force user mode)
– Bit 21: W, write-back bit (0 = no write-back; 1 = write address into base)
– Bit 20: L, load/store bit (0 = store to memory; 1 = load from memory)
– Bits 19-16: Rn, the base register
– Bits 15-0: register list. Each bit corresponds to a particular register; for example, bit 0 set causes r0 to be transferred, and bit 0 unset causes r0 not to be transferred. At least one register must be transferred, as the list cannot be empty.
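The field layout can be checked by decoding real instruction words. In this Python sketch, the field names follow the list above and the function name is hypothetical. The word 0xE92D4010 is the encoding of STMFD sp!, {r4, lr}, and 0xE8BD8010 is that of LDMFD sp!, {r4, pc}.

```python
def decode_block_transfer(word):
    """Split a 32-bit LDM/STM instruction word into its fields."""
    if ((word >> 25) & 0b111) != 0b100:
        raise ValueError("bits 27-25 must be 100 for LDM/STM")
    return {
        "cond": word >> 28,            # condition field
        "P": (word >> 24) & 1,         # 1 = pre: offset applied before transfer
        "U": (word >> 23) & 1,         # 1 = up: add offset to base
        "S": (word >> 22) & 1,         # 1 = load PSR or force user mode
        "W": (word >> 21) & 1,         # 1 = write address back into base
        "L": (word >> 20) & 1,         # 1 = load, 0 = store
        "Rn": (word >> 16) & 0xF,      # base register number
        "regs": [r for r in range(16) if (word >> r) & 1],  # register list
    }

print(decode_block_transfer(0xE92D4010)["regs"])  # [4, 14]
```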
Block Data Transfer (2)
• Base register used to determine where memory access
should occur.
– 4 different addressing modes allow increment and decrement
inclusive or exclusive of the base register location.
– The base register can optionally be updated following the transfer
(by appending an ‘!’).
– The lowest-numbered register is always transferred to/from the lowest
memory location accessed.
• These instructions are very efficient for
– Saving and restoring context
• For this, it is useful to view memory as a stack.
– Moving large blocks of data around memory
• For this, it is useful to name the instruction's exact addressing behaviour directly.
Block Data Transfer (3)
• When LDM / STM are not being used to implement
stacks, it is clearer to specify exactly what
functionality of the instruction is:
– i.e. specify whether to increment / decrement the base
pointer, before or after the memory access.
• In order to do this, LDM / STM support a further
syntax in addition to the stack one:
– STMIA / LDMIA : Increment After
– STMIB / LDMIB : Increment Before
– STMDA / LDMDA : Decrement After
– STMDB / LDMDB : Decrement Before
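The four variants can be summarised by the addresses they touch. The Python sketch below assumes word (4-byte) transfers and also returns the value the base register would hold if ‘!’ were used. The names are illustrative.

```python
def block_addresses(base, regs, mode):
    """Addresses used by LDM/STM<mode> base, {regs}, one word per register.

    Returns (mapping, new_base): mapping sends each register number to its
    address (lowest register -> lowest address); new_base is what '!' would
    write back.
    """
    size = 4 * len(regs)
    if mode == "IA":        # first word at base
        start = base
    elif mode == "IB":      # first word at base + 4
        start = base + 4
    elif mode == "DA":      # last word at base
        start = base - size + 4
    elif mode == "DB":      # last word at base - 4
        start = base - size
    else:
        raise ValueError(f"unknown mode: {mode}")
    mapping = {r: start + 4 * i for i, r in enumerate(sorted(regs))}
    new_base = base + size if mode in ("IA", "IB") else base - size
    return mapping, new_base

for mode in ("IA", "IB", "DA", "DB"):
    mapping, new_base = block_addresses(0x1000, [1, 4], mode)
    print(mode, {r: hex(a) for r, a in mapping.items()}, hex(new_base))
```

In the stack-oriented syntax, STMFD is an alias for STMDB and LDMFD is an alias for LDMIA.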
Stack Operations
• The ARM architecture uses the load-store multiple
instructions to carry out stack operations.
• The pop operation (removing data from a stack) uses a
load multiple instruction.
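• The push operation (placing data onto the stack) uses a
store multiple instruction.
• A stack is either ascending (A) or descending (D).
– Ascending stacks grow towards higher memory addresses.
– Descending stacks grow towards lower memory addresses.
• A stack is also either full (F), when the stack pointer points to the last
item pushed, or empty (E), when it points to the next free location.
• The LDMFD and STMFD instructions provide the pop
and push functions, respectively, for a full descending stack.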
Stack Operations
• Example:
• The STMFD instruction pushes registers onto
the stack, updating the sp.
• PRE r1 = 0x00000002
• r4 = 0x00000003
• sp = 0x00080014
• STMFD sp!, {r1,r4}
• POST mem32[0x00080010] = 0x00000003
• mem32[0x0008000c] = 0x00000002
• sp = 0x0008000c
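The push in this example, and the matching pop, can be modelled with a dictionary standing in for memory. This is an illustrative sketch of a full descending stack only; `stmfd` and `ldmfd` here are hypothetical Python helpers, not the instructions themselves.

```python
def stmfd(memory, sp, values):
    """Push values (in ascending register order) onto a full descending stack.
    Returns the updated stack pointer, as STMFD sp!, {...} would leave it."""
    sp -= 4 * len(values)                  # the stack grows downwards
    for i, value in enumerate(values):     # lowest register -> lowest address
        memory[sp + 4 * i] = value
    return sp

def ldmfd(memory, sp, count):
    """Pop count words; returns (values in ascending register order, new sp)."""
    values = [memory[sp + 4 * i] for i in range(count)]
    return values, sp + 4 * count

memory = {}
sp = stmfd(memory, 0x00080014, [0x2, 0x3])    # r1 = 2, r4 = 3
print(hex(sp), {hex(a): v for a, v in memory.items()})
```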
Swap and Swap Byte Instructions
• The swap instruction is a special case of a load-store instruction.
• It swaps the contents of memory with the contents of a register.
• This instruction is an atomic operation.
– it reads and writes a location in the same bus operation, preventing any
other instruction from reading or writing to that location until it
completes.

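• The syntax is SWP{<cond>}{B} Rd, Rm, [Rn]: the word (or byte, with B) at
the address in Rn is loaded into Rd, and Rm is stored to that address.
• Thus, to exchange a register's contents with memory, make Rd = Rm.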


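[Figure: SWP data flow: (1) the memory word addressed by Rn is read into a temporary register, (2) Rm is written to that memory location, (3) the temporary value is written to Rd.]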
Swap and Swap Byte Instructions
• Example
• The swap instruction loads a word from memory into
register r0 and overwrites the memory with register r1.
• PRE mem32[0x9000] = 0x12345678
• r0 = 0x00000000
• r1 = 0x11112222
• r2 = 0x00009000
• SWP r0, r1, [r2]
• POST mem32[0x9000] = 0x11112222
• r0 = 0x12345678
• r1 = 0x11112222
• r2 = 0x00009000
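The sequential Python model of SWP below reproduces the PRE/POST values above. Real hardware performs the read and the write as one atomic bus operation, which plain Python code cannot express. The function name is illustrative.

```python
def swp(memory, rm, rn):
    """SWP rd, rm, [rn]: store rm at address rn and return the old contents,
    which the instruction places in rd."""
    temp = memory[rn]   # (1) read the memory word into a temporary
    memory[rn] = rm     # (2) write rm to the same location
    return temp         # (3) the temporary becomes rd

memory = {0x9000: 0x12345678}
r0 = swp(memory, rm=0x11112222, rn=0x9000)
print(hex(r0), hex(memory[0x9000]))  # 0x12345678 0x11112222
```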
Control Flow Instructions
• This category of instructions neither processes
data nor moves it around; it simply determines
which instructions get executed next.
– Branch instructions
– Conditional branches
– Conditional execution
– Branch and link instructions
– Subroutine return instructions
– Supervisor calls
– Jump tables

Branch Instructions
• Change the sequential flow of execution by modifying the program counter.
– Branch : B{<cond>} label
– Branch with Link : BL{<cond>} sub_routine_label
• Encoding: bits 31-28 Cond (condition field); bits 27-25 101; bit 24 L, the
link bit (0 = Branch, 1 = Branch with link); bits 23-0 Offset.
• Branch (B)
– jumps in a range of +/-32 MB.
• Branch with link (BL)
– suitable for subroutine calls: it stores the address of the instruction
after the BL in the link register (lr), and the subroutine returns by
restoring the program counter (pc) from the link register.
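The 24-bit offset field is a signed word offset. The Python sketch below shows how the target is formed in ARM state, assuming the usual rule that the PC reads as the address of the branch plus 8. This is where the ±32 MB range comes from: 2^23 words × 4 bytes. The function names are hypothetical; 0xEAFFFFFE is an always-executed B that branches to itself.

```python
def branch_target(branch_address, instruction):
    """Destination of a B/BL instruction located at branch_address (ARM state)."""
    offset = instruction & 0x00FFFFFF        # 24-bit signed word offset
    if offset & 0x00800000:                  # sign-extend to a Python int
        offset -= 1 << 24
    # PC reads as branch_address + 8; the offset counts 4-byte words
    return (branch_address + 8 + (offset << 2)) & 0xFFFFFFFF

def is_link(instruction):
    """Bit 24 is the link bit: 1 for BL, 0 for B."""
    return (instruction >> 24) & 1 == 1

print(hex(branch_target(0x8000, 0xEAFFFFFE)))  # 0x8000 (a branch to itself)
```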
Branch Instructions
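• Branch Exchange and Branch Exchange with Link switch the processor
state from Thumb to ARM and vice versa.
• Branch Exchange: BX{<cond>} Rm
• Branch Exchange with Link: BLX label, or BLX{<cond>} Rm
• BX Rm branches to the address in Rm and uses bit 0 of Rm to select
the new state (1 = Thumb, 0 = ARM).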
ARM Branches and Subroutines
• B <label>
– PC relative. ±32 Mbyte range.
• BL <subroutine>
– Stores return address in LR
– Returning implemented by restoring the PC from LR
– For non-leaf functions, LR will have to be stacked

        :
        BL func1
        :

func1   STMFD sp!,{regs,lr}
        :
        BL func2
        :
        LDMFD sp!,{regs,pc}

func2   :
        :
        MOV pc, lr
Branch and Link Instructions
• Perform a branch, save the address following the branch in
the link register, r14
BL SUBR ;branch to SUBR
… ;return here
SUBR … ;subroutine entry point
MOV PC,r14 ;return
• For nested subroutines, push r14 and any work registers that
must be preserved onto a stack in memory
BL SUB1
…
SUB1 STMFD r13!,{r0-r2,r14} ;save work and link regs
BL SUB2
…
LDMFD r13!,{r0-r2,PC} ;restore work regs and return
SUB2 …
MOV PC,r14 ;copy r14 into r15 to return
Branch Instructions
• The most common way to switch program execution from one place
to another is to use the branch instruction:
B LABEL
…
LABEL …
• LABEL may come before or after the branch instruction.
• Example:
B Forward
ADD r1, r2, #4
ADD r0, r6, #2
ADD r3, r7, #4
Forward
SUB r1, r2, #4
Backward
ADD r1, r2, #4
SUB r1, r2, #4
ADD r4, r6, r7
B Backward
Conditional Branches
• The branch has a condition associated with it
and it is only executed if the condition codes
have the correct value – taken or not taken
MOV r0,#0 ;initialize counter
Loop …
ADD r0,r0,#1 ;increment loop counter
CMP r0,#10 ;compare with limit
BNE Loop ;repeat if not equal
… ;else fall through
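Whether BNE, or any conditionally executed instruction, runs depends only on the N, Z, C and V flags. The Python sketch below implements the standard ARM condition tests as a lookup table. The function name is made up for illustration.

```python
def condition_passed(cond, n, z, c, v):
    """True if an instruction with condition suffix `cond` executes,
    given the current N, Z, C, V flags (each 0 or 1)."""
    tests = {
        "EQ": z == 1,            "NE": z == 0,
        "CS": c == 1,            "CC": c == 0,
        "MI": n == 1,            "PL": n == 0,
        "VS": v == 1,            "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v,            "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return tests[cond]

# CMP r0, #10 with r0 = 3 leaves N=1, Z=0, C=0, V=0, so BNE Loop is taken
print(condition_passed("NE", 1, 0, 0, 0))  # True
```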
Example: Block Copy
– Copy a block of memory, which is an exact multiple of 12 words
long, from the location pointed to by r12 to the location pointed
to by r13. r14 points to the end of the block to be copied.
; r12 points to the start of the source data
; r14 points to the end of the source data
; r13 points to the start of the destination data
loop LDMIA r12!, {r0-r11} ; load 48 bytes
     STMIA r13!, {r0-r11} ; and store them
     CMP r12, r14         ; check for the end
     BNE loop             ; and loop until done
[Figure: memory map with addresses increasing upwards, showing r12 at the start and r14 at the end of the source block, and r13 at the destination.]
– This loop transfers 48 bytes in 31 cycles
– Over 50 Mbytes/sec at 33 MHz
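Below is a Python model of the loop, with a dictionary standing in for memory, plus the arithmetic behind the bandwidth figure (48 bytes per 31 cycles at 33 MHz). Like the slide, it assumes the block length is an exact multiple of 48 bytes. `block_copy` is a hypothetical helper.

```python
def block_copy(memory, src, src_end, dst):
    """Copy 12 words (48 bytes) per iteration from src up to src_end into dst.

    Mirrors: loop LDMIA r12!, {r0-r11} / STMIA r13!, {r0-r11} /
             CMP r12, r14 / BNE loop
    Returns the final (r12, r13) values.
    """
    while True:
        chunk = [memory[src + 4 * i] for i in range(12)]  # LDMIA r12!, {r0-r11}
        src += 48
        for i, word in enumerate(chunk):                  # STMIA r13!, {r0-r11}
            memory[dst + 4 * i] = word
        dst += 48
        if src == src_end:                                # CMP r12, r14; BNE loop
            return src, dst

# Bandwidth: 48 bytes every 31 cycles at 33 MHz
bytes_per_second = 48 * 33_000_000 / 31
print(round(bytes_per_second / 1e6, 1))  # 51.1
```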


ARM Registers
• ARM has 31 general-purpose 32-bit registers. At any one
time, 16 of these registers are visible.
• The other registers are used to speed up exception
processing. All the register specifiers in ARM instructions
can address any of the 16 visible registers.
• The main bank of 16 registers is used by all unprivileged
code. These are the User mode registers. User mode is
different from all other modes as it is unprivileged, which
means:
– User mode can only switch to another processor mode by
generating an exception. The SWI instruction provides this
facility from program control.
– Memory systems and coprocessors might allow User mode less
access to memory and coprocessor functionality than a
privileged mode.
Registers
• General-purpose registers hold either data or an address.
They are identified with the letter r prefixed to the register
number. For example, register 4 is given the label r4.
• Figure 2.2 shows the active registers available in user
mode, a protected mode normally used when executing
applications. The processor can operate in seven different
modes, which we will introduce shortly. All the registers
shown are 32 bits in size.
• There are up to 18 active registers: 16 data registers and 2
processor status registers. The data registers are visible to
the programmer as r0 to r15.
• The ARM processor has three registers assigned to a
particular task or special function: r13, r14, and r15. They
are frequently given different labels to differentiate them
from the other registers.
• The shaded registers identify the assigned special-purpose
registers:
– Register r13 is traditionally used as the stack pointer (sp) and
stores the head of the stack in the current processor mode.
– Register r14 is called the link register (lr) and is where the core
puts the return address whenever it calls a subroutine.
– Register r15 is the program counter (pc) and contains the
address of the next instruction to be fetched by the processor.
• In addition to the 16 data registers, there are two program
status registers: cpsr and spsr (the current and saved
program status registers, respectively).
Current Program Status Register
• The ARM core uses the cpsr to monitor and control
internal operations.
• The cpsr is a dedicated 32-bit register and resides in
the register file.
• The cpsr is divided into four fields, each 8 bits wide:
flags, status, extension, and control.
• In current designs the extension and status fields are
reserved for future use.
• The control field contains the processor mode, state,
and interrupt mask bits.
• The flags field contains the condition flags.
• The format of the CPSR and the SPSRs is
shown below.

• https://www.slideshare.net/MathivananNatarajan/arm-instruction-set-60665439
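Notation such as nzcvqiFt_USER renders the CPSR one letter per bit: a letter is uppercase when its bit is 1. The Python sketch below produces this notation. It assumes the classic bit positions: N, Z, C, V and Q in bits 31 to 27; the I (IRQ disable), F (FIQ disable) and T (Thumb) bits in bits 7 to 5; and the mode in bits 4 to 0. The mode labels other than USER are chosen here for illustration.

```python
# Mode field (bits 4..0) -> label
MODES = {
    0b10000: "USER", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "SVC",
    0b10111: "ABORT", 0b11011: "UNDEF", 0b11111: "SYSTEM",
}

# (bit position, letter) in the order used by the nzcvqiFt notation
FLAG_BITS = [(31, "n"), (30, "z"), (29, "c"), (28, "v"), (27, "q"),
             (7, "i"), (6, "f"), (5, "t")]

def cpsr_string(cpsr):
    """Render a CPSR value: uppercase letter = bit set, lowercase = clear."""
    letters = "".join(ch.upper() if (cpsr >> bit) & 1 else ch
                      for bit, ch in FLAG_BITS)
    mode = MODES.get(cpsr & 0x1F, hex(cpsr & 0x1F))
    return letters + "_" + mode

print(cpsr_string(0x40000050))  # nZcvqiFt_USER
```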
Processor Modes
• The processor mode determines which registers are active
and the access rights to the cpsr register itself.
• Each processor mode is either privileged or nonprivileged:
• A privileged mode allows full read-write access to the cpsr.
• Conversely, a nonprivileged mode only allows read access
to the control field in the cpsr but still allows read-write
access to the condition flags.
• There are seven processor modes in total: six privileged
modes (abort, fast interrupt request, interrupt request,
supervisor, system, and undefined) and one nonprivileged
mode (user).
• The processor enters abort mode when there is a failed attempt to
access memory.
• Fast interrupt request and interrupt request modes correspond to
the two interrupt levels available on the ARM processor.
• Supervisor mode is the mode that the processor is in after reset and
is generally the mode that an operating system kernel operates in.
• System mode is a special version of user mode that allows full read-
write access to the cpsr.
• Undefined mode is used when the processor encounters an
instruction that is undefined or not supported by the
implementation.
• User mode is used for programs and applications.
Banked Registers
• Figure 2.4 shows all 37 registers in the register file. Of those, 20 registers are
hidden from a program at different times. These registers are called banked
registers and are identified by the shading in the diagram.
• They are available only when the processor is in a particular mode; for example,
abort mode has banked registers r13_abt, r14_abt and spsr_abt.
• Banked registers of a particular mode are denoted by appending an underscore
and the mode mnemonic (_mode) to the register name, as in r13_abt.
• Every processor mode except user mode can change mode by writing directly to
the mode bits of the cpsr. All processor modes except system mode have a set of
associated banked registers that are a subset of the main 16 registers.
• The processor mode can be changed by a program that writes directly to the cpsr
(the processor core has to be in privileged mode) or by hardware when the core
responds to an exception or interrupt.
• The following exceptions and interrupts cause a mode change: reset, interrupt
request, fast interrupt request, software interrupt, data abort, prefetch abort, and
undefined instruction. Exceptions and interrupts suspend the normal execution of
sequential instructions and jump to a specific location.
Exception priorities
• When multiple exceptions arise at the same
time, a fixed priority system determines the
order in which they are handled.
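• The priority order, from highest to lowest, is: reset, data abort, FIQ,
IRQ, prefetch abort, and finally undefined instruction and software
interrupt (SWI), which share the lowest priority because they cannot
occur together.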
Entering an exception
• The ARM7TDMI processor handles an exception as follows:
1. Preserves the address of the next instruction in the appropriate LR.
• When the exception entry is from ARM state, the ARM7TDMI processor
copies the address of the next instruction into the LR, current PC+4 or PC+8
depending on the exception.
• When the exception entry is from Thumb state, the ARM7TDMI processor
writes the value of the PC into the LR, offset by a value, current PC+4 or
PC+8 depending on the exception, that causes the program to resume from
the correct place on return.
• The exception handler does not have to determine the state when entering
an exception. For example, in the case of a SWI, MOVS PC, r14_svc always
returns to the next instruction regardless of whether the SWI was executed
in ARM or Thumb state.
2. Copies the CPSR into the appropriate SPSR.
3. Forces the CPSR mode bits to a value that depends on the exception.
4. Forces the PC to fetch the next instruction from the relevant exception
vector.
• Note
– Exceptions are always entered in ARM state. When the processor is in Thumb
state and an exception occurs, the switch to ARM state takes place
automatically when the exception vector address is loaded into the PC.
Entering an exception
• When an exception occurs, the ARM:
– Preserves the address of the next instruction in the
appropriate LR. When the exception entry is from:
• ARM state, the ARM7TDMI-S copies the address of the next
instruction into the LR (current PC + 4, or PC + 8 depending on
the exception)
• Thumb state, the ARM7TDMI-S writes the value of the PC into
the LR, offset by a value (current PC + 4, or PC + 8 depending
on the exception).
– Copies the CPSR into the appropriate SPSR.
– Forces the CPSR mode bits to a value which depends on
the exception.
– Forces the PC to fetch the next instruction from the
relevant exception vector.
Leaving an exception
• When an exception is completed, the exception
handler must:
1. Move the LR, minus an offset, to the PC. The offset
varies according to the type of exception
2. Copy the SPSR back to the CPSR.
3. Clear the interrupt disable flags that were set on
entry.
• Note
– The action of restoring the CPSR from the SPSR
automatically resets the T bit to whatever value it held
immediately prior to the exception.
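The offset in step 1 depends on what the core saved in the LR. The Python sketch below covers ARM state on an ARM7TDMI-class core. Here `pc` is the address of the SWI or undefined instruction, of the instruction that caused the abort, or, for IRQ/FIQ, of the instruction that was not executed because the interrupt was taken. The tables are a model written for this example.

```python
# Bytes the core adds to `pc` when it writes the banked LR on entry (ARM state).
LR_ON_ENTRY = {
    "SWI": 4, "UNDEF": 4, "PREFETCH_ABORT": 4,
    "DATA_ABORT": 8, "IRQ": 4, "FIQ": 4,
}

# Offset the handler subtracts from LR to get the return address.
RETURN_OFFSET = {
    "SWI": 0,             # MOVS pc, r14_svc      -> instruction after the SWI
    "UNDEF": 0,           # MOVS pc, r14_und      -> instruction after it
    "PREFETCH_ABORT": 4,  # SUBS pc, r14_abt, #4  -> retry the aborted instruction
    "DATA_ABORT": 8,      # SUBS pc, r14_abt, #8  -> retry the load/store
    "IRQ": 4,             # SUBS pc, r14_irq, #4  -> resume the interrupted code
    "FIQ": 4,             # SUBS pc, r14_fiq, #4
}

def exception_return(exception, pc):
    """Return (lr_on_entry, return_address) for an exception at address pc."""
    lr = pc + LR_ON_ENTRY[exception]
    return lr, lr - RETURN_OFFSET[exception]

print([hex(a) for a in exception_return("DATA_ABORT", 0x8000)])  # ['0x8008', '0x8000']
```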
ARM exception vector locations
Address        Exception
0x0000 0000    Reset
0x0000 0004    Undefined Instruction
0x0000 0008    Software Interrupt
0x0000 000C    Prefetch Abort (instruction fetch memory fault)
0x0000 0010    Data Abort (data access memory fault)
0x0000 0014    Reserved (see note)
0x0000 0018    IRQ
0x0000 001C    FIQ
Note: Identified as reserved in ARM documentation, this location is used by the Boot Loader as the Valid User Program key.
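When several exceptions are pending, the core takes the highest-priority one and fetches its next instruction from that exception's vector. The Python sketch below combines the vector table above with the ARM7TDMI priority order: reset, data abort, FIQ, IRQ, prefetch abort, then undefined instruction and SWI, which cannot occur together. The names are illustrative.

```python
VECTORS = {
    "RESET": 0x00, "UNDEF": 0x04, "SWI": 0x08, "PREFETCH_ABORT": 0x0C,
    "DATA_ABORT": 0x10, "IRQ": 0x18, "FIQ": 0x1C,  # 0x14 is reserved
}

# Highest priority first; UNDEF and SWI never occur together.
PRIORITY = ["RESET", "DATA_ABORT", "FIQ", "IRQ", "PREFETCH_ABORT", "UNDEF", "SWI"]

def next_exception(pending):
    """Return (name, vector address) of the highest-priority pending exception,
    or None if nothing is pending."""
    for name in PRIORITY:
        if name in pending:
            return name, VECTORS[name]
    return None

print(next_exception({"IRQ", "FIQ"}))  # ('FIQ', 28)
```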
The ARM instruction set
• Data processing instructions.
• ARM data processing instructions enable the programmer to perform
arithmetic and logical operations on data values in registers.
• The data processing instructions are the only instructions which modify
data values.
• These instructions typically require two operands and produce a single
result, though there are exceptions to both of these rules.
• Here are some rules which apply to ARM data processing instructions:
– All operands are 32 bits wide and come from registers or are specified as
literals in the instruction itself.
– The result, if there is one, is 32 bits wide and is placed in a register. (There is
an exception here: long multiply instructions produce a 64-bit result)
– Each of the operand registers and the result register are independently
specified in the instruction. That is, the ARM uses a '3-address' format for
these instructions.
The ARM instruction set
Data processing instructions:
• ADD, ADC : add (with carry)
• SUB, SBC : subtract (with carry)
• RSB, RSC : reverse subtract (with carry)
• MUL, MLA : multiply (and accumulate)
• AND, ORR, EOR
• BIC : bit clear
• LSL, LSR : logical shift left/right
• ASL, ASR : arithmetic shift left/right
• ROR : rotate right
• RRX : rotate right extended with C

The ARM instruction set format
• Every ARM instruction carries a 4-bit condition field in the top four bits
(bits 31-28) of its 32-bit instruction word.
The ARM instruction set
• Simple register operands
• A typical ARM data processing instruction is written in
assembly language as shown below:
• Basic format:
ADD r0,r1,r2 ; r0 : = r1 + r2
– Computes r1+r2, stores in r0
• Immediate operand:
ADD r0,r1,#2 ; r0 : = r1 + 2
– Computes r1+2, stores in r0
• The semicolon in this line indicates that everything to the right
of it is a comment and should be ignored by the assembler.
• Comments are put into the assembly source code to make
reading and understanding it easier.
• This example simply takes the values in two registers (r1 and
r2), adds them together, and places the result in a third register
(r0).
ARM comparison instructions
• CMP : compare
• CMN : negated compare
• TST : bit-wise test (AND)
• TEQ : bit-wise test equivalence (XOR)
• These instructions set only the NZCV bits of the
CPSR.

ARM move instructions
• MOV, MVN : move (negated)

MOV r0, r1 ; sets r0 to r1

