ARM
What Is ARM?
• Advanced RISC Machine
• First RISC microprocessor
  for commercial use
• Market-leader for low-power
  and cost-sensitive embedded applications
      Why ARM is most popular:
• ARM processors are the most popular choice for
  portable devices because of their low power
  consumption and reasonable performance.
• ARM offers good performance relative to its power
  consumption when compared to other processors.
• ARM processors combine low power consumption
  with low cost.
• ARM makes quick and efficient application
  development easy, which is the main reason
  for its popularity.
      History of ARM Processor
• ARM Processor - 32 bit processor
• RISC (Reduced Instruction Set Computer) concept
  introduced in 1980 at Stanford and Berkeley
• ARM was developed by Acorn Computers Limited
  of Cambridge, England between 1983 & 1985
• ARM Limited founded in 1990
• ARM Cores
  • Licensed to partners to develop and fabricate new
    microcontrollers
  • Soft core
              History of ARM
Historical remarks
• ARM’s parent company is Acorn Computers (UK).
• Acorn Computers started its Acorn RISC Machine
  project in October 1983 (two years after the introduction
  of the IBM PC) to develop its own powerful processor for
  a line of business computers.
• The acronym ARM was coined originally at this time
  (1983) from the designation Acorn RISC Machine.
• In 1990 the company Advanced RISC Machines Ltd. (ARM
  Ltd.) was founded as a joint venture of Acorn Computers,
  Apple Computer and VLSI Technology.
• Accordingly, the interpretation of ARM was also changed
  to “Advanced RISC Machines”.
                 History of ARM
• ARM (ARM Holdings plc) is a British multinational
  semiconductor company with its head office in Cambridge.
• The company designs and licenses low power embedded and
  mobile ARM processors along with the appropriate design tools
  but does not fabricate semiconductors.
• ARM designs have recently come to dominate the embedded and
  mobile markets (including smartphones and tablets).
• As of 2014 more than 50 billion ARM based processors have
  been produced in total, up from 10 billion in 2008 [59], [19], as
  indicated in the next Figure.
ARM's first office, 18th century barn just
         outside of Cambridge.
ARM's headquarters in Cambridge
            (UK)
   ARM Connected Community – 900+
Connect, Collaborate, Create – accelerating innovation
Development of the ARM Architecture
• v4
   – Halfword and signed halfword / byte support
   – System mode
   – Thumb instruction set (v4T)
• v5
   – Improved interworking
   – CLZ
   – Saturated arithmetic
   – DSP MAC instructions
   – Extensions: Jazelle (5TEJ)
• v6
   – SIMD instructions
   – Multi-processing
   – v6 memory architecture
   – Unaligned data support
   – Extensions: Thumb-2 (6T2), TrustZone® (6Z), Multicore (6K), Thumb only (6-M)
• v7
   – Thumb-2
   – Architecture profiles: 7-A (Applications), 7-R (Real-time), 7-M (Microcontroller)
▪ Note that implementations of the same architecture can be different
    ▪ Cortex-A8 - architecture v7-A, with a 13-stage pipeline
    ▪ Cortex-A9 - architecture v7-A, with an 8-stage pipeline
                                     Architecture Revisions
[Figure: architecture version (V4, ARMv5, ARMv6, ARMv7) plotted against time, 1994 to 2006.
 Cores shown: ARM7TDMI-S, StrongARM®, SC100, ARM720T, ARM92xT, ARM9x6E, ARM926EJ-S,
 ARM102xE, XScale™, ARM1026EJ-S, SC200, ARM1136JF-S, ARM1176JZF-S, ARM1156T2F-S]
XScale is a trademark of Intel Corporation
  Features of Different ARM Versions:
• ARM Version 1:
   – Software interrupts
   – 26-bit address bus
   – Slow data processing
   – Supports byte, word and multiword load operations
• ARM Version 2:
   – 26-bit address bus
   – Atomic instructions for thread synchronization
   – Coprocessor support
• ARM Version 3:
   – 32-bit addressing
   – Long multiply support (32 × 32 = 64-bit result)
   – Faster than ARM versions 1 and 2
• ARM Version 4:
   – 32-bit address space
   – Supports the T variant: 16-bit Thumb instruction set
   – Supports the M variant: long multiply, giving a 64-bit result
• ARM Version 5:
   – Improved ARM/Thumb interworking
   – Supports the CLZ (count leading zeros) instruction
   – Supports the E variant: enhanced DSP instruction set
   – Supports the J variant: acceleration of Java bytecode execution
• ARM Version 6:
   – Improved memory system
   – Supports single instruction, multiple data (SIMD) instructions
• ARMv7:
  – Thumb-2: variable-length instruction set
  – TrustZone
     • provides system-wide hardware isolation for trusted
       software.
  – Jazelle DBX (Direct Bytecode eXecution)
     • an extension that allows some ARM processors to
       execute Java bytecode in hardware as a third execution
       state alongside the existing ARM and Thumb modes.
  – Jazelle RCT (Runtime Compilation Target)
     • based on ThumbEE; supports ahead-of-time (AOT) and
       just-in-time (JIT) compilation.
   ARMv7 provides three profiles:
• The Application “A” profile
   – Memory management support (MMU)
   – Highest performance at low power
   – Influenced by multi-tasking OS system requirements
• The Real-time “R” profile
   – Protected memory (MPU)
   – Low latency and predictability ‘real-time’ needs
   – Evolutionary path for traditional embedded business
• The Microcontroller “M” profile
   – Lowest gate count entry point
   – Deterministic behavior a key priority
   – Deeply embedded – strong synergies with the “R” profile
• ARMv8
  – It adds a 64-bit architecture, named "AArch64", and a new
    "A64" instruction set
  – Compatibility with ARMv7-A ISA
  – 64-bit general purpose registers, SP (stack pointer) and PC
    (program counter)
  – The execution states support three key instruction sets:
     • A32 (or ARM): a 32-bit fixed length instruction set. Part of the 32-
       bit architecture execution environment now referred to as
       AArch32.
     • T32 (Thumb) introduced as a 16-bit fixed-length instruction set,
       subsequently enhanced to a mixed-length 16- and 32-bit
       instruction set on the introduction of Thumb-2 technology.
     • A64 is a 64-bit fixed-length instruction set that offers similar
       functionality to the ARM and Thumb instruction sets. Introduced
       with ARMv8-A, it is the AArch64 instruction set.
ARMv7: profiles & key features
        ARM Processor Family
• ARM has devised a naming convention for its
  processors
• Revisions: ARMv1, v2 … v6, v7, v8
• Core implementation:
  – ARM1, ARM2, ARM7, StrongARM,
    ARM926EJ, ARM11, Cortex-A,R,M
• ARM11 is based on ARMv6
• Cortex is based on ARMv7
      ARM Processor Family (2)
• Differences between cores
  – Processor modes
  – Pipeline
  – Architecture
  – Memory protection unit
  – Memory management unit
  – Cache
  – Hardware accelerated Java
  – … and others
      ARM Processor Family (3)
• Examples:
  – ARM7TDMI
     • No MMU, No MPU, No cache, No Java, Thumb mode
  – ARM922T
     • MMU, No MPU, 8K+8K data and instruction cache, No
       Java, Thumb mode
  – ARM1136J-S
     • MMU, No MPU, configurable caches, with accelerated
       Java and Thumb mode
          ARM Processor Family (4)
• Naming convention
• ARM [x][y][z][T][D][M][I][E][J][F][S]
   –   x – Family
   –   y – memory management/protection
   –   z – cache
   –   T – Thumb mode
   –   D – JTAG debugging
   –   M – fast multiplier
   –   I – Embedded ICE macrocell
   –   E – Enhanced instruction (implies TDMI)
   –   J – Jazelle, hardware accelerated Java
   –   F – Floating point unit
   –   S – Synthesizable version
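The naming convention above can be sketched as a small decoder. This is only an illustration: the function name decode_core_name, the rule for splitting the digits into x/y/z (the last two digits are taken as y and z when there are three or more digits), and the returned dictionary layout are assumptions made for this example, not part of any ARM tool. It handles only the suffix letters listed above, so names with other suffixes (for example Z or T2) are rejected.

```python
import re

def decode_core_name(name):
    """Split a classic ARM core name such as 'ARM926EJ-S' into its parts.

    Hypothetical helper: only the letters T, D, M, I, E, J, F and a
    trailing '-S' from the naming convention are recognized.
    """
    match = re.fullmatch(r"ARM(\d+)([TDMIEJF]*)(-S)?", name)
    if match is None:
        raise ValueError(f"unrecognized core name: {name}")
    digits, letters, synth = match.groups()

    # Assumption: with three or more digits, the last two are y and z.
    if len(digits) >= 3:
        family, memory, cache = digits[:-2], digits[-2], digits[-1]
    else:
        family, memory, cache = digits, None, None

    present = set(letters)
    if "E" in present:
        present |= set("TDMI")   # E implies TDMI
    if synth:
        present.add("S")

    # Report the features in the order used by the naming convention.
    features = [c for c in "TDMIEJFS" if c in present]
    return {"family": family, "memory": memory, "cache": cache,
            "features": features}
```

For example, decode_core_name("ARM926EJ-S") reports family 9, y = 2, z = 6, and the features T, D, M, I, E, J and S.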
      ARM Core Extensions-(1)
• Hardware extensions are standard
  components placed next to the ARM core.
• Improve performance, manage resources, and
  provide extra functionality and are designed
  to provide flexibility in handling particular
  applications.
     What are ARM extensions
• Cache and TCM
• Memory management (MPU & MMU) - prevents
  apps from inappropriate access to hardware
• Coprocessor interface
          ARM Core Extensions-(2)
• co-processor:
• Coprocessors can be attached to the ARM processor.
• A coprocessor extends the processing features of a core by extending the
  instruction set or by providing configuration registers.
• More than one coprocessor can be added to the ARM core
  via the coprocessor interface.
• The coprocessor can be accessed through a group of
  dedicated ARM instructions that provide a load-store type
  interface. Consider, for example, coprocessor 15 (cp15):
   – The ARM processor uses coprocessor 15(cp15) registers to control
     the cache, TCMs, and memory management.
        ARM Core Extensions-(3)
• Thumb:
• Thumb is a subset of the ARM instruction set encoded
  in 16-bit wide instructions.
   – Requires 70% of the space of ARM code.
   – Uses 40% more instructions than equivalent ARM code.
• A CPU has Thumb support if it has a T in its name, or it
  is architecture v6 or later.
   – With 32-bit memory:
      • ARM code is 40% faster than Thumb code.
   – With 16-bit memory:
      • Thumb code is 45% faster than ARM code.
• Uses 30% less external memory power than ARM code.
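The two code-size figures above are consistent with each other, which a short calculation makes visible. The sketch below assumes every Thumb instruction is 16 bits and every ARM instruction is 32 bits; the function name and parameter names are made up for this example.

```python
def thumb_code_size_ratio(extra_instruction_fraction=0.40,
                          thumb_bits=16, arm_bits=32):
    """Thumb code size as a fraction of the equivalent ARM code size.

    40% more instructions, each half as wide, gives 1.4 * 0.5 = 0.7.
    """
    return (1 + extra_instruction_fraction) * thumb_bits / arm_bits
```

With the default values the ratio is about 0.70, matching the 70% figure quoted above.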
       ARM Core Extensions-(4)
• Thumb continued…
• Thumb is not a complete architecture: you can’t
  have a Thumb-only CPU.
• Some of the limitations of Thumb mode include:
  – Conditional execution only exists for branch
    instructions.
  – Data processing operations use a two-address format,
    as opposed to ARM’s three-address format.
  – Its instruction encodings are less regular than ARM’s.
• Thumb uses the same register set as ARM, but most
  instructions can access only R0-R7
          ARM Core Extensions-(5)
• Thumb-2:
• Thumb-2 is an enhancement to the 16-bit Thumb Instruction Set
  Architecture (ISA).
• It adds 32-bit instructions that can be freely intermixed with 16-bit
  instructions in a program. The additional 32-bit instructions enable
  Thumb-2 to cover the functionality of the ARM instruction set.
• The 32-bit instructions enable Thumb-2 to deliver the code density
  of earlier versions of Thumb, together with performance of the
  existing ARM instruction set, all within a single instruction set.
• It’s present in the Cortex CPU series (or any v7 or later versions).
• Now a complete architecture: you can have a Thumb-2-only CPU
  (v7M).
• Mixed 16/32-bit instruction stream provides the economy of space
  of Thumb combined with most of the speed of pure ARM code.
            ARM Core Extensions-(6)
• Thumb-2 continued…
• The most important difference between the Thumb instruction set and
  the ARM instruction set is that most 32-bit Thumb instructions are
  unconditional, whereas most ARM instructions can be conditional.
• The main enhancements are:
• 32-bit instructions added to the Thumb instruction set to:
    –   provide support for exception handling in Thumb state
    –   provide access to coprocessors
    –   include Digital Signal Processing (DSP) and media instructions
    –   improve performance in cases where a single 16-bit instruction restricts
        functions available to the compiler.
• addition of a 16-bit IT instruction that enables one to four following
  Thumb instructions, the IT block, to be conditional
• addition of a 16-bit Compare with Zero and Branch (CZB) instruction to
  improve code density by replacing two-instruction sequence with a single
  instruction.
       ARM Core Extensions-(7)
• Jazelle Extension
• Jazelle is an execution mode in ARM architecture
  which "provides architectural support for
  hardware acceleration of bytecode execution by a
  Java Virtual Machine (JVM)" .
• Increasing demand from ARM customers for
  better Java performance.
• ARM provided its own solution for executing Java
  in hardware.
  – Integrate Java execution into the core!
  – Birth of Jazelle!
         ARM Core Extensions-(8)
• Jazelle Extension continued…
• ARM Jazelle technology provides an extension to the world’s
  leading 32-bit embedded RISC architecture, enabling ARM
  processors to execute Java byte code directly in hardware and
  delivering unparalleled Java performance on the ARM architecture.
• Platform developers now have the freedom to run Java applications
  alongside established OS, middleware and application code — all on
  a single processor.
• Jazelle DBX (Direct Bytecode eXecution) is an extension that allows
  some ARM processors to execute Java bytecode in hardware as a
  third execution state alongside the existing ARM and Thumb
  modes.
   – Jazelle functionality was specified in the ARMv5TEJ architecture[2] and
     the first processor with Jazelle technology was the ARM926EJ-S.
• Jazelle RCT (Runtime Compilation Target) is a different technology
  and is based on ThumbEE mode and supports ahead-of-time (AOT)
  and just-in-time (JIT) compilation with Java and other execution
  environments
          ARM Core Extensions-(9)
• Vector Floating Point(VFP) Extension
• The ARM® architecture provides high-performance and high-
  efficiency hardware support for floating-point operations in half-,
  single-, and double-precision arithmetic.
• Many operations can take place in either scalar form or in vector
  form.
• It is fully IEEE-754 compliant with full software library support.
• The floating-point data type is essential for a wide range of digital
  signal processing (DSP) applications.
• Scalable Vector Extension (SVE) for ARMv8-A
    – SVE is the next-generation SIMD instruction set for AArch64 that
      introduces the architectural features for High Performance Computing
      (HPC)
       ARM Core Extensions-(10)
• NEON (SIMD) Extension
• The implementation of the Advanced SIMD extension used
  in ARM processors is called NEON.
• The NEON technology is a packed SIMD architecture. NEON
  registers are considered as vectors of elements of the same
  data type. Multiple data types are supported by the
  technology.
• NEON technology is intended to improve the multimedia
  user experience by accelerating audio and video
  encoding/decoding, user interface, 2D/3D graphics or
  gaming.
• NEON can also accelerate signal processing algorithms and
  functions to speed up applications such as audio and video
  processing, voice and facial recognition, computer vision
  and deep learning.
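The idea of a register treated as a vector of same-typed elements can be modelled in a few lines. The sketch below is not NEON code and does not use any NEON API; it only imitates, in plain Python, what a lane-wise 8-bit add on a 64-bit register does (similar in spirit to a VADD.I8 on a D register). The function name packed_add_u8 is hypothetical.

```python
def packed_add_u8(a, b):
    """Add two 64-bit values as eight independent unsigned 8-bit lanes."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        # Each lane wraps around on its own; no carry crosses into the next lane.
        result |= ((x + y) & 0xFF) << shift
    return result
```

Adding 0x0101010101010101 to 0x01020304050607FF this way gives 0x0203040506070800: the lowest lane wraps from 0xFF to 0x00 without disturbing its neighbour, whereas an ordinary 64-bit add would carry into the next byte.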
                            ARM Chips
•   ARM Ltd
     – Provides ARM cores
     – Intellectual property
•   Analog Devices
     – ADuC7019, ADuC7020, ADuC7021, ADuC7022, ADuC7024, ADuC7025, ADuC7026,
        ADuC7027, ADuC7128, ADuC7129
•   Atmel
     – AT91C140, AT91F40416, AT91F40816, AT91FR40162, SAM3N4A, SAMR21E18A
•   Freescale
     – MAC7101, MAC7104, MAC7105, MAC7106, MAC7125,MAC7144
•   Samsung
     – S3C44B0X, S3C4510B
•   Sharp
     – LH75400, LH75401, LH75410, LH75411
•   Texas Instruments
     – TMS470R1A128, TMS470R1A256, TMS470R1A288
•   And others…
            Recommended Text
• “ARM System Developer’s Guide”
  – Andrew Sloss, et al.
  – ISBN 1-55860-874-5
• “ARM Architecture Reference Manual”
  – David Seal
  – ISBN 0-201-73719-1
  – Softcopy available at www.arm.com
• “ARM system-on-chip architecture”
  – Steve Furber
  – ISBN 0-201-67519-6
          ARM Design Philosophy
• ARM core uses RISC architecture
  – Reduced instruction set
  – Load store architecture
  – Large number of general purpose registers
  – Parallel executions with pipelines
• But some differences from RISC
  – Enhanced instructions for
     •   Thumb mode
     •   DSP instructions
     •   Conditional execution instruction
     •   32 bit barrel shifter
                          What is RISC?
• RISC?
  RISC, or Reduced Instruction Set Computer, is a type of
  microprocessor architecture that utilizes a small, highly-optimized set
  of instructions, rather than the more specialized set of instructions
  often found in other types of architectures.
• History
  The first RISC projects came from IBM, Stanford, and UC-Berkeley in
  the late 70s and early 80s. The IBM 801, Stanford MIPS, and Berkeley
  RISC 1 and 2 were all designed with a similar philosophy which has
  become known as RISC. Certain design features have been
  characteristic of most RISC processors:
    – one-cycle execution time: RISC processors aim for a CPI (clocks per instruction) of
      one. This is achieved by optimizing each instruction on the CPU and by a
      technique called pipelining;
    – pipelining: a technique that allows for simultaneous execution of parts, or stages,
      of instructions to process instructions more efficiently;
    – large number of registers: the RISC design philosophy generally incorporates a
      larger number of registers to avoid large numbers of interactions with
      memory.
                    RISC Attributes
The main characteristics of CISC microprocessors are:
• Extensive instruction set.
• Complex and efficient machine instructions.
• Microcoded implementation of the machine instructions.
• Extensive addressing capabilities for memory operations.
• Relatively few registers.
In comparison, RISC processors are more or less the opposite of the
    above:
• Reduced instruction set.
• Less complex, simple instructions.
• Hardwired control unit and machine instructions.
• Few addressing schemes for memory operands with only two basic
    instructions, LOAD and STORE
• Many symmetric registers which are organized into a register file.
    Differences between RISC and CISC
RISC
• Reduced Instruction Set Computer
• It contains fewer instructions.
• Instruction pipelining and increased execution speed.
• Orthogonal instruction set (allows each instruction
  to operate on any register and use any addressing mode).
CISC
• Complex Instruction Set Computer
• It contains a greater number of instructions.
• Instruction pipelining is difficult to implement.
• Non-orthogonal instruction set (not all instructions are
  allowed to operate on any register and use any addressing mode).
  Differences between RISC and CISC (2)
RISC
• Operations are performed on registers only; the only memory
  operations are load and store.
• A larger number of registers is available.
• The programmer needs to write more code to execute a task, since
  the instructions are simpler.
• Instructions have a single, fixed length.
• Less silicon usage and pin count.
• With Harvard Architecture.
CISC
• Operations are performed either on registers or on memory,
  depending on the instruction.
• The number of general purpose registers is very limited.
• Instructions are like macros in the C language.
• Instructions have variable length.
• More silicon usage, since additional decoder logic is
  required to implement the complex instruction decoding.
• Can be Harvard or Von Neumann Architecture.
       RISC Design Principles(1)
• Simple operations
  – Simple instructions that can execute in one cycle
• Register-to-register operations
  – Only load and store operations access memory
  – Rest of the operations on a register-to-register
    basis
• Simple addressing modes
  – A few addressing modes (1 or 2)
       RISC Design Principles(2)
• Large number of registers
  – Needed to support register-to-register operations
  – Minimize the procedure call and return overhead
• Fixed-length instructions
  – Facilitates efficient instruction execution
• Simple instruction format
  – Fixed boundaries for various fields
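Fixed field boundaries mean an instruction word can be decoded with nothing more than shifts and masks. The sketch below illustrates this with the classic 32-bit ARM data-processing layout (condition in bits 31:28, opcode in bits 24:21, S flag in bit 20, Rn in bits 19:16, Rd in bits 15:12, operand 2 in bits 11:0). The field positions and the example word 0xE0810002, intended as the encoding of ADD r0, r1, r2, are stated here from memory and are not taken from these slides, so check them against the architecture manual before relying on them. The function name is hypothetical.

```python
def decode_data_processing(word):
    """Extract the fixed-position fields of a 32-bit ARM data-processing word."""
    return {
        "cond":     (word >> 28) & 0xF,   # condition code
        "opcode":   (word >> 21) & 0xF,   # operation (0x4 is ADD)
        "s":        (word >> 20) & 0x1,   # update flags?
        "rn":       (word >> 16) & 0xF,   # first operand register
        "rd":       (word >> 12) & 0xF,   # destination register
        "operand2": word & 0xFFF,         # shifter operand
    }
```

Because every field sits at the same position in every instruction of this class, the decoder needs no state and no variable-length parsing.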
    ARM Processor Architecture
• The ARM architecture has evolved to include
  architectural features to meet the growing
  demand for new functionality, integrated security
  features, high performance and the needs of new
  and emerging markets.
• There are currently 3 ARMv8 profiles,
  – the ARMv8-A architecture profile for high
    performance markets such as mobile and enterprise,
  – the ARMv8-R architecture profile for embedded
    applications in automotive and industrial control,
  – the ARMv8-M architecture profile for embedded and
    IoT applications.
Difference between Harvard and Von Neumann
        Architectures
           ARM processor features
•   Load/store architecture.
•   An orthogonal instruction set.
•   Mostly single-cycle execution.
•   Enhanced power-saving design.
•   64 and 32-bit execution states for scalable high performance.
•   32-bit RISC processor core (32-bit instructions)
•   37 32-bit registers (16 available at any one time)
•   Pipelined (ARM7: 3 stages)
•   Von Neumann-type bus structure (ARM7), Harvard (ARM9)
•   8 / 16 / 32 -bit data types
•   7 modes of operation (usr, fiq, irq, svc, abt, sys, und)
•   Simple structure -> reasonably good speed / power
    consumption ratio
                      ARM7TDMI
• ARM7TDMI is a core processor module embedded in many
  ARM7 microprocessors.
• It is the most complex processor core module in ARM7
  series.
   – T: Capable of executing the Thumb instruction set.
   – D: Features an IEEE Std. 1149.1 JTAG boundary-scan
     debugging interface.
   – M: Features a Multiply-and-Accumulate (MAC) unit for
     DSP applications.
   – I: Supports an embedded In-Circuit Emulator (EmbeddedICE).
• Three pipeline Stages: Instruction fetch, decode, and
  Execution.
                       Features
• A 32-bit RISC processor core capable of executing
  16-bit instructions (Von Neumann Architecture)
  – High density code
     • The Thumb set's 16-bit instruction length allows it to
       approach about 65% of standard ARM code size while
       retaining ARM 32-bit processor performance.
  – Smaller die size
     • About 72,000 transistors
     • Occupying only about 4.8 mm² in a 0.6 µm semiconductor
       technology.
  – Lower power consumption
     • Dissipates about 2 mW/MHz with 0.6 µm technology.
                      Features (2)
• Memory Access
  – Data can be
     • 8-bit (bytes)
     • 16-bit (half words)
     • 32-bit (words)
• Memory Interface
  – Can interface to SRAM, ROM, DRAM
  – Has four basic types of memory cycle
     •   idle cycle
     •   Non sequential cycle
     •   sequential cycle
     •   coprocessor register cycle
            Debug Extensions
• The Debug extensions to the core add scan chains
  to monitor what is occurring on the data path of
  the CPU.
• Signals were also added to the core so that
  processor control can be handed to the debugger
  when a breakpoint or watchpoint has been
  reached.
• This stops the processor, enabling the user to
  view such characteristics as register contents,
  memory regions, and processor status.
              Embedded ICE Logic
• In order to provide a powerful debugging environment for ARM-
  based applications the EmbeddedICE logic was developed and
  integrated into the ARM core architecture.
• It is a set of registers providing the ability to set hardware
  breakpoints or watchpoints on code or data.
• The EmbeddedICE logic monitors the ARM core signals every cycle
  to check if a breakpoint or watchpoint has been hit. Lastly, an
  additional scan chain is used to establish contact between the user
  and the EmbeddedICE logic.
• Communication with the EmbeddedICE logic from the external
  world is provided via the test access port, or TAP, controller and a
  standard IEEE 1149.1 JTAG connection.
• The advantage of on-chip debug solutions is the ability to rapidly
  debug software, especially when the software resides in ROM.
                     synthesizable
• Synthesizable (i.e., distributed as RTL rather than a hardened layout)
• ARM7TDMI (without the "-S" extension) was initially designed as a
  hard macro, meaning that the physical design at the transistor
  layout level was done by ARM, and licensees took this fixed physical
  block and placed it into their chip designs. This was the prevalent
  design methodology at the time.
• Subsequently, demand increased for a more flexible and
  configurable solution, so ARM moved towards delivering processor
  designs as a behavioral description at the "register transfer level"
  (RTL) written in a hardware description language (HDL), typically
  Verilog HDL.
• The process of converting this behavioral description into a physical
  network of logic gates is called "synthesis", and several major EDA
  companies sell automated synthesis tools for this purpose.
• A processor design distributed to licensees as an RTL description
  (such as ARM7TDMI-S) is therefore described as "synthesizable".
             Instruction Pipeline
• The ARM processor uses an internal pipeline to increase
  the rate of instruction flow to the processor, allowing
  several operations to be undertaken simultaneously,
  rather than serially.
• Pipelining breaks execution down into multiple
  steps and overlaps the steps of successive instructions.
• In the classic ARM cores (such as ARM7), the instruction
  pipeline consists of 3 stages.
• Basic 3 stage pipeline
   – Fetch – Load from memory
   – Decode – Identify instruction to execute
   – Execute – Process instruction and write back result
           Instruction Pipeline
• ARM7 has a 3 stage pipeline
  – Fetch, Decode, Execute
• ARM9 has a 5 stage pipeline
  – Fetch, Decode, Execute, Memory, Write
• ARM10 has a 6 stage pipeline
  – Fetch, Issue, Decode, Execute, Memory, Write
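The benefit of overlapping stages can be shown with a simple cycle count. The sketch below assumes an ideal pipeline in which every stage takes one cycle and there are no stalls, branches or memory waits; real cores do worse than this. The function names are made up for this example.

```python
def pipelined_cycles(num_instructions, num_stages):
    """Cycles for an ideal pipeline: fill it once, then finish one per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions, num_stages):
    """Cycles if each instruction must finish all stages before the next starts."""
    return num_instructions * num_stages
```

Under these assumptions, 10 instructions on a 3-stage pipeline need 12 cycles instead of 30. A deeper pipeline does not reduce this count (10 instructions on 5 stages need 14 cycles); its advantage is that each stage does less work, which allows a higher clock frequency.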
ARM10 vs. ARM11 Pipelines
Instruction Pipeline
ARM7TDMI Processor Block Diagram
ARM7TDMI Processor Functional Diagram
                 32x8 Multiplier
• Earlier ARM processors (prior to ARM7TDMI) used a
  smaller, simpler multiplier block which required more
  clock cycles to complete a multiplication.
• Introduction of this more complex 32x8 multiplier
  reduced the number of cycles required for a
  multiplication of two registers (32-bit * 32-bit) to a few
  cycles (data dependent).
• Modern ARM processors are generally capable of
  calculating at least a 32-bit product in a single cycle,
  although some of the smallest Cortex-M processors
  provide an implementation choice of a faster (single-
  cycle) or a smaller (32 cycle) 32-bit multiplier block.
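The data-dependent cycle count of a 32x8 multiplier comes from processing the multiplier operand 8 bits at a time and stopping early once the remaining upper bits carry no information. The sketch below models one commonly documented early-termination rule for the ARM7TDMI: the number of array cycles is 1, 2 or 3 if the top 24, 16 or 8 bits of the multiplier operand are all zeros or all ones, and 4 otherwise. This rule is stated here as an assumption and should be checked against the core's technical reference manual; the function name is hypothetical.

```python
def multiplier_array_cycles(rs):
    """Number of 8-bit multiplier array passes for a 32-bit multiplier operand."""
    rs &= 0xFFFFFFFF
    for cycles, shift in ((1, 8), (2, 16), (3, 24)):
        top = rs >> shift                      # bits [31:shift]
        all_ones = (1 << (32 - shift)) - 1
        if top == 0 or top == all_ones:        # upper bits are pure sign extension
            return cycles
    return 4
```

Small multiplier values (positive or negative) therefore finish in one pass, while a value that uses all 32 bits needs four.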
           The ARM's Barrel Shifter
• The ARM arithmetic logic unit has a 32-bit barrel shifter that is capable of
  shift and rotate operations. The second operand to many ARM and Thumb
  data-processing and single register data-transfer instructions can be
  shifted, before the data-processing or data-transfer is executed, as part of
  the instruction.
• This can be used by various classes of ARM instructions to perform
  comparatively complex operations in a single instruction.
• The barrel shifter can perform the following types of operation:
• LSL - logical shift left by n bits
• LSR - logical shift right by n bits
• ASR - arithmetic shift right by n bits (the bits fed into the top end
        of the operand are copies of the original top (or sign) bit)
• ROR - rotate right by n bits
• RRX - rotate right extended by 1 bit. This is a 33-bit rotate, where
        the 33rd bit is the PSR C flag.
• The barrel shifter is a functional unit which
  can be used in a number of different
  circumstances.
• It provides five types of shifts and rotates
  which can be applied to Operand2.
• LSL – Logical Shift Left
  – Example: Logical Shift Left by 4.
• LSR – Logical Shift Right
  – Example: Logical Shift Right by 4.
• ASR – Arithmetic Shift Right
  – Example: Arithmetic Shift Right by 4, positive
    value.
  – Example: Arithmetic Shift Right by 4, negative
    value
• ROR – Rotate Right
   – Example: Rotate Right by 4.
• Examples
   – MOV r0, r0, LSL #1       - Multiply R0 by two.
   – MOV r1, r1, LSR #2       - Divide R1 by four (unsigned).
   – MOV r2, r2, ASR #2       - Divide R2 by four (signed, rounding toward minus infinity).
   – MOV r3, r3, ROR #16      - Swap the top and bottom halves of R3.
   – ADD r4, r4, r4, LSL #4   - Multiply R4 by 17. (N = N + N * 16)
   – RSB r5, r5, r5, LSL #5   - Multiply R5 by 31. (N = N * 32 - N)
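The shift and rotate types can be modelled on 32-bit values to check the arithmetic in the examples. The sketch below is a plain Python model, not ARM code; the function names simply mirror the mnemonics, and shift amounts are assumed to be in the range 1 to 31 (the special encodings for shift amounts of 0 and 32 are not modelled).

```python
MASK32 = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter at the bottom."""
    return (value << n) & MASK32


def lsr(value, n):
    """Logical shift right: zeros enter at the top."""
    return (value & MASK32) >> n


def asr(value, n):
    """Arithmetic shift right: copies of the sign bit enter at the top."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative number
    return (value >> n) & MASK32


def ror(value, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    value &= MASK32
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32


def rrx(value, carry_in):
    """Rotate right by one through the carry flag; returns (result, carry_out)."""
    value &= MASK32
    carry_out = value & 1
    return ((carry_in & 1) << 31) | (value >> 1), carry_out
```

For instance, 7 + lsl(7, 4) is 119 (7 times 17), lsl(3, 5) - 3 is 93 (3 times 31), and ror(0x12345678, 16) is 0x56781234.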
  The ARM Processor Families (I)
• The ARM7 Family
  – 32-bit RISC Processor.
  – Supports a three-stage pipeline
  – Uses Von Neumann Architecture.
• Widely used in many applications such as
  palmtop computers, portable instruments,
  smart card.
• Characteristics of ARM7 family
  The ARM Processor Families (II)
• The ARM9 Family
• 32-bit RISC Processor with ARM and Thumb
  instruction sets
• Supports five-stage pipeline.
• Uses Harvard architecture
• Widely used in mobile phones, PDAs, digital
  cameras, automotive systems, industrial
  control systems.
• Characteristics of ARM9 Thumb Family
• Characteristics of ARM9E Family
 The ARM Processor Families (III)
• The ARM10 Family
• 32-bit RISC processor with ARM, Thumb and
  DSP instruction sets.
• Supports a six-stage pipeline.
• Uses Harvard Architecture
• Widely used in videophones, PDAs, set-top
  boxes, game consoles, digital video
  cameras, automotive and industrial control
  systems
• Characteristics of ARM10 family
 The ARM Processor Families (IV)
• The ARM11 Family
• 32-bit RISC processor with ARM, Thumb and DSP
  instruction sets.
• Uses Harvard Architecture.
• Supports an eight-stage pipeline, except the
  ARM1156T2, which uses a nine-stage pipeline.
• Widely used in automotive and industrial control
  systems, 3D graphics, security critical
  applications.
• Characteristics of ARM11 family
              What is AMBA?
• “The ARM AMBA (Advanced Microcontroller
  Bus Architecture) protocol is an open
  standard, on-chip interconnect specification
  for the connection and management of
  functional blocks in a System-on-Chip (SoC). It
  facilitates right-first-time development of
  multi-processor designs with large numbers of
  controllers and peripherals. AMBA promotes
  design re-use by defining common interface
  standards for SoC modules.”
                            AMBA
• AMBA: Advanced Microcontroller Bus Architecture
   – It is a specification for an on-chip bus, to enable
     macrocells (such as a CPU, DSP, Peripherals, and memory
     controllers) to be connected together to form a
     microcontroller or complex peripheral chip.
   – It defines
      • A high-speed, high-bandwidth bus, the Advanced High
        Performance Bus (AHB).
      • A simple, low-power peripheral bus, the Advanced Peripheral Bus
        (APB).
      • Access for an external tester to permit modular testing and fast
        test of cache RAM
      • Essential housekeeping operations (reset/power-up, …)
    AMBA protocol specifications
• The AMBA specification defines an on-chip
  communications standard for designing high-performance
  embedded microcontrollers. It is supported by ARM Limited
  with wide cross-industry participation.
   – The AMBA 5 specification defines the following
     buses/interfaces:
      • Advanced High-performance Bus (AHB5, AHB-Lite)
      • Coherent Hub Interface (CHI)
   – The AMBA 4 specification defines the following buses/interfaces:
      • AXI Coherency Extensions (ACE) - widely used on the latest ARM
        Cortex-A processors including Cortex-A7 and Cortex-A15
      • AXI Coherency Extensions Lite (ACE-Lite)
      • Advanced Extensible Interface 4 (AXI4)
      • Advanced Extensible Interface 4 Lite (AXI4-Lite)
      • Advanced Extensible Interface 4 Stream (AXI4-Stream v1.0)
      • Advanced Trace Bus (ATB v1.1)
      • Advanced Peripheral Bus (APB4 v2.0)
    AMBA protocol specifications
• AMBA 3 specification defines four buses/interfaces:
   – Advanced Extensible Interface (AXI3 or AXI v1.0) - widely used
     on ARM Cortex-A processors including Cortex-A9
   – Advanced High-performance Bus Lite (AHB-Lite v1.0)
   – Advanced Peripheral Bus (APB3 v1.0)
   – Advanced Trace Bus (ATB v1.0)
• AMBA 2 specification defines three buses/interfaces:
   – Advanced High-performance Bus (AHB) - widely used on ARM7,
     ARM9 and ARM Cortex-M based designs
   – Advanced System Bus (ASB)
   – Advanced Peripheral Bus (APB2 or APB)
• AMBA specification (First version) defines two
  buses/interfaces:
   – Advanced System Bus (ASB)
   – Advanced Peripheral Bus (APB)
    ARM7 Processor Architecture
• Features (LPC2148)
   – 16/32-bit ARM7TDMI-S microcontroller in a tiny LQFP64
     package.
   – 8 to 40 kB of on-chip static RAM and 32 to 512 kB of on-chip
     flash program memory. 128 bit wide interface/accelerator
     enables high speed 60 MHz operation.
   – In-System/In-Application Programming (ISP/IAP) via on-chip
     boot-loader software. Single flash sector or full chip erase in 400
     ms and programming of 256 bytes in 1 ms.
   – EmbeddedICE-RT and Embedded Trace interfaces offer real-time
     debugging with the on-chip RealMonitor software and
     high speed tracing of instruction execution.
   – USB 2.0 Full Speed compliant Device Controller with 2 kB of
     endpoint RAM. In addition, the LPC2146/8 provide 8 kB of
     on-chip RAM accessible to USB by DMA.
 ARM7 Processor Architecture(2)
• Features (LPC2148)
  – One or two 10-bit A/D converters provide a total of 6/14 analog
    inputs, with conversion times as low as 2.44 µs per channel.
  – Single 10-bit D/A converter provides variable analog output.
  – Two 32-bit timers/external event counters (with four capture and four
    compare channels each), PWM unit (six outputs) and watchdog.
  – Low power real-time clock with independent power and dedicated 32
    kHz clock input.
  – Multiple serial interfaces including two UARTs, two Fast I2C-bus (400
    kbit/s), SPI and SSP with buffering and variable data length
    capabilities.
  – Vectored interrupt controller with configurable priorities and vector
    addresses.
  – Up to 45 of 5 V tolerant fast general purpose I/O pins in a tiny LQFP64
    package.
 ARM7 Processor Architecture(3)
• Features (LPC2148)
  – Up to nine edge or level sensitive external interrupt pins
    available.
  – 60 MHz maximum CPU clock available from programmable
    on-chip PLL with settling time of 100 µs.
  – On-chip integrated oscillator operates with an external crystal in
    the range from 1 MHz to 30 MHz and with an external oscillator up
    to 50 MHz.
  – Power saving modes include Idle and Power-down.
  – Individual enable/disable of peripheral functions as well as
    peripheral clock scaling for additional power optimization.
  – Processor wake-up from Power-down mode via external
    interrupt, USB, Brown-Out Detect (BOD) or Real-Time Clock
    (RTC).
  – Single power supply chip with Power-On Reset (POR) and BOD
    circuits: CPU operating voltage range of 3.0 V to 3.6 V (3.3 V ±
    10 %) with 5 V tolerant I/O pads.
LPC2148 Pin Configuration
NXP LPC214X - IC
               ARM Registers
• ARM has a load store architecture
• General purpose registers can hold data or
  address
• Total of 37 registers each 32 bit wide
• There are up to 18 active registers at any time
  – 16 data registers
  – 2 status registers (CPSR and, in exception modes, SPSR)
           ARM Registers (2)
• Registers R0 - R12 are general purpose
  registers
• R13 is used as stack pointer (SP)
• R14 is used as link register (LR)
• R15 is used as the program counter (PC)
• CPSR – Current program status register
• SPSR – Saved program status register
                ARM Registers (3)
• Three of the 16 visible registers have special roles:
   – Stack pointer : Software normally uses R13 as a Stack Pointer
     (SP). R13 is used by the PUSH and POP instructions in T variants.
   – Link register :Register 14 is the Link Register (LR). This register
     holds the address of the next instruction after a Branch and Link
     (BL or BLX) instruction, which is the instruction used to make a
     subroutine call. It is also used for return address information on
     entry to exception modes. At all other times, R14 can be used as
     a general-purpose register.
   – Program counter :Register 15 is the Program Counter (PC). It
     can be used in most instructions as a pointer to the instruction
     which is two instructions after the instruction being executed. In
     ARM state, all ARM instructions are four bytes long (one 32-bit
     word) and are always aligned on a word boundary. In Thumb and
     Jazelle states, the PC can be halfword (16-bit) and byte
     aligned respectively.
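The "two instructions ahead" behaviour of the PC can be sketched in Python. This is an illustrative model rather than ARM code: the helper name pc_read_value is hypothetical, and the 2-byte Thumb instruction size assumes the original 16-bit Thumb encoding.

```python
# Illustrative model of the value an instruction sees when it reads the PC.
# pc_read_value is a hypothetical helper, not part of any ARM toolchain.
ARM_INSN_BYTES = 4    # ARM state: every instruction is one 32-bit word
THUMB_INSN_BYTES = 2  # Thumb state: assumes original 16-bit Thumb instructions

def pc_read_value(insn_address: int, thumb: bool = False) -> int:
    """Return the PC value visible to the instruction at insn_address."""
    size = THUMB_INSN_BYTES if thumb else ARM_INSN_BYTES
    # The PC points two instructions ahead of the one being executed.
    return (insn_address + 2 * size) & 0xFFFFFFFF

print(hex(pc_read_value(0x8000)))              # ARM state
print(hex(pc_read_value(0x8000, thumb=True)))  # Thumb state
```

In ARM state the offset is therefore 8 bytes; in this model of Thumb state it is 4 bytes.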
               ARM Registers (4)
• Program status register
  – The current operating processor status is in the
    Current Program Status Register (CPSR).
  – CPSR is used to control and store CPU states
  – CPSR is divided in four 8 bit fields
     •   Flags
     •   Status
     •   Extension
     •   Control
Current Program status register(CPSR)
Current Program status register
                 Program Status Registers
  Bits     31 30 29 28  27  26-25  24  23-20     19-16    15-10    9  8  7  6  5  4-0
  Field    N  Z  C  V   Q   [de]   J   reserved  GE[3:0]  IT[abc]  E  A  I  F  T  mode
  Byte     f (flags, bits 31-24), s (status, bits 23-16), x (extension, bits 15-8), c (control, bits 7-0)
• Condition code flags
      – N = Negative result from ALU
      – Z = Zero result from ALU
      – C = ALU operation Carried out
      – V = ALU operation oVerflowed
• Sticky Overflow flag - Q flag
      – Indicates if saturation has occurred
• SIMD Condition code bits – GE[3:0]
      – Used by some SIMD instructions
• IF THEN status bits – IT[abcde]
      – Controls conditional execution of Thumb instructions
• T bit
      – T = 0: Processor in ARM state
      – T = 1: Processor in Thumb state
• J bit
      – J = 1: Processor in Jazelle state
• Mode bits
      – Specify the processor mode
• Interrupt Disable bits
      – I = 1: Disables IRQ
      – F = 1: Disables FIQ
• E bit
      – E = 0: Data load/store is little endian
      – E = 1: Data load/store is big endian
• A bit
      – A = 1: Disable imprecise data aborts
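As a rough illustration of the CPSR fields, the following Python sketch decodes a 32-bit CPSR value. The names decode_cpsr and MODE_NAMES are illustrative only, the IT bits are left out for brevity, and the sample value 0x600000D3 is made up for the demonstration.

```python
# Sketch: decode a 32-bit CPSR value into its named fields.
# decode_cpsr and MODE_NAMES are illustrative names, not an ARM-provided API.
MODE_NAMES = {
    0x10: "User", 0x11: "FIQ", 0x12: "IRQ", 0x13: "Supervisor",
    0x17: "Abort", 0x1B: "Undefined", 0x1F: "System",
}

def decode_cpsr(cpsr: int) -> dict:
    def bit(n: int) -> int:
        return (cpsr >> n) & 1
    mode = cpsr & 0x1F                       # bits 4-0
    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
        "Q": bit(27), "J": bit(24),
        "GE": (cpsr >> 16) & 0xF,            # bits 19-16
        "E": bit(9), "A": bit(8), "I": bit(7), "F": bit(6), "T": bit(5),
        "mode": MODE_NAMES.get(mode, "unknown"),
    }

state = decode_cpsr(0x600000D3)   # made-up sample value
print(state)
```

For the sample value, the Z and C flags are set, IRQ and FIQ are both disabled, the core is in ARM state and the mode field selects Supervisor mode.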
  Current Program status register
• The Current Program Status Register (CPSR) is
  accessible in all processor modes.
• Each exception mode also has a Saved
  Program Status Register (SPSR), that is used to
  preserve the value of the CPSR when the
  associated exception occurs.
  Saved Program Status Register (SPSR)
• Each privileged mode (except System mode)
  has an associated Saved Program Status
  Register (SPSR).
• The SPSR saves the state of the CPSR
  (Current Program Status Register) when the
  privileged mode is entered, so that the
  user state can be fully restored when the user
  process is resumed.
     Data Sizes and Instruction Sets
•   ARM is a 32-bit load / store RISC architecture
     – The only memory accesses allowed are loads and stores
     – Most internal registers are 32 bits wide
     – Most instructions execute in a single cycle
•   When used in relation to ARM cores
     – Halfword means 16 bits (two bytes)
     – Word means 32 bits (four bytes)
     – Doubleword means 64 bits (eight bytes)
•   ARM cores implement two basic instruction sets
     – ARM instruction set – instructions are all 32 bits long
     – Thumb instruction set – instructions are a mix of 16 and 32 bits
          • Thumb-2 technology added many extra 32- and 16-bit instructions to the original
            16-bit Thumb instruction set
•   Depending on the core, may also implement other instruction sets
     –   VFP instruction set – 32 bit (vector) floating point instructions
     –   NEON instruction set – 32 bit SIMD instructions
     –   Jazelle-DBX - provides acceleration for Java VMs (with additional software support)
     –   Jazelle-RCT - provides support for interpreted languages
                                     Processor Modes
• ARM has seven basic operating modes
   – Each mode has access to its own stack space and a different subset of registers
   – Some operations can only be carried out in a privileged mode
     Mode              Description                                              Class
     Supervisor (SVC)  Entered on reset and when a Supervisor Call (SVC)        Exception mode, privileged
                       instruction is executed
     FIQ               Entered when a high priority (fast) interrupt is raised  Exception mode, privileged
     IRQ               Entered when a normal priority interrupt is raised       Exception mode, privileged
     Abort             Used to handle memory access violations                  Exception mode, privileged
     Undef             Used to handle undefined instructions                    Exception mode, privileged
     System            Privileged mode using the same registers as User mode    Privileged
     User              Mode under which most applications / OS tasks run        Unprivileged
              Processor Modes
• Processor modes determine
  – Which registers are active, and
  – Access rights to CPSR register itself
• Each processor mode is either,
  – Privileged: Full read-write access to the CPSR
  – Non-Privileged: Only read access to the control field of
    CPSR but read-write access to the condition flags
• ARM has seven modes
  – Privileged: Abort, Fast interrupt request, Interrupt
    request, Supervisor, System and Undefined
  – Non-Privileged: User (Programs and applications)
      The ARM Register Set-Currently
        visible in particular mode
• ARM has 37 registers, all 32 bits long
• A subset of these registers is accessible in each mode
• Note: System mode uses the User mode register set
• User level
   – 15 GPRs, PC, CPSR (current program status register)
• Remaining registers are used for system-level programming and for
  handling exceptions
  User mode    IRQ         FIQ         Undef       Abort       SVC
  r0
  r1
  r2
  r3
  r4
  r5
  r6
  r7
  r8                       r8
  r9                       r9
  r10                      r10
  r11                      r11
  r12                      r12
  r13 (sp)     r13 (sp)    r13 (sp)    r13 (sp)    r13 (sp)    r13 (sp)
  r14 (lr)     r14 (lr)    r14 (lr)    r14 (lr)    r14 (lr)    r14 (lr)
  r15 (pc)
  cpsr
               spsr        spsr        spsr        spsr        spsr
  (User mode column: current mode; other columns: banked out registers)
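The total of 37 registers can be checked by counting the banked copies mode by mode. The Python sketch below does this; the helper visible_registers and the name suffixes such as r13_irq are an illustrative convention, not names taken from the slides.

```python
# Sketch: count the physical registers from the banking scheme of the register set.
SHARED = [f"r{i}" for i in range(16)] + ["cpsr"]          # User/System view
BANKED = {
    "FIQ":   [f"r{i}" for i in range(8, 15)] + ["spsr"],   # r8-r14 plus SPSR
    "IRQ":   ["r13", "r14", "spsr"],
    "Undef": ["r13", "r14", "spsr"],
    "Abort": ["r13", "r14", "spsr"],
    "SVC":   ["r13", "r14", "spsr"],
}

def visible_registers(mode: str) -> list:
    """Names of the registers visible in a mode, with banked copies suffixed."""
    banked = BANKED.get(mode, [])            # User and System have no banked copies
    names = [f"{r}_{mode.lower()}" if r in banked else r for r in SHARED]
    if "spsr" in banked:
        names.append(f"spsr_{mode.lower()}")
    return names

total = len(SHARED) + sum(len(regs) for regs in BANKED.values())
print(total)
print(visible_registers("FIQ"))
```

The count is 17 registers shared by User and System mode, 8 banked for FIQ, and 3 banked for each of the other four exception modes.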
            Program Counter (r15)
• When the processor is executing in ARM state:
   – All instructions are 32 bits wide
   – All instructions must be word aligned
   – Therefore the pc value is stored in bits [31:2] with bits [1:0]
     undefined (as instruction cannot be halfword or byte aligned)
• When the processor is executing in Thumb state:
   – All instructions are 16 bits wide
   – All instructions must be halfword aligned
   – Therefore the pc value is stored in bits [31:1] with bit [0]
     undefined (as instruction cannot be byte aligned)
• When the processor is executing in Jazelle state:
   – All instructions are 8 bits wide
   – Processor performs a word access to read 4 instructions at once
                 Exceptions
• Exceptions are generated by internal and external
  sources to cause the processor to handle an
  event, such as an externally generated interrupt
  or an attempt to execute an Undefined
  instruction.
• The processor state just before handling the
  exception is normally preserved so that the
  original program can be resumed when the
  exception routine has completed.
• More than one exception can arise at the same
  time.
          Exception handling
• Exception:
  – Any condition that needs to halt normal
    sequential execution of instructions
    • ARM core is reset
    • Instruction fetch or memory access fails
    • Undefined instruction is encountered
    • Software interrupt instruction is executed
    • External interrupt has been raised
• The ARM architecture supports seven types of
  exception.
• When an exception occurs, execution is forced
  from a fixed memory address corresponding
  to the type of exception. These fixed
  addresses are called the exception vectors.
         ARM Exception Types
• The ARM recognises seven different types of
  exceptions.
  – Reset
  – Undefined instruction
  – Software Interrupt (SWI)
  – Prefetch Abort
  – Data Abort
  – IRQ
  – FIQ
   ARM Exceptions Types (Cont.)
• Reset
  – Occurs when the processor reset pin is asserted
     • For signalling Power-up
     • For resetting as if the processor has just powered up
  – Software reset
     • Can be done by branching to the reset vector (0x0000)
• Undefined instruction
  – Occurs when the processor or coprocessors
    cannot recognize the currently executing
    instruction
    ARM Exceptions Types (Cont.)
• Software Interrupt (SWI)
  – User-defined interrupt instruction
  – Allow a program running in User mode to request
    privileged operations that are in Supervisor mode
     • For example, RTOS functions
• Prefetch Abort
  – An instruction fetched from an illegal address is flagged as
    invalid
  – However, instructions already in the pipeline continue
    to execute until the invalid instruction is reached and
    then a Prefetch Abort is generated.
   ARM Exceptions Types (Cont.)
• Data Abort
  – A data transfer instruction attempts to load or store
    data at an illegal address
• IRQ
  – The processor's external interrupt request pin is
    asserted (LOW) and the I bit in the CPSR is clear
    (interrupts enabled)
• FIQ
  – The processor's external fast interrupt request pin is
    asserted (LOW) and the F bit in the CPSR is clear
    (fast interrupts enabled)
ARM processor exceptions and modes
               ARM Vector Table
• Exception handling is controlled by a vector table.
• It is a table of addresses that the ARM core branches to
  when an exception is raised; each entry normally holds a
  branch instruction that directs the core to the ISR.
• This is a reserved area of 32 bytes at the bottom of the
  memory map with one word of space allocated to each
  exception type (plus one reserved word).
• The vector table starts at 0x00000000 (ARMx20 processors
  can optionally relocate the vector table to
  0xffff0000).
• A vector table consists of a set of ARM instructions that
  manipulate the PC (i.e. B, MOV, and LDR). These
  instructions cause the PC to jump to a specific location that
  can handle a specific exception or interrupt.
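The layout of the vector table can be written down as a small Python sketch: eight one-word slots, so each vector sits 4 bytes after the previous one. The function vector_address is a hypothetical helper for illustration.

```python
# Sketch: the eight one-word vector table slots and their addresses.
VECTOR_BASE = 0x00000000          # 0xFFFF0000 when the table is relocated
VECTORS = [
    "Reset", "Undefined Instruction", "Software Interrupt", "Prefetch Abort",
    "Data Abort", "(Reserved)", "IRQ", "FIQ",
]

def vector_address(exception: str, base: int = VECTOR_BASE) -> int:
    """Address of the vector for an exception: base + 4 bytes per slot."""
    return base + 4 * VECTORS.index(exception)

for name in VECTORS:
    print(f"0x{vector_address(name):08X}  {name}")
```

Eight slots of four bytes each give the 32-byte reserved area; FIQ occupies the last slot, at offset 0x1C.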
ARM exception vector locations
      Exception handling process
• When an exception occurs, control passes through an area
  of memory called the vector table. This is a reserved area
  usually at the bottom of the memory map.
• Figure shows the exception handling process.
ARM Exception Priorities
   Response to an Exception Handler
• When an exception occurs, the ARM:
   – Copies the CPSR into the SPSR for the mode
     in which the exception is to be handled.
      • Saves the current mode, interrupt mask, and
        condition flags.
   – Changes the appropriate CPSR mode bits
      • Change to the appropriate mode
      • Map in the appropriate banked registers for that
        mode
   – Disable interrupts
      • IRQs are disabled when any exception occurs.
      • FIQs are disabled when a FIQ occurs, and on
        reset
   – Set lr_mode to the return address
   – Set the program counter (PC) to the vector
     address for the exception
• Vector Table
      0x1C   FIQ
      0x18   IRQ
      0x14   (Reserved)
      0x10   Data Abort
      0x0C   Prefetch Abort
      0x08   Software Interrupt
      0x04   Undefined Instruction
      0x00   Reset
Returning From an Exception Handler
• To return, exception handler needs to:
  – Restore the CPSR from spsr_mode
  – Restore the program counter using the return
    address stored in lr_mode
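The entry and return sequences can be traced with a simplified Python model. The Cpu class and its field names are hypothetical, only IRQ and FIQ are modelled, and the model ignores the fixed offset that real hardware adds to the saved return address.

```python
# Simplified model of exception entry and return. The Cpu class and its field
# names are hypothetical; only the IRQ and FIQ cases are modelled.
I_BIT, F_BIT, MODE_MASK = 0x80, 0x40, 0x1F
EXCEPTIONS = {            # name: (mode bits, vector address)
    "IRQ": (0x12, 0x18),
    "FIQ": (0x11, 0x1C),
}

class Cpu:
    def __init__(self):
        self.cpsr = 0x10          # User mode, IRQ and FIQ enabled
        self.pc = 0x8000
        self.spsr = {}            # one SPSR per exception mode
        self.lr = {}              # one banked LR per exception mode

    def enter(self, name: str, return_address: int) -> None:
        mode, vector = EXCEPTIONS[name]
        self.spsr[mode] = self.cpsr                      # 1. CPSR -> SPSR_<mode>
        self.cpsr = (self.cpsr & ~MODE_MASK) | mode      # 2. switch mode
        self.cpsr |= I_BIT                               # 3. always mask IRQ
        if name == "FIQ":
            self.cpsr |= F_BIT                           #    FIQ also masks FIQ
        self.lr[mode] = return_address                   # 4. LR_<mode> = return address
        self.pc = vector                                 # 5. jump to the vector

    def leave(self) -> None:
        mode = self.cpsr & MODE_MASK
        self.pc = self.lr[mode]                          # restore PC from LR_<mode>
        self.cpsr = self.spsr[mode]                      # restore CPSR from SPSR_<mode>

cpu = Cpu()
cpu.enter("IRQ", return_address=0x8004)
print(hex(cpu.cpsr), hex(cpu.pc))
cpu.leave()
print(hex(cpu.cpsr), hex(cpu.pc))
```

Because the whole CPSR is restored from the SPSR, the interrupt mask bits return to their pre-exception state without a separate re-enable step.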
                 Interrupt Handlers
• There are two types of interrupts available on ARM processor.
    – The first type is the interrupt caused by external events from hardware
      peripherals
    – The second type is the SWI instruction.
• The ARM processor has two levels of external interrupt, FIQ and
  IRQ, both of which are level-sensitive active LOW signals into the
  core.
• For an interrupt to be taken, the relevant input must be LOW and
  the disable bit in the CPSR must be clear.
• FIQs have higher priority than IRQs in two ways:
    1. FIQs are serviced first when multiple interrupts occur.
    2. Servicing a FIQ causes IRQs to be disabled, preventing them from
       being serviced until after the FIQ handler has re-enabled them (usually
       by restoring the CPSR from the SPSR at the end of the handler).
            Assigning interrupts
• How are interrupts assigned?
• It is up to the system designer who can decide
  which hardware peripheral can produce which
  interrupt request.
  – Interrupt controller
     • Maps multiple external interrupts onto one of the two ARM interrupt
       requests
  – Standard design practice
     • SWIs are reserved for calling privileged operating system routines
     • IRQs are assigned to general-purpose interrupts
         – A periodic timer
     • FIQ is reserved for a single interrupt source that requires a
       fast response time
         – Direct memory access to move blocks of memory
         – FIQ has a higher priority and shorter interrupt latency than IRQ
             Interrupt Latency
• It is the interval of time from an
  external interrupt signal being raised to the
  first fetch of an instruction of the ISR for
  that interrupt.
• System architects must balance two
  requirements:
  – handling multiple interrupts
    simultaneously, and
  – minimizing the interrupt latency.
              Interrupt Latency
• Minimization of the interrupt latency is achieved
  by software handlers by two main methods,
  – the first one is to allow nested interrupt handling so
    the system can respond to new interrupts during
    handling an older interrupt.
     • This is achieved by enabling interrupts immediately after the
       interrupt source has been serviced but before finishing the
       interrupt handling.
  – The second one is the possibility to give priorities to
    different interrupt sources;
     • this is achieved by programming the interrupt controller to
       ignore interrupts of the same or lower priority than the
       interrupt being handled if there is one.
      Enabling and disabling Interrupt
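• Interrupts are enabled and disabled by modifying the CPSR.
  Each sequence needs only three ARM instructions, chosen from:
  –   MRS   Read the CPSR into a register
  –   MSR   Write a register to the CPSR
  –   BIC   Bit clear instruction (clears a disable bit)
  –   ORR   OR instruction (sets a disable bit)
• Use mask #0x80 for IRQ (I bit) or #0x40 for FIQ (F bit).
Enabling an IRQ/FIQ interrupt:
  MRS r1, cpsr              ; read the current CPSR
  BIC r1, r1, #0x80         ; clear the I bit (use #0x40 for FIQ)
  MSR cpsr_c, r1            ; write back the control field
Disabling an IRQ/FIQ interrupt:
  MRS r1, cpsr              ; read the current CPSR
  ORR r1, r1, #0x80         ; set the I bit (use #0x40 for FIQ)
  MSR cpsr_c, r1            ; write back the control field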
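The effect of the BIC and ORR steps on the CPSR value can be checked with a short Python sketch. The functions enable and disable are illustrative stand-ins for the register operations, and the starting value 0xD3 is a made-up example (Supervisor mode with both interrupts disabled).

```python
# Sketch of the BIC/ORR read-modify-write on a CPSR value held in an integer.
I_BIT = 0x80   # IRQ disable bit
F_BIT = 0x40   # FIQ disable bit

def enable(cpsr: int, mask: int) -> int:
    """Model of BIC r1, r1, #mask: clear the disable bit(s)."""
    return cpsr & ~mask & 0xFFFFFFFF

def disable(cpsr: int, mask: int) -> int:
    """Model of ORR r1, r1, #mask: set the disable bit(s)."""
    return (cpsr | mask) & 0xFFFFFFFF

cpsr = 0xD3                       # Supervisor mode, IRQ and FIQ disabled
cpsr = enable(cpsr, I_BIT)        # IRQ enabled, FIQ still disabled
print(hex(cpsr))
```

Only the selected mask bit changes; the mode bits and the other disable bit are preserved, which is why the sequence reads the CPSR before writing it.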
              Interrupt stack
• Stacks are needed extensively for context
  switching between different modes when
  interrupts are raised.
• The design of the exception stack depends on
  two factors:
  – OS Requirements.
  – Target hardware.
• A good stack design tries to avoid stack overflow
  because it causes instability in embedded systems.
       Setting up the interrupt stacks
• Each operating system has
  its own requirements for stack
  design
   – Stack pointers are initialized
     after reset
• Where the interrupt stack is
  placed depends upon the
  RTOS requirements and the
  specific hardware being used.
• Two design decisions need to
  be made for the stacks:
   – The location
   – The size
• Figure 1.14 shows two
  possible designs.
    Setting up the interrupt stacks
• Design A is a standard design found on many ARM based
  systems.
• If the interrupt stack expands into the interrupt vector table, the
  target system will crash, unless some check is placed on the
  extension of the stack and there is some means to handle that error
  when it occurs.
• Figure 1.14 shows two possible stack layouts.
   – The first (A) shows the traditional stack layout with the interrupt
     stack stored underneath the code segment.
   – The second, layout (B), shows the interrupt stack at the top of the
     memory above the user stack.
• One of the main advantages that layout (B) has over layout
  (A) is that an overflowing interrupt stack grows into the user stack
  and thus does not corrupt the vector table.
• For each mode a stack has to be setup. This is carried out
  every time the processor is reset.
      Example to setup stacks
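USR_Stack EQU 0x20000          ; User mode stack address
IRQ_Stack EQU 0x8000           ; IRQ mode stack address
SVC_Stack EQU IRQ_Stack-128    ; SVC stack sits 128 bytes below IRQ_Stack
…
Usr32md   EQU 0x10             ; CPSR mode bits: User
FIQ32md   EQU 0x11             ; CPSR mode bits: FIQ
IRQ32md   EQU 0x12             ; CPSR mode bits: IRQ
SVC32md   EQU 0x13             ; CPSR mode bits: Supervisor
Abt32md   EQU 0x17             ; CPSR mode bits: Abort
Und32md   EQU 0x1b             ; CPSR mode bits: Undefined
Sys32md   EQU 0x1f             ; CPSR mode bits: System
NoInt     EQU 0xc0             ; Disable interrupts (I and F bits set)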
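To see what these constants evaluate to, the Python sketch below recomputes them and forms the CPSR value a start-up routine would typically write to select each mode with interrupts masked. Combining NoInt with a mode constant is an assumption about how the elided set-up code uses them; cpsr_for is a hypothetical helper.

```python
# Sketch: evaluate the EQU constants and the CPSR values a start-up routine
# would write to select each mode with interrupts masked (assumed usage).
USR_STACK = 0x20000
IRQ_STACK = 0x8000
SVC_STACK = IRQ_STACK - 128       # 128 bytes below IRQ_STACK

MODES = {"Usr": 0x10, "FIQ": 0x11, "IRQ": 0x12, "SVC": 0x13,
         "Abt": 0x17, "Und": 0x1B, "Sys": 0x1F}
NO_INT = 0xC0                     # I and F bits set

def cpsr_for(mode: str) -> int:
    """CPSR control-field value selecting a mode with IRQ and FIQ disabled."""
    return NO_INT | MODES[mode]

print(hex(SVC_STACK))
for mode in ("SVC", "IRQ"):
    print(mode, hex(cpsr_for(mode)))
```

Each mode has its own banked r13, so a set-up routine has to switch into a mode before it can load that mode's stack pointer.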
      Interrupt handling schemes
•   Non-nested interrupt handler
•   Nested interrupt handler
•   Re-entrant nested interrupt handler
•   Prioritized interrupt handler
    Interrupt handling schemes
• Non-nested interrupt handling scheme
  – This is the simplest interrupt handler.
  – Interrupts are disabled until control is returned
    back to the interrupted task.
  – Only one interrupt can be serviced at a time.
  – Not suitable for complex embedded systems.
            Interrupt handling schemes
• Each stage is explained in more detail below:
    1.   An external source (for example an interrupt controller) sets the
         interrupt flag. The processor masks further external interrupts and
         vectors to the interrupt handler via an entry in the vector table.
    2.   Upon entry to the handler, the handler code saves the current
         context of the non-banked registers.
    3.   The handler then identifies the interrupt source and executes the
         appropriate interrupt service routine (ISR).
    4.   The ISR services the interrupt.
    5.   Upon return from the ISR, the handler restores the context.
    6.   The handler re-enables interrupts and returns.
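The cost of keeping interrupts masked for the whole handler can be illustrated with a toy timing model in Python. The function name, the request tuples and the time units are all invented for this sketch; it models only the waiting caused by masking, not real cycle counts.

```python
# Toy timing model of a non-nested handler: interrupts stay masked while one
# is being serviced, so later requests wait. Times are arbitrary units.
def non_nested_latencies(requests):
    """requests: list of (arrival_time, service_time), sorted by arrival.
    Returns the delay between each request being raised and its service starting."""
    latencies = []
    free_at = 0                       # time at which the handler returns
    for arrival, service in requests:
        start = max(arrival, free_at) # must wait until interrupts are re-enabled
        latencies.append(start - arrival)
        free_at = start + service
    return latencies

print(non_nested_latencies([(0, 5), (2, 1), (3, 1)]))
```

In this example the two short requests each wait 3 time units behind the long first one, which is the latency problem that nested schemes try to reduce.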
• Nested interrupt handling scheme(1)
   – Handling more than one interrupt at a time is possible by
     enabling interrupts before fully serving the current interrupt.
   – Latency is improved.
   – System is more complex.
   – Interrupts are not distinguished by priority, so normal
     interrupts can block critical interrupts.
Nested interrupt handling scheme(2)
       Re-entrant interrupt handler
• A re-entrant interrupt handler is a method of handling multiple
  interrupts where interrupts are filtered by priority.
• This is important since there is a requirement that interrupts with
  higher priority have a lower latency.
• This type of filtering cannot be achieved using the conventional
  nested interrupt handler.
• The basic difference between a re-entrant interrupt handler and a
  nested interrupt handler is that the interrupts are re-enabled early
  on in the interrupt handler to achieve low interrupt latency.
    Prioritized interrupt handler
• There are several types of prioritized interrupt handler,
  each providing a different handling strategy:
  – Simple prioritized interrupt handler
  – Standard prioritized interrupt handler
  – Grouped prioritized interrupt handler
    Prioritized interrupt handler
• Simple prioritized interrupt handler:
  – In this scheme the handler will associate a priority level
    with a particular interrupt source.
  – A higher priority interrupt will take precedence over a
    lower priority interrupt.
  – Handling prioritization can be done by means of software
    or hardware.
  – In case of hardware prioritization the handler is simpler to
    design because the interrupt controller will give the
    interrupt signal of the highest priority interrupt requiring
    service.
  – On the other hand, the system needs more initialization
    code at start-up, since priority level tables have to be
    constructed before the system is switched on.
    Prioritized interrupt handler
• Simple prioritized interrupt handler:
     Prioritized interrupt handler
• Standard prioritized interrupt handler
   – Arranges priorities in a special way to reduce the
     time needed to decide which interrupt will be
     handled.
• Grouped prioritized interrupt handler
   – Groups interrupts into subsets, each with its own
     priority level; this suits systems with a large number of
     interrupt sources.
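A software version of the simple prioritized scheme can be sketched in Python as a lookup over the pending sources. The source names and priority values are invented for illustration, and a lower number is assumed to mean a higher priority.

```python
# Sketch of a simple prioritized dispatcher. Source names and priority values
# are invented for illustration; lower number = higher priority.
PRIORITY = {"timer": 2, "uart": 3, "dma": 1}

def next_to_service(pending, current=None):
    """Pick the highest-priority pending source, or None if nothing pending
    outranks the interrupt currently being handled."""
    candidates = [s for s in pending
                  if current is None or PRIORITY[s] < PRIORITY[current]]
    return min(candidates, key=PRIORITY.get) if candidates else None

print(next_to_service({"uart", "timer"}))
print(next_to_service({"uart", "timer"}, current="dma"))
```

The strict comparison means sources of the same or lower priority than the one being handled are ignored until it finishes.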
                 Memory formats
• The ARM7TDMI processor views memory as a linear
  collection of bytes numbered in ascending order from zero.
• For example:
   – bytes zero to three hold the first stored word
   – bytes four to seven hold the second stored word.
• The ARM7TDMI processor is bi-endian and can treat words
  in memory as being stored in either:
   – Little-endian.
   – Big-endian
• Note
   – Little-endian is traditionally the default format for ARM
     processors.
• Little-endian
  – In little-endian format, the lowest addressed byte in a
    word is considered the least-significant byte of the
    word and the highest addressed byte is the most
    significant.
• Big-endian
  – In big-endian format, the ARM7TDMI processor
    stores the most significant byte of a word at the
    lowest-numbered byte, and the least significant
    byte at the highest-numbered byte.
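The two byte orders can be compared directly in Python using the built-in int.to_bytes. The word value 0x11223344 is an arbitrary example.

```python
# Sketch: byte order of one 32-bit word in memory.
word = 0x11223344

little = word.to_bytes(4, "little")   # lowest address holds the least significant byte
big = word.to_bytes(4, "big")         # lowest address holds the most significant byte

for address in range(4):
    print(f"byte {address}: little=0x{little[address]:02X}  big=0x{big[address]:02X}")
```

The same four bytes appear in both formats, only in reverse order of address.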
          ARM Instruction Set
• ARM instructions fall into one of the following
  three categories:
• Data processing instructions.
• Data transfer instructions.
• Control flow instructions.
  Features of the ARM Instruction Set
• Load-store architecture
   – Process values which are in registers
   – Load, store instructions for memory data accesses
• 3-address data processing instructions
• Conditional execution of every instruction
• The inclusion of very powerful load and store multiple
  register instructions
• Single-cycle execution of most instructions
• Open coprocessor instruction set extension
• Very dense 16-bit compressed instruction set (Thumb)
         Load-store architecture
• ARM employs a load-store architecture.
  – This means that the instruction set will only process
    (add, subtract, and so on) values which are in registers
    (or specified directly within the instruction itself), and
    will always place the results of such processing into a
    register.
  – The only operations which apply to memory state are
    ones which copy memory values into registers (load
    instructions) or copy register values into memory
    (store instructions).
  – ARM does not support 'memory-to-memory'
    operations.
                         Thumb
• Thumb is a 16-bit instruction set
   – Optimized for code density from C code
   – Improved performance from narrow memory
   – Subset of the functionality of the ARM instruction set
• Core has two execution states – ARM and Thumb
   – Switch between them using BX instruction
• Thumb has characteristic features:
   – Most Thumb instructions are executed unconditionally
   – Many Thumb data processing instructions use a 2-address
     format
   – Thumb instruction formats are less regular than ARM
     instruction formats, as a result of the dense encoding.
        Conditional Execution (1)
• One of the ARM's most interesting features is that
  each instruction is conditionally executed.
• Most other instruction sets only allow conditional
  execution of branch instructions, based on the
  state of the condition flags.
• In ARM, almost all instructions can be
  conditionally executed.
• If corresponding condition is true, the instruction is
  executed. If the condition is false, the instruction is
  turned into a nop.
         Conditional Execution (2)
• The condition is specified by suffixing the instruction with a
  condition code mnemonic.
• This improves code density and performance by reducing the
  number of forward branch instructions.
• With a branch:
     CMP r3,#0
     BEQ skip
     ADD r0,r1,r2
  skip
• With conditional execution:
     CMP r3,#0
     ADDNE r0,r1,r2
• In the following example, the instruction moves r1 to r0
  only if carry is set.
       MOVCS r0, r1
                     The Condition Field
• The condition occupies bits 31 to 28 (Cond) of every ARM instruction.
  Code   Mnemonic   Flags tested                  Meaning
  0000   EQ         Z set                         equal
  0001   NE         Z clear                       not equal
  0010   HS / CS    C set                         unsigned higher or same
  0011   LO / CC    C clear                       unsigned lower
  0100   MI         N set                         negative
  0101   PL         N clear                       positive or zero
  0110   VS         V set                         overflow
  0111   VC         V clear                       no overflow
  1000   HI         C set and Z clear             unsigned higher
  1001   LS         C clear or Z set              unsigned lower or same
  1010   GE         N equal to V                  signed >=
  1011   LT         N not equal to V              signed <
  1100   GT         Z clear and N equal to V      signed >
  1101   LE         Z set or N not equal to V     signed <=
  1110   AL         (none)                        always
  1111   NV         (none)                        reserved
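• The condition table can be checked with a small Python model. This is only a sketch: the function name condition_passed and the use of Python booleans for the N, Z, C and V flags are assumptions made for illustration, not part of any ARM tool.

```python
# Illustrative model of the 4-bit ARM condition field.
# condition_passed is a hypothetical helper name chosen for this sketch.
def condition_passed(cond, n, z, c, v):
    """Return True if condition code `cond` passes for flags N, Z, C, V."""
    if cond == 0b1111:
        raise ValueError("NV (1111) is reserved")
    checks = {
        0b0000: z,                     # EQ: equal
        0b0001: not z,                 # NE: not equal
        0b0010: c,                     # HS/CS: unsigned higher or same
        0b0011: not c,                 # LO/CC: unsigned lower
        0b0100: n,                     # MI: negative
        0b0101: not n,                 # PL: positive or zero
        0b0110: v,                     # VS: overflow
        0b0111: not v,                 # VC: no overflow
        0b1000: c and not z,           # HI: unsigned higher
        0b1001: (not c) or z,          # LS: unsigned lower or same
        0b1010: n == v,                # GE: signed >=
        0b1011: n != v,                # LT: signed <
        0b1100: (not z) and n == v,    # GT: signed >
        0b1101: z or n != v,           # LE: signed <=
        0b1110: True,                  # AL: always
    }
    return bool(checks[cond])
```

For example, condition_passed(0b1011, True, False, False, False) models LT with N set and V clear, so the instruction would execute.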
Using and updating the Condition Field
• To execute an instruction conditionally, simply postfix it
  with the appropriate condition:
   – For example an add instruction takes the form:
       • ADD r0,r1,r2            ; r0 = r1 + r2 (ADDAL)
   – To execute this only if the zero flag is set:
       • ADDEQ r0,r1,r2          ; If zero flag set then…
                                 ; ... r0 = r1 + r2
• By default, data processing operations do not affect the
  condition flags (apart from the comparisons where this is
  the only effect).
• To cause the condition flags to be updated, the S bit of the
  instruction needs to be set by postfixing the instruction
  (and any condition code) with an “S”.
   – For example to add two numbers and set the condition flags:
       • ADDS r0,r1,r2           ; r0 = r1 + r2
                                 ; ... and set flags
 Examples of conditional execution
• Use a sequence of several conditional instructions
     if (a==0) func(1);
        CMP       r0,#0
        MOVEQ     r0,#1
        BLEQ      func
• Set the flags, then use various condition codes
     if (a==0) x=0;
     if (a>0) x=1;
        CMP       r0,#0
        MOVEQ     r1,#0
        MOVGT     r1,#1
• Use conditional compare instructions
     if (a==4 || a==10) x=0;
        CMP       r0,#4
        CMPNE     r0,#10
        MOVEQ     r1,#0
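• The conditional compare sequence above can be traced with a short Python sketch. It models only the Z flag, and the function name run_sequence is a hypothetical name used for illustration.

```python
# Sketch of: CMP r0,#4 / CMPNE r0,#10 / MOVEQ r1,#0
# Only the Z flag is modelled; run_sequence is an illustrative name.
def run_sequence(a, x):
    z = (a - 4) == 0        # CMP r0,#4 sets Z when a == 4
    if not z:               # CMPNE executes only while Z is clear
        z = (a - 10) == 0   # ...and then sets Z when a == 10
    if z:                   # MOVEQ executes only when Z is set
        x = 0
    return x
```

Because the second compare is skipped when the first one already matched, Z stays set and the final MOVEQ still executes, which is what makes the sequence behave like a logical OR.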
              Conditional Execution
• An unusual feature of the ARM instruction set is that conditional
  execution applies not only to branches but to almost all ARM instructions
  With a branch:                                 With conditional execution:
         CMP r0,#5                                      CMP r0,#5
         BEQ Bypass     ;if (r0!=5) {                   ADDNE r1,r1,r0
         ADD r1,r1,r0   ;  r1=r1+r0                     SUBNE r1,r1,r2
         SUB r1,r1,r2   ;  r1=r1-r2 }
  Bypass …
• Whenever the conditional sequence is 3 instructions or
  fewer it is better (smaller and faster) to exploit conditional
  execution than to use a branch
    if((a==b)&&(c==d)) e++;                CMP r0,r1
                                           CMPEQ r2,r3
                                           ADDEQ r4,r4,#1
                      ARM Instruction Set Format
• Bits 31-28 of every format hold Cond. The rows below list bits 27-0, most significant first.
  Bits 27-0                                                        Instruction type
  0 0 I Opcode S | Rn | Rd | Operand2                              Data processing / PSR Transfer
  0 0 0 0 0 0 A S | Rd | Rn | Rs | 1 0 0 1 | Rm                    Multiply
  0 0 0 0 1 U A S | RdHi | RdLo | Rs | 1 0 0 1 | Rm                Long Multiply (v3M / v4 only)
  0 0 0 1 0 B 0 0 | Rn | Rd | 0 0 0 0 | 1 0 0 1 | Rm               Swap
  0 1 I P U B W L | Rn | Rd | Offset                               Load/Store Byte/Word
  1 0 0 P U S W L | Rn | Register List                             Load/Store Multiple
  0 0 0 P U 1 W L | Rn | Rd | Offset1 | 1 S H 1 | Offset2          Halfword transfer: Immediate offset (v4 only)
  0 0 0 P U 0 W L | Rn | Rd | 0 0 0 0 | 1 S H 1 | Rm               Halfword transfer: Register offset (v4 only)
  1 0 1 L | Offset                                                 Branch
  0 0 0 1 0 0 1 0 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 0 0 0 1 | Rn     Branch Exchange (v4T only)
  1 1 0 P U N W L | Rn | CRd | CPNum | Offset                      Coprocessor data transfer
  1 1 1 0 Op1 | CRn | CRd | CPNum | Op2 0 | CRm                    Coprocessor data operation
  1 1 1 0 Op1 L | CRn | Rd | CPNum | Op2 1 | CRm                   Coprocessor register transfer
  1 1 1 1 | SWI Number                                             Software interrupt
      The ARM instruction set
• Data processing instructions.
• ARM data processing instructions enable the programmer
  to perform arithmetic and logical operations on data values
  in registers.
• They are
   –   Arithmetic instructions
   –   Logical instructions
   –   Comparison instructions
   –   Move instructions
   –   Multiply instructions.
• The data processing instructions are the only instructions
  which modify data values.
• Most data processing instructions can process one of their
  operands using the barrel shifter.
      The ARM instruction set
• Data processing instructions.
• General rules:
  – All operands are 32-bit, coming from registers or
    literals.
  – The result, if any, is 32-bit and placed in a register
    (with the exception for long multiply which produces
    a 64-bit result)
  – 3-address format
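Data processing instruction binary encoding
  Bits 31-28   cond        condition field
  Bits 27-26   0 0
  Bit 25       #           1 = operand 2 is an immediate, 0 = operand 2 is a register
  Bits 24-21   opcode      arithmetic/logic function
  Bit 20       S           set condition codes
  Bits 19-16   Rn          first operand register
  Bits 15-12   Rd          destination register
  Bits 11-0    operand 2
• Operand 2 when bit 25 = 1 (immediate):
  Bits 11-8    #rot        immediate alignment
  Bits 7-0     8-bit immediate
• Operand 2 when bit 25 = 0 and bit 4 = 0 (register shifted by an immediate):
  Bits 11-7    #shift      immediate shift length
  Bits 6-5     Sh          shift type
  Bit 4        0
  Bits 3-0     Rm          second operand register
• Operand 2 when bit 25 = 0 and bit 4 = 1 (register shifted by a register):
  Bits 11-8    Rs          register shift length
  Bit 7        0
  Bits 6-5     Sh          shift type
  Bit 4        1
  Bits 3-0     Rm          second operand register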
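• The immediate form of operand 2 (a 4-bit #rot field and an 8-bit immediate) yields the value obtained by rotating the 8-bit immediate right by twice #rot. The Python sketch below models that rule; the helper names ror32, decode_immediate and encode_immediate are assumptions made for illustration.

```python
# Sketch of the data processing immediate: value = imm8 ROR (2 * rot).
# All function names here are illustrative, not part of any ARM toolchain.
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bit positions."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def decode_immediate(rot, imm8):
    """Expand the 4-bit rot field and 8-bit immediate to a 32-bit value."""
    return ror32(imm8, 2 * rot)

def encode_immediate(value):
    """Return (rot, imm8) for `value`, or None if it cannot be encoded."""
    for rot in range(16):
        # Rotating left by 2*rot undoes the rotate right done by decoding.
        imm8 = ror32(value, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return rot, imm8
    return None
```

A value such as 0x101 has set bits nine positions apart, so no 8-bit window covers it and encode_immediate returns None; a value like that has to be built some other way, for example loaded from memory.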
         The ARM instruction set
Data processing instructions:
• Consist of :
   – Arithmetic:     ADD ADC SUB SBC RSB RSC
   – Logical:        AND ORR EOR BIC
   – Comparisons: CMP CMN TST               TEQ
   – Data movement:          MOV MVN
• These instructions only work on registers, NOT memory.
• Syntax:
     <Operation>{<cond>}{S} Rd, Rn, Operand2
      • Comparisons set flags only - they do not specify Rd
      • Data movement does not specify Rn
• Second operand is sent to the ALU via barrel shifter.
         The ARM instruction set
Data processing instructions:
• The arithmetic/logic instructions share a common
  instruction format.
• These perform an arithmetic or logical operation on
  up to two source operands, and write the result to a
  destination register.
• They can also optionally update the condition code
  flags, based on the result.
• Of the two source operands:
  – one is always a register
  – the other has two basic forms:
     • an immediate value
     • a register value, optionally shifted.
         The ARM instruction set
Data processing instructions:
• If the operand is a shifted register, the shift
  amount can be
   – an immediate value or
   – a register value.
• Five types of shift can be specified.
   – LSL/ASL, LSR, ASR, ROR, RRX
• Every arithmetic/logic instruction can therefore
  perform an arithmetic/logic operation and a
  shift operation.
• ARM does not have dedicated shift instructions.
            The ARM instruction set
Data processing instructions:
• Arithmetic operations.
   –   ADD, ADC : add (w. carry)
   –   SUB, SBC : subtract (w. carry)
   –   RSB, RSC : reverse subtract (w. carry)
   –   MUL, MLA : multiply (and accumulate)
            The ARM instruction set
Data processing instructions:
• Arithmetic operations examples.
   ADD r0, r1, r2               ;r0:= r1 + r2
   ADC r0, r1, r2               ;r0:= r1 + r2 +C
   SUB r0, r1, r2               ;r0:= r1 - r2
   SBC r0, r1, r2               ;r0:= r1 - r2 + C - 1
   RSB r0, r1, r2               ;r0:= r2 - r1
   RSC r0, r1, r2               ;r0:= r2 - r1 + C - 1
• Some other Examples
   –   SUBGT r3, r3, #1
   –   RSBLES r4, r5, #5
   –   ADD r0, r2, r1, LSL #2
   –   RSB r4, r3, r2, LSL #3
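• The carry variants exist mainly so that arithmetic wider than 32 bits can be chained. The Python sketch below models the formulas listed above on 32-bit values and uses them for a 64-bit addition; the function names and the (low word, high word) argument order are assumptions made for illustration.

```python
# Sketch of ADC / SBC / RSC and a 64-bit add built from ADDS + ADC.
# Function names are illustrative; registers are modelled as Python ints.
MASK32 = 0xFFFFFFFF

def adds(rn, op2):
    """ADDS: return (32-bit result, carry out)."""
    total = rn + op2
    return total & MASK32, int(total > MASK32)

def adc(rn, op2, c):
    return (rn + op2 + c) & MASK32        # Rn + Op2 + C

def sbc(rn, op2, c):
    return (rn - op2 + c - 1) & MASK32    # Rn - Op2 + C - 1

def rsc(rn, op2, c):
    return (op2 - rn + c - 1) & MASK32    # Op2 - Rn + C - 1

def add64(a_lo, a_hi, b_lo, b_hi):
    lo, carry = adds(a_lo, b_lo)          # ADDS r0, r0, r2
    hi = adc(a_hi, b_hi, carry)           # ADC  r1, r1, r3
    return lo, hi
```

Adding 1 to 0x00000001FFFFFFFF with add64(0xFFFFFFFF, 1, 1, 0) gives a low word of 0 and a high word of 2, because the carry out of the low-word addition is fed into the high word.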
          The ARM instruction set
Data processing instructions:
• Bit-wise logical operations.
   – Perform the specified Boolean logic operation on each bit
     pair of the input operands, so in the first case r0[i]:= r1[i]
     AND r2[i] for each value of i from 0 to 31 inclusive, where
     r0[i] is the ith bit of r0.
• AND, OR (here called ORR), XOR (here called EOR) logical operations
  and BIC (stands for ‘bit clear’).
              The ARM instruction set
Data processing instructions:
• Bit-wise logical operations examples.
• bit clear(BIC): R2 is a mask identifying which bits of R1 will be cleared
  to zero
• let us consider R1=0x11111111 R2=0x01100101
  BIC R0, R1, R2
  result in R0=0x10011010
• Examples:
    – AND       r0, r1, r2
    – BICEQ     r2, r3, #7
    – EORS      r1,r3,r0
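• The BIC result above can be reproduced with a one-line Python model: BIC computes the first operand AND NOT the mask. The function name bic is an illustrative assumption.

```python
# Sketch of BIC Rd, Rn, Op2: Rd = Rn AND NOT Op2, limited to 32 bits.
# bic is an illustrative name for this model.
MASK32 = 0xFFFFFFFF

def bic(rn, op2):
    return rn & ~op2 & MASK32

r0 = bic(0x11111111, 0x01100101)   # the R1 and R2 values used above
low_bits_cleared = bic(0x1F, 7)    # like BIC r2, r3, #7: clears bits 0 to 2
```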
         The ARM instruction set
Data processing instructions:
• Comparison operations.
   – These instructions do not produce a result but just set the
     condition code bits (N, Z, C and V) in the CPSR according
     to the selected operation.
           The ARM instruction set
Data processing instructions:
• Comparison operations examples.
   PRE  cpsr = nzcvqiFt_USER
        r0 = 4  r9 = 4
        CMP r0, r9
   POST cpsr = nZCvqiFt_USER
• You can see that both registers, r0 and r9, are equal before
  executing the instruction.
• Prior to execution
   – The value of the z flag is 0 and is represented by a lowercase z.
• After execution
   – the z flag changes to 1 or an uppercase Z.
   – the c flag also changes to 1, because the subtraction 4 - 4
     produces no borrow.
• The change of the z flag indicates equality.
• The CMP is effectively a subtract instruction with the result
  discarded.
         The ARM instruction set
Data processing instructions:
• Comparison operations examples.
• compare
   – CMP R1, R2   @ set cc on R1-R2
• compare negated
   – CMN R1, R2   @ set cc on R1+R2
• bit test
   – TST R1, R2   @ set cc on R1 and R2
• test equal
   – TEQ R1, R2   @ set cc on R1 xor R2
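• How CMP and CMN derive the N, Z, C and V flags can be sketched in Python. Operands are treated as unsigned 32-bit values, and the names flags_sub and flags_add are assumptions made for illustration.

```python
# Sketch of the flags set by CMP (subtract) and CMN (add).
# flags_sub / flags_add are illustrative names; each returns (N, Z, C, V).
MASK32 = 0xFFFFFFFF

def flags_sub(a, b):
    """Flags for CMP a, b (a - b with the result discarded)."""
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                               # C set means no borrow
    v = ((a ^ b) & (a ^ result)) >> 31            # signed overflow
    return n, z, c, v

def flags_add(a, b):
    """Flags for CMN a, b (a + b with the result discarded)."""
    total = a + b
    result = total & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(total > MASK32)                       # carry out of bit 31
    v = (~(a ^ b) & (a ^ result) & 0x80000000) >> 31
    return n, z, c, v
```

For equal operands, flags_sub(4, 4) gives Z = 1 and C = 1. TST and TEQ are not modelled here.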
          The ARM instruction set
Data processing instructions:
• Multiplication operations.
   – The multiply instructions multiply the contents of a pair
     of registers and, depending upon the instruction,
     accumulate the result with another register.
   – The long multiplies accumulate onto a pair of registers
     representing a 64-bit value. The final result is placed in a
     destination register or a pair of registers.
              The ARM instruction set
Data processing instructions:
• Multiplication operations.
• Multiply:
    MUL R0, R1, R2            ; R0 = (R1xR2)[31:0]
• Multiply-accumulate:
    MLA r4, r3, r2, r1        ; r4 := (r3 x r2 + r1)[31:0]
• Multiplying two 32-bit integers gives a 64-bit result, the least significant
  32 bits of which are placed in the result register and the rest are ignored.
• This can be viewed as multiplication in modulo arithmetic and gives the
  correct result whether the operands are viewed as signed or unsigned
  integers.
•   Operand restrictions
     – Immediate second operands are not supported.
     – The destination register Rd must not be the same as the first source register Rm.
     – R15 must not be used as an operand or as the destination register.
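• The claim that the low 32 bits of the product are the same for signed and unsigned operands can be checked with a small Python sketch. The names mul, mla and to_signed are illustrative assumptions.

```python
# Sketch of MUL and MLA keeping only bits [31:0] of the result.
# mul, mla and to_signed are illustrative names.
MASK32 = 0xFFFFFFFF

def mul(rm, rs):
    return (rm * rs) & MASK32             # Rd = (Rm x Rs)[31:0]

def mla(rm, rs, rn):
    return (rm * rs + rn) & MASK32        # Rd = (Rm x Rs + Rn)[31:0]

def to_signed(value):
    """Reinterpret a 32-bit pattern as a two's complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value

product = mul(0xFFFFFFFF, 2)              # 0xFFFFFFFF is also -1 when signed
```

Read as unsigned, product is 0xFFFFFFFE (the low word of 0x1FFFFFFFE); read as signed it is -2, which is -1 times 2, so one instruction serves both interpretations.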
          The ARM instruction set
Data processing instructions:
• Register movement operations.
   – Move is the simplest ARM instruction.
   – It copies N into a destination register Rd, where N is a
     register or immediate value.
   – This instruction is useful for setting initial values and
     transferring data between registers.
        The ARM instruction set
Data processing instructions:
• Register movement operations.
  PRE        r5 = 5      r7 = 8
   MOV r7, r5            ;r7 = r5
  POST       r5 = 5      r7 = 5
• This example shows a simple move instruction.
• The MOV instruction takes the contents of
  register r5 and copies them into register r7,
  in this case taking the value 5 and overwriting
  the value 8 in register r7.
         The ARM instruction set
Data processing instructions:
• Register movement operations.
   – MVN r0, r2              ;r0= not r2
• The 'MVN' mnemonic stands for 'move negated';
  it leaves the result register set to the value
  obtained by inverting every bit in the source
  operand.
• Examples:
   – MOVS r2, #10
   – MVNEQ r1,#0
• Use MVN to:
   – form a bit mask
   – take the ones complement of a value.
         Data operation varieties
• Logical shift:
   – fills with zeroes
• Arithmetic shift:
   – fills with sign bit on shift right
• RRX performs 33-bit rotate, including C bit
  from CPSR above sign bit.
            ARM shift operations
• The available shift operations are:
  – LSL: logical shift left by 0 to 31 places; fill the
    vacated bits at the least significant end of the
    word with zeros.
  – LSR: logical shift right by 1 to 32 places (as an immediate); fill the
    vacated bits at the most significant end of the
    word with zeros.
          ARM shift operations
• The available shift operations are:
  – ASL: arithmetic shift left; this is a synonym for LSL.
  – ASR: arithmetic shift right by 1 to 32 places (as an immediate);
     • fill the vacated bits at the MSB end of the word with
       zeros if the source operand was positive, or with ones if
       the source operand was negative.
           ARM shift operations
• The available shift operations are:
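  – ROR: rotate right by 1 to 31 places (as an immediate); bits that
    fall off the least significant end re-enter at the most significant end.
  – RRX: rotate right extended by 1 place; the C flag is shifted into
    bit 31 and bit 0 becomes the carry out.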
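• The five shift types can be modelled on 32-bit values with a few lines of Python. This is a sketch of the result values only (apart from RRX, carry out is not modelled), and the function names are assumptions made for illustration.

```python
# Sketch of the barrel shifter operations on 32-bit values.
# Function names are illustrative; only RRX reports a carry out.
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    return (value << amount) & MASK32          # zeros enter at the LSB end

def lsr(value, amount):
    return (value & MASK32) >> amount          # zeros enter at the MSB end

def asr(value, amount):
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32                       # treat as a negative number
    return (value >> amount) & MASK32          # copies of the sign bit enter

def ror(value, amount):
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def rrx(value, carry_in):
    value &= MASK32
    result = (carry_in << 31) | (value >> 1)   # C enters above the sign bit
    return result, value & 1                   # bit 0 becomes the carry out
```

Comparing lsr(0x80000000, 4) with asr(0x80000000, 4) shows the difference between the two right shifts: the first gives 0x08000000, the second 0xF8000000.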
         Data transfer instructions
• Data transfer instructions move data between ARM
  registers and memory.
• There are three basic forms of data transfer instruction in
  the ARM instruction set:
   – Single register load and store instructions.
       • These instructions provide the most flexible way to transfer single
         data items between an ARM register and memory.
       • The data item may be a byte, a 32-bit word, or a 16-bit half-word.
   – Multiple register load and store instructions.
       • These instructions are less flexible than single register transfer
         instructions, but enable large quantities of data to be transferred
         more efficiently.
       • They are used for procedure entry and exit, to save and restore
         workspace registers, and to copy blocks of data around memory.
   – Single register swap instructions.
       • These instructions allow a value in a register to be exchanged with a
         value in memory, effectively doing both a load and a store operation
         in one instruction.
    ARM load/store instructions
• The ARM is a Load / Store Architecture:
   – Does not support memory to memory data processing
     operations.
   – Must move data values into registers before using them.
• This might sound inefficient, but in practice isn’t:
   – Load data values from memory into registers.
   – Process data in registers using a number of data processing
     instructions which are not slowed down by memory access.
   – Store results from registers out to memory.
• The ARM has three sets of instructions which interact with
  main memory. These are:
   – Single register data transfer (LDR / STR).
   – Block data transfer (LDM/STM).
   – Single Data Swap (SWP).
    ARM load/store instructions
• LDR, LDRH, LDRB : load (word, half-word, byte)
• STR, STRH, STRB : store (word, half-word, byte)
• Addressing modes:
  – register indirect : LDR r0,[r1]
  – with second register : LDR r0,[r1,-r2]
  – with constant : LDR r0,[r1,#4]
        Single register data transfer
• The basic load and store instructions are:
   – Load and Store Word or Byte
       • LDR / STR / LDRB / STRB
• ARM Architecture Version 4 also adds support for halfwords
  and signed data.
   – Load and Store Halfword
       • LDRH / STRH
   – Load Signed Byte or Halfword - load value and sign extend it to 32
     bits.
       • LDRSB / LDRSH
• All of these instructions can be conditionally executed by
  inserting the appropriate condition code after STR / LDR.
   – e.g. LDREQB
• Syntax:
   – <LDR|STR>{<cond>}{<size>} Rd, <address>
           Addressing Modes
• Immediate Addressing
   – The desired value is a binary value in the instruction
• Absolute addressing
   – The instruction contains the full binary address
• Indirect addressing
   – The instruction contains the binary address of a memory
     location containing the binary address
• Base relative addressing
   – Plus offset
   – Plus index
   – Plus scaled index
• Stack addressing
      Memory Addressing Modes
• Pre-indexed mode
  – The effective address of the operand is the sum of the
    contents of the base register Rn and an offset value
• Pre-indexed with writeback mode
  – The effective address of the operand is generated in
    the same way as in the Pre-indexed mode, and then
    the effective address is written back into Rn
• Post-indexed mode
  – The effective address of the operand is the contents
    of Rn. The offset is then added to this address and the
    result is written back into Rn.
         Register-indirect addressing
• The memory location to be accessed is held in a base register
   – STR r0, [r1]          ; Store contents of r0 to location pointed to
                           ; by contents of r1.
   – LDR r2, [r1]          ; Load r2 with contents of memory location
                           ; pointed to by contents of r1.
   – Example values: r0 (source register for STR) = 0x5 and
     r1 (base register) = 0x200.
   – After the STR, memory location 0x200 holds 0x5.
   – After the LDR, r2 (destination register for LDR) holds 0x5.
       Base plus offset addressing
• As well as accessing the actual location contained in the base
  register, these instructions can access a location offset from
  the base register pointer.
• This offset can be
   – An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
   – A register, optionally shifted by an immediate value
• This can be either added or subtracted from the base
  register:
   – Prefix the offset value or register with ‘+’ (default) or ‘-’.
• This offset can be applied:
   – before the transfer is made: Pre-indexed addressing
       • optionally auto-incrementing the base register, by postfixing the
         instruction with an ‘!’.
   – after the transfer is made: Post-indexed addressing
       • causing the base register to be auto-incremented.
              Pre-indexed Addressing
• Example: STR r0, [r1,#12]
   – r0 (source register for STR) holds 0x5.
   – r1 (base register) holds 0x200; the offset is 12.
   – The value 0x5 is stored to memory location 0x20c; r1 stays 0x200.
• To store to location 0x1f4 instead use: STR r0, [r1,#-12]
• To auto-increment base pointer to 0x20c use: STR r0, [r1, #12]!
• If r2 contains 3, access 0x20c by multiplying this by 4:
     – STR r0, [r1, r2, LSL #2]   ; address = r1 + (r2 * 4); r2 is unchanged
             Post-indexed Addressing
• Example: STR r0, [r1], #12
   – r0 (source register for STR) holds 0x5.
   – r1 (original base register) holds 0x200; the offset is 12.
   – The value 0x5 is stored to memory location 0x200.
   – r1 (updated base register) then becomes 0x20c.
• To auto-increment the base register to location 0x1f4 instead use:
     – STR r0, [r1], #-12
• If r2 contains 3, auto-increment base register to 0x20c by
  multiplying this by 4:
     – STR r0, [r1], r2, LSL #2
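• The difference between pre-indexed, pre-indexed with writeback and post-indexed addressing is which address is used for the transfer and whether the base register changes. A minimal Python sketch follows; memory is modelled as a dict, and the function name store and the mode strings are assumptions made for illustration.

```python
# Sketch of STR under the three indexing modes.
# store and the mode names are illustrative; mem maps address -> value.
def store(mem, value, base, offset, mode):
    """Perform the store and return the base register value afterwards."""
    if mode == "pre":           # STR r0, [r1, #offset]
        mem[base + offset] = value
        return base             # base register unchanged
    if mode == "pre_wb":        # STR r0, [r1, #offset]!
        mem[base + offset] = value
        return base + offset    # effective address written back
    if mode == "post":          # STR r0, [r1], #offset
        mem[base] = value
        return base + offset    # base updated after the transfer
    raise ValueError(mode)

mem = {}
r1_pre = store(mem, 0x5, 0x200, 12, "pre")        # stores at 0x20c
r1_post = store(mem, 0x6, 0x200, 12, "post")      # stores at 0x200
r1_scaled = store(mem, 0x7, 0x200, 3 << 2, "pre_wb")  # offset r2 LSL #2, r2 = 3
```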
                              Block Data Transfer (1)
• The Load and Store Multiple instructions (LDM / STM) allow between 1 and
  16 registers to be transferred to or from memory.
• The transferred registers can be either:
   – Any subset of the current bank of registers (default).
   – Any subset of the user mode bank of registers when in a privileged mode
      (postfix instruction with a ‘^’).
• Instruction encoding:
   Bits 31-28   Cond            Condition field
   Bits 27-25   1 0 0
   Bit 24       P               Pre/Post indexing bit: 0 = Post; add offset after transfer, 1 = Pre; add offset before transfer
   Bit 23       U               Up/Down bit: 0 = Down; subtract offset from base, 1 = Up; add offset to base
   Bit 22       S               PSR and force user bit: 0 = don’t load PSR or force user mode, 1 = load PSR or force user mode
   Bit 21       W               Write-back bit: 0 = no write-back, 1 = write address into base
   Bit 20       L               Load/Store bit: 0 = Store to memory, 1 = Load from memory
   Bits 19-16   Rn              Base register
   Bits 15-0    Register list   Each bit corresponds to a particular register. For example:
                                 • Bit 0 set causes r0 to be transferred.
                                 • Bit 0 unset causes r0 not to be transferred.
                                At least one register must be transferred as the list cannot be empty.
             Block Data Transfer (2)
• Base register used to determine where memory access
  should occur.
   – 4 different addressing modes allow increment and decrement
     inclusive or exclusive of the base register location.
   – Base register can be optionally updated following the transfer
     (by appending an ‘!’ to it).
   – Lowest register number is always transferred to/from lowest
     memory location accessed.
• These instructions are very efficient for
   – Saving and restoring context
       • For this it is useful to view memory as a stack.
   – Moving large blocks of data around memory
       • For this it is useful to directly represent the functionality of the instructions.
           Block Data Transfer (3)
• When LDM / STM are not being used to implement
  stacks, it is clearer to specify exactly what
  the instruction does:
  – i.e. specify whether to increment / decrement the base
    pointer, before or after the memory access.
• In order to do this, LDM / STM support a further
  syntax in addition to the stack one:
  –   STMIA / LDMIA : Increment After
  –   STMIB / LDMIB : Increment Before
  –   STMDA / LDMDA : Decrement After
  –   STMDB / LDMDB : Decrement Before
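• The four suffixes differ only in which word addresses are accessed and where the base ends up if writeback is requested. The Python sketch below computes both; the function name block_addresses is an assumption made for illustration, and addresses are listed lowest first because the lowest-numbered register always goes to the lowest address.

```python
# Sketch of the addresses used by LDM/STM with IA, IB, DA and DB.
# block_addresses is an illustrative name; count is the number of registers.
def block_addresses(base, count, mode):
    """Return (word addresses, lowest first; base after writeback)."""
    size = 4 * count
    if mode == "IA":            # Increment After
        start, new_base = base, base + size
    elif mode == "IB":          # Increment Before
        start, new_base = base + 4, base + size
    elif mode == "DA":          # Decrement After
        start, new_base = base - size + 4, base - size
    elif mode == "DB":          # Decrement Before
        start, new_base = base - size, base - size
    else:
        raise ValueError(mode)
    return [start + 4 * i for i in range(count)], new_base
```

For a base of 0x1000 and three registers, IA touches 0x1000 to 0x1008 and DB touches 0xff4 to 0xffc.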
               Stack Operations
• The ARM architecture uses the load-store multiple
  instructions to carry out stack operations.
• The pop operation (removing data from a stack) uses a
  load multiple instruction.
• The push operation (placing data onto the stack) uses a
  store multiple instruction.
• A stack is either ascending (A) or descending (D).
   – Ascending stacks grow towards higher memory addresses.
   – Descending stacks grow towards lower memory addresses.
• The LDMFD and STMFD instructions provide the pop
  and push functions, respectively.
            Stack Operations
• Example:
• The STMFD instruction pushes registers onto
  the stack, updating the sp.
   PRE  r1 = 0x00000002
        r4 = 0x00000003
        sp = 0x00080014
        STMFD sp!, {r1,r4}
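• The effect of the STMFD above can be worked out with a small Python model of a full descending stack (STMFD behaves as decrement before, LDMFD as increment after). Memory is modelled as a dict, and the names stmfd and ldmfd are used here as illustrative function names.

```python
# Sketch of push/pop on a full descending stack.
# stmfd and ldmfd are illustrative models; mem maps address -> value.
def stmfd(mem, sp, regs):
    """Push regs (register number -> value); return the updated sp."""
    sp -= 4 * len(regs)
    for i, number in enumerate(sorted(regs)):
        mem[sp + 4 * i] = regs[number]     # lowest register, lowest address
    return sp

def ldmfd(mem, sp, numbers):
    """Pop the listed registers; return (values by number, updated sp)."""
    ordered = sorted(numbers)
    values = {number: mem[sp + 4 * i] for i, number in enumerate(ordered)}
    return values, sp + 4 * len(ordered)

mem = {}
sp = stmfd(mem, 0x00080014, {1: 0x00000002, 4: 0x00000003})
```

With the PRE values above, sp ends at 0x0008000c, r1 is stored at 0x0008000c and r4 at 0x00080010.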
   Swap and Swap Byte Instructions
• The swap instruction is a special case of a load-store instruction.
• It swaps the contents of memory with the contents of a register.
• This instruction is an atomic operation.
   – it reads and writes a location in the same bus operation, preventing any
     other instruction from reading or writing to that location until it
     completes.
• Thus to implement an actual swap of contents make Rd = Rm.
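Swap and Swap Byte Instructions
• The operation takes three steps, where Rn holds the memory address:
   1. The memory location addressed by Rn is read into a temporary value (temp).
   2. The contents of Rm are written to that memory location.
   3. temp is written to Rd.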
  Swap and Swap Byte Instructions
• Example
• The swap instruction loads a word from memory into
  register r0 and overwrites the memory with register r1.
   PRE  mem32[0x9000] = 0x12345678
        r0 = 0x00000000
        r1 = 0x11112222
        r2 = 0x00009000
        SWP r0, r1, [r2]
   POST mem32[0x9000] = 0x11112222
        r0 = 0x12345678
        r1 = 0x11112222
        r2 = 0x00009000
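• The SWP example can be reproduced with a short Python model. It shows the data movement only, not the atomic bus behaviour; the function name swp and the dict used for memory are assumptions made for illustration.

```python
# Sketch of SWP Rd, Rm, [Rn]: read memory, write Rm, return the old value.
# swp is an illustrative model and does not capture atomicity.
def swp(mem, rm_value, address):
    temp = mem[address]          # read the old memory contents
    mem[address] = rm_value      # write the contents of Rm
    return temp                  # the old contents become Rd

mem = {0x9000: 0x12345678}
r1 = 0x11112222
r2 = 0x00009000
r0 = swp(mem, r1, r2)            # SWP r0, r1, [r2]
```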
            Control Flow Instructions
• This category of instructions neither processes
  data nor moves it around; it simply determines
  which instructions get executed next.
      –   Branch instructions
      –   Conditional branches
      –   Conditional execution
      –   Branch and link instructions
      –   Subroutine return instructions
      –   Supervisor calls
      –   Jump tables
                           Branch Instructions
• Change the flow of sequential execution of instructions by
  modifying the program counter.
   – Branch           :                    B{<cond>} label
   – Branch with Link :                    BL{<cond>} sub_routine_label
• Instruction encoding:
   Bits 31-28   Cond     Condition field
   Bits 27-25   1 0 1
   Bit 24       L        Link bit: 0 = Branch, 1 = Branch with link
   Bits 23-0    Offset
• Branch (B)
   – jumps in a range of +/-32 MB.
• Branch with link (BL)
   – suitable for subroutine calls: it stores the address of the
     instruction after the BL in the link register (lr), and the
     subroutine returns by restoring the program counter (pc) from
     the link register.
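• The +/-32 MB range follows from the 24-bit Offset field: it is a signed word offset, shifted left by two bits and added to the pc, which in ARM state reads as the branch address plus 8. The Python sketch below models this; the function names are assumptions made for illustration.

```python
# Sketch of the 24-bit branch offset field of B / BL.
# Function names are illustrative. In ARM state the pc reads as the
# address of the branch instruction plus 8.
def encode_branch_offset(branch_address, target_address):
    offset = target_address - (branch_address + 8)
    if offset % 4:
        raise ValueError("target must be word aligned")
    words = offset >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target outside the +/-32 MB range")
    return words & 0xFFFFFF              # 24-bit two's complement field

def decode_branch_target(branch_address, field):
    if field & 0x800000:                 # sign-extend the 24-bit field
        field -= 1 << 24
    return branch_address + 8 + (field << 2)
```

A branch to itself at address 0x8000 encodes as 0xFFFFFE (a word offset of -2), and a branch to the instruction two words further on encodes as 0.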
          Branch Instructions
• Branch Exchange and Branch Exchange Link
  for switching the processor state from Thumb
  to ARM and vice versa.
• ARM state <-> Thumb state
• Branch Exchange: BX{<cond>} Rm
• Branch Exchange Link: BLX{<cond>} label/Rm
    ARM Branches and Subroutines
• B <label>
   – PC relative. ±32 Mbyte range.
• BL <subroutine>
   – Stores return address in LR
   – Returning implemented by restoring the PC from LR
   – For non-leaf functions, LR will have to be stacked
                            func1                      func2
       :                    STMFD sp!,{regs,lr}        :
       :                    :                          :
       BL func1             BL func2                   :
       :                    :                          :
       :                    LDMFD sp!,{regs,pc}        MOV pc, lr
      Branch and Link Instructions
• Perform a branch, save the address following the branch in
  the link register, r14
             BL SUBR       ;branch to SUBR
             …             ;return here
      SUBR   …             ;subroutine entry point
             MOV PC,r14    ;return
• For nested subroutines, push r14 and any work registers
  that must be saved onto a stack in memory
           BL SUB1
              …
  SUB1     STMFD r13!,{r0-r2,r14}   ;save work and link regs
              …
              …
              …
           LDMFD r13!,{r0-r2,PC}    ;restore work regs and return
                   Branch Instructions
• The most common way to switch program execution from one place
  to another is to use the branch instruction:
                  B LABEL
                        …
       LABEL            …
• LABEL comes after or before the branch instruction.
• Example:
                   B Forward
                   ADD r1, r2, #4
                   ADD r0, r6, #2
                   ADD r3, r7, #4
         Forward
                SUB r1, r2, #4
         Backward
                ADD r1, r2, #4
                SUB r1, r2, #4
                ADD r4, r6, r7
                B Backward
           Conditional Branches
• The branch has a condition associated with it
  and it is only executed if the condition codes
  have the correct value, so it is either taken or not taken
       MOV r0,#0      ;initialize counter
Loop   …
       ADD r0,r0,#1   ;increment loop counter
       CMP r0,#10     ;compare with limit
       BNE   Loop     ;repeat if not equal
       …              ;else fall through
               Example: Block Copy
   – Copy a block of memory, which is an exact multiple of 12 words
     long from the location pointed to by r12 to the location pointed
     to by r13. r14 points to the end of block to be copied.
; r12 points to the start of the source data
; r14 points to the end of the source data
; r13 points to the start of the destination data
loop   LDMIA   r12!, {r0-r11} ; load 48 bytes
       STMIA   r13!, {r0-r11} ; and store them
       CMP     r12, r14       ; check for the end
       BNE     loop           ; and loop until done
   – This loop transfers 48 bytes in 31 cycles
   – Over 50 Mbytes/sec at 33 MHz
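A rough Python model of the loop above, plus the throughput arithmetic behind the "over 50 Mbytes/sec" figure. Memory is modelled as a list of 32-bit words indexed by byte address divided by 4; that representation and the function name are assumptions made for the sketch.

```python
def block_copy(memory, src, end, dst):
    """Model of the LDMIA/STMIA loop. Addresses are byte addresses."""
    r12, r13, r14 = src, dst, end
    iterations = 0
    while True:
        # LDMIA r12!, {r0-r11}: load 12 words, then advance r12 by 48 bytes
        regs = [memory[(r12 + 4 * i) // 4] for i in range(12)]
        r12 += 48
        # STMIA r13!, {r0-r11}: store 12 words, then advance r13 by 48 bytes
        for i, value in enumerate(regs):
            memory[(r13 + 4 * i) // 4] = value
        r13 += 48
        iterations += 1
        # CMP r12, r14 / BNE loop
        if r12 == r14:
            return iterations

memory = list(range(24)) + [0] * 24          # 24 source words, 24 destination words
iterations = block_copy(memory, src=0, end=96, dst=96)

# 48 bytes every 31 cycles at a 33 MHz clock
throughput = 48 / 31 * 33e6                  # bytes per second, about 51 million
print(iterations, round(throughput / 1e6, 1))
```

Like the assembly, the model tests for the end only after a transfer, so the block must be a non-empty exact multiple of 12 words.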
                  ARM Registers
• ARM has 31 general-purpose 32-bit registers. At any one
  time, 16 of these registers are visible.
• The other registers are used to speed up exception
  processing. All the register specifiers in ARM instructions
  can address any of the 16 visible registers.
• The main bank of 16 registers is used by all unprivileged
  code. These are the User mode registers. User mode is
  different from all other modes as it is unprivileged, which
  means:
   – User mode can only switch to another processor mode by
     generating an exception. The SWI instruction provides this
     facility from program control.
   – Memory systems and coprocessors might allow User mode less
     access to memory and coprocessor functionality than a
     privileged mode.
                             Registers
• General-purpose registers hold either data or an address.
  They are identified with the letter r prefixed to the register
  number. For example, register 4 is given the label r4.
• Figure 2.2 shows the active registers available in user
  mode—a protected mode normally used when executing
  applications. The processor can operate in seven different
  modes, which we will introduce shortly. All the registers
  shown are 32 bits in size.
• There are up to 18 active registers: 16 data registers and 2
  processor status registers. The data registers are visible to
  the programmer as r0 to r15.
• The ARM processor has three registers assigned to a
  particular task or special function: r13, r14, and r15. They
  are frequently given different labels to differentiate them
  from the other registers.
• The shaded registers identify the assigned special-purpose
  registers:
   – Register r13 is traditionally used as the stack pointer (sp) and
     stores the head of the stack in the current processor mode.
   – Register r14 is called the link register (lr) and is where the core
     puts the return address whenever it calls a subroutine.
   – Register r15 is the program counter (pc) and contains the
     address of the next instruction to be fetched by the processor.
• In addition to the 16 data registers, there are two program
  status registers: cpsr and spsr (the current and saved
  program status registers, respectively).
  Current Program Status Register
• The ARM core uses the cpsr to monitor and control
  internal operations.
• The cpsr is a dedicated 32-bit register and resides in
  the register file.
• The cpsr is divided into four fields, each 8 bits wide:
  flags, status, extension, and control.
• In current designs the extension and status fields are
  reserved for future use.
• The control field contains the processor mode, state,
  and interrupt mask bits.
• The flags field contains the condition flags.
• The format of the CPSR and the SPSRs is
  shown below.
• https://www.slideshare.net/MathivananNatarajan/arm-instruction-set-60665439
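A small Python sketch of reading the flags and control fields out of a cpsr value. The bit positions (N, Z, C, V in bits 31 to 28; I, F, T in bits 7 to 5; mode in bits 4 to 0) and the mode encodings are assumptions taken from the ARM architecture documentation, since the figure is not reproduced here.

```python
MODES = {
    0b10000: "user",
    0b10001: "fast interrupt request",
    0b10010: "interrupt request",
    0b10011: "supervisor",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}

def decode_cpsr(cpsr):
    """Split a 32-bit cpsr value into condition flags and control bits."""
    return {
        "N": (cpsr >> 31) & 1,   # flags field: negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # control field: IRQ mask
        "F": (cpsr >> 6) & 1,    # FIQ mask
        "T": (cpsr >> 5) & 1,    # state bit (Thumb when set)
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }

fields = decode_cpsr(0x600000D3)
print(fields)
```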
                Processor Modes
• The processor mode determines which registers are active
  and the access rights to the cpsr register itself.
• Each processor mode is either privileged or nonprivileged:
• A privileged mode allows full read-write access to the cpsr.
• Conversely, a nonprivileged mode only allows read access
  to the control field in the cpsr but still allows read-write
  access to the condition flags.
• There are seven processor modes in total: six privileged
  modes (abort, fast interrupt request, interrupt request,
  supervisor, system, and undefined) and one nonprivileged
  mode (user).
• The processor enters abort mode when there is a failed attempt to
  access memory.
• Fast interrupt request and interrupt request modes correspond to
  the two interrupt levels available on the ARM processor.
• Supervisor mode is the mode that the processor is in after reset and
  is generally the mode that an operating system kernel operates in.
• System mode is a special version of user mode that allows full
  read-write access to the cpsr.
• Undefined mode is used when the processor encounters an
  instruction that is undefined or not supported by the
  implementation.
• User mode is used for programs and applications.
                       Banked Registers
•   Figure 2.4 shows all 37 registers in the register file. Of those, 20 registers are
    hidden from a program at different times. These registers are called banked
    registers and are identified by the shading in the diagram.
•   They are available only when the processor is in a particular mode; for example,
    abort mode has banked registers r13_abt, r14_abt and spsr_abt.
•   Banked registers of a particular mode are denoted by an underscore followed by
    the mode mnemonic, written _mode.
•   Every processor mode except user mode can change mode by writing directly to
    the mode bits of the cpsr. All processor modes except system mode have a set of
    associated banked registers that are a subset of the main 16 registers.
•   The processor mode can be changed by a program that writes directly to the cpsr
    (the processor core has to be in privileged mode) or by hardware when the core
    responds to an exception or interrupt.
•   The following exceptions and interrupts cause a mode change: reset, interrupt
    request, fast interrupt request, software interrupt, data abort, prefetch abort, and
    undefined instruction. Exceptions and interrupts suspend the normal execution of
    sequential instructions and jump to a specific location.
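The register counts above (37 in total, 20 of them banked) can be checked with a short Python model. The per-mode banking used here (r8 to r14 for fast interrupt request mode, r13 and r14 for the other exception modes, one spsr per exception mode) is an assumption taken from the ARM architecture documentation; only the abort-mode set is listed on the slide.

```python
# Which of r0-r15 each mode replaces with its own banked copy.
BANKED = {
    "fiq": set(range(8, 15)),   # r8_fiq .. r14_fiq
    "irq": {13, 14},
    "svc": {13, 14},
    "abt": {13, 14},
    "und": {13, 14},
}
MODES = ["usr", "sys", "fiq", "irq", "svc", "abt", "und"]

def physical_register(mode, n):
    """Name of the physical register that rN refers to in the given mode."""
    if n in BANKED.get(mode, ()):
        return f"r{n}_{mode}"
    return f"r{n}"

def status_registers(mode):
    """cpsr is shared; each mode with banked registers also has its own spsr."""
    return ["cpsr"] + ([f"spsr_{mode}"] if mode in BANKED else [])

all_regs = set()
for mode in MODES:
    all_regs.update(physical_register(mode, n) for n in range(16))
    all_regs.update(status_registers(mode))

banked = {name for name in all_regs if "_" in name}
print(len(all_regs), len(banked))
```

User and system mode share the same 16 registers and have no spsr, which is why neither adds to the count.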
          Exception priorities
• When multiple exceptions arise at the same
  time, a fixed priority system determines the
  order in which they are handled.
• The priority order is listed in Table
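A minimal sketch of a fixed-priority scheme in Python. The table is not reproduced here, so the order below is an assumption based on the ARM7TDMI documentation and should be checked against it; the names are illustrative.

```python
# Assumed priority order, highest first (1 = highest).
PRIORITY = {
    "reset": 1,
    "data abort": 2,
    "fast interrupt request": 3,
    "interrupt request": 4,
    "prefetch abort": 5,
    "undefined instruction": 6,   # shares the lowest level with
    "software interrupt": 6,      # software interrupt
}

def handling_order(pending):
    """Return the pending exceptions sorted from highest to lowest priority."""
    return sorted(pending, key=lambda name: PRIORITY[name])

order = handling_order(["interrupt request", "data abort", "fast interrupt request"])
print(order)
```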
               Entering an exception
• The ARM7TDMI processor handles an exception as follows:
1. Preserves the address of the next instruction in the appropriate LR.
     •   When the exception entry is from ARM state, the ARM7TDMI processor
         copies the address of the next instruction into the LR, current PC+4 or PC+8
         depending on the exception.
     •   When the exception entry is from Thumb state, the ARM7TDMI processor
         writes the value of the PC into the LR, offset by a value, current PC+4 or
         PC+8 depending on the exception, that causes the program to resume from
         the correct place on return.
     •   The exception handler does not have to determine the state when entering
         an exception. For example, in the case of a SWI, MOVS PC, r14_svc always
         returns to the next instruction regardless of whether the SWI was executed
         in ARM or Thumb state.
2. Copies the CPSR into the appropriate SPSR.
3. Forces the CPSR mode bits to a value that depends on the exception.
4. Forces the PC to fetch the next instruction from the relevant exception
   vector.
• Note
     – Exceptions are always entered in ARM state. When the processor is in Thumb
       state and an exception occurs, the switch to ARM state takes place
       automatically when the exception vector address is loaded into the PC.
            Entering an exception
• When an exception occurs, the ARM:
  – Preserves the address of the next instruction in the
    appropriate LR. When the exception entry is from:
     • ARM state, the ARM7TDMI-S copies the address of the next
       instruction into the LR (current PC + 4, or PC + 8 depending on
       the exception)
     • Thumb state, the ARM7TDMI-S writes the value of the PC into
       the LR, offset by a value (current PC + 4, or PC + 8 depending
       on the exception).
  – Copies the CPSR into the appropriate SPSR.
  – Forces the CPSR mode bits to a value which depends on
    the exception.
  – Forces the PC to fetch the next instruction from the
    relevant exception vector.
          Leaving an exception
• When an exception is completed, the exception
  handler must:
  1. Move the LR, minus an offset, to the PC. The offset
     varies according to the type of exception.
  2. Copy the SPSR back to the CPSR.
  3. Clear the interrupt disable flags that were set on
     entry.
• Note
  – The action of restoring the CPSR from the SPSR
    automatically resets the T bit to whatever value it held
    immediately prior to the exception.
ARM exception vector locations
Address                Exception
0x0000 0000              Reset
0x0000 0004              Undefined Instruction
0x0000 0008              Software Interrupt
0x0000 000C              Prefetch Abort (instruction fetch memory fault)
0x0000 0010              Data Abort (data access memory fault)
0x0000 0014              Reserved
0x0000 0018              IRQ
0x0000 001C              FIQ
Note: Location 0x0000 0014 is identified as reserved in ARM documentation; this
  location is used by the Boot Loader as the Valid User
  Program key.
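The entry and return steps, together with the vector addresses, can be sketched as a small Python model. The dictionary-based CPU state, the function names, the mode each exception enters, and the bit positions of I, F and T are assumptions for illustration (taken from the ARM architecture documentation, not from these slides). The return address is passed in rather than computed, since the exact PC offset depends on the exception.

```python
VECTORS = {"reset": 0x00, "undefined": 0x04, "swi": 0x08,
           "prefetch_abort": 0x0C, "data_abort": 0x10,
           "irq": 0x18, "fiq": 0x1C}
EXCEPTION_MODE = {"reset": "svc", "undefined": "und", "swi": "svc",
                  "prefetch_abort": "abt", "data_abort": "abt",
                  "irq": "irq", "fiq": "fiq"}
MODE_BITS = {"fiq": 0b10001, "irq": 0b10010, "svc": 0b10011,
             "abt": 0b10111, "und": 0b11011}
I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5

def enter_exception(cpu, kind, return_address):
    """Apply the four entry steps; returns the mode that was entered."""
    mode = EXCEPTION_MODE[kind]
    cpu["lr_" + mode] = return_address               # 1. preserve return address in LR
    cpu["spsr_" + mode] = cpu["cpsr"]                # 2. copy CPSR into SPSR
    cpsr = (cpu["cpsr"] & ~0x1F) | MODE_BITS[mode]   # 3. force the mode bits
    cpsr &= ~T_BIT                                   #    exceptions are entered in ARM state
    cpsr |= I_BIT                                    #    IRQs are disabled on entry
    if kind in ("reset", "fiq"):
        cpsr |= F_BIT                                #    FIQs too, for reset and FIQ
    cpu["cpsr"] = cpsr
    cpu["pc"] = VECTORS[kind]                        # 4. fetch from the exception vector
    return mode

def leave_exception(cpu, mode, offset):
    """Return from the handler: PC = LR - offset, CPSR = SPSR."""
    cpu["pc"] = cpu["lr_" + mode] - offset
    cpu["cpsr"] = cpu["spsr_" + mode]   # also restores the T bit and interrupt masks

# A SWI taken from user mode in Thumb state; the return offset is 0 for a SWI.
cpu = {"cpsr": 0b10000 | T_BIT, "pc": 0x8000}
mode = enter_exception(cpu, "swi", return_address=0x8002)
inside = dict(cpu)
leave_exception(cpu, mode, offset=0)
print(hex(inside["pc"]), hex(inside["cpsr"]), hex(cpu["pc"]), hex(cpu["cpsr"]))
```

Reset is handled like the other entries only for simplicity; the saved values have no meaningful use after a real reset.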
Nested interrupt handling scheme(2)
      The ARM instruction set
• Data processing instructions.
• ARM data processing instructions enable the programmer to perform
  arithmetic and logical operations on data values in registers.
• The data processing instructions are the only instructions which modify
  data values.
• These instructions typically require two operands and produce a single
  result, though there are exceptions to both of these rules.
• Here are some rules which apply to ARM data processing instructions:
   – All operands are 32 bits wide and come from registers or are specified as
     literals in the instruction itself.
   – The result, if there is one, is 32 bits wide and is placed in a register. (There is
     an exception here: long multiply instructions produce a 64-bit result)
   – Each of the operand registers and the result register are independently
     specified in the instruction. That is, the ARM uses a '3-address' format for
     these instructions.
          The ARM instruction set
Data processing instructions:
• ADD, ADC : add (w. carry)
• SUB, SBC : subtract (w. carry)
• RSB, RSC : reverse subtract (w. carry)
• MUL, MLA : multiply (and accumulate)
• AND, ORR, EOR
• BIC : bit clear
• LSL, LSR : logical shift left/right
• ASL, ASR : arithmetic shift left/right
• ROR : rotate right
• RRX : rotate right extended with C
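A short Python sketch of what the shift and rotate operations in this list do to a 32-bit value. The function names are illustrative, and shift amounts are assumed to lie in the range 0 to 31; larger amounts are not modelled.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits

def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK

def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit is copied in from the left."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative number
    return (x >> n) & MASK

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x, carry):
    """Rotate right by one through the carry flag: returns (result, new carry)."""
    x &= MASK
    return (carry << 31) | (x >> 1), x & 1

print(hex(asr(0x80000000, 4)), hex(ror(0xF, 4)), rrx(0x3, 1))
```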
• ARM instructions are extended with a 4-bit condition field in
  the top four bits of the 32-bit instruction word:
The ARM instruction set Format
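A minimal Python sketch of how the 4-bit condition field in the top of an instruction is tested against the N, Z, C and V flags. The encodings follow the standard ARM condition-code table, which is an assumption here since the format figure is not reproduced; the value 0b1111 is left out because its use varies between architecture versions.

```python
MNEMONICS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
             "HI", "LS", "GE", "LT", "GT", "LE", "AL"]

def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this condition field would execute."""
    if not 0 <= cond <= 14:
        raise ValueError("condition field not modelled")
    checks = [
        z,                    # EQ: equal
        not z,                # NE: not equal
        c,                    # CS: carry set / unsigned higher or same
        not c,                # CC: carry clear / unsigned lower
        n,                    # MI: negative
        not n,                # PL: positive or zero
        v,                    # VS: overflow
        not v,                # VC: no overflow
        c and not z,          # HI: unsigned higher
        (not c) or z,         # LS: unsigned lower or same
        n == v,               # GE: signed greater or equal
        n != v,               # LT: signed less than
        (not z) and n == v,   # GT: signed greater than
        z or n != v,          # LE: signed less than or equal
        True,                 # AL: always
    ]
    return bool(checks[cond])

# After comparing two equal values (Z = 1, C = 1), BNE is skipped and BEQ is taken.
bne_taken = condition_passed(MNEMONICS.index("NE"), n=0, z=1, c=1, v=0)
beq_taken = condition_passed(MNEMONICS.index("EQ"), n=0, z=1, c=1, v=0)
print(bne_taken, beq_taken)
```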
           The ARM instruction set
• Simple register operands
• A typical ARM data processing instruction is written in
  assembly language as shown below:
• Basic format:
   ADD r0,r1,r2 ; r0 := r1 + r2
   – Computes r1+r2, stores in r0
• Immediate operand:
   ADD r0,r1,#2 ; r0 := r1 + 2
   – Computes r1+2, stores in r0
• The semicolon in this line indicates that everything to the right
  of it is a comment and should be ignored by the assembler.
• Comments are put into the assembly source code to make
  reading and understanding it easier.
• This example simply takes the values in two registers (r1 and
  r2), adds them together, and places the result in a third register
  (r0).
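To show how the 3-address format maps onto an instruction word, here is a hedged Python sketch that encodes the two ADD examples. The field layout (condition, immediate bit, opcode, S bit, Rn, Rd, operand 2) and the ADD opcode 0b0100 are assumptions taken from the ARM architecture documentation; the function is hypothetical and only handles an unshifted register or an immediate in the range 0 to 255.

```python
ADD_OPCODE = 0b0100

def encode_add(rd, rn, operand2, *, immediate=False, set_flags=False, cond=0b1110):
    """Encode ADD rd, rn, operand2 (register or small immediate) as a 32-bit word."""
    if immediate:
        if not 0 <= operand2 <= 255:
            raise ValueError("only unrotated 8-bit immediates are modelled")
    elif not 0 <= operand2 <= 15:
        raise ValueError("register operand must be r0..r15")
    return ((cond << 28)
            | (int(immediate) << 25)   # I: operand 2 is an immediate
            | (ADD_OPCODE << 21)
            | (int(set_flags) << 20)   # S: update the condition flags
            | (rn << 16)               # first operand register
            | (rd << 12)               # destination register
            | operand2)                # second operand

add_reg = encode_add(0, 1, 2)                  # ADD r0,r1,r2
add_imm = encode_add(0, 1, 2, immediate=True)  # ADD r0,r1,#2
print(hex(add_reg), hex(add_imm))
```

The destination and both operands occupy separate fields, which is what makes the format "3-address".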
      ARM comparison instructions
•   CMP : compare
•   CMN : negated compare
•   TST : bit-wise test (AND)
•   TEQ : bit-wise negated test (XOR)
•   These instructions set only the NZCV bits of
    CPSR.
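A small Python model of how CMP derives the NZCV bits: it performs the subtraction, discards the result, and keeps only the flags. The function name is illustrative; the flag definitions (C set when no borrow occurs, V set on signed overflow) are assumptions from the ARM architecture documentation.

```python
MASK = 0xFFFFFFFF

def cmp_flags(rn, op2):
    """Return (N, Z, C, V) as set by CMP rn, op2, which computes rn - op2."""
    rn &= MASK
    op2 &= MASK
    result = (rn - op2) & MASK
    n = result >> 31                             # sign bit of the result
    z = int(result == 0)                         # result is zero
    c = int(rn >= op2)                           # no borrow in unsigned subtraction
    v = ((rn ^ op2) & (rn ^ result)) >> 31       # signed overflow
    return n, z, c, v

equal = cmp_flags(5, 5)
lower = cmp_flags(3, 5)
overflow = cmp_flags(0x80000000, 1)
print(equal, lower, overflow)
```

With equal operands Z is set, so a following BNE falls through and a BEQ is taken.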
       ARM move instructions
• MOV, MVN : move (negated)
 MOV r0, r1 ; sets r0 to r1