02 Arm and Arm Processors
© 2021 Arm
Learning Outcomes
At the end of this module, you will be able to:
• Identify the characteristics of Arm processor classes and their corresponding instruction
   set architecture.
• Describe the properties of the Armv7-A architecture, including the AAPCS, operating
   modes, registers, memory model, and Virtualization Extensions.
• Outline the properties of the Arm Cortex-A9 Processor.
• Identify what Arm Neon technology is and its usage.
Arm Products
• Processors
    •   Cortex-A, R, M, SecurCore
• System IP
    •   CoreLink, CoreSight, AMBA Design Tools
• Multimedia
    •   Mali graphics, video, display
• Physical IP
    •   Artisan Logic IP, Interface IP, Memory IP, DesignStart
• Tools
    •   Software tools (Development Studio, Keil MDK), debug adapters, models, boards
• Support
    •   Training, documentation, Arm Connected Community
Arm Processors and Applications
Arm Processor Families
• Cortex-A series (advanced application)
    •   High-performance processors for open OSs
    •   Applications include smartphones, digital TV, server solutions, and home gateways.
• Cortex-R series (real-time)
    •   Exceptional performance for real-time applications
    •   Applications include automotive braking systems and powertrains.
• Cortex-M series (microcontroller)
    •   Cost-sensitive solutions for deterministic microcontroller applications
    •   Applications include microcontrollers, mixed signal devices, smart sensors, automotive body
        electronics, and airbags.
Arm Processor Families
• SecurCore series
    •   High-security applications such as smartcards and e-government
• Neoverse
    •   High performance efficiency for cloud, infrastructure, and AI/ML-accelerated applications
• Classic processors
    •   Include Arm7, Arm9, and Arm11 families
Arm Cortex-A Series Family
• Cortex-A series: Cortex-A5, A7, A8, A9, A12, A15, A17, A53, A57
• High-performance application processors
    •   Run rich OSs, multicore technology, 32-bit and 64-bit support
• Applications
    •   Mobile computing: netbook, tablet, eReader
    •   Mobile handset: smartphones, feature phones, wearables
    •   Digital home: set-top box, digital TV, Blu-ray player, gaming consoles
    •   Automotive: infotainment, navigation
    •   Enterprise: industrial printers, routers, wireless base stations, VoIP phones and equipment
    •   Wireless infrastructure
Arm Cortex-R Series Family
• Cortex-R series: Cortex-R4, R5, R7
• Real-time processor
    •   High-performance: Fast processing combined with a high clock frequency
    •   Real-time: Processing meets hard real-time constraints on all occasions.
    •   Safe: Dependable, reliable systems with high error resistance
    •   Cost-effective: Features for optimal performance, power, and area
• Applications
    •   Automotive: airbag, braking, stability, dashboard, engine management
    •   Storage: hard disk drive controllers, solid state drive controllers
    •   Mobile handsets: 3G, 4G, LTE, WiMax smartphones and baseband modems
    •   Embedded, enterprise, home, cameras
Arm Cortex-M Series Family
Sample Arm Processors
[Table: sample processors by processor class. Classes listed: Arm7, Arm9, Arm11, M, R, A.]
Arm Processors vs. Arm Architectures
• Arm architecture:
    •   Describes the instruction set, programmer’s model, exception model, and memory map
    •   Documented in the Architecture Reference Manual
• Arm processor:
    •   Developed using one of the Arm architectures
    •   Adds implementation details, such as timing information
    •   Documented in the processor’s Technical Reference Manual
Arm Architectures
Which Architecture Is My Processor?
     Processor core    Architecture
[Figure: Incoming instructions reach the Arm instruction decoder either directly (path 0) or through a Thumb-to-Arm remap stage (path 1), and are then executed. The T bit selects the path: 0 selects Arm, 1 selects Thumb.]
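The path selection described above can be sketched as a single bit test. This is a minimal illustration, not the hardware's implementation: the function name `decode_path` and its return strings are hypothetical, and the sketch assumes the T bit sits at bit 5 of the CPSR, as it does in Armv7-A.

```python
T_BIT = 1 << 5  # assumption: T (Thumb state) bit is CPSR bit 5, as in Armv7-A


def decode_path(cpsr: int) -> str:
    """Return the path an incoming instruction takes to the Arm decoder.

    The names of the two paths are hypothetical labels for this sketch.
    """
    if cpsr & T_BIT:
        return "thumb-remap"  # T = 1: Thumb instruction is remapped to Arm first
    return "arm-direct"       # T = 0: instruction goes straight to the Arm decoder


print(decode_path(0x10))  # T bit clear
print(decode_path(0x30))  # T bit set
```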
Arm and Thumb Instruction Sets
The Arm Register Set
• Sixteen general-purpose registers
• Some of the registers have special significance.
    •   R15: Program Counter (pc)
    •   R14: Link register (lr)
    •   R13: Stack pointer (sp)
• There are also two status registers.
    •   Current Program Status Register (CPSR)
    •   Saved Program Status Register (SPSR)
        –   Only present in exception modes
        –   Only accessible by some instructions
[Figure: register set showing r0 to r12 (general-purpose registers), r13 (sp), r14 (lr), r15 (pc), and the program status registers cpsr and spsr.]
Assembler Syntax
• Data processing instructions
    •   <operation><condition> Rd, Rm, <op2>
        ADDEQ   r4, r5, r6
        SUB     r5, r7, #4
        MOV     r4, #7
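To show how the condition suffix affects the three example instructions, here is a small Python model. It is a sketch under stated assumptions: `execute`, the tuple encoding, and the `flags` dictionary are all hypothetical, only the EQ condition (Z flag set) and unconditional execution are modeled, and flag updates are ignored.

```python
MASK32 = 0xFFFFFFFF  # Arm registers are 32 bits wide


def execute(instr, regs, flags):
    """Execute one (op, cond, rd, operand...) tuple against a register dict.

    Hypothetical encoding for this sketch: operands are register names (str)
    or immediates (int); cond is "EQ" or None (always execute).
    """
    op, cond, rd, *operands = instr
    if cond == "EQ" and not flags["Z"]:
        return  # condition failed: the instruction has no effect
    values = [regs[x] if isinstance(x, str) else x for x in operands]
    if op == "ADD":
        result = values[0] + values[1]
    elif op == "SUB":
        result = values[0] - values[1]
    elif op == "MOV":
        result = values[0]
    else:
        raise ValueError(f"unsupported operation: {op}")
    regs[rd] = result & MASK32


regs = {"r4": 0, "r5": 10, "r6": 3, "r7": 20}
flags = {"Z": True}
execute(("ADD", "EQ", "r4", "r5", "r6"), regs, flags)  # ADDEQ r4, r5, r6
execute(("SUB", None, "r5", "r7", 4), regs, flags)     # SUB   r5, r7, #4
execute(("MOV", None, "r4", 7), regs, flags)           # MOV   r4, #7
```

With the Z flag clear, the ADDEQ tuple leaves r4 unchanged, which is the point of the condition field.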
AAPCS
• The compiler has a set of rules known as a Procedure Call Standard that determines how to pass parameters to a function (see AAPCS).
• CPSR flags may be corrupted by a function call.
[Figure: r0 to r3 (a1 to a4) carry arguments into a function and result(s) from it, and are otherwise corruptible; additional parameters are passed on the stack. r4 to r6 (v1 to v3) are shown below them.]
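The parameter-passing rule above (first four arguments in r0 to r3, the rest on the stack) can be sketched as follows. This is a deliberate simplification: it assumes every argument is a 32-bit integer, so it ignores the AAPCS rules for 64-bit values, floating-point values, and structures. The function name `place_arguments` is hypothetical.

```python
def place_arguments(args):
    """Assign integer arguments to r0-r3, then to the stack, in order.

    Simplified sketch: assumes each argument fits in one 32-bit register.
    """
    regs = {f"r{i}": value for i, value in enumerate(args[:4])}
    stack = list(args[4:])  # additional parameters are passed on the stack
    return regs, stack


regs, stack = place_arguments([10, 20, 30, 40, 50, 60])
print(regs)   # first four arguments, keyed by register name
print(stack)  # remaining arguments
```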
Mode     Description                                               Privilege level
System   Privileged mode using the same registers as user mode     PL1
User     Mode under which most applications/OS tasks run           PL0 (unprivileged mode)
Banking of Registers
General-purpose registers are 32 bits long.
A subset of these registers is accessible in each mode.
Note: System mode uses the user mode register set.
User mode    IRQ        FIQ         Undef      Abort      SVC
r0 to r7
r8 to r12               r8 to r12
r13 (sp)     r13 (sp)   r13 (sp)    r13 (sp)   r13 (sp)   r13 (sp)
r14 (lr)     r14 (lr)   r14 (lr)    r14 (lr)   r14 (lr)   r14 (lr)
r15 (pc)
cpsr
             spsr       spsr        spsr       spsr       spsr
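The banking scheme can be read as a lookup: for a given mode and register number, either the mode has its own banked copy or it falls through to the user mode register. The sketch below assumes the banking pattern of the figure (FIQ banks r8 to r14, the other exception modes bank r13 and r14). The function name and the short mode names are hypothetical labels chosen for this illustration.

```python
# Registers that have a private (banked) copy in each exception mode.
BANKED = {
    "fiq": {8, 9, 10, 11, 12, 13, 14},
    "irq": {13, 14},
    "und": {13, 14},
    "abt": {13, 14},
    "svc": {13, 14},
}


def physical_register(mode: str, n: int) -> str:
    """Name the physical register that rN refers to in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    if n in BANKED.get(mode, ()):
        return f"r{n}_{mode}"  # banked copy private to this mode
    return f"r{n}"             # shared with user mode (also used by System mode)


print(physical_register("fiq", 8))   # banked in FIQ
print(physical_register("irq", 8))   # shared
print(physical_register("svc", 13))  # banked stack pointer
```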
Vector Table
• A vector table has one entry per exception type.
• Table entries contain instructions, not addresses.
     •   1 × Arm instruction
     •   2 × 16-bit Thumb instructions
     •   1 × 32-bit Thumb instruction
     •   Arm/Thumb controlled by SCTLR.TE bit
Offset   Exception
0x1C     FIQ
0x18     IRQ
0x14     (Reserved)
0x10     Data Abort
Memory Model
[Figure: memory map, with regions including uncached peripherals and OS.]
Memory Types
• In Armv6/Armv7, address locations must be described in terms of a type.
• The type tells the processor how it can access that location:
     •   Memory access ordering rules
     •   Caching and buffering behavior
     •   Speculation
• There are three mutually exclusive memory type attributes:
     •   Normal:             Data and instructions
     •   Device:             Devices/peripherals
     •   Strongly ordered:   Device/peripherals, or data used by legacy code
• Normal and device memory allow additional attributes for specifying the cache policy
  and whether the region is shared.
     •   For example, normal memory can be cached or non-cached.
Example: Cached Arm Macrocell
• For memory management, an Arm core can include either an MMU or MPU.
• Memory Management Unit (MMU)
     •      Implements Virtual Memory System Architecture (VMSA)
• Memory Protection Unit (MPU)
     •      Implements physical memory system architecture (PMSA)
[Figure: cached Arm macrocell with MMU/MPU, instruction cache, data cache, WB, and L2 cache.]
Data Alignment
Endianness
• Endianness determines how the contents of registers relate to the contents of memory.
     •   Arm registers are one word (4 bytes) wide.
     •   Arm addresses memory as a sequence of bytes.
• Arm processors are little-endian.
     •   They can also be configured to access big-endian memory systems.
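To make the register-to-memory relationship concrete, the sketch below uses Python's standard `struct` module to lay out one 32-bit word in both byte orders. The value `0x11223344` is an arbitrary example, not taken from the slides.

```python
import struct

word = 0x11223344  # arbitrary example value held in a 32-bit register

little = struct.pack("<I", word)  # little-endian: least significant byte first
big = struct.pack(">I", word)     # big-endian: most significant byte first

print(little.hex())  # byte order as stored at increasing addresses
print(big.hex())

# Reading big-endian bytes as if they were little-endian swaps the byte order.
misread = struct.unpack("<I", big)[0]
print(hex(misread))
```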
PMU
• Armv6 & Armv7-A/R processors include a Performance Monitoring Unit (PMU).
• The PMU provides a non-intrusive method of collecting execution information from the core.
     •   Enabling the PMU does not change the timing of the core.
• The PMU is accessed through:
     •   CP15 (mandatory)
     •   A memory-mapped interface (optional)
     •   An external debug interface (optional)
• The PMU provides:
     •   Cycle counter: counts execution cycles (optional 1/64 divider)
     •   Programmable event counters
         –   The number of counters and available events vary between cores.
• The PMU can be configured to generate interrupts if a counter overflows.
     •   Interrupt signals are an output from the core.
     •   They need to be connected to the system’s interrupt controller.
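As an illustration of how two cycle counter readings might be turned into an elapsed cycle count, here is a minimal sketch. It assumes a 32-bit counter that wraps on overflow, and that with the 1/64 divider enabled each tick stands for 64 cycles. The function name `elapsed_cycles` is hypothetical, and the sketch can only account for a single wraparound between the two readings.

```python
COUNTER_BITS = 32  # assumption: 32-bit cycle counter


def elapsed_cycles(start: int, end: int, divider_enabled: bool = False) -> int:
    """Cycles elapsed between two cycle counter readings.

    Modular subtraction tolerates one counter wraparound between readings.
    """
    ticks = (end - start) % (1 << COUNTER_BITS)
    return ticks * 64 if divider_enabled else ticks


print(elapsed_cycles(100, 350))
print(elapsed_cycles(0xFFFFFFF0, 0x10))          # counter wrapped once
print(elapsed_cycles(0, 10, divider_enabled=True))
```

Relying on the overflow interrupt mentioned above would be one way to track more than one wraparound; that is not modeled here.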
Coprocessors
• On earlier Arm processors, additional coprocessors could be added to expand the Arm
  instruction set.
• Newer processors do not allow user-defined coprocessors:
     •   Usually better for system designers to use memory-mapped peripherals
     •   Easier to implement, since coprocessors have to be tied in to the core pipeline
• Arm uses coprocessors for internal functions so as not to enforce a particular memory
  map:
     •   System control coprocessor: cp15
         –   Used for processor configuration: System ID, caches, MMU, TCMs, etc.
     •   Debug coprocessor: cp14
         –   Can be used to access debug control registers
     •   VFP and Neon: cp10 and cp11
Architecture Extensions
• Architecture extensions to meet the changing needs of applications in new markets
• Security
     •   TrustZone Security Extensions
     •   Additional operating mode, Monitor (Mon) mode, with associated banked registers, and an additional Secure operating state
• 40-bit physical addressing (LPAE)
     •   Extension to the VMSAv7 virtual memory architecture
     •   Enables the generation of 40-bit physical addresses from 32-bit virtual addresses
• Virtualization
     •   Extra mode: Hypervisor mode, with associated banked registers
     •   New Hyp exception to trap software accesses to hardware and configuration registers
• Advanced SIMD and floating-point: Both floating point (VFP) support and Advanced SIMD (Neon)
     •   Can be implemented together, in which case they share a common register bank and some common instructions
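The effect of moving from 32-bit to 40-bit physical addresses under LPAE is easy to quantify. The short calculation below is only arithmetic on the address widths stated above; the function name is hypothetical.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << address_bits


GIB = 1 << 30  # bytes in one gibibyte

va_space = address_space_bytes(32)  # 32-bit virtual addresses
pa_space = address_space_bytes(40)  # 40-bit physical addresses with LPAE

print(va_space // GIB)       # size of the virtual space in GiB
print(pa_space // GIB)       # size of the physical space in GiB
print(pa_space // va_space)  # how many 4 GiB windows fit in the physical space
```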
TrustZone
• Processor provides two worlds: secure and normal.
     •   Each world has its own vector table and page tables.
• “Monitor” mode acts as a gatekeeper for moving between worlds.
• Two physical address spaces, controlled by NS attribute
     •   Secure (S) and Non-secure (NS)
     •   S:0x8000 treated as different physical location from NS:0x8000
• Debug for Secure world code and data can be restricted.
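One way to picture the two physical address spaces is to treat the NS attribute as an extra address bit. The toy model below is only a sketch of that idea: the class `PhysicalMemory` and its methods are hypothetical, and real hardware enforces the separation in the memory system rather than in software.

```python
class PhysicalMemory:
    """Toy memory keyed by (ns, address), so S and NS locations never alias."""

    def __init__(self):
        self._cells = {}

    def write(self, ns: bool, addr: int, value: int) -> None:
        self._cells[(ns, addr)] = value

    def read(self, ns: bool, addr: int) -> int:
        return self._cells.get((ns, addr), 0)  # unwritten cells read as 0 here


mem = PhysicalMemory()
mem.write(False, 0x8000, 0xAA)  # S:0x8000
mem.write(True, 0x8000, 0x55)   # NS:0x8000, a different physical location

print(hex(mem.read(False, 0x8000)))
print(hex(mem.read(True, 0x8000)))
```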
[Figure: OS, Trusted OS, and Secure Monitor.]
Virtual Memory System Architecture (VMSA)
Large Physical Address Extensions
Virtualization
[Figure: two guest OSs, a hypervisor, a Trusted OS, and the Secure Monitor.]
Arm Cortex-A Series Processors
Cortex-A5
Arm Cortex-A Series Overview
• High-performance
     •   Used in applications that have high-compute requirements
     •   Run rich OSs and deliver interactive media and graphics on the latest must-have devices.
• Multicore technology
     •   Single to quad-core implementation for performance-oriented applications
     •   Supports symmetric and asymmetric OS implementations
     •   Arm big.LITTLE compatible
• Advanced extensions
     •   Thumb-2 for optimal code size and performance
     •   TrustZone Security Extensions for trusted computing
     •   Jazelle technology for accelerating execution environments such as Java, .Net, MSIL, Python, and Perl
• Ideal for mobile Internet
     •   Native support for technologies like Adobe Flash
     •   High-performance Neon engine for broad support of media codecs
 Arm Cortex-A Series Processors
Processor   Performance           Typical Frequency   Architecture   Year   Comments
Cortex-A5   1.57 DMIPS/MHz/core   400-800 MHz         Armv7-A        2009   Cost-effective processor core
Cortex-A7   1.9 DMIPS/MHz/core    800 MHz-1.2 GHz     Armv7-A        2011   Highly energy- and area-efficient core
Cortex-A8   2.0 DMIPS/MHz/core    600 MHz-1 GHz       Armv7-A        2005   First processor to support the Armv7-A architecture
Cortex-A9   2.5 DMIPS/MHz/core    800 MHz-2 GHz       Armv7-A        2007   Widely deployed Armv7-A-based processor
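The DMIPS/MHz figures in the table scale linearly with clock frequency, so a rough per-core DMIPS estimate is a single multiplication. The sketch below applies that to two rows of the table; the function name is hypothetical, and the results are nominal estimates from the listed figures, not measurements.

```python
def dmips_per_core(dmips_per_mhz: float, frequency_mhz: float) -> float:
    """Nominal Dhrystone MIPS for one core at the given clock frequency."""
    return dmips_per_mhz * frequency_mhz


a9_top = dmips_per_core(2.5, 2000)   # Cortex-A9 at the top of its 800 MHz-2 GHz range
a5_top = dmips_per_core(1.57, 800)   # Cortex-A5 at the top of its 400-800 MHz range

print(a9_top)
print(round(a5_top))
```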
Arm Cortex-A9 Processor
Cortex-A9 Diagram
Cortex-A9 MPCore
• Contains up to four Cortex-A9 processors
• Snoop Control Unit (SCU)
     •   Maintains L1 data cache coherency between processors
     •   Arbitrates accesses to the L2 memory system, through one or two external 64-bit AXI Manager interfaces
     •   Optional Accelerator Coherency Port (ACP) for maintaining coherency with DMA controller, graphics processor, or similar
• Integrated interrupt controller
     •   Same programmer’s model as Arm Generic Interrupt Controller (GIC): the PL390 PrimeCell
 Cortex-A9 Pipeline
[Figure: Cortex-A9 pipeline. Prefetch unit stages Fe1, Fe2, Fe3, then IQ; decode and issue stages De, Re, ISS (a BM block is also shown); backend pipelines Main (P0): Ex1, Ex2, WB; Mac (M): M1, M2; Dual (P1): Ex1, Ex2, WB; Load/store (LS): AGU, LSU, WB; Compute Engine (CE): CE, WB.]
• Five backend execution pipelines
• Pipelines are clustered into three different issue groups.
        •   Main, or multiply accumulate (Mac)
        •   Dual execution (also known as secondary)
        •   Load/store, or compute engine (Neon or floating point)
• Core can issue up to 3 instructions per cycle.
What Is Neon?
• Neon is a wide SIMD data processing architecture.
     •   Extension of the Arm instruction set
     •   Thirty-two registers, 64 bits wide (dual view as sixteen registers, 128 bits wide)
• Neon instructions perform packed SIMD processing.
     •   Registers are considered vectors of elements of the same data type.
     •   Data types can be signed/unsigned 8-bit, 16-bit, 32-bit, or 64-bit integers, or single-precision floating point.
     •   Instructions perform the same operation in all lanes.
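The idea that one instruction performs the same operation in all lanes can be modeled in plain Python. The sketch below treats a 64-bit register value as eight unsigned 8-bit lanes and adds two such values lane by lane, with each lane wrapping independently. The function name `vadd_u8` is hypothetical; this illustrates packed SIMD behavior and is not Neon code.

```python
LANES = 8       # a 64-bit register viewed as eight 8-bit elements
LANE_BITS = 8
LANE_MASK = 0xFF


def vadd_u8(a: int, b: int) -> int:
    """Lane-wise add of two 64-bit values as eight unsigned 8-bit lanes."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        x = (a >> shift) & LANE_MASK
        y = (b >> shift) & LANE_MASK
        # Each lane wraps modulo 256; no carry crosses into the next lane.
        result |= ((x + y) & LANE_MASK) << shift
    return result


print(hex(vadd_u8(0x0102030405060708, 0x1010101010101010)))
print(hex(vadd_u8(0x00000000000000FF, 0x0000000000000001)))  # lane 0 wraps
```

An ordinary 64-bit addition of the second pair would carry into the next byte; the lane-wise version does not.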
Neon Registers
Neon: Enhancing User Experiences