02 Arm and Arm Processors

Uploaded by

VDES MNNIT

Arm and Arm Processors

© 2021 Arm
Learning Outcomes
At the end of this module, you will be able to:
• Identify the characteristics of Arm processor classes and their corresponding instruction
set architecture.
• Describe the properties of the Armv7-A architecture, including the AAPCS, operating
modes, registers, memory model, and Virtualization Extensions.
• Outline the properties of the Arm Cortex-A9 Processor.
• Identify what Arm Neon technology is and its usage.

Arm Products
• Processors
• Cortex-A, R, M, SecurCore
• System IP
• CoreLink, CoreSight, AMBA Design Tools
• Multimedia
• Mali graphics, video, display
• Physical IP
• Artisan Logic IP, Interface IP, Memory IP, DesignStart
• Tools
• Software tools (Development Studio, Keil MDK), debug adapters, models, boards
• Support
• Training, documentation, Arm Connected Community

Arm Processors and Applications

Arm Processor Families
• Cortex-A series (advanced application)
• High-performance processors for open OSs
• Applications include smartphones, digital TV, server solutions, and home gateways.
• Cortex-R series (real-time)
• Exceptional performance for real-time applications
• Applications include automotive braking systems and powertrains.
• Cortex-M series (microcontroller)
• Cost-sensitive solutions for deterministic microcontroller applications
• Applications include microcontrollers, mixed signal devices, smart sensors, automotive body
electronics, and airbags.

Arm Processor Families

• SecurCore series
• High-security applications such as smartcards and e-government
• Neoverse
• High performance efficiency for cloud, infrastructure,
and AI/ML-accelerated applications
• Classic processors
• Include Arm7, Arm9, and Arm11 families

Arm Cortex-A Series Family

• Cortex-A series: Cortex-A5, A7, A8, A9, A12, A15, A17, A53, A57
• High-performance application processors
• Run rich OSs, multicore technology, 32-bit and 64-bit support
• Applications
• Mobile computing: netbook, tablet, eReader
• Mobile handset: smartphones, feature phones, wearables
• Digital home: set-top box, digital TV, Blu-ray player, gaming consoles
• Automotive: infotainment, navigation
• Enterprise: industrial printers, routers, wireless base stations, VoIP phones and equipment
• Wireless infrastructure

Arm Cortex-R Series Family
• Cortex-R series: Cortex-R4, R5, R7
• Real-time processor
• High-performance: Fast processing combined with a high clock frequency
• Real-time: Processing meets hard real-time constraints on all occasions.
• Safe: Dependable, reliable systems with high error resistance
• Cost-effective: Features for optimal performance, power, and area
• Applications
• Automotive: airbag, braking, stability, dashboard, engine management
• Storage: hard disk drive controllers, solid-state drive controllers
• Mobile handsets: 3G, 4G, LTE, WiMAX smartphones and baseband modems
• Embedded, enterprise, home, cameras

Arm Cortex-M Series Family

• Cortex-M series: Cortex-M0, M0+, M1, M3, M4, M7


• Low-power processor for embedded microcontrollers
• Energy-efficiency
• Smaller code
• Ease of use
• Applications
• Internet of Things, connectivity, smart metering, human interface devices, automotive and industrial
control systems, domestic household appliances, consumer products, and medical instrumentation

Sample Arm Processors
[Figure: sample Arm processors grouped by processor class: classic (Arm7, Arm9, Arm11) and Cortex-M, Cortex-R, Cortex-A]
Arm Processors vs. Arm Architectures
• Arm architecture:
• Describes the details of instruction set, programmer’s model, Exception model, and memory map
• Documented in the Architecture Reference Manual
• Arm processor:
• Developed using one of the Arm architectures
• Adds implementation details, such as timing information and other implementation-specific information
• Documented in the processor’s Technical Reference Manual

Architecture | Profile | Example processors
Armv4/v4T | | Arm7TDMI
Armv5/v5E | | Arm926EJ-S
Armv6 | | Arm1136
Armv6 | Armv6-M | Cortex-M0, Cortex-M1
Armv7 | Armv7-A | Cortex-A9
Armv7 | Armv7-R | Cortex-R4
Armv7 | Armv7-M | Cortex-M3
Armv8 | Armv8-A | Cortex-A75, Cortex-A57
Armv8 | Armv8-R | Cortex-R52
Armv8 | Armv8-M | Cortex-M33
Arm Architectures

Which Architecture Is My Processor?
Processor core | Architecture
Arm7TDMI family | v4T
Arm9TDMI family | v4T
Arm9E family | v5TE, v5TEJ
Arm10E family | v5TE, v5TEJ
Arm11 family | v6
• Arm1136J(F)-S | v6
• Arm1156T2(F)-S | v6T2
• Arm1176JZ(F)-S | v6Z
• Arm11 MPCore | v6K
Cortex family |
• Arm Cortex-A57 | v8-A (64-bit, highest performance)
• Arm Cortex-A53 | v8-A (64-bit)
• Arm Cortex-A15 | v7-A (with security and virtualization extensions)
• Arm Cortex-A9 | v7-A (with security extensions)
• Arm Cortex-A8 | v7-A (with security extensions)
• Arm Cortex-A7 | v7-A (with security and virtualization extensions)
• Arm Cortex-A5 | v7-A (with security extensions)
• Arm Cortex-R5 | v7-R
• Arm Cortex-R4 | v7-R
• Arm Cortex-M4 | v7-M
• Arm Cortex-M3 | v7-M
• Arm Cortex-M1 | v6-M (16-bit Thumb, except for system instructions)
• Arm Cortex-M0 | v6-M (16-bit Thumb, except for system instructions)
Arm and Thumb Instruction Sets
• Early Arm processors
• 32-bit instruction set, called the Arm instruction set
• Powerful and good performance
• Larger program memory compared to 8-bit and 16-bit processors
• Larger power consumption
• Thumb-1 instruction set
• 16-bit instruction set, first used in the Arm7TDMI processor in 1995
• Provides a subset of the Arm instructions, giving better code density compared to 32-bit RISC architecture
• Code size is reduced by ~30%, but performance is also reduced by ~20%.
• Can be used together with Arm instructions using a multiplexer

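[Figure: incoming instructions reach a multiplexer either directly (input 0, Arm) or through a Thumb-to-Arm remap stage (input 1); the T bit selects the input (0: select Arm, 1: select Thumb), and the selected instructions go to the Arm instruction decoder for execution]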
Arm and Thumb Instruction Sets

• Thumb-2 instruction set


• Consists of both 32-bit Thumb and original 16-bit Thumb-1 instruction sets
• Compared to the 32-bit Arm instruction set, code size is reduced by ~26%, while maintaining similar
performance
• Thumb Execution Environment (ThumbEE) instruction set
• Based on Thumb
• With some changes and additions to make it a better target for dynamically generated code, i.e., code
compiled on the device either shortly before or during execution
• Armv7-A architecture
• Based on Thumb-2 and ThumbEE

The Arm Register Set
• Sixteen general-purpose registers (r0–r15)
• Some of the registers have special significance.
• R15: Program Counter (pc)
• R14: Link register (lr)
• R13: Stack pointer (sp)
• There are also two status registers.
• Current Program Status Register (CPSR)
• Saved Program Status Register (SPSR)
– Only present in exception modes
– Only accessible by some instructions

[Figure: register set showing general-purpose registers r0–r12, r13 (sp), r14 (lr), r15 (pc), and the program status registers cpsr and spsr]
Assembler Syntax
• Data processing instructions
• <operation><condition> Rd, Rm, <op2>
• ADDEQ r4, r5, r6
• SUB r5, r7, #4
• MOV r4, #7

• Memory access instructions


• <operation><size> Rd, [<address>]
• LDR r0, [r6, #4]
• STRB r4, [r7], #8

• <operation><addressing mode> <Rn>!, <registers list>


• LDMIA r0, {r1, r2, r7}
• STMFD sp!, {r4-r11, lr}

• Program flow instructions


• <branch> <label>
• BL foo
• B bar
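
The two single-register memory access examples above use different addressing modes: `LDR r0, [r6, #4]` applies an immediate offset and leaves r6 unchanged, while `STRB r4, [r7], #8` is post-indexed, so the byte is stored at the address held in r7 and r7 is then advanced by 8. The following is a minimal Python sketch of that behavior. The register file `regs`, the memory `mem`, and the helpers `ldr_offset` and `strb_post` are hypothetical names for illustration only, and little-endian word loads are assumed.

```python
# Toy model of two Arm addressing modes (illustrative, not an emulator).
regs = {"r0": 0, "r4": 0x1234ABCD, "r6": 0x100, "r7": 0x200}
mem = bytearray(0x400)
mem[0x104:0x108] = (0xDEADBEEF).to_bytes(4, "little")


def ldr_offset(rd, rn, offset):
    """LDR rd, [rn, #offset]: load a word from rn + offset; rn is unchanged."""
    addr = regs[rn] + offset
    regs[rd] = int.from_bytes(mem[addr:addr + 4], "little")


def strb_post(rt, rn, offset):
    """STRB rt, [rn], #offset: store the low byte of rt at rn, then rn += offset."""
    mem[regs[rn]] = regs[rt] & 0xFF
    regs[rn] += offset


ldr_offset("r0", "r6", 4)   # LDR  r0, [r6, #4]
strb_post("r4", "r7", 8)    # STRB r4, [r7], #8
```

After these two calls, r6 still holds its original value, whereas r7 has moved on by 8 bytes.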

AAPCS
Register | AAPCS name | Usage
r0–r3 | a1–a4 | Arguments into function, result(s) from function; otherwise corruptible (additional parameters passed on stack)
r4–r11 | v1–v8 (r9 is v6/SB) | Register variables; must be preserved
r12 | | Scratch register (corruptible)
r13 | sp | Stack Pointer
r14 | lr | Link Register
r15 | pc | Program Counter

• The compiler has a set of rules known as a Procedure Call Standard that determines how to pass parameters to a function (see AAPCS).
• CPSR flags may be corrupted by function call.
• Assembler code that links with compiled code must follow the AAPCS at external interfaces.
• The AAPCS is part of the ABI for the Arm architecture.
• r9 is used as the static base if the RWPI option is selected.
• sp should always be 8-byte (2 words) aligned.
• r14 can be used as a temporary register once its value is stacked.
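
As a rough illustration of the parameter-passing rule above, the sketch below places the first four word-sized arguments in r0–r3 and the remainder on the stack. It is a simplification under stated assumptions: only word-sized integer arguments are modeled (64-bit, floating-point, and structure arguments follow further AAPCS rules not covered here), and the helper names `assign_arguments` and `is_sp_aligned` are hypothetical.

```python
# Simplified AAPCS argument passing: word-sized integer arguments only.
ARG_REGS = ["r0", "r1", "r2", "r3"]


def assign_arguments(args):
    """Return (register assignments, stack words) for word-sized arguments."""
    in_regs = dict(zip(ARG_REGS, args))   # first four go in r0-r3
    on_stack = list(args[4:])             # the rest are passed on the stack
    return in_regs, on_stack


def is_sp_aligned(sp):
    """Check the 8-byte (2 words) stack pointer alignment noted above."""
    return sp % 8 == 0


in_regs, on_stack = assign_arguments([10, 20, 30, 40, 50, 60])
```

For a six-argument call, the fifth and sixth arguments end up in `on_stack`, which is why functions with many parameters cost extra memory accesses.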
Processor Modes
• The Arm processor has seven basic operating modes:
• Each mode has access to its own stack space and a different subset of registers.
• Some operations can only be carried out in a privileged mode.

Mode | Description | Privilege level
Supervisor | Entered on reset and when a supervisor call instruction (SVC) is executed | PL1
FIQ | Entered when a high-priority (fast) interrupt is raised | PL1
IRQ | Entered when a normal-priority interrupt is raised | PL1
Abort | Used to handle memory access violations | PL1
Undef | Used to handle undefined instructions | PL1
System | Privileged mode using the same registers as user mode | PL1
User | Mode under which most applications/OS tasks run | PL0

• Exception modes: Supervisor, FIQ, IRQ, Abort, Undef
• Privileged modes: all modes except User
• Unprivileged mode: User
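
The current mode is held in the low five bits of the CPSR. The sketch below decodes that field into a mode name and privilege level. The bit encodings come from the Armv7-A architecture rather than from these slides, and the names `MODE_BITS` and `decode_mode` are hypothetical.

```python
# CPSR mode field M[4:0] for the seven basic modes (Armv7-A encodings).
MODE_BITS = {
    0b10000: ("User", "PL0"),
    0b10001: ("FIQ", "PL1"),
    0b10010: ("IRQ", "PL1"),
    0b10011: ("Supervisor", "PL1"),
    0b10111: ("Abort", "PL1"),
    0b11011: ("Undef", "PL1"),
    0b11111: ("System", "PL1"),
}


def decode_mode(cpsr):
    """Return (mode name, privilege level) for a 32-bit CPSR value."""
    return MODE_BITS[cpsr & 0x1F]
```

For example, a CPSR value whose low byte is 0xD3 decodes to Supervisor mode, the mode entered on reset.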
Banking of Registers
• General-purpose registers are 32 bits long.
• A subset of these registers is accessible in each mode.
• Note: System mode uses the user mode register set.

Mode | Own (banked) registers
User | r0–r12, r13 (sp), r14 (lr), r15 (pc), cpsr
IRQ | r13 (sp), r14 (lr), spsr
FIQ | r8–r12, r13 (sp), r14 (lr), spsr
Undef | r13 (sp), r14 (lr), spsr
Abort | r13 (sp), r14 (lr), spsr
SVC | r13 (sp), r14 (lr), spsr

[Figure: register banks per mode, distinguishing the current mode's registers from banked-out registers]
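
One way to read the banking scheme is as a lookup from (mode, register name) to a physical register. The sketch below is a minimal model of that lookup. The `_fiq`, `_irq`, and similar suffixes follow the naming commonly used in Arm documentation, while `BANKED`, `SUFFIX`, and `physical_register` are hypothetical names.

```python
# Registers that each exception mode banks (has its own copy of).
BANKED = {
    "FIQ": {"r8", "r9", "r10", "r11", "r12", "r13", "r14"},
    "IRQ": {"r13", "r14"},
    "Undef": {"r13", "r14"},
    "Abort": {"r13", "r14"},
    "SVC": {"r13", "r14"},
}
SUFFIX = {"FIQ": "fiq", "IRQ": "irq", "Undef": "und", "Abort": "abt", "SVC": "svc"}


def physical_register(mode, reg):
    """Return the physical register that `reg` refers to in `mode`."""
    if reg in BANKED.get(mode, set()):
        return f"{reg}_{SUFFIX[mode]}"
    return reg  # User and System modes share the unbanked set
```

Because FIQ mode has its own r8–r12, a fast interrupt handler can use those registers without first saving the interrupted program's values.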


Exceptions
• When an exception occurs:
• It causes entry into a processor mode that executes software at PL1 or PL2.
• It causes the execution of a software handler for the exception.
• Exceptions include:
• Resets
• Interrupts
• Memory system aborts
• Undefined instructions
• SVCs, secure monitor calls (SMCs), and hypervisor calls (HVCs)
• Processor execution is forced to the exception vector (an address) corresponding to that type of exception.
• Vector table:
• A set of eight consecutive vectors
• Word-aligned memory addresses starting at an exception base address

Vector Table
• A vector table has one entry per exception type.
• Table entries contain instructions, not addresses.
• 1 × Arm instruction
• 2 × 16-bit Thumb instructions
• 1 × 32-bit Thumb instruction
• Arm/Thumb controlled by SCTLR.TE bit
• The vector table address is configurable.
• 0x0 or 0xFFFF0000
• SCTLR.V bit / VINITHI signal
• The security extensions add support for other addresses.
• Vector base address registers
• At reset, the vector table can be at 0x0 or 0xFFFF0000.

Vector Table
Offset | Exception
0x1C | FIQ
0x18 | IRQ
0x14 | (Reserved)
0x10 | Data Abort
0x0C | Prefetch Abort
0x08 | SVC
0x04 | UNDEFINED instruction
0x00 | Reset
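
Since each vector sits at a fixed offset from the table base, the address the processor branches to is a simple sum. A minimal sketch, using the offsets from the table above and the two base addresses mentioned (the names `VECTOR_OFFSETS` and `vector_address` are hypothetical):

```python
# Offsets of each exception vector from the vector table base.
VECTOR_OFFSETS = {
    "Reset": 0x00,
    "Undefined instruction": 0x04,
    "SVC": 0x08,
    "Prefetch Abort": 0x0C,
    "Data Abort": 0x10,
    "IRQ": 0x18,
    "FIQ": 0x1C,
}


def vector_address(exception, high_vectors=False):
    """Return the vector address for the base 0x0 or 0xFFFF0000."""
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return base + VECTOR_OFFSETS[exception]
```

FIQ occupies the last slot, so nothing follows its vector in the table.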

Memory Model

[Figure: memory map with peripherals (uncached), OS, application space (user access), and vectors (cached, read-only)]

• A system includes different memories and peripherals.


• The processor needs to be told how it should access different devices.
• For each address region:
• Access permissions
• Read/write permissions for user/privileged modes
• Memory types
• Caching/buffering and access ordering rules for memory accesses

Memory Types
• In Armv6/Armv7, address locations must be described in terms of a type.
• The type tells the processor how it can access that location:
• Memory access ordering rules
• Caching and buffering behavior
• Speculation
• There are three mutually exclusive memory type attributes:
• Normal: Data and instructions
• Device: Devices/peripherals
• Strongly ordered: Device/peripherals, or data used by legacy code
• Normal and device memory allow additional attributes for specifying the cache policy
and whether the region is shared.
• For example, normal memory can be cached or non-cached.

Example: Cached Arm Macrocell
• For memory management, an Arm core can include either an MMU or MPU.
• Memory Management Unit (MMU)
• Implements Virtual Memory System Architecture (VMSA)
• Memory Protection Unit (MPU)
• Implements physical memory system architecture (PMSA)

[Figure: cached Arm macrocell: Arm core with CP15, MMU/MPU, instruction cache, data cache, WB, and bus interface unit, connected through an AMBA interconnect to the L2 cache]
Data Alignment

• Armv6/v7 data alignment:


• Data accesses can be unaligned.
• Only a subset of load/store instructions support unaligned accesses.
• Unaligned accesses are only allowed to addresses marked as normal.
• The load/store unit will access memory with aligned memory accesses and make the data available to
the CPU.
• Instructions are aligned as follows:
• Arm instructions are word aligned.
• Thumb and ThumbEE instructions are halfword-aligned.
• Java bytecodes are byte-aligned.
• Arm processors are little-endian.
• But can be configured to access big-endian memory systems
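
The alignment rules above can be checked with simple modular arithmetic. The sketch below tests instruction alignment per instruction set and lists which aligned words an unaligned access touches. It is only an illustration: how a given core actually splits an unaligned access is implementation-specific, and the names `INSTRUCTION_ALIGNMENT`, `valid_instruction_address`, and `aligned_words_touched` are hypothetical.

```python
# Required instruction alignment in bytes, per instruction set.
INSTRUCTION_ALIGNMENT = {"arm": 4, "thumb": 2, "thumbee": 2, "java": 1}


def valid_instruction_address(address, instruction_set):
    """True if `address` meets the alignment rule for the instruction set."""
    return address % INSTRUCTION_ALIGNMENT[instruction_set] == 0


def aligned_words_touched(address, size=4):
    """Word-aligned addresses covered by an access of `size` bytes."""
    first = address & ~3
    last = (address + size - 1) & ~3
    return list(range(first, last + 4, 4))
```

An unaligned word access at 0x1001 spans two aligned words, so it needs more than one aligned memory access.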

Endianness
• Endianness determines how contents of registers relate to the contents of memory.
• Arm registers are one word (4 bytes) wide.
• Arm addresses memory as a sequence of bytes.
• Arm processors are little-endian.
• But can be configured to access big-endian memory systems.

• Little-endian memory system: least significant byte is at lowest address.
• Big-endian memory system: most significant byte is at lowest address.
• Arm supports three models of endianness.
• LE: little-endian
• BE-32: word-invariant big-endian (dropped in architecture v7)
• BE-8: byte-invariant big-endian (introduced in architecture v6)
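
To make the byte ordering concrete, the sketch below stores the same 32-bit register value in both orders, using Python's built-in `int.to_bytes`. The value 0x11223344 and the helper name `store_word` are arbitrary choices for illustration.

```python
def store_word(value, byteorder):
    """Return the 4 bytes of `value` in memory order (lowest address first)."""
    return value.to_bytes(4, byteorder)


value = 0x11223344
little = store_word(value, "little")  # least significant byte first
big = store_word(value, "big")        # most significant byte first
```

Reading the bytes back with the wrong byte order yields 0x44332211 instead of 0x11223344, which is the typical symptom of an endianness mismatch.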

PMU
• Armv6 & Armv7-A/R processors include a Performance Monitoring Unit (PMU).
• The PMU provides a non-intrusive method of collecting execution information from the core.
• Enabling the PMU does not change the timing of the core.
• The PMU is accessed through:
• CP15 (mandatory)
• A memory-mapped interface (optional)
• An external debug interface (optional)
• The PMU provides:
• Cycle counter: counts execution cycles (optional 1/64 divider)
• Programmable event counters
• The number of counters and available events vary between cores.
• The PMU can be configured to generate interrupts if a counter overflows.
• Interrupt signals are an output from the core.
• They need to be connected to the system’s interrupt controller.
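
When the cycle counter is used to time a code region, the two readings must be combined with the divider setting and a possible counter wraparound in mind. The sketch below assumes a 32-bit counter and at most one overflow between readings. The function name `cycles_elapsed` and its parameters are hypothetical.

```python
def cycles_elapsed(start, end, divider_enabled=False, width=32):
    """Cycles between two cycle-counter readings.

    Assumes at most one wraparound of a `width`-bit counter. With the
    optional 1/64 divider enabled, each tick stands for 64 cycles.
    """
    ticks = (end - start) % (1 << width)
    return ticks * 64 if divider_enabled else ticks
```

Enabling the divider trades resolution for range: the counter overflows 64 times less often, but results are only accurate to within 64 cycles.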

Coprocessors
• On earlier Arm processors, additional coprocessors could be added to expand the Arm
instruction set.
• Newer processors do not allow user-defined coprocessors:
• Usually better for system designers to use memory-mapped peripherals
• Easier to implement, since coprocessors have to be tied in to the core pipeline
• Arm uses coprocessors for internal functions so as not to enforce a particular memory
map:
• System control coprocessor: cp15
• Used for processor configuration: System ID, caches, MMU, TCMs, etc.
• Debug coprocessor: cp14
• Can be used to access debug control registers
• VFP and Neon: cp10 and cp11

Architecture Extensions
• Architecture extensions to meet the changing needs of applications in new markets
• Security
• TrustZone
• Additional operating mode, Monitor (Mon) mode, with associated banked registers and an additional secure operating state
• 40-bit physical addressing (LPAE)
• Extension to the VMSAv7 virtual memory architecture
• Enables the generation of 40-bit physical addresses from 32-bit virtual addresses
• Virtualization
• Extra mode: Hypervisor mode, with associated banked registers
• New Hyp exception to trap software accesses to hardware and configuration registers
• Advanced SIMD and floating-point: Both floating point (VFP) support and Advanced SIMD (Neon)
• Can be implemented together, in which case they share a common register bank and some common instructions

TrustZone
• Processor provides two worlds: secure and normal.
• Each world has its own vector table and page tables.
• “Monitor” mode acts as a gatekeeper for moving between worlds.
• Two physical address spaces, controlled by NS attribute
• Secure (S) and Non-secure (NS)
• S:0x8000 treated as different physical location from NS:0x8000
• Debug for Secure world code and data can be restricted.

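[Figure: normal world (application(s) running on an OS) and secure world (trusted service(s) running on a trusted OS), with the Secure Monitor between the two worlds]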
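
The point that S:0x8000 and NS:0x8000 are different physical locations can be modeled by treating the security state as part of the address. A minimal sketch, using a dictionary as a stand-in for physical memory (the names `physical_memory`, `write`, and `read` are hypothetical):

```python
# Physical memory keyed by (address space, address).
physical_memory = {}


def write(secure, address, value):
    """Write `value` to the Secure or Non-secure physical address space."""
    physical_memory[("S" if secure else "NS", address)] = value


def read(secure, address):
    """Read from the Secure or Non-secure physical address space."""
    return physical_memory[("S" if secure else "NS", address)]


write(True, 0x8000, 0xAA)    # S:0x8000
write(False, 0x8000, 0x55)   # NS:0x8000
```

The two writes do not overwrite each other, because the NS attribute selects a separate address space.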

Virtual Memory System Architecture (VMSA)

• Provides a virtual address to physical address translation system
• Fine-grained translation, with physical addresses of up to 40 bits

• Arm Memory Management Unit (MMU) implements VMSA


• Translation tables
• Descriptor

Large Physical Address Extensions

• Long-descriptor format for page tables added


• 32-bit virtual address mapped onto 40-bit physical address space
• New translation table format using 64-bit translation table descriptors
• 1TB of memory space accessible
• 32-bit short-descriptor format still supported
• Selected by the EAE bit (bit 31) of the Translation Table Base Control Register (TTBCR)
• Can use 16MB memory supersections to map onto 40-bit address space
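
The sizes quoted above follow directly from the address widths. A short worked calculation (the constant names are hypothetical):

```python
GIB = 1 << 30
TIB = 1 << 40

virtual_space = 1 << 32            # 32-bit virtual addresses
physical_space = 1 << 40           # 40-bit physical addresses
supersection = 16 * (1 << 20)      # 16MB supersection

# Number of 16MB supersections needed to tile the 40-bit physical space.
supersections_in_physical_space = physical_space // supersection
```

So a single 32-bit virtual address space (4GB) can only map a window of the 1TB physical space at any one time.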

Virtualization

• Support for running multiple guest OSs in the normal world


• Hypervisor mode to control switching between guest OSs
• Two-stage address translation: OS and hypervisor levels
• Hypervisor mode can trap exceptions and choose which guest to direct them to.
[Figure: normal world with two guest OSs, each running application(s), on top of a hypervisor; secure world with trusted service(s) on a trusted OS; the Secure Monitor sits beneath both worlds]
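
Two-stage address translation can be pictured as two table lookups in sequence: the guest OS maps a virtual address to what it believes is a physical address (often called an intermediate physical address, a term not used in these slides), and the hypervisor maps that to the real physical address. The sketch below assumes 4KB pages and single-level dictionaries in place of real translation tables. All names and page numbers are hypothetical.

```python
PAGE_SIZE = 4096


def translate(table, address):
    """Translate `address` through a page-number -> page-number table."""
    page, offset = divmod(address, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


stage1 = {0x10: 0x40}   # guest OS level: virtual page -> intermediate page
stage2 = {0x40: 0x9A}   # hypervisor level: intermediate page -> physical page


def two_stage(virtual_address):
    """Apply the OS-level translation, then the hypervisor-level one."""
    return translate(stage2, translate(stage1, virtual_address))
```

The guest OS controls only `stage1`, so it cannot reach physical pages that the hypervisor has not mapped in `stage2`.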

Arm Cortex-A Series Processors

• Armv8 architecture: 64-bit
• Cortex-A57
• Cortex-A53
• Armv7 architecture: 32-bit
• Cortex-A15
• Cortex-A9
• Cortex-A8
• Cortex-A7
• Cortex-A5

[Figure: Cortex-A series processors grouped into Armv8 64-bit (Cortex-A57, Cortex-A53) and Armv7-A 32-bit (Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A7, Cortex-A5)]
Arm Cortex-A Series Overview
• High-performance
• Used in applications that have high-compute requirements
• Run rich OSs and deliver interactive media and graphics on the latest must-have devices.
• Multicore technology
• Single-core to quad-core implementations for performance-oriented applications
• Supports symmetric and asymmetric OS implementations
• Arm big.LITTLE compatible
• Advanced extensions
• Thumb-2 for optimal code size and performance
• TrustZone Security Extensions for trusted computing
• Jazelle technology for accelerating execution environments such as Java, .Net, MSIL, Python, and Perl
• Ideal for mobile Internet
• Native support for technologies like Adobe Flash
• High-performance Neon engine for broad support of media codecs

Arm Cortex-A Series Processors
Processor | Performance | Typical frequency | Architecture | Year | Comments
Cortex-A5 | 1.57 DMIPS/MHz/core | 400-800 MHz | Armv7-A | 2009 | Cost-effective processor core
Cortex-A7 | 1.9 DMIPS/MHz/core | 800 MHz-1.2 GHz | Armv7-A | 2011 | Highly energy- and area-efficient core
Cortex-A8 | 2.0 DMIPS/MHz/core | 600 MHz-1 GHz | Armv7-A | 2005 | First processor supporting the Armv7-A architecture
Cortex-A9 | 2.5 DMIPS/MHz/core | 800 MHz-2 GHz | Armv7-A | 2007 | Widely deployed Armv7-A-based processor
Cortex-A12 | 3.0 DMIPS/MHz/core | | Armv7-A | 2013 |
Cortex-A15 | >3.5 DMIPS/MHz/core | Up to 2.5 GHz | Armv7-A | 2010 | High-performance core
Cortex-A17 | | Up to 2.5 GHz | Armv7-A | 2014 | The most efficient Armv7-A-based processor
Cortex-A53 | 2.3 DMIPS/MHz | | Armv8-A | 2013 | Most efficient 32/64-bit processor
Cortex-A57 | >4.1 DMIPS/MHz | | Armv8-A | 2013 | Proven high-performance 32/64-bit core for mobile and enterprise computing
Cortex-A72 | | Up to 2.5 GHz | Armv8-A | 2015 | Arm’s highest-performance processor
Arm Cortex-A9 Processor

• Arm Cortex-A9 features


• Armv7 architecture: Thumb-2, ThumbEE
• 0.8GHz to 2GHz
• 2.5 DMIPS/MHz/core
• Single core or 4x MPCore solution
• Up to 20k DMIPS (2GHz, quad-core)
• Power-efficient and high-performance processor
• Dynamic length pipeline (8–11 stages)
• Up to 64KB L1 I/D cache
• Up to 8MB of L2 cache
• Optional Neon media and/or floating point
processing engine
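
The "up to 20k DMIPS" figure above is the product of the per-core rating, the clock frequency, and the core count. A short worked calculation (the function name `peak_dmips` is hypothetical):

```python
def peak_dmips(dmips_per_mhz, frequency_mhz, cores):
    """Aggregate DMIPS = DMIPS/MHz/core x MHz x number of cores."""
    return dmips_per_mhz * frequency_mhz * cores


quad_core_2ghz = peak_dmips(2.5, 2000, 4)     # quad-core at 2GHz
single_core_800mhz = peak_dmips(2.5, 800, 1)  # single core at 0.8GHz
```

This is a peak figure that assumes perfect scaling across cores. Real workloads rarely scale linearly.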

Cortex-A9 Diagram

Cortex-A9 MPCore
• Contains up to four Cortex-A9 processors
• Snoop Control Unit (SCU)
• Maintains L1 data cache coherency between processors
• Arbitrates accesses to the L2 memory system, through one or two external 64-bit AXI Manager interfaces
• Optional Accelerator Coherency Port (ACP) for maintaining coherency with DMA controller, graphics processor, or similar
• Integrated interrupt controller
• Same programmer’s model as Arm Generic Interrupt Controller (GIC): the PL390 PrimeCell
Cortex-A9 Pipeline
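[Figure: Cortex-A9 pipeline: prefetch unit (Fe1, Fe2, Fe3); decode and issue stages (IQ, De, Re, ISS); backend pipelines: main (P0: Ex1, Ex2, WB), multiply accumulate (M: M1, M2, Mac), BM, dual (P1: Ex1, Ex2, WB), load/store (LS: AGU, LSU, WB), and compute engine (CE: CE, WB)]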
• Five backend execution pipelines
• Pipelines are clustered into three different issue groups.
• Main, or multiply accumulate (Mac)
• Dual execution (also known as secondary)
• Load/store, or compute engine (Neon or floating point)
Core can issue up to 3 instructions per cycle.

What Is Neon?
• Neon is a wide SIMD data processing architecture.
• Extension of the Arm instruction set
• Thirty-two registers, 64 bits wide (dual view as sixteen registers, 128 bits wide)
• Neon instructions perform packed SIMD processing.
• Registers are considered vectors of elements of the same data type.
• Data types can be signed/unsigned 8-bit, 16-bit, 32-bit, or 64-bit integers, or single-precision floating point.
• Instructions perform the same operation in all lanes.
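
Packed SIMD processing means that one instruction applies the same operation to every lane of a vector. The sketch below models a 64-bit register as eight unsigned 8-bit lanes and adds two such vectors lane by lane, with each lane wrapping independently, as an integer vector add such as `VADD.I8` does. It is plain Python for illustration, not Neon code, and the function name `vadd_u8` is hypothetical.

```python
def vadd_u8(a, b):
    """Lane-wise add of two vectors of eight unsigned 8-bit lanes.

    Each lane wraps modulo 256, and no carry crosses into the next lane.
    """
    assert len(a) == len(b) == 8
    return [(x + y) & 0xFF for x, y in zip(a, b)]


result = vadd_u8([1, 2, 3, 4, 5, 6, 7, 250], [10] * 8)
```

The last lane overflows (250 + 10 wraps to 4) without disturbing its neighbors, which is what distinguishes a packed add from a single 64-bit add.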

Neon Registers

• Neon provides a 256-byte register file.


• Distinct from the core registers
• Extension to the VFPv2 register file (VFPv3)
• Two explicitly aliased views
• 32 × 64-bit registers (D0–D31)
• 16 × 128-bit registers (Q0–Q15)
• Enables register trade-off
• Vector length
• Available registers
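
The two views alias the same storage: each 128-bit Q register overlays a pair of consecutive 64-bit D registers, with Qn covering D(2n) and D(2n+1). A minimal sketch of the mapping and of the register file size (the names `q_to_d` and `REGISTER_FILE_BYTES` are hypothetical):

```python
def q_to_d(n):
    """Return the pair of D register indices aliased by Qn."""
    if not 0 <= n <= 15:
        raise ValueError("Q registers are Q0-Q15")
    return (2 * n, 2 * n + 1)


REGISTER_FILE_BYTES = 32 * 8   # 32 registers x 64 bits (8 bytes) each
```

This is the trade-off the slide refers to: code can work with sixteen long vectors or thirty-two short ones, but not both at once on the same storage.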

Neon: Enhancing User Experiences
