
Module 3-1

The document covers the ARM instruction set, detailing various types of instructions including data processing, branch, load-store, and software interrupt instructions. It provides syntax and examples for move, arithmetic, logical, comparison, and multiply instructions, as well as addressing modes for load and store operations. Additionally, it explains the structure and function of branch instructions for controlling program flow.

Module 3

ARM Instruction Set


ARM Instruction Sets
 Data Processing Instructions
 Branch Instructions
 Load-store instructions
 Software interrupt instructions
 Program status register instructions
 Conditional Execution
Data Processing Instructions
 Manipulate data within registers
 Data processing instructions
◼ Move instructions
◼ Arithmetic instructions
◼ Logical instructions
◼ Comparison instructions
◼ Multiply instructions
Data processing
❑They are move, arithmetic, logical, comparison and
multiply instructions.
❑Most data processing instructions can process one of
their operands using the barrel shifter.

• General rules:
– All operands are 32-bit, coming
from registers or literals.
– The result, if any, is 32-bit and
placed in a register (with the
exception for long multiply which
produces a 64-bit result)
– 3-address format
Data Processing Instruction

ARM Instruction Set Format
Move Instruction
 Syntax: <instruction>{<cond>}{S} Rd, N
◼ N: a register or immediate value

 MOV : move
◼ MOV r0, r1; r0 = r1
◼ MOV r0, #5; r0 = 5
 MVN : move (negated)
◼ MVN r0, r1; r0 = NOT(r1)=~ (r1)
Preprocessed by Shifter
 Example 1
◼ PRE: r5 = 5, r7 = 8;

◼ MOV r7, r5, LSL #2; r7 = r5 << 2 = r5*4

◼ POST: r5 = 5, r7 = 20
Preprocessed by Shifter
 LSL: logical shift left
◼ x << y, the least significant bits are filled with zeroes
 LSR: logical shift right:
◼ (unsigned) x >> y, the most significant bits are filled with zeroes
 ASR: arithmetic shift right
◼ (signed) x >> y, copy the sign bit to the most significant bit
 ROR: rotate right
◼ ((unsigned) x >> y) | (x << (32-y))
 RRX: rotate right extended
◼ c flag <<31 | (( unsigned) x >> 1)
◼ Performs 33-bit rotate, with the CPSR’s C bit being inserted above
sign bit of the word
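The five shift operations above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: registers are treated as unsigned 32-bit integers, and the function names (`lsl`, `lsr`, `asr`, `ror`, `rrx`) are illustrative helpers, not part of any ARM tool.

```python
MASK = 0xFFFFFFFF  # registers are modelled as unsigned 32-bit values


def lsl(x, n):
    """Logical shift left: vacated low bits are filled with zeroes."""
    return (x << n) & MASK


def lsr(x, n):
    """Logical shift right: vacated high bits are filled with zeroes."""
    return (x & MASK) >> n


def asr(x, n):
    """Arithmetic shift right: the sign bit is copied into the vacated bits."""
    x &= MASK
    if x & 0x80000000:
        return ((x >> n) | (MASK << (32 - n))) & MASK
    return x >> n


def ror(x, n):
    """Rotate right: bits leaving the low end re-enter at the high end."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK


def rrx(x, carry):
    """Rotate right extended: 33-bit rotate through the carry flag.

    Returns (result, carry_out).
    """
    x &= MASK
    return ((carry & 1) << 31) | (x >> 1), x & 1
```

For instance, `lsl(5, 2)` gives 20, which matches Example 1 (r7 = r5 << 2).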
Shift Register Operands
– ADD r1, r2, r3, LSL #3 ; r1 := r2 + (r3 << 3)
– A single instruction executed in a single cycle
❑ LSL: Logical Shift Left by 0 to 31 places, 0 filled at the lsb end
❑ LSR, ASL (Arithmetic Shift Left), ASR, ROR (Rotate Right), RRX (Rotate Right extended by 1 bit)
– ADD r5, r5, r3, LSL r2 ; r5 := r5 + r3 * 2^r2
– MOV r12, r4, ROR r3 ; r12 := r4 rotated right by value of r3
[Figure: barrel shifter operations LSL #5, LSR #5, ASR #5 (positive operand), ASR #5 (negative operand), ROR #5, RRX]
Preprocessed by Shifter (Cont.)
 Example 2
◼ PRE: r0 = 0x00000000, r1 = 0x80000004

◼ MOV r0, r1, LSL #1 ; r0 = r1 *2

◼ POST r0 = 0x00000008, r1 = 0x80000004


Arithmetic Instructions
 Syntax: <instruction> {<cond>} {S}Rd, Rn, N
◼ N: a register or immediate value
 ADD : add
◼ ADD r0, r1, r2; r0 = r1 + r2
 ADC : add with carry
◼ ADC r0, r1, r2; r0 = r1 + r2 + C
 SUB : subtract
◼ SUB r0, r1, r2; r0 = r1 - r2
 SBC : subtract with carry
◼ SBC r0, r1, r2; r0 = r1 - r2 + C - 1
Arithmetic Instructions (Cont.)
 RSB : reverse subtract
◼ RSB r0, r1, r2; r0 = r2 – r1
 RSC : reverse subtract with carry
◼ RSC r0, r1, r2; r0 = r2 – r1 + C -1
 MUL : multiply
◼ MUL r0, r1, r2; r0 = r1 x r2
 MLA : multiply and accumulate
◼ MLA r0, r1, r2, r3; r0 = r1 x r2 + r3
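The carry-based variants exist mainly to chain 32-bit operations into wider arithmetic. The sketch below models a 64-bit addition as an ADDS on the low words followed by an ADC on the high words, plus the SBC formula from the slide. The helper names (`adds`, `adc`, `sbc`, `add64`) are hypothetical and only mirror the mnemonics.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def adds(a, b):
    """ADDS: 32-bit add that also returns the carry-out (C flag)."""
    total = (a & MASK) + (b & MASK)
    return total & MASK, total >> 32


def adc(a, b, carry):
    """ADC: r = a + b + C, truncated to 32 bits."""
    return ((a & MASK) + (b & MASK) + carry) & MASK


def sbc(a, b, carry):
    """SBC: r = a - b + C - 1, truncated to 32 bits."""
    return (a - b + carry - 1) & MASK


def add64(lo_a, hi_a, lo_b, hi_b):
    """64-bit add built from ADDS (low words) then ADC (high words)."""
    lo, carry = adds(lo_a, lo_b)
    hi = adc(hi_a, hi_b, carry)
    return lo, hi
```

Adding 1 to 0x00000000_FFFFFFFF with `add64(0xFFFFFFFF, 0, 1, 0)` carries into the high word and yields `(0, 1)`.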
Logical Operations
 Syntax: <instruction>{<cond>}{S} Rd, Rn, N
◼ N: a register or immediate value
 AND : Bit-wise and
 ORR : Bit-wise or
 EOR : Bit-wise exclusive-or
 BIC : bit clear
◼ BIC r0, r1, r2; r0 = r1 & Not(r2)
Logical Operations (Cont)
 Example 3:
◼ PRE: r1 = 0b1111, r2 = 0b0101

◼ BIC r0, r1, r2 ; r0 = r1 AND (NOT(r2))

◼ POST: r0=0b1010
Comparison Instructions
 Compare or test a register with a 32-bit value
◼ Do not modify the registers being compared or
tested

◼ But only set the values of the NZCV bits of the CPSR register
 Comparison instructions do not need the S suffix to update the flags in the CPSR register
Comparison Instructions (Cont.)
 Syntax: <instruction>{<cond>} Rn, N
◼ N: a register or immediate value
 CMP : compare
◼ CMP r0, r1; compute (r0 - r1) and set NZCV
 CMN : negated compare
◼ CMN r0, r1; compute (r0 + r1) and set NZCV
 TST : bit-wise AND test
◼ TST r0, r1; compute (r0 AND r1)and set NZCV
 TEQ : bit-wise exclusive-or test
◼ TEQ r0, r1; compute (r0 EOR r1)and set NZCV
Comparison Instructions (Cont.)
 Example 4
◼ PRE: CPSR = nzcvqiFt_USER, r0 = 4, r9 = 4
◼ CMP r0, r9
◼ POST: CPSR = nZCvqiFt_USER (r0 - r9 = 0 sets Z; no borrow occurs, so C is also set)
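How CMP derives the NZCV flags can be sketched as follows. The model assumes the usual ARM convention that, for a subtraction, C means "no borrow", so this model sets C whenever the first operand is greater than or equal to the second as unsigned values, including when they are equal. The function name `cmp_flags` is illustrative.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def cmp_flags(rn, op2):
    """Model CMP rn, op2: compute rn - op2 and return (N, Z, C, V)."""
    rn &= MASK
    op2 &= MASK
    result = (rn - op2) & MASK
    n = result >> 31                         # N: bit 31 of the result
    z = int(result == 0)                     # Z: result is zero
    c = int(rn >= op2)                       # C: set when no borrow occurs
    v = ((rn ^ op2) & (rn ^ result)) >> 31   # V: signed overflow
    return n, z, c, v
```

With r0 = r9 = 4, `cmp_flags(4, 4)` reports Z = 1, which is the condition that EQ tests.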


Multiply Instruction
 Syntax:
◼ MLA{<cond>} {S} Rd, Rm, Rs, Rn
◼ MUL{<cond>} {S} Rd, Rm, Rs
 MUL : multiply
◼ MUL r0, r1, r2; r0 = r1*r2
 MLA : multiply and accumulate
◼ MLA r0, r1, r2, r3; r0 = (r1*r2) + r3
Multiply Instruction (Cont.)
 Syntax: <instruction>{<cond>} {S}RdLo, RdHi, Rm, Rs
◼ Multiply onto a pair of register representing a 64-bit value
 UMULL : unsigned multiply long
◼ UMULL r0, r1, r2, r3; [r1,r0] = r2*r3
 UMLAL : unsigned multiply accumulate long
◼ UMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
 SMULL: signed multiply long
◼ SMULL r0, r1, r2, r3; [r1,r0] = r2*r3
 SMLAL : signed multiply accumulate long
◼ SMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
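The [RdHi, RdLo] register-pair notation can be made concrete with a small model. This is a sketch assuming 32-bit unsigned register values. Each function returns `(RdLo, RdHi)` in the same order as the operands in the syntax above, and the names simply mirror the mnemonics.

```python
MASK = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit register value as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def umull(rm, rs):
    """UMULL: unsigned 32x32 -> 64-bit product, returned as (RdLo, RdHi)."""
    product = (rm & MASK) * (rs & MASK)
    return product & MASK, product >> 32


def smull(rm, rs):
    """SMULL: signed 32x32 -> 64-bit product, returned as (RdLo, RdHi)."""
    product = (to_signed(rm) * to_signed(rs)) & MASK64
    return product & MASK, product >> 32


def umlal(rd_lo, rd_hi, rm, rs):
    """UMLAL: [RdHi, RdLo] += Rm * Rs (unsigned), returned as (RdLo, RdHi)."""
    acc = ((rd_hi & MASK) << 32) | (rd_lo & MASK)
    total = (acc + (rm & MASK) * (rs & MASK)) & MASK64
    return total & MASK, total >> 32
```

The same bit pattern gives different high words depending on signedness: `umull(0xFFFFFFFF, 2)` returns `(0xFFFFFFFE, 0x1)`, while `smull(0xFFFFFFFF, 2)` treats the operand as -1 and returns `(0xFFFFFFFE, 0xFFFFFFFF)`.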
Branch Instructions
 Branch instruction
◼ Change the flow of execution
◼ Used to call a routine
 Allow applications to
◼ Have subroutines
◼ Implement if-then-else structure
◼ Implement loop structure
Branch Instructions (Cont.)
 Syntax
◼ B{<cond>} label
◼ BL{<cond>} label
 B : branch
◼ B label; pc (program counter) = label
◼ Used to change execution flow
 BL : branch and link
◼ BL label; pc = label, lr = address of the next instruction after the BL
◼ Similar to the B instruction but can be used for subroutine calls
 Overwrites the link register (lr) with a return address
Branch Instructions (Cont.)
 Example 5
    B forward
    ADD r1, r2, #4
    ADD r0, r6, #2
    ADD r3, r7, #4
forward
    SUB r1, r2, #4
backward
    SUB r1, r2, #4
    B backward
Branch Instructions (Cont.)
 Example 6:
BL subroutine
CMP r1, #5
MOVEQ r1, #0

subroutine
<subroutine code>
MOV pc, lr ; return by moving pc = lr
Load-Store Instructions
 Transfer data between memory and processor
registers

 Three types
◼ Single-register transfer
◼ Multiple-register transfer
◼ Swap
Single-Register Transfer
 Moving a single data item in and out of a register
 Data item can be
◼ A word (32 bits)
◼ A halfword (16 bits)
◼ A byte (8 bits)
Single-Register Transfer (Cont.)
 Syntax
◼ <LDR|STR>{<cond>}{B} Rd, addressing1
◼ LDR{<cond>}SB|H|SH Rd, addressing2
◼ STR{<cond>} H Rd, addressing2
 LDR : load word into a register from memory
 LDRB : load byte
 LDRSB : load signed byte
 LDRH : load half-word
 LDRSH : load signed halfword
 STR: store word from a register to memory
 STRB : store byte
 STRH : store half-word
Single-Register Transfer (Cont.)
 Example 7
LDR r0, [r1] ;= LDR r0, [r1, #0]
;r0 = mem32[r1]
STR r0, [r1] ;= STR r0, [r1, #0]
;mem32[r1]= r0

◼ Register r1 is called the base address register


Single-Register Load-Store Addressing Mode

 Index method, also called Base-Plus-Offset Addressing
◼ Base register
 r0 – r15
◼ Offset, add or subtract an unsigned number
 Immediate
 Register (not PC)
 Scaled register
Single-Register Load-Store Addressing
Mode (Cont.)
 Preindex:
◼ data: mem[base+offset]
◼ Base address register: not updated
◼ Ex: LDR r0,[r1,#4] ; r0:=mem32[r1+4]
 Postindex:
◼ data: mem[base]
◼ Base address register: base + offset
◼ Ex: LDR r0,[r1],#4 ; r0:=mem32[r1], then r1:=r1+4
 Preindex with writeback (also called auto-indexing)
◼ Data: mem[base+offset]
◼ Base address register: base + offset
◼ Ex: LDR r0, [r1,#4]! ; r0:=mem32[r1+4], then r1:=r1+4
Single-Register Load-Store Addressing
Mode (Cont.)

 Example 8
◼ r0 = 0x00000000, r1 = 0x00009000,
mem32[0x00009000] = 0x01010101,
mem32[0x00009004] = 0x02020202
◼ Preindexing: LDR r0, [r1, #4]
 r0 = 0x02020202, r1=0x00009000
◼ Postindexing: LDR r0, [r1], #4
 r0 = 0x01010101, r1=0x00009004
◼ Preindexing with writeback: LDR r0, [r1, #4]!
 r0 = 0x02020202, r1=0x00009004
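The three index methods differ only in which address is accessed and whether the base register is updated. A minimal model, assuming memory is a Python dict keyed by word address and registers are a dict keyed by name (the `ldr` helper and its `mode` strings are illustrative, not assembler syntax):

```python
def ldr(regs, mem, rd, rn, offset, mode):
    """Model LDR rd with base rn under one of three index methods.

    mode "pre":    LDR rd, [rn, #offset]   (base not updated)
    mode "pre_wb": LDR rd, [rn, #offset]!  (base updated to base + offset)
    mode "post":   LDR rd, [rn], #offset   (load from base, then update base)
    """
    base = regs[rn]
    if mode == "pre":
        address = base + offset
    elif mode == "pre_wb":
        address = base + offset
        regs[rn] = address
    elif mode == "post":
        address = base
        regs[rn] = base + offset
    else:
        raise ValueError("unknown index method: " + mode)
    regs[rd] = mem[address]


# Reproduce Example 8 for the postindexed case.
regs = {"r0": 0x00000000, "r1": 0x00009000}
mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
ldr(regs, mem, "r0", "r1", 4, "post")
print(hex(regs["r0"]), hex(regs["r1"]))  # 0x1010101 0x9004
```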
Single-Register Load-Store Addressing
Mode (Cont.)
Addressing mode and index method                 Addressing syntax
Preindex with immediate offset                   [Rn, #+/-offset_12]
Preindex with register offset                    [Rn, +/-Rm]
Preindex with scaled register offset             [Rn, +/-Rm, shift #shift_imm]
Preindex writeback with immediate offset         [Rn, #+/-offset_12]!
Preindex writeback with register offset          [Rn, +/-Rm]!
Preindex writeback with scaled register offset   [Rn, +/-Rm, shift #shift_imm]!
Immediate postindexed                            [Rn], #+/-offset_12
Register postindexed                             [Rn], +/-Rm
Scaled register postindexed                      [Rn], +/-Rm, shift #shift_imm
Examples of LDR Using Different Addressing Modes
Index method              Instruction                     r0 =                      r1 +=
Preindex with writeback   LDR r0, [r1, #0x4]!             mem32[r1+0x4]             0x4
                          LDR r0, [r1, r2]!               mem32[r1+r2]              r2
                          LDR r0, [r1, r2, LSR #0x4]!     mem32[r1+(r2 LSR 0x4)]    (r2 LSR 0x4)
Preindex                  LDR r0, [r1, #0x4]              mem32[r1+0x4]             not updated
                          LDR r0, [r1, r2]                mem32[r1+r2]              not updated
                          LDR r0, [r1, -r2, LSR #0x4]     mem32[r1-(r2 LSR 0x4)]    not updated
Postindex                 LDR r0, [r1], #0x4              mem32[r1]                 0x4
                          LDR r0, [r1], r2                mem32[r1]                 r2
                          LDR r0, [r1], r2, LSR #0x4      mem32[r1]                 (r2 LSR 0x4)
Multiple-Register Transfer
 Transfer multiple registers between memory and the processor in a single instruction
 More efficient than single-register transfer for
◼ Moving blocks of data around memory
◼ Saving and restoring context and stack
Multiple-Register Transfer (Cont.)
 Load-store multiple instructions can increase interrupt latency
◼ An interrupt is normally taken only after the current instruction has completed
◼ Each load multiple instruction takes 2 + N*t cycles
 N: the number of registers to load
 t: the number of cycles required for sequential access to memory
◼ Compilers provide a switch to control the maximum number of registers being transferred
 Limits the maximum interrupt latency
Multiple-Register Transfer (Cont.)
 Syntax:
◼ <LDM|STM>{<cond>}<mode> Rn{!}, <registers>{^}
◼ Address mode: See the next page
◼ ^: optional
 Cannot be used in User mode or System mode
 If the instruction is LDM and the register list contains the pc (r15), the SPSR is also copied into the CPSR
 Otherwise, data is transferred into or out of the User mode registers instead of the current mode registers
Multiple-Register Transfer (Cont.)
 Example 9
◼ PRE:
mem32[0x80018] = 0x03,
mem32[0x80014] = 0x02,
mem32[0x80010] = 0x01,
r0 = 0x00080010,
r1 = r2 = r3= 0x00000000

◼ LDMIA r0!, {r1-r3}, or LDMIA r0!, {r1, r2, r3}
 Registers can be listed explicitly or as a range using the “-” character
Pre-Condition for LDMIA Instruction
Memory Address   Data         Register
0x80020          0x00000005
0x8001c          0x00000004
0x80018          0x00000003   R3=0x00000000
0x80014          0x00000002   R2=0x00000000
0x80010          0x00000001   R1=0x00000000   (R0 = 0x80010 points here)
0x8000c          0x00000000
Figure 1
Post-Condition for LDMIA Instruction
Memory Address   Data         Register
0x80020          0x00000005
0x8001c          0x00000004                   (R0 = 0x8001c points here)
0x80018          0x00000003   R3=0x00000003
0x80014          0x00000002   R2=0x00000002
0x80010          0x00000001   R1=0x00000001
0x8000c          0x00000000
Figure 2
Multiple-Register Transfer (Cont.)
 Example 9 (Cont.)
◼ POST:
r0 = 0x0008001c,
r1 = 0x00000001,
r2 = 0x00000002,
r3 = 0x00000003
Multiple-Register Transfer (Cont.)
 Example 10
◼ PRE: as shown in Fig. 1
◼ LDMIB r0!, {r1-r3}
◼ POST:
r0 = 0x0008001c
r1 = 0x00000002
r2 = 0x00000003
r3 = 0x00000004
Post-Condition for LDMIB Instruction
Memory Address   Data         Register
0x80020          0x00000005
0x8001c          0x00000004   R3=0x00000004   (R0 = 0x8001c points here)
0x80018          0x00000003   R2=0x00000003
0x80014          0x00000002   R1=0x00000002
0x80010          0x00000001
0x8000c          0x00000000
Figure 3
Multiple-Register Transfer (Cont.)
 Load-store multiple pairs when base update is used (!)
◼ Useful for saving a group of registers and restoring them later
Store multiple   Load multiple
STMIA            LDMDB
STMIB            LDMDA
STMDA            LDMIB
STMDB            LDMIA
Multiple-Register Transfer (Cont.)
 Example 11
◼ PRE:
r0 = 0x00009000
r1 = 0x00000009
r2 = 0x00000008
r3 = 0x00000007
◼ STMIB r0!, {r1-r3}
MOV r1, #1
MOV r2, #2
MOV r3, #3
Multiple-Register Transfer (Cont.)
 Example 11 (Cont.)
◼ PRE (2):
r0 = 0x0000900c
r1 = 0x00000001
r2 = 0x00000002
r3 = 0x00000003
◼ LDMDA r0!, {r1-r3}
◼ POST:
r0 = 0x00009000
r1 = 0x00000009
r2 = 0x00000008
r3 = 0x00000007
Multiple-Register Transfer (Cont.)
 Example 11 (Cont.)
◼ The STMIB stores the values 9, 8, 7 (r1, r2, r3) to memory at ascending addresses
◼ The MOV instructions then corrupt registers r1 to r3
◼ Finally, the LDMDA
 Reloads the original values, and
 Restores the base pointer r0
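The save-and-restore pairing in Example 11 follows from how each address mode picks its block of addresses. The model below is a sketch with assumptions: registers are keyed by number, memory is a dict keyed by word address, and the helper names are hypothetical. It relies on the architectural rule that the lowest-numbered register always goes to or from the lowest address.

```python
def block_addresses(base, count, mode):
    """Return (word addresses in ascending order, updated base) for a mode."""
    size = 4 * count
    if mode == "IA":    # increment after
        start, new_base = base, base + size
    elif mode == "IB":  # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":  # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":  # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError("unknown address mode: " + mode)
    return [start + 4 * i for i in range(count)], new_base


def stm(regs, mem, rn, reglist, mode, writeback=True):
    """Model STM<mode> rn{!}, {reglist}."""
    addresses, new_base = block_addresses(regs[rn], len(reglist), mode)
    for reg, address in zip(sorted(reglist), addresses):
        mem[address] = regs[reg]
    if writeback:
        regs[rn] = new_base


def ldm(regs, mem, rn, reglist, mode, writeback=True):
    """Model LDM<mode> rn{!}, {reglist}."""
    addresses, new_base = block_addresses(regs[rn], len(reglist), mode)
    if writeback:
        regs[rn] = new_base
    for reg, address in zip(sorted(reglist), addresses):
        regs[reg] = mem[address]


# Example 11: STMIB saves r1-r3, the MOVs corrupt them, LDMDA restores them.
regs = {0: 0x9000, 1: 9, 2: 8, 3: 7}
mem = {}
stm(regs, mem, 0, [1, 2, 3], "IB")
regs.update({1: 1, 2: 2, 3: 3})
ldm(regs, mem, 0, [1, 2, 3], "DA")
print(hex(regs[0]), regs[1], regs[2], regs[3])  # 0x9000 9 8 7
```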
Multiple-Register Transfer (Cont.)
 Example 12: the use of the load-store multiple
instructions with a block memory copy
; r9 points to start of source data
; r10 points to start of destination data
; r11 points to end of the source
loop
    LDMIA r9!, {r0-r7}   ; load 32 bytes from source and update r9
    STMIA r10!, {r0-r7}  ; store 32 bytes to destination and update r10
    CMP r9, r11          ; have we reached the end?
    BNE loop
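Multiple-Register Transfer (Cont.)
[Figure: block memory copy. The source block lies between r9 (start) and r11 (end) towards high memory; the destination starts at r10 towards low memory. Each loop iteration transfers 32 bytes in two instructions.]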
Stack Operations
 ARM architecture uses the load-store multiple
instruction to carry out stack operations
◼ PUSH: use a store multiple instruction
◼ POP: use a load multiple instruction
 Stack
◼ Ascending (A): stack grows towards higher
memory addresses
◼ Descending (D): stack grows towards lower
memory addresses
Stack Operations (Cont.)
 Stack
◼ Full stack (F): stack pointer sp points to the last
valid item pushed onto the stack
◼ Empty stack (E): sp points after the last item on
the stack
 The free slot where the next data item will be placed
 There are a number of aliases available to
support stack operations
◼ See next page
Stack Operations (Cont.)
 ARM supports all four forms of stacks
◼ Full ascending (FA): grows up; base register points to
the highest address containing a valid item
◼ Empty ascending (EA): grows up; base register points to
the first empty location
◼ Full descending (FD): grows down; base register points
to the lowest address containing a valid data
◼ Empty descending (ED): grows down; base register
points to the first empty location below the stack
Addressing Methods for Stack Operations
Addressing mode   Description        Pop     = LDM    Push    = STM
FA                Full ascending     LDMFA   LDMDA    STMFA   STMIB
FD                Full descending    LDMFD   LDMIA    STMFD   STMDB
EA                Empty ascending    LDMEA   LDMDB    STMEA   STMIA
ED                Empty descending   LDMED   LDMIB    STMED   STMDA
Stack Operations (Cont.)
 Example 13
◼ PRE:
 r1 = 0x00000002
 r4 = 0x00000003
 sp = 0x00080014
◼ STMFD sp!, {r1, r4}
◼ POST:
 r1 = 0x00000002
 r4 = 0x00000003
 sp = 0x0008000c
Stack Operations (Cont.)
 Example 13 (Cont.)
◼ STMFD – full stack push operation

PRE                                   POST
Address   Data                        Address   Data
0x80018   0x00000001                  0x80018   0x00000001
0x80014   0x00000002   <- sp          0x80014   0x00000002
0x80010   Empty                       0x80010   0x00000003
0x8000c   Empty                       0x8000c   0x00000002   <- sp
Stack Operations (Cont.)
 Example 14
◼ PRE:
 r1 = 0x00000002
 r4 = 0x00000003
 sp = 0x00080010
◼ STMED sp!, {r1, r4}
◼ POST:
 r1 = 0x00000002
 r4 = 0x00000003
 sp = 0x00080008
Stack Operations (Cont.)
 Example 14 (Cont.)
◼ STMED – empty stack push operation
PRE                                   POST
Address   Data                        Address   Data
0x80018   0x00000001                  0x80018   0x00000001
0x80014   0x00000002                  0x80014   0x00000002
0x80010   Empty        <- sp          0x80010   0x00000003
0x8000c   Empty                       0x8000c   0x00000002
0x80008   Empty                       0x80008   Empty        <- sp
SWAP Instruction
 A special case of a load-store instruction
◼ Swap the contents of memory with the contents
of a register
◼ An atomic operation
 Cannot be interrupted by any other instruction or any other bus access
 The system “holds the bus” until the transaction is
complete
 Useful when implementing semaphores and mutual
exclusion in an operating system
SWAP Instruction (Cont.)
 Syntax: SWP{<cond>}{B} Rd, Rm, [Rn]
◼ tmp = mem32[Rn]
◼ mem32[Rn] = Rm
◼ Rd = tmp
 SWP: swap a word between memory and a
register
 SWPB: swap a byte between memory and a
register
SWAP Instruction (Cont.)
 Example 15
◼ PRE:
 mem32[0x9000] = 0x12345678
 r0 = 0x00000000
 r1 = 0x11112222
 r2 = 0x00009000
◼ SWP r0, r1, [r2]
◼ POST:
 mem32[0x9000] = 0x11112222
 r0 = 0x12345678
 r1 = 0x11112222
 r2 = 0x00009000
SWAP Instruction (Cont.)
 Example 15 (Cont.)
spin
    LDR r1, =semaphore
    MOV r2, #1
    SWP r3, r2, [r1] ; hold the bus until complete
    CMP r3, #1
    BEQ spin
 The address pointed to by the semaphore contains either the value 1 or 0
 When the semaphore value == 1, loop until the semaphore becomes 0 (updated by the holding process)
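The logic of the spin loop can be sketched in Python. This is only a single-threaded model: the atomicity that SWP gets from holding the bus is assumed rather than simulated, and `swp`, `try_acquire` and `release` are hypothetical helper names.

```python
def swp(mem, address, new_value):
    """Model SWP rd, rm, [rn]: return the old memory word and store rm.

    On the processor the read and write form one atomic bus transaction;
    this model simply assumes nothing runs between the two steps.
    """
    old_value = mem[address]
    mem[address] = new_value
    return old_value


def try_acquire(mem, address):
    """One pass of the spin loop: swap in 1, succeed if the old value was 0."""
    return swp(mem, address, 1) == 0


def release(mem, address):
    """The holder frees the semaphore by writing 0 back."""
    mem[address] = 0


SEMAPHORE = 0x9000           # illustrative address
mem = {SEMAPHORE: 0}
print(try_acquire(mem, SEMAPHORE))  # True: semaphore was free
print(try_acquire(mem, SEMAPHORE))  # False: already held, caller would spin
release(mem, SEMAPHORE)
print(try_acquire(mem, SEMAPHORE))  # True again after release
```

Because a failed attempt swaps in the value 1 over an existing 1, spinning never disturbs a semaphore that is already held.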
Software Interrupt Instruction
 SWI: software interrupt instruction
◼ Cause a software interrupt exception
◼ Provide a mechanism for applications to call
operating system routines
◼ Each SWI instruction has an associated SWI
number
 Used to represent a particular function call or routines
Software Interrupt Instruction (Cont.)

 Syntax: SWI{<cond>} SWI_number
◼ lr_svc = address of instruction following the SWI
◼ spsr_svc = cpsr
◼ pc = vector table + 0x8 ; jump to the SWI handler
◼ cpsr mode = SVC
◼ cpsr I = 1 (mask IRQ interrupts)
Software Interrupt Instruction (Cont.)
 Example 16
◼ PRE:
 cpsr = nzcVqift_USER
 pc = 0x00008000
 lr = r14 = 0x003fffff
◼ 0x00008000 SWI 0x123456
◼ POST:
 cpsr = nzcVqIft_SVC
 spsr = nzcVqift_USER
 pc = 0x00000008
 lr = 0x00008004
Program Status Register Instructions

 MRS
◼ Transfer the contents of either the cpsr or spsr
into a register

 MSR
◼ Transfer the contents of a register into the cpsr or spsr
Program Status Register Instructions
(Cont.)
 Syntax
◼ MRS{<cond>} Rd, <cpsr|spsr>
◼ MSR{<cond>} <cpsr|spsr>_<fields>, Rm
◼ MSR{<cond>} <cpsr|spsr>_<fields>, #immediate
 Field: any combination of
◼ Flags: [24:31]
◼ Status: [16:23]
◼ eXtension[8:15]
◼ Control[0:7]
PSR Registers
Program Status Register Instructions
(Cont.)
 Note: You cannot access the SPSR in User or
System Mode
◼ Assembler cannot warn you because it does not
know which mode will be executed in
Program Status Register Instructions
(Cont.)
 Example 17
◼ PRE:
 cpsr = nzcvqIFt_SVC
◼ MRS r1, cpsr
◼ BIC r1, r1, #0x80 ;0b10000000, clear bit 7
◼ MSR cpsr_c, r1 ;enable IRQ interrupts
◼ POST:
 cpsr = nzcvqiFt_SVC
◼ Note that, this example must be in SVC mode
 In user mode, you can only read all cpsr bits and can only update
the condition flag field f, i.e., cpsr[24:31]
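The read-modify-write sequence of Example 17 can be traced on concrete bit patterns. The sketch assumes the standard CPSR layout (I = bit 7, F = bit 6, T = bit 5, mode in bits [4:0], SVC = 0b10011). The function name `enable_irq` is illustrative.

```python
MASK = 0xFFFFFFFF
I_BIT = 1 << 7          # IRQ disable bit
F_BIT = 1 << 6          # FIQ disable bit
MODE_SVC = 0b10011      # supervisor mode encoding
CONTROL_FIELD = 0xFF    # cpsr_c covers bits [7:0]


def enable_irq(cpsr):
    """Model MRS r1, cpsr / BIC r1, r1, #0x80 / MSR cpsr_c, r1."""
    r1 = cpsr                      # MRS r1, cpsr
    r1 &= ~I_BIT & MASK            # BIC r1, r1, #0x80 clears bit 7
    # MSR cpsr_c, r1 writes only the control field; other fields keep
    # their previous contents.
    return (cpsr & ~CONTROL_FIELD & MASK) | (r1 & CONTROL_FIELD)


before = I_BIT | F_BIT | MODE_SVC       # nzcvqIFt_SVC
after = enable_irq(before)
print(hex(before), hex(after))          # 0xd3 0x53
```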
Conditional Execution
 Almost all ARM instructions can include an optional condition code
◼ Instruction is only executed if the condition code flags in the CPSR meet the specified condition
◼ The default is AL, or always execute
 Conditional execution depends on two components
◼ The condition field: located in the instruction
◼ The condition flags: located in the cpsr
Conditional Execution (Cont.)
 Example 18

ADDEQ r0, r1, r2 ; r0 = r1 + r2 if zero flag is set
Condition Codes
Conditional Execution (Cont.)
 Thus, before conditional execution can be used
◼ There must be an earlier instruction that updates the condition code flags according to its result
◼ Unless told to, data processing instructions do not update the flags
 To make an instruction update the flags
◼ Include the S suffix
◼ Example: ADDS r0, r1,r2
Conditional Execution (Cont.)
 However, some instructions always update the flags
◼ Do not require the S suffix
◼ CMP, CMN, TST, TEQ
 Flags are preserved until updated
 Thus, you can execute an instruction conditionally,
based upon the flags set in another instruction, either:
◼ Immediately after the instruction which updated the flags
◼ After any number of intervening instructions that have not
updated the flags.
Conditional Execution (Cont.)
 Example 18
◼ Translate the following code into assembly language
◼ Assume r1 = a, r2 = b
while ( a!= b )
{
if (a > b) a -= b; else b -= a;
}
Conditional Execution (Cont.)
 Example 18: Solution 1

gcd
CMP r1, r2
BEQ complete
BLT lessthan
SUB r1, r1, r2
B gcd
lessthan
SUB r2, r2, r1
B gcd
complete
Conditional Execution (Cont.)
 Example 18: Solution 2

gcd
CMP r1, r2
SUBGT r1, r1, r2
SUBLT r2, r2, r1
BNE gcd

 Solution 2 reduces the code from seven instructions to four and leaves only one branch
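The two solutions can be compared with a small model that counts taken branches, since each taken branch forces the pipeline to refill. This is a sketch assuming positive inputs. The function names are illustrative, and the comments map each step back to the instructions of the two listings.

```python
def gcd_branching(a, b):
    """Solution 1: returns (gcd, number of taken branches)."""
    taken = 0
    while a != b:          # CMP r1, r2 ; BEQ complete (not taken)
        if a < b:          # BLT lessthan (taken)
            taken += 1
            b -= a         # SUB r2, r2, r1
        else:              # BLT falls through
            a -= b         # SUB r1, r1, r2
        taken += 1         # B gcd (always taken)
    taken += 1             # final BEQ complete (taken)
    return a, taken


def gcd_conditional(a, b):
    """Solution 2: returns (gcd, number of taken branches)."""
    taken = 0
    while True:            # CMP r1, r2 sets the flags once per pass
        if a > b:
            a -= b         # SUBGT r1, r1, r2
        elif a < b:
            b -= a         # SUBLT r2, r2, r1
        else:
            return a, taken  # BNE gcd falls through when a == b
        taken += 1         # BNE gcd (taken)


print(gcd_branching(12, 18))    # (6, 4)
print(gcd_conditional(12, 18))  # (6, 2)
```

The `elif` is safe because SUBGT has no S suffix, so SUBLT still sees the flags set by the CMP.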
References
 Andrew N. Sloss, “ARM System Developer’s Guide: Designing and Optimizing System Software,” Morgan Kaufmann Publishers, 2004
◼ Chapter 3: Introduction to the ARM Instruction Set
ARM7TDMI Microprocessor

Thumb Instruction Set

Processor Operating States

ARM state
which executes 32-bit, word-aligned ARM
instructions.

THUMB state
which operates with 16-bit, halfword-aligned
THUMB instructions.
Thumb Instruction Set

•ARM architecture versions v4T and above define a 16-bit instruction set called the Thumb instruction set. The functionality of the Thumb instruction set is a subset of the functionality of the 32-bit ARM instruction set.

•A processor that is executing Thumb instructions is operating in Thumb state. A processor that is executing ARM instructions is operating in ARM state.
Thumb Instruction Set

•A processor in ARM state cannot execute Thumb instructions, and a processor in Thumb state cannot execute ARM instructions.

•Each instruction set includes instructions to change processor state.

Note: ARM processors always start executing code in ARM state.
Thumb Instruction Set

•Thumb does not provide direct access to the CPSR or any SPSR.

•Thumb execution is flagged by the T bit (bit[5]) in the CPSR.

T==0 32-bit instructions are fetched (ARM instructions)
T==1 16-bit instructions are fetched (Thumb instructions)
Thumb applications

In a typical embedded system:
use ARM code in 32-bit on-chip memory for small speed-critical routines
use Thumb code in 16-bit off-chip memory for large non-critical control routines

Note:
Switching between the ARM and Thumb states of execution is done using the BX instruction
Thumb applications

For Most Instructions Generated by the Compiler
Conditional execution is not used
Source and destination registers are identical
Only low registers are used
Constants are of limited size
Inline barrel shifter is not used
DATA TYPES

Byte (8-bit):
placed on any byte boundary.
Half-word (16-bit):
aligned to two-byte boundaries.
Word (32-bit):
aligned to four-byte boundaries.
Features
•Not a complete architecture
•Dynamically decompressed to ARM instructions
•Fully supported by ARM development tools
•Both entry and exit are done using the corresponding BX instruction
•Increases the maximum clock rate to 40 MHz
•Expanded cache to 8 kB
•Thumb combines a new instruction set in a 16-bit instruction format with a hardware logic unit that decodes it
•Thumb instructions are translated to regular ARM instructions
•Thumb improves ARM instruction density by about 25% to 35%
•16-bit wide memory
Thumb State Philosophy
The Thumb instruction set (16-bit) addresses the issue of code density.
It may be viewed as a compressed form of a subset of the ARM instruction set.

Thumb instructions map onto ARM instructions.

The Thumb programmer’s model maps onto the ARM programmer’s model.

Implementations of Thumb use dynamic decompression in an ARM instruction pipeline; the instructions then execute as standard ARM instructions within the processor.
Thumb is not a complete architecture; it is not anticipated that a processor would execute Thumb instructions without supporting the ARM instruction set.
Therefore the Thumb instruction set need only support common application functions.

Exceptions will not be handled in THUMB state
Thumb-ARM Decompression

•Translation from 16-bit Thumb instruction to 32-bit ARM instruction
•Condition bits changed to ‘always’
•Lookup to translate major and minor opcodes
•Zero extending 3-bit register specifiers to give 4-bit specifiers
•Zero extending immediate values
•Implicit ‘S’ (affecting condition codes) should be explicitly specified
•Thumb 2-address format must be mapped to ARM 3-address format
THUMB-ARM Instruction Mapping

❖ So where performance is all-important, a system should use 32-bit memory and run ARM code
❖ Where power consumption and cost are more important, a 16-bit memory system and THUMB code may be a better choice
Mode Switching

•Default entry to exception mode is always ARM
•Explicit entry to Thumb is done using the ARM mode BX instruction
•Explicit entry back to ARM mode is done using the Thumb mode BX instruction
Thumb Programmers Model

•Registers r0 to r7 are accessible (Lo)
•Few instructions require r8 to r15 to be specified
•r13 is used as the stack pointer
•r14 is used as the link register
•r15 is used as the program counter
THUMB Programmer’s Model

THUMB Register Organisation
Thumb General Registers and Program Counter

User / System   FIQ        Supervisor   Abort      IRQ        Undefined
r0              r0         r0           r0         r0         r0
r1              r1         r1           r1         r1         r1
r2              r2         r2           r2         r2         r2
r3              r3         r3           r3         r3         r3
r4              r4         r4           r4         r4         r4
r5              r5         r5           r5         r5         r5
r6              r6         r6           r6         r6         r6
r7              r7         r7           r7         r7         r7
SP              SP_fiq     SP_svc       SP_abt     SP_irq     SP_und
LR              LR_fiq     LR_svc       LR_abt     LR_irq     LR_und
PC              PC         PC           PC         PC         PC

Thumb Program Status Registers

CPSR            CPSR       CPSR         CPSR       CPSR       CPSR
                SPSR_fiq   SPSR_svc     SPSR_abt   SPSR_irq   SPSR_und
ARM-Thumb Similarities
•Load-store architecture
•Support 8-bit byte, 16-bit half-word and 32-bit word data types with aligned boundaries
•32-bit unsegmented memory
•However, in order to achieve a 16-bit instruction length, a number of characteristic features of the ARM instruction set are not supported in Thumb state
ARM-Thumb differences
•Unconditional execution of instructions, except branch instructions,
whereas almost all ARM instructions can be executed conditionally
•2-address format for data processing;
ARM data processing instructions use the 3-address format
(except 64-bit MUL instructions)
•Thumb has less regular instruction formats than ARM, as a result of the dense encoding
•There are NO status register access instructions (MRS/MSR) in Thumb state
•Many addressing modes of ARM are not supported in Thumb state
•No banked registers and privileged modes in Thumb state
ARM-Thumb differences

The biggest register difference involves the SP register
The Thumb state has unique mnemonics (PUSH, POP) that don’t exist in ARM state
These instructions assume the existence of a stack pointer, for which R13 is used
They translate into load and store instructions internally
No SWP instructions in Thumb state
No support for coprocessor instructions in Thumb state
Barrel shifter operations are separate instructions
Thumb exception

•On an exception the processor is returned to ARM state.

•On return the previous state is restored as the SPSR is transferred to the CPSR

•Use of the Thumb instruction set can improve code density and power efficiency, save cost and enhance performance all at once
Thumb Branching

•Short conditional branches

•Medium range unconditional branches

•Long range Subroutine calls

•Branch to change to ARM Mode

Branch Instruction Formats

Instruction        Bits [15:0]
B<cond> <label>    1 1 0 1 | Condition [11:8] | 8-bit offset [7:0]
B <label>          1 1 1 0 0 | 11-bit offset [10:0]
BL <label>         1 1 1 | H [12:11] | 11-bit offset [10:0]
BX Rm              0 1 0 0 0 1 1 1 0 | H [6] | Rm [5:3] | 0 0 0
THUMB Branch Instructions

Features
•Different format for each case
•Offset is reduced to 11 bits (unconditional) and 8 bits (conditional)
•Offset is shifted left by 1 bit (to give half-word alignment) and sign-extended to 32 bits
•BL is more subtle: it gives a 22-bit offset by using the link register for temporary storage
•No direct mapping to ARM instructions, as Thumb requires half-word aligned offsets
BL Instruction

To allow for a reasonably large offset to the target subroutine, the BL instruction is automatically translated by the assembler into a sequence of two 16-bit Thumb instructions:
1. H = 10
LR := PC + (sign-extended offset shifted left 12 places);
2. H = 11
PC := LR + (offset shifted left 1 place);
LR := address of next instruction
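The two-step offset arithmetic of the BL pair can be checked with a short model. It is a sketch under assumptions: `pc` stands for the PC value seen by the first instruction of the pair, the byte offset is even and fits in the 23-bit signed range, and the helper names are hypothetical.

```python
MASK = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def split_bl_offset(byte_offset):
    """Split a byte offset into the two 11-bit fields of the BL pair."""
    halfwords = byte_offset >> 1          # 22-bit halfword offset
    high = (halfwords >> 11) & 0x7FF      # carried by the H = 10 instruction
    low = halfwords & 0x7FF               # carried by the H = 11 instruction
    return high, low


def bl_target(pc, high, low):
    """Apply both halves of the BL pair and return the branch target."""
    lr = (pc + (sign_extend(high, 11) << 12)) & MASK  # step 1 (H = 10)
    return (lr + (low << 1)) & MASK                   # step 2 (H = 11)


high, low = split_bl_offset(0x1234)
print(hex(bl_target(0x8000, high, low)))  # 0x9234
```

The high field is sign-extended and shifted by 12 and the low field is shifted by 1, so together they cover a 22-bit halfword offset, roughly ±4 MB.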


Software Interrupt Instruction

1 1 0 1 1 1 1 1 8 – Bit Immediate

•Address of next instruction is saved in r14_svc
•CPSR is saved in SPSR_svc
•Disables IRQ, clears T bit, enters Supervisor mode
•PC is forced to 0x08
•8-bit immediate is zero extended to fill the 24-bit field in the ARM instruction
•Limits SWIs to the first 256 of the 16 million ARM SWIs
Data Processing

•Fairly complex instruction formats
•No conditional execution
•Separate shift operations provided, no shifting of second operand
•All data processing instructions set the condition code bits (no need for ‘S’)
THUMB data processing instructions

THUMB data processing instructions

Instruction formats

•<op> Rd, Rn, Rm
•<op> Rd, Rn, #<imm3>
•<op> Rd|Rn, Rm|Rs
•<op> Rd, Rn, #<sh5>
•<op> Rd, #<imm8>
Instructions

•MOV Rd, #<imm8>


•MVN Rd, Rm
•CMP Rn, #<imm8>
•CMP Rn, Rm
•CMN Rn, Rm
•TST Rn, Rm

Instruction

•ADD Rd, Rn, #<imm3>


•ADD Rd, #< imm8>
•ADD Rd, Rn, Rm
•ADC Rd, Rm
•SUB Rd, Rn, #<imm3>
•SUB Rd, #< imm8>
•SUB Rd, Rn, Rm
•SBC Rd, Rm
•NEG Rd, Rn
Instruction

•LSL Rd, Rm, #<#sh>


•LSL Rd, Rs
•LSR Rd, Rm, #<#sh>
•LSR Rd, Rs
•ASR Rd, Rm, #<#sh>
•ASR Rd, Rs
•ROR Rd, Rs

Instruction

•AND Rd, Rm
•EOR Rd, Rm
•ORR Rd, Rm
•BIC Rd, Rm
•MUL Rd, Rm

Instruction (using Hi registers)

•ADD Rd, Rm (1 or 2 Hi registers)


•CMP Rn, Rm (1 or 2 Hi registers)
•MOV Rd, Rm (1 or 2 Hi registers)
•ADD Rd, PC, #<imm8>
•ADD Rd, SP, #<imm8>
•ADD SP, SP, #<imm7>
•SUB SP, SP, #<imm7>
Except for CMP, these instructions do not set the condition code bits
THUMB Single register data transfer

Data Transfer Instruction

•LDR|STR Rd, [Rn, #off5]


•LDR|STR Rd, [Rn, Rm]
•LDRB|STRB Rd, [Rn, #off5]
•LDRB|STRB Rd, [Rn, Rm]
•LDRH|STRH Rd, [Rn, #off5]
•LDRH|STRH Rd, [Rn, Rm]
Signed operands:
•LDRSB|LDRSH Rd, [Rn, Rm]
THUMB Multiple register data transfer

Multiple register transfers
•LDMIA|STMIA Rn!, { <reg list> }

•Rn may be any register among R0 – R7

•Register set can be any subset of R0 – R7 but not the base register ‘Rn’

•Write back to the base register is always selected.
Stack Mode

•POP|PUSH { <reg list> {, R}}

•R13 (sp) is used as base register

•Uses Full Descending Stack

•In addition to any subset of registers R0 – R7, LR (lr) may be included in a PUSH instruction and PC (pc) may be included in a POP instruction
Properties

•Thumb code requires 70% of the space of ARM code
•Thumb code uses 40% more instructions than the ARM code
•With 32-bit memory, ARM code is 40% faster
•With 16-bit memory, Thumb code is 45% faster than ARM code
•Thumb code uses 30% less external memory power than ARM code
Thumb Applications
