Module 3
ARM Instruction Set
ARM Instruction Sets
Data Processing Instructions
Branch Instructions
Load-store instructions
Software interrupt instructions
Program status register instructions
Conditional Execution
Data Processing Instructions
Manipulate data within registers
Data processing instructions
◼ Move instructions
◼ Arithmetic instructions
◼ Logical instructions
◼ Comparison instructions
◼ Multiply instructions
Data processing
❑ They are move, arithmetic, logical, comparison and
multiply instructions.
❑ Most data processing instructions can process one of
their operands using the barrel shifter.
• General rules:
– All operands are 32-bit, coming
from registers or literals.
– The result, if any, is 32-bit and
placed in a register (with the
exception of long multiply, which
produces a 64-bit result)
– 3-address format
Data Processing Instruction
ARM Instruction Set Format
Move Instruction
Syntax: <instruction>{<cond>}{S} Rd, N
◼ N: a register or immediate value
MOV : move
◼ MOV r0, r1; r0 = r1
◼ MOV r0, #5; r0 = 5
MVN : move (negated)
◼ MVN r0, r1; r0 = NOT(r1) = ~r1
Preprocessed by Shifter
Example 1
◼ PRE: r5 = 5, r7 = 8;
◼ MOV r7, r5, LSL #2; r7 = r5 << 2 = r5*4
◼ POST: r5 = 5, r7 = 20
Preprocessed by Shifter
LSL: logical shift left
◼ x << y, the least significant bits are filled with zeroes
LSR: logical shift right:
◼ (unsigned) x >> y, the most significant bits are filled with zeroes
ASR: arithmetic shift right
◼ (signed) x >> y, the most significant bits are filled with copies of the sign bit
ROR: rotate right
◼ ((unsigned) x >> y) | (x << (32-y))
RRX: rotate right extended
◼ (C flag << 31) | ((unsigned) x >> 1)
◼ Performs a 33-bit rotate right by one place, with the CPSR’s C bit
inserted above the sign bit of the word
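To make the shift definitions above concrete, here is a minimal Python model of the five shifter operations. It is a sketch, not ARM code: it assumes 32-bit register values held as Python ints and masked with a constant named `MASK32`, and the function names (`lsl`, `lsr`, `asr`, `ror`, `rrx`) are hypothetical. `rrx` also returns the bit shifted out of bit 0, which an instruction with the S suffix would copy into the C flag.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: vacated low bits are filled with zeros."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: vacated high bits are filled with zeros."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: vacated high bits are copies of bit 31."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32

def ror(x, n):
    """Rotate right: bits shifted out at the bottom re-enter at the top."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, c):
    """33-bit rotate right by one through the carry flag.

    Returns (result, bit shifted out of bit 0); the CPSR C flag only
    receives that bit if the instruction has the S suffix."""
    x &= MASK32
    return ((c & 1) << 31) | (x >> 1), x & 1

# Example 1: MOV r7, r5, LSL #2 with r5 = 5
print(lsl(5, 2))                # 20
# Example 2: MOV r0, r1, LSL #1 with r1 = 0x80000004 (bit 31 is lost)
print(hex(lsl(0x80000004, 1)))  # 0x8
```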
Shift Register Operands
– ADD r1, r2, r3, LSL #3 ; r1 := r2 + (r3 << 3)
– A single instruction executed in a single cycle
❑ LSL: Logical Shift Left by 0 to 31 places, 0 filled at the lsb end
❑ LSR, ASL (Arithmetic Shift Left), ASR, ROR (Rotate Right), RRX (Rotate Right eXtended by 1 bit)
– ADD r5, r5, r3, LSL r2 ; r5 := r5 + r3 * 2^r2
– MOV r12, r4, ROR r3 ; r12 := r4 rotated right by value of r3
[Figure: bits 31..0 of the operand for LSL #5, LSR #5, ASR #5 (positive operand), ASR #5 (negative operand), ROR #5 and RRX, with the C flag shown]
Preprocessed by Shifter (Cont.)
Example 2
◼ PRE: r0 = 0x00000000, r1 = 0x80000004
◼ MOV r0, r1, LSL #1 ; r0 = r1 *2
◼ POST: r0 = 0x00000008, r1 = 0x80000004
Arithmetic Instructions
Syntax: <instruction>{<cond>}{S} Rd, Rn, N
◼ N: a register or immediate value
ADD : add
◼ ADD r0, r1, r2; r0 = r1 + r2
ADC : add with carry
◼ ADC r0, r1, r2; r0 = r1 + r2 + C
SUB : subtract
◼ SUB r0, r1, r2; r0 = r1 - r2
SBC : subtract with carry
◼ SBC r0, r1, r2; r0 = r1 - r2 + C - 1
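ADC and SBC exist so that arithmetic wider than 32 bits can be chained through the C flag. The Python sketch below (hypothetical helper names; the register allocation in the comments is only an assumption) adds and subtracts 64-bit values held as (low word, high word) pairs. On ARM, a subtraction sets C when no borrow occurs, which is why SBC computes r1 - r2 + C - 1.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """ADDS: 32-bit sum and the carry out of bit 31."""
    total = a + b
    return total & MASK32, int(total > MASK32)

def adc(a, b, c):
    """ADC: a + b + C."""
    return (a + b + c) & MASK32

def subs(a, b):
    """SUBS: 32-bit difference and C, where C = 1 means no borrow (a >= b unsigned)."""
    return (a - b) & MASK32, int(a >= b)

def sbc(a, b, c):
    """SBC: a - b + C - 1, so one extra is subtracted only if the low word borrowed."""
    return (a - b + c - 1) & MASK32

def add64(a_lo, a_hi, b_lo, b_hi):
    lo, c = adds(a_lo, b_lo)        # ADDS r0, r2, r4  (hypothetical registers)
    return lo, adc(a_hi, b_hi, c)   # ADC  r1, r3, r5

def sub64(a_lo, a_hi, b_lo, b_hi):
    lo, c = subs(a_lo, b_lo)        # SUBS r0, r2, r4
    return lo, sbc(a_hi, b_hi, c)   # SBC  r1, r3, r5

# 0x00000000FFFFFFFF + 1 carries into the high word
print([hex(w) for w in add64(0xFFFFFFFF, 0x0, 0x1, 0x0)])  # ['0x0', '0x1']
```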
Arithmetic Instructions (Cont.)
RSB : reverse subtract
◼ RSB r0, r1, r2; r0 = r2 - r1
RSC : reverse subtract with carry
◼ RSC r0, r1, r2; r0 = r2 - r1 + C - 1
MUL : multiply
◼ MUL r0, r1, r2; r0 = r1 x r2
MLA : multiply and accumulate
◼ MLA r0, r1, r2, r3; r0 = r1 x r2 + r3
Logical Operations
Syntax: <instruction>{<cond>}{S} Rd, Rn, N
◼ N: a register or immediate value
AND : Bit-wise and
ORR : Bit-wise or
EOR : Bit-wise exclusive-or
BIC : bit clear
◼ BIC r0, r1, r2; r0 = r1 AND NOT(r2)
Logical Operations (Cont)
Example 3:
◼ PRE: r1 = 0b1111, r2 = 0b0101
◼ BIC r0, r1, r2 ; r0 = r1 AND (NOT(r2))
◼ POST: r0=0b1010
Comparison Instructions
Compare or test a register with a 32-bit value
◼ Do not modify the registers being compared or
tested
◼ But only set the values of the NZCV bits of the
CPSR register
Comparison instructions do not need the S suffix to
update the flags in the CPSR register
Comparison Instructions (Cont.)
Syntax: <instruction>{<cond>} Rn, N
◼ N: a register or immediate value
CMP : compare
◼ CMP r0, r1; compute (r0 - r1) and set NZCV
CMN : negated compare
◼ CMN r0, r1; compute (r0 + r1) and set NZCV
TST : bit-wise AND test
◼ TST r0, r1; compute (r0 AND r1) and set NZCV
TEQ : bit-wise exclusive-or test
◼ TEQ r0, r1; compute (r0 EOR r1) and set NZCV
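The flag rules behind these four instructions can be written out as a small Python model. It is a sketch under stated assumptions: operands are 32-bit unsigned ints, flags are returned as an (N, Z, C, V) tuple, the function names are hypothetical, and for TST and TEQ the second operand is assumed not to be shifted, so C and V are simply passed through unchanged. On ARM a subtraction sets C when no borrow occurs, so comparing two equal values sets both Z and C.

```python
MASK32 = 0xFFFFFFFF

def _nz(result):
    """N is bit 31 of the result; Z is set when the result is zero."""
    return (result >> 31) & 1, int(result == 0)

def cmp_flags(a, b):
    """CMP a, b: flags from a - b; the result itself is discarded."""
    r = (a - b) & MASK32
    c = int(a >= b)                          # C = 1 means no borrow
    v = (((a ^ b) & (a ^ r)) >> 31) & 1      # signed overflow
    return (*_nz(r), c, v)

def cmn_flags(a, b):
    """CMN a, b: flags from a + b; the result itself is discarded."""
    total = a + b
    r = total & MASK32
    c = int(total > MASK32)                  # carry out of bit 31
    v = ((~(a ^ b) & (a ^ r)) >> 31) & 1     # signed overflow
    return (*_nz(r), c, v)

def tst_flags(a, b, c, v):
    """TST a, b: N and Z from a AND b; C and V passed through (no shift assumed)."""
    return (*_nz(a & b), c, v)

def teq_flags(a, b, c, v):
    """TEQ a, b: N and Z from a EOR b; C and V passed through (no shift assumed)."""
    return (*_nz(a ^ b), c, v)

# CMP r0, r9 with r0 = r9 = 4
print(cmp_flags(4, 4))  # (0, 1, 1, 0): Z and C set
```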
Comparison Instructions (Cont.)
Example 4
◼ PRE: CPSR = nzcvqiFt_USER, r0 = 4, r9 = 4
◼ CMP r0, r9
◼ POST: CPSR = nZCvqiFt_USER (Z is set because r0 - r9 = 0; C is set because the subtraction needs no borrow)
Multiply Instruction
Syntax:
◼ MLA{<cond>}{S} Rd, Rm, Rs, Rn
◼ MUL{<cond>}{S} Rd, Rm, Rs
MUL : multiply
◼ MUL r0, r1, r2; r0 = r1*r2
MLA : multiply and accumulate
◼ MLA r0, r1, r2, r3; r0 = (r1*r2) + r3
Multiply Instruction (Cont.)
Syntax: <instruction>{<cond>}{S} RdLo, RdHi, Rm, Rs
◼ Multiply into a pair of registers representing a 64-bit value
UMULL : unsigned multiply long
◼ UMULL r0, r1, r2, r3; [r1,r0] = r2*r3
UMLAL : unsigned multiply accumulate long
◼ UMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
SMULL: signed multiply long
◼ SMULL r0, r1, r2, r3; [r1,r0] = r2*r3
SMLAL : signed multiply accumulate long
◼ SMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
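The long-multiply forms can be modelled by splitting a 64-bit product into its RdLo and RdHi words. The Python sketch below (hypothetical function names) assumes 32-bit register values held as unsigned ints; the signed variants reinterpret them as two's-complement numbers first, and all results wrap modulo 2^64 as a register pair would.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Reinterpret a 32-bit register value as two's complement."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def split64(value):
    """Wrap to 64 bits and return (RdLo, RdHi)."""
    value &= MASK64
    return value & MASK32, value >> 32

def umull(rm, rs):
    return split64(rm * rs)                                   # [RdHi,RdLo] = Rm*Rs

def smull(rm, rs):
    return split64(to_signed(rm) * to_signed(rs))

def umlal(lo, hi, rm, rs):
    return split64(((hi << 32) | lo) + rm * rs)               # [RdHi,RdLo] += Rm*Rs

def smlal(lo, hi, rm, rs):
    # Modulo 2^64 the accumulator bits are the same whether read as signed or unsigned
    return split64(((hi << 32) | lo) + to_signed(rm) * to_signed(rs))

# UMULL r0, r1, r2, r3 with r2 = r3 = 0xFFFFFFFF
lo, hi = umull(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(hi), hex(lo))  # 0xfffffffe 0x1
```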
Branch Instructions
Branch instruction
◼ Change the flow of execution
◼ Used to call a routine
Allow applications to
◼ Have subroutines
◼ Implement if-then-else structure
◼ Implement loop structure
Branch Instructions (Cont.)
Syntax
◼ B{<cond>} label
◼ BL{<cond>} label
B : branch
◼ B label; pc (program counter) = label
◼ Used to change execution flow
BL : branch and link
◼ BL label; pc = label, lr = address of the instruction
following the BL
◼ Similar to the B instruction but can be used for subroutine
calls
Overwrites the link register (lr) with a return address
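The slides do not show how `label` is encoded. As background from the ARM architecture: B and BL carry a signed 24-bit word offset, and because the PC reads 8 bytes ahead of the executing instruction, the target is the instruction's address + 8 + (offset << 2), giving a range of about +/-32 MB. The Python sketch below decodes such a word; `decode_branch` and `sign_extend` are hypothetical helper names, and words starting 0xEA/0xEB are the always-executed (AL) B and BL encodings.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def decode_branch(instr, address):
    """Decode an ARM B/BL word stored at `address`; returns (target, lr or None)."""
    if (instr >> 25) & 0b111 != 0b101:
        raise ValueError("not a B/BL instruction")
    offset = sign_extend(instr & 0x00FFFFFF, 24) << 2   # word offset -> byte offset
    target = (address + 8 + offset) & 0xFFFFFFFF        # PC reads 8 bytes ahead
    link = address + 4 if instr & (1 << 24) else None   # L bit set -> BL writes lr
    return target, link

# BL with a zero offset at 0x8000 branches to 0x8008 and returns to 0x8004
target, lr = decode_branch(0xEB000000, 0x8000)
print(hex(target), hex(lr))  # 0x8008 0x8004
```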
Branch Instructions (Cont.)
Example 5
B forward
ADD r1, r2, #4
ADD r0, r6, #2
ADD r3, r7, #4
forward
SUB r1, r2, #4
backward
SUB r1, r2, #4
B backward
Branch Instructions (Cont.)
Example 6:
BL subroutine
CMP r1, #5
MOVEQ r1, #0
…
subroutine
<subroutine code>
MOV pc, lr ; return by moving pc = lr
Load-Store Instructions
Transfer data between memory and processor
registers
Three types
◼ Single-register transfer
◼ Multiple-register transfer
◼ Swap
Single-Register Transfer
Moving a single data item in and out of a
register
Data item can be
◼ A word (32 bits)
◼ A halfword (16 bits)
◼ A byte (8 bits)
Single-Register Transfer (Cont.)
Syntax
◼ <LDR|STR>{<cond>}{B} Rd, addressing1
◼ LDR{<cond>}SB|H|SH Rd, addressing2
◼ STR{<cond>}H Rd, addressing2
LDR : load word into a register from memory
LDRB : load byte
LDRSB : load signed byte
LDRH : load half-word
LDRSH : load signed halfword
STR : store word from a register to memory
STRB : store byte
STRH : store half-word
Single-Register Transfer (Cont.)
Example 7
LDR r0, [r1] ;= LDR r0, [r1, #0]
;r0 = mem32[r1]
STR r0, [r1] ;= STR r0, [r1, #0]
;mem32[r1]= r0
◼ Register r1 is called the base address register
Single-Register Load-Store Addressing Mode
Index method, also called Base-Plus-Offset
Addressing
◼ Base register
r0 – r15
◼ Offset, add or subtract an unsigned number
Immediate
Register (not PC)
Scaled register
Single-Register Load-Store Addressing
Mode (Cont.)
Preindex:
◼ data: mem[base+offset]
◼ Base address register: not updated
◼ Ex: LDR r0,[r1,#4] ; r0:=mem32[r1+4]
Postindex:
◼ data: mem[base]
◼ Base address register: base + offset
◼ Ex: LDR r0,[r1],#4 ; r0:=mem32[r1], then r1:=r1+4
Preindex with writeback (also called auto-indexing)
◼ Data: mem[base+offset]
◼ Base address register: base + offset
◼ Ex: LDR r0, [r1,#4]! ; r0:=mem32[r1+4], then r1:=r1+4
Single-Register Load-Store Addressing
Mode (Cont.)
Example 8
◼ r0 = 0x00000000, r1 = 0x00009000,
mem32[0x00009000] = 0x01010101,
mem32[0x00009004] = 0x02020202
◼ Preindexing: LDR r0, [r1, #4]
r0 = 0x02020202, r1=0x00009000
◼ Postindexing: LDR r0, [r1], #4
r0 = 0x01010101, r1=0x00009004
◼ Preindexing with writeback: LDR r0, [r1, #4]!
r0 = 0x02020202, r1=0x00009004
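The three index methods differ only in which address is loaded and what is written back to the base register. The Python sketch below (a hypothetical `ldr_imm` helper, with memory modelled as a dict from byte address to 32-bit word) reproduces the three results of Example 8.

```python
def ldr_imm(mem, base, offset, mode):
    """Model LDR r0 with base r1 and an immediate offset; returns (r0, new r1).

    mode "pre":       LDR r0, [r1, #offset]   -> r1 unchanged
    mode "post":      LDR r0, [r1], #offset   -> load from r1, then r1 += offset
    mode "writeback": LDR r0, [r1, #offset]!  -> load from r1 + offset, r1 += offset
    """
    if mode == "pre":
        return mem[base + offset], base
    if mode == "post":
        return mem[base], base + offset
    if mode == "writeback":
        return mem[base + offset], base + offset
    raise ValueError(f"unknown mode: {mode!r}")

# Example 8 memory contents
mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
for mode in ("pre", "post", "writeback"):
    r0, r1 = ldr_imm(mem, 0x00009000, 4, mode)
    print(f"{mode:<9}  r0 = {r0:#010x}  r1 = {r1:#010x}")
```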
Single-Register Load-Store Addressing
Mode (Cont.)
Addressing mode and index method                Addressing syntax
Preindex with immediate offset                  [Rn, #+/-offset_12]
Preindex with register offset                   [Rn, +/-Rm]
Preindex with scaled register offset            [Rn, +/-Rm, shift #shift_imm]
Preindex writeback with immediate offset        [Rn, #+/-offset_12]!
Preindex writeback with register offset         [Rn, +/-Rm]!
Preindex writeback with scaled register offset  [Rn, +/-Rm, shift #shift_imm]!
Immediate postindexed                           [Rn], #+/-offset_12
Register postindexed                            [Rn], +/-Rm
Scaled register postindexed                     [Rn], +/-Rm, shift #shift_imm
Examples of LDR Using Different Addressing Modes
Index method             Instruction                    r0 =                      r1 +=
Preindex with writeback  LDR r0, [r1, #0x4]!            mem32[r1+0x4]             0x4
                         LDR r0, [r1, r2]!              mem32[r1+r2]              r2
                         LDR r0, [r1, r2, LSR #0x4]!    mem32[r1+(r2 LSR 0x4)]    (r2 LSR 0x4)
Preindex                 LDR r0, [r1, #0x4]             mem32[r1+0x4]             not updated
                         LDR r0, [r1, r2]               mem32[r1+r2]              not updated
                         LDR r0, [r1, -r2, LSR #0x4]    mem32[r1-(r2 LSR 0x4)]    not updated
Postindex                LDR r0, [r1], #0x4             mem32[r1]                 0x4
                         LDR r0, [r1], r2               mem32[r1]                 r2
                         LDR r0, [r1], r2, LSR #0x4     mem32[r1]                 (r2 LSR 0x4)
Multiple-Register Transfer
Transfer multiple registers between memory
and the processor in a single instruction
More efficient than single-register transfer
◼ Moving blocks of data around memory
◼ Saving and restoring context and stack
Multiple-Register Transfer (Cont.)
Load-store multiple instructions can increase interrupt
latency
◼ An interrupt can only be taken after the current instruction has
completed
◼ Each load multiple instruction takes 2 + N*t cycles
N: the number of registers to load
t: the number of cycles required for sequential access to memory
◼ Compilers provide a switch to control the maximum
number of registers transferred
This limits the maximum interrupt latency
Multiple-Register Transfer (Cont.)
Syntax:
◼ <LDM|STM>{<cond>} <mode> Rn{!}, <registers>{^}
◼ Address mode: See the next page
◼ ^: optional
Cannot be used in User mode or System mode
If op is LDM and reglist contains the pc (r15)
◼ SPSR is also copied into the CPSR.
Otherwise, data is transferred into or out of the User mode
registers instead of the current mode registers.
Multiple-Register Transfer (Cont.)
Example 9
◼ PRE:
mem32[0x80018] = 0x03,
mem32[0x80014] = 0x02,
mem32[0x80010] = 0x01,
r0 = 0x00080010,
r1 = r2 = r3= 0x00000000
◼ LDMIA r0!, {r1-r3}, or LDMIA r0!, {r1, r2, r3}
Registers can be listed explicitly or as a range using the “-” character
Pre-Condition for LDMIA Instruction
Memory Address   Data
0x80020          0x00000005
0x8001c          0x00000004
0x80018          0x00000003    r3 = 0x00000000
0x80014          0x00000002    r2 = 0x00000000
0x80010          0x00000001    r1 = 0x00000000    <- r0 = 0x80010
0x8000c          0x00000000
Figure 1
Post-Condition for LDMIA Instruction
Memory Address   Data
0x80020          0x00000005
0x8001c          0x00000004                       <- r0 = 0x8001c
0x80018          0x00000003    r3 = 0x00000003
0x80014          0x00000002    r2 = 0x00000002
0x80010          0x00000001    r1 = 0x00000001
0x8000c          0x00000000
Figure 2
Multiple-Register Transfer (Cont.)
Example 9 (Cont.)
◼ POST:
r0 = 0x0008001c,
r1 = 0x00000001,
r2 = 0x00000002,
r3 = 0x00000003
Multiple-Register Transfer (Cont.)
Example 10
◼ PRE: as shown in Fig. 1
◼ LDMIB r0!, {r1-r3}
◼ POST:
r0 = 0x0008001c
r1 = 0x00000002
r2 = 0x00000003
r3 = 0x00000004
Post-Condition for LDMIB Instruction
Memory Address   Data
0x80020          0x00000005
0x8001c          0x00000004    r3 = 0x00000004    <- r0 = 0x8001c
0x80018          0x00000003    r2 = 0x00000003
0x80014          0x00000002    r1 = 0x00000002
0x80010          0x00000001
0x8000c          0x00000000
Figure 3
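The four LDM/STM addressing modes can be summarised by the addresses they touch and the value written back to Rn. The Python sketch below (hypothetical names; registers identified by number, so 1 means r1) follows the architectural rule that the lowest-numbered register is always transferred to or from the lowest memory address, whatever the mode, which is the layout Figures 2 and 3 show.

```python
def block_addresses(base, count, mode):
    """Word addresses used by LDM/STM<mode> Rn!, {count registers} and the written-back Rn."""
    size = 4 * count
    if mode == "IA":
        start, new_base = base, base + size
    elif mode == "IB":
        start, new_base = base + 4, base + size
    elif mode == "DA":
        start, new_base = base - size + 4, base - size
    elif mode == "DB":
        start, new_base = base - size, base - size
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [start + 4 * i for i in range(count)], new_base

def ldm(mem, base, regs, mode):
    """LDM<mode> Rn!, {regs}: the lowest-numbered register gets the lowest address."""
    addrs, new_base = block_addresses(base, len(regs), mode)
    return dict(zip(sorted(regs), (mem[a] for a in addrs))), new_base

# Memory from Figure 1
fig1 = {0x8000c: 0, 0x80010: 1, 0x80014: 2, 0x80018: 3, 0x8001c: 4, 0x80020: 5}
print(ldm(fig1, 0x80010, [1, 2, 3], "IA"))  # Example 9
print(ldm(fig1, 0x80010, [1, 2, 3], "IB"))  # Example 10
```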
Multiple-Register Transfer (Cont.)
Load-store multiple pairs when base update is used (!)
◼ Useful for saving a group of registers and restoring them later
Store multiple   Load multiple
STMIA            LDMDB
STMIB            LDMDA
STMDA            LDMIB
STMDB            LDMIA
Multiple-Register Transfer (Cont.)
Example 11
◼ PRE:
r0 = 0x00009000
r1 = 0x00000009,
r2 = 0x00000008
r3 = 0x00000007
◼ STMIB r0!, {r1-r3}
MOV r1, #1
MOV r2, #2
MOV r3, #3
Multiple-Register Transfer (Cont.)
Example 11 (Cont.)
◼ PRE (2):
r0 = 0x0000900c
r1 = 0x00000001,
r2 = 0x00000002
r3 = 0x00000003
◼ LDMDA r0!, {r1-r3}
◼ POST:
r0 = 0x00009000
r1 = 0x00000009,
r2 = 0x00000008
r3 = 0x00000007
Multiple-Register Transfer (Cont.)
Example 11 (Cont.)
◼ The STMIB stores the values 9, 8, 7 (from r1, r2, r3) to memory at 0x9004-0x900c
◼ The MOV instructions then corrupt registers r1 to r3
◼ Finally, the LDMDA
Reloads the original values, and
Restores the base pointer r0
Multiple-Register Transfer (Cont.)
Example 12: the use of the load-store multiple
instructions with a block memory copy
;r9 points to start of source data
;r10 points to start of destination data
;r11 points to end of the source
loop
LDMIA r9!, {r0-r7} ;load 32 bytes from source and update r9
STMIA r10!, {r0-r7} ;store 32 bytes to destination and update r10
CMP r9, r11 ;have we reached the end?
BNE loop
Multiple-Register Transfer (Cont.)
[Figure: block memory copy. The source block lies in high memory, from r9 (start) up to r11 (end); the destination block starts at r10 in low memory. Each loop iteration copies 32 bytes in two instructions.]
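For readers more comfortable with a high-level language, the copy loop can be sketched in Python as below. It is a model, not a replacement for the assembly: memory is a dict from byte address to word, `block_copy` is a hypothetical name, and like the original loop it assumes the source length r11 - r9 is a positive multiple of 32 bytes (the sketch raises an error otherwise, where the assembly would run past r11).

```python
def block_copy(mem, r9, r10, r11):
    """Model of the LDMIA/STMIA loop: copies 32 bytes (8 words) per iteration."""
    if r11 <= r9 or (r11 - r9) % 32:
        raise ValueError("source length must be a positive multiple of 32 bytes")
    while True:
        words = [mem[r9 + 4 * i] for i in range(8)]   # LDMIA r9!, {r0-r7}
        r9 += 32
        for i, word in enumerate(words):               # STMIA r10!, {r0-r7}
            mem[r10 + 4 * i] = word
        r10 += 32
        if r9 == r11:                                  # CMP r9, r11 ; BNE loop
            return r9, r10

# Copy 64 bytes (two iterations) from 0x1000 to 0x2000
mem = {0x1000 + 4 * i: i for i in range(16)}
print([hex(x) for x in block_copy(mem, 0x1000, 0x2000, 0x1040)])  # ['0x1040', '0x2040']
```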
Stack Operations
ARM architecture uses the load-store multiple
instruction to carry out stack operations
◼ PUSH: use a store multiple instruction
◼ POP: use a load multiple instruction
Stack
◼ Ascending (A): stack grows towards higher
memory addresses
◼ Descending (D): stack grows towards lower
memory addresses
Stack Operations (Cont.)
Stack
◼ Full stack (F): stack pointer sp points to the last
valid item pushed onto the stack
◼ Empty stack (E): sp points after the last item on
the stack
The free slot where the next data item will be placed
There are a number of aliases available to
support stack operations
◼ See next page
Stack Operations (Cont.)
ARM supports all four forms of stack
◼ Full ascending (FA): grows up; base register points to
the highest address containing a valid item
◼ Empty ascending (EA): grows up; base register points to
the first empty location
◼ Full descending (FD): grows down; base register points
to the lowest address containing a valid item
◼ Empty descending (ED): grows down; base register
points to the first empty location below the stack
Addressing Methods for Stack Operations
Addressing mode   Description        Pop (LDM)        Push (STM)
FA                Full ascending     LDMFA = LDMDA    STMFA = STMIB
FD                Full descending    LDMFD = LDMIA    STMFD = STMDB
EA                Empty ascending    LDMEA = LDMDB    STMEA = STMIA
ED                Empty descending   LDMED = LDMIB    STMED = STMDA
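Each stack alias in the addressing-methods table is just another name for one block-transfer mode. The Python sketch below encodes that mapping (hypothetical names; memory as a dict from byte address to word) and reproduces the pushes in Examples 13 and 14; a push followed by a pop with the same stack type restores both the values and sp.

```python
def block_addresses(base, count, mode):
    """Word addresses used by an LDM/STM in `mode`, plus the written-back base."""
    size = 4 * count
    starts = {"IA": base, "IB": base + 4, "DA": base - size + 4, "DB": base - size}
    bases = {"IA": base + size, "IB": base + size, "DA": base - size, "DB": base - size}
    return [starts[mode] + 4 * i for i in range(count)], bases[mode]

# Stack alias -> underlying block-transfer mode
PUSH_MODE = {"FA": "IB", "FD": "DB", "EA": "IA", "ED": "DA"}   # STM<alias>
POP_MODE = {"FA": "DA", "FD": "IA", "EA": "DB", "ED": "IB"}    # LDM<alias>

def push(mem, sp, values, stack):
    """STM<stack> sp!, {...}; values listed lowest-numbered register first."""
    addrs, new_sp = block_addresses(sp, len(values), PUSH_MODE[stack])
    for addr, value in zip(addrs, values):
        mem[addr] = value
    return new_sp

def pop(mem, sp, count, stack):
    """LDM<stack> sp!, {...}; returns (values lowest register first, new sp)."""
    addrs, new_sp = block_addresses(sp, count, POP_MODE[stack])
    return [mem[a] for a in addrs], new_sp

# Example 13 (STMFD) and Example 14 (STMED), with r1 = 2 and r4 = 3
for stack, sp in (("FD", 0x80014), ("ED", 0x80010)):
    mem = {}
    new_sp = push(mem, sp, [2, 3], stack)
    print(stack, hex(new_sp), {hex(a): v for a, v in sorted(mem.items())})
```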
Stack Operations (Cont.)
Example 13
◼ PRE:
r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080014
◼ STMFD sp!, {r1, r4}
◼ POST:
r1 = 0x00000002
r4 = 0x00000003
sp = 0x0008000c
Stack Operations (Cont.)
Example 13 (Cont.)
◼ STMFD – full stack push operation
PRE                              POST
Address    Data                  Address    Data
0x80018    0x00000001            0x80018    0x00000001
0x80014    0x00000002  <- sp     0x80014    0x00000002
0x80010    Empty                 0x80010    0x00000003
0x8000c    Empty                 0x8000c    0x00000002  <- sp
Stack Operations (Cont.)
Example 14
◼ PRE:
r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080010
◼ STMED sp!, {r1, r4}
◼ POST:
r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080008
Stack Operations (Cont.)
Example 14 (Cont.)
◼ STMED – empty stack push operation
PRE                              POST
Address    Data                  Address    Data
0x80018    0x00000001            0x80018    0x00000001
0x80014    0x00000002            0x80014    0x00000002
0x80010    Empty       <- sp     0x80010    0x00000003
0x8000c    Empty                 0x8000c    0x00000002
0x80008    Empty                 0x80008    Empty       <- sp
SWAP Instruction
A special case of a load-store instruction
◼ Swap the contents of memory with the contents
of a register
◼ An atomic operation
Cannot be interrupted by any other instruction or
any other bus access
The system “holds the bus” until the transaction is
complete
Useful when implementing semaphores and mutual
exclusion in an operating system
SWAP Instruction (Cont.)
Syntax: SWP{B}{<cond>} Rd, Rm, [Rn]
◼ tmp = mem32[Rn]
◼ mem32[Rn] = Rm
◼ Rd = tmp
SWP: swap a word between memory and a
register
SWPB: swap a byte between memory and a
register
SWAP Instruction (Cont.)
Example 15
◼ PRE:
mem32[0x9000] = 0x12345678
r0 = 0x00000000
r1 = 0x11112222
r2 = 0x00009000
◼ SWP r0, r1, [r2]
◼ POST:
mem32[0x9000] = 0x11112222
r0 = 0x12345678
r1 = 0x11112222
r2 = 0x00009000
SWAP Instruction (Cont.)
Example 15 (Cont.)
spin
LDR r1, =semaphore
MOV r2, #1
SWP r3, r2, [r1] ;hold the bus until complete
CMP r3, #1
BEQ spin
The word at address semaphore contains either the
value 1 or 0
When the semaphore value == 1, loop until the semaphore becomes
0 (updated by the holding process)
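Python has no SWP instruction, so the sketch below swaps in a `threading.Lock` to stand for the held bus: the read and the write inside `swp` happen as one indivisible step, which is the property the semaphore loop relies on. The `Bus` class, `spin_acquire`, `release` and the semaphore address 0x8000 are all hypothetical; this illustrates the technique and is not how an ARM system implements it.

```python
import threading

class Bus:
    """Toy memory in which SWP is atomic; the lock stands in for the held bus."""

    def __init__(self):
        self._lock = threading.Lock()
        self.mem = {}

    def swp(self, addr, value):
        """SWP Rd, Rm, [Rn]: tmp = mem32[Rn]; mem32[Rn] = Rm; Rd = tmp, as one step."""
        with self._lock:
            old = self.mem.get(addr, 0)
            self.mem[addr] = value
            return old

SEMAPHORE = 0x8000  # hypothetical address of the semaphore word

def spin_acquire(bus):
    # spin: SWP r3, r2, [r1] with r2 = 1 ; CMP r3, #1 ; BEQ spin
    while bus.swp(SEMAPHORE, 1) == 1:
        pass

def release(bus):
    # The holding process writes 0 back when it leaves the critical section
    bus.swp(SEMAPHORE, 0)

# Example 15: swap 0x11112222 into mem32[0x9000], which held 0x12345678
bus = Bus()
bus.mem[0x9000] = 0x12345678
print(hex(bus.swp(0x9000, 0x11112222)), hex(bus.mem[0x9000]))  # 0x12345678 0x11112222
```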
Software Interrupt Instruction
SWI: software interrupt instruction
◼ Cause a software interrupt exception
◼ Provide a mechanism for applications to call
operating system routines
◼ Each SWI instruction has an associated SWI
number
Used to represent a particular function call or routines
Software Interrupt Instruction (Cont.)
Syntax: SWI{<cond>} SWI_number
◼ lr_svc = address of instruction following the SWI
◼ spsr_svc = cpsr
◼ pc = vector table base + 0x8 ; jump to the SWI
handler
◼ cpsr mode = SVC
◼ cpsr I = 1 (mask IRQ interrupt)
Software Interrupt Instruction (Cont.)
Example 16
◼ PRE:
cpsr = nzcVqift_USER
pc = 0x00008000
lr = r14 = 0x003fffff
◼ 0x00008000 SWI 0x123456
◼ POST:
cpsr = nzcVqIft_SVC
spsr = nzcVqift_USER
pc = 0x00000008
lr = 0x00008004
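The SWI entry sequence above can be written as a state transformation. The Python sketch below (hypothetical names) assumes the vector table is at address 0, as in Example 16 where pc becomes 0x00000008, and uses the standard CPSR layout: I in bit 7, T in bit 5, mode in bits [4:0] (User = 0b10000, SVC = 0b10011). `swi_number` assumes a handler fetches the SWI instruction word (for example from lr_svc - 4) and keeps its low 24 bits; 0xEF123456 is the encoding of SWI 0x123456 with the AL condition.

```python
MODE_MASK = 0x1F
MODE_SVC = 0b10011
I_BIT = 1 << 7   # IRQ mask
T_BIT = 1 << 5   # Thumb state

def take_swi(cpsr, pc):
    """Register state after an SWI at address `pc` is taken."""
    new_cpsr = (cpsr & ~MODE_MASK) | MODE_SVC   # cpsr mode = SVC
    new_cpsr = (new_cpsr | I_BIT) & ~T_BIT      # mask IRQ, run the handler in ARM state
    return {
        "cpsr": new_cpsr,
        "spsr_svc": cpsr,        # spsr_svc = old cpsr
        "lr_svc": pc + 4,        # address of the instruction following the SWI
        "pc": 0x00000008,        # SWI vector, assuming the table starts at 0
    }

def swi_number(instruction_word):
    """The SWI number is the low 24 bits of the ARM SWI instruction word."""
    return instruction_word & 0x00FFFFFF

# Example 16: cpsr = nzcVqift_USER (V set, User mode), SWI 0x123456 at 0x8000
state = take_swi(0x10000010, 0x00008000)
print({k: hex(v) for k, v in state.items()})
print(hex(swi_number(0xEF123456)))  # 0x123456
```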
Program Status Register Instructions
MRS
◼ Transfer the contents of either the cpsr or spsr
into a register
MSR
◼ Transfer the contents of a register into the cpsr or
spsr
Program Status Register Instructions
(Cont.)
Syntax
◼ MRS{<cond>} Rd, <cpsr|spsr>
◼ MSR{<cond>} <cpsr|spsr>_<fields>, Rm
◼ MSR{<cond>} <cpsr|spsr>_<fields>, #immediate
Field: any combination of
◼ Flags: [31:24]
◼ Status: [23:16]
◼ eXtension: [15:8]
◼ Control: [7:0]
PSR Registers
Program Status Register Instructions
(Cont.)
Note: You cannot access the SPSR in User or
System Mode
◼ The assembler cannot warn you because it does not
know which mode the code will execute in
Program Status Register Instructions
(Cont.)
Example 17
◼ PRE:
cpsr = nzcvqIFt_SVC
◼ MRS r1, cpsr
◼ BIC r1, r1, #0x80 ;0b10000000, clear bit 7
◼ MSR cpsr_c, r1 ;enable IRQ interrupts
◼ POST:
cpsr = nzcvqiFt_SVC
◼ Note that this example must execute in a privileged mode (here SVC)
In User mode, you can read all cpsr bits but can only update
the condition flag field f, i.e., cpsr[31:24]
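MSR writes only the byte fields named after the underscore. The Python sketch below (hypothetical names) models that masking and replays Example 17; it does not model the privilege check, so it will happily write the control field even though User-mode code cannot.

```python
FIELD_MASKS = {
    "f": 0xFF000000,  # flags     [31:24]
    "s": 0x00FF0000,  # status    [23:16]
    "x": 0x0000FF00,  # extension [15:8]
    "c": 0x000000FF,  # control   [7:0]
}

def msr(psr, fields, value):
    """MSR psr_<fields>, value: only the bytes named in `fields` are written."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask) | (value & mask)

# Example 17: cpsr = nzcvqIFt_SVC -> I (bit 7), F (bit 6), SVC mode 0b10011
cpsr = 0x000000D3
r1 = cpsr                   # MRS r1, cpsr
r1 &= ~0x80                 # BIC r1, r1, #0x80
cpsr = msr(cpsr, "c", r1)   # MSR cpsr_c, r1
print(hex(cpsr))            # 0x53 -> nzcvqiFt_SVC
```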
Conditional Execution
Almost all ARM instructions can include an
optional condition code
◼ The instruction is only executed if the condition code
flags in the CPSR meet the specified condition
◼ The default is AL, or always execute
Conditional execution depends on two
components
◼ The condition field: located in the instruction
◼ The condition flags: located in the cpsr
Conditional Execution (Cont.)
Example 18
ADDEQ r0, r1, r2
; r0 = r1 + r2 if zero flag is set
Condition Codes
Conditional Execution (Cont.)
Thus, before an instruction can execute conditionally
◼ There must be an earlier instruction that updates the
condition code flags according to its result
◼ Unless told to, instructions do not update the
flags
To make an instruction update the flags
◼ Include the S suffix
◼ Example: ADDS r0, r1,r2
Conditional Execution (Cont.)
However, some instructions always update the flags
◼ Do not require the S suffix
◼ CMP, CMN, TST, TEQ
Flags are preserved until updated
Thus, you can execute an instruction conditionally,
based upon the flags set in another instruction, either:
◼ Immediately after the instruction which updated the flags
◼ After any number of intervening instructions that have not
updated the flags.
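The condition suffixes test the NZCV flags with fixed formulas. The Python table below is a sketch of the standard ARM condition codes (hypothetical names; flags passed as an (N, Z, C, V) tuple of 0/1 values); CS and CC are also written HS and LO.

```python
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,               # also HS (unsigned >=)
    "CC": lambda n, z, c, v: not c,           # also LO (unsigned <)
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,     # unsigned >
    "LS": lambda n, z, c, v: (not c) or z,    # unsigned <=
    "GE": lambda n, z, c, v: n == v,          # signed >=
    "LT": lambda n, z, c, v: n != v,          # signed <
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
}

def passes(cond, flags):
    """True if an instruction with suffix `cond` would execute under `flags`."""
    n, z, c, v = flags
    return bool(CONDITIONS[cond](n, z, c, v))

# ADDEQ r0, r1, r2 runs only when Z is set
flags = (0, 1, 1, 0)        # nZCv, e.g. after CMP of two equal values
print(passes("EQ", flags))  # True
```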
Conditional Execution (Cont.)
Example 18
◼ Translate the following C code into assembly
language
◼ Assume r1 = a, r2 = b
while ( a!= b )
{
if (a > b) a -= b; else b -= a;
}
Conditional Execution (Cont.)
Example 18: Solution 1
gcd
CMP r1, r2
BEQ complete
BLT lessthan
SUB r1, r1, r2
B gcd
lessthan
SUB r2, r2, r1
B gcd
complete
Conditional Execution (Cont.)
Example 18: Solution 2
gcd
CMP r1, r2
SUBGT r1, r1, r2
SUBLT r2, r2, r1
BNE gcd
Solution 2 needs four instructions instead of seven,
and only one branch instead of four
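To see that Solution 2 matches the C loop, the Python sketch below mirrors it step by step: the comparison from CMP is computed once, both conditional subtractions test that same result (neither has the S suffix, so the flags do not change), and BNE loops back while Z is clear. `gcd_solution2` is a hypothetical name, and like the C loop it assumes both inputs are positive.

```python
def gcd_solution2(a, b):
    """Step-by-step mirror of Solution 2 (r1 = a, r2 = b)."""
    while True:
        # CMP r1, r2: record the comparison once; the next instructions reuse it
        gt, lt, eq = a > b, a < b, a == b
        if gt:
            a -= b          # SUBGT r1, r1, r2
        if lt:
            b -= a          # SUBLT r2, r2, r1
        if eq:              # BNE gcd is not taken once Z is set
            return a

print(gcd_solution2(12, 18))  # 6
```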
References
Andrew N. Sloss, “ARM System Developer’s
Guide: Designing and Optimizing System
Software,” Morgan Kaufmann Publishers,
2004
◼ Chapter 3: Introduction to the ARM Instruction
Set
ARM7TDMI Microprocessor
Thumb Instruction Set
Processor Operating States
ARM state
which executes 32-bit, word-aligned ARM
instructions.
THUMB state
which operates with 16-bit, halfword-aligned
THUMB instructions.
Thumb Instruction Set
•ARM architecture versions v4T and above define a 16-bit
instruction set called the Thumb instruction set. The
functionality of the Thumb instruction set is a subset of the
functionality of the 32-bit ARM instruction set.
•A processor that is executing Thumb instructions is
operating in Thumb state. A processor that is executing ARM
instructions is operating in ARM state.
Thumb Instruction Set
•A processor in ARM state cannot execute Thumb
instructions, and a processor in Thumb state cannot
execute ARM instructions.
•Each instruction set includes instructions to change
processor state.
Note: ARM processors always start executing code in
ARM state.
Thumb Instruction Set
•Thumb does not provide direct access to the CPSR or any
SPSR.
•Thumb execution is flagged by the T bit (bit[5]) in the CPSR.
T == 0: 32-bit instructions are fetched (ARM instructions)
T == 1: 16-bit instructions are fetched (Thumb instructions)
Thumb applications
In a typical embedded system:
use ARM code in 32-bit on-chip memory for small speed-critical routines
use Thumb code in 16-bit off-chip memory for large non-critical control routines
Note:
Switching between ARM and Thumb states of execution uses the BX instruction
Thumb applications
For most instructions generated by the compiler:
Conditional execution is not used
Source and destination registers are identical
Only low registers are used
Constants are of limited size
The inline barrel shifter is not used
DATA TYPES
Byte (8-bit):
placed on any byte boundary.
Half-word (16-bit):
aligned to two-byte boundaries.
Word (32-bit):
aligned to four-byte boundaries.
Features
•Not a complete architecture
•Dynamically decompressed to ARM instructions
•Fully supported by ARM development tools
•Both entry and exit are done using the corresponding BX
instruction
•Increases the maximum clock rate to 40 MHz
•Expanded cache to 8 kB
•Thumb combines a new instruction set with a 16-bit
instruction format and a hardware logic unit.
•Thumb instructions are translated to regular ARM instructions
•Thumb improves ARM instruction density by about 25% to 35%
•16-bit wide memory
Thumb State Philosophy
The Thumb instruction set (16-bit) addresses the issue of code density.
It may be viewed as a compressed form of a subset of the ARM instruction set
Thumb instructions map onto ARM instructions
The Thumb programmer’s model maps onto the ARM programmer’s model
Implementations of Thumb use dynamic decompression in an ARM instruction
pipeline, and the instructions then execute as standard ARM instructions within the
processor
Thumb is not a complete architecture; it is not anticipated that a processor would
execute Thumb instructions without supporting the ARM instruction set.
Therefore the Thumb instruction set needs to support only common application functions.
Exceptions are not handled in Thumb state
Thumb-ARM Decompression
•Translation from 16-bit Thumb instruction to 32-bit
ARM instruction
•Condition bits changed to ‘always’
•Lookup to translate major and minor opcodes
•Zero extending 3-bit register specifiers to give 4-bit
specifiers
•Zero extending immediate values
•Implicit ‘S’(affecting condition codes) should be
explicitly specified.
•Thumb 2-address format must be mapped to ARM
3-address format
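As one concrete case of the decompression steps above, the Python sketch below expands Thumb `MOV Rd, #imm8` (bits [15:11] = 00100) into the equivalent ARM `MOVS Rd, #imm8` word: the condition is forced to AL, the 3-bit register field and the immediate are zero-extended, and the implicit S becomes an explicit S bit. The function name is hypothetical, and this one instruction does not exercise the 2-address to 3-address mapping (for example, Thumb `ADD Rd, #imm8` would become ARM `ADDS Rd, Rd, #imm8`).

```python
def thumb_mov_imm_to_arm(thumb):
    """Expand Thumb MOV Rd, #imm8 into the equivalent 32-bit ARM MOVS Rd, #imm8."""
    if thumb >> 11 != 0b00100:
        raise ValueError("not a Thumb MOV Rd, #imm8 instruction")
    rd = (thumb >> 8) & 0b111        # 3-bit register field, zero-extended to 4 bits
    imm8 = thumb & 0xFF              # immediate zero-extended, rotate field left at 0
    cond_al = 0xE << 28              # condition changed to 'always'
    mov_imm = 0b0011101 << 21        # bits [27:21]: 00, I = 1 (immediate), opcode 1101 (MOV)
    s_bit = 1 << 20                  # Thumb MOV always sets the flags, so S is made explicit
    return cond_al | mov_imm | s_bit | (rd << 12) | imm8

print(hex(thumb_mov_imm_to_arm(0x2342)))  # Thumb MOV r3, #0x42 -> 0xe3b03042
```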
THUMB-ARM Instruction Mapping
❖ So where performance is all-important, a system should use 32-bit memory and run
ARM code
❖ Where power consumption and cost are more important, a 16-bit memory system
and Thumb code may be a better choice
Mode Switching
•Exceptions are always entered in ARM state
•Explicit entry to Thumb state is done using the ARM-state BX
instruction
•Explicit return to ARM state is done using the Thumb-state
BX instruction
Thumb Programmers Model
•Registers r0 to r7 are accessible (Lo)
•Only a few instructions can access r8 to r15 (Hi)
•r13 is used as the stack pointer
•r14 is used as the link register
•r15 is used as the program counter
THUMB Programmer’s Model
THUMB Register Organisation
Thumb general registers and program counter
User/System   FIQ        Supervisor   Abort      IRQ        Undefined
r0            r0         r0           r0         r0         r0
r1            r1         r1           r1         r1         r1
r2            r2         r2           r2         r2         r2
r3            r3         r3           r3         r3         r3
r4            r4         r4           r4         r4         r4
r5            r5         r5           r5         r5         r5
r6            r6         r6           r6         r6         r6
r7            r7         r7           r7         r7         r7
SP            SP_FIQ     SP_SVC       SP_ABT     SP_IRQ     SP_UND
LR            LR_FIQ     LR_SVC       LR_ABT     LR_IRQ     LR_UND
PC            PC         PC           PC         PC         PC
Thumb program status registers
CPSR          CPSR       CPSR         CPSR       CPSR       CPSR
              SPSR_FIQ   SPSR_SVC     SPSR_ABT   SPSR_IRQ   SPSR_UND
ARM-Thumb Similarities
•Load-store architecture
•Support 8-bit byte, 16-bit half-word and 32-bit word
data types with aligned boundaries
•32-bit unsegmented memory.
•However, in order to achieve a 16-bit instruction
length, a number of characteristic features of the ARM
instruction set are not supported in Thumb state
ARM-Thumb differences
•Instructions execute unconditionally, except for conditional branches,
whereas almost all ARM instructions can be executed conditionally
•2-address format for data processing
ARM data processing instructions use the 3-address format
(except the 64-bit multiply instructions, which take four registers)
•Thumb instruction formats are less regular than ARM formats, as
a result of the dense encoding
•There are NO status register access instructions (MRS/MSR) in
Thumb state
•Many addressing modes of ARM are not supported in Thumb state
•No banked registers and privileged modes in Thumb state
ARM-Thumb differences
The biggest register difference involves the SP register
The Thumb state has unique mnemonics (PUSH, POP) that
don’t exist in ARM state
These instructions assume the existence of a stack pointer,
for which R13 is used
They translate into load and store instructions internally
No SWP instructions in Thumb state
No support for coprocessor instructions in Thumb state
Barrel shifter operations are separate instructions
Thumb exception
•When an exception occurs, the processor returns to ARM state.
•On return, the previous state is restored as the SPSR is
transferred back to the CPSR
•Use of the Thumb instruction set can improve code
density, power efficiency and cost, and can enhance
performance, all at once
Thumb Branching
•Short conditional branches
•Medium range unconditional branches
•Long range Subroutine calls
•Branch to change to ARM Mode
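Branch Instruction Formats
B<cond> <label>
bits 15-12: 1 1 0 1 | bits 11-8: Condition | bits 7-0: 8-bit offset
B <label>
bits 15-11: 1 1 1 0 0 | bits 10-0: 11-bit offset
BL <label>
bits 15-13: 1 1 1 | bits 12-11: H | bits 10-0: 11-bit offset
BX Rm
bits 15-7: 0 1 0 0 0 1 1 1 0 | bit 6: H | bits 5-3: Rm | bits 2-0: 0 0 0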
THUMB Branch Instructions
Features
•Different format for each case
•Offset is reduced to 11 bits and 8 bits
•Offset is shifted left by 1 bit (to give half-word
alignment) and sign-extended to 32 bits.
•BL is more subtle: it builds a 22-bit offset from two instructions,
using the link register for temporary storage
•No direct mapping to ARM instructions, as Thumb
requires half-word aligned offsets.
BL Instruction
To allow for a reasonably large offset to the target
subroutine, the assembler automatically translates each BL
into a sequence of two 16-bit Thumb instructions
1. H = 10
LR := PC + (sign-extended offset shifted left 12 places);
2. H = 11
PC := LR + (offset shifted left 1 place);
LR := address of next instruction
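The offset arithmetic for the three Thumb branch forms can be checked with a short Python sketch (hypothetical function names). It assumes that in Thumb state the PC reads as the current instruction's address + 4, and that the second BL half also sets bit 0 of LR to mark the return address as Thumb code; that bit-0 detail follows the ARM7TDMI documentation and is labelled as an assumption in the code.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def thumb_bcond_target(instr, address):
    """B<cond> label: 8-bit offset, shifted left 1 and sign-extended."""
    return address + 4 + (sign_extend(instr & 0xFF, 8) << 1)

def thumb_b_target(instr, address):
    """B label: 11-bit offset, shifted left 1 and sign-extended."""
    return address + 4 + (sign_extend(instr & 0x7FF, 11) << 1)

def thumb_bl(first, second, address):
    """BL pair starting at `address`; returns (branch target, final LR)."""
    if first >> 11 != 0b11110 or second >> 11 != 0b11111:
        raise ValueError("expected a BL pair (H = 10, then H = 11)")
    pc = address + 4                                    # PC seen by the first half
    lr = pc + (sign_extend(first & 0x7FF, 11) << 12)    # 1. H = 10
    target = lr + ((second & 0x7FF) << 1)               # 2. H = 11: PC := LR + (offset << 1)
    next_instr = address + 4                            # the pair occupies 4 bytes
    return target, next_instr | 1                       # assumption: bit 0 set marks Thumb

print(hex(thumb_b_target(0xE7FE, 0x100)))               # 0x100 (branch to itself)
print([hex(x) for x in thumb_bl(0xF002, 0xF800, 0x1000)])  # ['0x3004', '0x1005']
```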
Software Interrupt Instruction
bits 15-8: 1 1 0 1 1 1 1 1 | bits 7-0: 8-bit immediate
•Address of next instruction is saved in r14_svc
•CPSR is saved in SPSR_svc
•Disables IRQ, clears the T bit, enters Supervisor mode
•PC is forced to 0x08
•The 8-bit immediate is zero-extended to fill the 24-bit
field in the ARM instruction.
•Limits Thumb SWIs to the first 256 of the 16 million ARM SWI numbers.
Data Processing
•Fairly complex instruction formats
•No conditional execution
•Separate shift operations provided, no shifting of
second operand
•All data processing instructions set the condition code bits
(no need for ‘S’)
THUMB data processing instructions
Instruction formats
•<op> Rd, Rn, Rm
•<op> Rd, Rn, #<imm3>
•<op> Rd|Rn, Rm|Rs
•<op> Rd, Rn, #<sh5>
•<op> Rd, #<imm8>
Instructions
•MOV Rd, #<imm8>
•MVN Rd, Rm
•CMP Rn, #<imm8>
•CMP Rn, Rm
•CMN Rn, Rm
•TST Rn, Rm
Instructions
•ADD Rd, Rn, #<imm3>
•ADD Rd, #<imm8>
•ADD Rd, Rn, Rm
•ADC Rd, Rm
•SUB Rd, Rn, #<imm3>
•SUB Rd, #<imm8>
•SUB Rd, Rn, Rm
•SBC Rd, Rm
•NEG Rd, Rn
Instructions
•LSL Rd, Rm, #<sh5>
•LSL Rd, Rs
•LSR Rd, Rm, #<sh5>
•LSR Rd, Rs
•ASR Rd, Rm, #<sh5>
•ASR Rd, Rs
•ROR Rd, Rs
Instructions
•AND Rd, Rm
•EOR Rd, Rm
•ORR Rd, Rm
•BIC Rd, Rm
•MUL Rd, Rm
Instruction (using Hi registers)
•ADD Rd, Rm (1 or 2 Hi registers)
•CMP Rn, Rm (1 or 2 Hi registers)
•MOV Rd, Rm (1 or 2 Hi registers)
•ADD Rd, PC, #<imm8>
•ADD Rd, SP, #<imm8>
•ADD SP, SP, #<imm7>
•SUB SP, SP, #<imm7>
Except for CMP, these instructions do not set the condition code bits
THUMB Single register data transfer
Data Transfer Instruction
•LDR|STR Rd, [Rn, #off5]
•LDR|STR Rd, [Rn, Rm]
•LDRB|STRB Rd, [Rn, #off5]
•LDRB|STRB Rd, [Rn, Rm]
•LDRH|STRH Rd, [Rn, #off5]
•LDRH|STRH Rd, [Rn, Rm]
Signed loads:
•LDRSB|LDRSH Rd, [Rn, Rm]
THUMB Multiple register data transfer
Multiple register transfers
•LDMIA|STMIA Rn!, { <reg list> }
•Rn may be any register among R0-R7
•The register list can be any subset of R0-R7, but not the
base register Rn
•Writeback to the base register is always selected.
Stack Mode
•PUSH {<reg list>{, LR}}
•POP {<reg list>{, PC}}
•R13 (sp) is used as the base register
•Uses a full descending stack
•In addition to any subset of R0-R7, LR (lr) may be
included in a PUSH instruction and PC (pc) may be included
in a POP instruction
Properties
•Thumb code requires 70% of the space of ARM code
•Thumb code uses 40% more instructions than ARM code
•With 32-bit memory, ARM code is 40% faster
•With 16-bit memory, Thumb code is 45% faster than ARM
code
•Thumb code uses 30% less external memory power than
ARM code.
Thumb Applications