Chapter 6
ARM Instruction Set
Hsung-Pin Chang
Department of Computer Science
National Chung Hsing University
Outline
o Data Processing Instructions
o Branch Instructions
o Load-store instructions
o Software interrupt instructions
o Program status register instructions
o Conditional Execution
ARM Instruction Set Format
6.1 Data Processing Instructions
o Manipulate data within registers
o Data processing instructions
n Move instructions
n Arithmetic instructions
n Logical instructions
n Comparison instructions
n Multiply instructions
6.1.1 Move Instruction
o Syntax: <instruction> {<cond>} {S} Rd, N
n N: a register or immediate value
o MOV : move
n MOV r0, r1; r0 = r1
n MOV r0, #5; r0 = 5
o MVN : move (negated)
n MVN r0, r1; r0 = NOT(r1)=~ (r1)
Preprocessed by Shifter
o Example 1
n PRE: r5 = 5, r7 = 8;
n MOV r7, r5, LSL #2; r7 = r5 << 2 = r5*4
n POST: r5 = 5, r7 = 20
6.1.2 Preprocessed by Shifter
o LSL: logical shift left
n x << y, the least significant bits are filled with zeroes
o LSR: logical shift right:
n (unsigned) x >> y, the most significant bits are filled with zeroes
o ASR: arithmetic shift right
n (signed) x >> y, copy the sign bit to the most significant bit
o ROR: rotate right
n ((unsigned) x >> y) | (x << (32-y))
o RRX: rotate right extended
n (C flag << 31) | ((unsigned) x >> 1)
n Performs a 33-bit rotate, treating the CPSR's C bit as a 33rd bit
above the sign bit of the word
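The five shift operations above can be modelled on 32-bit values as follows. This is only a sketch in Python, not ARM code; the function names and the MASK32 constant are chosen here for illustration, and shift amounts are assumed to lie in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def lsl(x, y):
    """Logical shift left: vacated low bits are filled with zeroes."""
    return (x << y) & MASK32


def lsr(x, y):
    """Logical shift right: vacated high bits are filled with zeroes."""
    return (x & MASK32) >> y


def asr(x, y):
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    x &= MASK32
    if x & 0x80000000:
        return ((x >> y) | (MASK32 << (32 - y))) & MASK32
    return x >> y


def ror(x, y):
    """Rotate right: bits leaving at bit 0 re-enter at bit 31."""
    x &= MASK32
    y %= 32
    return ((x >> y) | (x << (32 - y))) & MASK32


def rrx(x, c):
    """Rotate right extended by one bit: the C flag enters at bit 31."""
    return ((c & 1) << 31) | ((x & MASK32) >> 1)
```

For instance, lsl(5, 2) models the barrel-shifter operand of MOV r7, r5, LSL #2 and gives 20.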
Preprocessed by Shifter (Cont.)
o Example 2
n PRE: r0 = 0x00000000, r1 = 0x80000004
n MOV r0, r1, LSL #1 ; r0 = r1 *2
n POST r0 = 0x00000008, r1 = 0x80000004
6.1.3 Arithmetic Instructions
o Syntax: <instruction> {<cond>} {S} Rd, Rn, N
n N: a register or immediate value
o ADD : add
n ADD r0, r1, r2; r0 = r1 + r2
o ADC : add with carry
n ADC r0, r1, r2; r0 = r1 + r2 + C
o SUB : subtract
n SUB r0, r1, r2; r0 = r1 - r2
o SBC : subtract with carry
n SBC r0, r1, r2; r0 = r1 - r2 + C - 1
6.1.3 Arithmetic Instructions (Cont.)
o RSB : reverse subtract
n RSB r0, r1, r2; r0 = r2 - r1
o RSC : reverse subtract with carry
n RSC r0, r1, r2; r0 = r2 - r1 + C - 1
o MUL : multiply
n MUL r0, r1, r2; r0 = r1 x r2
o MLA : multiply and accumulate
n MLA r0, r1, r2, r3; r0 = r1 x r2 + r3
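The carry variants exist mainly so that values wider than 32 bits can be processed one word at a time. The Python sketch below models that use: a 64-bit add as ADDS followed by ADC, and a 64-bit subtract as SUBS followed by SBC. The helper names (adds, adc, subs, sbc, rsb, add64, sub64) are illustrative, and the model assumes the ARM convention that C is 1 after a subtraction with no borrow.

```python
MASK32 = 0xFFFFFFFF


def adds(a, b):
    """ADDS: return (32-bit result, carry out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def adc(a, b, c):
    """ADC: a + b + C, truncated to 32 bits."""
    return ((a & MASK32) + (b & MASK32) + c) & MASK32


def subs(a, b):
    """SUBS: return (32-bit result, C), where C = 1 means no borrow."""
    a &= MASK32
    b &= MASK32
    return (a - b) & MASK32, 1 if a >= b else 0


def sbc(a, b, c):
    """SBC: a - b + C - 1, truncated to 32 bits."""
    return (a - b + c - 1) & MASK32


def rsb(a, b):
    """RSB: operands reversed, b - a."""
    return (b - a) & MASK32


def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit add: ADDS on the low words, ADC on the high words."""
    lo, carry = adds(a_lo, b_lo)
    return adc(a_hi, b_hi, carry), lo


def sub64(a_hi, a_lo, b_hi, b_lo):
    """64-bit subtract: SUBS on the low words, SBC on the high words."""
    lo, carry = subs(a_lo, b_lo)
    return sbc(a_hi, b_hi, carry), lo
```

A common use of RSB is negation: rsb(5, 0) models RSB r0, r1, #0 with r1 = 5.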
6.1.4 Logical Operations
o Syntax: <instruction> {<cond>} {S} Rd, Rn, N
n N: a register or immediate value
o AND : Bit-wise and
o ORR : Bit-wise or
o EOR : Bit-wise exclusive-or
o BIC : bit clear
n BIC r0, r1, r2; r0 = r1 & Not(r2)
Logical Operations (Cont)
o Example 3:
n PRE: r1 = 0b1111, r2 = 0b0101
n BIC r0, r1, r2 ; r0 = r1 AND (NOT(r2))
n POST: r0=0b1010
6.1.5 Comparison Instructions
o Compare or test a register with a 32-bit value
n Do not modify the registers being compared or
tested
n But only set the values of the NZCV bits of the
CPSR register
o Comparison instructions do not need the S suffix
to update the flags in the CPSR register
Comparison Instructions (Cont.)
o Syntax: <instruction> {<cond>} Rn, N
n N: a register or immediate value
o CMP : compare
n CMP r0, r1; compute (r0 - r1)and set NZCV
o CMN : negated compare
n CMN r0, r1; compute (r0 + r1) and set NZCV
o TST : bit-wise AND test
n TST r0, r1; compute (r0 AND r1)and set NZCV
o TEQ : bit-wise exclusive-or test
n TEQ r0, r1; compute (r0 EOR r1)and set NZCV
Comparison Instructions (Cont.)
o Example 4
n PRE: CPSR = nzcvqiFt_USER, r0 = 4, r9 = 4
n CMP r0, r9
n POST: CPSR = nZCvqiFt_USER
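How CMP derives the four flags can be sketched in Python as below. The function name cmp_flags is hypothetical, and the model assumes the usual ARM subtraction convention: C is set when the subtraction does not borrow, so equal operands set both Z and C.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(rn, op2):
    """Model CMP rn, op2: compute rn - op2 and return the NZCV flags."""
    rn &= MASK32
    op2 &= MASK32
    result = (rn - op2) & MASK32
    return {
        "N": result >> 31,                          # result is negative
        "Z": int(result == 0),                      # result is zero
        "C": int(rn >= op2),                        # no borrow occurred
        "V": ((rn ^ op2) & (rn ^ result)) >> 31,    # signed overflow
    }
```

For example, cmp_flags(4, 4) models CMP r0, r9 with both registers holding 4.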
6.1.6 Multiply Instruction
o Syntax:
n MLA{<cond>} {S} Rd, Rm, Rs, Rn
n MUL{<cond>} {S} Rd, Rm, Rs
o MUL : multiply
n MUL r0, r1, r2; r0 = r1*r2
o MLA : multiply and accumulate
n MLA r0, r1, r2, r3; r0 = (r1*r2) + r3
Multiply Instruction (Cont.)
o Syntax: <instruction>{<cond>} {S} RdLo, RdHi, Rm, Rs
n The result is written to a pair of registers holding a 64-bit value
o UMULL : unsigned multiply long
n UMULL r0, r1, r2, r3; [r1,r0] = r2*r3
o UMLAL : unsigned multiply accumulate long
n UMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
o SMULL: signed multiply long
n SMULL r0, r1, r2, r3; [r1,r0] = r2*r3
o SMLAL : signed multiply accumulate long
n SMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
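The difference between the signed and unsigned long multiplies shows up only in the high word. The following Python sketch models the [RdHi, RdLo] register pair as a tuple (RdLo, RdHi); the helper names are illustrative and not part of any ARM tool.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def split64(value):
    """Split a 64-bit value into (RdLo, RdHi)."""
    value &= MASK64
    return value & MASK32, value >> 32


def umull(rm, rs):
    """UMULL RdLo, RdHi, Rm, Rs: unsigned 32 x 32 -> 64."""
    return split64((rm & MASK32) * (rs & MASK32))


def smull(rm, rs):
    """SMULL RdLo, RdHi, Rm, Rs: signed 32 x 32 -> 64."""
    return split64(to_signed32(rm) * to_signed32(rs))


def umlal(rdlo, rdhi, rm, rs):
    """UMLAL: add the unsigned product to the existing 64-bit pair."""
    accumulator = ((rdhi & MASK32) << 32) | (rdlo & MASK32)
    return split64(accumulator + (rm & MASK32) * (rs & MASK32))
```

With the same bit patterns, umull(0xFFFFFFFF, 2) and smull(0xFFFFFFFF, 2) agree in the low word and differ in the high word.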
6.2 Branch Instructions
o Branch instruction
n Change the flow of execution
n Used to call a routine
o Allow applications to
n Have subroutines
n Implement if-then-else structure
n Implement loop structure
Branch Instructions (Cont.)
o Syntax
n B{<cond>} label
n BL{<cond>} label
o B : branch
n B label; pc (program counter) = label
n Used to change execution flow
o BL : branch and link
n BL label; pc = label, lr = address of the next
instruction after the BL
n Similar to the B instruction but can be used for subroutine
call
o Overwrite the link register (lr) with a return address
Branch Instructions (Cont.)
o Example 5
        B    forward          ; skip the next three instructions
        ADD  r1, r2, #4
        ADD  r0, r6, #2
        ADD  r3, r7, #4
forward
        SUB  r1, r2, #4
backward
        SUB  r1, r2, #4
        B    backward         ; branch back: an endless loop
Branch Instructions (Cont.)
o Example 6:
BL subroutine
CMP r1, #5
MOVEQ r1, #0
…
subroutine
<subroutine code>
MOV pc, lr ; return by moving pc = lr
6.3 Load-Store Instructions
o Transfer data between memory and processor
registers
o Three types
n Single-register transfer
n Multiple-register transfer
n Swap
6.3.1 Single-Register Transfer
o Moves a single data item in and out of a
register
o Data item can be
n A word (32 bits)
n A halfword (16 bits)
n A byte (8 bits)
Single-Register Transfer (Cont.)
o Syntax
n <LDR|STR>{<cond>}{B} Rd, addressing1
n LDR{<cond>}SB|H|SH Rd, addressing2
n STR{<cond>} H Rd, addressing2
o LDR : load word into a register from memory
o LDRB : load byte
o LDRSB : load signed byte
o LDRH : load half-word
o LDRSH : load signed halfword
o STR: store word from a register to memory
o STRB : store byte
o STRH : store half-word
Single-Register Transfer (Cont.)
o Example 7
LDR r0, [r1] ;= LDR r0, [r1, #0]
;r0 = mem32[r1]
STR r0, [r1] ;= STR r0, [r1, #0]
;mem32[r1]= r0
n Register r1 is called the base address register
6.3.2 Single-Register Load-Store
Addressing Mode
o Index method, also called Base-Plus-Offset
Addressing
n Base register
o r0 – r15
n Offset, add or subtract an unsigned number
o Immediate
o Register (not PC)
o Scaled register
Single-Register Load-Store Addressing
Mode (Cont.)
o Preindex:
n data: mem[base+offset]
n Base address register: not updated
n Ex: LDR r0,[r1,#4] ; r0:=mem32[r1+4]
o Postindex:
n data: mem[base]
n Base address register: base + offset
n Ex: LDR r0,[r1],#4 ; r0:=mem32[r1], then r1:=r1+4
o Preindex with writeback (also called auto-indexing)
n Data: mem[base+offset]
n Base address register: base + offset
n Ex: LDR r0, [r1,#4]! ; r0:=mem32[r1+4], then r1:=r1+4
Single-Register Load-Store Addressing
Mode (Cont.)
o Example 8
n r0 = 0x00000000, r1 = 0x00009000,
mem32[0x00009000] = 0x01010101,
mem32[0x00009004] = 0x02020202
n Preindexing: LDR r0, [r1, #4]
o r0 = 0x02020202, r1=0x00009000
n Postindexing: LDR r0, [r1], #4
o r0 = 0x01010101, r1=0x00009004
n Preindexing with writeback: LDR r0, [r1, #4]!
o r0 = 0x02020202, r1=0x00009004
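The three index methods differ only in which address is used for the access and whether the base register is updated. A small Python model of Example 8 follows; the register file and memory are plain dictionaries, and the function names are made up for this sketch.

```python
def ldr_preindex(mem, regs, rd, rn, offset, writeback=False):
    """LDR rd, [rn, #offset] or, with writeback, LDR rd, [rn, #offset]!"""
    address = regs[rn] + offset      # offset is applied before the access
    regs[rd] = mem[address]
    if writeback:
        regs[rn] = address           # base register keeps the new address


def ldr_postindex(mem, regs, rd, rn, offset):
    """LDR rd, [rn], #offset"""
    address = regs[rn]               # access uses the unmodified base
    regs[rd] = mem[address]
    regs[rn] = address + offset      # base register is always updated


def example8_state():
    """Initial memory and registers of Example 8."""
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    return mem, regs
```

Calling each function on a fresh example8_state() gives the r0 and r1 values listed in Example 8.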
Single-Register Load-Store Addressing
Mode (Cont.)
Addressing mode and index method                Addressing syntax
Preindex with immediate offset                  [Rn, #+/-offset_12]
Preindex with register offset                   [Rn, +/-Rm]
Preindex with scaled register offset            [Rn, +/-Rm, shift #shift_imm]
Preindex writeback with immediate offset        [Rn, #+/-offset_12]!
Preindex writeback with register offset         [Rn, +/-Rm]!
Preindex writeback with scaled register offset  [Rn, +/-Rm, shift #shift_imm]!
Immediate postindexed                           [Rn], #+/-offset_12
Register postindexed                            [Rn], +/-Rm
Scaled register postindexed                     [Rn], +/-Rm, shift #shift_imm
Examples of LDR Using Different
Addressing Modes
Index method   Instruction                     r0 =                       r1 +=
Preindex with  LDR r0, [r1, #0x4]!             mem32[r1 + 0x4]            0x4
writeback      LDR r0, [r1, r2]!               mem32[r1 + r2]             r2
               LDR r0, [r1, r2, LSR #0x4]!     mem32[r1 + (r2 LSR 0x4)]   (r2 LSR 0x4)
Preindex       LDR r0, [r1, #0x4]              mem32[r1 + 0x4]            not updated
               LDR r0, [r1, r2]                mem32[r1 + r2]             not updated
               LDR r0, [r1, -r2, LSR #0x4]     mem32[r1 - (r2 LSR 0x4)]   not updated
Postindex      LDR r0, [r1], #0x4              mem32[r1]                  0x4
               LDR r0, [r1], r2                mem32[r1]                  r2
               LDR r0, [r1], r2, LSR #0x4      mem32[r1]                  (r2 LSR 0x4)
6.3.3 Multiple-Register Transfer
o Transfer multiple registers between memory
and the processor in a single instruction
o More efficient than single-register transfer
n Moving blocks of data around memory
n Saving and restoring context and stack
Multiple-Register Transfer (Cont.)
o Load-store multiple instructions can increase interrupt
latency
n An interrupt can be taken only after the current instruction
has completed
n Each load multiple instruction takes 2 + N*t cycles
o N: the number of registers to load
o t: the number of cycles required for sequential access to memory
n Compilers provide a switch to control the maximum
number of registers transferred by one instruction
o This limits the maximum interrupt latency
Multiple-Register Transfer (Cont.)
o Syntax:
n <LDM|STM>{<cond>}<mode> Rn{!}, <registers>{^}
n <mode>: one of the addressing modes IA, IB, DA, DB
n ^: optional
o Cannot be used in User mode or System mode
o If the instruction is LDM and the register list contains the pc (r15)
n The SPSR is also copied into the CPSR
o Otherwise, data is transferred into or out of the User mode
registers instead of the current mode registers
Addressing Mode
Addressing  Description                             Start address  End address   Rn!
mode
IA          increment address after each transfer   Rn             Rn + 4*N - 4  Rn + 4*N
IB          increment address before each transfer  Rn + 4         Rn + 4*N      Rn + 4*N
DA          decrement address after each transfer   Rn - 4*N + 4   Rn            Rn - 4*N
DB          decrement address before each transfer  Rn - 4*N       Rn - 4        Rn - 4*N
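The start address, end address, and written-back base for the four modes can be computed directly from Rn and the register count N. The Python function below is an illustrative sketch; block_addresses is a name invented here.

```python
def block_addresses(mode, rn, n):
    """Return (start address, end address, Rn after writeback) for LDM/STM.

    mode is one of "IA", "IB", "DA", "DB"; n is the number of registers.
    """
    size = 4 * n  # each register occupies one 32-bit word
    if mode == "IA":
        return rn, rn + size - 4, rn + size
    if mode == "IB":
        return rn + 4, rn + size, rn + size
    if mode == "DA":
        return rn - size + 4, rn, rn - size
    if mode == "DB":
        return rn - size, rn - 4, rn - size
    raise ValueError("unknown addressing mode: " + mode)
```

Whatever the mode, the lowest-numbered register in the list is always transferred to or from the lowest address in the range.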
Multiple-Register Transfer (Cont.)
o Example 9
n PRE:
mem32[0x80018] = 0x03,
mem32[0x80014] = 0x02,
mem32[0x80010] = 0x01,
r0 = 0x00080010,
r1 = r2 = r3= 0x00000000
n LDMIA r0!, {r1-r3}, or LDMIA r0!, {r1, r2, r3}
o Registers can be listed explicitly or given as a range with the “-” character
Pre-Condition for LDMIA Instruction
Memory Address Data
0x80020 0x00000005
0x8001c 0x00000004
0x80018 0x00000003 R3=0x00000000
0x80014 0x00000002 R2=0x00000000
R0 = 0x80010 0x80010 0x00000001 R1=0x00000000
0x8000c 0x00000000
Figure 1
Post-Condition for LDMIA Instruction
Memory Address Data
0x80020 0x00000005
R0 = 0x8001c 0x8001c 0x00000004
0x80018 0x00000003 R3=0x00000003
0x80014 0x00000002 R2=0x00000002
0x80010 0x00000001 R1=0x00000001
0x8000c 0x00000000
Figure 2
Multiple-Register Transfer (Cont.)
o Example 9 (Cont.)
n POST:
r0 = 0x0008001c,
r1 = 0x00000001,
r2 = 0x00000002,
r3 = 0x00000003
Multiple-Register Transfer (Cont.)
o Example 10
n PRE: as shown in Fig. 1
n LDMIB r0!, {r1-r3}
n POST:
r0 = 0x0008001c
r1 = 0x00000002
r2 = 0x00000003
r3 = 0x00000004
Post-Condition for LDMIB Instruction
Memory Address Data
0x80020 0x00000005
R0 = 0x8001c 0x8001c 0x00000004 R3=0x00000004
0x80018 0x00000003 R2=0x00000003
0x80014 0x00000002 R1=0x00000002
0x80010 0x00000001
0x8000c 0x00000000
Figure 3
Multiple-Register Transfer (Cont.)
o Load-store multiple pairs when base update used (!)
n Useful for saving a group of registers and restoring them later
Store multiple Load multiple
STMIA LDMDB
STMIB LDMDA
STMDA LDMIB
STMDB LDMIA
Multiple-Register Transfer (Cont.)
o Example 11
n PRE:
r0 = 0x00009000
r1 = 0x00000009,
r2 = 0x00000008
r3 = 0x00000007
n STMIB r0!, {r1-r3}
MOV r1, #1
MOV r2, #2
MOV r3, #3
Multiple-Register Transfer (Cont.)
o Example 11 (Cont.)
n PRE (2):
r0 = 0x0000900c
r1 = 0x00000001,
r2 = 0x00000002
r3 = 0x00000003
n LDMDA r0!, {r1-r3}
n POST:
r0 = 0x00009000
r1 = 0x00000009,
r2 = 0x00000008
r3 = 0x00000007
Multiple-Register Transfer (Cont.)
o Example 11 (Cont.)
n The STMIB stores the values 9, 8, 7 (r1 to r3) to memory
n The MOV instructions then corrupt registers r1 to r3
n Finally, the LDMDA
o Reloads the original values, and
o Restores the base pointer r0
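The save and restore sequence of Example 11 can be replayed with a small Python model of STM and LDM with base update. This is a sketch only: memory and registers are dictionaries, the function names are hypothetical, writeback is always performed (the "!" form), and the base register is assumed not to appear in the register list.

```python
def transfer_addresses(mode, base, count):
    """Return (ascending list of word addresses, base after writeback)."""
    size = 4 * count
    start = {"IA": base, "IB": base + 4,
             "DA": base - size + 4, "DB": base - size}[mode]
    new_base = base + size if mode[0] == "I" else base - size
    return [start + 4 * i for i in range(count)], new_base


def ordered(reglist):
    """Lowest-numbered register goes to the lowest address."""
    return sorted(reglist, key=lambda name: int(name[1:]))


def stm(mem, regs, mode, rn, reglist):
    """STM<mode> rn!, {reglist}"""
    addresses, new_base = transfer_addresses(mode, regs[rn], len(reglist))
    for name, address in zip(ordered(reglist), addresses):
        mem[address] = regs[name]
    regs[rn] = new_base


def ldm(mem, regs, mode, rn, reglist):
    """LDM<mode> rn!, {reglist}"""
    addresses, new_base = transfer_addresses(mode, regs[rn], len(reglist))
    for name, address in zip(ordered(reglist), addresses):
        regs[name] = mem[address]
    regs[rn] = new_base
```

Running stm with "IB", overwriting r1 to r3, and then running ldm with "DA" returns every register, including r0, to its starting value.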
Multiple-Register Transfer (Cont.)
o Example 12: the use of the load-store multiple
instructions with a block memory copy
;r9 points to start of source data
;r10 points to start of destination data
;r11 points to end of the source
loop
LDMIA r9!, {r0-r7} ;load 32 bytes from source and update r9
STMIA r10!, {r0-r7} ;store 32 bytes to desti. and update r10
CMP r9, r11 ;have we reached the end
BNE loop
Multiple-Register Transfer (Cont.)
Figure: block memory copy. The source block runs from r9 up to r11 (toward high memory); r10 points to the destination (toward low memory). Each pass transfers 32 bytes in two instructions.
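The copy loop of Example 12 can be modelled in Python as below. The sketch assumes, as the assembly loop does, that the source length is a non-zero multiple of 32 bytes; block_copy and its parameter names are illustrative.

```python
def block_copy(mem, src, dst, src_end):
    """Copy words from [src, src_end) to dst, eight words per pass.

    src, dst and src_end play the roles of r9, r10 and r11.
    Returns the final (src, dst) pointers.
    """
    while True:
        # LDMIA r9!, {r0-r7}: load 32 bytes and advance the source pointer
        block = [mem[src + 4 * i] for i in range(8)]
        src += 32
        # STMIA r10!, {r0-r7}: store 32 bytes and advance the destination
        for i, word in enumerate(block):
            mem[dst + 4 * i] = word
        dst += 32
        # CMP r9, r11 / BNE loop
        if src == src_end:
            break
    return src, dst
```

Like the assembly version, the loop tests for equality only, so a length that is not a multiple of 32 bytes would overshoot the end pointer.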
6.3.4 Stack Operations
o ARM architecture uses the load-store multiple
instruction to carry out stack operations
n PUSH: use a store multiple instruction
n POP: use a load multiple instruction
o Stack
n Ascending (A): stack grows towards higher
memory addresses
n Descending (D): stack grows towards lower
memory addresses
6.3.4 Stack Operations (Cont.)
o Stack
n Full stack (F): stack pointer sp points to the last
valid item pushed onto the stack
n Empty stack (E): sp points after the last item on
the stack
o The free slot where the next data item will be placed
o There are a number of aliases available to
support stack operations
n See next page
6.3.4 Stack Operations (Cont.)
o ARM supports all four forms of stacks
n Full ascending (FA): grows up; base register points to
the highest address containing a valid item
n Empty ascending (EA): grows up; base register points to
the first empty location
n Full descending (FD): grows down; base register points
to the lowest address containing a valid item
n Empty descending (ED): grows down; base register
points to the first empty location below the stack
Addressing Methods for Stack
Operations
Addressing  Description       Pop     = LDM    Push    = STM
mode
FA          Full ascending    LDMFA   LDMDA    STMFA   STMIB
FD          Full descending   LDMFD   LDMIA    STMFD   STMDB
EA          Empty ascending   LDMEA   LDMDB    STMEA   STMIA
ED          Empty descending  LDMED   LDMIB    STMED   STMDA
6.3.4 Stack Operations (Cont.)
o Example 13
n PRE:
o r1 = 0x00000002
o r4 = 0x00000003
o sp = 0x00080014
n STMFD sp!, {r1, r4}
n POST:
o r1 = 0x00000002
o r4 = 0x00000003
o sp = 0x0008000c
6.3.4 Stack Operations (Cont.)
o Example 13 (Cont.)
n STMFD – full stack push operation
        PRE                             POST
        Address   Data                  Address   Data
        0x80018   0x00000001            0x80018   0x00000001
sp ->   0x80014   0x00000002            0x80014   0x00000002
        0x80010   Empty                 0x80010   0x00000003
        0x8000c   Empty         sp ->   0x8000c   0x00000002
6.3.4 Stack Operations (Cont.)
o Example 14
n PRE:
o r1 = 0x00000002
o r4 = 0x00000003
o sp = 0x00080010
n STMED sp!, {r1, r4}
n POST:
o r1 = 0x00000002
o r4 = 0x00000003
o sp = 0x00080008
6.3.4 Stack Operations (Cont.)
o Example 14 (Cont.)
n STMED – empty stack push operation
        PRE                             POST
        Address   Data                  Address   Data
        0x80018   0x00000001            0x80018   0x00000001
        0x80014   0x00000002            0x80014   0x00000002
sp ->   0x80010   Empty                 0x80010   0x00000003
        0x8000c   Empty                 0x8000c   0x00000002
        0x80008   Empty         sp ->   0x80008   Empty
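Examples 13 and 14 can be checked with a short Python model of the full descending and empty descending push, plus the matching full descending pop. The function names are invented for this sketch; memory and registers are dictionaries, and register names are assumed to have the form "rN".

```python
def ordered(reglist):
    """Lowest-numbered register goes to the lowest address."""
    return sorted(reglist, key=lambda name: int(name[1:]))


def push_fd(mem, regs, reglist):
    """STMFD sp!, {reglist} (same as STMDB): decrement, then store."""
    names = ordered(reglist)
    regs["sp"] -= 4 * len(names)
    for i, name in enumerate(names):
        mem[regs["sp"] + 4 * i] = regs[name]


def pop_fd(mem, regs, reglist):
    """LDMFD sp!, {reglist} (same as LDMIA): load, then increment."""
    names = ordered(reglist)
    for i, name in enumerate(names):
        regs[name] = mem[regs["sp"] + 4 * i]
    regs["sp"] += 4 * len(names)


def push_ed(mem, regs, reglist):
    """STMED sp!, {reglist} (same as STMDA): store, then decrement."""
    names = ordered(reglist)
    start = regs["sp"] - 4 * len(names) + 4
    for i, name in enumerate(names):
        mem[start + 4 * i] = regs[name]
    regs["sp"] -= 4 * len(names)
```

After push_fd the stack pointer addresses the last item pushed; after push_ed it addresses the next free slot.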
6.3.5 SWAP Instruction
o A special case of a load-store instruction
n Swap the contents of memory with the contents
of a register
n An atomic operation
o Cannot be interrupted by any other instruction or
any other bus access
o The system “holds the bus” until the transaction is
complete
o Useful when implementing semaphores and mutual
exclusion in an operating system
6.3.5 SWAP Instruction (Cont.)
o Syntax: SWP{<cond>}{B} Rd, Rm, [Rn]
n tmp = mem32[Rn]
n mem32[Rn] = Rm
n Rd = tmp
o SWP: swap a word between memory and a
register
o SWPB: swap a byte between memory and a
register
6.3.5 SWAP Instruction (Cont.)
o Example 15
n PRE:
o mem32[0x9000] = 0x12345678
o r0 = 0x00000000
o r1 = 0x11112222
o r2 = 0x00009000
n SWP r0, r1, [r2]
n POST:
o mem32[0x9000] = 0x11112222
o r0 = 0x12345678
o r1 = 0x11112222
o r2 = 0x00009000
6.3.5 SWAP Instruction (Cont.)
o Example 15 (Cont.)
spin
LDR r1, =semaphore ;r1 = address of the semaphore
MOV r2, #1
SWP r3, r2, [r1] ;hold the bus until complete
CMP r3, #1
BEQ spin ;semaphore already taken, try again
o The word at the address of the semaphore contains either
1 or 0
o While the semaphore value is 1, loop until it becomes
0 (updated by the holding process)
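The logic of the SWP-based lock can be sketched in Python as below. The model is single-threaded, so it only illustrates the read-then-write data flow; it does not reproduce the bus-level atomicity that the hardware instruction provides. The names swp, try_lock and unlock are hypothetical.

```python
def swp(mem, address, rm):
    """SWP Rd, Rm, [Rn]: return the old memory word and store rm."""
    old = mem[address]
    mem[address] = rm
    return old


def try_lock(mem, semaphore):
    """One pass of the spin loop: write 1 and report whether it was 0."""
    return swp(mem, semaphore, 1) == 0


def unlock(mem, semaphore):
    """The holder releases the semaphore by writing 0."""
    mem[semaphore] = 0
```

The spin loop in the assembly listing corresponds to calling try_lock repeatedly until it returns True.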
6.4 Software Interrupt Instruction
o SWI: software interrupt instruction
n Cause a software interrupt exception
n Provide a mechanism for applications to call
operating system routines
n Each SWI instruction has an associated SWI
number
o Used to represent a particular function call or routines
6.4 Software Interrupt Instruction
(Cont.)
o Syntax: SWI{<cond>} SWI_number
n lr_svc = address of instruction following the SWI
n spsr_svc = cpsr
n pc = vector table + 0x8 ; jump to the SWI
handler
n cpsr mode = SVC
n cpsr I = 1 (mask IRQ interrupt)
6.4 Software Interrupt Instruction
(Cont.)
o Example 16
n PRE:
o cpsr = nzcVqift_USER
o pc = 0x00008000
o lr = r14 = 0x003fffff
n 0x00008000 SWI 0x123456
n POST:
o cpsr = nzcVqIft_SVC
o spsr = nzcVqift_USER
o pc = 0x00000008
o lr = 0x00008004
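The state changes on SWI entry can be modelled in Python as below. The sketch assumes the vector table sits at address 0x00000000, that the processor is in ARM (not Thumb) state, and the standard mode encodings 0x10 for User and 0x13 for SVC; the function and dictionary key names are illustrative.

```python
MODE_USR = 0x10
MODE_SVC = 0x13
MODE_MASK = 0x1F
I_BIT = 0x80          # cpsr bit 7 masks IRQ when set
SWI_VECTOR = 0x08     # assumes the vector table is at 0x00000000


def swi_entry(pc, cpsr):
    """Return the processor state after executing the SWI located at pc."""
    return {
        "lr_svc": pc + 4,                                  # next instruction
        "spsr_svc": cpsr,                                  # saved status
        "cpsr": (cpsr & ~MODE_MASK) | MODE_SVC | I_BIT,    # SVC, IRQ masked
        "pc": SWI_VECTOR,
    }


def swi_number(instruction):
    """The SWI number is the low 24 bits of the instruction word."""
    return instruction & 0x00FFFFFF
```

A handler typically recovers the SWI number by loading the instruction word at lr_svc - 4 and masking off the top eight bits, which is what swi_number models.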
6.5 Program Status Register
Instructions
o MRS
n Transfer the contents of either the cpsr or spsr
into a register
o MSR
n Transfer the contents of a register into the cpsr or
spsr
6.5 Program Status Register
Instructions (Cont.)
o Syntax
n MRS{<cond>} Rd, <cpsr|spsr>
n MSR{<cond>} <cpsr|spsr>_<fields>, Rm
n MSR{<cond>} <cpsr|spsr>_<fields>, #immediate
o Fields: any combination of
n Flags (f): [24:31]
n Status (s): [16:23]
n eXtension (x): [8:15]
n Control (c): [0:7]
PSR Registers
6.5 Program Status Register
Instructions (Cont.)
o Note: you cannot access the SPSR in User or
System mode
n The assembler cannot warn you because it does not
know which mode the code will execute in
6.5 Program Status Register
Instructions (Cont.)
o Example 17
n PRE:
o cpsr = nzcvqIFt_SVC
n MRS r1, cpsr
n BIC r1, r1, #0x80 ;0b10000000, clear bit 7
n MSR cpsr_c, r1 ;enable IRQ interrupts
n POST:
o cpsr = nzcvqiFt_SVC
n Note that this example must execute in a privileged mode such as SVC
o In User mode, you can read all cpsr bits but can update only
the condition flag field f, i.e., cpsr[24:31]
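The read-modify-write sequence of Example 17 can be followed bit by bit in Python. The sketch below assumes a privileged mode, so the control field may be written; FIELD_MASKS, msr and enable_irq are names chosen for illustration.

```python
FIELD_MASKS = {
    "c": 0x000000FF,  # control   [0:7]
    "x": 0x0000FF00,  # extension [8:15]
    "s": 0x00FF0000,  # status    [16:23]
    "f": 0xFF000000,  # flags     [24:31]
}
I_BIT = 0x80          # cpsr bit 7: IRQ disabled when set


def msr(psr, fields, value):
    """MSR psr_<fields>, value: replace only the selected byte fields."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)


def enable_irq(cpsr):
    """Model the MRS / BIC / MSR sequence of Example 17."""
    r1 = cpsr                      # MRS r1, cpsr
    r1 &= ~I_BIT & 0xFFFFFFFF      # BIC r1, r1, #0x80
    return msr(cpsr, "c", r1)      # MSR cpsr_c, r1
```

With I and F set in SVC mode the cpsr low byte is 0xD3, and enable_irq(0xD3) clears only bit 7.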
6.6 Conditional Execution
o Almost all ARM instructions can include an
optional condition code
n Instruction is only executed if the condition code
flags in the CPSR meet the specified condition
n The default is AL, or always execute
o Conditional execution depends on two
components
n The condition field: located in the instruction
n The condition flags: located in the cpsr
Conditional Execution (Cont.)
o Example 18
ADDEQ r0, r1, r2
; r0 = r1 + r2 if zero flag is set
Condition Codes
6.6 Conditional Execution (Cont.)
o Thus, before conditional execution can be used
n An earlier instruction must update the
condition code flags according to its result
n Unless told to, data processing instructions do not
update the flags
o To make an instruction update the flags
n Include the S suffix
n Example: ADDS r0, r1,r2
6.6 Conditional Execution (Cont.)
o However, some instructions always update the flags
n Do not require the S suffix
n CMP, CMN, TST, TEQ
o Flags are preserved until updated
o Thus, you can execute an instruction conditionally,
based upon the flags set in another instruction, either:
n Immediately after the instruction which updated the flags
n After any number of intervening instructions that have not
updated the flags.
6.6 Conditional Execution (Cont.)
o Example 18
n Translate the following code into assembly
language
n Assume r1 = a, r2 = b
while ( a!= b )
{
if (a > b) a -= b; else b -= a;
}
6.6 Conditional Execution (Cont.)
o Example 18: Solution 1
gcd
CMP r1, r2
BEQ complete
BLT lessthan
SUB r1, r1, r2
B gcd
lessthan
SUB r2, r2, r1
B gcd
complete
6.6 Conditional Execution (Cont.)
o Example 18: Solution 2
gcd
CMP r1, r2
SUBGT r1, r1, r2
SUBLT r2, r2, r1
BNE gcd
o Solution 2 reduces the code from seven instructions to
four, with one branch instead of four
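The two solutions can be compared with a rough Python model that counts how many instructions each version fetches. This is a sketch only: the count treats a conditional instruction whose condition fails as still occupying one slot, it ignores the extra cost of taken branches, and the function names are invented here.

```python
def gcd_branching(a, b):
    """Solution 1: returns (gcd, instructions fetched)."""
    count = 0
    while True:
        count += 2              # CMP r1, r2 ; BEQ complete
        if a == b:
            return a, count
        count += 1              # BLT lessthan
        if a < b:
            b -= a              # SUB r2, r2, r1
        else:
            a -= b              # SUB r1, r1, r2
        count += 2              # the SUB and the B gcd


def gcd_conditional(a, b):
    """Solution 2: returns (gcd, instructions fetched)."""
    count = 0
    while True:
        count += 4              # CMP ; SUBGT ; SUBLT ; BNE gcd
        if a > b:
            a -= b              # SUBGT executes, SUBLT is skipped
        elif a < b:
            b -= a              # SUBLT executes, SUBGT is skipped
        else:
            return a, count     # flags say EQ: BNE falls through
```

Because SUBGT has no S suffix, the flags set by CMP are still valid when SUBLT and BNE are evaluated, which is why the elif branch matches the hardware behaviour. For short runs the fetched-instruction counts are close; the clearer gain is the smaller code and the single branch per pass.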
References
o Andrew N. Sloss, “ARM System Developer’s
Guide: Designing and Optimizing System
Software,” Morgan Kaufmann Publishers,
2004
n Chapter 3: Introduction to the ARM Instruction
Set