Module 3
ARM Instruction Set
ARM Instruction Sets
   Data Processing Instructions
   Branch Instructions
   Load-store instructions
   Software interrupt instructions
   Program status register instructions
   Conditional Execution
Data Processing Instructions
   Manipulate data within registers
   Data processing instructions
    ◼   Move instructions
    ◼   Arithmetic instructions
    ◼   Logical instructions
    ◼   Comparison instructions
    ◼   Multiply instructions
Data processing
❑They are move, arithmetic, logical, comparison and
 multiply instructions.
❑Most data processing instructions can process one of
 their operands using the barrel shifter.
   • General rules:
     – All operands are 32-bit, coming
       from registers or literals.
     – The result, if any, is 32-bit and
       placed in a register (with the
       exception for long multiply which
       produces a 64-bit result)
     – 3-address format
Data Processing Instruction
ARM Instruction Set Format
Move Instruction
   Syntax: <instruction> {<cond>} {S}Rd, N
    ◼   N: a register or immediate value
   MOV : move
    ◼   MOV r0, r1; r0 = r1
    ◼   MOV r0, #5; r0 = 5
   MVN : move (negated)
    ◼   MVN r0, r1; r0 = NOT(r1)=~ (r1)
Preprocessed by Shifter
   Example 1
    ◼   PRE: r5 = 5, r7 = 8;
    ◼   MOV r7, r5, LSL #2; r7 = r5 << 2 = r5*4
    ◼   POST: r5 = 5, r7 = 20
Preprocessed by Shifter
   LSL: logical shift left
    ◼   x << y, the least significant bits are filled with zeroes
   LSR: logical shift right:
    ◼   (unsigned) x >> y, the most significant bits are filled with zeroes
   ASR: arithmetic shift right
    ◼   (signed) x >> y, copy the sign bit to the most significant bit
   ROR: rotate right
    ◼   ((unsigned) x >> y) | (x << (32-y))
   RRX: rotate right extended
    ◼   c flag <<31 | (( unsigned) x >> 1)
    ◼   Performs 33-bit rotate, with the CPSR’s C bit being inserted above
        sign bit of the word
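The five shift operations above can be modelled on 32-bit values. This is a minimal sketch in Python rather than ARM code; the function names and the `MASK32` constant are illustrative and are not part of any ARM tool. Shift amounts are assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def lsl(x, y):
    """Logical shift left: zeros enter at the least significant end."""
    return (x << y) & MASK32


def lsr(x, y):
    """Logical shift right: zeros enter at the most significant end."""
    return (x & MASK32) >> y


def asr(x, y):
    """Arithmetic shift right: the sign bit is copied into vacated bits."""
    x &= MASK32
    if x & 0x80000000:
        return ((x >> y) | (MASK32 << (32 - y))) & MASK32
    return x >> y


def ror(x, y):
    """Rotate right: bits leaving at bit 0 re-enter at bit 31."""
    x &= MASK32
    y %= 32
    return ((x >> y) | (x << (32 - y))) & MASK32


def rrx(x, carry):
    """Rotate right extended: 33-bit rotate through the C flag.

    Returns (result, carry_out); carry_out is the old bit 0.
    """
    x &= MASK32
    return ((carry & 1) << 31) | (x >> 1), x & 1
```

For instance, `lsl(5, 2)` models `MOV r7, r5, LSL #2` from Example 1 and gives 20.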
Shift Register Operands
    – ADD r1,r2,r3,LSL #3 ; r1 = r2 + (r3 << 3)
    – A single instruction executed in a
      single cycle
 ❑ LSL: Logical Shift Left by 0 to
   31 places, 0 filled at the lsb
   end
 ❑ LSR, ASL (Arithmetic Shift
   Left), ASR, ROR (Rotate
   Right), RRX (Rotate Right
   extended by 1 bit)
    – ADD r5,r5,r3,LSL r2 ;
      r5:=r5+r3*2^r2
    – MOV r12,r4,ROR r3
      ;r12:=r4 rotated right
      by value of r3
 [Figure: shift operations on a 32-bit word (bits 31 to 0): LSL #5, LSR #5, ASR #5 with a positive operand, ASR #5 with a negative operand, ROR #5, and RRX through the C flag]
Preprocessed by Shifter (Cont.)
   Example 2
    ◼   PRE: r0 = 0x00000000, r1 = 0x80000004
    ◼   MOV r0, r1, LSL #1 ; r0 = r1 *2
    ◼   POST: r0 = 0x00000008, r1 = 0x80000004
Arithmetic Instructions
   Syntax: <instruction> {<cond>} {S}Rd, Rn, N
    ◼   N: a register or immediate value
   ADD : add
    ◼   ADD r0, r1, r2; r0 = r1 + r2
   ADC : add with carry
    ◼   ADC r0, r1, r2; r0 = r1 + r2 + C
   SUB : subtract
    ◼   SUB r0, r1, r2; r0 = r1 - r2
   SBC : subtract with carry
    ◼   SBC r0, r1, r2; r0 = r1 - r2 + C -1
Arithmetic Instructions (Cont.)
   RSB : reverse subtract
    ◼   RSB r0, r1, r2; r0 = r2 – r1
   RSC : reverse subtract with carry
    ◼   RSC r0, r1, r2; r0 = r2 – r1 + C -1
   MUL : multiply
    ◼   MUL r0, r1, r2; r0 = r1 x r2
   MLA : multiply and accumulate
    ◼   MLA r0, r1, r2, r3; r0 = r1 x r2 + r3
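The carry variants are easiest to see in a multi-word addition. The sketch below models ADDS, ADC, SBC and RSB on 32-bit values in Python, following the formulas on the slides; the helper names (`adds`, `adc`, `sbc`, `rsb`, `add64`) are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def adds(a, b):
    """ADDS: 32-bit add that also returns the carry flag."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def adc(a, b, c):
    """ADC: r0 = r1 + r2 + C."""
    return ((a & MASK32) + (b & MASK32) + c) & MASK32


def sbc(a, b, c):
    """SBC: r0 = r1 - r2 + C - 1."""
    return (a - b + c - 1) & MASK32


def rsb(a, b):
    """RSB: r0 = r2 - r1 (operands reversed)."""
    return (b - a) & MASK32


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add as an ADDS on the low words, then ADC on the high words."""
    lo, carry = adds(a_lo, b_lo)
    hi = adc(a_hi, b_hi, carry)
    return lo, hi
```

Here `add64(0xFFFFFFFF, 0, 1, 0)` carries out of the low word and returns `(0, 1)`. `rsb(1, 0)` models `RSB r0, r1, #0`, a common way to negate a register.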
Logical Operations
   Syntax: <instruction> {<cond>} {S}Rd, Rn, N
    ◼   N: a register or immediate value
   AND : Bit-wise and
   ORR : Bit-wise or
   EOR : Bit-wise exclusive-or
   BIC : bit clear
    ◼   BIC r0, r1, r2; r0 = r1 & Not(r2)
Logical Operations (Cont)
   Example 3:
    ◼   PRE: r1 = 0b1111, r2 = 0b0101
    ◼   BIC r0, r1, r2   ; r0 = r1 AND (NOT(r2))
    ◼   POST: r0=0b1010
Comparison Instructions
   Compare or test a register with a 32-bit value
    ◼   Do not modify the registers being compared or
        tested
    ◼   But only set the values of the NZCV bits of the
        CPSR register
           The S suffix is not needed for comparison
            instructions to update the flags in the CPSR register
Comparison Instructions (Cont.)
   Syntax: <instruction> {<cond>} Rn, N
    ◼   N: a register or immediate value
   CMP : compare
    ◼   CMP r0, r1; compute (r0 - r1)and set NZCV
   CMN : negated compare
    ◼   CMN r0, r1; compute (r0 + r1)and set NZCV
   TST : bit-wise AND test
    ◼   TST r0, r1; compute (r0 AND r1)and set NZCV
   TEQ : bit-wise exclusive-or test
    ◼   TEQ r0, r1; compute (r0 EOR r1)and set NZCV
Comparison Instructions (Cont.)
   Example 4
    ◼   PRE: CPSR = nzcvqiFt_USER, r0 = 4, r9 = 4
    ◼   CMP r0, r9
    ◼   POST: CPSR = nZCvqiFt_USER
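How CMP derives the NZCV flags can be sketched as follows. This is a Python model of a 32-bit subtraction, not ARM code, and `cmp_flags` is a hypothetical helper name. For subtraction the ARM carry flag means "no borrow", so equal operands set C as well as Z.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(rn, n):
    """Model CMP rn, n: compute rn - n and return the flags (N, Z, C, V)."""
    rn &= MASK32
    n &= MASK32
    result = (rn - n) & MASK32
    flag_n = result >> 31                      # sign bit of the result
    flag_z = int(result == 0)                  # result is zero
    flag_c = int(rn >= n)                      # C = 1 when no borrow occurs
    # Signed overflow: operands differ in sign and the result's sign
    # differs from the first operand's sign.
    flag_v = ((rn ^ n) & (rn ^ result)) >> 31
    return flag_n, flag_z, flag_c, flag_v
```

With r0 = 4 and r9 = 4 as in Example 4, `cmp_flags(4, 4)` returns `(0, 1, 1, 0)`.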
Multiply Instruction
   Syntax:
    ◼   MLA{<cond>} {S} Rd, Rm, Rs, Rn
    ◼   MUL{<cond>} {S} Rd, Rm, Rs
   MUL : multiply
    ◼   MUL r0, r1, r2;   r0 = r1*r2
   MLA : multiply and accumulate
    ◼   MLA r0, r1, r2, r3;   r0 = (r1*r2) + r3
Multiply Instruction (Cont.)
   Syntax: <instruction>{<cond>} {S}RdLo, RdHi, Rm, Rs
    ◼   Multiply onto a pair of registers representing a 64-bit value
   UMULL : unsigned multiply long
    ◼   UMULL r0, r1, r2, r3; [r1,r0] = r2*r3
   UMLAL : unsigned multiply accumulate long
    ◼   UMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
   SMULL: signed multiply long
    ◼   SMULL r0, r1, r2, r3; [r1,r0] = r2*r3
   SMLAL : signed multiply accumulate long
    ◼   SMLAL r0, r1, r2, r3; [r1,r0] = [r1,r0]+(r2*r3)
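The split of a 64-bit product across the register pair [RdHi, RdLo] can be sketched in Python. The function names mirror the mnemonics but are only a model; `to_signed` is a helper introduced here for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def umull(rm, rs):
    """UMULL: unsigned 32 x 32 -> 64; returns (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    """SMULL: signed 32 x 32 -> 64; returns (RdLo, RdHi)."""
    product = (to_signed(rm) * to_signed(rs)) & MASK64
    return product & MASK32, product >> 32


def umlal(rd_lo, rd_hi, rm, rs):
    """UMLAL: [RdHi, RdLo] = [RdHi, RdLo] + rm * rs (unsigned)."""
    acc = ((rd_hi & MASK32) << 32) | (rd_lo & MASK32)
    acc = (acc + (rm & MASK32) * (rs & MASK32)) & MASK64
    return acc & MASK32, acc >> 32
```

The same bit pattern gives different high words in the two forms: `umull(0xFFFFFFFF, 2)` returns `(0xFFFFFFFE, 1)`, while `smull(0xFFFFFFFF, 2)` treats the operand as -1 and returns `(0xFFFFFFFE, 0xFFFFFFFF)`.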
Branch Instructions
   Branch instruction
    ◼   Change the flow of execution
    ◼   Used to call a routine
   Allow applications to
    ◼   Have subroutines
    ◼   Implement if-then-else structure
    ◼   Implement loop structure
Branch Instructions (Cont.)
   Syntax
    ◼   B{<cond>} label
    ◼   BL{<cond>} label
   B : branch
    ◼   B label; pc (program counter) = label
    ◼   Used to change execution flow
   BL : branch and link
    ◼   BL label; pc = label, lr = address of the next
        instruction after the BL
    ◼   Similar to the B instruction but can be used for subroutine
        call
           Overwrite the link register (lr) with a return address
Branch Instructions (Cont.)
   Example 5
              B forward
              ADD r1, r2, #4
              ADD r0, r6, #2
              ADD r3, r7, #4
    forward
           SUB r1, r2, #4
    backward
           SUB r1, r2, #4
           B backward
Branch Instructions (Cont.)
   Example 6:
         BL subroutine
         CMP r1, #5
         MOVEQ r1, #0
         …
subroutine
         <subroutine code>
         MOV pc, lr ; return by moving pc = lr
Load-Store Instructions
   Transfer data between memory and processor
    registers
   Three types
    ◼   Single-register transfer
    ◼   Multiple-register transfer
    ◼   Swap
Single-Register Transfer
   Moving a single data item in and out of
    register
   Data item can be
    ◼   A word (32-bits)
    ◼   Halfword (16-bits)
    ◼   Byte (8-bits)
Single-Register Transfer (Cont.)
   Syntax
    ◼   <LDR|STR>{<cond>}{B} Rd, addressing1
    ◼   LDR{<cond>}SB|H|SH Rd, addressing2
    ◼   STR{<cond>} H Rd, addressing2
   LDR : load word into a register from memory
   LDRB : load byte
   LDRSB : load signed byte
   LDRH : load half-word
   LDRSH : load signed halfword
   STR: store word from a register to memory
   STRB : store byte
   STRH : store half-word
Single-Register Transfer (Cont.)
   Example 7
    LDR r0, [r1]            ;= LDR r0, [r1, #0]
                            ;r0 = mem32[r1]
    STR r0, [r1]            ;= STR r0, [r1, #0]
                            ;mem32[r1]= r0
    ◼   Register r1 is called the base address register
Single-Register Load-Store Addressing Mode
     Index method, also called Base-Plus-Offset
      Addressing
      ◼   Base register
             r0 – r15
      ◼   Offset, add or subtract an unsigned number
             Immediate
             Register (not PC)
             Scaled register
    Single-Register Load-Store Addressing
    Mode (Cont.)
   Preindex:
    ◼   data: mem[base+offset]
    ◼   Base address register: not updated
    ◼   Ex: LDR r0,[r1,#4] ; r0:=mem32[r1+4]
   Postindex:
    ◼   data: mem[base]
    ◼   Base address register: base + offset
    ◼   Ex: LDR r0,[r1],#4 ; r0:=mem32[r1], then r1:=r1+4
   Preindex with writeback (also called auto-indexing)
    ◼   Data: mem[base+offset]
    ◼   Base address register: base + offset
    ◼   Ex: LDR r0, [r1,#4]! ; r0:=mem32[r1+4], then r1:=r1+4
    Single-Register Load-Store Addressing
    Mode (Cont.)
   Example 8
    ◼   r0 = 0x00000000, r1 = 0x00009000,
        mem32[0x00009000] = 0x01010101,
        mem32[0x00009004] = 0x02020202
    ◼   Preindexing: LDR r0, [r1, #4]
           r0 = 0x02020202, r1=0x00009000
    ◼   Postindexing: LDR r0, [r1], #4
           r0 = 0x01010101, r1=0x00009004
    ◼   Preindexing with writeback: LDR r0, [r1, #4]!
           r0 = 0x02020202, r1=0x00009004
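The three index methods differ only in which address is read and whether the base register is updated. The sketch below models them in Python with a dictionary for memory and another for registers; `ldr`, `example8_state` and the mode strings are names chosen for this illustration.

```python
def ldr(mem, regs, rd, rn, offset, mode):
    """Model LDR rd, <address> for the three index methods.

    mem maps word addresses to values; regs maps register names to values.
    """
    base = regs[rn]
    if mode == "preindex":              # LDR rd, [rn, #offset]
        regs[rd] = mem[base + offset]
    elif mode == "preindex_writeback":  # LDR rd, [rn, #offset]!
        regs[rn] = base + offset
        regs[rd] = mem[base + offset]
    elif mode == "postindex":           # LDR rd, [rn], #offset
        regs[rn] = base + offset
        regs[rd] = mem[base]
    else:
        raise ValueError("unknown index method: " + mode)


def example8_state():
    """Return fresh copies of the memory and registers from Example 8."""
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    return mem, regs
```

Calling `ldr(mem, regs, "r0", "r1", 4, "postindex")` on the Example 8 state leaves r0 = 0x01010101 and r1 = 0x00009004.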
 Single-Register Load-Store Addressing
 Mode (Cont.)
Addressing mode and index method                 Addressing syntax
Preindex with immediate offset                   [Rn, #+/-offset_12]
Preindex with register offset                    [Rn, +/-Rm]
Preindex with scaled register offset             [Rn, +/-Rm, shift #shift_imm]
Preindex writeback with immediate offset         [Rn, #+/-offset_12]!
Preindex writeback with register offset          [Rn, +/-Rm]!
Preindex writeback with scaled register offset   [Rn, +/-Rm, shift #shift_imm]!
Immediate postindexed                            [Rn], #+/-offset_12
Register postindexed                             [Rn], +/-Rm
Scaled register postindexed                      [Rn], +/-Rm, shift #shift_imm
Examples of LDR Using Different
Addressing Modes
Index method    Instruction                   r0 =                     r1 +=
Preindex with   LDR r0, [r1, #0x4]!           mem32[r1+0x4]            0x4
writeback       LDR r0, [r1, r2]!             mem32[r1+r2]             r2
                LDR r0, [r1, r2, LSR #0x4]!   mem32[r1+(r2 LSR 0x4)]   (r2 LSR 0x4)
Preindex        LDR r0, [r1, #0x4]            mem32[r1+0x4]            not updated
                LDR r0, [r1, r2]              mem32[r1+r2]             not updated
                LDR r0, [r1, -r2, LSR #0x4]   mem32[r1-(r2 LSR 0x4)]   not updated
Postindex       LDR r0, [r1], #0x4            mem32[r1]                0x4
                LDR r0, [r1], r2              mem32[r1]                r2
                LDR r0, [r1], r2, LSR #0x4    mem32[r1]                (r2 LSR 0x4)
Multiple-Register Transfer
   Transfer multiple registers between memory
    and the processor in a single instruction
   More efficient than single-register transfer
    ◼   Moving blocks of data around memory
    ◼   Saving and restoring context and stack
Multiple-Register Transfer (Cont.)
   Load-store multiple instruction can increase interrupt
    latency
    ◼   An interrupt can be taken only after the current instruction
        has completed
    ◼   Each load multiple instruction takes 2 + N*t cycles
           N: the number of registers to load
           t: the number of cycles required for sequential access to memory
    ◼   Compilers provide a switch to control the maximum
        number of registers being transferred
           Limit the maximum interrupt latency
Multiple-Register Transfer (Cont.)
   Syntax:
    ◼   <LDM|STM>{<cond>} <mode> Rn{!}, <registers>{^}
    ◼   Address mode: See the next page
    ◼   ^: optional
           Can not be used in User Mode and System Mode
           If op is LDM and reglist contains the pc (r15)
            ◼   SPSR is also copied into the CPSR.
           Otherwise, data is transferred into or out of the User mode
            registers instead of the current mode registers.
Multiple-Register Transfer (Cont.)
   Example 9
    ◼   PRE:
        mem32[0x80018] = 0x03,
        mem32[0x80014] = 0x02,
        mem32[0x80010] = 0x01,
        r0 = 0x00080010,
        r1 = r2 = r3= 0x00000000
    ◼   LDMIA r0!, {r1-r3}, or LDMIA r0!, {r1, r2, r3}
           Registers can be listed explicitly or as a range using the “-” character
Pre-Condition for LDMIA Instruction
               Memory Address Data
               0x80020        0x00000005
               0x8001c        0x00000004
               0x80018        0x00000003   R3=0x00000000
               0x80014        0x00000002   R2=0x00000000
R0 = 0x80010   0x80010        0x00000001   R1=0x00000000
               0x8000c        0x00000000
                         Figure 1
Post-Condition for LDMIA Instruction
               Memory Address Data
               0x80020       0x00000005
R0 = 0x8001c   0x8001c       0x00000004
               0x80018       0x00000003   R3=0x00000003
               0x80014       0x00000002   R2=0x00000002
               0x80010       0x00000001   R1=0x00000001
               0x8000c       0x00000000
                         Figure 2
Multiple-Register Transfer (Cont.)
   Example 9 (Cont.)
    ◼   POST:
        r0 = 0x0008001c,
        r1 = 0x00000001,
        r2 = 0x00000002,
        r3 = 0x00000003
Multiple-Register Transfer (Cont.)
   Example 10
    ◼   PRE: as shown in Fig. 1
    ◼   LDMIB r0!, {r1-r3}
    ◼   POST:
        r0 = 0x0008001c
        r1 = 0x00000002
        r2 = 0x00000003
        r3 = 0x00000004
Post-Condition for LDMIB Instruction
               Memory Address Data
               0x80020      0x00000005
R0 = 0x8001c   0x8001c      0x00000004   R3=0x00000004
               0x80018      0x00000003   R2=0x00000003
               0x80014      0x00000002   R1=0x00000002
               0x80010      0x00000001
               0x8000c      0x00000000
                         Figure 3
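The four addressing modes of the load multiple instruction can be modelled in Python. This is a sketch under simple assumptions: memory is a dictionary of word addresses, register names have the form "rN", and the base register is not in the register list. The names `ldm` and `figure1_state` are illustrative. The lowest-numbered register always corresponds to the lowest address.

```python
def ldm(mem, regs, mode, rn, reglist, writeback=True):
    """Model LDM<mode> rn{!}, {reglist} for mode IA, IB, DA or DB."""
    count = len(reglist)
    base = regs[rn]
    if mode == "IA":      # increment after
        start, new_base = base, base + 4 * count
    elif mode == "IB":    # increment before
        start, new_base = base + 4, base + 4 * count
    elif mode == "DA":    # decrement after
        start, new_base = base - 4 * count + 4, base - 4 * count
    elif mode == "DB":    # decrement before
        start, new_base = base - 4 * count, base - 4 * count
    else:
        raise ValueError("unknown mode: " + mode)
    # Registers are transferred in ascending register-number order.
    ordered = sorted(reglist, key=lambda name: int(name[1:]))
    for i, name in enumerate(ordered):
        regs[name] = mem[start + 4 * i]
    if writeback:
        regs[rn] = new_base


def figure1_state():
    """Return fresh memory and registers matching Figure 1."""
    mem = {0x8000C: 0, 0x80010: 1, 0x80014: 2,
           0x80018: 3, 0x8001C: 4, 0x80020: 5}
    regs = {"r0": 0x80010, "r1": 0, "r2": 0, "r3": 0}
    return mem, regs
```

Running `ldm(mem, regs, "IB", "r0", ["r1", "r2", "r3"])` on the Figure 1 state reproduces Figure 3: r1 = 2, r2 = 3, r3 = 4 and r0 = 0x8001c.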
Multiple-Register Transfer (Cont.)
   Load-store multiple pairs when base update used (!)
    ◼   Useful for saving a group of registers and restoring them later
            Store multiple        Load multiple
            STMIA                 LDMDB
            STMIB                 LDMDA
            STMDA                 LDMIB
            STMDB                 LDMIA
Multiple-Register Transfer (Cont.)
   Example 11
    ◼   PRE:
        r0 = 0x00009000
        r1 = 0x00000009,
        r2 = 0x00000008
        r3 = 0x00000007
    ◼   STMIB r0!, {r1-r3}
        MOV r1, #1
        MOV r2, #2
        MOV r3, #3
Multiple-Register Transfer (Cont.)
   Example 11 (Cont.)
    ◼   PRE (2):
        r0 = 0x0000900c
        r1 = 0x00000001,
        r2 = 0x00000002
        r3 = 0x00000003
    ◼   LDMDA r0!, {r1-r3}
    ◼   POST:
        r0 = 0x00009000
        r1 = 0x00000009,
        r2 = 0x00000008
        r3 = 0x00000007
Multiple-Register Transfer (Cont.)
   Example 11 (Cont.)
    ◼   The STMIB stores the values 7, 8, 9 to memory
    ◼   The MOV instructions then overwrite registers r1 to r3
    ◼   Finally, the LDMDA
           Reloads the original values, and
           Restore the base pointer r0
Multiple-Register Transfer (Cont.)
   Example 12: the use of the load-store multiple
    instructions with a block memory copy
    ;r9 points to start of source data
    ;r10 points to start of destination data
    ;r11 points to end of the source
    loop
         LDMIA r9!, {r0-r7} ;load 32 bytes from source and update r9
         STMIA r10!, {r0-r7} ;store 32 bytes to destination and update r10
         CMP      r9, r11     ;have we reached the end
         BNE      loop
Multiple-Register Transfer (Cont.)
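   [Figure: block memory copy. The source block lies between r9 (start) and r11 (end) towards high memory; the destination block starts at r10 towards low memory. Each loop iteration copies 32 bytes in two instructions.]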
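The block copy loop can be modelled in Python to show how the two pointers advance. This is a sketch only: `block_copy` is a hypothetical helper, memory is a dictionary of word addresses, and the source length is assumed to be a multiple of 32 bytes, as the assembly loop also requires.

```python
def block_copy(mem, src, dst, end, words_per_transfer=8):
    """Copy words from src up to end into dst, 8 words (32 bytes) at a time.

    Returns the final (src, dst) pointers, mirroring r9 and r10.
    """
    step = 4 * words_per_transfer
    while True:
        # LDMIA r9!, {r0-r7}: load eight words, then advance the source pointer.
        block = [mem[src + 4 * i] for i in range(words_per_transfer)]
        src += step
        # STMIA r10!, {r0-r7}: store eight words, then advance the destination.
        for i, word in enumerate(block):
            mem[dst + 4 * i] = word
        dst += step
        # CMP r9, r11 / BNE loop: the test comes after the body, so the
        # loop always runs at least once.
        if src == end:
            break
    return src, dst
```

Copying 64 bytes from 0x9000 to 0x8000 takes two iterations and leaves the pointers at 0x9040 and 0x8040.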
Stack Operations
   ARM architecture uses the load-store multiple
    instruction to carry out stack operations
    ◼   PUSH: use a store multiple instruction
    ◼   POP: use a load multiple instruction
   Stack
    ◼   Ascending (A): stack grows towards higher
        memory addresses
    ◼   Descending (D): stack grows towards lower
        memory addresses
Stack Operations (Cont.)
   Stack
    ◼   Full stack (F): stack pointer sp points to the last
        valid item pushed onto the stack
    ◼   Empty stack (E): sp points after the last item on
        the stack
           The free slot where the next data item will be placed
   There are a number of aliases available to
    support stack operations
    ◼   See next page
Stack Operations (Cont.)
   ARM supports all four forms of stacks
    ◼   Full ascending (FA): grows up; base register points to
        the highest address containing a valid item
    ◼   Empty ascending (EA): grows up; base register points to
        the first empty location
    ◼   Full descending (FD): grows down; base register points
        to the lowest address containing a valid item
    ◼   Empty descending (ED): grows down; base register
        points to the first empty location below the stack
Addressing Methods for Stack Operations
Addressing mode   Description        Pop     = LDM   Push    = STM
FA                Full ascending     LDMFA   LDMDA   STMFA   STMIB
FD                Full descending    LDMFD   LDMIA   STMFD   STMDB
EA                Empty ascending    LDMEA   LDMDB   STMEA   STMIA
ED                Empty descending   LDMED   LDMIB   STMED   STMDA
Stack Operations (Cont.)
   Example 13
    ◼   PRE:
           r1 = 0x00000002
           r4 = 0x00000003
           sp = 0x00080014
    ◼   STMFD sp!, {r1, r4}
    ◼   POST:
           r1 = 0x00000002
           r4 = 0x00000003
           sp = 0x0008000c
Stack Operations (Cont.)
   Example 13 (Cont.)
    ◼    STMFD – full stack push operation
                PRE                          POST
           Address      Data            Address      Data
          0x80018    0x00000001        0x80018    0x00000001
     sp   0x80014    0x00000002        0x80014    0x00000002
          0x80010    Empty             0x80010    0x00000003
          0x8000c    Empty        sp   0x8000c    0x00000002
Stack Operations (Cont.)
   Example 14
    ◼   PRE:
           r1 = 0x00000002
           r4 = 0x00000003
           sp = 0x00080010
    ◼   STMED sp!, {r1, r4}
    ◼   POST:
           r1 = 0x00000002
           r4 = 0x00000003
           sp = 0x00080008
Stack Operations (Cont.)
    Example 14 (Cont.)
     ◼   STMED – empty stack push operation
                   PRE                          POST
          Address        Data          Address      Data
         0x80018    0x00000001        0x80018    0x00000001
         0x80014    0x00000002        0x80014    0x00000002
    sp   0x80010    Empty             0x80010    0x00000003
         0x8000c    Empty             0x8000c    0x00000002
         0x80008    Empty        sp   0x80008    Empty
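The four stack types map onto the four store-multiple modes, which can be sketched in Python. `push` is an illustrative helper; `values` holds the register contents in ascending register-number order, and memory is a dictionary of word addresses.

```python
def push(mem, sp, values, kind):
    """Model STM<kind> sp!, {...} for kind FD, ED, FA or EA.

    Returns the updated stack pointer.
    """
    count = len(values)
    if kind == "FD":      # full descending  = STMDB
        start, new_sp = sp - 4 * count, sp - 4 * count
    elif kind == "ED":    # empty descending = STMDA
        start, new_sp = sp - 4 * count + 4, sp - 4 * count
    elif kind == "FA":    # full ascending   = STMIB
        start, new_sp = sp + 4, sp + 4 * count
    elif kind == "EA":    # empty ascending  = STMIA
        start, new_sp = sp, sp + 4 * count
    else:
        raise ValueError("unknown stack type: " + kind)
    # The lowest-numbered register goes to the lowest address.
    for i, value in enumerate(values):
        mem[start + 4 * i] = value
    return new_sp
```

`push(mem, 0x80014, [2, 3], "FD")` reproduces Example 13 (sp becomes 0x8000c), and `push(mem, 0x80010, [2, 3], "ED")` reproduces Example 14 (sp becomes 0x80008).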
SWAP Instruction
   A special case of a load-store instruction
    ◼   Swap the contents of memory with the contents
        of a register
    ◼   An atomic operation
           Cannot be interrupted by any other instruction or
            any other bus access
           The system “holds the bus” until the transaction is
            complete
           Useful when implementing semaphores and mutual
            exclusion in an operating system
SWAP Instruction (Cont.)
   Syntax: SWP{B}{<cond>} Rd, Rm, [Rn]
    ◼   tmp = mem32[Rn]
    ◼   mem32[Rn] = Rm
    ◼   Rd = tmp
   SWP: swap a word between memory and a
    register
   SWPB: swap a byte between memory and a
    register
SWAP Instruction (Cont.)
   Example 15
    ◼   PRE:
           mem32[0x9000] = 0x12345678
           r0 = 0x00000000
           r1 = 0x11112222
           r2 = 0x00009000
    ◼   SWP r0, r1, [r2]
    ◼   POST:
           mem32[0x9000] = 0x11112222
           r0 = 0x12345678
           r1 = 0x11112222
           r2 = 0x00009000
SWAP Instruction (Cont.)
   Example 15 (Cont.)
    spin
        LDR r1, =semaphore
        MOV r2, #1
        SWP r3, r2, [r1] ;hold the bus until complete
        CMP r3, #1
        BEQ spin
   The address pointed to by semaphore contains either the
    value 1 or 0
   When semaphore value == 1 , loop until semaphore becomes
    0 (updated by the holding process)
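The logic of the swap-based semaphore can be sketched in Python. This models only the data flow: plain Python code is not atomic in the way SWP is on the bus, and `swp`, `try_acquire` and `release` are hypothetical names.

```python
def swp(mem, addr, rm):
    """Model SWP rd, rm, [rn]: return the old memory word, store rm."""
    tmp = mem[addr]
    mem[addr] = rm
    return tmp


def try_acquire(mem, addr):
    """One pass of the spin loop: write 1 and check what was there before.

    Returns True if the semaphore was free (0) and is now held.
    """
    return swp(mem, addr, 1) == 0


def release(mem, addr):
    """The holder frees the semaphore by writing 0 back."""
    mem[addr] = 0
```

A second `try_acquire` on a held semaphore returns False, which corresponds to the `BEQ spin` branch being taken.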
Software Interrupt Instruction
   SWI: software interrupt instruction
    ◼   Cause a software interrupt exception
    ◼   Provide a mechanism for applications to call
        operating system routines
    ◼   Each SWI instruction has an associated SWI
        number
           Used to represent a particular function call or routines
Software Interrupt Instruction (Cont.)
   Syntax: SWI{<cond>} SWI_number
    ◼   lr_svc = address of instruction following the SWI
    ◼   spsr_svc = cpsr
    ◼   pc = vector table + 0x8 ; jump to the swi
        handling
    ◼   cpsr mode = SVC
    ◼   cpsr I = 1 (mask IRQ interrupt)
Software Interrupt Instruction (Cont.)
   Example 16
    ◼   PRE:
           cpsr = nzcVqift_USER
           pc = 0x00008000
           lr = r14 = 0x003fffff
    ◼   0x00008000 SWI 0x123456
    ◼   POST:
           cpsr = nzcVqIft_SVC
           spsr = nzcVqift_USER
           pc = 0x00000008
           lr = 0x00008004
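The state changes on SWI entry can be modelled on the CPSR bit layout. The sketch assumes the standard ARM encodings (mode field in bits [4:0], User = 0x10, SVC = 0x13, T = bit 5, I = bit 7), a vector table at address 0, and ARM-state code; `swi_entry` and `swi_number` are illustrative names.

```python
MODE_MASK = 0x1F
MODE_SVC = 0x13
T_BIT = 1 << 5
I_BIT = 1 << 7


def swi_entry(state):
    """Return the processor state after an SWI taken at state['pc']."""
    new = dict(state)
    new["lr_svc"] = state["pc"] + 4          # instruction following the SWI
    new["spsr_svc"] = state["cpsr"]          # save the old CPSR
    cpsr = state["cpsr"] & ~(MODE_MASK | T_BIT)
    new["cpsr"] = cpsr | MODE_SVC | I_BIT    # SVC mode, ARM state, IRQ masked
    new["pc"] = 0x00000008                   # SWI vector (table assumed at 0)
    return new


def swi_number(instruction):
    """Extract the 24-bit SWI number from an ARM SWI instruction word."""
    return instruction & 0x00FFFFFF
```

With cpsr = 0x10000010 (V set, User mode) and pc = 0x8000 as in Example 16, the model gives cpsr = 0x10000093, pc = 0x8 and lr_svc = 0x8004. A handler would typically recover the number by reading the instruction at lr - 4, for example `swi_number(0xEF123456)` gives 0x123456.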
Program Status Register Instructions
   MRS
    ◼   Transfer the contents of either the cpsr or spsr
        into a register
   MSR
    ◼   Transfer the contents of a register into the cpsr or
        spsr
Program Status Register Instructions
(Cont.)
   Syntax
    ◼   MRS{<cond>} Rd, <cpsr|spsr>
    ◼   MSR{<cond>} <cpsr|spsr>_<fields>, Rm
    ◼   MSR{<cond>} <cpsr|spsr>_<fields>, #immediate
   Field: any combination of
    ◼   Flags: [24:31]
    ◼   Status: [16:23]
    ◼   eXtension[8:15]
    ◼   Control[0:7]
PSR Registers
Program Status Register Instructions
(Cont.)
   Note: You cannot access the SPSR in User or
    System Mode
    ◼   The assembler cannot warn you because it does not
        know which mode the code will be executed in
Program Status Register Instructions
(Cont.)
   Example 17
    ◼   PRE:
           cpsr = nzcvqIFt_SVC
    ◼   MRS r1, cpsr
    ◼   BIC r1, r1, #0x80 ;0b10000000, clear bit 7
    ◼   MSR cpsr_c, r1    ;enable IRQ interrupts
    ◼   POST:
           cpsr = nzcvqiFt_SVC
    ◼   Note that, this example must be in SVC mode
           In user mode, you can only read all cpsr bits and can only update
            the condition flag field f, i.e., cpsr[24:31]
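The effect of the field suffix on MSR can be sketched in Python using the bit ranges listed earlier. `msr` and `FIELD_MASKS` are names invented for this model; privilege checks are not modelled.

```python
FIELD_MASKS = {
    "c": 0x000000FF,  # control   [7:0]
    "x": 0x0000FF00,  # extension [15:8]
    "s": 0x00FF0000,  # status    [23:16]
    "f": 0xFF000000,  # flags     [31:24]
}
I_BIT = 1 << 7


def msr(psr, fields, value):
    """Model MSR psr_<fields>, value: only the named fields are written."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask) | (value & mask)


def enable_irq(cpsr):
    """The MRS / BIC / MSR sequence of Example 17."""
    r1 = cpsr                   # MRS r1, cpsr
    r1 &= ~I_BIT                # BIC r1, r1, #0x80
    return msr(cpsr, "c", r1)   # MSR cpsr_c, r1
```

Starting from cpsr = 0xD3 (I and F set, SVC mode), `enable_irq` returns 0x53: the I bit is cleared while F and the mode bits are unchanged.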
Conditional Execution
   Almost all ARM instructions can include an
    optional condition code
    ◼   Instruction is only executed if the condition code
        flags in the CPSR meet the specified condition
    ◼   The default is AL, or always execute
   Conditional execution depends on two
    components
    ◼   The condition field: located in the instruction
    ◼   The condition flags: located in the cpsr
Conditional Execution (Cont.)
   Example 18
    ADDEQ r0, r1, r2
    ; r0 = r1 + r2 if zero flag is set
Condition Codes
Conditional Execution (Cont.)
   Thus, before conditional execution can be used
    ◼   There must be an instruction that updates the
        condition code flags according to the result
    ◼   If not specified, instructions will not update the
        flags
   To make an instruction update the flags
    ◼   Include the S suffix
    ◼   Example: ADDS r0, r1,r2
Conditional Execution (Cont.)
   However, some instructions always update the flags
    ◼   Do not require the S suffix
    ◼   CMP, CMN, TST, TEQ
   Flags are preserved until updated
   Thus, you can execute an instruction conditionally,
    based upon the flags set in another instruction, either:
    ◼   Immediately after the instruction which updated the flags
    ◼   After any number of intervening instructions that have not
        updated the flags.
Conditional Execution (Cont.)
   Example 18
    ◼   Translate the following code into assembly
        language
    ◼   Assume r1 = a, r2 = b
        while ( a!= b )
        {
               if (a > b) a -= b; else b -= a;
         }
Conditional Execution (Cont.)
   Example 18: Solution 1
      gcd
            CMP         r1, r2
            BEQ         complete
            BLT         lessthan
            SUB         r1, r1, r2
            B           gcd
      lessthan
            SUB         r2, r2, r1
            B           gcd
      complete
Conditional Execution (Cont.)
   Example 18: Solution 2
      gcd
            CMP         r1, r2
            SUBGT       r1, r1, r2
            SUBLT       r2, r2, r1
            BNE         gcd
   Solution 2 reduces the code from seven instructions to
    four by replacing branches with conditional execution
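The behaviour of the conditionally executed loop can be checked with a small Python model. It assumes positive inputs, and `gcd_conditional` is an illustrative name. The key point is that both SUB instructions test the flags set by the single CMP, so at most one of them takes effect per iteration.

```python
def gcd_conditional(r1, r2):
    """Model the CMP / SUBGT / SUBLT / BNE loop for positive r1 and r2."""
    while True:
        # CMP r1, r2: the flags are captured once per iteration.
        greater, less, not_equal = r1 > r2, r1 < r2, r1 != r2
        if greater:          # SUBGT r1, r1, r2
            r1 -= r2
        if less:             # SUBLT r2, r2, r1
            r2 -= r1
        if not not_equal:    # BNE gcd falls through when the values are equal
            break
    return r1
```

For example, `gcd_conditional(12, 18)` returns 6.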
References
   Andrew N. Sloss, “ARM System Developer’s
    Guide: Designing and Optimizing System
    Software,” Morgan Kaufmann Publishers,
    2004
    ◼   Chapter 3: Introduction to the ARM Instruction
        Set
ARM7TDMI Microprocessor
   Thumb Instruction Set
        Processor Operating States
ARM state
        which executes 32-bit, word-aligned ARM instructions.
THUMB state
        which operates with 16-bit, halfword-aligned THUMB instructions.
    Thumb Instruction Set
•ARM architecture versions v4T and above define a 16-bit
instruction set called the Thumb instruction set. The
functionality of the Thumb instruction set is a subset of the
functionality of the 32-bit ARM instruction set.
•A processor that is executing Thumb instructions is
operating in Thumb state. A processor that is executing ARM
instructions is operating in ARM state.
 Thumb Instruction Set
•A processor in ARM state cannot execute Thumb
instructions, and a processor in Thumb state cannot
execute ARM instructions.
•Each instruction set includes instructions to change
processor state.
Note: ARM processors always start executing code in
ARM state.
    Thumb Instruction Set
•Thumb does not provide direct access to the CPSR or any
SPSR.
•Thumb execution is flagged by the T bit(bit[5]) in the CPSR.
T==0    32-bit instructions are fetched(ARM instruction)
T==1    16-bit instructions are fetched(Thumb instruction)
              Thumb applications
        In a typical embedded system:
          use ARM code in 32-bit on-chip memory for small speed-critical routines
          use Thumb code in 16-bit off-chip memory for large non-critical control routines
Note: Switching between the ARM and Thumb states of execution is done using the BX
instruction.
              Thumb applications
For most instructions generated by the compiler:
  Conditional execution is not used
  Source and destination registers are identical
  Only the low registers are used
  Constants are of limited size
  The inline barrel shifter is not used
  DATA TYPES
Byte (8-bit):
     placed on any byte boundary.
Half-word (16-bit):
     aligned to two-byte boundaries.
Word (32-bit):
     aligned to four-byte boundaries.
         Features
•Not a complete architecture
•Dynamically decompressed to ARM Instruction
•Fully supported by ARM development tools
•Both entry and exit are done using corresponding BX
Instruction
•Increases the maximum clock rate to 40 MHz
•Expanded Cache to 8 kB
•Thumb is a combination of a new instruction set with a 16-bit
instruction format and a hardware logic unit.
•Thumb instructions are translated to regular ARM instructions
•Thumb improves ARM instruction density by about 25% to 35%
•16 bit wide memory
    Thumb State Philosophy
The Thumb instruction set(16 bit) addresses the issue of code density.
It may be viewed as a compressed form of a subset of the ARM instruction set
Thumb instructions map onto ARM instructions
The Thumb programmer’s model maps onto the ARM programmer’s model
Implementations of Thumb use dynamic decompression in an ARM instruction
pipeline & then instructions execute as standard ARM instructions within the
processor
Thumb is not a complete architecture; it is not anticipated that a processor would
execute Thumb instructions without supporting the ARM instruction set.
Therefore the Thumb instruction set needs to support only common application functions.
Exceptions will not be handled in THUMB state
Thumb-ARM Decompression
•Translation from 16-bit Thumb instruction to 32-bit
ARM instruction
•Condition bits changed to ‘always’
•Lookup to translate major and minor opcodes
•Zero extending 3-bit register specifiers to give 4-bit
specifiers
•Zero extending immediate values
•Implicit ‘S’(affecting condition codes) should be
explicitly specified.
•Thumb 2-address format must be mapped to ARM
3-address format
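The decompression steps listed above can be illustrated for one instruction format. The sketch below translates the Thumb `ADD Rd, #imm8` encoding into the equivalent ARM `ADDS Rd, Rd, #imm8` word, using the standard encodings of both formats. It covers only this single format, and `thumb_add_imm8_to_arm` is a hypothetical helper, not a description of the actual decoder hardware.

```python
def thumb_add_imm8_to_arm(halfword):
    """Translate Thumb 'ADD Rd, #imm8' into ARM 'ADDS Rd, Rd, #imm8'."""
    # Major opcode lookup: bits [15:11] = 00110 select ADD with an 8-bit immediate.
    if (halfword >> 11) != 0b00110:
        raise ValueError("not a Thumb ADD Rd, #imm8 instruction")
    rd3 = (halfword >> 8) & 0x7   # 3-bit register specifier
    imm8 = halfword & 0xFF        # 8-bit immediate

    cond = 0b1110                 # condition bits changed to 'always'
    rd4 = rd3                     # zero-extended to a 4-bit specifier
    opcode_add = 0b0100
    return (
        (cond << 28)
        | (1 << 25)               # immediate operand
        | (opcode_add << 21)
        | (1 << 20)               # implicit S made explicit
        | (rd4 << 16)             # Rn = Rd: 2-address mapped to 3-address
        | (rd4 << 12)             # Rd
        | imm8                    # zero-extended immediate, no rotation
    )
```

For example, the Thumb halfword 0x3001 (`ADD r0, #1`) maps to the ARM word 0xE2900001 (`ADDS r0, r0, #1`).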
THUMB-ARM Instruction Mapping
❖ So where performance is all important, a system should use 32 bit memory and run
  ARM code
❖ Where Power consumption and cost are more important , a 16 bit memory system
  and THUMB code may be a better choice
     Mode Switching
•Default entry to exception mode is always ARM
•Explicit entry to Thumb is done using ARM mode BX
Instruction
•Explicit entry back to ARM mode is done using Thumb
mode BX Instruction
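State switching with BX can be sketched in Python. In the architecture, bit 0 of the target register selects the new state and is copied into the T bit (bit 5) of the CPSR; the remaining bits form the branch address. The names `bx` and `instruction_width` are illustrative.

```python
T_BIT = 1 << 5


def instruction_width(cpsr):
    """Return the fetch width in bits implied by the T bit."""
    return 16 if cpsr & T_BIT else 32


def bx(cpsr, rm):
    """Model BX Rm: returns the new (cpsr, pc)."""
    if rm & 1:
        cpsr |= T_BIT       # bit 0 set: enter Thumb state
    else:
        cpsr &= ~T_BIT      # bit 0 clear: enter ARM state
    pc = rm & 0xFFFFFFFE    # bit 0 is not part of the branch address
    return cpsr, pc
```

Here `bx(0x10, 0x8001)` enters Thumb state at address 0x8000, and `bx(0x30, 0x9000)` returns to ARM state at 0x9000.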
      Thumb Programmers Model
•Registers r0 to r7 are accessible (Lo)
•Few instructions require r8 to r15 to be specified
•r13 is used as the stack pointer
•r14 is used as the link register
•r15 is used as the program counter
THUMB Programmer’s Model
THUMB Register Organisation
                 Thumb General registers and Program Counter
 User / System    FIQ          Supervisor   Abort        IRQ          Undefined
      r0          r0           r0           r0           r0           r0
      r1          r1           r1           r1           r1           r1
      r2          r2           r2           r2           r2           r2
      r3          r3           r3           r3           r3           r3
      r4          r4           r4           r4           r4           r4
      r5          r5           r5           r5           r5           r5
      r6          r6           r6           r6           r6           r6
      r7          r7           r7           r7           r7           r7
      SP          SP_FIQ       SP_SVC       SP_ABT       SP_IRQ       SP_UND
      LR          LR_FIQ       LR_SVC       LR_ABT       LR_IRQ       LR_UND
      PC          PC           PC           PC           PC           PC
                 Thumb Program Status Registers
      CPSR        CPSR         CPSR         CPSR         CPSR         CPSR
                  SPSR_FIQ     SPSR_SVC     SPSR_ABT     SPSR_IRQ     SPSR_UND
     ARM-Thumb Similarities
•Load-store architecture
•Support 8-bit byte, 16-bit half-word and 32-bit word
data types with aligned boundaries
•32-bit unsegmented memory.
•However, in order to achieve a 16-bit instruction
length, a number of characteristic features of the ARM
instruction set are not supported in Thumb state
 ARM-Thumb differences
•Instructions execute unconditionally, except for branch instructions
    (all ARM instructions can be executed conditionally)
•2-address format for most data processing instructions
     (ARM data processing instructions use a 3-address format,
        except the 64-bit multiply instructions)
•Thumb instruction formats are less regular than ARM formats, as
a result of the dense encoding
•There are NO status register access instructions (MRS/MSR) in
Thumb state
•Many ARM addressing modes are not supported in Thumb state
•Exceptions are always handled starting in ARM state; each privileged
mode still has its own banked SP, LR and SPSR
ARM-Thumb differences
•The biggest register difference involves the SP register
•The Thumb state has unique mnemonics (PUSH, POP) that
don’t exist in ARM state
•These instructions assume the existence of a stack pointer,
for which R13 is used
   They translate into load and store instructions internally
•No SWP instructions in Thumb state
•No support for coprocessor instructions in Thumb state
•Barrel shifter operations are separate instructions
      Thumb exception
•On an exception the processor always switches to ARM state.
•On return the previous state is restored as the SPSR is
transferred back to the CPSR
•Used appropriately, the Thumb instruction set can improve code
density and power efficiency, save cost and enhance
performance all at once
    Thumb Branching
•Short conditional branches
•Medium range unconditional branches
•Long range Subroutine calls
•Branch to change to ARM Mode
Branch Instruction Formats
  B <cond> <label>
     15       12 11        8 7                        0
     1  1  0  1   Condition        8-bit offset
  B <label>
     15          11 10                                0
     1  1  1  0  0            11-bit offset
  BL <label>
     15    13 12 11 10                                0
     1  1  1    H             11-bit offset
  BX Rm
     15                      7  6  5     3  2      0
     0  1  0  0  0  1  1  1  0  H    Rm      0  0  0
THUMB Branch Instructions
Features
•Different format for each case
•Offset is reduced to 11 bits (unconditional B) or 8 bits (conditional B)
•Offset is shifted left by 1 bit (to give half-word
alignment) and sign-extended to 32 bits.
•BL is more subtle: a pair of instructions gives a 22-bit offset,
using the link register for temporary storage
•No direct mapping to the ARM branch encodings, as Thumb
requires half-word aligned offsets where ARM uses word-aligned offsets.
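
The offset handling described above can be sketched in C. This is a minimal model, assuming the field positions shown in the branch formats (8-bit offset in bits 7:0, 11-bit offset in bits 10:0) and assuming that the PC value used by a Thumb branch is the instruction address plus 4. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `field`, then double the result
   to turn the half-word count into a byte offset (the shift left by 1). */
int32_t thumb_branch_offset(uint32_t field, unsigned bits)
{
    uint32_t sign  = 1u << (bits - 1);
    uint32_t mask  = (1u << bits) - 1u;
    int32_t  value = (int32_t)((field & mask) ^ sign) - (int32_t)sign;
    return value * 2;
}

/* Byte offset encoded in B<cond> (8-bit field). */
int32_t bcond_offset(uint16_t instr)
{
    return thumb_branch_offset(instr, 8);
}

/* Byte offset encoded in unconditional B (11-bit field). */
int32_t b_offset(uint16_t instr)
{
    return thumb_branch_offset(instr, 11);
}

/* Branch target: assumes PC reads as instruction address + 4. */
uint32_t branch_target(uint32_t instr_addr, int32_t offset)
{
    return instr_addr + 4u + (uint32_t)offset;
}
```

Under these assumptions the 8-bit form reaches -256 to +254 bytes and the 11-bit form -2048 to +2046 bytes. The encoding 0xE7FE, for instance, decodes to an offset of -4, so it branches to itself.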
BL Instruction
To allow for a reasonably large offset to the target
    subroutine, the BL instruction is
    automatically translated by the assembler into a
    sequence of two 16-bit Thumb instructions
1. H = 10
   LR := PC + (sign-extended offset shifted left 12 places);
2. H = 11
   PC := LR + (offset shifted left 1 place);
   LR := address of the instruction following the BL pair
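
The two-instruction BL sequence can be sketched in C as follows. This is a model under stated assumptions: PC reads as the instruction address plus 4, and the return address written to LR has bit 0 set to mark Thumb state (this detail comes from the ARM architecture documentation, not from the slide). The type BlRegs and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the two registers the BL pair touches. */
typedef struct {
    uint32_t pc;
    uint32_t lr;
} BlRegs;

/* First half (H = 10) at address `addr`:
   LR := PC + (sign-extended offset << 12). */
uint32_t bl_first(uint32_t addr, uint16_t instr)
{
    uint32_t off11 = instr & 0x7FFu;
    int32_t  high  = (int32_t)(off11 ^ 0x400u) - 0x400; /* sign-extend 11 bits */
    return addr + 4u + ((uint32_t)high << 12);
}

/* Second half (H = 11) at address `addr`:
   PC := LR + (offset << 1), LR := address of next instruction. */
BlRegs bl_second(uint32_t addr, uint16_t instr, uint32_t lr)
{
    BlRegs r;
    uint32_t off11 = instr & 0x7FFu;
    r.pc = lr + (off11 << 1);
    r.lr = (addr + 2u) | 1u;   /* assumption: bit 0 set to mark Thumb state */
    return r;
}

/* Run both halves of a BL pair that starts at `addr`. */
BlRegs bl_pair(uint32_t addr, uint16_t first, uint16_t second)
{
    uint32_t lr = bl_first(addr, first);
    return bl_second(addr + 2u, second, lr);
}
```

As a worked case, the pair 0xF000, 0xF87E placed at 0x8000 transfers control to 0x8100 and leaves 0x8005 in LR.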
Software Interrupt Instruction
   1 1 0 1 1 1 1 1        8 – Bit Immediate
•Address of next instruction is saved in r14_svc
•CPSR is saved in SPSR_svc
•Disables IRQ, Clears T bit, Enters Supervisor mode
•PC is forced to 0x08
•8 bit immediate is zero extended to fill the 24-bit
field in the ARM instruction.
•Limits SWIs to first 256 of 16 million ARM SWIs.
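
The SWI entry sequence listed above can be sketched in C. This is a model, assuming the standard CPSR layout (I bit = bit 7, T bit = bit 5, mode field = bits 4:0, Supervisor mode = 0x13) and a 2-byte Thumb instruction length. The type SwiState and the function names are hypothetical.

```c
#include <stdint.h>

#define CPSR_I         0x80u  /* IRQ disable bit */
#define CPSR_T         0x20u  /* Thumb state bit */
#define CPSR_MODE_MASK 0x1Fu
#define MODE_SVC       0x13u  /* Supervisor mode */

/* Hypothetical snapshot of the registers changed by SWI entry. */
typedef struct {
    uint32_t pc;
    uint32_t cpsr;
    uint32_t r14_svc;
    uint32_t spsr_svc;
} SwiState;

/* 8-bit SWI number, zero-extended (bits 7:0 of the Thumb SWI). */
uint32_t thumb_swi_number(uint16_t instr)
{
    return instr & 0xFFu;
}

/* State after a Thumb SWI located at `swi_addr` is taken. */
SwiState thumb_swi_entry(uint32_t cpsr, uint32_t swi_addr)
{
    SwiState s;
    s.r14_svc  = swi_addr + 2u;            /* address of next instruction */
    s.spsr_svc = cpsr;                     /* CPSR saved in SPSR_svc */
    s.cpsr     = (cpsr & ~(CPSR_MODE_MASK | CPSR_T)) /* clear T, old mode */
               | CPSR_I | MODE_SVC;        /* disable IRQ, enter Supervisor */
    s.pc       = 0x08u;                    /* SWI vector */
    return s;
}
```

Starting from a CPSR of 0x30 (User mode, Thumb state), the model yields a CPSR of 0x93 on entry, with the original value preserved in SPSR_svc for the return.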
      Data Processing
•Fairly complex instruction formats
•No conditional execution
•Separate shift operations provided, no shifting of
second operand
•Data processing instructions set the condition code bits
(no need for ‘S’); the Hi register forms are the exception
THUMB data processing instructions
        Instruction formats
•<op>    Rd, Rn, Rm
•<op>    Rd, Rn, #<imm3>
•<op>    Rd|Rn, Rm|Rs
•<op>    Rd, Rn, #<sh5>
•<op>    Rd, #<imm8>
Instructions
•MOV    Rd, #<imm8>
•MVN    Rd, Rm
•CMP    Rn, #<imm8>
•CMP    Rn, Rm
•CMN    Rn, Rm
•TST    Rn, Rm
Instruction
•ADD   Rd, Rn, #<imm3>
•ADD   Rd, #<imm8>
•ADD   Rd, Rn, Rm
•ADC   Rd, Rm
•SUB   Rd, Rn, #<imm3>
•SUB   Rd, #<imm8>
•SUB   Rd, Rn, Rm
•SBC   Rd, Rm
•NEG   Rd, Rn
Instruction
    •LSL Rd, Rm, #<sh5>
    •LSL Rd, Rs
    •LSR Rd, Rm, #<sh5>
    •LSR Rd, Rs
    •ASR Rd, Rm, #<sh5>
    •ASR Rd, Rs
    •ROR Rd, Rs
Instruction
   •AND       Rd, Rm
   •EOR       Rd, Rm
   •ORR       Rd, Rm
   •BIC       Rd, Rm
   •MUL       Rd, Rm
Instruction (using Hi registers)
•ADD Rd, Rm       (1 or 2 Hi registers)
•CMP Rn, Rm       (1 or 2 Hi registers)
•MOV Rd, Rm       (1 or 2 Hi registers)
•ADD Rd, PC, #<imm8>
•ADD Rd, SP, #<imm8>
•ADD SP, SP, #<imm7>
•SUB SP, SP, #<imm7>
   Except for CMP, these instructions do not set the condition code bits
THUMB Single register data transfer
Data Transfer Instruction
    •LDR|STR           Rd, [Rn, #off5]
    •LDR|STR           Rd, [Rn, Rm]
    •LDRB|STRB         Rd, [Rn, #off5]
    •LDRB|STRB         Rd, [Rn, Rm]
    •LDRH|STRH         Rd, [Rn, #off5]
    •LDRH|STRH         Rd, [Rn, Rm]
    Signed loads (result sign-extended to 32 bits):
    •LDRSH|LDRSB       Rd, [Rn, Rm]
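
The address calculation for these forms can be sketched in C. One detail here is an assumption taken from the ARM architecture documentation rather than from the slide: the 5-bit immediate offset is scaled by the access size (1 for bytes, 2 for half-words, 4 for words). All names are hypothetical.

```c
#include <stdint.h>

/* Access sizes in bytes; also the scale factor applied to #off5. */
typedef enum {
    ACCESS_BYTE = 1,
    ACCESS_HALF = 2,
    ACCESS_WORD = 4
} AccessSize;

/* [Rn, #off5]: base plus the 5-bit offset scaled by the access size. */
uint32_t ea_immediate(uint32_t rn, unsigned off5, AccessSize size)
{
    return rn + (off5 & 0x1Fu) * (uint32_t)size;
}

/* [Rn, Rm]: base plus register offset. */
uint32_t ea_register(uint32_t rn, uint32_t rm)
{
    return rn + rm;
}

/* Sign extension applied by a signed byte load. */
int32_t sign_extend_byte(uint8_t v)
{
    return (int32_t)(v ^ 0x80) - 0x80;
}

/* Sign extension applied by a signed half-word load. */
int32_t sign_extend_half(uint16_t v)
{
    return (int32_t)(v ^ 0x8000) - 0x8000;
}
```

With this scaling the immediate forms reach at most 31 bytes, 62 bytes and 124 bytes past the base for byte, half-word and word accesses respectively.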
THUMB Multiple register data transfer
   Multiple register transfers
•LDMIA|STMIA Rn!, { <reg list> }
•Rn may be any register among R0 – R7
•The register list can be any subset of R0 – R7, but should not
include the base register ‘Rn’
•Write-back to the base register is always selected.
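
The behaviour of LDMIA and STMIA with write-back can be sketched in C. This is a toy model, assuming a small word-addressed memory array and the usual ordering in which the lowest-numbered register uses the lowest address. The Machine type, the memory size and the function names are hypothetical.

```c
#include <stdint.h>

#define MEM_WORDS 64u

/* Hypothetical machine: Lo registers r0-r7 and a tiny word memory. */
typedef struct {
    uint32_t r[8];
    uint32_t mem[MEM_WORDS];
} Machine;

/* STMIA Rn!, {reglist}: store, increment after, write back to Rn. */
void thumb_stmia(Machine *m, unsigned rn, uint8_t reglist)
{
    uint32_t addr = m->r[rn];
    for (unsigned i = 0; i < 8; i++) {
        if (reglist & (1u << i)) {
            m->mem[(addr / 4u) % MEM_WORDS] = m->r[i];
            addr += 4u;
        }
    }
    m->r[rn] = addr;   /* write-back is always selected */
}

/* LDMIA Rn!, {reglist}: load, increment after, write back to Rn. */
void thumb_ldmia(Machine *m, unsigned rn, uint8_t reglist)
{
    uint32_t addr = m->r[rn];
    for (unsigned i = 0; i < 8; i++) {
        if (reglist & (1u << i)) {
            m->r[i] = m->mem[(addr / 4u) % MEM_WORDS];
            addr += 4u;
        }
    }
    m->r[rn] = addr;   /* write-back is always selected */
}

/* Usage: LDMIA r0!, {r2-r4} then STMIA r1!, {r2-r4} copies three words. */
Machine copy_three_words(void)
{
    Machine m = {0};
    m.mem[0] = 11; m.mem[1] = 22; m.mem[2] = 33;
    m.r[0] = 0;    /* source byte address */
    m.r[1] = 16;   /* destination byte address */
    thumb_ldmia(&m, 0, 0x1C);   /* bits 2, 3, 4 = r2, r3, r4 */
    thumb_stmia(&m, 1, 0x1C);
    return m;
}
```

Because write-back advances both base registers by 12 bytes, the same pair of instructions can be repeated in a loop to copy a longer block.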
  Stack Mode
•POP|PUSH { <reg list> {, R}}
•R13 (sp) is used as base register
•Uses Full Descending Stack
•In addition to any subset of the R0-R7 registers, LR
(lr) may be included in a PUSH instruction and
PC (pc) may be included in a POP instruction
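
A full descending stack with PUSH and POP can be sketched in C. This is a toy model: SP is decremented before each store and incremented after each load, the lowest-numbered register sits at the lowest address, and LR or PC occupies the highest address. Clearing bit 0 of the value popped into PC is an assumption taken from the ARM architecture documentation. The StackCpu type and function names are hypothetical.

```c
#include <stdint.h>

#define STACK_WORDS 64u

/* Hypothetical CPU model with a tiny word memory used as the stack. */
typedef struct {
    uint32_t r[8];
    uint32_t sp, lr, pc;
    uint32_t mem[STACK_WORDS];
} StackCpu;

/* PUSH {reglist{, LR}}: pre-decrement SP for every stored word. */
void thumb_push(StackCpu *c, uint8_t reglist, int with_lr)
{
    if (with_lr) {
        c->sp -= 4u;
        c->mem[(c->sp / 4u) % STACK_WORDS] = c->lr;
    }
    for (int i = 7; i >= 0; i--) {
        if (reglist & (1u << i)) {
            c->sp -= 4u;
            c->mem[(c->sp / 4u) % STACK_WORDS] = c->r[i];
        }
    }
}

/* POP {reglist{, PC}}: post-increment SP for every loaded word. */
void thumb_pop(StackCpu *c, uint8_t reglist, int with_pc)
{
    for (int i = 0; i < 8; i++) {
        if (reglist & (1u << i)) {
            c->r[i] = c->mem[(c->sp / 4u) % STACK_WORDS];
            c->sp += 4u;
        }
    }
    if (with_pc) {
        /* assumption: bit 0 of the loaded value is ignored */
        c->pc = c->mem[(c->sp / 4u) % STACK_WORDS] & ~1u;
        c->sp += 4u;
    }
}

/* Usage: PUSH {r4, r5, lr} on entry, POP {r4, r5, pc} to return. */
StackCpu call_and_return(void)
{
    StackCpu c = {0};
    c.sp = STACK_WORDS * 4u;        /* empty stack at the top of memory */
    c.r[4] = 44; c.r[5] = 55; c.lr = 0x8005u;
    thumb_push(&c, 0x30, 1);        /* bits 4, 5 = r4, r5 */
    c.r[4] = 0; c.r[5] = 0;         /* callee overwrites r4 and r5 */
    thumb_pop(&c, 0x30, 1);
    return c;
}
```

Popping the saved LR value directly into PC restores the callee-saved registers and returns from the subroutine in a single instruction.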
Properties
•Thumb code requires 70% of the space of the equivalent ARM code
•Thumb code uses 40% more instructions than the ARM
code
•With 32-bit memory, ARM code is 40% faster than Thumb code
•With 16-bit memory, Thumb code is 45% faster than ARM
code
•Thumb code uses 30% less external memory power than
ARM code.
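
The first two figures are consistent with each other, which a short calculation shows. The sketch below assumes every ARM instruction is 32 bits and every Thumb instruction is 16 bits, so a size ratio converts directly into an instruction-count ratio. The function name is hypothetical.

```c
/* Ratio of Thumb to ARM instruction counts, given the ratio of
   Thumb to ARM code size (32-bit ARM versus 16-bit Thumb). */
double thumb_instruction_ratio(double size_ratio)
{
    const double arm_bits   = 32.0;
    const double thumb_bits = 16.0;
    return size_ratio * arm_bits / thumb_bits;
}
```

With a size ratio of 0.70 the result is about 1.4, which matches the 40% increase in instruction count quoted above.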
Thumb Applications