4 Isa 2
Brett H. Meyer
Winter 2024
Revision history:
Warren Gross – 2017
Christophe Dubach – W2020, F2020, F2021, F2022, F2023
Brett H. Meyer – W2021, W2022, W2023, W2024
Some material from Hamacher, Vranesic, Zaky, and Manjikian, Computer Organization and Embedded Systems, 6th ed., 2012, McGraw Hill
and Patterson and Hennessy, Computer Organization and Design, ARM Edition, Morgan Kaufmann, 2017, and notes by A. Moshovos
Disclaimer
Introduction
Instruction Set Architecture
   Note: the ISA need not define how hardware will implement any
   given feature.
Different Implementations of an ISA
The ARM Architecture
ARM ISA
In the lab you will program an ARM Cortex-A9 processor
implementing the ARMv7-A ISA.
From now on, I will just refer to “ARM ISA” or “ARM assembly
language.”
ARM ISA
Overview
Textbook §D.1, D.2
ARM ISA Basics
ARM ISA Memory
ARM Programmer-visible Registers
Syntax
Textbook §2.5, D.4
Assembly Language Syntax
   ADD R1, R2, R3   // R1 <-- R2 + R3
     • ADD is a mnemonic
     • R1 is a destination register; the first operand
     • R2 and R3 are source registers; the second and third operands
     • // R1 <-- R2 + R3 is a comment (not a very useful one)
There are different ways to use each instruction.
Here, the syntax of the instruction is ADD Rd, Rn, Imm where
Instruction Format and Operands
ARM ISA
Logic Instructions
Shift and Rotate Instructions
     • Less significant bits (on the right of the register) are moved into
       the most significant positions (on the left of the register).
Arithmetic Instructions
   Addition/subtraction instructions:
   ADD   R0,   R1,   R2          //   R0   <--   R1   +   R2
   ADD   R0,   R1,   #-24        //   R0   <--   R1   +   (-24)
   SUB   R0,   R1,   #24         //   R0   <--   R1   -   (24)
   ADD   R0,   R1,   R2, LSL#2   //   R0   <--   R1   +   R2*4
   Multiply-accumulate instruction
   MLA   R2, R3, R4, R5          // R2 <-- (R3 * R4) + R5
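The register-transfer comments above can be checked with a small C model. This is only a sketch: the function names `add_shifted` and `mla` are illustrative and not part of the ARM ISA, and registers are modelled as 32-bit unsigned values so that arithmetic wraps the way it does in hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Models ADD Rd, Rn, Rm, LSL #shift: Rd <-- Rn + (Rm << shift). */
uint32_t add_shifted(uint32_t rn, uint32_t rm, unsigned shift) {
    assert(shift < 32);            /* shifting a 32-bit value by 32 or more is undefined in C */
    return rn + (rm << shift);     /* unsigned arithmetic wraps modulo 2^32 */
}

/* Models MLA Rd, Rm, Rs, Ra: Rd <-- (Rm * Rs) + Ra, keeping the low 32 bits. */
uint32_t mla(uint32_t rm, uint32_t rs, uint32_t ra) {
    return rm * rs + ra;
}
```

For example, `add_shifted(r1, r2, 2)` mirrors `ADD R0, R1, R2, LSL#2`, where the shift by 2 multiplies R2 by 4.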
ARM ISA
Memory Instructions
Textbook §2.4, D.3
Arrays in C (Review)
short arr[5] = {1, 2, 3, 4, 5};
Array Access Example
int arr[8] = {17, 58, 79, 15, ...}; // sizeof(int) = 4 bytes
...
for (int i = 0; i < 8; i++) {
  v = arr[i];
  ...
  arr[i] = v;
}
Assume the base address of arr is 0x1000 (R1) and i=3 (R0).

        Address    Content
                     ...
         0x0100    MOV R2,#4
         0x0104    MUL R2,R0,R2
         0x0108    ADD R3,R1,R2
         0x010C    LDR R4,[R3]
                     ...
         0x1000    17
         0x1004    58
         0x1008    79
         0x100C    15
                     ...

After execution of the load:

        Register   Content
         R0        0x00000003
         R1        0x00001000
         R2        0x0000000C
         R3        0x0000100C
         R4        0x0000000F
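The address arithmetic in this example (base + i × 4) can be reproduced in C as a quick check. This is a minimal sketch: `element_address` and `load_word` are hypothetical helpers, and the memory image holds only the four array words shown in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte address of element `index`, as MUL + ADD compute it. */
uint32_t element_address(uint32_t base, uint32_t index, uint32_t elem_size) {
    return base + index * elem_size;
}

/* Hypothetical word-sized load from a small memory image that starts at `base`. */
int32_t load_word(const int32_t *image, uint32_t base, uint32_t addr) {
    assert(addr >= base && (addr - base) % 4 == 0);  /* word-aligned, not below the image */
    return image[(addr - base) / 4];
}
```

With base 0x1000, index 3 and 4-byte elements, `element_address` gives 0x100C (R3), and loading from that address returns 15, i.e. 0x0000000F (R4).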
Load and Store Instructions
   Memory accesses commonly∗ access words and take the form of:
   LDR   Rd, <EA>   // Rd <-- Mem[EA]; reads a 32-bit word
   STR   Rm, <EA>   // Mem[EA] <-- Rm; writes a 32-bit word
Back to our Example
     ...
     v = arr[i];
     ...
   Index: EA = R1 + R2
   MOV   R2, #4         // R2 <-- 4
   MUL   R2, R0, R2     // R2 <-- i*4
   LDR   R4, [R1, R2]   // R4 <-- Mem[R1+R2]
   Here’s our C code again, but this time we’re copying into arr:
   int arr[8] = {17, 58, 79, 15, ...}; // sizeof(int) = 4 bytes
   ...
   for (int i = 0; i < 8; i++) {
     v = arr[i];
     ...
     arr[i] = v;
   }
Checkpoint
Pointers in C (Review)
What is p+1?
int arr[8] = {56, 26, 88, 45, -45, 77, 98, 13};
print(arr);
print(&arr[1]);

int *ptr = &arr[1];
print(ptr);
print(*ptr);

print(ptr + 2);
print(*(ptr + 2));

print(ptr++);
print(ptr);

print(++ptr);
print(ptr);

print(*(ptr++));
print(*(++ptr));

Address   Content
            ...
0x1000     56
0x1004     26
0x1008     88
0x100C     45
0x1010    -45
0x1014     77
0x1018     98
0x101C     13
            ...
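The key point behind these questions is that pointer arithmetic is scaled by the element size. The sketch below makes that visible; `byte_offset` and `deref_at` are hypothetical helpers, and `int32_t` is used instead of `int` so the 4-byte element size is guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance in bytes from base to p: this is what the hardware adds to the address. */
ptrdiff_t byte_offset(const int32_t *base, const int32_t *p) {
    return (const char *)p - (const char *)base;
}

/* Value of *(ptr + k): k counts elements, not bytes. */
int32_t deref_at(const int32_t *ptr, ptrdiff_t k) {
    return *(ptr + k);
}
```

With `ptr = &arr[1]`, `byte_offset(arr, ptr + 2)` is 12, so if arr is at 0x1000 then `ptr + 2` is 0x100C and `*(ptr + 2)` is 45.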
Pointers in Assembly
 C code without pointers:
 int arr[8] = ...;
 for (int i = 0; i < 8; i++) {
   v = arr[i];
   ...
 }
 Loop body in assembly:
 // R0 = i
 // R1 = base address of arr
 // R2 = v
 LDR R2, [R1, R0, LSL #2]   // v = arr[i]
 ADD R0, R0, #1             // i++

 C code with pointers:
 int arr[8] = ...;
 int *ptr = arr;
 while (ptr < (arr + 8)) {
   v = *(ptr++);
   ...
 }
 Loop body in assembly:
 // R0 = ptr
 // R1 = v
 LDR R1, [R0]               // v = *ptr
 ADD R0, R0, #4             // ptr++ (the address advances by 4 bytes)
Post/Pre-indexed Addressing Mode
Assuming R0=0x1008 before the LDR instruction executes, what’s the content
of R0 and R1 after the instruction executes?

        Address    Content
                     ...
         0x1000     56
         0x1004     26
         0x1008     88
         0x100C     45
         0x1010    -45
                     ...
Load/Store Addressing Mode Summary
           Name                   Assembler Syntax       Address Generation
           Offset:
              immediate offset    [Rn,#offset]           Address = Rn + offset
              offset in Rm        [Rn,±Rm,shift]         Address = Rn ± shifted(Rm)
           Pre-indexed:
              immediate offset    [Rn,#offset]!          Address = Rn + offset
                                                         Rn ← Address
              offset in Rm        [Rn,±Rm,shift]!        Address = Rn ± shifted(Rm)
                                                         Rn ← Address
           Post-indexed:
             immediate offset     [Rn],#offset           Address = Rn
                                                         Rn ← Rn + offset
              offset in Rm        [Rn],±Rm,shift         Address = Rn
                                                         Rn ← Rn ± shifted(Rm)
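The three immediate-offset rows of the table can be compared side by side with a small C model. This is a sketch under stated assumptions: the memory image reuses the five words at 0x1000 to 0x1010 from the earlier exercise, and the function names `ldr_offset`, `ldr_pre` and `ldr_post` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BASE 0x1000u
static const int32_t mem_words[5] = {56, 26, 88, 45, -45};  /* 0x1000 .. 0x1010 */

static int32_t read_word(uint32_t addr) {
    assert(addr >= MEM_BASE && addr < MEM_BASE + sizeof mem_words);
    return mem_words[(addr - MEM_BASE) / 4];
}

/* LDR Rd, [Rn, #offset]   Address = Rn + offset; Rn is unchanged. */
int32_t ldr_offset(uint32_t *rn, int32_t offset) {
    return read_word(*rn + (uint32_t)offset);
}

/* LDR Rd, [Rn, #offset]!  Address = Rn + offset; Rn <- Address. */
int32_t ldr_pre(uint32_t *rn, int32_t offset) {
    *rn += (uint32_t)offset;
    return read_word(*rn);
}

/* LDR Rd, [Rn], #offset   Address = Rn; Rn <- Rn + offset. */
int32_t ldr_post(uint32_t *rn, int32_t offset) {
    int32_t value = read_word(*rn);
    *rn += (uint32_t)offset;
    return value;
}
```

Starting from Rn = 0x1008 with offset 4, the offset and pre-indexed forms both load 45, but only the pre-indexed form leaves Rn = 0x100C; the post-indexed form loads 88 and then updates Rn to 0x100C.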
Loading and Storing Multiple Words
   LDM and STM transfer blocks of words between consecutive memory
   addresses and multiple registers. In every variant, the
   lowest-numbered register in the list corresponds to the lowest
   memory address.
   STM with a decrementing suffix: registers are accessed in order from
   largest to smallest index (R15..R0)
   LDM with an incrementing suffix: registers are accessed in order from
   smallest to largest index (R0..R15)
   A suffix on the mnemonic selects the direction in which memory
   addresses are computed and when the address is updated:
     •   IA – Increment After the transfer (default)
     •   IB – Increment Before the transfer
     •   DA – Decrement After the transfer
     •   DB – Decrement Before the transfer
   Registers need not be consecutive, e.g., LDMIA R8, {R0,R2,R9}.
Example:
LDMIA R3!, {R4, R6-R8, R10}
R4 ← Mem[R3]
R6 ← Mem[R3 + 4]
R7 ← Mem[R3 + 8]
R8 ← Mem[R3 + 12]
R10 ← Mem[R3 + 16]
R3 ← R3 + 20 // increment after
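The transfer sequence above can be modelled in C. This is a minimal sketch, not a full definition of LDM: `ldmia_writeback` is a hypothetical function, the register list is passed as a bit mask (bit k set means Rk is in the list), memory is a plain word array starting at `mem_base`, and the base register is assumed not to appear in the list.

```c
#include <assert.h>
#include <stdint.h>

/* Models LDMIA Rn!, {reglist}: load words from increasing addresses into the
   listed registers, lowest-numbered register first, then write the final
   address back to Rn. */
void ldmia_writeback(uint32_t regs[16], unsigned rn, uint16_t reg_mask,
                     const uint32_t *mem, uint32_t mem_base) {
    assert(rn < 16);
    uint32_t addr = regs[rn];
    for (unsigned k = 0; k < 16; k++) {
        if (reg_mask & (1u << k)) {
            regs[k] = mem[(addr - mem_base) / 4];  /* Rk <- Mem[addr] */
            addr += 4;                             /* increment after the transfer */
        }
    }
    regs[rn] = addr;                               /* the "!" write-back */
}
```

For `LDMIA R3!, {R4, R6-R8, R10}` the mask is 0x05D0 (bits 4, 6, 7, 8 and 10), five words are transferred, and R3 ends up 20 bytes higher.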
PC-relative Addressing
   • The PC can be used as the base register to access memory locations
     in terms of their distance relative to PC+8 (the address of the
     current instruction plus 8)
        • Recall pipelining
        • The CPU updates PC ← PC+4 upon fetching instruction i
        • While i is being decoded, i + 1 is fetched

        Address    Content
                     ...
         0x0FF0     96
         0x0FF4     -8
         0x0FF8     78
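The offset used in a PC-relative access follows directly from the PC+8 rule. The sketch below computes it; `pc_relative_offset` is a hypothetical helper, and the instruction address 0x1000 used in the example is an assumption, since the slide's own instruction address is not shown.

```c
#include <stdint.h>

/* Byte offset to use in LDR Rd, [PC, #offset] so that the access reaches
   target_addr, given that the PC reads as the instruction's address plus 8. */
int32_t pc_relative_offset(uint32_t instr_addr, uint32_t target_addr) {
    return (int32_t)(target_addr - (instr_addr + 8u));
}
```

For an instruction assumed to sit at 0x1000, reaching the word at 0x0FF4 needs an offset of 0x0FF4 - 0x1008 = -20.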
Loading 32-bit Constants into Registers
   We often need a way to load large constant values into registers, e.g.,
   32-bit addresses. The assembler uses a pseudo-instruction to do this.
   LDR Rd, =value   // pseudo-instruction: is it a load? a mov?
Example of 32-bit Constants (and our first programs!)
CPSR fields: N (bit 31), Z (bit 30), C (bit 29), V (bit 28), I (bit 7), F (bit 6), M[4:0] (bits 4 to 0)
        • Condition code flag bits are set to 1 when the condition is true
                • Recall ALU flags: N = Negative, Z = Zero, C = Carry, V = Overflow
        • Interrupt flags
                • I = IRQ mask bit, F = FIQ (Fast interrupt) mask bit
        • Processor mode
                •   10000 = User (most of user code)
                •   10001 = Serving fast interrupt (when dealing with I/O)
                •   10010 = Serving normal interrupt (when dealing with I/O)
                •   10011 = Supervisor (used by the Operating System)
    CPSR is not a general-purpose register
    Special instructions modify the CPSR, directly or as a side-effect,
    while others will behave differently depending on CPSR state.
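The CPSR fields listed above can be picked apart with shifts and masks. This is a sketch only: the `cpsr_fields` struct and `decode_cpsr` function are illustrative names, and the bit positions used (N, Z, C, V in bits 31 to 28, I in bit 7, F in bit 6, mode in bits 4 to 0) are those of the ARMv7-A CPSR.

```c
#include <stdint.h>

typedef struct {
    unsigned n, z, c, v;   /* condition code flags */
    unsigned i, f;         /* IRQ and FIQ mask bits */
    unsigned mode;         /* M[4:0] */
} cpsr_fields;

cpsr_fields decode_cpsr(uint32_t cpsr) {
    cpsr_fields out;
    out.n = (cpsr >> 31) & 1u;
    out.z = (cpsr >> 30) & 1u;
    out.c = (cpsr >> 29) & 1u;
    out.v = (cpsr >> 28) & 1u;
    out.i = (cpsr >> 7) & 1u;
    out.f = (cpsr >> 6) & 1u;
    out.mode = cpsr & 0x1Fu;
    return out;
}
```

For instance, the value 0x600000D3 decodes to Z = 1, C = 1, both interrupt masks set, and mode 10011 (Supervisor).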
Condition Codes
Branch Instructions
Test & Compare Instructions
Example
Setting Condition Codes with the S Suffix
   Condition codes are set based on the result of the data processing
   instruction.
   Note that the following two instructions set condition codes in the
   same manner:
   SUBS   R0, R1, R2
   CMP    R1, R2
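How SUBS and CMP derive the flags from a subtraction can be sketched in C. The names `nzcv_flags`, `flags_after_sub` and `gt_holds` are illustrative; the flag rules used are the standard ARM ones, where C = 1 after a subtraction means no borrow occurred, and GT is true when Z = 0 and N = V.

```c
#include <stdint.h>

typedef struct { unsigned n, z, c, v; } nzcv_flags;

/* Flags produced by SUBS Rd, Rn, Rm or CMP Rn, Rm (a = Rn, b = Rm). */
nzcv_flags flags_after_sub(uint32_t a, uint32_t b) {
    uint32_t r = a - b;                    /* 32-bit result, wraps modulo 2^32 */
    nzcv_flags f;
    f.n = r >> 31;                         /* sign bit of the result */
    f.z = (r == 0);
    f.c = (a >= b);                        /* no borrow */
    f.v = ((a ^ b) & (a ^ r)) >> 31;       /* operands differ in sign and the result's sign differs from a */
    return f;
}

/* Condition GT (signed greater than): Z == 0 and N == V. */
int gt_holds(nzcv_flags f) {
    return !f.z && (f.n == f.v);
}
```

The only difference between the two instructions is that SUBS also writes the result to its destination register, while CMP discards it.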
Conditional Execution
 C code:
 if (x > 3)
    y = 7;
 else
    y = 13;

 ARM assembly:
 LDR     R0, X
 CMP     R0, #3    // set flags
 MOVGT   R1, #7    // if R0-3 > 0
 MOVLE   R1, #13   // if R0-3 <= 0
 STR     R1, Y
ARM ISA
       dotP = 0;
       for (i = 0; i < n; i++)
           dotP += vectorA[i] * vectorB[i];
      • .word a, b, c, ...
        allocate storage for 1 or more words (4 bytes each) and initialize
        with the values a, b, c, ...
      • .space 4
        allocate 4 bytes without initialization
      • n, vectorA, ... are addresses (labels) corresponding to the start
        of the allocated space
The for loop expands to a number of initialization instructions and
other code that is repeated once each iteration.
dotP = 0;
for (i = 0; i < n; i++)
    dotP += vectorA[i] * vectorB[i];
MOV R6, #0                // initialize iteration variable i
LOOP:
 CMP R6, R2               // do i-n and set flags accordingly
 BGE END                  // we're done if i-n >= 0 (if i >= n)
 LDR R4, [R0], #4         // get vectorA[i]; post-index increments R0 after
 LDR R5, [R1], #4         // get vectorB[i]; post-index increments R1 after
 MLA R3, R4, R5, R3       // R3 <-- (R4 * R5) + R3
 ADD R6, R6, #1           // i++
 B   LOOP
END:
 STR R3, dotP             // Mem[dotP] <-- R3
A more efficient approach uses SUBS:
dotP = 0;
i = n;
do { // assumes there is at least one element in each array
  dotP += vectorA[n - i] * vectorB[n - i]; // elements are visited in increasing order
  i--;
} while (i > 0);
LOOP:
 LDR     R4, [R0], #4      // get next element of vectorA; post-index increments R0 after
 LDR     R5, [R1], #4      // get next element of vectorB; post-index increments R1 after
 MLA     R3, R4, R5, R3    // R3 = (R4 * R5) + R3
 SUBS    R2, R2, #1        // i-- and set condition flags
 BGT     LOOP              // we're not done if i > 0
STR R3, dotP
Full dot product code in ARM assembly
   .global _start             // tells the assembler/linker where to start execution
   n:        .word 6
   vectorA:  .word 5, 3, -6, 19, 8, 12
   vectorB:  .word 2, 14, -3, 2, -5, 36
   dotP:     .space 4
   _start:
    MOV   R3, #0              // register R3 will accumulate the product
    LDR   R0, =vectorA        // R0 = vectorA base address (pseudo-instruction)
    LDR   R1, =vectorB        // R1 = vectorB base address (pseudo-instruction)
    LDR   R2, n               // R2 = 6 (R2 is our loop iteration variable i)
   LOOP:
    LDR   R4, [R0], #4        // get vectorA[i]; post-index increments R0 after
    LDR   R5, [R1], #4        // get vectorB[i]; post-index increments R1 after
    MLA   R3, R4, R5, R3      // R3 = (R4 * R5) + R3
    SUBS  R2, R2, #1          // i-- and set condition flags
    BGT   LOOP                // we're not done if i > 0
   STOP:
    B     STOP                // infinite loop once we're done
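To check what value R3 should hold when the program reaches STOP, the same loop can be written in C with pointers standing in for R0 and R1. This is a sketch that mirrors the assembly structure; the function name `dot_product` is illustrative, and like the assembly it assumes at least one element.

```c
#include <stdint.h>

/* a plays the role of R0, b of R1, n of R2 and acc of R3. */
int32_t dot_product(const int32_t *a, const int32_t *b, int32_t n) {
    int32_t acc = 0;               /* MOV R3, #0 */
    do {
        acc += *a++ * *b++;        /* two post-indexed LDRs, then MLA */
        n--;                       /* SUBS R2, R2, #1 */
    } while (n > 0);               /* BGT LOOP */
    return acc;
}
```

For the data in the program above, the products are 10, 42, 18, 38, -40 and 432, so the accumulated result is 500.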
ARM ISA
Subroutine Calls
Textbook §2.6, 2.7, D.4
Subroutines
   void main() {
     int sum = 0;
     sum += add3(1, 2, 3);
     sum += 10;
     sum += add3(10, 20, 30);
Requirements for calling subroutines:
  • We should be able to call a subroutine from anywhere in our
    program, i.e., change the PC so that the routine is executed
  • A subroutine must be able to return, i.e., change the PC so that
    execution continues immediately

  int add3(int a, int b, int c)
  {
      return a + b + c;
  }
Calling and Returning
   To return, branch to the address stored in the link register with the
   BX instruction (branches to the address in a register).
   BX Rn       // PC <-- Rn
 C code:
 boo() {
   coo();
   ...
 }
 coo() {
   ...
   return;
 }

 ARM assembly:
 boo:   BL coo   // LR <-- PC + 4; PC <-- coo
        ...
 coo:   ...
        BX LR    // PC <-- LR
Nested Subroutine Calls
 boo() {
       coo();
 B1:   doo();
 B2:   return;
 }
 coo() {
       doo();
 C:    return;
 }
 doo() {
       return;
 }
   • These calls are nested: boo calls coo, coo calls doo
   • If we save return addresses in LR, calling doo from coo overwrites
     the return address back to boo!
   • doo() is called from two different places, and is expected to
     return to different places for each call
   • How do we remember the return addresses for each call, in the
     correct order? (I.e., the reverse call order.)
Stack Operations
ARM Memory Layout
The Stack in ARM
   ∗ by convention; breaking from convention may break your code
Stack Operations in ARM
 Push from Rj
 STR Rj, [SP, #-4]!
 SP ← SP - 4
 Mem[SP] ← Rj

 Pop into Rj
 LDR Rj, [SP], #4
 Rj ← Mem[SP]
 SP ← SP + 4

 Peek(i) into Rj
 LDR Rj, [SP, #const]
 where const = i ∗ 4
 Rj ← Mem[SP+const]

 Stack contents, starting at the word SP points to: -28, 17, 739, ...
 Assuming Rj=19, SP=0xFFFFABCC and i=2, what’s the content of the stack,
 register Rj, and SP, after each instruction executes? (consider them
 separately)
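The three operations can be tried out on a small C model of a full-descending stack. This is a sketch under stated assumptions: `stack_model`, `push`, `pop` and `peek` are illustrative names, `STACK_BASE` is an arbitrary lowest modelled address chosen so that SP = 0xFFFFABCC falls inside the array, and there is no overflow or underflow checking.

```c
#include <stdint.h>

#define STACK_BASE  0xFFFFABC0u   /* hypothetical lowest address covered by the model */
#define STACK_WORDS 16

typedef struct {
    int32_t  words[STACK_WORDS];  /* words[k] models Mem[STACK_BASE + 4*k] */
    uint32_t sp;
} stack_model;

static int32_t *slot(stack_model *s, uint32_t addr) {
    return &s->words[(addr - STACK_BASE) / 4];
}

/* STR Rj, [SP, #-4]!  SP <- SP - 4, then Mem[SP] <- Rj */
void push(stack_model *s, int32_t rj) {
    s->sp -= 4;
    *slot(s, s->sp) = rj;
}

/* LDR Rj, [SP], #4    Rj <- Mem[SP], then SP <- SP + 4 */
int32_t pop(stack_model *s) {
    int32_t rj = *slot(s, s->sp);
    s->sp += 4;
    return rj;
}

/* LDR Rj, [SP, #4*i]  Rj <- Mem[SP + 4*i]; SP is unchanged */
int32_t peek(stack_model *s, uint32_t i) {
    return *slot(s, s->sp + 4u * i);
}
```

With the stack holding -28, 17, 739 from the top and SP = 0xFFFFABCC: a push of 19 leaves SP = 0xFFFFABC8 with 19 on top, a pop returns -28 and leaves SP = 0xFFFFABD0, and a peek with i = 2 returns 739 without changing SP.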
Pushing and Popping Multiple Elements
Nested Subroutine Calls, Revisited
           MOV    R0, #1
           MOV    R1, #2
           MOV    R2, #3
           PUSH   {LR}         // STR LR,[SP,#-4]!; saves return address
           BL     add3
           STR    R0, SUM      // return value is in R0
           POP    {LR}         // LDR LR,[SP],#4; restores return address
           ...
ARM APCS Uses the Callee-save Convention
   add3:     ADD R0, R0, R1
             ADD R0, R0, R2
             BX LR
       • In the previous example, the callee overwrote R0, which was OK,
         since the caller knew that the return value would be in R0
       • In general, the caller may need the register values after the
         callee returns, so the rule is a callee is responsible for leaving
         the registers as it found them
   Callee-save convention:
   A subroutine should save any∗ registers it wants to use on the stack
   and then restore the original values to the registers after it is
   finished using them.
   ∗ The ARM APCS states that argument registers A1 – A4 need not be
   saved, but remember: they might be changed inside of subroutines!
Registers in the ARM Architecture Procedure Call Standard
Passing Parameters On the Stack
   When you have more than four parameters, you can pass four in
   registers, and the additional ones on the stack. (This is what
   compilers do, and what the APCS recommends.)
   Or, you can pass all parameters and the return value on the stack.
   Passing parameters in registers will always be faster. Why?
   When you want to pass a data structure that does not fit into four
   words, you must use the stack (for at least part of it). Example:
   struct largeDataStruct {
     int a;
     int b;
     int c;
     int d;
     int e;
   };
   Let’s see how to pass everything on the stack with a program that
   sums a list of numbers.
ARRAY:    .word 6, 5, 4, 3, 2, 1, 14, 13, 12, 11, 10, 9, 8, 7   // sum these
N:        .word 14                                               // this many of them
SUM:      .space 4                                               // result goes here
          .global _start
_start:   LDR   A1, =ARRAY        // A1 points to ARRAY
          LDR   A2, N             // A2 contains number of elements to add
          PUSH  {A1, A2, LR}      // push parameters and LR (A1 is TOS)
          BL    listadd           // call subroutine
          LDR   A1, [SP, #0]      // return is at TOS
          STR   A1, SUM           // store it in memory
          ADD   SP, SP, #8        // clear parameters
          POP   {LR}              // restore LR
stop:     B     stop
listadd:  PUSH  {V1-V3}           // callee-save registers listadd uses
          LDR   V1, [SP, #16]     // load param N from stack
          LDR   V2, [SP, #12]     // load param ARRAY from stack
          MOV   A1, #0            // clear R0 (sum)
loop:     LDR   V3, [V2], #4      // get next value from ARRAY
          ADD   A1, A1, V3        // form the partial sum
          SUBS  V1, V1, #1        // decrement loop counter
          BGT   loop
          STR   A1, [SP, #12]     // store sum on stack, replacing ARRAY
          POP   {V1-V3}           // restore registers
          BX    LR
Passing by Value, Passing by Reference
 Recap from C:
 int add3Val(int a) {
     a = a + 3;
     return a;
 }
 void add3Ref(int *a) {
ARRAY:     .word 6,5,4,3,2,1,14,13,12,11,10,9,8,7
N:         .word 14
...
Stack Frame
   • Using a frame pointer (R11) gives a consistent reference to
     parameters [FP, #const] and local variables [FP, #-const]
   • FP is not strictly required; it is mainly used to make assembly
     programs easier to write, and to help with the debugger
   • FP remains constant while in the same subroutine

   Stack contents (as drawn on the slide, top to bottom):
      localvar1
      saved R4
      saved R5
      saved R6
      saved LR
      param2
      param3
      param4
      ...          old TOS
      ...

   ∗ Most local variables are actually allocated this way, reducing the
   total memory required by a program.
ARM Instruction Encoding
Textbook §2.13
ARM Assembly vs. Binary
[Figure: 32-bit instruction word, bits 31 down to 0, with the Cond field in the most significant bits]
   ∗ 16-bit versions are available for many instructions, but such
   instructions tend to be less flexible.
Condition Field
Data Processing Instruction Encoding
   Bits    31-28   27-26   25   24-21     20   19-16   15-12   11-0
   Field   Cond    0 0     I    0 1 0 0   S    Rn      Rd      Operand2
   Examples:
   ADDGES R1, R2, R3
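The example can be assembled by hand by placing each field at its bit position. The sketch below does this for the register form only (I = 0, Operand2 holding just Rm with no shift); `encode_add_reg` is a hypothetical helper, and the condition value 1010 for GE comes from the standard ARM condition-code table rather than from this slide.

```c
#include <stdint.h>

/* Encodes ADD{cond}{S} Rd, Rn, Rm using the layout
   Cond | 0 0 | I | 0 1 0 0 | S | Rn | Rd | Operand2, with I = 0. */
uint32_t encode_add_reg(uint32_t cond, uint32_t s,
                        uint32_t rn, uint32_t rd, uint32_t rm) {
    return (cond << 28)      /* bits 31-28: condition */
         | (0x4u << 21)      /* bits 24-21: 0100 selects ADD */
         | (s << 20)         /* bit 20: set condition codes */
         | (rn << 16)        /* bits 19-16: first source register */
         | (rd << 12)        /* bits 15-12: destination register */
         | rm;               /* bits 11-0: Operand2 = Rm, shift amount 0 */
}
```

With cond = 0xA (GE), S = 1, Rn = R2, Rd = R1 and Rm = R3, the result is 0xA0921003.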
Immediate Value Encoding
   Bits    31-28   27-26   25   24-21     20   19-16   15-12   11-0
   Field   Cond    0 0     I    0 1 0 0   S    Rn      Rd      Operand2

   Operand2 (immediate form):
   Bits    11-8     7-0
   Field   Rotate   Immediate
   Position of the 8 immediate bits (7..0) within the 32-bit value for
   each Rotate value (the immediate is rotated right by 2 × Rotate):

   Rotate   Word bits holding immediate bits 7..0
   0x0      7..0
   0x1      31..30 (immediate bits 1..0) and 5..0 (immediate bits 7..2)
   0x2      31..28 (immediate bits 3..0) and 3..0 (immediate bits 7..4)
   0x3      31..26 (immediate bits 5..0) and 1..0 (immediate bits 7..6)
   0x4      31..24
   0x5      29..22
   0x6      27..20
   0x7      25..18
   0x8      23..16
   0x9      21..14
   0xA      19..12
   0xB      17..10
   0xC      15..8
   0xD      13..6
   0xE      11..4
   0xF      9..2
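The rotation scheme can be exercised in both directions with a short C sketch. The helper names `ror32`, `decode_operand2_imm` and `encode_operand2_imm` are illustrative; the encoder simply tries all 16 rotations and returns -1 when the constant cannot be expressed as an 8-bit value rotated right by an even amount.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by r bits (r is taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned r) {
    r &= 31u;
    return r ? (x >> r) | (x << (32u - r)) : x;
}

/* Value represented by a 12-bit Operand2 immediate: Immediate ROR (2 * Rotate). */
uint32_t decode_operand2_imm(uint32_t operand2) {
    uint32_t rotate = (operand2 >> 8) & 0xFu;
    uint32_t imm8 = operand2 & 0xFFu;
    return ror32(imm8, 2u * rotate);
}

/* Finds a 12-bit encoding for value, or returns -1 if none exists. */
int32_t encode_operand2_imm(uint32_t value) {
    for (uint32_t rotate = 0; rotate < 16; rotate++) {
        /* Rotating left by 2*rotate undoes the right rotation applied on decode. */
        uint32_t imm8 = ror32(value, 32u - 2u * rotate);
        if (imm8 <= 0xFFu)
            return (int32_t)((rotate << 8) | imm8);
    }
    return -1;
}
```

For example, 0xFF000000 encodes as Rotate = 0x4 with Immediate = 0xFF, while 0x101 has no encoding because its two set bits are nine positions apart.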
Branch Instruction Encoding
   Bits    31-28   27-25   24   23-0
   Field   Cond    1 0 1   L    offset
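A branch word can be built from these fields in C. This sketch relies on two facts from the ARM architecture that the field layout alone does not show: the 24-bit offset counts words rather than bytes, and it is measured from the branch's own address plus 8. The function name `encode_branch` is illustrative, and the target is assumed to be word-aligned and within range.

```c
#include <stdint.h>

/* Encodes B{cond} (link = 0) or BL{cond} (link = 1) located at instr_addr
   and jumping to target_addr. */
uint32_t encode_branch(uint32_t cond, uint32_t link,
                       uint32_t instr_addr, uint32_t target_addr) {
    uint32_t byte_offset = target_addr - (instr_addr + 8u);   /* relative to PC+8, wraps for backward branches */
    uint32_t word_offset = (byte_offset >> 2) & 0x00FFFFFFu;  /* low 24 bits of offset/4, two's complement */
    return (cond << 28)       /* bits 31-28: condition */
         | (0x5u << 25)       /* bits 27-25: 1 0 1 */
         | (link << 24)       /* bit 24: L */
         | word_offset;       /* bits 23-0: offset */
}
```

A branch to itself, such as the `STOP: B STOP` loop, has a byte offset of -8 and therefore encodes as 0xEAFFFFFE when the condition is AL (1110).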
This set of lectures has presented the ARM ISA and introduced: