CSCI232
Assembly: Arithmetic and Logic
   Machine Programming
       Reading: B&O 3.5-3.6
How does a computer
interpret and execute
C programs?
              Learning Assembly
• Moving data around (Week 4)
• Arithmetic and logical operations (this lecture)
• Control flow
• Function calls
                         Learning Goals
• Learn how to perform arithmetic and logical operations in assembly
• Begin to learn how to read assembly and understand the C code that
  generated it
                            Lecture Plan
• Recap: mov so far
• Data and Register Sizes
• The lea Instruction
• Logical and Arithmetic Operations
• Practice: Reverse Engineering
                                    mov
The mov instruction copies bytes from one place to another;
it is similar to the assignment operator (=) in C.
                              mov         src,dst
The src and dst can each be one of:
• Immediate (constant value, like a number) (only src)
• Register
• Memory Location
  (at most one of src, dst)
   Memory Location Syntax
    Syntax              Meaning
    0x104               Address 0x104 (no $)
    (%rax)              What’s in %rax
    4(%rax)             What’s in %rax, plus 4
    (%rax, %rdx)        Sum of what’s in %rax and %rdx
    4(%rax, %rdx)       Sum of what’s in %rax and %rdx, plus 4
    (, %rcx, 4)         What’s in %rcx, times 4 (multiplier can be 1, 2, 4, 8)
    (%rax, %rcx, 2)     What’s in %rax, plus 2 times what’s in %rcx
    8(%rax, %rcx, 2)    What’s in %rax, plus 2 times what’s in %rcx, plus 8
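The address forms in the table all reduce to one formula: Imm + base + index * scale. A minimal C sketch of that calculation follows; the helper name effective_address and its parameters are hypothetical and chosen only for illustration, with omitted parts passed as 0 (or scale 1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes the address named by the operand form
 * imm(base, index, scale), i.e. imm + base + index * scale.
 * Pass 0 for any part the operand leaves out. */
uint64_t effective_address(int64_t imm, uint64_t base,
                           uint64_t index, unsigned scale) {
    /* The multiplier can only be 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return (uint64_t)imm + base + index * scale;
}
```

For example, with %rax holding 0x100 and %rcx holding 3, the operand 8(%rax, %rcx, 2) corresponds to effective_address(8, 0x100, 3, 2), which is 0x10e.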
                               Data Sizes
Data sizes in assembly have slightly different terminology to get used to:
• A byte is 1 byte.
• A word is 2 bytes.
• A double word is 4 bytes.
• A quad word is 8 bytes.
Assembly instructions can have suffixes to refer to these sizes:
• b means byte
• w means word
• l means double word
• q means quad word
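The four sizes line up with C's fixed-width integer types. A small sketch of that mapping, assuming a hypothetical helper named suffix_size_bytes (not something the assembler provides):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maps an instruction size suffix to the number
 * of bytes the instruction operates on. Returns 0 for anything that
 * is not one of the four suffixes above. */
unsigned suffix_size_bytes(char suffix) {
    switch (suffix) {
    case 'b': return (unsigned)sizeof(uint8_t);   /* byte        */
    case 'w': return (unsigned)sizeof(uint16_t);  /* word        */
    case 'l': return (unsigned)sizeof(uint32_t);  /* double word */
    case 'q': return (unsigned)sizeof(uint64_t);  /* quad word   */
    default:  return 0;
    }
}
```

For example, assert(suffix_size_bytes('l') == 4) holds, matching "l means double word".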
            Register Sizes
Bit: 63            31        15      7       0
     %rax          %eax      %ax     %al
     %rbx          %ebx      %bx     %bl
     %rcx          %ecx      %cx     %cl
     %rdx          %edx      %dx     %dl
     %rsi          %esi      %si     %sil
     %rdi          %edi      %di     %dil
     %rbp          %ebp      %bp     %bpl
     %rsp          %esp      %sp     %spl
     %r8           %r8d      %r8w    %r8b
     %r9           %r9d      %r9w    %r9b
     %r10          %r10d     %r10w   %r10b
     %r11          %r11d     %r11w   %r11b
     %r12          %r12d     %r12w   %r12b
     %r13          %r13d     %r13w   %r13b
     %r14          %r14d     %r14w   %r14b
     %r15          %r15d     %r15w   %r15b
                Register Responsibilities
Some registers take on special responsibilities during program execution.
• %rax stores the return value
• %rdi stores the first parameter to a function
• %rsi stores the second parameter to a function
• %rdx stores the third parameter to a function
• %rip stores the address of the next instruction to execute
• %rsp stores the address of the current top of the stack
                                 mov Variants
• mov can take an optional suffix (b,w,l,q) that specifies the size of data to move:
  movb, movw, movl, movq
• mov only updates the specific register bytes or memory locations indicated.
   • Exception: movl writing to a register will also set high order 4 bytes to 0.
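The rule and its exception can be modeled in C by treating a register as a 64-bit integer. This is only a sketch of the register-write behavior described above; the write_mov* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Each function models writing `value` into a 64-bit register whose
 * current contents are `reg`, and returns the new contents. */

/* movb: replaces only the low byte; the other 7 bytes are unchanged. */
uint64_t write_movb(uint64_t reg, uint8_t value) {
    return (reg & ~(uint64_t)0xff) | value;
}

/* movw: replaces only the low 2 bytes; the other 6 bytes are unchanged. */
uint64_t write_movw(uint64_t reg, uint16_t value) {
    return (reg & ~(uint64_t)0xffff) | value;
}

/* movl: replaces the low 4 bytes AND sets the high 4 bytes to 0. */
uint64_t write_movl(uint64_t reg, uint32_t value) {
    (void)reg;  /* the old contents do not survive */
    return (uint64_t)value;
}

/* movq: replaces all 8 bytes. */
uint64_t write_movq(uint64_t reg, uint64_t value) {
    (void)reg;
    return value;
}
```

Starting from 0x1122334455667788, write_movw with 0xabcd gives 0x112233445566abcd, while write_movl with 0xdeadbeef gives 0x00000000deadbeef.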
             Practice: mov And Data Sizes
For each of the following mov instructions, determine the appropriate suffix
based on the operands (e.g. movb, movw, movl or movq).
1.   mov__   %eax, (%rsp)
2.   mov__   (%rax), %dx
3.   mov__   $0xff, %bl
4.   mov__   (%rsp,%rdx,4),%dl
5.   mov__   (%rdx), %rax
6.   mov__   %dx, (%rax)
Answers:
1.   movl   %eax, (%rsp)
2.   movw   (%rax), %dx
3.   movb   $0xff, %bl
4.   movb   (%rsp,%rdx,4),%dl
5.   movq   (%rdx), %rax
6.   movw   %dx, (%rax)
                                    mov
• The movabsq instruction is used to write a 64-bit immediate (constant) value.
• The regular movq instruction can only take a 32-bit immediate, which is sign-extended to 64 bits.
• movabsq takes a 64-bit immediate as its source and only a register as its destination.
                    movabsq $0x0011223344556677, %rax
                          movz and movs
• There are two mov instructions that can be used to copy a smaller source to a
  larger destination: movz and movs.
• movz fills the remaining bytes with zeros
• movs fills the remaining bytes by sign-extending the most significant bit in the
  source.
• The source must be from memory or a register, and the destination is a
  register.
                    movz and movs
              MOVZ S,R    R ← ZeroExtend(S)
Instruction               Description
movzbw                    Move zero-extended byte to word
movzbl                    Move zero-extended byte to double word
movzwl                    Move zero-extended word to double word
movzbq                    Move zero-extended byte to quad word
movzwq                    Move zero-extended word to quad word
                    movz and movs
              MOVS S,R    R ← SignExtend(S)
Instruction               Description
movsbw                    Move sign-extended byte to word
movsbl                    Move sign-extended byte to double word
movswl                    Move sign-extended word to double word
movsbq                    Move sign-extended byte to quad word
movswq                    Move sign-extended word to quad word
movslq                    Move sign-extended double word to quad word
cltq                      Sign-extend %eax to %rax
                          %rax <- SignExtend(%eax)
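In C, the same two behaviors show up when a smaller integer is converted to a larger one: unsigned types zero-extend and signed types sign-extend. The sketch below illustrates this; the function names are hypothetical, and which instruction a compiler actually emits for each conversion may vary.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend a byte to a double word, as movzbl does:
 * the upper 3 bytes of the result are 0. */
uint32_t zero_extend_byte(uint8_t b) {
    return (uint32_t)b;
}

/* Sign-extend a byte to a double word, as movsbl does:
 * the upper 3 bytes are copies of the byte's sign bit. */
int32_t sign_extend_byte(int8_t b) {
    return (int32_t)b;
}

/* Sign-extend a double word to a quad word, as movslq does
 * (or cltq when the value is in %eax). */
int64_t sign_extend_long(int32_t x) {
    return (int64_t)x;
}
```

The byte 0xff becomes 0x000000ff when zero-extended, but the signed byte -1 (the same bit pattern) becomes 0xffffffff when sign-extended.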
                                      lea
The lea instruction copies an “effective address” from one place to another.
                              lea          src,dst
Unlike mov, which copies data at the address src to the destination, lea copies
the value of src itself to the destination.
              The operand syntax is the same as for mov, except that
              the destination must be a register. The difference is
              how lea handles the src.
                            lea vs. mov
Operands                 mov Interpretation                          lea Interpretation
6(%rax), %rdx            Go to the address (6 + what’s in %rax)      Copy (6 + what’s in %rax) into %rdx.
                         and copy data there into %rdx.
(%rax, %rcx), %rdx       Go to the address (what’s in %rax +         Copy (what’s in %rax + what’s in %rcx)
                         what’s in %rcx) and copy data there         into %rdx.
                         into %rdx.
(%rax, %rcx, 4), %rdx    Go to the address (%rax + 4 * %rcx)         Copy (%rax + 4 * %rcx) into %rdx.
                         and copy data there into %rdx.
7(%rax, %rax, 8), %rdx   Go to the address (7 + %rax + 8 * %rax)     Copy (7 + %rax + 8 * %rax) into %rdx.
                         and copy data there into %rdx.
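The mov/lea distinction is close to the difference between arr[i] and &arr[i] in C. A rough sketch, assuming 4-byte ints; the function names are hypothetical, and a compiler is free to pick other instructions for this code.

```c
#include <assert.h>

/* Like mov (%rax,%rcx,4),%edx: go to the address base + 4*i
 * and load the int stored there. */
int load_element(const int *base, long i) {
    return base[i];
}

/* Like lea (%rax,%rcx,4),%rdx: compute the address base + 4*i
 * only; memory is never read. */
const int *element_address(const int *base, long i) {
    return &base[i];
}

/* lea is also used for plain arithmetic with no pointers involved:
 * lea 7(%rax,%rax,8),%rdx computes 7 + x + 8*x, i.e. 9*x + 7. */
long nine_x_plus_seven(long x) {
    return 7 + x + 8 * x;
}
```

With int a[3] = {10, 20, 30}, load_element(a, 2) is 30, while element_address(a, 2) is the pointer a + 2.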
                          Unary Instructions
The following instructions operate on a single operand (register or memory):
            Instruction        Effect                             Description
            inc D              D ← D + 1                          Increment
            dec D              D ← D - 1                          Decrement
            neg D              D ← -D                             Negate
            not D              D ← ~D                             Complement
Examples:
     incq 16(%rax)
     dec %rdx
     not %rcx
Note: Whenever a register is written in parentheses, e.g. (%rax), the register's
value is treated as a memory address, and the instruction operates on the value
stored at that address (also called dereferencing).
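Each unary instruction has a direct C counterpart. The sketch below uses hypothetical function names; inc_at_offset_16 models the memory-operand example incq 16(%rax), where 16 bytes past the base address is element 2 of an array of 8-byte values.

```c
#include <assert.h>
#include <stdint.h>

int64_t op_inc(int64_t d) { return d + 1; }  /* inc D: D <- D + 1 */
int64_t op_dec(int64_t d) { return d - 1; }  /* dec D: D <- D - 1 */
int64_t op_neg(int64_t d) { return -d; }     /* neg D: D <- -D    */
int64_t op_not(int64_t d) { return ~d; }     /* not D: D <- ~D    */

/* incq 16(%rax): treat the register's value as an address, go 16
 * bytes past it, and increment the 8-byte value stored there. */
void inc_at_offset_16(int64_t *base) {
    base[2] += 1;  /* 2 elements * 8 bytes = 16 bytes */
}
```

Note that not flips every bit, so op_not(0) is -1 (all ones), whereas op_neg(0) is still 0.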
                          Binary Instructions
The following instructions operate on two operands. Each operand can be a
register or a memory location, and the source can also be an immediate, but
the two operands cannot both be memory locations.
Read it as, e.g. “Subtract S from D”:
            Instruction          Effect          Description
            add S, D             D ← D + S       Add
            sub S, D             D ← D - S       Subtract
            imul S, D            D ← D * S       Multiply
            xor S, D             D ← D ^ S       Exclusive-or
            or S, D              D ← D | S       Or
            and S, D             D ← D & S       And
Examples:
      addq %rcx,(%rax)
      xorq $16,(%rax, %rdx, 8)
      subq %rdx,8(%rax)
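The operand order is easy to get backwards: the destination is both the second operand and the place the result lands, like C's compound assignment operators. A sketch with hypothetical function names, where the pointer plays the role of D; unsigned 64-bit values are used so that overflow wraps around as it does in the hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Each function takes the source first and the destination second,
 * mirroring the "op S, D" order, and updates the destination in place. */
void op_add(uint64_t s, uint64_t *d)  { *d += s; }  /* add S, D  */
void op_sub(uint64_t s, uint64_t *d)  { *d -= s; }  /* sub S, D  */
void op_imul(uint64_t s, uint64_t *d) { *d *= s; }  /* imul S, D */
void op_xor(uint64_t s, uint64_t *d)  { *d ^= s; }  /* xor S, D  */
void op_or(uint64_t s, uint64_t *d)   { *d |= s; }  /* or S, D   */
void op_and(uint64_t s, uint64_t *d)  { *d &= s; }  /* and S, D  */
```

With d holding 10, op_sub(3, &d) leaves 7 in d: it subtracts S from D, not D from S.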
                         Assembly Exercise 1
00000000004005ac <sum_example1>:
     4005ac: 89 f0          mov %esi,%eax
     4005ae: 01 f8          add %edi,%eax
     4005b0: c3             retq
Which of the following is most likely to have generated the above assembly?
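// A)
void sum_example1() {
    int x;
    int y;
    int sum = x + y;
}
// B)
int sum_example1(int x, int y) {
    return x + y;
}
// C)
void sum_example1(int x, int y) {
    int sum = x + y;
}
Answer: B. A and C do not return a value, but the assembly leaves the sum in
%eax, the return-value register. A also takes no parameters, while the assembly
reads %edi and %esi.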
                       Assembly Exercise 2
0000000000400578 <sum_example2>:
     400578: 8b 47 0c       mov 0xc(%rdi),%eax
     40057b: 03 07          add (%rdi),%eax
     40057d: 2b 47 18       sub 0x18(%rdi),%eax
     400580: c3             retq
int sum_example2(int arr[]) {
    int sum = 0;
    sum += arr[0];
    sum += arr[3];
    sum -= arr[6];
    return sum;
}
What location or value in the assembly above represents the C code’s sum variable?
Answer: %eax
                Disassembling Object Code
Disassembler
gcc -g -O -c example2.c
objdump -d example2.o
• Useful tool for examining object code
• Analyzes the bit pattern of a series of instructions
• Produces an approximate rendition of the assembly code
• Can be run on either a.out (complete executable) or a .o file
                       Assembly Exercise 3
0000000000400578 <sum_example2>:
     400578: 8b 47 0c       mov 0xc(%rdi),%eax
     40057b: 03 07          add (%rdi),%eax
     40057d: 2b 47 18       sub 0x18(%rdi),%eax
     400580: c3             retq
int sum_example2(int arr[]) {
    int sum = 0;
    sum += arr[0];
    sum += arr[3];
    sum -= arr[6];
    return sum;
}
What location or value in the assembly code above represents the C code’s 6 (as in arr[6])?
Answer: 0x18. Each int is 4 bytes, so arr[6] is 6 * 4 = 24 = 0x18 bytes past the start of arr.
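The displacements in the disassembly come from multiplying each array index by the element size. A short sketch, assuming 4-byte ints as on x86-64; int_element_offset is a hypothetical helper, and sum_example2 is the function from the exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of arr[index] from the start of
 * an int array. */
size_t int_element_offset(size_t index) {
    return index * sizeof(int);
}

/* The exercise's function, unchanged. */
int sum_example2(int arr[]) {
    int sum = 0;
    sum += arr[0];
    sum += arr[3];
    sum -= arr[6];
    return sum;
}
```

Index 3 gives offset 12 (0xc) and index 6 gives offset 24 (0x18), the two displacements that appear next to (%rdi) in the listing; index 0 gives offset 0, which is why arr[0] appears as plain (%rdi).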
                                  Recap
• Recap: mov so far
• Data and Register Sizes
• The lea Instruction
• Logical and Arithmetic Operations
• Practice: Reverse Engineering
Next Time: control flow in assembly (while loops, if statements, and more)
Lecture takeaway: There are assembly instructions for
arithmetic and logical operations. They share the
same operand forms as mov, but lea interprets those forms
differently. There are also different register sizes that
may be used in assembly instructions.
              A Note About Operand Forms
• Many instructions share the same address operand forms that mov uses.
   • E.g. 7(%rax, %rcx, 2).
• These forms work the same way for other instructions, e.g. sub:
   • sub 8(%rax,%rdx),%rcx -> Go to 8 + %rax + %rdx, subtract what’s there from %rcx
• The exception is lea:
   • It interprets this form as just the calculation, not the dereferencing
   • lea 8(%rax,%rdx),%rcx -> Calculate 8 + %rax + %rdx, put it in %rcx
                Reverse Engineering 1
int add_to(int x, int arr[], int i) {
    int sum = ___?___;
    sum += arr[___?___];
    return ___?___;
}
----------
add_to:
  movslq %edx, %rdx
  movl %edi, %eax
  addl (%rsi,%rdx,4), %eax
  ret
                Reverse Engineering 1 (solution)
int add_to(int x, int arr[], int i) {
    int sum = x;
    sum += arr[i];
    return sum;
}
----------
// x in %edi, arr in %rsi, i in %edx
add_to:
  movslq %edx, %rdx           // sign-extend i into full register
  movl %edi, %eax             // copy x into %eax
  addl (%rsi,%rdx,4), %eax    // add arr[i] to %eax
  ret
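One way to sanity-check a reconstruction is to run the C version on small inputs and compare against what the assembly would compute by hand. The sketch below repeats the solved add_to with comments tying each statement to its instruction; the inputs in the note are made up for illustration.

```c
#include <assert.h>

/* Reconstructed from the assembly: x arrives in %edi, arr in %rsi,
 * and i in %edx. */
int add_to(int x, int arr[], int i) {
    int sum = x;     /* movl %edi, %eax */
    sum += arr[i];   /* movslq %edx, %rdx; addl (%rsi,%rdx,4), %eax */
    return sum;      /* the result is already in %eax at ret */
}
```

With arr = {1, 2, 3}, add_to(5, arr, 2) reads the int at arr + 4*2 bytes, which is 3, and returns 8.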
                Reverse Engineering 2
int elem_arithmetic(int nums[], int y) {
    int z = nums[___?___] * ___?___;
    z -= ___?___;
    z >>= ___?___;
    return ___?___;
}
----------
elem_arithmetic:
  movl %esi, %eax
  imull (%rdi), %eax
  subl 4(%rdi), %eax
  sarl $2, %eax
  addl $2, %eax
  ret
                Reverse Engineering 2 (solution)
int elem_arithmetic(int nums[], int y) {
    int z = nums[0] * y;
    z -= nums[1];
    z >>= 2;
    return z + 2;
}
----------
// nums in %rdi, y in %esi
elem_arithmetic:
  movl %esi, %eax          // copy y into %eax
  imull (%rdi), %eax       // multiply %eax by nums[0]
  subl 4(%rdi), %eax       // subtract nums[1] from %eax
  sarl $2, %eax            // shift %eax right by 2
  addl $2, %eax            // add 2 to %eax
  ret
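The solved function can be checked the same way, by tracing one input through both the C and the assembly. The sketch below repeats the solution with each statement tied to its instruction; the sample inputs are made up, and only non-negative intermediate values are used because right-shifting a negative int is implementation-defined in C (sarl itself is an arithmetic shift).

```c
#include <assert.h>

/* Reconstructed from the assembly: nums arrives in %rdi, y in %esi. */
int elem_arithmetic(int nums[], int y) {
    int z = nums[0] * y;  /* movl %esi, %eax; imull (%rdi), %eax */
    z -= nums[1];         /* subl 4(%rdi), %eax */
    z >>= 2;              /* sarl $2, %eax */
    return z + 2;         /* addl $2, %eax */
}
```

With nums = {6, 4} and y = 3: 6 * 3 = 18, minus 4 is 14, shifted right by 2 is 3, plus 2 is 5.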