A      B       C        D           E          F                Right shift (x >> n): fill in 0
10      11      12       13          14        15                (logical) or repeat MSB (arithmetic)
                                                                   && || != ! interpret entire variable as
  2
      5
          2
              6
                  2
                   7
                              2
                               8
                                       2
                                           9
                                                  2
                                                      10           TRUE (non-zero) or FALSE (zero)
  32      64      128      256         512       1024
                                                                   Overflow detection
 C type    # bytes # bits                                          Method 1:
 char      1        8                                              1. Perform standard arithmetic
 short     2        16                                             2. If result is not within
 int       4        32                                             representation range -> overflow
 long      8        64                                             Method 2:
All datatypes are signed by default.                               unsigned addition: result < op1 or op 2
                                                                   unsigned subtraction: result > minuend
B2U/U2B/H2U/U2H:                                                   signed addition: p + p = n or n + n = p
                        n −1                                       negation of TMin will overflow
(a n−1 … a i … a 0)r ⇒ ∑ ai∗r
                                   i
                        i=0
B2T:
                                                n−2
                                               + ∑ ai∗2
                                       n−1                 i
(a n−1 … a i … a 0)2 ⇒−a n−1∗2
                                                i=0
B2H/H2B: 4 bits <-> 1 hex digit
T2U/U2T: +/- 2^n when necessary
H2T=H2B+B2T;    T2B=T2U+U2B;    T2H                            =
T2U+U2H.
TMax=2n−1-1: 0111 1111 … 1111 (0x7F…F)
TMin=−2n−1: 1000 0000 … 0000 (0x80…0)
-1/UMax=2n−1 : 1111 1111 … 1111 (0xFF…
F)
                                                                   Memory: byte-addressable
                                                                   x86-64 data size:
                                                                   Byte (b) 1 byte 8 bits
                                                                   Word (w) 2 bytes 16 bits
                                                                   Double word (l) 4 bytes 32 bits
                                                                   Quad word (q) 8 bytes 64 bits
                                                                   starting address must be multiple of
                                                                   data size, e.g., valid starting address
                                                                   of a double word mod 4 = 0
x & mask: clear bits that are 0 in mask
x | mask: set bits that are 1 in mask                              Components of x86-64 ISA
x ^ mask: flip bits that are 1 in mask                             1. Data and address size
~x: flip all bits                                                  1/2/4/8 bytes data size; 64-bit address
~x + 1 = -x: negation = flip bits + 1                              2. Supported instructions
A - B = A + ~B + 1                                                 mov, add, jmp, ...
                                                                   3. Registers
0 & x = 0; 1 & x = x; x & x = x.                                   16 general-purposed registers and usage
0 | x = x; 1 | x = 1; x | x = x.                                   conventions (optional)
0 ^ x = x; 1 ^ x = ~x; x ^ x = 0.                                  4. Addressing Modes
                                                                   5. Length and format of instructions
Left shift (x << n): fill in 0
 q: 8 bytes   l: 4 bytes    w: 2 bytes    b: 1 byte
 %rax         %eax          %ax           %al
 %rbx         %ebx          %bx           %bl
 %rcx         %ecx          %cx           %cl
 %rdx         %edx          %dx           %dl
 %rsi         %esi          %si           %sil
 %rdi         %edi          %di           %dil
 %rbp         %ebp          %bp           %bpl
 %rsp         %esp          %sp           %spl
 %r8          %r8d          %r8w          %r8b
 %r9          %r9d          %r9w          %r9b
 %r10         %r10d         %r10w         %r10b
 %r11         %r11d         %r11w         %r11b
 %r12         %r12d         %r12w         %r12b
 %r13         %r13d         %r13w         %r13b
 %r14         %r14d         %r14w         %r14b
 %r15         %r15d         %r15w         %r15b
Move to register/memory (register names must match instruction suffix)
movb src, dst (1 byte)
movw src, dst (2 bytes)
movl src, dst (4 bytes / with register destination, upper 4 bytes set to 0)
movq src, dst (8 bytes)
movabsq imm, reg (8 bytes immediate to register)
movq only supports a 32-bit immediate and will sign extend the upper 4 bytes
either src or dst can be a memory location, not both; no imm as dst
Move from register/memory to register (zero extension)
movzbw src, reg (byte to word)
movzbl src, reg (byte to double word)
movzbq src, reg (byte to quad word)
movzwl src, reg (word to double word)
movzwq src, reg (word to quad word)
Same, but with sign extension (replicate MSB):
movsbw, movsbl, movsbq, movswl, movswq, movslq
 Type             Form                   Operand Value                Example
 Immediate        $Imm                   Imm                          $1234
 Register         R                      Reg[R]                       %rdx
 Memory           Imm                    Mem[Imm]                     0x4000
 Memory           (R)                    Mem[Reg[R]]                  (%rdx)
 Memory           Imm(Rb)                Mem[Imm+Reg[Rb]]             4(%rdx)
 Memory           (Rb, Ri)               Mem[Reg[Rb]+Reg[Ri]]         (%rdx, %rcx)
 Memory           Imm(Rb, Ri)            Mem[Imm+Reg[Rb]+Reg[Ri]]     4(%rdx, %rcx)
 Memory           (, Ri, S)              Mem[Reg[Ri]*S]               (, %rcx, 2)
 Memory           Imm(, Ri, S)           Mem[Imm+Reg[Ri]*S]           4(, %rcx, 2)
 Memory           (Rb, Ri, S)            Mem[Reg[Rb]+Reg[Ri]*S]       (%rdx, %rcx, 2)
 Memory           Imm(Rb, Ri, S)         Mem[Imm+Reg[Rb]+Reg[Ri]*S]   4(%rdx, %rcx, 2)
Imm: constant displacement; Rb: base register: any of 16 registers
Ri: index register: any, except %rsp; S: scale: 1, 2, 4, or 8
leaq src, dst calculates src, does not load Mem[src] to dst
leaq imm(reg1,reg2,c), reg3 saves imm+reg1+reg2*c into reg3
Unary (with b/w/l/q variants): C equivalent
incq dst: dst++
decq dst: dst—
negq dst: dst = -dst
notq dst: dst = ~dst
Binary (with b/w/l/q variants): C equivalent
addq src,dst: dst = dst + src
subq src,dst: dst = dst - src
andq src,dst: dst = dst & src
orq src,dst: dst = dst | src
xorq src,dst: dst = dst ^ src
salq src,dst: dst = dst << src
sarq src,dst: dst = dst >> src arithmetic
shrq src,dst: dst = dst >> src logical
32-bit variants set the upper 4 bytes to 0 for register destination
# memory access: leaq: 0; mov: 0 or 1; arithmetic & logical: 0 or 1 or 2
CF   =   Carry Flag: Set to 1 if unsigned overflow occurs
OF   =   Overflow Flag: Set to 1 if signed overflow occurs
SF   =   Sign Flag: Set to 1 if the result is negative
ZF   =   Zero Flag: Set to 1 if the result is 0
All integer arithmetic & logical instructions set condition codes, but
1. leaq does not change any condition codes
2. For bitwise operations (&, |, ^, ~) and test, CF = 0, OF = 0
3. For shift operations, CF = last bit shifted out, OF = 0
4. inc and dec will set OF, but leave CF unchanged
 Instruction       Jump Condition   Description
 jmp label         1                Unconditional
 je label          ZF               Equal / zero
 jne label         ~ZF              Not equal / not zero
 js label          SF               Negative
 jns label         ~SF              Non-negative
 jg label          ~(SF ^ OF) &     Greater (signed >)
                   ~ZF
 jge label         ~(SF ^ OF)       Greater or Equal (signed >=)
 jl label          (SF ^ OF)        Less (signed <)
 jle label         (SF ^ OF) | ZF   Less or equal (signed <=)
 ja label          ~CF & ~ZF        Above (unsigned >)
 jae label         ~CF              Above or equal (unsigned >=)
 jb label          CF               Below (unsigned <)
 jbe label         CF | ZF          Below or equal (unsigned <=)
cmp src1, src2 computes src2 – src1; test src1, src2 computes src2 & src1
Set condition codes, do not store result in src2
Jump based on relation of two variables        Jump based on testing of one variable
cmp b, a # compare a : b                       test a, a # test a : 0
j* label # jump if a * b                       j* label
 Instruction 1                                 cmp b, a            test b, a
 Instruction 2                                 Conditional         Conditional
 je label #equal / zero                        a == b              a & b == 0
 jne label #not equal / not zero               a != b              a & b != 0
 js label #negative                            a – b < 0           a & b < 0
 jns label #non-negative                       a – b >= 0          a & b >= 0
 jg label #greater (signed >)                  a > b (signed)      a & b > 0
 jge label #greater or equal (signed >=)       a >= b (signed)     a & b >= 0
 jl label #less (signed <)                     a < b (signed)      a & b < 0
 jle label #less or equal (signed <=)          a <= b (signed)     a & b <= 0
 ja label #above (unsigned >)                  a > b (unsigned)    a & b > 0U
 jae label #above or equal (unsigned >=)       a >= b (unsigned)   a & b >= 0U
 jb label #below (unsigned <)                  a < b (unsigned)    a & b < 0U
 jbe label #below or equal (unsigned <=)       a <= b (unsigned)   a & b <= 0U
a: address of x; a in %rdx; index i in %rcx; return value is stored in %rax
 Expression   Type     Value        Assembly
 x            int*     a            movq %rdx, %rax
 x[0]         int      3            movl (%rdx), %eax
 x+1          int*     a+4          leaq 4(%rdx), %rax
 *(x+1)       int      7            movl 4(%rdx), %eax
 x+i          int*     a+4*i        leaq (%rdx, %rcx, 4), %rax
 x[i]         int      Mem[a+4*i]   movl (%rdx, %rcx, 4), %eax
 Address space     M            M =2m
                   Bytes
 Address width     m   bits     m=log 2 M
 Block size        K    Bytes   K=2
                                      k
 Offset width      k   bits     k =log 2 K
 Cache size        C   Bytes    C=K × E × S
 Associativity     E            E=C/ K /S
 # sets            S            S=C/ K / E
 Index width       s   bits     s=
                                log 2 ( C /K /E )
 Tag width         t   bits     t=m−s−k
Average Memory Access Time = Hit time +
Miss rate × Miss penalty:
AMAT = HT + MR × MP
On write-hit
1. Write-through (update both cache and memory)
2. Write-back (only update cache, update memory
later, check dirty bit when a block is evicted)
On write-miss
1. No-write-allocate (directly update memory)
2. Write-allocate (only load into cache)
1. Compulsory miss occurs on first access to a block
2. Conflict miss occurs when the cache is large enough, but multiple data all map
to the same block e.g., 0, 8, 0, 8, ... could miss every time
Direct mapped caches have more conflict misses than 𝐸-way set-associative (𝐸 > 1)
Fully associative does not have conflict misses
3. Capacity miss occurs when the set of active cache blocks (the working set)
is larger than the cache (will not fit even if cache is fully associative)
                                 GOOD LUCK!