X86-64 Architecture With a
Broadwell Accent
Registers
● Lower 32 bits of %rax is %eax; lower 16 bits is %ax, lower 8 bits is %al
● Same trick for vector registers: %xmm0 is the lower 128 bits of %ymm0 (SSE->AVX)
AT&T syntax
● Registers are preceded by % signs. The rightmost operand is the destination
● mov %rax, %rdx ; copy the value from %rax to
%rdx – misnamed instruction, cpy would have
been better!
● $ objdump -DC ./bin/live_engine | less
● -D = disassemble, -C = demangle function names
Some instructions
● addq %rax, %rbx
● subq, sbbq (subtract with borrow)
● cmp %rax, %rbx
● and
● test
● negb, negl
Control Flow
● jmp
● jle
● jg
● jc
● Conditional moves work the same way!
● cmovle
● cmovg
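A minimal C++ sketch (not from the deck) of how the jumps and conditional moves above show up in practice. pick_branchy and pick_cmov are made-up names; with -O2, compilers often turn the second one into cmp + cmov rather than a jump, though exact codegen varies:

    #include <cstdint>

    // Branchy select: typically cmp + jle/jg around a move.
    int64_t pick_branchy(int64_t a, int64_t b) {
        if (a > b) return a;
        return b;
    }

    // Ternary select: with -O2 this frequently becomes cmp + cmovg/cmovle,
    // i.e. no branch at all (depends on compiler and flags).
    int64_t pick_cmov(int64_t a, int64_t b) {
        return a > b ? a : b;
    }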
The Pipeline
● Branch prediction comes first; when it's wrong, the pipeline is flushed
● Branch Target Buffer – to predict branch targets
● Two-level adaptive predictor
● Globally shared per core – also a loop predictor
Aside: Is this %rdx re-use slow?
● movq (%rbp), %rdx
● movq %rdx, (dest)
● movq (%r8), %rdx
● movq %rdx, (other_dest)
● What if line 3 were mov src2, %eax?
● What if it were movb src2, %al?
● xor %eax, %eax ; zeros all of %rax
● xor %ax, %ax; leaves upper 48 bits as they
were
Instructions
● what are jmp, jl, jc?
● lea == calculate (an address?) and put the
calculation in a register - nothing loaded from
memory. Another misnamed instruction...
● Fast multiply by 5 (C++ version below):
● lea (%rax,%rax,4), %rbx ; %rbx = %rax + %rax*4 = 5*%rax
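Hedged C++ version of the trick – times_five is a made-up name; with -O2 both gcc and clang usually emit a single lea for it:

    #include <cstdint>

    // Usually compiles to: lea (%rdi,%rdi,4), %rax
    // i.e. %rax = %rdi + %rdi*4 = 5 * %rdi -- no multiplier needed.
    uint64_t times_five(uint64_t x) {
        return x * 5;
    }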
CISC? Or really RISC?
● addq %rax, (%rdi) ; this is CISC
● split into micro ops – which are RISC
● mov (%rdi), %register
● add %rax, %register
● mov %register, (%rdi)
Pipeline
Branch Prediction - work out what's coming next
Fetch - work out where the instructions are
Decode - decode them into micro ops – CISC becomes RISC
Rename - 100s of physical registers on chip vs 16 architectural; renaming removes false dependencies
Reorder Buffer Read - snoops output of completing micro-ops, fetches
inputs
Reservation Station - queue micro ops until inputs ready
Execution - micro ops with inputs go to an execution port! shit gets done yo.
Reorder Buffer Write - execution results written
Retire - results to register file, micro ops removed from reorder buffer
Execution ports
● We have 8 execution ports; this diagram has 6, but it's close
Branch misprediction is expensive
● What to do?
● LIKELY() & UNLIKELY() macros
● Branchless implementations
− Replace jumps with conditional moves – but watch out for dependency chains
− xor and bithacking
http://graphics.stanford.edu/~seander/bithacks.html
− uint64_t max(uint64_t a, uint64_t b) {
−     uint64_t mask = -(!(b < a));
−     return a ^ ((a ^ b) & mask);
− }
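Self-contained version with a couple of asserts, just to convince ourselves the mask trick picks the larger value (branchless_max is a made-up name for the same code):

    #include <cassert>
    #include <cstdint>

    // mask is all-ones when b >= a (return b), all-zeros when b < a (return a).
    uint64_t branchless_max(uint64_t a, uint64_t b) {
        uint64_t mask = -(uint64_t)(!(b < a));
        return a ^ ((a ^ b) & mask);
    }

    int main() {
        assert(branchless_max(3, 7) == 7);
        assert(branchless_max(7, 3) == 7);
        assert(branchless_max(5, 5) == 5);
    }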
What assembly does that
branchless max in C produce?
● branchless_demo.cpp – what instructions look
odd?
System V ABI
● How to call a function on anything that isn't Windows
● %rdi, %rsi, %rdx, %rcx, %r8, %r9 integer
function args in order
● %xmm0 – %xmm7 for floating point args
● Integer return value in %rax
● Floating point return value in %xmm0
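A hypothetical signature just to map arguments onto those registers (abi_demo is made up; the register assignment is the standard System V one):

    #include <cstdint>

    // a -> %rdi, b -> %rsi, c -> %rdx (integer args in order),
    // x -> %xmm0 (first floating point arg),
    // integer return value comes back in %rax.
    uint64_t abi_demo(uint64_t a, uint64_t b, uint64_t c, double x) {
        return a + b + c + static_cast<uint64_t>(x);
    }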
Let's see that max() again
● %rdi is first argument
● %rsi is the second
● %rax is the return value
● Looking at it, can we do better still?
g++ -S
Hardcode a breakpoint
● if(some_complex_condition && is_met) {
● asm volatile("int3");
● }
● must run in gdb!
Stack
● %rbp is the stack frame base pointer
● %rsp is the stack pointer
● push %rax
● is the same as:
● sub $8, %rsp
● mov %rax, (%rsp)
● Stack starts at a high address; pushing onto the stack moves the stack pointer to a lower address
● Can often tell immediately by inspection of an address whether it lives on the stack
Function calls
● callq 0x1326bd(%rip)
● pushes the address of the instruction after the
call on the stack. Then jumps to the address.
● push(%rip + call instruction size)
● jmp function address
● retq pops that pushed address off the stack and
jumps (back) to it
● Implication: Any stack corruption will hurt, badly!
Function calls
● Label:
● push %rbp
● mov %rsp, %rbp
● … do stuff here, push values, use stack for
temp storage
● … end of function
● leaveq ; this instruction is really mov %rbp, %rsp & pop %rbp
What about system calls?
● System V ABI - %rdi, %rsi, %rdx, %rcx, %r8,
%r9
● Kernel %rdi, %rsi, %rdx, %r10, %r8,
%r9
● Pretty similar
● Load the registers with arguments
● Put syscall number in %rax
● syscall instruction
● Return value in %rax
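A sketch of a raw write(2) on x86-64 Linux with gcc inline asm – raw_write is a made-up helper; syscall number 1 is write, and the syscall instruction itself clobbers %rcx and %r11:

    #include <cstddef>

    long raw_write(int fd, const void* buf, size_t len) {
        long ret;
        // %rax = syscall number (1 = write), args in %rdi/%rsi/%rdx,
        // return value (bytes written or -errno) back in %rax.
        asm volatile("syscall"
                     : "=a"(ret)
                     : "a"(1L), "D"(static_cast<long>(fd)), "S"(buf), "d"(len)
                     : "rcx", "r11", "memory");
        return ret;
    }

    int main() {
        raw_write(1, "hello from a raw syscall\n", 25);
    }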
Full horror show...
Member Function
● struct Foo
● {
● uint64_t some_member_func(uint64_t bar)
● {
● return member + bar;
● }
● uint64_t member;
● };
C++ with no virtuals – syntactic
sugar over C
● struct Foo
● {
● uint64_t member;
● };
● uint64_t some_member_func(Foo* this,
uint64_t bar)
● {
● return this->member + bar;
● }
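Call-site view of that sugar (self-contained, reusing the Foo from the slide): for a non-virtual member function the object's address is just the hidden first argument, so under the System V ABI it travels in %rdi and bar in %rsi:

    #include <cstdint>

    struct Foo {
        uint64_t member;
        uint64_t some_member_func(uint64_t bar) { return member + bar; }
    };

    int main() {
        Foo f{41};
        // Effectively some_member_func(&f, 1): &f in %rdi, 1 in %rsi.
        return static_cast<int>(f.some_member_func(1));
    }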
More instructions
● callq - push next instruction address after callq
& jmp to function address
● push
● pop
● setcc family: sete, setne, setge, etc.
● shl - logical shift left
● shr - logical shift right
● sar - arithmetic shift right; shifts in copies of the sign bit
● sal - same as shl
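How shr vs sar fall out of C++ (made-up helpers): shifting an unsigned value right is the logical shift, shifting a signed value right is the arithmetic one on gcc and clang (and guaranteed since C++20):

    #include <cstdint>

    uint64_t shift_logical(uint64_t x)   { return x >> 3; }  // typically shr
    int64_t  shift_arithmetic(int64_t x) { return x >> 3; }  // typically sar, sign bit preserved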
Memory Latency
● Main memory 240 cycles
● L3 at best 66 cycles on Broadwell
● L2 ~10 cycles
● L1 ~1 cycle
● An instruction that has to go to L2 for its data takes roughly 5 times as long as one whose data is in L1 (~11 cycles vs ~2 cycles, given the instruction itself takes 1 cycle).
● One that goes to L3 takes roughly 30 times as long.
Cache
● Cache line size is 64 bytes. Size of 2 ymm
registers. 16 floats, 8 doubles or 8 int64_t
● L1 is 8-way, with 64 sets. I.e. an associative array with 64 buckets, each holding at most 8 cache lines
● Bits 6-11 of the address are the key (set index) – see the sketch after this list
● Microbenchmarks usually run with the cache hot. The same data may not be hot in the context of the whole program in production.
● Data oriented programming – buzzwords up the
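A sketch of the set-index arithmetic implied by those numbers (l1_set_index is a made-up helper, assuming 64-byte lines and 64 sets):

    #include <cstdint>

    uint64_t l1_set_index(uint64_t addr) {
        // Low 6 bits = offset within the 64-byte line,
        // next 6 bits (bits 6-11) pick one of the 64 sets.
        return (addr >> 6) & 63;
    }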
TLB
● Translation Lookaside Buffer
● Looks up the pagetable, inserts a mapping from
virtual to physical memory
● Associative array, virtual to physical address
● Program never sees a physical address!
● Limited shared resource – hardware support to
make TLB fills and pagetable lookups really fast
● But always: “Doing no work finishes faster than
doing work really fast...”
SIMD – aka “vectorisation”
● Where there is one there are many…
● Parallel execution
How floating point instructions look
● addXX
● subXX
● mulXX
● divXX
● movXX
● maxXX
● minXX
● XX = ss (scalar single), sd (scalar double), ps (packed single), pd (packed double)
Intrinsics
● Gives the compiler a chance and is less painful
than .s assembly or inline assembly
● gcc
● __builtin_popcountll()
● __builtin_ffsll()
● __builtin_clzll()
● __builtin_ctzll()
● __builtin_prefetch()
● https://gcc.gnu.org/onlinedocs/gcc/Other-Builtin
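A quick hedged demo of a few of these builtins (the expected values in the comments assume a 64-bit unsigned long long):

    #include <cstdint>
    #include <cstdio>

    int main() {
        uint64_t v = 40;   // binary 101000
        std::printf("popcount = %d\n", __builtin_popcountll(v)); // 2 set bits
        std::printf("ctz      = %d\n", __builtin_ctzll(v));      // 3 trailing zeros
        std::printf("clz      = %d\n", __builtin_clzll(v));      // 58 leading zeros
    }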
More Intrinsics
● https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX,AVX2,FMA
● Available wherever C++ compilers are sold…
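A minimal AVX sketch using intrinsics from that guide (add_arrays is a made-up example; compile with -mavx or -march=native):

    #include <immintrin.h>

    // Add 8 floats per iteration with 256-bit registers; scalar loop mops up the tail.
    void add_arrays(const float* a, const float* b, float* out, int n) {
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
        }
        for (; i < n; ++i) out[i] = a[i] + b[i];
    }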
Struct of Arrays
● struct bad { uint64_t a; float b; double c;};
● bad bad_array[1024];
● struct good { uint64_t a[1024]; float b[1024];
double c[1024]; };
● makes SIMD, graphics card & FPGA optimisations possible
● don't load unused struct members into the cache – see the sketch below
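A hedged sketch of the payoff (Good and sum_a are made-up names mirroring the struct above): summing only a walks one contiguous array, so every cache line fetched is full of useful values, b and c never pollute the cache, and the loop is easy for the compiler to auto-vectorise:

    #include <cstdint>

    constexpr int N = 1024;

    struct Good {
        uint64_t a[N];
        float    b[N];
        double   c[N];
    };

    uint64_t sum_a(const Good& g) {
        uint64_t total = 0;
        for (int i = 0; i < N; ++i) total += g.a[i];  // touches only g.a
        return total;
    }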