Chapter 2
Outlines
Role of Assembler
Source Program -> Assembler -> Object Code -> Linker -> Executable Code -> Loader
Introduction to Assemblers
•Fundamental functions
•translating mnemonic operation codes to their machine language
equivalents
•assigning machine addresses to symbolic labels
•Machine dependency
•different machine instruction formats and codes
Assembler Directives
•Assembler directives
•Pseudo-Instructions
• Not translated into machine instructions
• Providing information to the assembler
•Basic assembler directives
START   Specify the name and starting address of the program.
END     Indicate the end of the source program and (optionally) specify the first
        executable instruction in the program.
BYTE    Generate a character or hexadecimal constant, occupying as many bytes as
        needed to represent the constant.
WORD    Generate a one-word integer constant.
RESB    Reserve the indicated number of bytes for a data area.
RESW    Reserve the indicated number of words for a data area.
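As a rough illustration of how the data directives above affect storage, the following Python sketch computes the number of bytes each one occupies. The function name `directive_size` is hypothetical, and the sketch assumes the 3-byte SIC word and the C'...' / X'...' constant notation used in this chapter.

```python
WORD_SIZE = 3  # assumption: a SIC word is 3 bytes


def directive_size(directive: str, operand: str) -> int:
    """Return the number of bytes occupied by a data directive."""
    if directive == "WORD":
        return WORD_SIZE                  # one-word integer constant
    if directive == "RESW":
        return WORD_SIZE * int(operand)   # reserve N words
    if directive == "RESB":
        return int(operand)               # reserve N bytes
    if directive == "BYTE":
        body = operand[2:-1]              # text between the quotes
        if operand.startswith("C'"):
            return len(body)              # one byte per character
        if operand.startswith("X'"):
            return len(body) // 2         # two hex digits per byte
    raise ValueError(f"unsupported directive: {directive} {operand}")
```

START and END are omitted because they do not reserve storage.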
Example Program (Fig. 2.1)
2.1.1 A Simple SIC Assembler
•Assembler’s functions
1. Convert mnemonic operation codes to their machine language
equivalents – e.g., translate STL to 14 (line 10)
2. Convert symbolic operands to their equivalent machine
addresses – e.g., translate RETADR to 1033 (line 10)
3. Build the machine instructions in the proper format
4. Convert the data constants to internal machine
representations – e.g., translate EOF to 454F46 (line 80)
5. Write the object program and the assembly listing
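Functions 1 to 4 can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard SIC instruction layout (8-bit opcode, 1-bit index flag, 15-bit address); the tables are deliberately partial and the names `assemble_sic` and `char_constant` are hypothetical.

```python
# Partial tables for illustration only (values as used in the examples above).
OPTAB = {"STL": 0x14, "LDA": 0x00, "JSUB": 0x48, "STCH": 0x54}
SYMTAB = {"RETADR": 0x1033, "BUFFER": 0x1039}


def assemble_sic(mnemonic: str, operand: str) -> str:
    """Build a 24-bit SIC instruction: opcode(8) | x(1) | address(15)."""
    indexed = operand.endswith(",X")
    symbol = operand[:-2] if indexed else operand
    address = SYMTAB[symbol]                  # function 2: symbol -> address
    word = (OPTAB[mnemonic] << 16) | (int(indexed) << 15) | address
    return f"{word:06X}"                      # function 3: proper format


def char_constant(text: str) -> str:
    """Function 4: convert a character constant to its hex representation."""
    return text.encode("ascii").hex().upper()


print(assemble_sic("STL", "RETADR"))   # 141033
print(char_constant("EOF"))            # 454F46
```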
•Assemblers write the generated object code onto some
output device
•Object programs
•Loaded into memory for execution later
•Contain three types of records
• Header
• Text
• End
Object Program
•Header
Col. 1 H
Col. 2~7 Program name
Col. 8~13 Starting address (hex)
Col. 14~19 Length of object program in bytes (hex)
•Text
Col. 1 T
Col. 2~7 Starting address in this record (hex)
Col. 8~9 Length of object code in this record in bytes (hex)
Col. 10~69 Object code (2 columns per byte of object code)
•End
Col. 1 E
Col. 2~7 Address of first executable instruction (hex)
(END program name)
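The column layout above maps directly onto fixed-width string formatting. The sketch below is one possible way to emit the three record types; the function names are hypothetical and the object codes are passed in as hex strings.

```python
def header_record(name: str, start: int, length: int) -> str:
    """Col. 1 'H', col. 2-7 name, col. 8-13 start, col. 14-19 length."""
    return f"H{name:<6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list[str]) -> str:
    """Col. 1 'T', col. 2-7 start, col. 8-9 length in bytes, col. 10-69 code."""
    body = "".join(object_codes)
    if len(body) > 60:                     # at most 60 columns (30 bytes)
        raise ValueError("text record holds at most 30 bytes of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_exec: int) -> str:
    """Col. 1 'E', col. 2-7 address of the first executable instruction."""
    return f"E{first_exec:06X}"


print(header_record("COPY", 0x1000, 0x107A))   # HCOPY  00100000107A
```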
Fig. 2.3
Two Pass Assembler
Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes
2.1.2 Assembler Algorithm and Data Structure
•Simple assembler uses two major internal data structures
•Operation Code Table (OPTAB)
•Symbol Table (SYMTAB)
•OPTAB
•To look up mnemonic operation codes and translate them to their
machine language equivalents
•SYMTAB
•To store values (address) assigned to labels
OPTAB (operation code table)
•Content
•mnemonic, machine code, instruction format, length, etc.
•Characteristic
•static table
•Implementation
•hash table or another data structure (e.g., an array) that is easy to search
SYMTAB (symbol table)
•Content
•label name, value, flag (to indicate error conditions), other
information about the data area or instruction labeled (type, length),
etc.
•Characteristic
•dynamic table
• efficient insertion and retrieval; deletion is rare
•Implementation
•hash table
•Example
Symbol   Value
COPY     1000
FIRST    1000
CLOOP    1003
ENDFIL   1015
EOF      102A
THREE    102D
ZERO     1030
RETADR   1033
LENGTH   1036
BUFFER   1039
RDREC    2039
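A symbol table with these properties can be sketched on top of a Python dict, which is itself a hash table. The class and method names below are hypothetical; the only error condition modeled is a duplicate label.

```python
class SymbolTable:
    """Minimal SYMTAB sketch: label -> value plus a duplicate-label flag."""

    def __init__(self):
        self._entries = {}

    def insert(self, label, value):
        """Add a label; flag it and return False if it is already defined."""
        if label in self._entries:
            self._entries[label]["duplicate"] = True
            return False
        self._entries[label] = {"value": value, "duplicate": False}
        return True

    def lookup(self, label):
        """Return the value assigned to a label, or None if undefined."""
        entry = self._entries.get(label)
        return None if entry is None else entry["value"]

    def has_error(self, label):
        """Report whether the label was defined more than once."""
        return self._entries[label]["duplicate"]
```

Deletion is not provided, matching the observation that entries are rarely removed.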
Algorithm for Pass 1 of Assembler
Pass 1:
Figure 2.4(a) Algorithm for Pass 1 of assembler
Pass 2:
Figure 2.4(b) Algorithm for Pass 2 of Assembler
Homework #1
Assembler Design
2.2 Machine-dependent Assembler Features
•Sec. 2-2
•Instruction formats and addressing modes
•Program relocation
2.2.1 Instruction Format and Addressing Modes
•SIC/XE
•PC-relative or base-relative addressing: op m (format 3)
•Indirect addressing: op @m
•Immediate addressing: op #c
•Extended format: +op m (format 4)
•Indexed addressing: op m,X
•Register-to-register instructions
•Larger memory -> multiprogramming (program allocation)
•Example program
•Figure 2.5
•Register translation
•register mnemonics (A, X, L, B, S, T, F, PC, SW) and their values
(0, 1, 2, 3, 4, 5, 6, 8, 9)
•preloaded in SYMTAB
Address translation
•Most of the register-to-memory instructions are assembled
using either program-counter relative or base relative
addressing
•The assembler calculates a displacement to be assembled as part of the
object instruction
•The correct target address is the displacement added to the contents of
the program counter (PC) or the base register (B)
•The displacement must be small enough to fit the 12-bit field in the
instruction, so it must be between
• 0 and 4095 (base relative mode)
• -2048 and +2047 (PC relative mode)
•If neither PC relative nor base relative addressing can be used
(the displacement is too large), then the 4-byte extended instruction format
(Format 4, with a 20-bit address field) must be used
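The selection rule above can be sketched as follows. This is an illustrative sketch, not the textbook algorithm: the function name `choose_displacement` is hypothetical, `pc` is assumed to be the address of the next instruction, and `base` is `None` when no BASE directive is in effect.

```python
def choose_displacement(target, pc, base=None):
    """Pick an addressing mode and return (mode, 12-bit field or address)."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF          # two's complement in 12 bits
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:
            return "base", disp
    return "format4", target               # needs the 20-bit address field


# A backward jump: target 0006, next instruction at 001A.
print(choose_displacement(0x0006, 0x001A))   # ('pc', 4076), i.e. FEC in hex
```

PC relative is tried first and base relative second, which matches the order in which the two modes are listed above.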
Mode                   Bit n  Bit i  Target address
Immediate addressing   0      1      Operand value
Indirect addressing    1      0      The word at the location given by the target address
                                     holds the address of the operand
Simple addressing      0      0      The target address is taken as the location of the
                       1      1      operand

Format 3 (e = 0), field widths in bits:
op(6)  n(1)  i(1)  x(1)  b(1)  p(1)  e(1)  disp(12)
Format 4 (e = 1), field widths in bits:
op(6)  n(1)  i(1)  x(1)  b(1)  p(1)  e(1)  address(20)
Base-Relative Addressing Modes
•Base-relative
•base register is under the control of the programmer
12   0003        LDB   #LENGTH
13               BASE  LENGTH
160  104E        STCH  BUFFER,X     57C003
op(6)     n i x b p e   disp(12)
54 (hex)  1 1 1 1 0 0   003 (hex)
x b p e and disp: 1100 0000 0000 0011 (binary) = C003 (hex)
• NOBASE informs the assembler that the contents of the base
register can no longer be relied upon for addressing
Immediate Address Translation
•Immediate addressing
55   0020        LDA   #3           010003
op(6)     n i x b p e   disp(12)
00 (hex)  0 1 0 0 0 0   003 (hex)
Immediate Address Translation (Cont’d)
•Immediate addressing
12   0003        LDB   #LENGTH      69202D
op(6)     n i x b p e   disp(12)
68 (hex)  0 1 0 0 1 0   02D (hex)   PC relative
68 (hex)  0 1 0 0 0 0   033 (hex)   690033 immediate
Indirect Address Translation
•Indirect addressing
•target address is computed as usual (PC-relative or base-relative)
•only the n bit is set to 1
70   002A        J     @RETADR      3E2003
op(6)     n i x b p e   disp(12)
3C (hex)  1 0 0 0 1 0   003 (hex)
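The worked encodings above can be checked mechanically. The sketch below packs the opcode, the six flag bits, and the displacement or address into a format 3 or format 4 instruction; the function name `encode` is hypothetical.

```python
def encode(opcode, n, i, x, b, p, e, field):
    """Pack a SIC/XE format 3 (e = 0) or format 4 (e = 1) instruction."""
    first = (opcode & 0xFC) | (n << 1) | i        # top 6 opcode bits + n, i
    flags = (x << 3) | (b << 2) | (p << 1) | e    # x, b, p, e
    if e:
        word = (first << 24) | (flags << 20) | (field & 0xFFFFF)
        return f"{word:08X}"                      # 4 bytes, 20-bit address
    word = (first << 16) | (flags << 12) | (field & 0xFFF)
    return f"{word:06X}"                          # 3 bytes, 12-bit disp


print(encode(0x54, 1, 1, 1, 1, 0, 0, 0x003))   # 57C003  (STCH BUFFER,X)
print(encode(0x00, 0, 1, 0, 0, 0, 0, 0x003))   # 010003  (LDA #3)
print(encode(0x68, 0, 1, 0, 0, 1, 0, 0x02D))   # 69202D  (LDB #LENGTH)
print(encode(0x3C, 1, 0, 0, 0, 1, 0, 0x003))   # 3E2003  (J @RETADR)
```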
2.2.2 Program Relocation
Example
Relocatable Program
•Modification record
•Col 1 M
•Col 2-7 Starting location of the address field to be modified,
relative to the beginning of the program
•Col 8-9 length of the address field to be modified, in half-bytes
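The record layout above can be sketched as a formatting helper. The names are hypothetical; the second function assumes a format 4 instruction, whose 20-bit address field occupies 5 half-bytes and begins in the byte after the instruction's first byte.

```python
def modification_record(offset: int, half_bytes: int) -> str:
    """Col. 1 'M', col. 2-7 field location, col. 8-9 length in half-bytes."""
    return f"M{offset:06X}{half_bytes:02X}"


def format4_modification(instruction_loc: int) -> str:
    """Modification record for the address field of a format 4 instruction."""
    return modification_record(instruction_loc + 1, 5)


print(format4_modification(0x0006))   # M00000705
```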
Object Code
Homework #2
2.3 Machine-Independent Assembler Features
•Literals
•Symbol-Defining Statements
•Expressions
•Program Blocks
•Control Sections and Program Linking
2.3.1 Literals
•Design idea
•Let programmers write the value of a constant
operand as a part of the instruction that uses it.
•This avoids having to define the constant elsewhere in the
program and make up a label for it.
•Example
e.g. 45   001A ENDFIL LDA  =C'EOF'   032010
     93               LTORG
          002D *      =C'EOF'        454F46
e.g. 215  1062 WLOOP  TD   =X'05'    E32011
•Immediate Operands
•The operand value is assembled as part of the machine instruction
e.g. 55 0020 LDA #3 010003
•Literals
•The assembler generates the specified value as a constant at
some other memory location
e.g. 45   001A ENDFIL LDA  =C'EOF'   032010
•Compare (Fig. 2.6)
e.g.
45   001A ENDFIL LDA  EOF       032010
80   002D EOF    BYTE C'EOF'    454F46
Literal - Implementation (1/3)
•Literal pools
•Normally literals are placed into a pool at the end of the program
• see Fig. 2.10 (END statement)
•In some cases, it is desirable to place literals into a pool at some
other location in the object program
• assembler directive LTORG
• reason: keep the literal operand close to the instruction
Literal - Implementation (2/3)
•Duplicate literals
e.g. 215  1062 WLOOP  TD   =X'05'
e.g. 230  106B        WD   =X'05'
•The assembler should recognize duplicate literals and store only
one copy of the specified data value
•Comparison of the generated data values
• The benefits of comparing generated data values are usually not great enough to justify the
additional complexity in the assembler
•Comparison of the defining expressions
• Problem: the same literal name can have different values, e.g. =* (the current value of LOCCTR)
Literal - Implementation (3/3)
•LITTAB
•literal name, the operand value and length, the address assigned to the
operand
•Pass 1
•build LITTAB with literal name, operand value and length, leaving the
address unassigned
•when LTORG statement is encountered, assign an address to each literal
not yet assigned an address
•Pass 2
•search LITTAB for each literal operand encountered
•generate data values using BYTE or WORD statements
•generate modification record for literals that represent an address in the
program
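The Pass 1 handling of LITTAB can be sketched as below. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, only =C'...' and =X'...' literals are handled, and duplicates are detected by comparing the defining expression (the literal name).

```python
class LiteralTable:
    """Minimal LITTAB sketch: name -> value, length, and assigned address."""

    def __init__(self):
        self.entries = {}

    def add(self, name):
        """Record a literal operand seen in Pass 1; duplicates are ignored."""
        if name in self.entries:
            return
        kind, body = name[1], name[3:-1]       # e.g. "=C'EOF'" -> C, EOF
        if kind == "C":
            value = body.encode("ascii").hex().upper()
        else:                                  # assumed to be an X'..' literal
            value = body.upper()
        self.entries[name] = {"value": value,
                              "length": len(value) // 2,
                              "address": None}

    def assign_addresses(self, locctr):
        """On LTORG or END: place each unassigned literal, return new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr
```

Pass 2 would then look up each literal operand by name and use its assigned address in place of a symbol value.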
2.3.2 Symbol-Defining Statements
•Example 1
MAXLEN EQU 4096
+LDT #MAXLEN (in place of +LDT #4096)
•Example 2
BASE EQU R1
COUNT EQU R2
INDEX EQU R3
ORG (origin)
ORG Example
2.3.3 Expressions
SYMTAB
2.3.4 Program Blocks
•Program blocks
•refer to segments of code that are rearranged within a single
object program unit
•USE [blockname]
•At the beginning, statements are assumed to be part of the
unnamed (default) block
•If no USE statements are included, the entire program belongs to
this single block
•Example: Figure 2.11
•Each program block may actually contain several separate
segments of the source program
•Pass 1
•each program block has a separate location counter
•each label is assigned an address that is relative to the start of the
block that contains it
•at the end of Pass 1, the latest value of the location counter for
each block indicates the length of that block
•the assembler can then assign to each block a starting address in
the object program
•Pass 2
•The address of each symbol can be computed by adding the
assigned block starting address and the relative address of the
symbol to that block
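The two steps above (assigning block starting addresses after Pass 1, then resolving symbols in Pass 2) can be sketched as follows. The function names are hypothetical, and the sketch assumes blocks are laid out in the order in which they first appear, starting at address 0.

```python
def assign_block_starts(block_lengths):
    """Given {block name: length} in order of appearance, return start addresses."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length          # next block begins where this one ends
    return starts


def absolute_address(symbol, symtab, starts):
    """symtab maps symbol -> (block name, address relative to that block)."""
    block, relative = symtab[symbol]
    return starts[block] + relative
```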
Program Readability
•Program readability
•No extended format instructions on lines 15, 35, 65
•No need for base relative addressing (lines 13, 14)
•LTORG is used to make sure the literals are placed ahead of any
large data areas (line 253)
•Object code
•It is not necessary to physically rearrange the generated code in
the object program
•see Fig. 2.13, Fig. 2.14
Figure 2.13 Object program corresponding to Fig. 2.11
Figure 2.14 Program blocks from Fig. 2.11 traced through the assembly and loading
processes
2.3.5 Control Sections and Program Linking
•Control Sections
•are most often used for subroutines or other logical subdivisions
of a program
•the programmer can assemble, load, and manipulate each of these
control sections separately
•instruction in one control section may need to refer to instructions
or data located in another section
•because of this, there should be some means for linking control
sections together
•Fig. 2.15, 2.16
•External definition
•EXTDEF name [, name]
•EXTDEF names symbols that are defined in this control section
and may be used by other sections
•External reference
•EXTREF name [,name]
•EXTREF names symbols that are used in this control section and
are defined elsewhere
•Example
15 0003 CLOOP +JSUB RDREC 4B100000
160 0017 +STCH BUFFER,X 57900000
190 0028 MAXLEN WORD BUFEND-BUFFER 000000
Figure 2.17 Object program corresponding to Fig. 2.15
External References in Expression
•Earlier definitions
•required that all of the relative terms be paired in an expression (an
absolute expression), or that all except one be paired (a relative
expression)
•New restriction
•Both terms in each pair must be relative within the same control
section
•Ex: BUFEND-BUFFER (legal: both symbols are in the same control section)
•Ex: RDREC-COPY (illegal: the symbols are in different control sections)
•In general, the assembler cannot determine whether or not
the expression is legal at assembly time. This work will be
handled by a linking loader.
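The pairing rule can be sketched as a check that is only possible once the control section of every symbol is known, which is why it is left to the linking loader. The sketch below is illustrative: the function name `classify` and the input shape are assumptions, and only addition and subtraction of symbols are handled.

```python
from collections import Counter


def classify(terms, section_of):
    """Classify an expression given as [(sign, symbol), ...], sign is +1 or -1.

    section_of maps each symbol to its control section name, or to None
    for an absolute symbol.
    """
    balance = Counter()
    for sign, symbol in terms:
        section = section_of[symbol]
        if section is None:
            continue                         # absolute terms need no pairing
        balance[section] += sign
    unpaired = [count for count in balance.values() if count != 0]
    if not unpaired:
        return "absolute"                    # every relative term is paired
    if unpaired == [1]:
        return "relative"                    # exactly one unpaired positive term
    return "illegal"


sections = {"BUFEND": "COPY", "BUFFER": "COPY", "COPY": "COPY", "RDREC": "RDREC"}
print(classify([(+1, "BUFEND"), (-1, "BUFFER")], sections))   # absolute
print(classify([(+1, "RDREC"), (-1, "COPY")], sections))      # illegal
```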
2.4 Assembler Design Options
•One-pass assemblers
•Multi-pass assemblers
•Two-pass assembler with overlay structure
Two-Pass Assembler with Overlay Structure
•For small memory
•pass 1 and pass 2 are never required at the same time
•three segments
• root: driver program and shared tables and subroutines
• pass 1
• pass 2
•tree structure
•overlay program
2.4.1 One-Pass Assemblers
•Main Problem
•forward references
• data items
• labels on instructions
•Solution
• data items: require all such areas be defined before they are referenced
• labels on instructions: no good solution
•Two types of one-pass assembler
•load-and-go
• produces object code directly in memory for immediate execution
•the other
• produces usual kind of object code for later execution
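One common way for a load-and-go assembler to cope with forward references is to leave the operand field empty, remember where it is, and patch it once the symbol is defined. The sketch below assumes that approach; it is not taken from the slides, and the class name, method names, and the simplified memory model (a dict from field address to operand value) are all hypothetical.

```python
class ForwardRefTable:
    """Backpatching sketch for forward references in a one-pass assembler."""

    def __init__(self, memory):
        self.memory = memory       # field address -> operand address value
        self.defined = {}          # symbol -> value
        self.pending = {}          # symbol -> field addresses awaiting a value

    def reference(self, symbol, field_address):
        """Assemble an operand; defer it if the symbol is not yet defined."""
        if symbol in self.defined:
            self.memory[field_address] = self.defined[symbol]
        else:
            self.memory[field_address] = 0
            self.pending.setdefault(symbol, []).append(field_address)

    def define(self, symbol, value):
        """Record a label definition and patch every earlier reference."""
        self.defined[symbol] = value
        for field_address in self.pending.pop(symbol, []):
            self.memory[field_address] = value
```

Any symbol still listed in `pending` at the end of the source program is an undefined symbol.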
Load-and-go Assembler
•Characteristics
•Useful for program development and testing
•Avoids the overhead of writing the object program out and
reading it back
•Both one-pass and two-pass assemblers can be designed as load-and-go
•However, one-pass also avoids the overhead of an additional pass
over the source program
•For a load-and-go assembler, the actual address must be known at
assembly time, so an absolute program can be used
Forward Reference in One-pass Assembler
Figure 2.19(a) Object code in memory and symbol table entries for the program in Fig 2.18
after scanning line 40
Figure 2.19(b) Object code in memory and symbol table entries for the program in Fig 2.18
after scanning line 160
Producing Object Code
2.4.2 Multi-Pass Assemblers
Figure 2.21 Example of multi-pass assembler operation
2.5 Implementation Examples
•MASM Assembler
•SPARC Assembler
•AIX Assembler
2.5.1 MASM Assembler
2.5.2 SPARC Assembler
Delayed Branch
•Original Instruction Sequence
LOOP: .
.
.
ADD %L2, %L3, %L4
CMP %L0, 10
BLE LOOP
NOP
•NOP (no-operation): to simplify debugging, SPARC assembly
language programmers often place NOP instructions in delay slots
2.5.3 AIX Assembler