EC - 622 | 2020
Assembler Designing
MPA special Assignment
PRAVEEN KUKREJA | 17BEC073 & PRIYANSH SHASHIKANT RINGE | 16BEC024
Introduction
An assembler is system software that converts an assembly language program into a machine
language program.
An assembly language is a low-level programming language that is specific to a particular
computer architecture. It is therefore machine dependent in nature.
Assembly languages have a close to one-to-one correspondence between symbolic instructions
and executable machine codes. Assembly languages also include directives to the assembler,
directives to the linker, directives for organizing data space, and macros. Macros can be used,
among other purposes, to combine several assembly language instructions into a construct that
resembles a high level language statement. There are cases where a symbolic instruction is
translated into more than one machine instruction, but in general each symbolic assembly
language instruction corresponds to an individual executable machine instruction.
High level languages are abstract. Typically a single high level instruction is translated into
several (sometimes dozens or in rare cases even hundreds) executable machine language
instructions. Some early high level languages had a close correspondence between high level
instructions and machine language instructions. Modern object oriented programming
languages are highly abstract.
Assembly language is much harder to program in than high level languages. The programmer
must pay attention to far more detail and must have an intimate knowledge of the processor in
use. But high quality hand crafted assembly language programs can run much faster and use
much less memory and other resources than a similar program written in a high level language.
Speedups of 2 to 20 times are fairly common, and speedups of hundreds of times are
occasionally possible.
High level programming languages are much easier for less skilled programmers to work in and
for semi-technical managers to supervise. High level languages also allow faster development
than assembly language, even with highly skilled programmers. Development that is 10 to 100
times faster is fairly common. Programs written in high level languages (especially object
oriented programming languages) are much easier and less expensive to maintain than similar
programs written in assembly language (and for a successful software project, the vast
majority of the work and expense is in maintenance, not initial development).
Aim and Description of the Project
The main aim of our project is to convert an assembly level program in MIPS into its equivalent
machine code, provided that the program uses only the instructions that are defined in the
project.
Proposed System
We developed a program that implements a 2-pass assembler, which reads an input text file
containing the assembly code and produces an object file in two passes. The proposed system is
designed to cover most assembler features in a limited fashion. The features implemented in
this design are 1-byte, 2-byte, and 3-byte instructions and the register-to-register and
register-to-memory instruction formats. It also performs specific tasks for the assembler
directives as indicated in the module.
2 Pass Assembler
A two-pass assembler scans the source code twice.
During the first pass, all the symbols defined in the code are stored in the
symbol table (SYMTAB) and memory is allocated for the instructions. During the second pass
the machine codes are loaded into the memory allocated for each instruction.
Operation code table (OPTAB)
It is used to store the mnemonic operation code and its machine language equivalent. The
information stored in this table is predefined when the assembler itself is written, rather than
being loaded into the table at execution time.
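The idea of a predefined table can be sketched as a constant array that is compiled into the assembler. This is only an illustration under stated assumptions: the mnemonics, bit patterns, and the names `OptabEntry`, `OPTAB`, and `optab_lookup` are hypothetical and are not taken from the project's opcode table. The listing in the appendix builds a chained hash table from a text file instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One OPTAB row: mnemonic, opcode bits, and operand format.
   Every value in the table below is illustrative only. */
typedef struct {
    const char *mnemonic;
    const char *code;    /* opcode written as a bit string */
    const char *format;  /* e.g. "rrr", "rr", "a", "z" */
} OptabEntry;

static const OptabEntry OPTAB[] = {
    { "ADD", "00001", "rrr" },
    { "MOV", "00010", "rr"  },
    { "JMP", "00011", "a"   },
    { "HLT", "11111", "z"   },
};

/* Linear search over the fixed table.
   Returns NULL when the mnemonic is not defined. */
const OptabEntry *optab_lookup(const char *mnemonic)
{
    size_t n = sizeof OPTAB / sizeof OPTAB[0];
    assert(mnemonic != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(OPTAB[i].mnemonic, mnemonic) == 0)
            return &OPTAB[i];
    }
    return NULL;
}
```

A linear search is adequate for a handful of entries. For a larger instruction set, hashing the mnemonic, as the appendix listing does, keeps lookups fast.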
Symbol table (SYMTAB)
It is used to store the name, address, and other attributes associated with the different labels
used in the source program. This is a dynamic table: values are entered into it during the
assembly process, and its length cannot be predicted.
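Because the number of labels is not known in advance, a linked list is one simple way to hold the table. The sketch below is a minimal example under assumptions: the names `SymNode`, `symtab_insert`, and `symtab_find` are hypothetical, and rejecting duplicate labels is a design choice of this sketch rather than something the report specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SymNode {
    char name[50];
    int address;
    struct SymNode *next;
} SymNode;

/* Adds a label at the front of the list.
   Returns 0 on success, -1 if the label already exists or memory runs out. */
int symtab_insert(SymNode **head, const char *name, int address)
{
    assert(head != NULL && name != NULL);
    for (SymNode *p = *head; p != NULL; p = p->next) {
        if (strcmp(p->name, name) == 0)
            return -1;                       /* label defined twice */
    }
    SymNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    strncpy(node->name, name, sizeof node->name - 1);
    node->name[sizeof node->name - 1] = '\0'; /* always terminated */
    node->address = address;
    node->next = *head;
    *head = node;
    return 0;
}

/* Returns the address stored for a label, or -1 if it is undefined. */
int symtab_find(const SymNode *head, const char *name)
{
    for (; head != NULL; head = head->next) {
        if (strcmp(head->name, name) == 0)
            return head->address;
    }
    return -1;
}
```

The table grows by one node per label, so no upper bound on the number of symbols has to be chosen when the assembler is written.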
Flow Chart
Fig 1. Pass-1 Assembler
Fig 2. Pass-2 Assembler
Brief explanation
In the pass 1 phase of the assembler, every instruction is checked to confirm that it is legal
in the current assembly mode. Space is then allocated for the instructions and for any storage
that the program requests. Addresses are assigned to all statements and tracked using a
location counter. A symbol table is created, and an entry is made in it for every symbol
encountered. The values assigned to labels are saved in the symbol table for later use in the
pass 2 phase of the assembler. Pass 1 also processes assembler directives. After completion of
this stage, all necessary space has been allocated and each symbol defined in the program has
been associated with a location counter value in the symbol table.
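The role of the location counter in pass 1 can be illustrated with a short sketch. It rests on assumptions that are not part of the report: a label starts its line and ends with ':', every instruction occupies exactly one location, and addresses start at 0. The names `Label` and `pass1_collect_labels` are hypothetical, and the addressing convention differs from the one used in the appendix listing.

```c
#include <assert.h>
#include <string.h>

#define MAX_LABELS 16
#define MAX_NAME   32

typedef struct {
    char name[MAX_NAME];
    int  address;
} Label;

/* Pass 1 sketch: walk the source lines, keep a location counter,
   and record the counter value for each label that is defined.
   Labels that do not fit (name too long, table full) are skipped.
   Returns the number of labels recorded. */
int pass1_collect_labels(const char *lines[], int nlines,
                         Label out[], int max_out)
{
    int locctr = 0, count = 0;
    assert(lines != NULL && out != NULL);
    for (int i = 0; i < nlines; i++) {
        const char *colon = strchr(lines[i], ':');
        const char *rest = lines[i];
        if (colon != NULL) {
            size_t len = (size_t)(colon - lines[i]);
            if (count < max_out && len < MAX_NAME) {
                memcpy(out[count].name, lines[i], len);
                out[count].name[len] = '\0';
                out[count].address = locctr;  /* address of the next instruction */
                count++;
            }
            rest = colon + 1;
        }
        /* Advance the counter only if an instruction follows on this line. */
        if (rest[strspn(rest, " \t\n")] != '\0')
            locctr++;
    }
    return count;
}
```

A label that stands alone on its line receives the address of the instruction on the following line, because the counter does not advance for a line that holds only a label.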
The pass 2 phase of the assembler begins when pass 1 has no more statements left to read in the
program. This phase starts again at the beginning of the program. It examines the operands for
symbolic references to storage locations and resolves them using the information stored in the
symbol table. Pass 2 ensures that no invalid instructions are formed. It translates source
statements into machine code and constants, filling the allocated space with object code, and
processes any assembler directives not handled during the pass 1 phase. It then writes the
object program to the destination .obj file, keeps track of errors, and displays appropriate
error messages.
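Resolving a symbolic operand in pass 2 amounts to a symbol table lookup followed by a fixed-width binary encoding. The sketch below assumes a 10-bit address field, which matches the width used by the appendix listing. The names `SymEntry`, `to_bits`, and `encode_address_instr` and the opcode bits in the usage are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;
    int address;
} SymEntry;

/* Writes value as exactly `width` binary digits, most significant bit first,
   followed by '\0'. out must hold width + 1 characters. Bits above `width`
   are silently dropped. */
void to_bits(unsigned value, int width, char *out)
{
    assert(width > 0 && out != NULL);
    for (int i = width - 1; i >= 0; i--) {
        out[i] = (char)('0' + (value & 1u));
        value >>= 1;
    }
    out[width] = '\0';
}

/* Pass 2 sketch for an address-format instruction: the opcode bits are
   followed by the 10-bit address of the label.
   Returns 0 on success, -1 if the label is undefined or out is too small. */
int encode_address_instr(const char *opcode_bits, const char *label,
                         const SymEntry *symtab, int nsyms,
                         char *out, size_t out_size)
{
    for (int i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i].name, label) == 0) {
            char addr[11];
            to_bits((unsigned)symtab[i].address, 10, addr);
            int written = snprintf(out, out_size, "%s%s", opcode_bits, addr);
            return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
        }
    }
    return -1;  /* undefined symbol: the caller should report an error */
}
```

Returning an error code for an undefined label lets the caller decide whether to stop or to report the problem and continue with the next statement.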
An error found in the first pass terminates the assembly process: the errors are displayed and
the second pass does not run. If only warnings are generated, the process
continues to the second pass, and the listing contains the errors and warnings generated during
the second pass of the assembler. Warnings generated during the first pass are not displayed.
APPENDIX A
OPCODE TABLE
MACHINE CODE GENERATED AS OUTPUT
C CODE :
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct Opcode //This node is used for Hashing using Chaining
{
    char name[10];
    char code[35];
    char format[5];
    /**
    rri = reg reg imm. addr.
    rrr = reg reg reg
    ri = reg imm.
    rr = reg reg
    r = reg
    a = address
    z = zero operands
    */
    struct Opcode *next;
};
struct Symbol //Symbol Table is made using Linked List to save space
{
    char name[50];
    int add;
    struct Symbol *next;
};
typedef struct Opcode Opcode;
typedef struct Symbol Symbol;
Symbol *head=NULL;
Opcode* hash_table[13] = {NULL};
void reverseArray(int arr[], int start, int end)
{
    int temp;
    while (start < end)
    {
        temp = arr[start];
        arr[start] = arr[end];
        arr[end] = temp;
        start++;
        end--;
    }
}
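int* conBin(int num) //Converts num to a 10-bit binary array, most significant bit first
{
    int t;
    int i;
    int *bin;
    bin=(int*)malloc(10*sizeof(int));
    if(bin==NULL)
    {
        printf("MEMORY ALLOCATION PROBLEM");
        exit(1);
    }
    for(i=0; i<10; i++)
    {
        bin[i]=0;
    }
    i=9;
    t = num;
    while(t!=0 && i>=0) //Stop after 10 bits so that large values cannot write before bin[0]
    {
        bin[i--]= t % 2;
        t = t / 2;
    }
    return bin;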
}
char* convertTo5BitBinaryString(int decimal) //This decimal is between 0 and 31
{
    printf("bitbinary function receives %d\n",decimal);
    char *str = (char *)malloc(6*sizeof(char)); //5 digits plus the terminating '\0'
    int d[5]={0};
    int i=0;
    int s=0;
    if(str==NULL)
        return NULL;
    while(decimal>0 && i<5) //d[0] receives the least significant bit
    {
        d[i]=decimal%2;
        i++;
        decimal=decimal/2;
    }
    reverseArray(d,0,4); //Most significant bit first
    for(s=0;s<5;s++)
    {
        str[s] = d[s] + '0';
    }
    str[5] = '\0'; //Terminate the string before it is printed or returned
    printf("%s\n",str);
    return str;
}
int getHashIndex(char name[])
{
    int sum=0,i=0;
    while(name[i]!='\0')
    {
        sum+=(unsigned char)name[i++]; //The cast keeps the sum, and so the index, non-negative
    }
    return sum%13;
}
void insertAtIndex(Opcode *Node,int index)
{
    if(hash_table[index] == NULL)
    {
        hash_table[index] = Node;
        Node->next = NULL;
    }
    else
    {
        Opcode* temp = hash_table[index];
        while(temp->next != NULL)
        {
            temp = temp->next;
        }
        temp->next = Node;
        Node->next=NULL;
    }
}
void insertIntoHashMap(Opcode *Node)
{
    int index = getHashIndex(Node->name);
    insertAtIndex(Node,index);
}
int *getAddressCode(char* temp)
{
    Symbol * t = head;
    int num = 0; //Address 0 is emitted when the label is not in the symbol table
    while(t != NULL)
    {
        if(!strcmp(temp,t->name))
        {
            num = t->add;
            break;
        }
        t = t->next;
    }
    if(t == NULL)
        printf("Label %s not found!\n",temp);
    return conBin(num);
}
char * getRegisterCode(char* temp)
{
    //Register names and their 5-bit codes
    static const struct { const char *name; char *code; } regs[] = {
        {"R0","00000"}, {"R1","00001"}, {"R2","00010"}, {"R3","00011"},
        {"R4","00100"}, {"R5","00101"}, {"R6","00110"}, {"R7","00111"},
        {"R8","01000"}, {"R9","01001"}, {"R10","01010"}, {"R11","01011"},
        {"R12","01100"}, {"R13","01101"}, {"R14","01110"}, {"R15","01111"},
        {"A1","10000"}, {"A2","10001"}, {"A3","10010"}, {"A4","10011"},
        {"port0","10100"}, {"port1","10101"}
    };
    int i;
    for(i=0;i<(int)(sizeof regs/sizeof regs[0]);i++)
    {
        if (strcmp(temp,regs[i].name) == 0)
            return regs[i].code;
    }
    printf("Unknown register %s\n",temp);
    return "?????"; //Visible marker in the output instead of an uninitialised pointer
}
char *getConstantCode(int temp)
{
    return convertTo5BitBinaryString(temp);
}
Opcode* getOpcodeNode(char *op)
{
    Opcode* temp = NULL;
    int index = getHashIndex(op);
    if(hash_table[index] == NULL)
    {
        printf("Wrong Opcode");
        return NULL;
    }
    temp = hash_table[index];
    while(temp!=NULL && strcmp(temp->name,op)!=0) //Test for NULL before reading temp->name
    {
        temp = temp->next;
    }
    if(temp == NULL)
    {
        printf("Opcode not found!");
    }
    return temp;
}
char * getOpcodeFormat(Opcode* temp)
{
    return temp->format;
}
int main()
{
    FILE *input_opcode;
    FILE *output_machine_code;
    FILE *input_instructions;
    int ilc=0; //Instruction Location Counter
    int base = 0; //Base address of the program
    int c,c2,c3; //fscanf returns an int, so its result must not be stored in a char
    char opcode[100];
    char machine_code[100];
    char format[5];
    input_opcode = fopen("input_opcode.txt","r"); //input_opcode contains a list of opcodes
    //followed by their machine code and format
    if (input_opcode == NULL)
    {
        printf("FILE OPENING PROBLEM");
        return 1;
    }
    //Read mnemonic, machine code and format; stop at the first incomplete entry.
    //The field widths match the sizes of the arrays in struct Opcode.
    while((c = fscanf(input_opcode,"%9s",opcode)) == 1
        && (c2= fscanf(input_opcode,"%34s",machine_code)) == 1
        && (c3= fscanf(input_opcode,"%4s",format)) == 1)
    {
        //now we will create node of each string
        Opcode* Node = malloc(sizeof(Opcode));
        if (Node == NULL)
        {
            printf("MEMORY ALLOCATION PROBLEM");
            return 1;
        }
        strcpy(Node->name,opcode); //Name of the opcode is fed
        strcpy(Node->code,machine_code); //Machine code of the opcode is fed
        strcpy(Node->format,format); //Format of the opcode is fed
        insertIntoHashMap(Node);
    }
    printf("Hash-map Created Successfully!\n");
    /*TEST:: PRINTING HASHTABLE with hashcode*/
    int i=0;
    for(i =0;i<13;i++)
    {
        if(hash_table[i]!=NULL)
        {
            Opcode* temp = hash_table[i];
            while(temp!=NULL)
            {
                printf("NAME:: %s and CODE:: %s and format:: %s\n",temp->name,temp->code,temp->format);
                temp = temp->next;
            }
        }
    }
    printf("Now reading Opcodes and Converting them to machine codes\n");
    input_instructions = fopen("input_instructions.txt","r");
    output_machine_code = fopen("output_machine_code.txt","w");
    if (input_instructions == NULL || output_machine_code == NULL)
    {
        printf("FILE OPENING PROBLEM");
        return 1;
    }
    int k; //fscanf returns an int
    char op[100];
    //Pass 1: record every label together with its address in the symbol table
    while ( fgets ( op, sizeof op, input_instructions ) != NULL ) /* read a line */
    {
        int l=0;
        while(op[l]!='\0' && op[l+1]!='\0')
        {
            if(op[l]==':') //Its a label
            {
                printf("Label Found!");
                Symbol *t;
                Symbol *temp = malloc(sizeof(Symbol));
                int i=0;
                for(;i<l && i<49;i++) //name holds at most 49 characters
                    temp->name[i] = op[i];
                temp->name[i] = '\0';
                temp->add = ilc + 1 + base;
                temp->next = NULL;
                if(head == NULL)
                    head = temp;
                else
                {
                    t= head;
                    while(t->next!=NULL)
                        t= t->next;
                    t->next = temp;
                }
            }
            l++;
        }
        ilc++;
    }
    fclose(input_instructions);
    input_instructions = fopen("input_instructions.txt","r");
    int * binary;
    int count;
    //Pass 2: translate each statement into machine code
    do
    {
        k=fscanf(input_instructions,"%99s",op);
        if(k!=1) //End of file: stop before the previous word is processed a second time
            break;
        printf("WORD SCANNED IS %s ",op);
        /*check if opcode or label*/
        int l=0;
        while(op[l+1]!='\0')
        {
            l++;
        }
        if(op[l]==':') //Its a label
        {
            fprintf(output_machine_code,"\n");
            //handle label
        }
        else
        {
            printf("Inside else\n");
            char temp[100];
            char temp2[100];
            char temp3[100];
            int temp4;
            //handle opcode and print corresponding machine code
            Opcode* current_node = getOpcodeNode(op);
            if(current_node == NULL) //Unknown mnemonic: skip it rather than dereference NULL
                continue;
            fprintf(output_machine_code,"%s",current_node->code);
            //print machine code of the opcode
            if (strcmp("z",getOpcodeFormat(current_node))==0)
            //ZERO OPERAND INSTRUCTION
            {
                fprintf(output_machine_code,"\n");//Do nothing
            }
            else if(strcmp("r",getOpcodeFormat(current_node))==0)
            //ONE OPERAND REGISTER OPERAND INSTRUCTION
            {
                k = fscanf(input_instructions,"%99s",temp);
                fprintf(output_machine_code,"%s",getRegisterCode(temp));
                fprintf(output_machine_code,"\n");
            }
            else if(strcmp("a",getOpcodeFormat(current_node))==0)
            //ONE OPERAND ADDRESS OPERAND INSTRUCTION
            {
                k = fscanf(input_instructions,"%99s",temp);
                binary = getAddressCode(temp);
                for(count=0;count<10;count++)
                {
                    fprintf(output_machine_code,"%d",binary[count]);
                }
                free(binary); //conBin allocates a new array on every call
                fprintf(output_machine_code,"\n");
            }
            else if(strcmp("rr",getOpcodeFormat(current_node))==0)
            //TWO OPERAND REGISTER REGISTER OPERAND INSTRUCTION
            {
                k = fscanf(input_instructions,"%99s",temp);
                k = fscanf(input_instructions,"%99s",temp2);
                fprintf(output_machine_code,"%s",getRegisterCode(temp));
                fprintf(output_machine_code,"%s",getRegisterCode(temp2));
                fprintf(output_machine_code,"\n");
            }
            else if(strcmp("ri",getOpcodeFormat(current_node))==0)
            //TWO OPERAND REGISTER CONSTANT INSTRUCTION
            {
                k = fscanf(input_instructions,"%99s",temp);
                k = fscanf(input_instructions,"%d",&temp4);
                fprintf(output_machine_code,"%s",getRegisterCode(temp));
                binary = conBin(temp4); //The constant is written as 10 bits
                for(count=0;count<10;count++)
                {
                    fprintf(output_machine_code,"%d",binary[count]);
                }
                free(binary); //conBin allocates a new array on every call
                fprintf(output_machine_code,"\n");
            }
            else if(strcmp("rrr",getOpcodeFormat(current_node))==0)
            //THREE OPERAND REGISTER-REGISTER-REGISTER INSTRUCTION
            {
                k = fscanf(input_instructions,"%99s",temp);
                k = fscanf(input_instructions,"%99s",temp2);
                k = fscanf(input_instructions,"%99s",temp3);
                fprintf(output_machine_code,"%s",getRegisterCode(temp));
                fprintf(output_machine_code,"%s",getRegisterCode(temp2));
                fprintf(output_machine_code,"%s",getRegisterCode(temp3));
                fprintf(output_machine_code,"\n");
            }
            else if(strcmp("rri",getOpcodeFormat(current_node))==0)
            //THREE OPERAND REGISTER-REGISTER-IMMEDIATE INSTRUCTION
            {
                k = fscanf(input_instructions,"%99s",temp);
                k = fscanf(input_instructions,"%99s",temp2);
                k = fscanf(input_instructions,"%d",&temp4);
                fprintf(output_machine_code,"%s",getRegisterCode(temp));
                fprintf(output_machine_code,"%s",getRegisterCode(temp2));
                binary = conBin(temp4); //The immediate is written as 10 bits
                for(count=0;count<10;count++)
                {
                    fprintf(output_machine_code,"%d",binary[count]);
                }
                free(binary); //conBin allocates a new array on every call
                fprintf(output_machine_code,"\n");
            }
        }
    }while(k!=EOF);
    printf("out\n");
    fclose(input_instructions);
    fclose(output_machine_code);
    fclose(input_opcode);
    //Print the symbol table and save it to symbol_table.txt
    Symbol *p;
    p=head;
    FILE *f = fopen("symbol_table.txt","w");
    if (f == NULL)
    {
        printf("FILE OPENING PROBLEM");
        return 1;
    }
    while(p!=NULL)
    {
        printf("%s :: ",p->name);
        fprintf(f,"%s :: ",p->name);
        printf("%d\n",p->add);
        fprintf(f,"%d\n",p->add);
        p = p->next;
    }
    fclose(f);
    return 0;
}