BEE 425 Microprocessor System Design
Spring 2020
Lab 3: ARM Assembly language CPU diagnostic programming
Note: this is the class project, Milestone 2
1. Lab Objectives:
•       Using the Keil MDK integrated development environment, learned in Lab 2
•       Programming in ARM assembly language
•       Designing and testing CPU diagnostic code to use in Lab 4/Milestone 3
2. Materials Needed:
•      Keil MDK version 4.7
•      Assembly code listed in the textbook Figure 7.60
•      Your own class project Milestone 1 report on CPU enhancements
You should have installed Keil MDK on your home machine to complete Lab 2. You will reuse it here.
   1.
        The machine code is stored in a hexadecimal file called memfile.dat, which is loaded by the testbench
        during simulation. The file consists of the machine code for the instructions, one instruction per line. The
        testbench, top-level ARM module, and external memory HDL code are given in the following examples.
        The memories in this example hold 64 words each.
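As a small illustration of the memfile.dat format described above (one hexadecimal instruction word per line), the following Python sketch writes such a file from a list of machine words. The helper names `memfile_text` and `write_memfile` are hypothetical and are not part of Keil MDK or the testbench. The 8-digit uppercase formatting is an assumption based on the hex codes shown in the listing.

```python
from pathlib import Path

# First four hex machine words from the listing in this lab.
WORDS = [0xE04F000F, 0xE2802005, 0xE280300C, 0xE2437009]


def memfile_text(words):
    """Return the file contents: one 8-digit hex word per line."""
    return "".join(f"{w:08X}\n" for w in words)


def write_memfile(path, words):
    """Write the words to path (hypothetical helper, for illustration only)."""
    Path(path).write_text(memfile_text(words))


if __name__ == "__main__":
    write_memfile("memfile.dat", WORDS)
```

Keeping the text generation separate from the file write makes the format easy to check without touching the disk.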
        Code:
                  Addr Program                   ; Comments                     Binary machine code                        Hex code
                  00 MAIN   SUB R0, R15, R15     ; R0 = 0                       1110 000 0010 0 1111 0000 0000 0000 1111   E04F000F
                  04        ADD R2, R0, #5       ; R2 = 5                       1110 001 0100 0 0000 0010 0000 0000 0101   E2802005
                  08        ADD R3, R0, #12      ; R3 = 12                      1110 001 0100 0 0000 0011 0000 0000 1100   E280300C
                  0C        SUB R7, R3, #9       ; R7 = 3                       1110 001 0010 0 0011 0111 0000 0000 1001   E2437009
                  10        ORR R4, R7, R2       ; R4 = 3 OR 5 = 7              1110 000 1100 0 0111 0100 0000 0000 0010   E1874002
                  14        AND R5, R3, R4       ; R5 = 12 AND 7 = 4            1110 000 0000 0 0011 0101 0000 0000 0100   E0035004
                  18        ADD R5, R5, R4       ; R5 = 4 + 7 = 11              1110 000 0100 0 0101 0101 0000 0000 0100   E0855004
                  1C        SUBS R8, R5, R7      ; R8 = 11 - 3 = 8, set Flags   1110 000 0010 1 0101 1000 0000 0000 0111   E0558007
                  20        BEQ END              ; shouldn't be taken           0000 1010 0000 0000 0000 0000 0000 1100    0A00000C
                  24        SUBS R8, R3, R4      ; R8 = 12 - 7 = 5              1110 000 0010 1 0011 1000 0000 0000 0100   E0538004
                  28        BGE AROUND           ; should be taken              1010 1010 0000 0000 0000 0000 0000 0000    AA000000
                  2C        ADD R5, R0, #0       ; should be skipped            1110 001 0100 0 0000 0101 0000 0000 0000   E2805000
                  30 AROUND SUBS R8, R7, R2      ; R8 = 3 - 5 = -2, set Flags   1110 000 0010 1 0111 1000 0000 0000 0010   E0578002
                  34        ADDLT R7, R5, #1     ; R7 = 11 + 1 = 12             1011 001 0100 0 0101 0111 0000 0000 0001   B2857001
                  38        SUB R7, R7, R2       ; R7 = 12 - 5 = 7              1110 000 0010 0 0111 0111 0000 0000 0010   E0477002
                  3C        STR R7, [R3, #84]    ; mem[12+84] = 7               1110 010 1100 0 0011 0111 0000 0101 0100   E5837054
                  40        LDR R2, [R0, #96]    ; R2 = mem[96] = 7             1110 010 1100 1 0000 0010 0000 0110 0000   E5902060
                  44        ADD R15, R15, R0     ; PC = PC+8 (skips next)       1110 000 0100 0 1111 1111 0000 0000 0000   E08FF000
                  48        ADD R2, R0, #14      ; shouldn't happen             1110 001 0100 0 0000 0010 0000 0000 1110   E280200E
                  4C        B END                ; always taken                 1110 1010 0000 0000 0000 0000 0000 0001    EA000001
                  50        ADD R2, R0, #13      ; shouldn't happen             1110 001 0100 0 0000 0010 0000 0000 1101   E280200D
                  54        ADD R2, R0, #10      ; shouldn't happen             1110 001 0100 0 0000 0010 0000 0000 1010   E280200A
                  58 END    STR R2, [R0, #100]   ; mem[100] = 7                 1110 010 1100 0 0000 0010 0000 0110 0100   E5802064
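To make the link between the binary column and the hex column of the listing concrete, here is a minimal Python sketch that splits a data-processing machine word into the bit fields the listing groups together. The function name `decode_dp` and the dictionary keys are labels chosen for this illustration. The field boundaries follow the grouping shown in the binary column.

```python
def decode_dp(word):
    """Split a 32-bit ARM data-processing word into its bit fields."""
    return {
        "cond": (word >> 28) & 0xF,   # condition, 1110 = always
        "op":   (word >> 26) & 0x3,   # 00 = data-processing
        "I":    (word >> 25) & 0x1,   # 1 = Src2 is an immediate
        "cmd":  (word >> 21) & 0xF,   # ALU function, e.g. 0100 = ADD
        "S":    (word >> 20) & 0x1,   # 1 = update the flags
        "Rn":   (word >> 16) & 0xF,   # first source register
        "Rd":   (word >> 12) & 0xF,   # destination register
        "src2": word & 0xFFF,         # immediate or register field
    }


if __name__ == "__main__":
    # ADD R2, R0, #5 from address 0x04 of the listing
    print(decode_dp(0xE2802005))
```

Decoding `0xB2857001` (ADDLT R7, R5, #1) the same way gives a condition field of `0xB`, which is the LT condition.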
        Explanation:
        Summary of the process:
A datapath contains all the functional units and connections necessary to implement an
instruction set architecture. For our single-cycle implementation, we use two separate memories,
an ALU, some extra adders, and many multiplexers. MIPS is a 32-bit machine, so most of the
buses are 32 bits wide. The control unit tells the datapath what to do, based on the instruction
that is currently being executed. Our processor has ten control signals that regulate the datapath.
The control signals can be generated by a combinational circuit with the instruction's 32-bit
binary encoding as input. Having seen a MIPS single-cycle datapath and control unit, we next
look at the factors that contribute to a processor's execution time, specifically the performance
of the single-cycle machine, and then at how pipelining can improve on that performance.
            CPU time_{X,P} = Instructions executed_P * CPI_{X,P} * Clock cycle time_X
        where X is the machine and P is the program being run.
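As a worked instance of the CPU time formula, the Python sketch below applies it to the diagnostic program in this lab. The instruction count of 19 comes from following the comments in the listing (23 instructions, of which the four at 0x2C, 0x48, 0x50, and 0x54 are skipped). The 10 ns clock cycle is an assumed value chosen only for illustration, and `cpu_time_ns` is a hypothetical helper name.

```python
def cpu_time_ns(instructions, cpi, clock_cycle_ns):
    """CPU time = instructions executed * CPI * clock cycle time."""
    return instructions * cpi * clock_cycle_ns


EXECUTED = 19          # instructions fetched on the expected path
CPI_SINGLE_CYCLE = 1   # a single-cycle machine uses one cycle per instruction
CLOCK_CYCLE_NS = 10    # assumed clock period, for illustration only

total_ns = cpu_time_ns(EXECUTED, CPI_SINGLE_CYCLE, CLOCK_CYCLE_NS)

if __name__ == "__main__":
    print(f"{total_ns} ns")
```

With these assumptions the program takes 19 * 1 * 10 = 190 ns. A different clock period scales the result linearly.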
        In the first step, R15 is subtracted from itself, so R0 becomes 0. Then 5 is added to
        R0, giving R2 = 5, and 12 is added to R0, giving R3 = 12. Next, 9 is subtracted from
        R3, giving R7 = 3. The logical OR of R7 and R2 gives R4 = 3 | 5 = 7. The logical AND
        of R3 and R4 gives R5 = 12 & 7 = 4, and adding R4 to R5 gives R5 = 4 + 7 = 11.
        SUBS (subtract and set flags) computes R8 = R5 - R7 = 8 and updates the flags. The
        result is not zero, so BEQ END is not taken.
        SUBS then computes R8 = R3 - R4 = 5 and updates the flags again. The result is not
        negative, so BGE AROUND is taken and ADD R5, R0, #0 is skipped; R5 stays 11.
        At AROUND, SUBS computes R8 = R7 - R2 = 3 - 5 = -2, which sets the negative flag.
        The ADDLT instruction is therefore executed, because the LT condition is fulfilled
        when V != N (the overflow and negative bits in the CPSR are different), giving
        R7 = 11 + 1 = 12. SUB R7, R7, R2 then gives R7 = 12 - 5 = 7.
        STR R7, [R3, #84] stores R7 at address R3 + 84, so mem[12+84] = mem[96] = 7.
        LDR R2, [R0, #96] loads that word back, so R2 = 7.
        ADD R15, R15, R0 writes PC + 8 to the PC, which skips ADD R2, R0, #14. B END is
        always taken, so ADD R2, R0, #13 and ADD R2, R0, #10 are skipped as well.
        Finally, STR R2, [R0, #100] stores R2, so mem[100] = 7.
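The walkthrough above can be cross-checked with a small software model. The Python sketch below is not the HDL processor or the Keil simulator; it is a simplified interpreter, written for this illustration, that supports only the instructions and conditions used in the listing. The tuple encoding of `PROGRAM` and the names `alu` and `run` are assumptions of this sketch.

```python
MASK = 0xFFFFFFFF

# Each entry: (cond, op, S, Rd, Rn, src2). src2 is ("imm", n) or ("reg", m);
# for a branch it is the target byte address. Entry i sits at address 4*i.
PROGRAM = [
    ("AL", "SUB", 0, 0, 15, ("reg", 15)),   # 00 MAIN
    ("AL", "ADD", 0, 2, 0, ("imm", 5)),     # 04
    ("AL", "ADD", 0, 3, 0, ("imm", 12)),    # 08
    ("AL", "SUB", 0, 7, 3, ("imm", 9)),     # 0C
    ("AL", "ORR", 0, 4, 7, ("reg", 2)),     # 10
    ("AL", "AND", 0, 5, 3, ("reg", 4)),     # 14
    ("AL", "ADD", 0, 5, 5, ("reg", 4)),     # 18
    ("AL", "SUB", 1, 8, 5, ("reg", 7)),     # 1C SUBS
    ("EQ", "B", 0, None, None, 0x58),       # 20 BEQ END
    ("AL", "SUB", 1, 8, 3, ("reg", 4)),     # 24 SUBS
    ("GE", "B", 0, None, None, 0x30),       # 28 BGE AROUND
    ("AL", "ADD", 0, 5, 0, ("imm", 0)),     # 2C
    ("AL", "SUB", 1, 8, 7, ("reg", 2)),     # 30 AROUND SUBS
    ("LT", "ADD", 0, 7, 5, ("imm", 1)),     # 34 ADDLT
    ("AL", "SUB", 0, 7, 7, ("reg", 2)),     # 38
    ("AL", "STR", 0, 7, 3, ("imm", 84)),    # 3C
    ("AL", "LDR", 0, 2, 0, ("imm", 96)),    # 40
    ("AL", "ADD", 0, 15, 15, ("reg", 0)),   # 44
    ("AL", "ADD", 0, 2, 0, ("imm", 14)),    # 48
    ("AL", "B", 0, None, None, 0x58),       # 4C B END
    ("AL", "ADD", 0, 2, 0, ("imm", 13)),    # 50
    ("AL", "ADD", 0, 2, 0, ("imm", 10)),    # 54
    ("AL", "STR", 0, 2, 0, ("imm", 100)),   # 58 END
]


def alu(op, a, b):
    """Return (result, carry, overflow) for one 32-bit ALU operation."""
    if op == "AND":
        return a & b, 0, 0
    if op == "ORR":
        return a | b, 0, 0
    # SUB is computed as a + ~b + 1, as a hardware adder would do it.
    b_eff, carry_in = (b, 0) if op == "ADD" else (~b & MASK, 1)
    full = a + b_eff + carry_in
    result = full & MASK
    overflow = ((~(a ^ b_eff) & (a ^ result)) >> 31) & 1
    return result, full >> 32, overflow


def run(program):
    """Execute the program; return (registers, memory, flags, fetched addresses)."""
    reg = [0] * 15                       # R0-R14; R15 reads as pc + 8
    mem = {}
    flags = {"N": 0, "Z": 0, "C": 0, "V": 0}
    pc, trace = 0, []
    while 0 <= pc < 4 * len(program):
        cond, op, s, rd, rn, src2 = program[pc // 4]
        trace.append(pc)
        next_pc = pc + 4
        read = lambda r: pc + 8 if r == 15 else reg[r]
        taken = {"AL": True, "EQ": flags["Z"] == 1,
                 "GE": flags["N"] == flags["V"],
                 "LT": flags["N"] != flags["V"]}[cond]
        if taken and op == "B":
            next_pc = src2
        elif taken:
            a = read(rn)
            b = src2[1] if src2[0] == "imm" else read(src2[1])
            if op == "STR":
                mem[(a + b) & MASK] = read(rd)
            elif op == "LDR":
                reg[rd] = mem.get((a + b) & MASK, 0)
            else:
                result, c, v = alu(op, a, b)
                if s:
                    flags["N"], flags["Z"] = result >> 31, int(result == 0)
                    if op in ("ADD", "SUB"):
                        flags["C"], flags["V"] = c, v
                if rd == 15:
                    next_pc = result      # writing R15 redirects the PC
                else:
                    reg[rd] = result
        pc = next_pc
    return reg, mem, flags, trace
```

Calling `run(PROGRAM)` and reading address 100 from the returned memory gives 7, which matches the final comment in the listing. The returned trace also shows that addresses 0x2C, 0x48, 0x50, and 0x54 are never fetched.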
Which parts of the CPU are validated if the results match expectations?
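     Registers R0, R2, R3, R4, R5, R7, R8, and R15 (the PC); the ALU functions ADD, SUB,
     AND, and ORR; flag setting with SUBS; the EQ, GE, and LT conditions; branching; and
     data memory access through STR and LDR.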
Which parts of the CPU are not tested at all? E.g. any unused registers.
     Registers R1, R6, and R9 through R14 are never used by the program.
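One way to check register coverage is to decode the register fields of every machine word in the listing. The Python sketch below does this for the 23 hex codes. The function name `registers_named` is hypothetical, and the decoding assumes only the instruction forms that appear in the listing: data-processing with an immediate or an unshifted register operand, memory access with an immediate offset, and branches.

```python
# Hex codes of the diagnostic program, in address order (0x00 to 0x58).
WORDS = [
    0xE04F000F, 0xE2802005, 0xE280300C, 0xE2437009, 0xE1874002, 0xE0035004,
    0xE0855004, 0xE0558007, 0x0A00000C, 0xE0538004, 0xAA000000, 0xE2805000,
    0xE0578002, 0xB2857001, 0xE0477002, 0xE5837054, 0xE5902060, 0xE08FF000,
    0xE280200E, 0xEA000001, 0xE280200D, 0xE280200A, 0xE5802064,
]


def registers_named(word):
    """Return the register numbers that appear in the word's fields."""
    op = (word >> 26) & 0x3
    if op == 0b10:                              # branch: no register fields
        return set()
    regs = {(word >> 16) & 0xF, (word >> 12) & 0xF}   # Rn and Rd
    if op == 0b00 and not (word >> 25) & 1:     # register Src2: Rm in bits 3:0
        regs.add(word & 0xF)
    return regs


used = set().union(*(registers_named(w) for w in WORDS))
unused = sorted(set(range(16)) - used)

if __name__ == "__main__":
    print("used:", sorted(used))
    print("unused:", unused)
```

For this program the unused list is R1, R6, and R9 through R14. An extended diagnostic would need at least one write and one read of each of those registers to cover the whole register file.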
Which parts of the ALU are tested?
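     The ADD, SUB, AND, and ORR functions, plus the flag outputs produced by SUBS. The ALU
     is combinational logic, not sequential logic.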
Have we tested all bits of the ALU in all operations? If not, which bits in
which functions?
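     No. Every operand in the program fits in 7 bits (the largest is 100), so input bits
     31:7 are always 0. AND and ORR only see the values 3, 5, 7, and 12, so they are
     exercised on bits 3:0 only. The only result with the upper bits set is 3 - 5 = -2.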
Which ALU status bits are checked? Which are not checked?
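     BEQ checks Z, and BGE and ADDLT check N and V. The carry flag C is not checked by any
     instruction. Z and V are only ever seen as 0, so only N is checked in both states.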
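To see which flag values the three SUBS instructions in the program actually produce, the Python sketch below models a 32-bit subtract that sets N, Z, C, and V. The function name `subs_flags` is chosen for this illustration. The last two operand pairs are hypothetical extra test cases, not instructions from the listing.

```python
MASK = 0xFFFFFFFF


def subs_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit SUBS computing a - b."""
    b_inv = ~b & MASK
    full = a + b_inv + 1                 # a - b as a + ~b + 1
    result = full & MASK
    n = result >> 31                     # sign bit of the result
    z = int(result == 0)
    c = full >> 32                       # carry out (1 means no borrow)
    v = ((~(a ^ b_inv) & (a ^ result)) >> 31) & 1   # signed overflow
    return result, n, z, c, v


if __name__ == "__main__":
    # The three SUBS operand pairs from the listing.
    for a, b in [(11, 3), (12, 7), (3, 5)]:
        print(a, b, subs_flags(a, b))
    # Hypothetical extra cases that would set Z and V.
    print(subs_flags(5, 5))
    print(subs_flags(0x80000000, 1))
```

In the listing's three cases Z and V are 0 every time, and only N changes. The two hypothetical cases show operand pairs that would raise Z (5 - 5) and V (0x80000000 - 1) if instructions like them were added to the diagnostic.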
How have we tested memory access?
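     By using STR and LDR operations: the program stores 7 to address 96, loads it back
     into R2 from address 96, and stores R2 to address 100.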
If we were to test memory access more thoroughly, how could we do so?
     The program touches only two words (addresses 96 and 100) and only the value 7. A more
     thorough test would write several different bit patterns (for example all zeros, all
     ones, and alternating bits) to many addresses, including the first and last words of the
     64-word memory, then load each one back and compare it with the value written. This
     matters because the CPU relies on memory as its fast, temporary store for the data it
     is working on.
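A pattern-based memory test can be sketched in Python as a model of the procedure, before writing it as STR/LDR sequences in assembly. Everything here is illustrative: the four patterns, the function name `memory_test`, and the `StuckBitMemory` class (a made-up faulty memory with one bit stuck at 0) are assumptions of this sketch, and the 64-word size is taken from the memory description in this lab.

```python
PATTERNS = [0x00000000, 0xFFFFFFFF, 0x55555555, 0xAAAAAAAA]


def memory_test(mem, n_words=64):
    """Write each pattern to every word, read back, return failing indices."""
    failures = set()
    for pattern in PATTERNS:
        # Mix the word index into the data so address mix-ups show up too.
        for i in range(n_words):
            mem[i] = pattern ^ i
        for i in range(n_words):
            if mem[i] != pattern ^ i:
                failures.add(i)
    return sorted(failures)


class StuckBitMemory(list):
    """Hypothetical faulty memory: bit 3 of word 10 is stuck at 0."""

    def __setitem__(self, index, value):
        if index == 10:
            value &= ~(1 << 3)
        super().__setitem__(index, value)


if __name__ == "__main__":
    print(memory_test([0] * 64))                  # healthy memory
    print(memory_test(StuckBitMemory([0] * 64)))  # faulty memory
```

The healthy memory reports no failures, and the faulty one reports word 10. Writing all words before reading any of them back is a deliberate choice: it lets a write to one address that corrupts another address be detected.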
How have we tested branch instructions?
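     By using BEQ END (should not be taken), BGE AROUND (should be taken), and B END (always
     taken). ADD R15, R15, R0 also changes the PC by writing to R15 directly.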
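The branch encodings in the listing can be checked by computing each target from its machine word. The Python sketch below assumes the standard ARM rule that the target is the branch address plus 8 plus the sign-extended 24-bit offset multiplied by 4. The function name `branch_target` is chosen for this illustration.

```python
def branch_target(addr, word):
    """Return the byte address a branch at addr jumps to when taken."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:          # sign-extend the 24-bit offset
        imm24 -= 1 << 24
    return addr + 8 + (imm24 << 2)


if __name__ == "__main__":
    # (address, machine word) for the three branches in the listing
    for addr, word in [(0x20, 0x0A00000C), (0x28, 0xAA000000), (0x4C, 0xEA000001)]:
        print(hex(addr), "->", hex(branch_target(addr, word)))
```

BEQ END and B END both resolve to 0x58, and BGE AROUND resolves to 0x30, which agrees with the label positions in the listing. All three offsets are forward; a backward branch (negative offset) is not exercised by this program.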