EEC 214
Computer Microprocessor
Lecture 3
                              CALL INSTRUCTIONS AND STACK
Another control transfer instruction is the CALL instruction, which is used to call a subroutine.
Subroutines are often used to perform tasks that need to be performed frequently. This makes a
program more structured in addition to saving memory space.
In the AVR there are four instructions for calling a subroutine: CALL (long call), RCALL (relative
call), ICALL (indirect call to Z), and EICALL (extended indirect call to Z). The choice of which
one to use depends on the target address. Each instruction is explained next.
    Computer Microprocessor             Dr. Mohamed A. Torad                               2
                              Stack and stack pointer in AVR
How stacks are accessed in the AVR
Because the stack is a section of RAM, there must be a register inside the CPU to point to it. The
register used to access the stack is called the SP (stack pointer) register. The SP is 16 bits wide
and is implemented as two 8-bit registers in the I/O memory space: SPL holds the low byte of the
SP and SPH holds the high byte.
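To make the SPH:SPL split concrete, here is a small host-side model in plain C (it runs on a PC, not on the AVR). The function names `sp_from_halves` and `sp_to_halves` are hypothetical; they only show that SPH supplies bits 15 to 8 of the SP and SPL supplies bits 7 to 0.

```c
#include <stdint.h>

/* Combine the two 8-bit I/O registers into the 16-bit stack pointer:
   SPH supplies bits 15..8, SPL supplies bits 7..0. */
uint16_t sp_from_halves(uint8_t sph, uint8_t spl)
{
    return (uint16_t)(((uint16_t)sph << 8) | spl);
}

/* Split a 16-bit stack pointer value into its high and low bytes. */
void sp_to_halves(uint16_t sp, uint8_t *sph, uint8_t *spl)
{
    *sph = (uint8_t)(sp >> 8);    /* high byte goes to SPH */
    *spl = (uint8_t)(sp & 0xFF);  /* low byte goes to SPL */
}
```

For example, SPH = 0x08 and SPL = 0x5F give SP = 0x085F, which is the last SRAM address of the ATmega32.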
                               Initializing the stack pointer
When the AVR is powered up, the SP register contains the value 0, which is the address of R0.
Therefore, we must initialize the SP at the beginning of the program so that it points to
somewhere in the internal SRAM.
In AVR, the stack grows from higher memory location to lower memory location (when we push
onto the stack, the SP decrements). So, it is common to initialize the SP to the uppermost memory
location.
Different AVRs have different amounts of RAM. In the AVR assembler, RAMEND represents the
address of the last RAM location. So, to make the SP point to the last memory location, we can
simply load RAMEND into the SP. Because the SP is made of two registers, SPH and SPL, we load
the high byte of RAMEND into SPH and the low byte of RAMEND into SPL.
Example 8 shows how to initialize the SP and use the PUSH and POP instructions. In the
example you can see how the stack changes when the PUSH and POP instructions are executed.
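The behavior described above can be sketched as a host-side C model, assuming an ATmega32 (RAMEND = 0x085F). PUSH stores the byte at the address held in SP and then decrements SP; POP first increments SP and then reads. The array `sram` and the helpers `stack_init`, `push`, and `pop` are hypothetical stand-ins for the data memory and the instructions, not AVR library code.

```c
#include <stdint.h>

#define RAMEND 0x085F            /* last SRAM address of the ATmega32 (assumed target) */

static uint8_t sram[RAMEND + 1]; /* whole data address space 0x0000..RAMEND */
static uint16_t sp;              /* models the SPH:SPL pair */

/* Same effect as loading HIGH(RAMEND) into SPH and LOW(RAMEND) into SPL. */
void stack_init(void)
{
    sp = RAMEND;
}

/* PUSH Rr: store at the address held in SP, then decrement SP. */
void push(uint8_t value)
{
    sram[sp] = value;
    sp--;
}

/* POP Rd: increment SP first, then read from the new address. */
uint8_t pop(void)
{
    sp++;
    return sram[sp];
}

uint16_t stack_pointer(void)
{
    return sp;
}
```

Pushing two bytes moves SP from 0x085F down to 0x085D; popping them returns the bytes in reverse order (last in, first out) and restores SP to 0x085F.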
                               CALL instruction and the role of the stack
When a subroutine is called, the processor first saves the address of the instruction just below the
CALL instruction on the stack, and then transfers control to that subroutine. This is how the CPU
knows where to resume when it returns from the called subroutine.
For the AVRs whose program counter is not longer than 16 bits (e.g., ATmega128, ATmega32),
the value of the program counter is broken into 2 bytes. The lower byte is pushed onto the stack
first, and then the higher byte is pushed.
For the AVRs whose program counters are longer than 16 bits but shorter than 24 bits, the value
of the program counter is broken up into 3 bytes. The lowest byte is pushed first, then the middle
byte is pushed, and finally the highest byte is pushed. So, in both cases, the lower bytes are
pushed first; because the stack grows downward, the return address ends up stored with its
highest byte at the lowest address.
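Whatever the byte order, the number of bytes pushed depends only on the width of the program counter, and RET removes the same number of bytes again. A minimal C sketch of that bookkeeping, with hypothetical helper names and the SP modeled as a plain 16-bit number:

```c
#include <stdint.h>

/* Bytes pushed for the return address by CALL, RCALL, or ICALL:
   2 for a PC of up to 16 bits, 3 for a PC of 17 to 22 bits. */
unsigned return_address_bytes(unsigned pc_bits)
{
    return (pc_bits <= 16) ? 2u : 3u;
}

/* SP after a call: one decrement per pushed byte (the stack grows down). */
uint16_t sp_after_call(uint16_t sp, unsigned pc_bits)
{
    return (uint16_t)(sp - return_address_bytes(pc_bits));
}

/* SP after the matching RET: the same number of bytes is popped. */
uint16_t sp_after_ret(uint16_t sp, unsigned pc_bits)
{
    return (uint16_t)(sp + return_address_bytes(pc_bits));
}
```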
    RET instruction and the role of the stack
                                The upper limit of the stack
As mentioned earlier, we can define the stack anywhere in the general-purpose SRAM. So, in
the AVR the stack can be as big as its RAM.
Note that we must not define the stack in the register memory or in the I/O memory. On the
ATmega32 these occupy addresses 0x00 to 0x5F, so the SP must be set to point to 0x60 or above.
In AVR, the stack is used for calls and interrupts.
We must remember that upon calling a subroutine, the stack keeps track of where the CPU should
return after completing the subroutine. For this reason, we must be very careful when
manipulating the stack contents.
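These limits can be expressed as a range test on the SP value. The sketch below assumes the ATmega32 memory map (registers at 0x00 to 0x1F, I/O registers at 0x20 to 0x5F, SRAM from 0x60 to RAMEND = 0x085F); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM_START 0x0060  /* first SRAM address on the ATmega32 (assumed target) */
#define RAMEND     0x085F  /* last SRAM address on the ATmega32 */

/* True if SP points into SRAM, i.e. not into the register file
   (0x00-0x1F) or the I/O registers (0x20-0x5F). */
bool sp_is_valid(uint16_t sp)
{
    return sp >= SRAM_START && sp <= RAMEND;
}

/* True if 'depth' more bytes can be pushed starting from 'sp' without the
   stack growing down into the I/O or register area. The last byte written
   lands at sp - depth + 1, which must still be >= SRAM_START. */
bool can_push(uint16_t sp, unsigned depth)
{
    return sp_is_valid(sp) && (uint32_t)sp + 1u >= (uint32_t)SRAM_START + depth;
}
```

A check like `can_push` is one way to reason about how deep nested calls may go before the stack would run into the I/O area.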
                     Calling many subroutines from the main program
It needs to be emphasized that when CALL is used, the target address of the subroutine can be
anywhere within the 4M-word program memory space of the AVR. See Example 11. This is not the case
for the RCALL instruction, which is explained next.
                                RCALL (relative call)
RCALL is a 2-byte instruction in contrast to CALL, which is 4 bytes.
Because RCALL is a 2-byte instruction, and only 12 bits of the 2 bytes are used for the address,
the target address of the subroutine must be within −2048 to +2047 words of memory relative to
the address of the current PC.
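The reach of RCALL can be checked with a little arithmetic. Per the AVR instruction set, RCALL sets the new PC to PC + k + 1, where k is the signed 12-bit field, so the displacement is measured from the instruction after RCALL. The C sketch below (hypothetical helper names, word addresses throughout) tests whether RCALL or CALL can reach a target; CALL is modeled as a 22-bit absolute address, which is what gives it the 4M-word range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed displacement k that RCALL would need: target = pc + 1 + k,
   where pc is the word address of the RCALL instruction itself. */
int32_t rcall_offset(uint32_t pc, uint32_t target)
{
    return (int32_t)target - (int32_t)(pc + 1);
}

/* RCALL reaches the target only if k fits in 12 signed bits. */
bool rcall_reaches(uint32_t pc, uint32_t target)
{
    int32_t k = rcall_offset(pc, target);
    return k >= -2048 && k <= 2047;
}

/* CALL carries a 22-bit absolute word address (4M words). */
bool call_reaches(uint32_t target)
{
    return target < (1u << 22);
}
```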
                                     ICALL (indirect call)
In this 2-byte (16-bit) instruction, the Z register specifies the target address. When the instruction
is executed, the address of the next instruction is pushed onto the stack (as with CALL and RCALL),
and the program counter is loaded with the contents of the Z register.
So, the Z register should contain the address of a function when the ICALL instruction is
executed. Because the Z register is 16 bits wide, the ICALL instruction can call the subroutines
that are within the lowest 64K words of the program memory. (The target address calculation in
ICALL is the same as for the IJMP instruction.)
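The Z register is the register pair R31:R30 (ZH:ZL). As a host-side sketch, forming the ICALL target amounts to concatenating the two bytes into a 16-bit word address; the helper name below is hypothetical.

```c
#include <stdint.h>

/* ICALL loads the PC with Z = R31:R30 (ZH:ZL), so the target is a
   16-bit word address in the lowest 64K words of program memory. */
uint16_t icall_target(uint8_t r31_zh, uint8_t r30_zl)
{
    return (uint16_t)(((uint16_t)r31_zh << 8) | r30_zl);
}
```

In assembly, the subroutine's address would be loaded into ZH and ZL (for example with LDI and the HIGH()/LOW() operators) before ICALL is executed.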
In the AVRs with more than 64K words of program memory, the EICALL (extended indirect call)
instruction is available. The EICALL loads the Z register into the lower 16 bits of the PC and the
EIND register into the upper 6 bits of the PC. Notice that EIND is a part of I/O memory. See
Figure 11.
The ICALL and EICALL instructions can be used to implement pointers to functions.
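Two sketches in C illustrate this paragraph. `eicall_target` forms the 22-bit EICALL target from EIND and Z, keeping only the 6 EIND bits the text mentions. The table of function pointers shows the C-level construct that corresponds to ICALL/EICALL; a compiler for the AVR typically emits an indirect call for it, but the exact code depends on the compiler and device. The functions `twice` and `square` are hypothetical examples.

```c
#include <stdint.h>

/* EICALL: PC(15:0) <- Z and PC(21:16) <- EIND (only 6 bits are used). */
uint32_t eicall_target(uint8_t eind, uint16_t z)
{
    return ((uint32_t)(eind & 0x3F) << 16) | z;
}

/* Hypothetical handlers reached through a table of function pointers. */
static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

typedef int (*handler_t)(int);
static const handler_t handlers[] = { twice, square };

/* Indirect call through a pointer; index must be 0 or 1 here. */
int dispatch(unsigned index, int arg)
{
    return handlers[index](arg);
}
```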
                       AVR TIME DELAY AND INSTRUCTION PIPELINE
In creating a time delay using Assembly language instructions, one must be mindful of two
factors that can affect the accuracy of the delay:
1. The crystal frequency: The frequency of the crystal oscillator connected to the XTAL1 and
   XTAL2 input pins is one factor in the time delay calculation. The duration of the clock period
   for the instruction cycle is a function of this crystal frequency.
2. The AVR design: One way to increase performance without losing code compatibility with the
   older generation of a given family is to reduce the number of instruction cycles it takes to
   execute an instruction. One might wonder how microprocessors such as the AVR are able to
   execute an instruction in one cycle. There are three ways to do that:
   (a) use Harvard architecture to get the maximum amount of code and data into the CPU,
   (b) use RISC architecture features such as fixed-size instructions, and
   (c) use pipelining to overlap fetching and execution of instructions.
                                          Pipelining
In early microprocessors such as the 8085, the CPU could either fetch or execute at a given time.
In other words, the CPU had to fetch an instruction from memory, then execute it; and then fetch
the next instruction, execute it, and so on.
The idea of pipelining in its simplest form is to allow the CPU to fetch and execute at the same
time, as shown in Figure 12. (An instruction fetches while the previous instruction executes.)
In this way, the execution of many instructions is overlapped. One limitation of pipelining is that
the speed of execution is limited to the slowest stage of the pipeline. Compare this to making
pizza.
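The overlap can be quantified with an idealized model (no branches or stalls, every stage takes one step). Without a pipeline, n instructions of s stages take n * s steps; with one, the first instruction takes s steps and every later instruction finishes one step after the previous one, for s + n - 1 steps. Each step can be no shorter than the slowest stage, which is the limitation mentioned above. The helper names and the stage times in the test are hypothetical.

```c
#include <stdint.h>

/* Without pipelining, each instruction passes through all stages
   before the next one starts: n * stages steps. */
uint64_t steps_sequential(uint64_t n, unsigned stages)
{
    return n * stages;
}

/* With an ideal pipeline, the first instruction needs 'stages' steps and
   each later instruction completes one step after the previous one. */
uint64_t steps_pipelined(uint64_t n, unsigned stages)
{
    return n == 0 ? 0 : stages + (n - 1);
}

/* The step time is set by the slowest stage. */
unsigned step_time(const unsigned *stage_ns, unsigned stages)
{
    unsigned worst = 0;
    for (unsigned i = 0; i < stages; i++)
        if (stage_ns[i] > worst)
            worst = stage_ns[i];
    return worst;
}
```

With two stages (fetch and execute), 10 instructions take 20 steps sequentially but 11 steps pipelined.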
                              AVR multistage execution pipeline
As shown in Figure 13, in the AVR, each instruction is executed in 3 stages: operand fetch, ALU
operation execution, and result write back. In step 1, the operands are fetched. In step 2, the
operation is performed; for example, the two numbers are added. In step 3, the result is written
into the destination register. It should be noted that in many computer architecture books, the
ALU operation stage is referred to as execute and the write back stage is called write.
                               Instruction cycle time for the AVR
The crystal oscillator, along with on-chip circuitry, provides the clock source for the AVR CPU. In
the AVR, one machine cycle consists of one oscillator period, which means that with each
oscillator clock, one machine cycle passes.
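Since one machine cycle equals one oscillator period, the cycle time is simply T = 1/f. A tiny C helper (hypothetical name) that returns the period in picoseconds, to stay in integer arithmetic:

```c
#include <stdint.h>

/* One AVR machine cycle = one oscillator period, T = 1 / f.
   Returned in picoseconds so the result stays an integer. */
uint64_t machine_cycle_ps(uint64_t f_hz)
{
    return 1000000000000ULL / f_hz;
}
```

For example, an 8 MHz crystal gives 125,000 ps (125 ns) per machine cycle, and a 16 MHz crystal gives 62.5 ns.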
When a branch instruction is executed, the CPU starts to fetch codes from the new memory
location, and the code in the queue that was fetched previously is discarded. In this case, the
execution unit must wait until the fetch unit fetches the new instruction.
This is called a branch penalty. The penalty is an extra instruction cycle to fetch the instruction
from the target location instead of executing the instruction right below the branch. Remember
that the instruction below the branch has already been fetched and is next in line to be executed
when the CPU branches to a different address.
Some instructions take two, three, or four machine cycles. These include JMP, CALL, RET, and all
the conditional branch instructions such as BRNE, BRLO, and so on.
The conditional branch instruction can take only one machine cycle if it does not jump. For
example, the BRNE will jump if Z = 0, and that takes two machine cycles. If Z = 1, then it falls
through and it takes only one machine cycle.
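These cycle counts are what a delay calculation adds up. As a sketch, consider a hypothetical loop LDI R20,N / AGAIN: DEC R20 / BRNE AGAIN. LDI and DEC each take one cycle; BRNE takes two cycles on the N - 1 passes where it jumps and one cycle on the final pass. The helpers below (hypothetical names) turn that into a cycle count and a delay in nanoseconds; CALL/RET overhead is not included.

```c
#include <stdint.h>

/* Cycles for the hypothetical delay loop
 *         LDI  R20,N    ; 1 cycle
 * AGAIN:  DEC  R20      ; 1 cycle, executed N times
 *         BRNE AGAIN    ; 2 cycles when taken (N-1 times), 1 on the last pass
 * Total = 1 + N + 2(N-1) + 1 = 3N, valid for N in 1..255. */
uint32_t loop_cycles(uint32_t n)
{
    return 1u + n + 2u * (n - 1u) + 1u;
}

/* Delay in nanoseconds for a cycle count at a given crystal frequency. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t f_hz)
{
    return cycles * 1000000000ULL / f_hz;
}
```

With N = 255 the loop takes 765 cycles, which at 8 MHz is 95,625 ns (about 95.6 µs).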
                          Loop inside a loop delay
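Placing one delay loop inside another multiplies the delay of a single loop. As a sketch, assume the hypothetical nested loop in the comment below (register choices, labels, and the counts M and N are for illustration), with the same cycle counts as before: LDI and DEC take one cycle, BRNE takes two cycles when it jumps and one when it falls through.

```c
#include <stdint.h>

/* Cycles for a hypothetical loop inside a loop:
 *         LDI  R16,M    ; 1
 * OUTER:  LDI  R17,N    ; 1 per outer pass
 * INNER:  DEC  R17      ; 1, N times per outer pass
 *         BRNE INNER    ; 2 when taken, 1 on the last inner pass
 *         DEC  R16      ; 1 per outer pass
 *         BRNE OUTER    ; 2 when taken, 1 on the last outer pass
 * One outer pass without its BRNE costs 1 + N + (2N - 1) + 1 = 3N + 1,
 * so the total is 1 + M(3N + 1) + (2M - 1) = 3M(N + 1). M, N in 1..255. */
uint32_t nested_loop_cycles(uint32_t m, uint32_t n)
{
    uint32_t one_pass = 1u + n + (2u * n - 1u) + 1u; /* LDI R17, inner loop, DEC R16 */
    uint32_t outer_branches = 2u * m - 1u;           /* BRNE OUTER over all passes */
    return 1u + m * one_pass + outer_branches;       /* leading LDI R16 included */
}
```

With M = N = 255 the count is 195,840 cycles, about 24.48 ms at 8 MHz.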
                              I/O PORT PROGRAMMING IN AVR
The 40-pin AVR has four ports. They are PORTA, PORTB, PORTC, and PORTD. To use any of
these ports as an input or output port, it must be programmed, as we will explain throughout this
section. In addition to being used for simple I/O, each port has some other functions such as
ADC, timers, interrupts, and serial communication pins.
Each port has three I/O registers associated with it, as shown in Table 2. They are designated as
PORTx, DDRx, and PINx. For example, for Port B we have PORTB, DDRB, and PINB. Notice
that DDR stands for Data Direction Register, and PIN stands for Port INput pins. Also notice
that each of the I/O registers is 8 bits wide, and each port has a maximum of 8 pins; therefore,
each bit of the I/O registers affects one of the pins (see Figure 2; the content of bit 0 of DDRB
represents the direction of the PB0 pin, and so on).
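The one-bit-per-pin rule maps directly onto bit masks. The host-side C sketch below uses an ordinary variable `ddrb` as a hypothetical stand-in for the DDRB register (it is not the avr-libc definition) and hypothetical helpers to set, clear, and test a single direction bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for DDRB; bit n controls pin PBn.
   Reset value 0x00 means every pin starts as an input. */
static uint8_t ddrb = 0x00;

/* Write 1 to a direction bit: the pin becomes an output. */
void make_output(uint8_t *ddr, unsigned bit) { *ddr |= (uint8_t)(1u << bit); }

/* Write 0 to a direction bit: the pin becomes an input. */
void make_input(uint8_t *ddr, unsigned bit)  { *ddr &= (uint8_t)~(1u << bit); }

/* Read back the direction of one pin. */
bool is_output(uint8_t ddr, unsigned bit)    { return (ddr >> bit) & 1u; }
```

Setting bits 0 and 7 gives 0x81, so PB0 and PB7 become outputs while the other Port B pins stay inputs.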
                                DDRx register role in outputting data
Each of the ports A–D in the ATmega32 can be used for input or output. The DDRx I/O register
is used solely for the purpose of making a given port an input or output port. For example, to
make a port an output, we write 1s to the DDRx register. In other words, to output data to all of
the pins of Port B, we must first put 0b11111111 into the DDRB register to make all of the
pins output.
The following code will toggle all 8 bits of Port B forever with some time delay between “on”
and “off” states:
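            .INCLUDE "M32DEF.INC"
            LDI R16,HIGH(RAMEND)    ;initialize the stack pointer so that
            OUT SPH,R16             ;CALL has a place to save the
            LDI R16,LOW(RAMEND)     ;return address
            OUT SPL,R16
            LDI R16,0xFF            ;R16 = 0xFF = 0b11111111
            OUT DDRB,R16            ;make Port B an output port (1111 1111)
L1:         LDI R16,0x55            ;R16 = 0x55 = 0b01010101
            OUT PORTB,R16           ;put 0x55 on port B pins
            CALL DELAY
            LDI R16,0xAA            ;R16 = 0xAA = 0b10101010
            OUT PORTB,R16           ;put 0xAA on port B pins
            CALL DELAY
            RJMP L1
DELAY:      LDI R20,0xFF            ;short delay loop (illustrative)
D1:         DEC R20
            BRNE D1
            RET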
It must be noted that unless we set the DDRx bits to one, the data will not go from the port
register to the pins of the AVR.
                              DDR register role in inputting data
To make a port an input port, we must first put 0s into the DDRx register for that port, and then
bring in (read) the data present at the pins.
Notice that upon reset, all ports have the value 0x00 in their DDR registers.
                               PIN register role in inputting data
To read the data present at the pins, we should read the PIN register. It must be noted that to bring
data into the CPU from pins we read the contents of the PINx register, whereas to send data out to
pins we use the PORTx register.
There is a pull-up resistor for each of the AVR pins. If a pin is configured as an input (its DDRx
bit is 0) and we put 1 into its bit of the PORTx register, the pull-up resistor is activated. In cases
in which nothing is connected to the pin or the connected devices have high impedance, the
resistor pulls up the pin. See Figure 4.
If we put 0s into the bits of the PORTx register, the pull-up resistor is inactive.
Again, it must be noted that unless we clear the DDR bits (by putting 0s there), the data will not
be brought into the registers from the pins of Port C. To see the role of the DDRx register in
allowing the data to come into the CPU from the pins, examine Figure 3.
The pins of the AVR microcontrollers can be in four different states according to the values of
PORTx and DDRx, as shown in Figure 5.
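The four states can be written as a small decision on the DDRx and PORTx bits of a pin. The C sketch below uses hypothetical names and ignores the global pull-up disable bit (PUD in SFIOR on the ATmega32), which can turn all pull-ups off.

```c
#include <stdint.h>

typedef enum {
    PIN_INPUT_HIGH_Z,   /* DDR = 0, PORT = 0: input, pull-up off (tri-state) */
    PIN_INPUT_PULLUP,   /* DDR = 0, PORT = 1: input, internal pull-up on */
    PIN_OUTPUT_LOW,     /* DDR = 1, PORT = 0: pin driven to 0 */
    PIN_OUTPUT_HIGH     /* DDR = 1, PORT = 1: pin driven to 1 */
} pin_state_t;

/* State of pin 'bit' given the port's DDRx and PORTx register values. */
pin_state_t pin_state(uint8_t ddr, uint8_t port, unsigned bit)
{
    unsigned d = (ddr >> bit) & 1u;
    unsigned p = (port >> bit) & 1u;
    if (d)
        return p ? PIN_OUTPUT_HIGH : PIN_OUTPUT_LOW;
    return p ? PIN_INPUT_PULLUP : PIN_INPUT_HIGH_Z;
}
```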
                                           Port A
Port A occupies a total of 8 pins (PA0–PA7). To use the pins of Port A as input or output ports,
each bit of the DDRA register must be set to the proper value. For example, the following code
will continuously send out to Port A the alternating values of 0x55 and 0xAA:
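            ;toggle all bits of PORTA
            .INCLUDE "M32DEF.INC"
            LDI R16,HIGH(RAMEND) ;initialize the stack pointer
            OUT SPH,R16          ;(needed before CALL)
            LDI R16,LOW(RAMEND)
            OUT SPL,R16
            LDI R16,0xFF ;R16 = 11111111 (binary)
            OUT DDRA,R16 ;make Port A an output port
L1:         LDI R16,0x55 ;R16 = 0x55
            OUT PORTA,R16 ;put 0x55 on Port A pins
            CALL DELAY
            LDI R16,0xAA ;R16 = 0xAA
            OUT PORTA,R16 ;put 0xAA on Port A pins
            CALL DELAY
            RJMP L1
DELAY:      LDI R20,0xFF ;short delay loop (illustrative)
D1:         DEC R20
            BRNE D1
            RET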
                                       Port A as input
In order to make all the bits of Port A an input, DDRA must be cleared by writing 0 to all the bits.
In the following code, Port A is configured first as an input port by writing all 0s to register
DDRA, and then data is received from Port A and saved in a RAM location:
            .INCLUDE "M32DEF.INC"
            .EQU MYTEMP = 0x100 ;save it here (an SRAM address)
            LDI R16,0x00        ;R16 = 00000000 (binary)
            OUT DDRA,R16        ;make Port A an input port (0 for In)
            NOP                 ;synchronizer delay
            IN R16,PINA         ;move from pins of Port A to R16
            STS MYTEMP,R16      ;save it in MYTEMP
Synchronizer delay
The input circuit of the AVR has a delay of 1 clock cycle. In other words, the PIN register
represents the data that was present at the pins one clock ago. In the above code, when the
instruction “IN R16,PINA” is executed, the PINA register contains the data, which was present
at the pins one clock before. That is why the NOP is put before the “IN R16,PINA” instruction.
(If the NOP is omitted, the read data is the data of the pins when the port was output.)
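The one-clock delay can be pictured with a deliberately simplified model: on each clock the PIN register shows what the pins held one clock earlier. This is a hypothetical C sketch of that timing behavior only, not a description of the actual synchronizer circuit.

```c
#include <stdint.h>

/* Simplified one-clock input synchronizer. */
typedef struct {
    uint8_t latched;   /* pin value sampled on the previous clock */
    uint8_t pin_reg;   /* what IN Rd,PINx would read */
} sync_t;

/* Advance one clock: expose last clock's sample, then sample the pins. */
void sync_clock(sync_t *s, uint8_t pins_now)
{
    s->pin_reg = s->latched;
    s->latched = pins_now;
}
```

Right after the pins change to 0xAA, the first read still returns the old value; only one clock later does the model's PIN value show 0xAA, which is the reason for placing the NOP before IN.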
                                           Port B
Port B occupies a total of 8 pins (PB0–PB7). To use the pins of Port B as input or output ports,
each bit of the DDRB register must be set to the proper value.
For example, the following code will continuously send out the alternating values of 0x55 and
0xAA to Port B:
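            ;toggle all bits of PORTB
            .INCLUDE "M32DEF.INC"
            LDI R16,HIGH(RAMEND) ;initialize the stack pointer
            OUT SPH,R16          ;(needed before CALL)
            LDI R16,LOW(RAMEND)
            OUT SPL,R16
            LDI R16,0xFF ;R16 = 11111111 (binary)
            OUT DDRB,R16 ;make Port B an output port (1 for Out)
L1:         LDI R16,0x55 ;R16 = 0x55
            OUT PORTB,R16 ;put 0x55 on Port B pins
            CALL DELAY
            LDI R16,0xAA ;R16 = 0xAA
            OUT PORTB,R16 ;put 0xAA on Port B pins
            CALL DELAY
            RJMP L1
DELAY:      LDI R20,0xFF ;short delay loop (illustrative)
D1:         DEC R20
            BRNE D1
            RET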
                                       Port B as input
In order to make all the bits of Port B an input, DDRB must be cleared by writing 0 to all the bits.
In the following code, Port B is configured first as an input port by writing all 0s to register
DDRB, and then data is received from Port B and saved in some RAM location:
            .INCLUDE "M32DEF.INC"
            .EQU MYTEMP=0x100   ;save it here
            LDI R16,0x00        ;R16 = 00000000 (binary)
            OUT DDRB,R16        ;make Port B an input port (0 for In)
            NOP                 ;synchronizer delay
            IN R16,PINB         ;move from pins of Port B to R16
            STS MYTEMP,R16      ;save it in MYTEMP
Dual role of Ports A and B
The AVR multiplexes an analog-to-digital converter (ADC) through Port A to save I/O pins. The
alternate functions of the pins for Port A are shown in Table 3. Because many projects use an
ADC, we usually do not use Port A for simple I/O functions.
                                            Port C
Port C occupies a total of 8 pins (PC0–PC7). To use the pins of Port C as input or output ports,
each bit of the DDRC register must be set to the proper value. For example, the following code
will continuously send out the alternating values of 0x55 and 0xAA to Port C:
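            ;toggle all bits of PORTC
            .INCLUDE "M32DEF.INC"
            LDI R16,HIGH(RAMEND) ;initialize the stack pointer
            OUT SPH,R16          ;(needed before CALL)
            LDI R16,LOW(RAMEND)
            OUT SPL,R16
            LDI R16,0xFF ;R16 = 11111111 (binary)
            OUT DDRC,R16 ;make Port C an output port (1 for Out)
L1:         LDI R16,0x55 ;R16 = 0x55
            OUT PORTC,R16 ;put 0x55 on Port C pins
            CALL DELAY
            LDI R16,0xAA ;R16 = 0xAA
            OUT PORTC,R16 ;put 0xAA on Port C pins
            CALL DELAY
            RJMP L1
DELAY:      LDI R20,0xFF ;short delay loop (illustrative)
D1:         DEC R20
            BRNE D1
            RET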
                                       Port C as input
In order to make all the bits of Port C an input, DDRC must be cleared by writing 0 to all the bits.
In the following code, Port C is configured first as an input port by writing all 0s to register
DDRC, and then data is received from Port C and saved in a RAM location:
            .INCLUDE "M32DEF.INC"
            .EQU MYTEMP = 0x100 ;save it here
            LDI R16,0x00        ;R16 = 00000000 (binary)
            OUT DDRC,R16        ;make Port C an input port (0 for In)
            NOP                 ;synchronizer delay
            IN R16,PINC         ;move from pins of Port C to R16
            STS MYTEMP,R16      ;save it in MYTEMP
                                           Port D
Port D occupies a total of 8 pins (PD0–PD7). To use the pins of Port D as input or output ports,
each bit of the DDRD register must be set to the proper value. For example, the following code
will continuously send out to Port D the alternating values of 0x55 and 0xAA:
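            ;toggle all bits of PORTD
            .INCLUDE "M32DEF.INC"
            LDI R16,HIGH(RAMEND) ;initialize the stack pointer
            OUT SPH,R16          ;(needed before CALL)
            LDI R16,LOW(RAMEND)
            OUT SPL,R16
            LDI R16,0xFF ;R16 = 11111111 (binary)
            OUT DDRD,R16 ;make Port D an output port (1 for Out)
L1:         LDI R16,0x55 ;R16 = 0x55
            OUT PORTD,R16 ;put 0x55 on Port D pins
            CALL DELAY
            LDI R16,0xAA ;R16 = 0xAA
            OUT PORTD,R16 ;put 0xAA on Port D pins
            CALL DELAY
            RJMP L1
DELAY:      LDI R20,0xFF ;short delay loop (illustrative)
D1:         DEC R20
            BRNE D1
            RET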
                                         Port D as input
In order to make all the bits of Port D an input, DDRD must be cleared by writing 0 to all the
bits. In the following code, Port D is configured first as an input port by writing all 0s to register
DDRD, and then data is received from Port D and saved in a RAM location:
            .INCLUDE "M32DEF.INC"
            .EQU MYTEMP = 0x100 ;save it here
            LDI R16,0x00        ;R16 = 00000000 (binary)
            OUT DDRD,R16        ;make Port D an input port (0 for In)
            NOP                 ;synchronizer delay
            IN R16,PIND         ;move from pins of Port D to R16
            STS MYTEMP,R16      ;save it in MYTEMP