Nikolay V. KIRIANAKI, Nestor O. SHPAK, Sergey Y. YURISH
Institute of Computer Technologies,
Lviv, Ukraine
syurish@polynet.lviv.ua
Microcontroller Cores for
Metering Applications
Summary: The paper describes research in the field of system design for a metering Application Specific Integrated Processor (ASIP). The proposed design approach consists in modifying a popular microcontroller architecture and removing the instructions that are never used in metering applications. This modification increases performance, reduces silicon area and makes it possible to include sensors and other important hardware on the chip.
Software-level optimisation for a low-power design of the microcontroller core is also considered.
1. Introduction
Measurement engineering was and remains one of the main applications of embedded microprocessors and microcontrollers. The aspiration to create a universal microcontroller for such applications led to the appearance of the MSP430 microcontroller family (Texas Instruments, USA) for metering applications [1]. It uses a "von-Neumann" architecture and a RISC structure (27 instructions). However, the well-known and efficient MCS-51 microcontroller family (Intel), which appeared as early as the beginning of the 1980s, continues to be actively used in various metering applications. This is explained by the following reasons:
1) Low cost;
2) Availability of inexpensive debugging tools;
3) Availability of many well-trained engineers.
Besides that, 8051-compatible microcontrollers exist as standard library cells in many CAD tools for VLSI and ASIC design. In less than 20 years, dozens of modifications of the classical 8051 microcontroller with Harvard architecture [2] have appeared. Thanks to the achievements of modern microelectronics, the maximum clock frequency of such microcontrollers has increased up to 50 MHz [3]. Since the microcontrollers of the MCS-51 family are general-purpose devices, their instruction set is substantially redundant for metering applications. The set of standard peripherals is also very often not fully used. The underlying idea of the proposed design approach is to shorten the instruction set, reduce the processor area and improve the functional performance by including other system parts on the same chip.
2. Optimization of the instruction set
To reduce the silicon area, static analysis was used [4]. It revealed that some instructions can be eliminated because they are not used at all in metering applications. The analysis was conducted on the software of a specially developed multifunctional tachometer. Its software structure is shown in Figure 1. It includes the test software, the software implementing the measurement method, the indication programs (analog, digital, sound and LEDs) and the arithmetic procedures. The software is built on a hierarchical principle and written in assembly language. The program-oriented method for frequency measurements [5, 6] was used as the measurement method. This method achieves high metrological performance with the minimum possible hardware.
Figure 1. Software structure
To achieve a reasonable compromise between speed and code size, integer operands (32 x 24 bits) are multiplied starting from the low-order bits of the multiplier, with a right shift of the partial product. Integer operands (64 / 32 bits) are divided by the restoring method with a left shift of the remainder. 16-bit integer operands are subtracted by converting the subtrahend to a negative number in two's complement and then adding. The main performances of the arithmetic subroutines are given in Table 1.
Table 1. Main performances of the arithmetic subroutines
Subroutine   Purpose                Operand word length (bits)        Result word length (bits)                                     Size (bytes)
SUB          Subtraction            16                                16                                                            14
M32X24       Multiplication         Multiplicand: 32; Multiplier: 24  64                                                            45
DV6432       Division               Dividend: 64; Divisor: 32         Quotient (integer and fractional parts): 24; Remainder: 24    140
BC-BCD       BC to BCD conversion   Binary code: 16                   4 binary-decimal digits                                       77
BCD-BC       BCD to BC conversion   BCD code: 4                       16 binary digits                                              101
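The three arithmetic algorithms named above can be illustrated with a short Python sketch. It is not the authors' assembly code: the function names and default bit widths are assumptions chosen to match the operand sizes in Table 1, and the division sketch returns only an integer quotient and a remainder rather than the integer-and-fraction result format of DV6432.

```python
def mul_shift_right(multiplicand, multiplier, m_bits=32, r_bits=24):
    """Unsigned multiplication, low-order multiplier bits first,
    shifting the partial product to the right after every step."""
    assert 0 <= multiplicand < (1 << m_bits)
    assert 0 <= multiplier < (1 << r_bits)
    high = 0           # upper part of the partial product
    low = multiplier   # lower part; multiplier bits are consumed from bit 0
    for _ in range(r_bits):
        if low & 1:                    # current multiplier bit is 1
            high += multiplicand       # add the multiplicand to the upper part
        # shift the pair (high:low) one position to the right
        low = (low >> 1) | ((high & 1) << (r_bits - 1))
        high >>= 1
    return (high << r_bits) | low


def div_restoring(dividend, divisor, n_bits=64):
    """Unsigned restoring division with a left shift of the remainder."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    for i in range(n_bits - 1, -1, -1):
        # shift the remainder left and bring in the next dividend bit
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor           # trial subtraction
        if remainder < 0:
            remainder += divisor       # restore the remainder, quotient bit 0
            quotient <<= 1
        else:
            quotient = (quotient << 1) | 1
    return quotient, remainder


def sub_twos_complement(a, b, bits=16):
    """Subtraction as addition of the two's complement of the subtrahend."""
    mask = (1 << bits) - 1
    neg_b = (~b + 1) & mask            # two's complement of b
    return (a + neg_b) & mask
```

For example, `sub_twos_complement(3, 5)` wraps around to `0xFFFE`, which is the 16-bit two's complement representation of -2.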
The total size of the measuring instrument software was ~1.1 Kbytes, and the total number of instructions is 722. Table 2 shows the percentage of each instruction used in the measuring instrument program, as obtained by static analysis. Instructions are organised in groups. The MOV group, for instance, includes all addressing modes of the MOV instruction available in the 8051 microcontroller. As can be seen, 9 instruction groups account for more than 80 % of the code. There are 27 instruction groups in total.
Table 2. Frequently used instruction groups in the metering application
No.   Type             Number of instructions   %
1.    MOV              249                      34.49
2.    LCALL (ACALL)    92                       12.74
3.    MOVX             58                       8.03
4.    NOP              39                       5.40
5.    INC              37                       5.13
6.    AJMP (LJMP)      31                       4.29
7.    DJNZ             30                       4.16
8.    ANL              24                       3.32
9.    RET              24                       3.32
      Groups 1 to 9 together                    80.89
10.   ORL              20                       2.77
11.   CLR              19                       2.63
12.   XCH              18                       2.49
13.   JB               16                       2.22
14.   JZ               11                       1.52
15.   DEC              9                        1.25
16.   CPL              7                        0.97
17.   ADD              7                        0.97
18.   JNZ              6                        0.83
19.   ADDC             6                        0.83
20.   RL               4                        0.55
21.   RR               4                        0.55
22.   JNC              4                        0.55
23.   DA               2                        0.28
24.   RETI             2                        0.28
25.   RLC              1                        0.14
26.   RRC              1                        0.14
27.   XRL              1                        0.14
TOTAL: 27 groups       722                      100
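The shares in Table 2 follow directly from the instruction counts. The sketch below recomputes them in Python; the counts are taken from Table 2, while the names `COUNTS`, `share` and `cumulative_share` are illustrative and not part of the original tool chain.

```python
# Instruction counts per group, as listed in Table 2.
COUNTS = {
    "MOV": 249, "LCALL (ACALL)": 92, "MOVX": 58, "NOP": 39, "INC": 37,
    "AJMP (LJMP)": 31, "DJNZ": 30, "ANL": 24, "RET": 24, "ORL": 20,
    "CLR": 19, "XCH": 18, "JB": 16, "JZ": 11, "DEC": 9, "CPL": 7,
    "ADD": 7, "JNZ": 6, "ADDC": 6, "RL": 4, "RR": 4, "JNC": 4,
    "DA": 2, "RETI": 2, "RLC": 1, "RRC": 1, "XRL": 1,
}


def share(group, counts=COUNTS):
    """Percentage of all instructions that belong to one group."""
    return 100.0 * counts[group] / sum(counts.values())


def cumulative_share(top_n, counts=COUNTS):
    """Percentage of the code covered by the top_n most frequent groups."""
    ranked = sorted(counts.values(), reverse=True)
    return 100.0 * sum(ranked[:top_n]) / sum(ranked)
```

With these counts, `cumulative_share(9)` gives the share of the code covered by the nine most frequent groups, which is the figure the text refers to as "more than 80 %".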
To reduce the chip area occupied by the control part of the processor, the first piece of information needed was the number of instructions that static analysis showed were never used. If an instruction is not used by the processor, there is no need to include it in the processor hardware. Such instructions can be eliminated from the processor control area, saving space for other important hardware. In this case we expect an area saving of almost a third in comparison with the original processor. The saved area could be used to include some 8253 timer/counters, their addressing logic and sensors with frequency output (for example [7]). Seven of the nine instruction types that account for more than 80 % of the code are the same as those found by static analysis of the induction vector control software [4].
3. Software level power optimization
As the operating frequencies of one-chip microcontrollers continue to increase, up to 50 MHz, power dissipation becomes a major concern in metering applications. In microprocessor, microcontroller and DSP based systems, it is software that directs much of the hardware activity. Consequently, software can have a substantial impact on power dissipation.
The first main step toward software optimisation for a low-power design is to be able to estimate the power dissipation of a piece of code. Instruction-level power estimation is an empirical method that characterises the power dissipation of instruction sequences and uses these results to estimate the power (or energy) dissipation of a program.
The two main requirements for instruction-level power analysis are to characterise the power dissipation associated with each individual instruction (base energy cost) and the influence of the instruction sequence on the energy cost (inter-instruction effects due to circuit state). The most straightforward and precise method is to measure directly the current drawn by the processor in the target system as it executes various instruction sequences.
The base cost, which is independent of the prior state of the processor, can be determined by putting several instances of the instruction into an infinite loop. The average power supply current is measured while the loop executes. The loop should be made as long as possible to minimise the estimation error due to the loop overhead (the jump statement at the end of the loop). An ammeter is used to measure the current drawn by the processor in the system. Its relative error must not exceed 0.1 %. To minimise the estimation error for microcontrollers, it is expedient to use a more precise method of measuring base energy costs.
For more accurate measurement of the instruction base energy cost, the whole internal ROM, from address zero to the greatest possible address, is filled with the same code, corresponding to the chosen instruction. While this sequence of identical instructions executes, the Program Counter is automatically reset to zero when it reaches the maximal address, and execution of the instruction sequence stored from address zero in ROM begins all over again. This completely removes the estimation error due to the loop overhead (the jump statement at the end of the loop) and makes it possible to measure the real, rather than the average, power supply current for the instruction.
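The difference between the two measurement methods can be illustrated with a small Python sketch. It builds the two ROM images listed in Table 3 and estimates the relative error that the closing jump adds to the average current in the traditional method. The 4 KB ROM size is inferred from the addresses in Table 3; the machine-cycle counts (one for MOV A, R1, two for AJMP) are those of the standard 8051; the time-weighted averaging model and all current values passed to the functions are assumptions made for illustration only.

```python
MOV_A_R1 = 0xE9               # opcode of MOV A, R1 (Table 3)
AJMP_TO_0000 = (0x01, 0x00)   # AJMP back to address 0000 (Table 3)
ROM_SIZE = 4096               # assumed internal ROM size in bytes


def rom_image_method_i(opcode=MOV_A_R1, size=ROM_SIZE):
    """Traditional method: tested instruction repeated, jump at the end."""
    return bytes([opcode] * (size - 2)) + bytes(AJMP_TO_0000)


def rom_image_method_ii(opcode=MOV_A_R1, size=ROM_SIZE):
    """Proposed method: the whole ROM is filled with the tested instruction."""
    return bytes([opcode] * size)


def loop_average_current(i_instr, i_jump, n_instr,
                         cycles_instr=1, cycles_jump=2):
    """Time-weighted average current of a loop of n_instr identical
    instructions closed by one jump (assumed averaging model)."""
    busy = n_instr * cycles_instr
    return (busy * i_instr + cycles_jump * i_jump) / (busy + cycles_jump)


def loop_overhead_error(i_instr, i_jump, n_instr,
                        cycles_instr=1, cycles_jump=2):
    """Relative error of the averaged reading with respect to i_instr."""
    avg = loop_average_current(i_instr, i_jump, n_instr,
                               cycles_instr, cycles_jump)
    return (avg - i_instr) / i_instr
```

Under this model the error shrinks as the loop gets longer and vanishes only when no jump is present, which is the situation the ROM image of the proposed method creates.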
Table 3 shows fragments of the instruction sequences used by the traditional (I) and proposed (II) methods to measure the base energy cost of the instruction MOV A, R1. This approach is valid for data transfer instructions, arithmetic and logic instructions and unconditional jumps. With either method, however, the specific behaviour of conditional jumps must be taken into account: the base energy cost of the same instruction differs depending on whether the jump is taken or not. Let us consider the estimation of the base energy costs for two two-byte instructions: JC (jump if the carry bit C is "1") and JNC (jump if the carry bit C is "0"). As a rule, after the processor is initialised by the RESET signal, the carry bit C is cleared to "0". In the first example all JC instructions are executed one after another, since C = 0 and the jump to <label> is never taken. In this case the instruction sequence for measuring the base energy cost is similar to the one for the instruction MOV A, R1.
Example 1:
0000 JC <label>
0002 JC <label>
...
4094 JC <label>
Table 3. Code mapping in ROM for the traditional (I) and proposed (II) methods
Address   Code   Instruction
Method I
0000      E9     M1: MOV A, R1
0001      E9         MOV A, R1
...       ...    ...
4094      01         AJMP M1
4095      00
Method II
0000      E9     MOV A, R1
0001      E9     MOV A, R1
...       ...    ...
4095      E9     MOV A, R1
Now consider the situation in which the jump is taken. After processor initialisation, an instruction that sets C to "1" must be executed first. Two JC instructions whose labels point at each other are then enough to form a continuous cycle:
Example 2:
0000 CPL C ; C = 1
0001 M0: JC M1 ; C is "1", so jump to M1
0003 M1: JC M0 ; C is "1", so jump to M0
As Example 2 shows, three instructions are sufficient for an accurate estimation of the base energy cost of the taken JC instruction. However, any number of instructions may be placed between them without influencing the results of the measurement, since the logic of this program fragment makes them non-executable.
Let us now consider the instruction JNC (jump if C is "0"). If the jump is taken, the test partial program is similar to Example 2, except that no instruction is needed to initialise the condition, since C is already "0" after processor initialisation:
Example 3:
0000 M0: JNC M1
0002 M1: JNC M0
A somewhat different situation arises when the jump is not taken:
Example 4:
0000 CPL C
0001 JNC <label>
0003 JNC <label>
... ...
4094 JNC <label>
Since C = 1, the jump condition C = 0 is not met, the jump to <label> is not taken and the next JNC instruction is executed, and so on. In this case there is an estimation error due to the loop overhead (the CPL C instruction at the beginning of the loop). However, this error is smaller than the one caused by a jump statement at the end of the loop.
The base costs for each instruction are not always adequate for a precise software power
estimation. Additional instruction sequences are needed in order to take into account the effect of a
prior processor state on the power dissipation of each instruction.
For instruction-level power analysis, the circuit state cost associated with each possible pair of consecutive instructions is characterised by measuring the power supply current while an alternating sequence of the two instructions executes in an infinite loop. However, the circuit state effect is not symmetrical. The known measurement technique for the circuit state overhead does not permit the cost of an A->B transition to be separated from that of a B->A transition, since the measured current is an average over many execution cycles [8]. In this case, for a more accurate measurement, it is expedient to use the proposed indirect method of measurement.
The idea of the method is as follows. The instruction sequence consisting of the two tested instructions is written to the ROM. An additional T flip-flop, introduced into the microcontroller's step-by-step mode circuit, divides the frequency of the ALE signal (the machine cycle identifier) by two. This allows two instructions to be executed during one "step". The voltage Umax across the resistor R0 during the execution of the two tested instructions is measured by a Peak Detector. The maximal power supply current is determined as Ic = Umax / R0. If the estimated base energy cost for these instructions is Iest and the cost measured by the method described above is Im, then the circuit state overhead is given by
$$ I_{oh} = I_m - I_{est} \qquad (1) $$
This method allows both the A->B and the B->A transitions to be measured. Moreover, by executing only one instruction per step, as in the standard use of the step-by-step mode, and using the Peak Detector, it is also possible to measure the base energy cost of a single instruction.
The total energy for the partial program is
$$ I_T = \sum_{i=1}^{n} I_{bi} + \sum_{j=1}^{k} I_{ohj}, \qquad (2) $$
where Ibi is the base energy cost for the i-th instruction; Iohj is the circuit state overhead for j-th pair
of instructions; n is the number of instructions in the tested partial program; k is the number of the
tested pairs. The number k is determined by the functioning logic of the tested partial program.
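The relations (1) and (2), together with the peak-current expression Ic = Umax / R0, can be written as a few lines of Python. This is only a sketch: the function names are illustrative, and any numerical values passed to them are hypothetical rather than measured data from the paper.

```python
def peak_current(u_max, r0):
    """Maximal supply current from the peak voltage across the resistor R0."""
    return u_max / r0


def circuit_state_overhead(i_measured, i_estimated):
    """Relation (1): overhead = measured cost of the pair - estimated base cost."""
    return i_measured - i_estimated


def total_cost(base_costs, overheads):
    """Relation (2): sum of the n base costs plus the k circuit state overheads."""
    return sum(base_costs) + sum(overheads)
```

For instance, `total_cost([10, 12, 11], [1, 2])` adds three hypothetical base costs and two hypothetical pair overheads and returns 36, in whatever unit the costs are expressed.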
Evaluating the worst-case power consumption situations in the microcontroller is the next important step toward future optimisation. It is necessary to verify whether a processor meets its power constraints. The known technique for this evaluation includes the following stages:
· construct a power consumption graph based on an instruction-level power model [9];
· find the cycle in the graph with the maximum mean power consumption.
The directed graph G(V, A) is used as the power consumption graph, where V = {vi} is the set of vertices and A = {gij} is the set of arcs. The model has the following restrictions: 1) the graph has one input v0 and not more than one output vk; 2) not more than two arcs emanate from each vertex; 3) the number of arcs entering a vertex is not limited. The graph corresponds to the algorithm's decision tree, and its nodes correspond to the algorithm's instructions. In the traditional graph each node is characterised by an additive elementary index di, which is connected with the base energy cost only. However, for a more accurate estimation of the total energy, the circuit state overheads estimated above must also be taken into account. For this purpose, it is proposed to modify the graph by assigning to each arc an additive elementary index gj, which characterises the circuit state overhead. The introduced indices form the set D = {di, gj} on the graph (Figure 2). After that, the matrix is built and Karp's algorithm is used to determine the maximum average weight of a critical cycle, according to the technique [6].
Figure 2. Power consumption graph
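The search for the critical cycle can be sketched in Python with Karp's algorithm, adapted here to find the maximum rather than the minimum mean weight. The sketch is not the authors' implementation. It assumes that the cost of traversing an arc (u, v) is the node index d of v plus the arc index g of (u, v), so that every instruction on a cycle is counted once together with the overhead of the transition into it, and that all nodes are reachable from the single input node. The function name and the data layout (lists and dictionaries) are illustrative.

```python
import math


def max_mean_cycle(nodes, arcs, d, g, source):
    """Maximum mean weight over all cycles reachable from source.

    nodes  : list of node identifiers
    arcs   : list of (u, v) pairs
    d      : dict node -> base cost index d_i
    g      : dict (u, v) -> circuit state overhead index (0 if missing)
    Returns None when no cycle is reachable.
    """
    n = len(nodes)
    neg_inf = -math.inf
    # best[k][v]: maximum weight of a walk with exactly k arcs from source to v
    best = [{v: neg_inf for v in nodes} for _ in range(n + 1)]
    best[0][source] = 0.0
    for k in range(1, n + 1):
        for (u, v) in arcs:
            if best[k - 1][u] > neg_inf:
                cand = best[k - 1][u] + d[v] + g.get((u, v), 0.0)
                if cand > best[k][v]:
                    best[k][v] = cand
    result = None
    for v in nodes:
        if best[n][v] == neg_inf:
            continue
        # Karp's formula for the maximum: max over v of min over k
        worst = min((best[n][v] - best[k][v]) / (n - k)
                    for k in range(n) if best[k][v] > neg_inf)
        if result is None or worst > result:
            result = worst
    return result
```

As a usage example with hypothetical weights, a four-node graph with arcs 0->1, 1->2, 2->1, 2->3, node indices d = {0: 1, 1: 4, 2: 6, 3: 1} and arc indices g = {(1, 2): 1, (2, 1): 3} contains the single cycle 1->2->1, whose mean weight per instruction is (6 + 1 + 4 + 3) / 2 = 7.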
The graph G(V, A, D) can be used for static analysis of the power consumption of the various parts of the algorithm. However, the execution of the partial program is determined by the choice of a certain path on the graph. This choice results from the control transfers in the logic blocks, which depend on the random process by which various data vectors Y(0) enter the program. It results in a random choice of paths on the graph. Thus, the tested partial program is a complex system with a random structure. From the point of view of power optimisation, the task is complicated by the fact that it is not known beforehand how long and how frequently the microcontroller will execute the worst-case cycle. Consequently, to achieve real power savings, the partial program should be optimised as a whole. Let us consider the possible ways of minimising power or energy for a microcontroller. We assume that 1) an adequate algorithm and data structure are used to solve the task; 2) assembly language is used for source-code programming. This gives the programmer full control over the hardware, the execution time of the program and, as will be shown further, the power consumption.
There are usually many possible code sequences that accomplish the same task. It should be possible to select a code sequence that minimises power and energy. This seems obvious, but in the majority of real situations making the optimum decision is not as simple as it appears at first sight. One such example is the choice between the two instructions given below. Each of them can be used to move data from the accumulator A to the register R0 (with different side effects, of course): MOV R0, A or XCH A, R0. In the microcontrollers of the MCS-51 family (Intel) each of these instructions is executed in one cycle. Another example is the choice of instruction for clearing the accumulator: XRL A, A; CLR A or MOV A, #0h. Multiplication by a power of two can be executed either by the arithmetic multiplication instruction MUL AB or by the left shift instruction. The condition of choice can be written as
$$ K(d_i, a_{i+1}) \rightarrow \min \qquad (3) $$
Instruction ordering for low power attempts to minimise the energy associated with the circuit state
effect without altering program behaviour.
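One possible reading of this selection rule is sketched below in Python: among functionally equivalent instructions, pick the one with the smallest sum of its base cost and the circuit state overhead incurred after the preceding instruction. The function name, the cost tables and every numerical value are hypothetical and serve only to illustrate the idea; they are not measurements from the paper.

```python
def pick_lowest_cost(prev_instr, candidates, base_cost, overhead):
    """Return the candidate with the lowest base cost plus the overhead
    of the transition from prev_instr (0 if the pair is not tabulated)."""
    def cost(candidate):
        return base_cost[candidate] + overhead.get((prev_instr, candidate), 0.0)
    return min(candidates, key=cost)


# Hypothetical cost tables for the three ways of clearing the accumulator.
BASE_COST = {"CLR A": 10.0, "XRL A, A": 10.5, "MOV A, #0h": 11.0}
OVERHEAD = {
    ("MOV A, R1", "CLR A"): 2.0,
    ("MOV A, R1", "XRL A, A"): 0.5,
    ("MOV A, R1", "MOV A, #0h"): 1.0,
}
```

With these made-up numbers, the instruction with the lowest base cost is not the best choice after MOV A, R1, because its transition overhead is larger; this is the kind of case in which the decision is less obvious than it first appears.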
The given approach to software-level power optimisation and some software "hints" described in [10, 11] make it possible to reduce power consumption by 10 to 35 %.
4. Conclusions
The proposed approach makes it possible to shorten the time-to-market and to simplify considerably the design process for an Application Specific Integrated Processor. The saved chip area can be used to integrate sensors. The large chip area available for manufacturing sensors on the same chip creates the basis for novel, powerful microsystems with a simple sensor interface, in which the signal processing is mostly digital.
References
1. MSP 430 Family Architecture Guide and Module Library, Data Book, Texas Instruments, USA,
1996
2. Neroda V.Y., Torbinskiy V.E., Shlykov E.L., One-chip Microcontrollers MCS-51, Digital
Components, Moscow, 1995 (In Russian)
3. Semiconductor Short Form, Temic, 1995
4. Krentz M., Carro L., Suzim A., System Integration with Dedicated Processor for Industrial
Applications, In Proceedings of 3rd IFAC Symposium on Intelligent Components and Instruments
for Control Applications, Annecy, France, June 9-11, 1997, pp. 333-337
5. Kirianaki N.V., Yurish S.Y. A New Method for Precise Frequency-Time and Phase Measuring
Conversion for the Information-Measuring Systems, Based on Microprocessors, In the book on
Methods and Technics of Signal Processing in Physical Measurements, edited by R.E. Śliwa,
Rzeszów, pp.117-133, 1996 (In Russian)
6. Yurish S. Y. Program-oriented Methods and Measuring Instruments for Frequency-Time
Parameters of Electric Signals (Based on Coincidence Measuring Method), Ph. D. Thesis, State
University Lviv Polytechnic, 1997
7. Deynega V. P., Kirianaki N. V., Yurish S. Y. Microcontroller Compatible Smart Sensor of Rotation
Parameters with Frequency Output, In Proceedings of 21st European Solid State Circuit
Conference (ESSCIRC'95), Lille, France, 19-21 September, 1995, pp. 346-349
8. Roy K., Johnson M. C. Software Design for Low Power, In book on Low Power Design in Deep
Submicron Electronics, Kluwer Academic Publishers, Dordrecht / Boston / London, pp. 433-460,
1997
9. Tiwari V., et al. Power Analysis of Embedded Software: A First Step Towards Software Power
Minimisation, IEEE Transactions on VLSI Systems, Vol. 2, No. 4, December 1994, pp. 437-445
10. Koval V.A., Shevchenko O.V., Shpak N.O., Yurish S.Y. Low Power Design Technique for
Embedded Microcontroller Cores, In Proceedings of the 4th International Workshop on Mixed
Design of Integrated Circuits and System (MIXDES'97), Poznan, Poland, 12-14 June, 1997, pp.
619-624
11. Kirianaki N.V., Yurish S.Y. Low-Power Embedded Microcontroller Design on Instruction Level,
In Proceedings of Conference on Design and Diagnostic of Electronic Circuits and Systems
(DDECS'97), Beskydy Mountains, Czech Republic, 12-16 May, 1997, pp. 219-223