0% found this document useful (0 votes)
66 views7 pages

Adapting An Ip Mc6805 Core For Multiprocessing and Multitasking

This document describes hardware modifications made to an MC6805 microcontroller core to allow it to be used as a multitasking device in a multiprocessor system on a chip (MPoC). Specifically, hardware was added to implement a task scheduler and communication channels between processors. The goal was to take advantage of using multiple specialized processors instead of a single general-purpose processor when the tasks are known beforehand. This approach aims to reduce latency and overhead compared to a traditional single-processor system by assigning tasks to the processor best suited to their requirements and allowing high-bandwidth communication between tightly-coupled tasks.

Uploaded by

felixsafar3243
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
66 views7 pages

Adapting An Ip Mc6805 Core For Multiprocessing and Multitasking

This document describes hardware modifications made to an MC6805 microcontroller core to allow it to be used as a multitasking device in a multiprocessor system on a chip (MPoC). Specifically, hardware was added to implement a task scheduler and communication channels between processors. The goal was to take advantage of using multiple specialized processors instead of a single general-purpose processor when the tasks are known beforehand. This approach aims to reduce latency and overhead compared to a traditional single-processor system by assigning tasks to the processor best suited to their requirements and allowing high-bandwidth communication between tightly-coupled tasks.

Uploaded by

felixsafar3243
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

ADAPTING AN IP MC6805 CORE FOR MULTIPROCESSING AND

MULTITASKING
Guillermo A. JAQUENOD

Horacio A. VILLAGARCA

Fac. Ingeniera, UNCPBA, ARGENTINA.


chipi@netverk.com.ar

Oscar N. BRIA

CICPBA Fac. Informtica, UNLP, ARGENTINA.


hvw@info.unlp.edu.ar

CONICET Fac. Informtica, UNLP, ARGENTINA.


onb@info.unlp.edu.ar

Marisa R. DE GIUSTI

CICPBA Fac. Informtica, UNLP, ARGENTINA.


marisadg@volta.ing.unlp.edu.ar

ABSTRACT
The availability of high-density field configurable devices provides the opportunity for designing
highly integrated solutions (SOPC: System On a Programmable Chip).
Among the SOPC solutions, a case is the integration of an embedded single processor equipped
with a multitasking operating system. As an alternative to a single processor the embedding of various
processors on a chip, even heterogeneous and with multitasking capacity, may be considered.
A distinctive characteristic of a SOPC device is that the tasks to be performed are well known
before the design starts. That feature is opposed to the traditional multiprocessing and multitasking
systems in which general purpose applications are adopted during design. The benefit of this
knowledge is that hardware as well as software can be adapted to fit the applications requirements.
This paper presents the hardware modifications performed on an microcontroller embedded core, to
allow its inclusion as a multitasking device in a multiprocessor on a chip, through the addition of a
hardware task manager (scheduler) and communication channels among processors.
RESUMEN
La disponibilidad de dispositivos de Lgica Programable de alta densidad de integracin permite
buscar soluciones integradas en un dispositivo SOPC (System On a Programmable Chip).
Un tema de creciente inters son los procesadores empotrados, siendo usual un nico procesador y
un sistema operativo con capacidad de multitarea.
Sin embargo, debe considerarse como alternativa insertar varios procesadores, no necesariamente
idnticos, que pueden a su vez atender varias tareas. En un SOPC, como diferencia fundamental con
los casos tradicionales de multiprocesamiento y multitarea, las tareas a realizar son conocidas antes de
comenzar el diseo, por lo tanto hardware como software se pueden configurar a medida de la
aplicacin, combinando la velocidad propia del primero, con la versatilidad del segundo.
Este artculo describe las modificaciones de hardware realizadas al ncleo IP (Intellectual Property)
de un procesador, de modo de permitir la inclusin de un administrador de tareas por hardware y de
canales de comunicacin interprocesadores.

ADAPTING AN IP MC6805 CORE FOR MULTIPROCESSING AND


MULTITASKING
Guillermo A. JAQUENOD

Fac. Ingeniera, UNCPBA, ARGENTINA.


chipi@netverk.com.ar

Oscar N. BRIA

CONICET Fac. Informtica, UNLP, ARGENTINA.


onb@info.unlp.edu.ar

ABSTRACT
The availability of high-density field configurable
devices provides the opportunity for designing
highly integrated solutions (SOPC: System On a
Programmable Chip).
Among the SOPC solutions, a case is the
integration of an embedded single processor
equipped with a multitasking operating system. As
an alternative to a single processor the embedding of
various processors on a chip, even heterogeneous
and with multitasking capacity, may be considered.
A distinctive characteristic of a SOPC device is
that the tasks to be performed are well known before
the design starts. That feature is opposed to the
traditional multiprocessing and multitasking systems
in which general purpose applications are adopted
during design. The benefit of this knowledge is that
hardware as well as software can be adapted to fit the
applications requirements.
This paper presents the hardware modifications
performed on an microcontroller embedded core, to
allow its inclusion as a multitasking device in a
multiprocessor on a chip, through the addition of a
hardware
task
manager
(scheduler)
and
communication channels among processors.
1. INTRODUCTION
The design of a computer processing system
[8][10][12] strongly depends upon the exact
knowledge of the characteristics of the problems to
solve:
When the tasks are unknown and diverse, the
solution is to use a general-purpose processor, e.g.,
a personal computer.

Horacio A. VILLAGARCA

CICPBA Fac. Informtica, UNLP, ARGENTINA.


hvw@info.unlp.edu.ar

Marisa R. DE GIUSTI

CICPBA Fac. Informtica, UNLP, ARGENTINA.


marisadg@volta.ing.unlp.edu.ar

When the system will be used to compute specific


but yet undefined tasks (e.g., image processing), it
is worth choosing specialized processors as DSPs
with a large amount of memory or particular I/O
features.
When the application is totally known before
design starts, the pertinent approach is to use the
best adapted hardware resources, and in such a case
even to use an ASIC (Application Specific
Integrated Circuit).
The System On a Chip (SOC) solution is the
answer to the actual demand for the integration of
full systems in small spaces, with a short time to
market effort. The design methodologies based on
SOC can take advantage of libraries of IP blocks that
have been already designed and verified. Actually,
the reusability of IP blocks allows the design of new
SOCs attending to the space and time demands
[6][7][9][11].
Moreover, in the field of programmable logic
devices, the trend is moving towards SOPC (System
On a Programmable Chip) alternatives. Besides,
there is a growing interest in the literature in
presenting IP blocks for specific functions [4] [15].
The leading companies are already offering some
commercial products including a single processor, a
real time operating system (RTOS) with multitasking
capabilities, and a set of programmable resources:
ATMEL is offering an 8-bit RISC processor
(AVR), with suitable amount of RAM and ROM
memory, and a 10K to 40K gates in a
programmable block.
TRISCEND is offering a 32-bit ARM7DMI, with
internal cache memory, interfaces to external
memory, peripheral devices (timers, UARTs,
interrupts), and a programmable matrix with an
equivalent complexity of 40K gates.

ALTERA is offering a softcore alternative called


NIOS [13], with configurable data bus width. A
hardcore alternative, belonging to the Excalibur
family, offers three ARM922T models and three
MIPS32 4Kc models [14].
XILINX has announced a 32-bit softcore
alternative called MicroBLAZE, which includes
UART, timer, parallel I/O, interrupt controller,
multimaster arbitrator, FLASH memory interface,
and different RAM types.
All the above solutions are based on a unique
powerful processor, their own peripheral devices,
and interconnection resources with a programmable
logic array.
As an alternative to the above-proposed singleprocessor solutions, it is possible to include several
processors [1][2] on a chip. Moreover, every
processor can be different from each other and
devoted to specific tasks, in an structure called
MPOC (Multi-Processors On a Chip).
The key difference of this approach is related to
the knowledge of the tasks to be performed:
In traditional multiprocessing / multitasking
designs, the features of the tasks are knows a
posteriori because they are oriented to generalpurpose applications.
Unlikely, in the MPOC design, the tasks are known
a priori, then the hardware as well as the software
can be tuned to meet the requirements of the
specific applications.
This paper describes the hardware modifications
performed into the IP core of an 8-bit MC6805
processor, to include a hardware multitask scheduler,
as well as interprocessor communication channels.
2. THE MPOC PROPOSAL
The MPOC (MultiProcessors On a Chip) proposal
is oriented to low cost applications [5], where a
structured methodology is suggested for the building
of multitasking / multiprocessing applications. In this
proposal tasks are assigned to processors according
to the type of processes and the inter-processes
communication rate. As a consequence, the use of
multiple (no necessarily identical) processors can
reduce the latencies and overheads of a
monoprocessor RTOS:
Tasks attending the same type of processes can
reside on the same processor. With the same
criteria, different types of tasks can reside on

different processors; choosing for each task the best


suited processor.
Tasks with a large rate of information interchange
can communicate between using high bandwidth
resources (e.g., shared memory areas or FIFOs).
Meanwhile, lightly coupled tasks can use simpler
channels (e.g., serial channels, such as TLINKs
[16]).
To operate in a MPOC environment, a processor
should have the following characteristics:
When attending a predefined number of known
tasks, the hardware & software overload for task
management and context switching has to be
minimum.
When interacting with other processors, the
hardware required for the communication facilities
has to be as reduced as possible.
Based on those requirements, an MPOC can be
seen as a hierarchical structure composed by
processors, tasks, channels, and I/O ports.
Figure 1 shows a schematic MPOC, as it has been
presented in [5]. In that system several processors
attend several tasks (some of them just one and other
more than one), and communicate among them using
point-to-point channels or some broadcasting
facility. Many of the tasks can communicate with
external world using I/O lines, while other ones are
just internal processing tasks.
MPOC
I/O

I/O

Task
Processor
Task

I/O

Task

Processor

Processor
Task

Task

Broadcast
Channel

Processor
Task

Peer-to Peer
channel

I/O

Figure 1

As an example, consider the design of a car


computer. In this case there are contextually different
tasks:
Related to the engine: combustion and ignition
control, temperature control, oil pressure control,
etc.
Related to the structure: adaptive damping control,
airbags, brakes (ABS) and traction control, etc.

Related to the comfort, navigation or others: airconditioned, navigation computer, audio devices,
centralized lights control, anti-burglar alarms, etc.
A quick analysis shows the following:
The tasks related to the engine are strongly related
among them, and the relation between these tasks
and those of the general type is almost nonexistent.
The tasks related to the engine require intensive
numerical computation, that could be solved by
DSPs.
The tasks related to the structure conforms also a
compact block sharing common sensors and
actuators. In this case common solutions are based
on fuzzy logic.
The general type tasks include a high amount of
I/O bit-level operations, resources for multiple
timers, and communication channels to peripheral
devices. A general-purpose processor could be used
in this case,
3. ADAPTING AN IP MC6805 CORE FOR
MULTITASKING
The MC6805 is widely used in low cost applications.
Their characteristics can be found in the technical
manual [17], nevertheless we present its main
aspects.
It is a fixed-point processor, with an 8-bit data
bus, and Von Neumann architecture. The CPU has a
few internal registers: a variable up to 16 bitsprogram counter (PC), an 8-bit accumulator (A), an
8-bit index register (X), a 5-bit stack pointer (SP),
and a 5-bit status register (CCR). Variables,
instructions, and I/O share the 64 Kbytes address
space, and can be referenced using ten different
addressing modes.
The design of a single task MC6805 processor
using Alteras FLEX10K devices has been presented
in [3]. This design uses a very reduced amount of
resources (about 500 logic elements), and has be
taken as the starting point for this work.
For multi-task support it is necessary to perform a
fast context switching, saving all the variable values
belonging to the leaving task, which will be used
during the next instance of this task. That implies the
saving of two resources:
The private data (variables stored in RAM).
The value of the processor registers.
The protection of the private data can take
advantage of the fact that the size of code and data

used by each task is known a priori, before the


synthesis of the processor core within the
programmable device. Due to that characteristic, it is
possible to use one common memory for all the
tasks, assigning slices of this memory to each task,
pointed by constant offsets.
Figure 2 shows the necessary changes to perform
over the MC6805 address computation unit
presented in [3]. The resources added are an adder
and a constant offsets table.
Offsets
table

Resources added for multitask

from
data bus

PCH

PCL

8
8

TMP1
TMP2

SBH

SBL

SP

vec
KH

address
bus

SAL

SUM
16

Figure 2

A later elaboration could be to differentiate the


access to either RAM or ROM, generating offsets
over different memory areas to optimize memory
usage. That distinction should be essential when
using external RAM/FLASH memories. Besides, this
multiple offset scheme can also be used for the
definition of shared areas of memory.
For up to 16 tasks of variable code length, the
generation of the offset table will use as much logic
elements as the wide of the address bus plus those
necessary for the adder. As an example, given 8
tasks, with less than 12-bit address buses each, the
generation of the final 15 bit address bus would
require only 30 additional logic elements.
The saving of the register values can be
performed in parallel or sequential form. In the
parallel case, each register of the original single-task
processor is replaced by circular buffer of registers,
one for each task.
Figure 3 shows the hypothetical case of a
processor attending 7 tasks, where it can be seen that
the active register behavior is independent of which
is the active register (selected by the multitask
control stage).
The circular nature of the registers buffer enables
the switching from one task to the next one in a
single clock cycle, with minimum time overhead. In

multitask scheduler control


Accum4

Accum5

Accum6
Accum0

Accum3

Accum2

Accum1
Register control
(processor side)

Figure 3

the worst case, if task 0 must be switched to task 6,


the context switching latency could be 6 clock
cycles.
To save the registers A, X, CCR, SP, and a 12 bit
PC, it is required to add 38 new logic elements for
each additional task.
In the sequential case, the register saving process
can take advantage of the fact that the MC6805
automatically stores the registers on the stack before
serving an interrupt request, and it reads they back
when returning from the interrupt routine. The only
register not preserved in that automatic saving is the
SP, which can be stored using a circular buffer.
Figure 4 shows the modifications made on the
original MC6805 control state machine, where only
one new state (s30-SCHED) was added to the 30
2
s12
WSP

s11
WSP

s18
RSI

s19
RSI
s17
RSI

s14
VEC

s10
WSP
EXTINT

s30
SCHED

s13
WSP

s20
RST

s16
INS

s9
WSP
s21
RTN

RESET

s15
VEC

s6
IX1

s8
JMP
s0
FOP

s27
EXT

s26
FTM

s5
IX0

s7
IX2

s3
WSP

s1
FTM

s2
FTM

s24
IDL
s23
EXT

s4
WSP

s22
IX2
s29
IX0

s28
IX1

s25
BRX

Figure 4

previous states (s0 to s29) to make possible the


context switch.
The scheduler begins the switch cycle requesting
an interrupt (marked as 1), which forces the registers

stacking (state sequence s9, s10, s11, s12, s13).


When the state machine reaches state s13 (marked as
2) during a context switching, it moves to state s30
(marked as 3) instead of s14, where an interrupt
vector is fetch. The first action in s30 is to save the
old SP in the circular buffer, loading it with the SP
value of the incoming task. Concurrently, offsets are
changed to point to the memory areas of the
activated task. In the next cycle the state machine
returns from interrupt (marked as 4) already in the
new context (state sequence s16, s17, s18, s19, s20,
s21, s8). The modification requires adding minimum
hardware: 5 logic elements for each new task (for
saving the stack pointer), and 5 logic elements for
adding state s30. In this case, minimum switching
latency is 13 clock cycles.
4. COMMUNICATION CHANNELS
From the MPOC point of view, a communication
channel is a hardware object describing a link among
processors. From the point of view of a processor, a
channel is seen as a peripheral, that can be a serial
transceptor, a parallel port, or any more complex
element, such as a shared memory area, or a queuing
buffer.
A transaction message is sent by one task and
received by another, and can be used for
synchronization. The transaction can be either
originated by the transmitter (writing new output
data) and closed when the receptor reads it, or started
by the receptor (requesting new input data) and
closed when the transmitter send it. In both cases, the
agent which triggers the transaction remains halted
until the transaction is closed, therefore it is
reasonable to include resources to take advantage of
that time to process other tasks.
Models for channel ports are presented in [5]
including synchronization signals (rdy). As an
example, a parallel port is the most simple hardware
scheme for implementing task communications
(Figure 5). The transmitter uses a register for storing
the data and a simple state machine for
synchronization. The receptor is implemented using
also another small state machine.
When the transmitter writes new data (ld active)
the rdy_tx line becomes inactive, meaning that it is
waiting, and the availability of data in the channel
is indicated by new active. When the receptor read
the data (rd active), the signal rdy_rx becomes

inactive, remaining in that state until the transmitter


writes new data. At the same time, rd activates ack,
which reinitializes the transmitter signals rdy_tx
and new.

ack
ne w

rd
rd y_rx

receiver
data

transmitter
data

ld
rd y_tx

6. CONCLUSIONS

F igu re 5
The other situation is when the receptor is willing
to read new data when it is not present (new
inactive). In that situation the signal rdy_rx
remains inactive until the transmitter writes new
data.
5. MULTITASKING SCHEDULER

rdy_0
rdy_1
...
rdy_n

ROUND ROBIN
Arbiter

There is not any predefined scheduler, because it


architecture depends on several items: the tasks
priority, the interrupts management, the existence or
not of a front-end interrupt processor, and other
conditions.

int_0
int_1
...
int_m

Slice
Timer

Figure 6

The simplest case is a multitasking system with


equal priority tasks, using a round-robin arbitration
scheme (Figure 6). In this case, the schedule of a
new task may have several causes:
The active task has triggered a communication
transaction, then passing to idle until that
transaction is closed.
The slice time available for the active task is
already exhausted and another task is awaiting.
An external interrupt is demanding attention, and
the scheduler is assigning the CPU to the
corresponding task.
In any case, the scheduler interrupts the
processor, and when in the SCHED state, it switches
the offsets and decides the time assigned to the new
task.

State machine
Interrupt
To the
offsets
table

TimeSlice
Duration

When an application in entirely known a priori


before the beginning of the design cycle, then
hardware and software can be optimized according
to the requirements. It has be shown that an IP core
for a conventional processor could be easily
extended to operate in a multiprocessing and
multitasking environment., just adding a few
hardware resources. That solution and the short
design cycle for programmable logic devices, allow a
minimal development time, an easy debugging, and
short time to market.
Supposing an 8-tasks multiprocessor where the
addressing bus width for each task is lower than 12bit, then 30 logic elements are needed for the
management of private memory areas; 45 logic
elements are needed for the modification of the state
machine and for the stack pointer buffer; and 50
logic elements for the round-robin scheduler. That
represents a 25% increase in hardware when
compared to the single task processor.
That increase can be reduced:
For this microcontroller core, if the private memory
areas have the same length and equal to 2N, there is
not need for the 30 logic elements required for the
management of the low order address bits of that
memory areas. In such a case, the 3 upper lines of
the final address bus come directly from the arbiter,
and the overhead is reduced to a 20%. Also, if the
time slice is identical for each task, the hardware
needed by the arbiter is also decreased.
For more complex and powerful processors, with

larger address and data buses, and requiring more


hardware resources, the logic complexity for a
multitask operation is almost the same than that for
the core described here. As a consequence the
percentage of hardware assigned for that
functionality is smaller.
7. REFERENCES
[1]. Dmer, R. et al, Specification and Design of
Embedded Systems, it+ti Magazine N 3,
Oldenbourg Verlag, Munich, Germany, June 1998.
[2]. Janka R.S., Wills L.M., A Novel Codesign
Methodology for Real-Time Embedded COTS
Multiprocessor-Based Signal Processing Systems,
Proc. of the 8th. Intl. Workshop on Hardware/
Software Codesign. San Diego, USA, May 2000,
pp.157-161.
[3]. Jaquenod G., "Diseo de un microcontrolador
MC6805 usando lgica programable FLEX de
ALTERA". VI Workshop IBERCHIP, Sao Paulo,
Brasil, Mar 2000, pp. 130-139.
[4]. Jaquenod G., De Giusti M., "Diseo de
microcontroladores
empotrados
mediante
procesamiento serial: anlisis usando FLEX10K para
sintetizar un microcontrolador tipo COP8Sax. VII
Workshop IBERCHIP, Montevideo, Uruguay, Mar
2001. Proc. on CDROM.
[5]. Jaquenod G., Villagarca H., De Giusti M.,
Towards a Field Configurable non-homogeneous
Multiprocessors Architecture. SCI 2001, Orlando,
Florida, USA, Jul 2001. Proc. Vol XIV pp. 248-253.
[6]. Keating M., Bricaud P., Reuse Methodology
Manual For System-On-A-Chip Designs, Second
Edition, Kluwer Academic Publishers 1999, USA,
ISBN 0-7923-8558-6.

[7]. Meerwein M. et al, Linking Codesign and


Reuse in Embedded Systems Design, Proc. of the
8th. Intl. Workshop on Hardware/Software Codesign.
San Diego, USA, May 2000, pp. 93-97.
[8]. Pollard
L.H.,
Computer
Design
and
Architecture, Prentice Hall 1990, USA, ISBN 0-13167255-X.
[9]. Seepold R, Martinez Madrid N. (Editores),
Virtual Components Design and Reuse, Kluwer
Academic Publishers 2000, USA, ISBN 0-79237261-1.
[10]. Smith M., Application Specific Integrated
Circuits, Addison Wesley 1997, USA, ISBN 0-20150022-1.
[11]. Villagarca H., Bria O., Diseo de bloques
IP: Programabilidad y Reutilizacin. WICC2001,
San Luis, Argentina, May 2001, pp.2-5.
[12]. Wolf W., Computer as Components:
Principles of Embedded Computer Systems Design,
Morgan Kaufmann 2000, USA, ISBN 1-55860-541X.
[13]. ALTERA Corp., NIOS Soft core Embedded
Processor Data Sheet. Version 1. San Jos, CA,
USA, 2000.
[14]. ALTERA Corp., ARM-based Embedded
Processor Device Overrview. Version 1.1, MIPSbased Embedded Processor Device Overrview.
Version 1.1. San Jos, CA, USA, 2000.
[15]. ALTERA Corp., Intellectual Property
Catalog, M-CAT-AIPS-01, Altera Corp., 1999,
USA.
[16]. INMOS Ltd., Transputer Reference Manual,
Prentice Hall 1988, UK, ISBN 0-13-929001-X
[17]. MOTOROLA, MC68HC705C8A/D, Rev.1,
Motorola Inc., 1996, USA.

You might also like