SIWADO

Figure 1: RTL view of SIWADO Architecture

Figure 2: FSM Bubble Diagram

Overview

We propose the design and full-custom implementation of a 16-bit RISC-style microcontroller optimized for high-density silicon integration, guided by a 1,000-gate target. The architecture uses a non-pipelined, multi-cycle execution model governed by a central finite state machine (FSM). This approach prioritizes hardware resource reuse, allowing complex mathematical functions to be implemented without exceeding the transistor budget. The datapath consists of a 16-bit ALU, a 16-bit program counter, and a register file with eight general-purpose registers (R0–R7). To maintain ISA consistency, all arithmetic operations use 16-bit two’s complement signed representation. Register R0 is hardwired to logic zero, simplifying common data movement operations. The system operates without external DRAM or an on-chip cache hierarchy. Data initialization is handled through immediate addressing instructions, and external communication is achieved via a memory-mapped I/O model with a dedicated 32-pad parallel interface, described in detail below.

Instruction Set Architecture (ISA)

The ISA uses 4-bit opcodes to support sixteen instructions, including ADD, SUB, AND, OR, XOR, and control flow (BEQ, BNE, HALT). To handle data initialization without external RAM, we include Load-Upper-Immediate (LUI) and Add-Immediate (ADDI). A core area-saving strategy is the centralized sequential shifter. This single hardware block serves as the execution engine for four instructions: Logical Shift Left (LSL), Logical Shift Right (LSR), Multiply-Accumulate (MAC), and Count Leading Zeros (CLZ). These are implemented as blocking instructions: the FSM enters a dedicated execution loop for up to 8 cycles, during which instruction fetch is suspended. This ensures deterministic timing and eliminates the need for complex resource arbitration. The sequential shifter delivers its result in 8 cycles or fewer thanks to a dual-clock interleaved structure.
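As a rough illustration of the bit budget implied above, the sketch below packs a register-register instruction into one word. Only the field widths come from the text (4-bit opcode, 3-bit register specifiers for R0–R7). The 16-bit instruction width, the field positions, the opcode value, and the names `encode_rtype` and `decode_rtype` are assumptions made for this example and may not match the actual SIWADO encoding.

```python
# Hypothetical R-type layout (an assumption, not taken from the SIWADO source):
#   [15:12] opcode | [11:9] rd | [8:6] rs1 | [5:3] rs2 | [2:0] unused
OPCODE_BITS = 4  # from the text: 4-bit opcodes -> sixteen instructions
REG_BITS = 3     # from the text: eight registers R0-R7 -> 3-bit specifiers


def encode_rtype(opcode, rd, rs1, rs2):
    """Pack one register-register instruction into a 16-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 4 bits")
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < (1 << REG_BITS):
            raise ValueError("register index must be in 0..7")
    return (opcode << 12) | (rd << 9) | (rs1 << 6) | (rs2 << 3)


def decode_rtype(word):
    """Unpack the (opcode, rd, rs1, rs2) fields written by encode_rtype."""
    return (word >> 12) & 0xF, (word >> 9) & 0x7, (word >> 6) & 0x7, (word >> 3) & 0x7


HYPOTHETICAL_ADD = 0x0  # placeholder opcode value, chosen only for this example
word = encode_rtype(HYPOTHETICAL_ADD, rd=3, rs1=1, rs2=2)
print(f"0x{word:04X}")
```

Under this layout, an opcode plus three register specifiers uses 13 of the 16 bits, which leaves room for an immediate field in two-register formats such as ADDI.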

Hardware Acceleration: MAC and CLZ

The defining feature of the SIWADO architecture is its two hardware-accelerated instructions:

Multiply-Accumulate (MAC): Operands are treated as 16-bit signed values. The instruction follows the format Rd = Rd + (Rs1 × Rs2), where the destination register serves as the accumulator. The unit calculates a 32-bit intermediate product using a sign-extended sequential shift-and-add algorithm, then truncates the product to its lower 16 bits for accumulation.
Count Leading Zeros (CLZ): This unit treats the input as an unsigned bit pattern. It shares the centralized shifter module and shift register hardware, with operation-specific accumulation logic, to detect the first logic “one” from the MSB, returning the count to the destination register.

Together, these instructions provide DSP-level functionality for sparsity-aware algorithms and fixed-point normalization while maintaining a compact gate footprint.
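The behavior of the two instructions described above can be sketched as a bit-serial Python reference model. This is a functional sketch only, not the RTL: it processes one bit per loop iteration and does not model the dual-clock timing. The names `to_signed16`, `mac16`, and `clz16` are hypothetical, and returning 16 for an all-zero CLZ input is an assumption, since the text does not specify that case.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def mac16(rd, rs1, rs2):
    """Rd = Rd + (Rs1 * Rs2) via shift-and-add, truncated to 16 bits."""
    multiplicand = to_signed16(rs1) & MASK32  # sign-extend Rs1 to 32 bits
    multiplier = rs2 & MASK16
    product = 0
    for bit in range(16):
        if (multiplier >> bit) & 1:
            partial = (multiplicand << bit) & MASK32
            if bit == 15:
                # The MSB of a signed multiplier has weight -2**15, so subtract.
                product = (product - partial) & MASK32
            else:
                product = (product + partial) & MASK32
    # Keep only the lower 16 bits of the 32-bit product for accumulation.
    return (rd + (product & MASK16)) & MASK16


def clz16(value):
    """Count zeros above the first set bit by shifting left one bit at a time."""
    value &= MASK16
    count = 0
    while count < 16 and not (value & 0x8000):
        value = (value << 1) & MASK16
        count += 1
    return count  # assumption: an all-zero input yields 16


print(mac16(10, 3, 4), hex(mac16(0, 0xFFFF, 5)), clz16(0x00FF))
```

For fixed-point normalization, shifting a nonzero value left by `clz16(value)` moves its leading one into the MSB position.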

Memory Mapped I/O and Physical Interface

A 4-word internal data memory supports general-purpose load/store operations. The system also employs a memory-mapped I/O (MMIO) model to communicate with the external environment via a unified 16-bit address space. To ensure high observability during verification, we have reserved specific addresses for a 32-pad parallel interface:

0xFC00 (Output Port): A Store Word (SW) to this address latches 16 bits of data directly to 16 dedicated output pads.
0xFC02 (Input Port): A Load Word (LW) from this address samples the physical state of 16 dedicated input pads.

This deterministic interface allows the processor to interact with external hardware or test equipment without the overhead of a serialized bus or complex handshake protocols.
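The address decode described above can be modeled in a few lines of Python. The two port addresses and the 4-word memory size come from the text. The data memory base address (`DATA_BASE`), the byte-addressed, word-aligned layout, the `Bus` class, and the error raised for unmapped addresses are assumptions made for this sketch.

```python
OUTPUT_PORT = 0xFC00  # from the text: SW latches 16 bits to the output pads
INPUT_PORT = 0xFC02   # from the text: LW samples the 16 input pads
DATA_BASE = 0x0000    # hypothetical base address of the internal data memory
DATA_WORDS = 4        # from the text: 4-word internal data memory


class Bus:
    """Minimal load/store model with two memory-mapped I/O ports."""

    def __init__(self):
        self.data = [0] * DATA_WORDS
        self.output_pads = 0  # value driven onto the 16 output pads
        self.input_pads = 0   # value present on the 16 input pads

    def _index(self, addr):
        # Assumption: byte addresses, one 16-bit word every 2 bytes.
        offset = addr - DATA_BASE
        if offset % 2 or not 0 <= offset < 2 * DATA_WORDS:
            raise ValueError(f"unmapped or misaligned address 0x{addr:04X}")
        return offset // 2

    def store_word(self, addr, value):
        if addr == OUTPUT_PORT:
            self.output_pads = value & 0xFFFF
        else:
            self.data[self._index(addr)] = value & 0xFFFF

    def load_word(self, addr):
        if addr == INPUT_PORT:
            return self.input_pads & 0xFFFF
        return self.data[self._index(addr)]


bus = Bus()
bus.store_word(OUTPUT_PORT, 0xBEEF)  # SW to 0xFC00 drives the output pads
bus.input_pads = 0x1234              # test equipment drives the input pads
print(hex(bus.output_pads), hex(bus.load_word(INPUT_PORT)))
```

A testbench can use the output port the same way: the program stores intermediate results to 0xFC00 and the bench compares the pad values against expected data.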

Verification

Verification was driven by a custom Python assembler, a structured Verilog testbench, and a detailed assembly program covering all instruction types, including arithmetic, logic, branch, memory, MMIO, and all four shifter operations. Functional validation was completed in Questa both before and after Design Compiler synthesis. Physical implementation was completed using Cadence Innovus for place and route and Magic for layout verification. Switch-level simulation was performed using IRSIM on the Magic-extracted netlist. Two parallel synthesis flows were maintained: one with a file-loaded instruction memory for Questa waveform verification, and one with hardcoded instructions for the Innovus and IRSIM physical design flow.
