
Elc2009 Qemu Cris

This document discusses using QEMU to debug and profile embedded Linux/CRIS systems. QEMU is an open source machine emulator and virtualizer that can emulate a complete machine or run Linux programs built for another processor. It works by dynamically translating guest machine code into host code. For the CRIS architecture, QEMU provides debugging features such as a GDB stub, execution traces, an L1 cache model, and a processor pipeline model. It can also generate Kcachegrind-compatible profiling statistics for performance analysis.


Debugging and profiling embedded Linux/CRIS systems with QEMU

Edgar E. Iglesias <edgar@axis.com>


Talk

● Why
● CRIS & ETRAX
● Quick overview
● QEMU
● Overview
● Debugging & Profiling features
● Summary
● Questions
Why

● We didn't have an emulator


● Fast
● Easy to use and extend
● Powerful debug capabilities
● QEMU - fun hobby project
CRIS
● Code Reduced Instruction Set
● ISA designed for small footprint.
● GNU toolchain (binutils, GCC, GDB).
● CRISv8 (1999)
● Designed to be small and for low power
consumption.
● 2-stage pipeline @ 100 MHz.
● uClinux (no MMU).
● CRISv10 (2000)
● Standard Linux (with MMU).
● CRISv32 (2004)
● 5-stage pipeline @ 200 MHz.
CRISv32
● 32-bit RISC architecture
● Variable length (16bit) insn encoding.
● 5-stage pipeline
● Load+operate
● Enforces dependencies by interlocks.
● 2-stage Multiplier shares stage with MEM

● Auto-increment has no regforwarding

● load/store multiples lack regforwarding

● Delayed branches
● MMU / TLB
● 16 segments (linear or 8 KB paged).
● 8-bit ASID, 64 entries.
● L1 Cache
● 2 x 16 KB 2-way VIPT.
● Coherent (buggy).
● No performance counters
ETRAX
● Ethernet Token Ring AXis
● Family of networking chips
● Lots of I/O
● SCSI, IDE

● Ethernet, TokenRing

● USB, Parallel ports, Serial ports

● Etc..

● Print Servers, Storage Servers, Scan Servers, Network Cameras, Network Video Servers etc.

● ARTPEC
● AXIS family of video processing chips.
AXIS Communications

● Video surveillance
● Cameras, Video encoders, Decoders, SW etc

● Early with embedded linux


“QEMU is a generic and open source machine emulator and virtualizer.”
QEMU

● System emulation
● Emulates a complete machine.
● Cross run unmodified OS/Firmware.
● Can also emulate boot-roms including
different bootstrap methods.

● Linux-user emulation
● Emulates the target processor.
● Cross run linux programs.
● Syscalls run natively on the host
(through an argument translator).
QEMU

Does not continuously interpret the guest ISA. Instead it translates guest machine code into host code.
● Fetching only done at translation time.
● Instruction decoding only done at translation time.
● Basic optimization at translation time.

● Lazy Condition Code flags evaluation.


Dynamic Translation

● On-demand translation of instruction sequences from target to host ISA. The result is referred to as a Translation Block (TB).

● Translation is done through a portable, generic intermediate code generator, the Tiny Code Generator (TCG).
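The on-demand part can be sketched with a toy cache of translation blocks keyed by guest PC. This is only an illustration under stated assumptions: the names `toy_tb`, `tb_find` and `tb_translations`, the direct-mapped layout and the cache size are all hypothetical and are not QEMU's internal data structures.

```c
#include <stdint.h>

#define TB_CACHE_SIZE 256u   /* toy size, must be a power of two */

/* Hypothetical stand-in for a translation block. */
struct toy_tb {
    uint32_t guest_pc;    /* guest address the block was translated from */
    int      valid;       /* non-zero once the slot holds a translation */
    unsigned exec_count;  /* how many times the block was looked up and run */
};

static struct toy_tb tb_cache[TB_CACHE_SIZE];
static unsigned translate_count;   /* number of (expensive) translations */

/* Pretend to fetch, decode and translate guest code starting at pc. */
static void toy_translate(struct toy_tb *tb, uint32_t pc)
{
    tb->guest_pc = pc;
    tb->valid = 1;
    tb->exec_count = 0;
    translate_count++;
}

/* Return the block for pc, translating only on a cache miss. */
struct toy_tb *tb_find(uint32_t pc)
{
    /* Assumes 16-bit aligned guest code, so bit 0 is dropped for the index. */
    struct toy_tb *tb = &tb_cache[(pc >> 1) & (TB_CACHE_SIZE - 1u)];

    if (!tb->valid || tb->guest_pc != pc)
        toy_translate(tb, pc);
    tb->exec_count++;
    return tb;
}

unsigned tb_translations(void)
{
    return translate_count;
}
```

Looking up the same guest PC twice translates once and reuses the block the second time, which is why fetch and decode costs are paid only at translation time.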
Tiny Code Generator

● Per CPU target translators translate guest code into TCG operations.

● TCG runs generic optimization passes.


● Basic stuff, Regalloc, liveness analysis etc.

● TCG backends emit host machine code.

CRIS Translator -> TCG Generic Optimizer -> TCG Backend
Translation
CRIS:
move.d $r9, $r10
ret
addq 3, $r10
...
TCG:
mov_i32 $r10, $r9
movi_i32 cc_x, $0x0
mov_i32 cc_result, $r10
---
mov_i32 btarget, $srp
movi_i32 tmp0, $0xfffffffe
and_i32 btarget, btarget, tmp0
movi_i32 btaken, $0x1
...
TCG x86 backend:
mov 0x24(%ebp), %eax
mov 0x6c(%ebp), %edx
...
TCG Helpers

● Subroutine calls from TB


● TCG needs to writeback and reload the
CPUState around the call due to aliasing
and helper side-effects.
● PURE | CONST helpers avoid wb &
reloading.
● Nice if you can easily identify a complex
target instruction sequence.

● Compiled by host compiler


● Optimization cost mostly taken at QEMU
compile time.
Lazy CC evaluation

● CRIS has implicit updates


● Emulators need to evaluate the condition
code flags after every insn.

● Lazy evaluation
● Save operation and operands.
● Evaluate when there is a dependency on the flags.
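A minimal sketch of the lazy scheme, assuming only 32-bit add and subtract and a four-flag N/Z/C/V set. The names `lazy_cc`, `lazy_add`, `lazy_sub` and `cc_evaluate` are hypothetical, and treating C as "borrow" for subtraction is an assumption of this sketch rather than something the slides state.

```c
#include <stdint.h>

enum cc_op { CC_OP_NONE, CC_OP_ADD, CC_OP_SUB };

/* Saved operation and operands; flags are NOT computed here. */
struct lazy_cc {
    enum cc_op op;
    uint32_t a, b, result;
};

struct cc_flags { int n, z, c, v; };

/* Flag-setting instructions only record what they did. */
uint32_t lazy_add(struct lazy_cc *cc, uint32_t a, uint32_t b)
{
    cc->op = CC_OP_ADD;
    cc->a = a;
    cc->b = b;
    cc->result = a + b;
    return cc->result;
}

uint32_t lazy_sub(struct lazy_cc *cc, uint32_t a, uint32_t b)
{
    cc->op = CC_OP_SUB;
    cc->a = a;
    cc->b = b;
    cc->result = a - b;
    return cc->result;
}

/* Called only when something (e.g. a conditional branch) reads the flags. */
struct cc_flags cc_evaluate(const struct lazy_cc *cc)
{
    struct cc_flags f = { 0, 0, 0, 0 };
    uint32_t a = cc->a, b = cc->b, r = cc->result;

    f.n = (int)(r >> 31);
    f.z = (r == 0);
    if (cc->op == CC_OP_ADD) {
        f.c = (r < a);                              /* unsigned carry out */
        f.v = (int)((~(a ^ b) & (a ^ r)) >> 31);    /* signed overflow */
    } else if (cc->op == CC_OP_SUB) {
        f.c = (a < b);                              /* borrow (assumed) */
        f.v = (int)(((a ^ b) & (a ^ r)) >> 31);     /* signed overflow */
    }
    return f;
}
```

A run of flag-setting instructions whose flags are never read costs only the stores in `lazy_add`/`lazy_sub`; the flag arithmetic in `cc_evaluate` runs once, at the consumer.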
QEMU TCG

● Target ports (TCG translators):


● Alpha, ARM, CRIS, MIPS, m68k, PPC, SH,
SPARC32/64, x86 and x86_64.


● Host ports (TCG backends):
● ARM, HPPA, PPC, SPARC32, x86 and
x86_64.
QEMU IO

● Memory accesses
● No cache models.
● No bus transfer models.
● Limited bus topology modeling.

● SoftMMU
● QEMU fast TLB caches slower guest TLB.
● I faults taken between TB's.
● D faults abort and retranslate the current
TB with extra info to find the actual guest
insn that caused the exception.
● Interrupts
● Taken between TB's.
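The idea of a fast TLB sitting in front of a slower guest TLB model can be sketched as follows. Everything here is a toy under stated assumptions: `translate_addr`, `guest_tlb_lookup` and the fixed offset mapping are hypothetical, and only the 8 KB page size is taken from the CRISv32 slide.

```c
#include <stdint.h>

#define PAGE_BITS     13u                          /* 8 KB pages */
#define PAGE_MASK     (~((1u << PAGE_BITS) - 1u))
#define FAST_TLB_SIZE 64u                          /* toy size, power of two */

struct fast_tlb_entry {
    uint32_t vpage;   /* virtual page base */
    uint32_t ppage;   /* physical page base */
    int      valid;
};

static struct fast_tlb_entry fast_tlb[FAST_TLB_SIZE];
static unsigned slow_lookups;

/* Hypothetical slow path standing in for the guest TLB model. */
static uint32_t guest_tlb_lookup(uint32_t vpage)
{
    slow_lookups++;
    return vpage + 0x40000000u;   /* made-up fixed mapping */
}

/* Translate a guest virtual address, consulting the slow model on a miss. */
uint32_t translate_addr(uint32_t vaddr)
{
    uint32_t vpage = vaddr & PAGE_MASK;
    struct fast_tlb_entry *e =
        &fast_tlb[(vaddr >> PAGE_BITS) & (FAST_TLB_SIZE - 1u)];

    if (!e->valid || e->vpage != vpage) {
        e->vpage = vpage;
        e->ppage = guest_tlb_lookup(vpage);
        e->valid = 1;
    }
    return e->ppage | (vaddr & ~PAGE_MASK);
}

unsigned slow_lookup_count(void)
{
    return slow_lookups;
}
```

Two accesses to the same page hit the slow model once; only the first pays for the full guest TLB lookup.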
QEMU Peripherals

● Interrupt controllers
● DMA units
● Flash memories (NOR/NAND)
● Networking
● Flexible ways to connect to the host.
● Support for DMA and PHY control.
● IDE / SCSI controllers
● Serial ports
● Graphic adapters
● Audio adapters
● More..
QEMU Peripherals

● Provide registration function


● Register callbacks for control register access
● Combinational logic

● Timers

● Interrupts

● QEMU I/O

● Networking
● Serial ports
● IPC
● etc...
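The registration pattern above can be sketched with a toy bus that dispatches accesses to per-device callbacks. This is not QEMU's device API: `toy_register_io`, `toy_bus_read`, `toy_bus_write` and the two-register `toy_timer` are hypothetical names made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*reg_read_fn)(void *opaque, uint32_t offset);
typedef void (*reg_write_fn)(void *opaque, uint32_t offset, uint32_t value);

struct toy_region {
    uint32_t base, size;
    reg_read_fn read;
    reg_write_fn write;
    void *opaque;          /* device state handed back to the callbacks */
};

#define MAX_REGIONS 8
static struct toy_region regions[MAX_REGIONS];
static size_t nr_regions;

/* Registration function: claim [base, base + size) for a device. */
int toy_register_io(uint32_t base, uint32_t size,
                    reg_read_fn r, reg_write_fn w, void *opaque)
{
    if (nr_regions == MAX_REGIONS)
        return -1;
    regions[nr_regions++] = (struct toy_region){ base, size, r, w, opaque };
    return 0;
}

static struct toy_region *find_region(uint32_t addr)
{
    for (size_t i = 0; i < nr_regions; i++)
        if (addr - regions[i].base < regions[i].size)  /* unsigned range check */
            return &regions[i];
    return NULL;
}

uint32_t toy_bus_read(uint32_t addr)
{
    struct toy_region *r = find_region(addr);
    return r ? r->read(r->opaque, addr - r->base) : 0u;
}

void toy_bus_write(uint32_t addr, uint32_t value)
{
    struct toy_region *r = find_region(addr);
    if (r)
        r->write(r->opaque, addr - r->base, value);
}

/* Example device: a timer with a control and a count register. */
struct toy_timer { uint32_t ctrl, count; };

static uint32_t timer_read(void *opaque, uint32_t offset)
{
    struct toy_timer *t = opaque;
    return offset == 0 ? t->ctrl : t->count;
}

static void timer_write(void *opaque, uint32_t offset, uint32_t value)
{
    struct toy_timer *t = opaque;
    if (offset == 0)
        t->ctrl = value;
    else
        t->count = value;
}

int toy_timer_init(struct toy_timer *t, uint32_t base)
{
    t->ctrl = 0;
    t->count = 0;
    return toy_register_io(base, 8u, timer_read, timer_write, t);
}
```

A board model would call something like `toy_timer_init(&timer, base)` for each device while wiring up the address map; the write callback is also a natural place to hang checks on how the guest programs the device.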
QEMU Boards

● Instantiate CPU cores


● Define Address map
● Wire up all the devices
● Load kernel/OS images
Debugging and Profiling CRIS

● Builtin GDB stub


● Execution traces

● L1 Cache model

● Processor pipeline model

● Interrupt latency tracker

● Kcachegrind compatible statistics

● Track peripheral programming inefficiencies and errors


Builtin GDB stub

● Non-intrusive
● Controllable from first executed insn

● HW Breakpoints

● HW Watchpoints

● VM time stops while halted

● Configurable interrupts while single-stepping
● Experimental patch for tracepoints
CRIS Cache

● L1 cache model
● Controller and tag memories.
● Does not include the data path/memories.
● Snoops on other bus masters.

Address from CPU:  Tag | Index | Line offset

Index  Tag  Valid  Dirty   |   Index  Data
0      x    0      0       |   0      x
1      x    1      0       |   1      x
2      x    1      1       |   2      x
...    x    0      0       |   ...    x
CRIS Cache
● QEMU Cache tag memories
● Not really bound by size.
● Extended with debug info.
● Virtual Address for the access.

● Virtual PC for the access.

● One dirty bit per line word.

● Connected to GDB
● Stop execution on cache-miss (misspoints)

Index  Tag  Valid  Dirty   VPC  Vaddr
0      x    0      000..
1      x    1      010..
2      x    1      110..
...    x    0      000..
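A tag-only cache model extended with debug fields might look like the sketch below. The 32-byte line size, the round-robin replacement and all names (`toy_cache`, `cache_access`) are assumptions for illustration; the slides only state 16 KB, 2-way, VIPT, and one dirty bit per line word.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u    /* assumed line size, not stated on the slides */
#define NUM_WAYS   2u
#define NUM_SETS   256u   /* 16 KB / (32 B * 2 ways) */

struct tag_entry {
    uint32_t tag;
    int      valid;
    uint8_t  dirty;   /* one bit per 32-bit word in the line */
    uint32_t vpc;     /* debug info: virtual PC of the last access */
    uint32_t vaddr;   /* debug info: virtual address of the last access */
};

struct toy_cache {
    struct tag_entry sets[NUM_SETS][NUM_WAYS];
    unsigned victim[NUM_SETS];   /* next way to replace (round-robin) */
    unsigned hits, misses;
};

/* Returns 1 on a hit and 0 on a miss. No data is stored, only tags. */
int cache_access(struct toy_cache *c, uint32_t vaddr, uint32_t paddr,
                 uint32_t vpc, int is_store)
{
    uint32_t set  = (vaddr / LINE_BYTES) % NUM_SETS;   /* virtually indexed */
    uint32_t tag  = paddr / (LINE_BYTES * NUM_SETS);   /* physically tagged */
    uint32_t word = (vaddr % LINE_BYTES) / 4u;
    struct tag_entry *e = NULL;
    int hit;

    for (unsigned way = 0; way < NUM_WAYS; way++) {
        struct tag_entry *cand = &c->sets[set][way];
        if (cand->valid && cand->tag == tag) {
            e = cand;
            break;
        }
    }
    hit = (e != NULL);
    if (hit) {
        c->hits++;
    } else {
        e = &c->sets[set][c->victim[set]];
        c->victim[set] = (c->victim[set] + 1u) % NUM_WAYS;
        e->tag = tag;
        e->valid = 1;
        e->dirty = 0;
        c->misses++;
    }
    e->vpc = vpc;
    e->vaddr = vaddr;
    if (is_store)
        e->dirty |= (uint8_t)(1u << word);
    return hit;
}
```

A return value of 0 is the point where a debugger hook could stop execution, in the spirit of the "misspoints" mentioned above, and the recorded `vpc` tells which guest instruction last touched the line.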
CRIS Cache

● Track wasted writeback cycles


● Due to fragmented store patterns
● Reorganize global data
● __read_mostly attribute
● Reorganize structures
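One way to quantify the waste, assuming a writeback transfers the whole line while only some words are dirty: count the clean words in every line that gets written back. The 8-word line and the names `wasted_words` and `total_wasted` are assumptions of this sketch, not details from the slides.

```c
#include <stdint.h>

#define WORDS_PER_LINE 8u   /* assumes 32-byte lines of 32-bit words */

/* Count set bits in a per-word dirty mask. */
static unsigned popcount8(uint8_t mask)
{
    unsigned n = 0;
    while (mask) {
        n += mask & 1u;
        mask >>= 1;
    }
    return n;
}

/* Words transferred by a writeback although they were never modified. */
unsigned wasted_words(uint8_t dirty_mask)
{
    if (dirty_mask == 0)
        return 0;   /* clean line: nothing is written back */
    return WORDS_PER_LINE - popcount8(dirty_mask);
}

/* Sum the waste over the dirty masks of all evicted lines. */
unsigned total_wasted(const uint8_t *masks, unsigned count)
{
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++)
        total += wasted_words(masks[i]);
    return total;
}
```

Four stores scattered over four lines waste 28 words of writeback traffic under these assumptions, while the same four stores packed into one line waste 4, which is the effect that reorganizing frequently written data aims for.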
CRIS Cache

● Cache snoops on DMA accesses


● Incoherence warnings.
● TODO: Debugger breakpoints
CRIS Pipeline

● Work in progress
● Intra TB
● Computed at translation time.
● Fast but not all locked cycles are seen.
● No branch prediction
● Logs PC address and symbol name
Interrupt Latency

● Track IRQ masking (CPU line).


● Log long paths.

● Time estimate based on core frequency, instruction count, interlock cycles and cache statistics.
● Helps reduce:
● Interrupt latency
● Jitter

c0010156 (badcode) -> c0010338 (badcode) lr=c0010330 9632 insns 10398 cycles 41592ns
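The arithmetic behind such a log line can be sketched as below, assuming one base cycle per instruction plus stall cycles from interlocks and cache misses. The function names are hypothetical. The sample line is consistent with 766 stall cycles on top of 9632 instructions and a 4 ns cycle (250 MHz); that clock is inferred from the numbers, not stated on the slides.

```c
#include <stdint.h>

struct path_stats {
    uint64_t insns;          /* instructions executed with IRQs masked */
    uint64_t stall_cycles;   /* interlock plus cache-miss penalty cycles */
};

/* Assumes one cycle per instruction plus the recorded stalls. */
uint64_t path_cycles(const struct path_stats *p)
{
    return p->insns + p->stall_cycles;
}

/* Convert a cycle count to nanoseconds for a given core frequency. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t core_hz)
{
    return cycles * 1000000000ull / core_hz;
}
```

With `insns = 9632` and `stall_cycles = 766`, `path_cycles` gives 10398, and `cycles_to_ns(10398, 250000000)` gives 41592, matching the figures in the log line.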
Interrupt Latency

{
    unsigned long flags;

    spin_lock_irqsave(&lock1, flags);

    /* code. */
    if (something) {
        /* Reuses 'flags', overwriting the IRQ state saved for lock1. */
        spin_lock_irqsave(&lock2, flags);
        /* More critical code. */
        spin_unlock_irqrestore(&lock2, flags);
    }
    /* code. */

    spin_unlock_irqrestore(&lock1, flags);
}
Kcachegrind

● Instruction count per function


● Instructions with interrupts masked
● Cycle estimate per function
● Cache model
● Pipeline model
● No callgraphs (TODO)
Peripheral Programming

Warn about control register programming errors and inefficiencies:
● Duplex mismatch MAC / PHY.
● Illegal combinations/setups.
● Unnecessary control register accesses.
● Enforce reserved fields
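Checks of this kind fit naturally in a device model's register write path. The sketch below uses a made-up MAC control register: the reserved mask, the duplex bit and the names `toy_mac` and `mac_ctrl_write` are hypothetical and do not describe any real ETRAX register layout.

```c
#include <stdint.h>

#define CTRL_RESERVED_MASK   0xffff0000u   /* hypothetical reserved field */
#define CTRL_FULL_DUPLEX     0x00000001u   /* hypothetical duplex bit */

/* Warning bits returned by the write handler. */
enum {
    CHECK_OK        = 0,
    CHECK_RESERVED  = 1,   /* guest wrote non-zero reserved bits */
    CHECK_REDUNDANT = 2,   /* write did not change the register */
    CHECK_DUPLEX    = 4    /* MAC duplex differs from the PHY's */
};

struct toy_mac {
    uint32_t ctrl;
    int phy_full_duplex;   /* what the PHY model negotiated */
    int ctrl_written;      /* non-zero after the first write */
};

/* Model a control register write and report programming problems. */
unsigned mac_ctrl_write(struct toy_mac *m, uint32_t value)
{
    unsigned warn = CHECK_OK;
    uint32_t masked = value & ~CTRL_RESERVED_MASK;

    if (value & CTRL_RESERVED_MASK)
        warn |= CHECK_RESERVED;
    if (m->ctrl_written && masked == m->ctrl)
        warn |= CHECK_REDUNDANT;
    if (((masked & CTRL_FULL_DUPLEX) != 0) != (m->phy_full_duplex != 0))
        warn |= CHECK_DUPLEX;

    m->ctrl = masked;
    m->ctrl_written = 1;
    return warn;
}
```

A real model would log the guest PC along with each warning so the offending driver code can be found; here the caller simply receives a bitmask.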
Peripheral Programming

Simplified view
Peripheral Programming

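struct ram_entry
{
    u16 ctrl;
    u16 pos;
    ...
};

/* volatile: every e->ctrl below is a separate access to the device. */
volatile struct ram_entry *e = (volatile struct ram_entry *)SOME_ADDRESS;


if (e->ctrl & 1) { … }
if (e->ctrl & 2) { … }
if (e->ctrl & 4) { … }
… more...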


Compiler emits loads/stores resulting in deep bus transfers for every access to ctrl!
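A common remedy is to read the volatile field once into a local variable and test the copy. The sketch below assumes that is safe for the device in question, i.e. that `ctrl` does not need to be re-sampled between the tests; `decode_ctrl` is a hypothetical name and the struct keeps only the two fields shown above.

```c
#include <stdint.h>

struct ram_entry {
    uint16_t ctrl;
    uint16_t pos;
};

/* Returns how many of the three low control bits are set. */
unsigned decode_ctrl(volatile struct ram_entry *e)
{
    uint16_t ctrl = e->ctrl;   /* the only bus access to ctrl */
    unsigned handled = 0;

    if (ctrl & 1)
        handled++;             /* handle condition 1 */
    if (ctrl & 2)
        handled++;             /* handle condition 2 */
    if (ctrl & 4)
        handled++;             /* handle condition 4 */
    return handled;
}
```

Because `ctrl` is an ordinary local, the compiler may keep it in a register, so the three tests cost one bus transfer instead of three.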
Axis Devices


● ETRAX-FS
● Bare FS virtual machine
● Axis Devboard 88

● ARTPEC-3
● P3301, Q7401
● ARTPEC-4
● Prototype of virtual machine
● ARTPEC-B
● M3011
Axis Devices


ETRAX-FS / ARTPEC-3
● CRIS core
● L1 Cache
● MMU
● PIC
● Timers
● DMA
● Ethernet with PHY models
● Asynch Serial Ports
● PIO (NAND flashes)
● NOR flashes
● GPIO
● Temperature sensors (i2c)
Axis Devices


ARTPEC-B
● ARM926 core
● MMU
● PIC
● Timers
● Ethernet with PHY models
● Asynch Serial Ports
● NAND Controller
● GPIO
● Stub for RASC interface
● Just enough to boot.
Future work


● Emulate media sources
● Image and Audio pipelines.
● Codecs.

● Linux aware debugging
● Track kernel memory allocations.
● Kernel modules debuginfo.
● Track user-space processes in system emulation.

● TLB profiler
Summary
● Early access to hardware
● Initial testing
● Debugging
● Early boot code
● Cache incoherence
● Debug the debug code
● Profiling
● Interrupt latency
● Improve cache performance
● Avoid interlock cycles in hot loops
● Testing (future)
● FW up/downgrades
● Instrumentation
● I/O Stimulus
Questions

Thanks for listening

URL: git://repo.or.cz/qemu/cris-port.git
edgar@axis.com
