Author: Leonardo Capossio - bard0 design - hello@bard0.com
Synthesizable MJPEG encoder written in behavioral Verilog 2001, with a native
VHDL-1993 port under rtl/vhdl/, AXI interfaces, and support for up to 1080p30 on low-end AMD/Xilinx 7-Series FPGAs. Two operating modes: Full mode encodes with runtime quality control;
Lite mode encodes with fixed synthesis-time quality and a smaller LUT footprint (about 15% fewer LUTs in the core-only synthesis results).
A Python reference encoder is included for validation and test vector generation.
- Architecture
- Interfaces
- Parameters
- Capabilities
- Performance
- Resource Usage
- Pipeline Modules
- Quick Start
- Integration Example
- Tested Hardware
- Applications
- Directory Structure
- Contributing
- License
Architecture
The editable diagram spec docs/architecture.json
and rendered SVG docs/architecture.svg were created
with the hdldiagZero skill.
Interfaces
Video Input: AXI4-Stream Slave
| Signal | Width | Direction | Description |
|---|---|---|---|
| s_axis_vid_tdata | 16/24 | In | YUYV (16-bit) or RGB (24-bit, when RGB_INPUT=1) |
| s_axis_vid_tvalid | 1 | In | Data valid |
| s_axis_vid_tready | 1 | Out | Backpressure |
| s_axis_vid_tlast | 1 | In | End of scanline |
| s_axis_vid_tuser | 1 | In | Start of frame (first pixel) |
YUYV mode (RGB_INPUT=0, default): 16-bit words. Even-indexed words carry
{Cb, Y} and odd-indexed words carry {Cr, Y}. One word per pixel.
RGB mode (RGB_INPUT=1): 24-bit words {R[23:16], G[15:8], B[7:0]}. One
word per pixel. An internal BT.601 color converter produces YUYV for the pipeline.
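A minimal Python sketch of how a host-side test bench might pack pixels into these input words. It assumes {Cb, Y} follows Verilog concatenation order (chroma in bits [15:8], luma in bits [7:0]). The function names are hypothetical and are not the API of the repository's python/yuyv_convert.py.

```python
def pack_yuyv_row(y, cb, cr):
    """Pack one scanline into 16-bit YUYV words (RGB_INPUT=0).

    y holds one luma sample per pixel; cb and cr hold one chroma sample
    per pixel pair (4:2:2). Assumed layout: chroma in [15:8], luma in [7:0].
    """
    if len(y) % 2:
        raise ValueError("4:2:2 rows need an even number of pixels")
    if len(cb) != len(y) // 2 or len(cr) != len(y) // 2:
        raise ValueError("need one Cb and one Cr sample per pixel pair")
    words = []
    for i, luma in enumerate(y):
        # Even-indexed words carry Cb, odd-indexed words carry Cr.
        chroma = cb[i // 2] if i % 2 == 0 else cr[i // 2]
        words.append(((chroma & 0xFF) << 8) | (luma & 0xFF))
    return words


def pack_rgb(r, g, b):
    """Pack one pixel as a 24-bit {R[23:16], G[15:8], B[7:0]} word (RGB_INPUT=1)."""
    return ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)


print([hex(w) for w in pack_yuyv_row([16, 17, 18, 19], [128, 100], [129, 101])])
# ['0x8010', '0x8111', '0x6412', '0x6513']
print(hex(pack_rgb(0x12, 0x34, 0x56)))  # 0x123456
```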
JPEG Output: AXI4-Stream Master (8-bit)
| Signal | Width | Direction | Description |
|---|---|---|---|
| m_axis_jpg_tdata | 8 | Out | JPEG byte |
| m_axis_jpg_tvalid | 1 | Out | Byte valid |
| m_axis_jpg_tlast | 1 | Out | End of JPEG frame |
Output is a complete JFIF file (SOI through EOI) per frame. Byte stuffing (0xFF → 0xFF 0x00) is handled internally.
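Byte stuffing is the ITU-T T.81 rule that any 0xFF byte inside entropy-coded scan data is followed by 0x00, so a decoder cannot mistake it for a marker prefix; header segments and RSTn/EOI markers are not stuffed. The Python sketch below illustrates the rule and its inverse, for example when checking captured output. It is not the encoder's implementation, and the function names are hypothetical.

```python
def stuff_entropy_bytes(data: bytes) -> bytes:
    """Insert 0x00 after every 0xFF in entropy-coded scan data."""
    out = bytearray()
    for byte in data:
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)
    return bytes(out)


def unstuff_entropy_bytes(data: bytes) -> bytes:
    """Reverse stuff_entropy_bytes(): drop the 0x00 that follows each 0xFF."""
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i] == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            i += 2  # skip the stuffed zero byte
        else:
            i += 1
    return bytes(out)


print(stuff_entropy_bytes(b"\x12\xff\x34").hex())  # 12ff0034
```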
No backpressure. The output has no tready signal — the consumer must always
accept data when tvalid is asserted. This is safe because compression reduces
the data rate well below the input rate. If the downstream sink may stall
(e.g., shared DMA bus), place a small FIFO (256–512 bytes) between the encoder
output and the sink.
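Because the encoder cannot be stalled and an 8-bit stream carries at most one byte per clock, a FIFO of depth D that starts empty can absorb a downstream stall of up to D cycles even if the encoder emits a byte on every one of them (512 bytes is about 3.4 µs at 150 MHz). The cycle-level Python model below is a simplified sketch (pop before push in the same cycle, no read latency) for checking a proposed depth against a stall pattern. The sticky overflow flag follows the same idea as the LiteX wrapper's jpeg_overflow indicator but is not its logic.

```python
def simulate_output_fifo(valid, ready, depth):
    """Cycle-level model of a FIFO between the encoder output and a sink.

    valid[i]: the encoder emits one byte in cycle i (it cannot be stalled).
    ready[i]: the sink takes one byte in cycle i if the FIFO is not empty.
    Returns (max_occupancy, overflow). overflow is sticky: once a byte is
    lost it stays True for the rest of the run.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    occupancy = max_occupancy = 0
    overflow = False
    for push, pop in zip(valid, ready):
        if pop and occupancy > 0:
            occupancy -= 1            # simplification: pop happens before push
        if push:
            if occupancy < depth:
                occupancy += 1
            else:
                overflow = True       # no tready, so the byte is simply lost
        max_occupancy = max(max_occupancy, occupancy)
    return max_occupancy, overflow


# Worst case: the encoder bursts one byte per clock while the sink stalls 300 cycles.
valid = [True] * 1000
ready = [False] * 300 + [True] * 700
print(simulate_output_fifo(valid, ready, depth=512))  # (300, False)
print(simulate_output_fifo(valid, ready, depth=256))  # (256, True)
```

A byte on every clock is pessimistic, since the average output rate is far lower after compression, so this gives a conservative depth estimate.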
Control: AXI4-Lite Slave (32-bit)
| Offset | Name | Access | Description |
|---|---|---|---|
| 0x00 | CTRL | R/W | [0] enable, [1] soft_reset |
| 0x04 | STATUS | R/W1C | [0] busy, [1] frame_done |
| 0x08 | FRAME_CNT | RO | Completed frame count |
| 0x0C | QUALITY | R/W | JPEG quality factor (1–100, default 95) |
| 0x10 | RESTART | R/W | Restart interval in MCUs (0 = disabled) |
| 0x14 | FRAME_SIZE | RO | Byte count of last completed frame |
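A Python sketch of host-side register access built on this map. read32 and write32 are hypothetical caller-supplied accessors (for example a JTAG-to-AXI bridge or a memory-mapped window). The write order in configure(), and whether QUALITY and RESTART take effect at the next frame boundary, are assumptions not stated in this README.

```python
CTRL, STATUS, FRAME_CNT = 0x00, 0x04, 0x08
QUALITY, RESTART, FRAME_SIZE = 0x0C, 0x10, 0x14
CTRL_ENABLE, CTRL_SOFT_RESET = 1 << 0, 1 << 1
STATUS_BUSY, STATUS_FRAME_DONE = 1 << 0, 1 << 1


class MjpegZeroRegs:
    """Register helpers over caller-supplied read32(addr) / write32(addr, value)."""

    def __init__(self, read32, write32, base=0):
        self._read32 = read32
        self._write32 = write32
        self._base = base

    def configure(self, quality=95, restart_mcus=0):
        """Set quality (1-100) and restart interval, then set CTRL.enable."""
        if not 1 <= quality <= 100:
            raise ValueError("quality must be in 1..100")
        if restart_mcus < 0:
            raise ValueError("restart interval cannot be negative")
        self._write32(self._base + QUALITY, quality)
        self._write32(self._base + RESTART, restart_mcus)  # 0 disables restarts
        self._write32(self._base + CTRL, CTRL_ENABLE)

    def busy(self):
        return bool(self._read32(self._base + STATUS) & STATUS_BUSY)

    def poll_frame(self):
        """Return (frame_size_bytes, frame_count) after a frame, else None."""
        if not self._read32(self._base + STATUS) & STATUS_FRAME_DONE:
            return None
        size = self._read32(self._base + FRAME_SIZE)
        count = self._read32(self._base + FRAME_CNT)
        # STATUS is W1C: writing 1 to bit 1 clears frame_done.
        self._write32(self._base + STATUS, STATUS_FRAME_DONE)
        return size, count
```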
Parameters
| Parameter | Default | Description |
|---|---|---|
| LITE_MODE | 1 | 0 = full (1080p30, runtime quality), 1 = lite (720p60) |
| LITE_QUALITY | 95 | Synthesis-time quality 1–100, used when LITE_MODE=1 |
| IMG_WIDTH | 1280 | Input image width in pixels (multiple of 16) |
| IMG_HEIGHT | 720 | Input image height in pixels (multiple of 8) |
| EXIF_ENABLE | 0 | 1 = embed APP1/EXIF segment immediately after APP0 |
| EXIF_X_RES | 72 | EXIF XResolution numerator (DPI when EXIF_RES_UNIT=2) |
| EXIF_Y_RES | 72 | EXIF YResolution numerator |
| EXIF_RES_UNIT | 2 | EXIF ResolutionUnit: 1 = no unit, 2 = inch, 3 = cm |
| RGB_INPUT | 0 | 1 = 24-bit {R,G,B} AXI4-Stream input; 0 = 16-bit YUYV (default) |
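The width and height rules follow from the 4:2:2 MCU size of 16×8 pixels. A Python sketch of a pre-synthesis sanity check for these parameters; check_params is a hypothetical helper, and the returned block count assumes four 8×8 blocks per MCU (two Y, one Cb, one Cr).

```python
def check_params(img_width, img_height, lite_mode=1, lite_quality=95,
                 exif_res_unit=2):
    """Validate the parameter rules from the table and return MCU geometry."""
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")
    if img_width % 16:
        raise ValueError("IMG_WIDTH must be a multiple of 16 (one MCU is 16 px wide)")
    if img_height % 8:
        raise ValueError("IMG_HEIGHT must be a multiple of 8 (one MCU row is 8 lines)")
    if lite_mode not in (0, 1):
        raise ValueError("LITE_MODE must be 0 or 1")
    if lite_mode == 1 and not 1 <= lite_quality <= 100:
        raise ValueError("LITE_QUALITY must be in 1..100")
    if exif_res_unit not in (1, 2, 3):
        raise ValueError("EXIF_RES_UNIT must be 1, 2 or 3")
    mcus_per_row, mcu_rows = img_width // 16, img_height // 8
    mcus = mcus_per_row * mcu_rows
    return {"mcus_per_row": mcus_per_row, "mcu_rows": mcu_rows,
            "mcus": mcus, "blocks": 4 * mcus}


print(check_params(1920, 1080, lite_mode=0))
# {'mcus_per_row': 120, 'mcu_rows': 135, 'mcus': 16200, 'blocks': 64800}
```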
Capabilities
- Standard: Baseline JPEG (ITU-T T.81), JFIF 1.01 container
- Chroma: YUV 4:2:2 (H=2, V=1 subsampling)
- Tables: Standard Huffman tables (Annex K), standard quantization tables
- Quality: Runtime via AXI4-Lite register (1–100) in full mode; synthesis-time via LITE_QUALITY (1–100, default 95) in lite mode
- Resolution: Parameterizable; validated at 1920×1080, 1280×720, and 640×480
- Frame rate: 1080p30 (full mode), 720p60 (lite mode), both at 150 MHz
- Output: Complete JFIF files with SOI, APP0, [APP1/EXIF], DQT, SOF0, DHT, SOS, DRI/RST, EOI
- EXIF: Optional APP1/EXIF segment (EXIF_ENABLE=1) with XResolution, YResolution, and ResolutionUnit IFD0 tags
- RGB input: Optional built-in BT.601 color converter (RGB_INPUT=1) accepts 24-bit {R,G,B} and produces YUYV internally
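A floating-point Python sketch of BT.601 RGB to YCbCr conversion using the full-range JFIF coefficients, followed by packing into YUYV words. The RTL's 3-stage fixed-point converter may differ in rounding, in range (full swing vs 16–235 studio swing), and in how it decimates chroma to 4:2:2; the pair averaging below is one assumed choice, and the chroma-in-bits-[15:8] layout is also an assumption.

```python
def _clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) conversion of one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp8(y), _clamp8(cb), _clamp8(cr)


def rgb_row_to_yuyv(pixels):
    """Convert a row of (R, G, B) tuples into 16-bit {Cb,Y}/{Cr,Y} words.

    Chroma for each pixel pair is the rounded average of both pixels
    (an assumed 4:2:2 decimation; the RTL may subsample differently).
    """
    if len(pixels) % 2:
        raise ValueError("4:2:2 rows need an even number of pixels")
    words = []
    for i in range(0, len(pixels), 2):
        y0, cb0, cr0 = rgb_to_ycbcr(*pixels[i])
        y1, cb1, cr1 = rgb_to_ycbcr(*pixels[i + 1])
        cb = (cb0 + cb1 + 1) // 2
        cr = (cr0 + cr1 + 1) // 2
        words += [(cb << 8) | y0, (cr << 8) | y1]
    return words


print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128)
print([hex(w) for w in rgb_row_to_yuyv([(0, 0, 0), (255, 255, 255)])])
# ['0x8000', '0x80ff']
```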
Performance
Both modes run at 150 MHz and sustain 2,343,750 blocks/s (one 8×8 block every 64 clock cycles), with a latency of about one MCU row (8 lines).
| Metric | Full Mode | Lite Mode |
|---|---|---|
| Use case | HD capture, quality tuning | Cost-sensitive streaming |
| Target resolution | 1920×1080 (1080p30) | 1280×720 (720p60) |
| Quality | Runtime adjustable (1–100) | Synthesis-time (1–100, Q95 default) |
| Pipeline load (share of block throughput used) | 1080p30: 83% | 720p60: 74% |
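The headline throughput and the 83% / 74% figures can be reproduced with a few lines of arithmetic. This Python sketch assumes the pipeline accepts one 8×8 block every 64 clocks (which is what 2,343,750 blocks/s at 150 MHz implies) and four blocks per 4:2:2 MCU; it reports the fraction of that capacity each video mode consumes.

```python
CLOCK_HZ = 150_000_000
CYCLES_PER_BLOCK = 64   # inferred: 150 MHz / 2,343,750 blocks/s
BLOCKS_PER_MCU = 4      # 4:2:2 MCU (16x8 px): two Y blocks + one Cb + one Cr


def blocks_per_second_capacity(clock_hz=CLOCK_HZ):
    return clock_hz / CYCLES_PER_BLOCK


def pipeline_load(width, height, fps, clock_hz=CLOCK_HZ):
    """Fraction of the block throughput used by a given video mode."""
    blocks_per_frame = (width // 16) * (height // 8) * BLOCKS_PER_MCU
    return blocks_per_frame * fps / blocks_per_second_capacity(clock_hz)


print(f"{blocks_per_second_capacity():,.0f} blocks/s")  # 2,343,750 blocks/s
print(f"1080p30: {pipeline_load(1920, 1080, 30):.0%}")  # 1080p30: 83%
print(f"720p60: {pipeline_load(1280, 720, 60):.0%}")    # 720p60: 74%
```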
Compression (Mandrill test image)
| Image | Quality | Uncompressed (RGB) | JPEG Output | Ratio | Bits/pixel | PSNR vs original |
|---|---|---|---|---|---|---|
| 512×512 | Q95 | 768 KB | 211 KB | 3.6:1 | 5.29 | 42.38 dB¹ |
| 1280×720 | Q95 | 2,700 KB | 569 KB | 4.7:1 | 4.93 | 37.77 dB |
| 1280×720 | Q75 | 2,700 KB | 230 KB | 11.8:1 | 2.04 | 38.45 dB |
¹ 42.38 dB is the coefficient-level PSNR of the RTL output vs the Python reference (measures how closely the RTL matches the reference encoder, not the original image).
Hardware verification, Mandrill 1280×720, Q75 (image panels: Original | HW output | RTL sim | Diff×8).
HW and RTL simulation outputs are byte-exact. The current Arty A7-100T post-fcapz Verilog and VHDL bitstreams both pass the Mandrill 720p Q75 hardware test with identical 235,118-byte JPEG output and Y-PSNR 38.45 dB versus the original image.
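The ratio and bits-per-pixel columns follow from the byte counts. Plugging the 235,118-byte hardware result into the Python sketch below reproduces the 720p Q75 row. The psnr() helper is a generic 8-bit PSNR over two equally sized sample sequences (for Y-PSNR, pass the luma planes); it is illustrative, not the project's measurement script.

```python
import math


def compression_stats(width, height, jpeg_bytes):
    """Return (compression ratio vs 24-bit RGB, bits per pixel)."""
    raw_bytes = width * height * 3
    return raw_bytes / jpeg_bytes, jpeg_bytes * 8 / (width * height)


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


ratio, bpp = compression_stats(1280, 720, 235_118)
print(f"{ratio:.1f}:1, {bpp:.2f} bits/pixel")  # 11.8:1, 2.04 bits/pixel
```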
Resource Usage
The numbers below are for the mjpegzero_enc_top encoder core only. They do
not include board wrappers, debug bridges, ELAs, or demo readback buffers.
Reports are generated by scripts/synth/amd/run_core_synth.tcl for an
XC7A100T target at 150 MHz.
| Configuration | HDL | LUTs | FFs | BRAM tiles | DSPs | WNS |
|---|---|---|---|---|---|---|
| Core, LITE_MODE=0, 1920x1080, runtime quality | Verilog | 2,155 | 1,029 | 16 | 23 | +0.516 ns |
| Core, LITE_MODE=0, 1920x1080, runtime quality | VHDL | 2,146 | 1,031 | 16 | 23 | +0.326 ns |
| Core, LITE_MODE=1, 1280x720, Q95 | Verilog | 1,828 | 980 | 11 | 21 | +0.516 ns |
| Core, LITE_MODE=1, 1280x720, Q95 | VHDL | 1,820 | 973 | 11 | 21 | +0.326 ns |
Use python scripts/check_core_resources.py --run-synth to regenerate the
Verilog/VHDL apples-to-apples comparison.
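To put the core-only numbers in context, the Python sketch below divides them by the XC7A100T totals (63,400 LUTs, 126,800 flip-flops, 135 36-Kb BRAM tiles, 240 DSP48E1 slices, as listed in AMD's Artix-7 family overview; check the figures for your exact part).

```python
XC7A100T = {"LUTs": 63_400, "FFs": 126_800, "BRAM tiles": 135, "DSPs": 240}

# Core-only figures from the resource table (LITE_MODE=0, 1920x1080, Verilog).
CORE_FULL = {"LUTs": 2_155, "FFs": 1_029, "BRAM tiles": 16, "DSPs": 23}


def utilization(used, device):
    """Percentage of each device resource consumed."""
    return {name: 100.0 * used[name] / device[name] for name in used}


for name, pct in utilization(CORE_FULL, XC7A100T).items():
    print(f"{name:>10}: {pct:4.1f}%")
```

For the full-mode Verilog build this works out to roughly 3.4% of LUTs, 0.8% of FFs, 11.9% of BRAM tiles and 9.6% of DSPs, which suggests block RAM and DSPs, rather than logic, are the resources to watch when porting to smaller parts.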
Pipeline Modules
| Module | Verilog Source | VHDL Source | Description |
|---|---|---|---|
| RGB/YUYV Converter | rtl/rgb_to_ycbcr.v | rtl/vhdl/rgb_to_ycbcr.vhd | Optional BT.601 3-stage pipeline; enabled by RGB_INPUT=1 |
| Input Buffer | rtl/input_buffer.v | rtl/vhdl/input_buffer.vhd | YUYV de-interleave, 8-line BRAM buffer, MCU-order output |
| 1D DCT | rtl/dct_1d.v | rtl/vhdl/dct_1d.vhd | 8-point forward DCT, matrix multiply with 13-bit cosine ROM |
| 2D DCT | rtl/dct_2d.v | rtl/vhdl/dct_2d.vhd | Row-column decomposition with transpose buffer |
| Quantizer | rtl/quantizer.v | rtl/vhdl/quantizer.vhd | Multiply-by-reciprocal, 4-stage pipeline |
| Zigzag Reorder | rtl/zigzag_reorder.v | rtl/vhdl/zigzag_reorder.vhd | ROM-based address remap, double-buffered |
| Huffman Encoder | rtl/huffman_encoder.v | rtl/vhdl/huffman_encoder.vhd | Multi-cycle FSM, full DC/AC standard tables |
| Bitstream Packer | rtl/bitstream_packer.v | rtl/vhdl/bitstream_packer.vhd | 64-bit accumulator, byte stuffing |
| JFIF Writer | rtl/jfif_writer.v | rtl/vhdl/jfif_writer.vhd | Header ROM, SOI/APP0/[APP1-EXIF]/DQT/EOI state machine |
| AXI4-Lite Regs | rtl/axi4_lite_regs.v | rtl/vhdl/axi4_lite_regs.vhd | Control/status register file |
| SDP BRAM | rtl/bram_sdp.v | rtl/vhdl/bram_sdp.vhd | Vendor-neutral behavioral simple dual-port RAM |
| Top-Level | rtl/mjpegzero_enc_top.v | rtl/vhdl/mjpegzero_enc_top.vhd | Pipeline integration and frame control |
| Timing Wrapper | rtl/synth_timing_wrapper.v | rtl/vhdl/synth_timing_wrapper.vhd | I/O flip-flops for synthesis timing analysis |
The encoder is maintained in behavioral Verilog 2001 and native VHDL-1993. The encoder core has no vendor-specific RTL primitives; the simple dual-port RAM wrappers are behavioral in both Verilog and VHDL.
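The Quantizer and Zigzag Reorder stages can be illustrated in software. The Python sketch below assumes the IJG (libjpeg) quality-scaling convention for deriving a quality-specific table from the Annex K luminance table, and a hypothetical 16-bit reciprocal for the multiply-by-reciprocal divide. The RTL's actual reciprocal width and rounding are not documented in this README, so treat results as approximate.

```python
# ITU-T T.81 Annex K, Table K.1 (luminance), raster order.
STD_LUMA_Q = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]
RECIP_BITS = 16  # hypothetical fixed-point precision of the reciprocal


def scale_table(base, quality):
    """IJG-style quality scaling (assumed convention), clamped to 1..255."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in 1..100")
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(255, max(1, (q * scale + 50) // 100)) for q in base]


def quantize(coef, q):
    """Round-to-nearest coef / q computed as a multiply by round(2^16 / q)."""
    recip = ((1 << RECIP_BITS) + q // 2) // q
    mag = (abs(coef) * recip + (1 << (RECIP_BITS - 1))) >> RECIP_BITS
    return -mag if coef < 0 else mag


def zigzag_order(n=8):
    """Raster indices of an n x n block in JPEG zigzag scan order."""
    order = []
    for s in range(2 * n - 1):             # s = row + col (anti-diagonal)
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                     # even diagonals run upward
            rows = reversed(rows)
        order.extend(r * n + (s - r) for r in rows)
    return order


def quantize_and_zigzag(block, table):
    """Quantize a raster-order 8x8 block, then emit it in zigzag order."""
    quantized = [quantize(c, q) for c, q in zip(block, table)]
    return [quantized[i] for i in zigzag_order()]


print(zigzag_order()[:10])               # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
print(scale_table(STD_LUMA_Q, 95)[:4])   # [2, 1, 1, 2]
```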
Quick Start
Prerequisites
- AMD/Xilinx Vivado 2020.2+ (tested with 2025.2)
- Python 3.8+ with NumPy, SciPy, Pillow (for reference encoder)
- FFmpeg (for validation)
pip install -r python/requirements.txt
Verification
The verification suite is split into three tiers. The first two tiers require only Python and iverilog — they are what GitHub Actions CI runs on every push. The third tier requires Vivado and is for local full-frame validation.
Tier 1: Python only (no simulator, no Vivado)
# Huffman ROM tables match ITU-T T.81 Annex K
python python/verify_huffman_rom.py
# LITE_QUALITY quantisation & reciprocal tables match Python reference
python python/verify_lite_quality.py
# Python reference encoder: encode 720p mandrill, decode, report PSNR
python python/test_encoder.py
# Visual quality check: side-by-side Original | JPEG decoded | Difference×8
python python/mandrill_compare.py --quality 95
python python/mandrill_compare.py --quality 75 --out compare_q75.png
Tier 2: RTL simulation with iverilog (CI path)
Compiles all RTL with iverilog, runs the CI testbench, and compares output JPEG coefficients block-by-block against Python reference files for Q=50, 75, 95. Pass criterion: max coefficient difference ≤ 1 (fixed-point rounding tolerance).
# Full mode (LITE_MODE=0, runtime quality via AXI4-Lite)
python python/verify_rtl_sim.py
# Lite mode (LITE_MODE=1, synthesis-time quality tables)
python python/verify_rtl_sim.py --lite
# With VCD dump
python python/verify_rtl_sim.py --dump-vcd
# RGB_INPUT=1 functional test (24-bit RGB through built-in color converter)
python python/verify_rtl_sim.py --rgb
python python/verify_rtl_sim.py --lite --rgb
# Random input backpressure gaps (tests input_buffer gap handling)
python python/verify_rtl_sim.py --gaps
# Minimum-width 16×8 frame (1 MCU — corner case for MCU column counter)
python python/verify_rtl_sim.py --min-width
# EXIF APP1 segment validation (full mode, 72 DPI default)
python python/verify_exif.py
python python/verify_exif.py --lite --x-res 96 --y-res 96 --res-unit 2
# AXI4-Lite register coverage (2-frame encode, reads back QUALITY/FRAME_CNT/FRAME_SIZE/STATUS)
python python/verify_axi_regs.py
python python/verify_axi_regs.py --lite
Requires: iverilog / vvp on PATH, Python ≥ 3.8 with NumPy.
RTL simulation uses the same behavioral rtl/bram_sdp.v as synthesis.
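The Tier 2 pass criterion (maximum coefficient difference ≤ 1) amounts to the check below. This Python sketch assumes both sides have already been decoded into lists of 64-coefficient blocks in the same order; extracting them from the encoder output is out of scope here, and the function names are hypothetical.

```python
def max_coeff_diff(blocks_a, blocks_b):
    """Largest |a - b| over two equally long lists of 64-coefficient blocks."""
    if len(blocks_a) != len(blocks_b):
        raise ValueError(f"block count differs: {len(blocks_a)} vs {len(blocks_b)}")
    worst = 0
    for index, (a, b) in enumerate(zip(blocks_a, blocks_b)):
        if len(a) != 64 or len(b) != 64:
            raise ValueError(f"block {index} does not hold 64 coefficients")
        worst = max(worst, max(abs(x - y) for x, y in zip(a, b)))
    return worst


def passes(blocks_rtl, blocks_ref, tolerance=1):
    """Tier 2 style pass/fail: every coefficient within +/- tolerance."""
    return max_coeff_diff(blocks_rtl, blocks_ref) <= tolerance


reference = [[0] * 64, [5] * 64]
rtl = [[0] * 64, [5] * 63 + [6]]      # one coefficient off by one (rounding)
print(max_coeff_diff(rtl, reference), passes(rtl, reference))  # 1 True
```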
Verilator code coverage (optional, requires Verilator ≥ 4.2)
Compiles the RTL with --coverage, runs six scenarios designed to hit all major
code paths (Q=50/75/95, flat-gray image for DC/EOB paths, checkerboard image for
ZRL paths, and an EXIF_ENABLE=1 build for EXIF state coverage), merges the
coverage data, and generates an LCOV report.
# Full mode — Q=50/75/95 + flat + checkerboard + EXIF run
python python/run_coverage.py
# Lite mode
python python/run_coverage.py --lite
# With HTML report (requires lcov/genhtml)
python python/run_coverage.py --html
# Custom quality set
python python/run_coverage.py --qualities 75,95
Coverage data is written to build/coverage/. The LCOV info file is
build/coverage/coverage.info, and the HTML report (with --html) is
build/coverage/html/index.html.
Tier 3: Full 720p Vivado simulation (local only, requires Vivado)
python scripts/run_sim.py 720p # no waveforms
python scripts/run_sim.py 720p vcd # + VCD dump → build/sim/tb_mjpegzero_enc.vcd
python scripts/run_sim.py lite vcd # lite mode with VCD
Output JPEG is written to build/sim/sim_output.jpg. Verified PSNR vs original (720p, Q95): 37.77 dB.
FuseSoC
The core is described in mjpegzero.core (CAPI2 format).
# Add core to local library
fusesoc library add mjpegzero .
# Run simulation (icarus, full mode)
fusesoc run --target sim bard0-design:mjpegzero:mjpegzero_enc
# Run simulation (lite mode)
fusesoc run --target sim_lite bard0-design:mjpegzero:mjpegzero_enc
# Lint with Verilator
fusesoc run --target lint bard0-design:mjpegzero:mjpegzero_enc
# Synthesize for AMD/Xilinx Arty A7-100T
fusesoc run --target synth_amd bard0-design:mjpegzero:mjpegzero_enc
# Override parameters
fusesoc run --target sim bard0-design:mjpegzero:mjpegzero_enc \
  --LITE_MODE 0 --IMG_WIDTH 1920 --IMG_HEIGHT 1080
Available targets: sim, sim_lite, lint, synth_amd, synth_amd_lite.
To use mjpegZero as a dependency in your own FuseSoC project, add to your .core file:
depend:
  - bard0-design:mjpegzero:mjpegzero_enc:0.1.0
LiteX Integration
A project-local LiteX wrapper is provided in
integrations/litex/mjpegzero.py. It adds
the Verilog sources to a LiteX platform, instantiates mjpegzero_enc_top,
exposes a LiteX video stream sink, exposes a JPEG byte stream source, and keeps
the core register file on AXI-Lite.
from integrations.litex.mjpegzero import MjpegZero, MjpegZeroConfig

# `platform` is the LiteX platform object of your target (not defined here)
encoder = MjpegZero(
    platform,
    config=MjpegZeroConfig(
        lite_mode=1,
        lite_quality=75,
        img_width=1280,
        img_height=720,
        rgb_input=0,
    ),
    vendor="xilinx7",
    jpeg_fifo_depth=512,
)
# encoder.video_sink: data/valid/ready/last/user input stream
# encoder.jpeg_source: data/valid/ready/last JPEG byte stream
# encoder.axi_lite: AXI-Lite control/status register bus
The encoder's native JPEG output has no tready. The LiteX wrapper therefore
inserts an optional stream FIFO and exposes jpeg_overflow as a sticky
indicator if the downstream consumer stalls longer than the FIFO can absorb.
Run Synthesis
# Using the master runner (recommended):
python scripts/run_all.py synth # Full mode, AMD/Xilinx (default)
python scripts/run_all.py synth --vendor amd
python scripts/run_all.py impl --vendor amd
# Direct Vivado invocation:
# Full mode (1920×1080, 150 MHz, runtime quality)
vivado -mode batch -source scripts/synth/amd/run_synth.tcl
# Lite mode (1280×720, 150 MHz, default Q95)
vivado -mode batch -source scripts/synth/amd/run_synth.tcl -tclargs lite
# Lite mode with custom quality (e.g., Q80)
vivado -mode batch -source scripts/synth/amd/run_synth.tcl -tclargs lite 80
# Native VHDL-1993 encoder synthesis
vivado -mode batch -source scripts/synth/amd/run_synth_vhdl.tcl
vivado -mode batch -source scripts/synth/amd/run_synth_vhdl.tcl -tclargs lite 80
# Core-only Verilog/VHDL resource comparison
vivado -mode batch -source scripts/synth/amd/run_core_synth.tcl -tclargs verilog
vivado -mode batch -source scripts/synth/amd/run_core_synth.tcl -tclargs vhdl
python scripts/check_core_resources.py
Reports are written to build/synth/ or build/synth_lite/.
AMD/Vivado and Altera/Quartus scripts are fully implemented.
Synthesis scripts for Lattice Radiant, Microchip Libero, Efinix Efinity, and GoWin EDA
are scaffolded in scripts/synth/<vendor>/, but their tool-specific Tcl flows are not yet implemented.
Contributions are welcome; see CONTRIBUTING.md.
Run Implementation (Place & Route)
python scripts/run_all.py impl
Reports are written to build/impl/.
Utility Scripts
| Script | Purpose |
|---|---|
| python/mandrill_compare.py | Encode/decode the mandrill image and produce a side-by-side PNG: Original \| JPEG decoded \| Difference×8. |
| python/compare_jpeg_scan.py | Block-by-block DCT coefficient comparison between two JPEG files. |
| python/verify_exif.py | RTL simulation test for the APP1/EXIF segment; validates all IFD0 fields byte-by-byte. |
| python/verify_axi_regs.py | AXI4-Lite register coverage test: QUALITY, FRAME_CNT, FRAME_SIZE, STATUS W1C, RESTART (2-frame encode). |
| python/run_coverage.py | Verilator --coverage driver: compiles RTL, runs Q=50/75/95 + flat/checker/EXIF scenarios, merges .dat files, produces LCOV report. |
| python/generate_test_vectors.py | Generates all simulation test vectors including yuyv_input.hex, yuyv_flat.hex (DC/EOB coverage), and yuyv_checker.hex (ZRL coverage). |
| python/gen_huffman_rom.py | Regenerate the Huffman ROM initial block in rtl/huffman_encoder.v from the standard BITS/VALS arrays. |
| python/gen_lite_tables.py | Regenerate the LITE_QUALITY quantisation table initial blocks in rtl/quantizer.v. |
| python/yuyv_convert.py | Shared RGB-to-YUYV conversion for RTL simulation and hardware tests. |
| scripts/hw_test_mandrill.py | End-to-end hardware verification through fcapz: converts mandrill 720p, runs RTL sim + HW encode, compares outputs. |
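gen_huffman_rom.py starts from the standard BITS/VALS arrays. The Python sketch below shows the canonical code assignment from ITU-T T.81 Annex C that turns those arrays into codes, applied to the luminance DC table (Table K.3). It illustrates the algorithm only; it is not the script itself and does not show the ROM layout the script writes.

```python
# ITU-T T.81 Table K.3: luminance DC code-length counts (BITS) and symbols (VALS).
DC_LUMA_BITS = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
DC_LUMA_VALS = list(range(12))


def canonical_codes(bits, vals):
    """Assign canonical Huffman codes: returns {symbol: (code, length)}.

    bits[i] is the number of codes of length i + 1. Codes of the same
    length are consecutive integers; moving to the next length appends a 0
    bit (left shift), as in the Annex C code-generation procedure.
    """
    if len(bits) != 16 or sum(bits) != len(vals):
        raise ValueError("BITS needs 16 entries that sum to len(VALS)")
    codes = {}
    code = 0
    k = 0
    for length, count in enumerate(bits, start=1):
        for _ in range(count):
            codes[vals[k]] = (code, length)
            code += 1
            k += 1
        code <<= 1
    return codes


for symbol, (code, length) in canonical_codes(DC_LUMA_BITS, DC_LUMA_VALS).items():
    print(symbol, format(code, f"0{length}b"))
# prints "0 00", "1 010", "2 011", ..., "11 111111110", one per line
```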
Integration Example
mjpegzero_enc_top #(
    .IMG_WIDTH     (1920),
    .IMG_HEIGHT    (1080),
    .LITE_MODE     (0),     // 1 = fixed quality, 720p, smaller LUT footprint
    .LITE_QUALITY  (95),    // Synthesis-time quality (1-100), lite mode only
    // Optional: EXIF APP1 segment
    .EXIF_ENABLE   (1),     // 0 = no EXIF (default)
    .EXIF_X_RES    (72),    // XResolution numerator (DPI)
    .EXIF_Y_RES    (72),    // YResolution numerator
    .EXIF_RES_UNIT (2),     // 2 = inch
    // Optional: RGB input path (set to 0 for standard YUYV input)
    .RGB_INPUT     (0)      // 1 = 24-bit {R,G,B} AXI4-Stream input
) u_mjpeg (
    .clk               (pixel_clk),     // 150 MHz
    .rst_n             (sys_rst_n),
    // Connect to video source (camera, framebuffer, etc.)
    .s_axis_vid_tdata  (video_tdata),   // 16-bit YUYV
    .s_axis_vid_tvalid (video_tvalid),
    .s_axis_vid_tready (video_tready),
    .s_axis_vid_tlast  (video_tlast),   // End of line
    .s_axis_vid_tuser  (video_tuser),   // Start of frame
    // Connect to DMA or output FIFO (no backpressure: always accept)
    .m_axis_jpg_tdata  (jpeg_tdata),    // 8-bit JPEG bytes
    .m_axis_jpg_tvalid (jpeg_tvalid),
    .m_axis_jpg_tlast  (jpeg_tlast),    // End of JPEG frame
    // Connect to AXI interconnect or tie off
    .s_axi_awaddr      (axi_awaddr),
    .s_axi_awvalid     (axi_awvalid),
    .s_axi_awready     (axi_awready),
    .s_axi_wdata       (axi_wdata),
    .s_axi_wstrb       (axi_wstrb),
    .s_axi_wvalid      (axi_wvalid),
    .s_axi_wready      (axi_wready),
    .s_axi_bresp       (axi_bresp),
    .s_axi_bvalid      (axi_bvalid),
    .s_axi_bready      (axi_bready),
    .s_axi_araddr      (axi_araddr),
    .s_axi_arvalid     (axi_arvalid),
    .s_axi_arready     (axi_arready),
    .s_axi_rdata       (axi_rdata),
    .s_axi_rresp       (axi_rresp),
    .s_axi_rvalid      (axi_rvalid),
    .s_axi_rready      (axi_rready)
);
Tested Hardware
| Board | Part | Example project | Status |
|---|---|---|---|
| Digilent Arty A7-100T | XC7A100TCSG324-1 | example_proj/arty_a7_100t/ | Verilog and VHDL post-fcapz bitstreams close timing and pass Mandrill 720p HW test byte-exact |
| Digilent Arty S7-50 | XC7S50CSGA324-1 | example_proj/arty_s7_50/ | Build scaffolded; rebuild + HW verification pending |
Any AMD/Xilinx 7-Series device is a straightforward port — swap the XDC and adjust JPEG_WORDS
for available BRAM.
Applications
- Drone / UAV cameras — lightweight MJPEG stream over a low-bandwidth radio link
- IP security cameras — per-frame JPEG over Ethernet, no inter-frame dependency
- Machine vision — on-FPGA compression before USB/GigE transfer to host
- Medical imaging — lossless-adjacent quality (Q95+) with intra-frame-only coding
- Automotive — dashcam and surround-view recording with frame-accurate random access
- Industrial inspection — compress high-speed line-scan data in real time
- Broadcast contribution — MJPEG-over-RTP for low-latency studio feeds
- Frame grabbers — capture and compress SDI/HDMI input on an FPGA capture card
Directory Structure
mjpegZero/
  rtl/                Synthesizable Verilog 2001 source
    vhdl/             Native VHDL-1993 encoder sources
    vendor/           Board-specific BRAM wrappers (AMD, Altera, Lattice, ...)
  sim/                SystemVerilog testbench and test vectors
  python/             Reference encoder, verification, test vector generation
  scripts/            Vivado TCL scripts and Python runner
  example_proj/       Ready-to-build board examples
    common/           Shared demo top-level + Python host (used by every board)
    arty_a7_100t/     Digilent Arty A7-100T (verified reference)
    arty_s7_50/       Digilent Arty S7-50 (rebuild + HW test pending)
  fcapz/              Git submodule: fpgacapZero EJTAG-AXI bridge + ELA + host
  build/              Synthesis/implementation output (generated)
Contributing
Contributions are welcome. See CONTRIBUTING.md for details.
The most impactful contributions are board-level examples that show the encoder
running on hardware beyond the reference Arty A7-100T. All examples live under
example_proj/<board_name>/. New examples for Nexys Video,
ZedBoard, DE10-Nano, iCEBreaker, and others are welcome.
License
Apache License 2.0 + Commons Clause v1.0. See LICENSE for full terms.
Non-commercial use (research, education, hobby projects, open-source) is freely permitted under the Apache 2.0 terms.
Commercial use (integration into commercial products, services, or consulting engagements) requires written permission from the author. Contact: hello@bard0.com