
lcapossio/mjpegZero

mjpegZero — FPGA Hardware Motion JPEG Encoder


Author: Leonardo Capossio - bard0 design - hello@bard0.com

Synthesizable MJPEG encoder written in behavioral Verilog 2001, with a native VHDL-1993 port under rtl/vhdl/, AXI interfaces, and up to 1080p30 on low-end AMD/Xilinx 7-Series FPGAs. Two operating modes: Full encodes with runtime quality control; Lite encodes with fixed synthesis-time quality and a smaller footprint (about 15% fewer core LUTs, plus fewer BRAM tiles and DSPs, per the Resource Usage table).

A Python reference encoder is included for validation and test vector generation.


Architecture

[Figure: mjpegZero encoder top-level architecture]

The editable diagram spec docs/architecture.json and rendered SVG docs/architecture.svg were created with the hdldiagZero skill.

Interfaces

Video Input: AXI4-Stream Slave

| Signal | Width | Direction | Description |
|---|---|---|---|
| s_axis_vid_tdata | 16/24 | In | YUYV (16-bit) or RGB (24-bit, when RGB_INPUT=1) |
| s_axis_vid_tvalid | 1 | In | Data valid |
| s_axis_vid_tready | 1 | Out | Backpressure |
| s_axis_vid_tlast | 1 | In | End of scanline |
| s_axis_vid_tuser | 1 | In | Start of frame (first pixel) |

YUYV mode (RGB_INPUT=0, default): 16-bit words. Even-indexed words carry {Cb, Y}, odd-indexed carry {Cr, Y}. One word per pixel.
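For testbench or host-side stimulus, the ordering described above can be sketched in Python. This is a minimal illustration, not the project's python/yuyv_convert.py: the function name and the separate Y/Cb/Cr plane layout are assumptions, and {Cb, Y} is read as a Verilog concatenation, so Cb lands in tdata[15:8] and Y in tdata[7:0]. Because IMG_WIDTH is a multiple of 16, word parity is the same whether counted per line or per frame.

```python
from typing import Iterator, List, Tuple

Beat = Tuple[int, int, int]  # (tdata, tuser, tlast)


def yuyv_beats(y: List[List[int]], cb: List[List[int]],
               cr: List[List[int]]) -> Iterator[Beat]:
    """Yield one 16-bit AXI4-Stream beat per pixel for a YUYV frame.

    y is height x width; cb and cr are height x (width // 2), one chroma
    sample per horizontal pixel pair (4:2:2).
    Assumption: {Cb, Y} places Cb in tdata[15:8] and Y in tdata[7:0].
    """
    height, width = len(y), len(y[0])
    for row in range(height):
        for col in range(width):
            # Even words carry Cb, odd words carry Cr for the same pixel pair.
            chroma = cb[row][col // 2] if col % 2 == 0 else cr[row][col // 2]
            tdata = (chroma << 8) | y[row][col]
            tuser = 1 if (row == 0 and col == 0) else 0  # start of frame
            tlast = 1 if col == width - 1 else 0         # end of scanline
            yield tdata, tuser, tlast
```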

RGB mode (RGB_INPUT=1): 24-bit words {R[23:16], G[15:8], B[7:0]}. One word per pixel. An internal BT.601 color converter produces YUYV for the pipeline.
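The 24-bit word layout and a floating-point BT.601 conversion can be sketched as follows. The full-range JFIF coefficients are an assumption about what the RTL converter targets; its fixed-point rounding may differ by an LSB, and how it decimates chroma to 4:2:2 is not described here. Both function names are illustrative.

```python
from typing import Tuple


def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into the 24-bit {R[23:16], G[15:8], B[7:0]} word."""
    return ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)


def clamp8(v: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(v))))


def rgb_to_ycbcr_bt601_full(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """BT.601 RGB -> YCbCr with full-range (0-255) JFIF offsets.

    Assumption: the RTL targets full-range output as JFIF specifies.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)
```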

JPEG Output: AXI4-Stream Master (8-bit)

| Signal | Width | Direction | Description |
|---|---|---|---|
| m_axis_jpg_tdata | 8 | Out | JPEG byte |
| m_axis_jpg_tvalid | 1 | Out | Byte valid |
| m_axis_jpg_tlast | 1 | Out | End of JPEG frame |

Output is a complete JFIF file (SOI through EOI) per frame. Byte stuffing (0xFF → 0xFF 0x00) is handled internally.
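Byte stuffing and the marker layout of each output frame can be checked in software. The sketch below stuffs entropy-coded bytes as ITU-T T.81 requires and walks a finished file's markers, which is handy for confirming the SOI, APP0, [APP1], DQT, SOF0, DHT, SOS, RSTn, EOI sequence listed under Capabilities. Function names are illustrative, and only baseline files are handled.

```python
from typing import List

# Names for the markers this README lists in the encoder output.
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT", 0xC0: "SOF0",
    0xC4: "DHT", 0xDA: "SOS", 0xDD: "DRI", 0xD9: "EOI",
}


def stuff_bytes(entropy: bytes) -> bytes:
    """Insert 0x00 after every 0xFF in entropy-coded data (T.81 byte stuffing)."""
    return entropy.replace(b"\xff", b"\xff\x00")


def list_markers(jpeg: bytes) -> List[str]:
    """Return the marker sequence of a baseline JPEG file, SOI through EOI."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI")
    names = ["SOI"]
    pos = 2
    while pos < len(jpeg) - 1:
        if jpeg[pos] != 0xFF:
            pos += 1  # entropy-coded byte inside a scan
            continue
        code = jpeg[pos + 1]
        if code in (0x00, 0xFF):
            pos += 1 if code == 0xFF else 2  # fill byte, or stuffed 0xFF 0x00
            continue
        if 0xD0 <= code <= 0xD7:  # RSTn markers carry no length field
            names.append("RST%d" % (code - 0xD0))
            pos += 2
            continue
        names.append(MARKER_NAMES.get(code, "0x%02X" % code))
        if code == 0xD9:  # EOI ends the file
            break
        # Segment length is big-endian and includes its own two bytes.
        length = (jpeg[pos + 2] << 8) | jpeg[pos + 3]
        pos += 2 + length
    return names
```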

No backpressure. The output has no tready signal, so the consumer must accept a byte on every cycle in which tvalid is asserted. This is practical because compression keeps the average output rate far below the input rate. If the downstream sink may stall (e.g., on a shared DMA bus), place a small FIFO (256–512 bytes) between the encoder output and the sink.
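Rough numbers make the no-backpressure trade-off concrete. Using figures from this README (a 235,118-byte 720p Q75 frame, 60 fps, a 150 MHz clock), the average output rate is below one byte per ten cycles. For FIFO sizing, the sketch conservatively assumes the encoder can emit a byte on every clock during a burst, since the output is 8 bits wide; both helper names are hypothetical.

```python
import math


def avg_output_bytes_per_cycle(frame_bytes: int, fps: float, clk_hz: float) -> float:
    """Average fraction of clock cycles on which the 8-bit output carries a byte."""
    return frame_bytes * fps / clk_hz


def min_fifo_depth(stall_cycles: int, burst_bytes_per_cycle: float = 1.0) -> int:
    """Bytes that can arrive while the sink is stalled.

    Conservative default: during a burst the encoder may emit one byte per
    clock (the interface maximum), so the FIFO needs one entry per stalled cycle.
    """
    return math.ceil(stall_cycles * burst_bytes_per_cycle)
```

With those figures the average is about 0.094 bytes per cycle, and a 512-byte FIFO absorbs a stall of at least 512 cycles even in the one-byte-per-clock worst case.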

Control: AXI4-Lite Slave (32-bit)

| Offset | Name | Access | Description |
|---|---|---|---|
| 0x00 | CTRL | R/W | [0] enable, [1] soft_reset |
| 0x04 | STATUS | R/W1C | [0] busy, [1] frame_done |
| 0x08 | FRAME_CNT | RO | Completed frame count |
| 0x0C | QUALITY | R/W | JPEG quality factor (1–100, default 95) |
| 0x10 | RESTART | R/W | Restart interval in MCUs (0 = disabled) |
| 0x14 | FRAME_SIZE | RO | Byte count of last completed frame |
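A host-side driver for this register map might look like the sketch below. The Bus interface with read32/write32 is hypothetical (it could wrap a debug bridge or a memory-mapped window), and the write order (QUALITY and RESTART before enable) is an assumption rather than a documented requirement. STATUS.frame_done is write-1-to-clear, so the driver clears it by writing a 1 to that bit.

```python
import time
from typing import Protocol

# Register offsets and bit masks from the table above.
CTRL, STATUS, FRAME_CNT, QUALITY, RESTART, FRAME_SIZE = 0x00, 0x04, 0x08, 0x0C, 0x10, 0x14
CTRL_ENABLE, CTRL_SOFT_RESET = 1 << 0, 1 << 1
STATUS_BUSY, STATUS_FRAME_DONE = 1 << 0, 1 << 1


class Bus(Protocol):
    """Hypothetical 32-bit register access (debug bridge, mmap wrapper, ...)."""

    def read32(self, offset: int) -> int: ...

    def write32(self, offset: int, value: int) -> None: ...


def start_encoder(bus: Bus, quality: int = 95, restart_mcus: int = 0) -> None:
    """Program quality and restart interval, then set CTRL.enable."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be 1-100")
    bus.write32(QUALITY, quality)       # runtime quality (full mode)
    bus.write32(RESTART, restart_mcus)  # 0 disables restart markers
    bus.write32(CTRL, CTRL_ENABLE)


def wait_frame(bus: Bus, timeout_s: float = 1.0) -> int:
    """Poll STATUS.frame_done, read FRAME_SIZE, then clear frame_done (W1C)."""
    deadline = time.monotonic() + timeout_s
    while not bus.read32(STATUS) & STATUS_FRAME_DONE:
        if time.monotonic() > deadline:
            raise TimeoutError("frame_done not set")
        time.sleep(0.001)
    size = bus.read32(FRAME_SIZE)
    bus.write32(STATUS, STATUS_FRAME_DONE)  # writing 1 clears the bit
    return size
```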

Parameters

| Parameter | Default | Description |
|---|---|---|
| LITE_MODE | 1 | 0 = full (1080p30, runtime quality), 1 = lite (720p60) |
| LITE_QUALITY | 95 | Synthesis-time quality 1–100, used when LITE_MODE=1 |
| IMG_WIDTH | 1280 | Input image width in pixels (multiple of 16) |
| IMG_HEIGHT | 720 | Input image height in pixels (multiple of 8) |
| EXIF_ENABLE | 0 | 1 = embed APP1/EXIF segment immediately after APP0 |
| EXIF_X_RES | 72 | EXIF XResolution numerator (DPI when EXIF_RES_UNIT=2) |
| EXIF_Y_RES | 72 | EXIF YResolution numerator |
| EXIF_RES_UNIT | 2 | EXIF ResolutionUnit: 1 = no unit, 2 = inch, 3 = cm |
| RGB_INPUT | 0 | 1 = 24-bit {R,G,B} AXI4-Stream input; 0 = 16-bit YUYV (default) |
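The geometry constraints in the table follow from the 4:2:2 MCU, which spans 16×8 pixels (two Y blocks, one Cb, one Cr). A small pre-synthesis check could look like this; check_params is a hypothetical helper, and the value ranges are taken from the table above.

```python
from typing import Dict


def check_params(img_width: int, img_height: int, lite_mode: int = 1,
                 lite_quality: int = 95, exif_res_unit: int = 2) -> Dict[str, int]:
    """Validate parameters against the table and derive MCU/block counts."""
    if img_width % 16 or img_height % 8:
        raise ValueError("IMG_WIDTH must be a multiple of 16 and IMG_HEIGHT of 8")
    if lite_mode not in (0, 1):
        raise ValueError("LITE_MODE must be 0 or 1")
    if lite_mode == 1 and not 1 <= lite_quality <= 100:
        raise ValueError("LITE_QUALITY must be 1-100")
    if exif_res_unit not in (1, 2, 3):
        raise ValueError("EXIF_RES_UNIT must be 1, 2 or 3")
    mcus_per_row = img_width // 16  # each 4:2:2 MCU is 16 pixels wide
    mcu_rows = img_height // 8      # and 8 lines tall
    mcus = mcus_per_row * mcu_rows
    return {"mcus_per_row": mcus_per_row, "mcu_rows": mcu_rows,
            "mcus": mcus, "blocks": 4 * mcus}  # Y0, Y1, Cb, Cr per MCU
```

For example, 1280×720 gives 80 MCUs per row, so writing 80 to RESTART would place one restart marker per MCU row.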

Capabilities

  • Standard: Baseline JPEG (ITU-T T.81), JFIF 1.01 container
  • Chroma: YUV 4:2:2 (H=2, V=1 subsampling)
  • Tables: Standard Huffman tables (Annex K), standard quantization tables
  • Quality: Runtime via AXI4-Lite register (1–100) in full mode; synthesis-time via LITE_QUALITY (1–100, default 95) in lite mode
  • Resolution: Parameterizable; validated at 1920×1080, 1280×720, and 640×480
  • Frame rate: 1080p30 (full mode), 720p60 (lite mode), both at 150 MHz
  • Output: Complete JFIF files with SOI, APP0, [APP1/EXIF], DQT, SOF0, DHT, SOS, DRI/RST, EOI
  • EXIF: Optional APP1/EXIF segment (EXIF_ENABLE=1) with XResolution, YResolution, ResolutionUnit IFD0 tags
  • RGB input: Optional built-in BT.601 color converter (RGB_INPUT=1) accepts 24-bit {R,G,B} and produces YUYV internally
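A quality factor is commonly mapped onto the Annex K base tables with the IJG (libjpeg) scaling rule sketched below. That this core uses exactly this rule is an assumption; python/verify_lite_quality.py is the project's own check of its tables against the Python reference. The table is shown in natural row-major order, whereas DQT segments store entries in zigzag order.

```python
from typing import List

# ITU-T T.81 Annex K, Table K.1 luminance quantization table (row-major).
LUMA_QT = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scale_table(base: List[int], quality: int) -> List[int]:
    """Scale a base table with the IJG quality convention (assumed here)."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be 1-100")
    # Below 50 the table grows hyperbolically; above 50 it shrinks linearly.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Round to nearest and clamp to the 8-bit baseline range 1..255.
    return [min(255, max(1, (q * scale + 50) // 100)) for q in base]
```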

Performance

Both modes run at 150 MHz and process one 8×8 block every 64 clock cycles (2,343,750 blocks/s), with a latency of about one MCU row (8 lines).

| Metric | Full mode | Lite mode |
|---|---|---|
| Use case | HD capture, quality tuning | Cost-sensitive streaming |
| Target resolution | 1920×1080 (1080p30) | 1280×720 (720p60) |
| Quality | Runtime adjustable (1–100) | Synthesis-time (1–100, Q95 default) |
| Pipeline utilization at target | 83% at 1080p30 | 74% at 720p60 |
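The block rate and the 83% and 74% figures follow from simple arithmetic: 150 MHz divided by 64 cycles per block gives 2,343,750 blocks/s, and a 4:2:2 frame carries two 8×8 blocks per 64 pixels (one Y block plus half a Cb and half a Cr block). The sketch reproduces the table figures as the fraction of block capacity each target stream consumes; the function name is illustrative.

```python
CLK_HZ = 150_000_000
CYCLES_PER_BLOCK = 64  # 150 MHz / 64 = 2,343,750 blocks/s


def pipeline_utilization(width: int, height: int, fps: float) -> float:
    """Fraction of the encoder's block throughput used by a 4:2:2 stream."""
    blocks_per_frame = (width * height // 64) * 2  # 2 blocks per 64 pixels
    return blocks_per_frame * fps / (CLK_HZ / CYCLES_PER_BLOCK)
```

The unused capacity, about 17% at 1080p30 and 26% at 720p60, is the actual margin.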

Compression (Mandrill test image)

| Image | Quality | Uncompressed (RGB) | JPEG output | Ratio | Bits/pixel | PSNR vs original |
|---|---|---|---|---|---|---|
| 512×512 | Q95 | 768 KB | 211 KB | 3.6:1 | 5.29 | 42.38 dB¹ |
| 1280×720 | Q95 | 2,700 KB | 569 KB | 4.7:1 | 4.93 | 37.77 dB |
| 1280×720 | Q75 | 2,700 KB | 230 KB | 11.8:1 | 2.04 | 38.45 dB |

¹ 42.38 dB is the coefficient-level PSNR of the RTL output vs the Python reference (measures how closely the RTL matches the reference encoder, not the original image).

Hardware verification — Mandrill 1280×720, Q75 (Original | HW output | RTL sim | Diff×8):

[Figure: HW vs Sim comparison]

HW and RTL simulation outputs are byte-exact. The current Arty A7-100T post-fcapz Verilog and VHDL bitstreams both pass the Mandrill 720p Q75 hardware test with identical 235,118-byte JPEG output and Y-PSNR 38.45 dB versus the original image.

Resource Usage

The numbers below are for the mjpegzero_enc_top encoder core only. They do not include board wrappers, debug bridges, ELAs, or demo readback buffers. Reports are generated by scripts/synth/amd/run_core_synth.tcl for an XC7A100T target at 150 MHz.

| Configuration | HDL | LUTs | FFs | BRAM tiles | DSPs | WNS |
|---|---|---|---|---|---|---|
| Core, LITE_MODE=0, 1920x1080, runtime quality | Verilog | 2,155 | 1,029 | 16 | 23 | +0.516 ns |
| Core, LITE_MODE=0, 1920x1080, runtime quality | VHDL | 2,146 | 1,031 | 16 | 23 | +0.326 ns |
| Core, LITE_MODE=1, 1280x720, Q95 | Verilog | 1,828 | 980 | 11 | 21 | +0.516 ns |
| Core, LITE_MODE=1, 1280x720, Q95 | VHDL | 1,820 | 973 | 11 | 21 | +0.326 ns |

Use python scripts/check_core_resources.py --run-synth to regenerate the Verilog/VHDL apples-to-apples comparison.

Pipeline Modules

| Module | Verilog source | VHDL source | Description |
|---|---|---|---|
| RGB/YUYV Converter | rtl/rgb_to_ycbcr.v | rtl/vhdl/rgb_to_ycbcr.vhd | Optional BT.601 3-stage pipeline; enabled by RGB_INPUT=1 |
| Input Buffer | rtl/input_buffer.v | rtl/vhdl/input_buffer.vhd | YUYV de-interleave, 8-line BRAM buffer, MCU-order output |
| 1D DCT | rtl/dct_1d.v | rtl/vhdl/dct_1d.vhd | 8-point forward DCT, matrix multiply with 13-bit cosine ROM |
| 2D DCT | rtl/dct_2d.v | rtl/vhdl/dct_2d.vhd | Row-column decomposition with transpose buffer |
| Quantizer | rtl/quantizer.v | rtl/vhdl/quantizer.vhd | Multiply-by-reciprocal, 4-stage pipeline |
| Zigzag Reorder | rtl/zigzag_reorder.v | rtl/vhdl/zigzag_reorder.vhd | ROM-based address remap, double-buffered |
| Huffman Encoder | rtl/huffman_encoder.v | rtl/vhdl/huffman_encoder.vhd | Multi-cycle FSM, full DC/AC standard tables |
| Bitstream Packer | rtl/bitstream_packer.v | rtl/vhdl/bitstream_packer.vhd | 64-bit accumulator, byte stuffing |
| JFIF Writer | rtl/jfif_writer.v | rtl/vhdl/jfif_writer.vhd | Header ROM, SOI/APP0/[APP1-EXIF]/DQT/EOI state machine |
| AXI4-Lite Regs | rtl/axi4_lite_regs.v | rtl/vhdl/axi4_lite_regs.vhd | Control/status register file |
| SDP BRAM | rtl/bram_sdp.v | rtl/vhdl/bram_sdp.vhd | Vendor-neutral behavioral simple dual-port RAM |
| Top-Level | rtl/mjpegzero_enc_top.v | rtl/vhdl/mjpegzero_enc_top.vhd | Pipeline integration and frame control |
| Timing Wrapper | rtl/synth_timing_wrapper.v | rtl/vhdl/synth_timing_wrapper.vhd | I/O flip-flops for synthesis timing analysis |

The encoder is maintained in both behavioral Verilog 2001 and native VHDL-1993. The encoder core uses no vendor-specific RTL primitives; the simple dual-port RAM wrappers are behavioral in both languages.
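To make the DCT and quantizer rows concrete, here is a behavioral Python sketch of those stages. It is not the RTL's exact arithmetic: the 2**12 cosine scale (values stay within ±2048, inside a 13-bit signed ROM), the rounding modes, and the 16-bit reciprocal precision are assumptions, and the input block is assumed to be level-shifted already (pixel minus 128).

```python
import math
from typing import List

Block = List[List[int]]
FRAC_BITS = 12   # assumption: cosine ROM entries scaled by 2**12
RECIP_BITS = 16  # assumption: reciprocal precision in the quantizer

# Orthonormal 8-point DCT-II basis, rounded to fixed point: COS_ROM[u][x].
COS_ROM = [[round((math.sqrt(0.125) if u == 0 else 0.5)
                  * math.cos((2 * x + 1) * u * math.pi / 16) * (1 << FRAC_BITS))
            for x in range(8)] for u in range(8)]


def dct_1d(v: List[int]) -> List[int]:
    """8-point forward DCT as a matrix multiply, rounded back to integers."""
    half = 1 << (FRAC_BITS - 1)
    return [(sum(COS_ROM[u][x] * v[x] for x in range(8)) + half) >> FRAC_BITS
            for u in range(8)]


def dct_2d(block: Block) -> Block:
    """Row-column decomposition: DCT each row, then each column (transpose)."""
    rows = [dct_1d(r) for r in block]                                  # rows[y][u]
    cols = [dct_1d([rows[y][u] for y in range(8)]) for u in range(8)]  # cols[u][v]
    return [[cols[u][v] for u in range(8)] for v in range(8)]          # out[v][u]


def quantize(coef: int, q: int) -> int:
    """Divide by q via multiply-by-reciprocal, rounding half away from zero."""
    recip = ((1 << RECIP_BITS) + q // 2) // q  # round(2**16 / q), per table entry
    mag = (abs(coef) * recip + (1 << (RECIP_BITS - 1))) >> RECIP_BITS
    return mag if coef >= 0 else -mag
```

The usual motivation for multiply-by-reciprocal is that a precomputed constant per table entry maps onto a pipelined multiplier, avoiding a long-latency divider in the datapath.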

Quick Start

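Prerequisites

  • AMD/Xilinx Vivado 2020.2+ (tested with 2025.2; needed for Tier 3 simulation and synthesis)
  • Python 3.8+ with NumPy, SciPy, Pillow (for reference encoder)
  • FFmpeg (for validation)
  • Icarus Verilog (iverilog/vvp) for Tier 2 RTL simulation

Install the Python dependencies:

pip install -r python/requirements.txt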

Verification

The verification suite is split into three tiers. Tiers 1 and 2 need only Python (plus iverilog for Tier 2) and are what GitHub Actions CI runs on every push. Tier 3 requires Vivado and is intended for local full-frame validation.

Tier 1: Python-only (no simulator, no Vivado)

# Huffman ROM tables match ITU-T T.81 Annex K
python python/verify_huffman_rom.py

# LITE_QUALITY quantisation & reciprocal tables match Python reference
python python/verify_lite_quality.py

# Python reference encoder: encode 720p mandrill, decode, report PSNR
python python/test_encoder.py

# Visual quality check: side-by-side Original | JPEG decoded | Difference×8
python python/mandrill_compare.py --quality 95
python python/mandrill_compare.py --quality 75 --out compare_q75.png

Tier 2: RTL simulation with iverilog (CI path)

Compiles all RTL with iverilog, runs the CI testbench, and compares output JPEG coefficients block-by-block against Python reference files for Q=50, 75, 95. Pass criterion: max coefficient difference ≤ 1 (fixed-point rounding tolerance).

# Full mode (LITE_MODE=0, runtime quality via AXI4-Lite)
python python/verify_rtl_sim.py

# Lite mode (LITE_MODE=1, synthesis-time quality tables)
python python/verify_rtl_sim.py --lite

# With VCD dump
python python/verify_rtl_sim.py --dump-vcd

# RGB_INPUT=1 functional test (24-bit RGB through built-in color converter)
python python/verify_rtl_sim.py --rgb
python python/verify_rtl_sim.py --lite --rgb

# Random input backpressure gaps (tests input_buffer gap handling)
python python/verify_rtl_sim.py --gaps

# Minimum-width 16×8 frame (1 MCU — corner case for MCU column counter)
python python/verify_rtl_sim.py --min-width

# EXIF APP1 segment validation (full mode, 72 DPI default)
python python/verify_exif.py
python python/verify_exif.py --lite --x-res 96 --y-res 96 --res-unit 2

# AXI4-Lite register coverage (2-frame encode, reads back QUALITY/FRAME_CNT/FRAME_SIZE/STATUS)
python python/verify_axi_regs.py
python python/verify_axi_regs.py --lite

Requires: iverilog / vvp on PATH, Python ≥ 3.8 with NumPy. RTL simulation uses the same behavioral rtl/bram_sdp.v as synthesis.
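The Tier 2 pass criterion (max coefficient difference ≤ 1 per block) can be expressed as a small check. This sketch assumes the coefficients have already been decoded into one 64-entry list per block; it is not the code in python/verify_rtl_sim.py or python/compare_jpeg_scan.py, and the function name is illustrative.

```python
from typing import List, Sequence

Block = Sequence[int]  # quantized coefficients of one 8x8 block


def compare_blocks(rtl: List[Block], ref: List[Block], tol: int = 1) -> List[int]:
    """Return indices of blocks whose max |coefficient difference| exceeds tol."""
    if len(rtl) != len(ref):
        raise ValueError("block count mismatch: %d vs %d" % (len(rtl), len(ref)))
    bad = []
    for i, (a, b) in enumerate(zip(rtl, ref)):
        if len(a) != len(b):
            raise ValueError("coefficient count mismatch in block %d" % i)
        if max(abs(x - y) for x, y in zip(a, b)) > tol:
            bad.append(i)
    return bad
```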

Verilator code coverage (optional, requires Verilator ≥ 4.2)

Compiles the RTL with --coverage, runs six scenarios designed to hit all major code paths (Q=50/75/95, flat-gray image for DC/EOB paths, checkerboard image for ZRL paths, and an EXIF_ENABLE=1 build for EXIF state coverage), merges the coverage data, and generates an LCOV report.

# Full mode — Q=50/75/95 + flat + checkerboard + EXIF run
python python/run_coverage.py

# Lite mode
python python/run_coverage.py --lite

# With HTML report (requires lcov/genhtml)
python python/run_coverage.py --html

# Custom quality set
python python/run_coverage.py --qualities 75,95

Coverage data is written to build/coverage/. LCOV info at build/coverage/coverage.info; HTML report (if --html) at build/coverage/html/index.html.

Tier 3: Full 720p Vivado simulation (local only, requires Vivado)

python scripts/run_sim.py 720p           # no waveforms
python scripts/run_sim.py 720p vcd       # + VCD dump → build/sim/tb_mjpegzero_enc.vcd
python scripts/run_sim.py lite vcd       # lite mode with VCD

Output JPEG is written to build/sim/sim_output.jpg. Verified PSNR vs original: 37.77 dB.
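A PSNR figure like this can be computed with NumPy and Pillow (both listed prerequisites). Which definition the project scripts use is not stated here (the hardware section quotes a Y-PSNR), so the hypothetical psnr_files helper offers both an RGB mode and a luma-only mode; Pillow's "L" conversion is an approximation of the encoder's Y channel.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped 8-bit images."""
    if reference.shape != test.shape:
        raise ValueError("shape mismatch")
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = float(np.mean(diff * diff))
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak * peak / mse))


def psnr_files(original_path: str, jpeg_path: str, luma_only: bool = False) -> float:
    """Load both images with Pillow, as RGB or 8-bit luma, and compare them."""
    from PIL import Image  # imported here so psnr() works without Pillow
    mode = "L" if luma_only else "RGB"
    a = np.asarray(Image.open(original_path).convert(mode))
    b = np.asarray(Image.open(jpeg_path).convert(mode))
    return psnr(a, b)
```

For example, psnr_files(original_png, "build/sim/sim_output.jpg", luma_only=True), where original_png is the path to the source frame fed to the simulation.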

FuseSoC

The core is described in mjpegzero.core (CAPI2 format).

# Add core to local library
fusesoc library add mjpegzero .

# Run simulation (icarus, full mode)
fusesoc run --target sim bard0-design:mjpegzero:mjpegzero_enc

# Run simulation (lite mode)
fusesoc run --target sim_lite bard0-design:mjpegzero:mjpegzero_enc

# Lint with Verilator
fusesoc run --target lint bard0-design:mjpegzero:mjpegzero_enc

# Synthesize for AMD/Xilinx Arty A7-100T
fusesoc run --target synth_amd bard0-design:mjpegzero:mjpegzero_enc

# Override parameters
fusesoc run --target sim bard0-design:mjpegzero:mjpegzero_enc \
  --LITE_MODE 0 --IMG_WIDTH 1920 --IMG_HEIGHT 1080

Available targets: sim, sim_lite, lint, synth_amd, synth_amd_lite.

To use mjpegZero as a dependency in your own FuseSoC project, list it under a fileset's depend key in your .core file (rtl is an example fileset name):

filesets:
  rtl:
    depend:
      - bard0-design:mjpegzero:mjpegzero_enc:0.1.0

LiteX Integration

A project-local LiteX wrapper is provided in integrations/litex/mjpegzero.py. It adds the Verilog sources to a LiteX platform, instantiates mjpegzero_enc_top, exposes a LiteX video stream sink, exposes a JPEG byte stream source, and keeps the core register file on AXI-Lite.

from integrations.litex.mjpegzero import MjpegZero, MjpegZeroConfig

# `platform` is your board's LiteX platform object (from your SoC/target script).

encoder = MjpegZero(
    platform,
    config=MjpegZeroConfig(
        lite_mode=1,
        lite_quality=75,
        img_width=1280,
        img_height=720,
        rgb_input=0,
    ),
    vendor="xilinx7",
    jpeg_fifo_depth=512,
)

# encoder.video_sink:  data/valid/ready/last/user input stream
# encoder.jpeg_source: data/valid/ready/last JPEG byte stream
# encoder.axi_lite:    AXI-Lite control/status register bus

The encoder's native JPEG output has no tready. The LiteX wrapper therefore inserts an optional stream FIFO and exposes jpeg_overflow as a sticky indicator if the downstream consumer stalls longer than the FIFO can absorb.
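The sticky-overflow behavior can be modeled in a few lines. This is a software model for reasoning about FIFO depth, not the LiteX gateware; how the real jpeg_overflow flag is cleared is not described here, so clear_overflow and the class name are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class StickyOverflowFifo:
    """Byte FIFO fed by a source with no tready.

    Writes arrive whether or not there is room; a dropped byte sets an
    overflow flag that stays set until explicitly cleared.
    """

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.data: Deque[int] = deque()
        self.overflow = False

    def push(self, byte: int) -> None:
        if len(self.data) >= self.depth:
            self.overflow = True  # byte is lost; flag is sticky
        else:
            self.data.append(byte)

    def pop(self) -> Optional[int]:
        return self.data.popleft() if self.data else None

    def clear_overflow(self) -> None:
        self.overflow = False
```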

Run Synthesis

# Using the master runner (recommended):
python scripts/run_all.py synth               # Full mode, AMD/Xilinx (default)
python scripts/run_all.py synth --vendor amd
python scripts/run_all.py impl  --vendor amd

# Direct Vivado invocation:
# Full mode (1920×1080, 150 MHz, runtime quality)
vivado -mode batch -source scripts/synth/amd/run_synth.tcl

# Lite mode (1280×720, 150 MHz, default Q95)
vivado -mode batch -source scripts/synth/amd/run_synth.tcl -tclargs lite

# Lite mode with custom quality (e.g., Q80)
vivado -mode batch -source scripts/synth/amd/run_synth.tcl -tclargs lite 80

# Native VHDL-1993 encoder synthesis
vivado -mode batch -source scripts/synth/amd/run_synth_vhdl.tcl
vivado -mode batch -source scripts/synth/amd/run_synth_vhdl.tcl -tclargs lite 80

# Core-only Verilog/VHDL resource comparison
vivado -mode batch -source scripts/synth/amd/run_core_synth.tcl -tclargs verilog
vivado -mode batch -source scripts/synth/amd/run_core_synth.tcl -tclargs vhdl
python scripts/check_core_resources.py

Reports are written to build/synth/ or build/synth_lite/.

AMD/Vivado and Altera/Quartus synthesis scripts are fully implemented. Scripts for Lattice Radiant, Microchip Libero, Efinix Efinity, and GoWin EDA are scaffolded in scripts/synth/<vendor>/ but still need their tool-specific Tcl flows implemented. Contributions are welcome; see CONTRIBUTING.md.

Run Implementation (Place & Route)

python scripts/run_all.py impl

Reports are written to build/impl/.

Utility Scripts

| Script | Purpose |
|---|---|
| python/mandrill_compare.py | Encode/decode the mandrill image and produce a side-by-side PNG (Original, JPEG decoded, Difference×8). |
| python/compare_jpeg_scan.py | Block-by-block DCT coefficient comparison between two JPEG files. |
| python/verify_exif.py | RTL simulation test for the APP1/EXIF segment; validates all IFD0 fields byte-by-byte. |
| python/verify_axi_regs.py | AXI4-Lite register coverage test: QUALITY, FRAME_CNT, FRAME_SIZE, STATUS W1C, RESTART (2-frame encode). |
| python/run_coverage.py | Verilator --coverage driver: compiles RTL, runs Q=50/75/95 + flat/checker/EXIF scenarios, merges .dat files, produces LCOV report. |
| python/generate_test_vectors.py | Generates all simulation test vectors, including yuyv_input.hex, yuyv_flat.hex (DC/EOB coverage), and yuyv_checker.hex (ZRL coverage). |
| python/gen_huffman_rom.py | Regenerate the Huffman ROM initial block in rtl/huffman_encoder.v from the standard BITS/VALS arrays. |
| python/gen_lite_tables.py | Regenerate the LITE_QUALITY quantisation table initial blocks in rtl/quantizer.v. |
| python/yuyv_convert.py | Shared RGB-to-YUYV conversion for RTL simulation and hardware tests. |
| scripts/hw_test_mandrill.py | End-to-end hardware verification through fcapz: converts mandrill 720p, runs RTL sim + HW encode, compares outputs. |

Integration Example

mjpegzero_enc_top #(
    .IMG_WIDTH    (1920),
    .IMG_HEIGHT   (1080),
    .LITE_MODE    (0),         // 1 = lite: fixed quality, 720p60 target, smaller core footprint
    .LITE_QUALITY (95),        // Synthesis-time quality (1-100), lite mode only
    // Optional: EXIF APP1 segment
    .EXIF_ENABLE  (1),         // 0 = no EXIF (default)
    .EXIF_X_RES   (72),        // XResolution numerator (DPI)
    .EXIF_Y_RES   (72),        // YResolution numerator
    .EXIF_RES_UNIT(2),         // 2 = inch
    // Optional: RGB input path (set to 0 for standard YUYV input)
    .RGB_INPUT    (0)          // 1 = 24-bit {R,G,B} AXI4-Stream input
) u_mjpeg (
    .clk               (pixel_clk),        // 150 MHz
    .rst_n             (sys_rst_n),

    // Connect to video source (camera, framebuffer, etc.)
    .s_axis_vid_tdata  (video_tdata),       // 16-bit YUYV
    .s_axis_vid_tvalid (video_tvalid),
    .s_axis_vid_tready (video_tready),
    .s_axis_vid_tlast  (video_tlast),       // End of line
    .s_axis_vid_tuser  (video_tuser),       // Start of frame

    // Connect to DMA or output FIFO (no backpressure — always accept)
    .m_axis_jpg_tdata  (jpeg_tdata),        // 8-bit JPEG bytes
    .m_axis_jpg_tvalid (jpeg_tvalid),
    .m_axis_jpg_tlast  (jpeg_tlast),        // End of JPEG frame

    // Connect to AXI interconnect or tie off
    .s_axi_awaddr      (axi_awaddr),
    .s_axi_awvalid     (axi_awvalid),
    .s_axi_awready     (axi_awready),
    .s_axi_wdata       (axi_wdata),
    .s_axi_wstrb       (axi_wstrb),
    .s_axi_wvalid      (axi_wvalid),
    .s_axi_wready      (axi_wready),
    .s_axi_bresp       (axi_bresp),
    .s_axi_bvalid      (axi_bvalid),
    .s_axi_bready      (axi_bready),
    .s_axi_araddr      (axi_araddr),
    .s_axi_arvalid     (axi_arvalid),
    .s_axi_arready     (axi_arready),
    .s_axi_rdata       (axi_rdata),
    .s_axi_rresp       (axi_rresp),
    .s_axi_rvalid      (axi_rvalid),
    .s_axi_rready      (axi_rready)
);

Tested Hardware

| Board | Part | Example project | Status |
|---|---|---|---|
| Digilent Arty A7-100T | XC7A100TCSG324-1 | example_proj/arty_a7_100t/ | Verilog and VHDL post-fcapz bitstreams close timing and pass the Mandrill 720p HW test byte-exact |
| Digilent Arty S7-50 | XC7S50CSGA324-1 | example_proj/arty_s7_50/ | Build scaffolded; rebuild + HW verification pending |

Porting to another AMD/Xilinx 7-Series device should be straightforward: swap the XDC constraints and adjust JPEG_WORDS to fit the available BRAM.

Applications

  • Drone / UAV cameras — lightweight MJPEG stream over a low-bandwidth radio link
  • IP security cameras — per-frame JPEG over Ethernet, no inter-frame dependency
  • Machine vision — on-FPGA compression before USB/GigE transfer to host
  • Medical imaging — lossless-adjacent quality (Q95+) with intra-frame-only coding
  • Automotive — dashcam and surround-view recording with frame-accurate random access
  • Industrial inspection — compress high-speed line-scan data in real time
  • Broadcast contribution — MJPEG-over-RTP for low-latency studio feeds
  • Frame grabbers — capture and compress SDI/HDMI input on an FPGA capture card

Directory Structure

mjpegZero/
  rtl/              Synthesizable Verilog 2001 source
    vhdl/           Native VHDL-1993 encoder sources
    vendor/         Vendor-specific BRAM wrappers (AMD, Altera, Lattice, ...)
  sim/              SystemVerilog testbench and test vectors
  python/           Reference encoder, verification, test vector generation
  scripts/          Per-vendor synthesis Tcl scripts and Python runners
  example_proj/     Ready-to-build board examples
    common/         Shared demo top-level + Python host (used by every board)
    arty_a7_100t/   Digilent Arty A7-100T (verified reference)
    arty_s7_50/     Digilent Arty S7-50 (rebuild + HW test pending)
  fcapz/            Git submodule: fpgacapZero EJTAG-AXI bridge + ELA + host
  build/            Synthesis/implementation output (generated)

Contributing

Contributions are welcome. See CONTRIBUTING.md for details.

The most impactful contributions are board-level examples that show the encoder running on hardware beyond the reference Arty A7-100T. All examples live under example_proj/<board_name>/. New examples for Nexys Video, ZedBoard, DE10-Nano, iCEBreaker, and others are welcome.

License

Apache License 2.0 + Commons Clause v1.0. See LICENSE for full terms.

Non-commercial use (research, education, hobby projects, open-source) is freely permitted under the Apache 2.0 terms.

Commercial use (integration into commercial products, services, or consulting engagements) requires written permission from the author. Contact: hello@bard0.com
