Mummanajagadeesh/README.md

Hallo Welt! This is Jagadeesh.

Portfolio | Blog | LinkedIn | Instagram | Discord | GitHub | Email | Reddit | Old Site | YouTube | Cube | Duolingo | Google Play Games

$ less ~/workspace/profile/config.yaml

FOCUS: "learning so slow my blog can’t keep up"
KNOWS: "anything bw sand nd the entity thinking back at us rn"
ASKME: "stuff i know, or stuff i can google before u blink"
BUILDING: "hw accel arch for img classif; it can sǝǝ john cena"
CLUB: "feeding robots my gpa @rignitc"
COMPS: ["e-yantra MB", "openpower hw"]
MAJOR: "electronics nd communication engg"
COLLEGE: "national institute of technology calicut"
CONTACT: "carrier pigeons :p"
HOBBIES: ["speedcubing", "movies", "4k day dreaming"]
FUNFACT: "i'm discount batman"
SECRET: "i pay the moon to follow me around"
WHATIF: "if i eat myself, do I become twice as big or disappear?"



$ cat ~/about_me.txt

Hey, I'm Jagadeesh - I work across VLSI design, robotics, and AI hardware. My work spans digital/analog design, circuit-level implementation, and low-level arch & optimization. Currently learning open-source IC design. I also work with embedded systems, MCUs, and SBCs, applying them in robotics and prototyping. Beyond that, I take part in comps, open-source projects, and hardware/software co-design challenges.

My Key Interests:

  • VLSI Design: Analog Circuits & Digital Design
  • Computer Arch and Protocols
  • Neural Networks Hardware Acceleration
  • Microcontrollers & Electronics
  • Computer Vision, Sensor Fusion
  • Robotics, AMR Path Planning & Navigation

Feel free to check out my projects below, and if you’re interested in collaborating or discussing hardware, AI, or robotics, let’s connect!


$ ls ~/projects --filter=feat | All Projects


Digital Design/Verification & Compute Architectures
INT8 Fixed-Point CNN Hardware Accelerator and Image-Processing Suite | Link

Design of High-Performance Q1.7 Fixed-Point Quantized CNN Hardware Accelerator with Microarchitecture Optimization of 3-Stage Pipelined Systolic MAC Arrays for Lightweight Inference

A fully synthesizable INT8 CNN accelerator for CIFAR-10, built around a 3-stage pipelined systolic-array MAC microarchitecture optimized from a 6-design PPA benchmark. Includes complete quantization workflow (PTQ/QAT), 2-cycle ready/valid protocol, ROM automation, FP32↔RTL accuracy checks, and a hardware image-processing suite

" I tried to ImProVe, but NeVer really did - so I MOVe-d on ¯\_(ツ)_/¯ "

Current Project Overview

Duration: Individual, Ongoing
Tools: Verilog (Icarus Verilog, Yosys) | Python (TensorFlow, NumPy) | Scripting (TCL, Perl)


8-bit Quantized CNN Hardware Accelerator: Open-source, Modular, & Optimized for Inference

Project Link Verilog | Basic Architecture | Digital Electronics

  • Designed a fully synthesizable shallow Res-CNN for CIFAR-10, evaluated against eight reference CNNs, and achieved a Pareto-optimal trade-off across throughput, latency, and accuracy.

  • Implemented systolic-array processing elements using 8-bit CSA–MBE MAC units with FSM-driven control logic and a 2-cycle ready/valid handshake. Verified end-to-end datapath behavior through structured testbenches.

  • Performed detailed post-training quantization (PTQ) and quantization-aware training (QAT) studies. Quantizing from Q1.31 to Q1.3 allowed exploration of precision vs. accuracy trends, and a Q1.7 PTQ model retained ~84% accuracy (less than 1% drop) while reducing the model memory footprint by 4× (≈52 kB).

  • Developed automated scripts (TCL + Python) to generate 14 coefficient ROMs and 3 RGB input ROMs, enabling seamless hardware ingestion of model parameters and images. Verified TensorFlow / FP32 ↔ RTL output consistency and automated full-pipeline inference execution.

  • Built a compact digital image-processing toolkit (edge detection, denoising, filtering, enhancement) and an MLP classifier for (E)MNIST datasets. Achieved >75% accuracy with real-time GUI visualization for interactive experimentation.


Technical Summary
  • Designed a fully synthesizable INT8 CNN accelerator (Q1.7 PTQ) for CIFAR-10, optimized for throughput, latency determinism, and precision efficiency. Implemented a 2-cycle ready/valid handshake for all inter-module transactions and FSM-based control sequencing for deterministic pipeline timing. Trained 8 CNNs (TensorFlow, identical augmentation & LR scheduling w/ vanilla Adam optimizer) and performed architecture-level DSE via Pareto analysis, selecting 2 optimal variants including a ResNet-style residual CNN.

  • PTQ/QAT comparisons were conducted across Q1.31, Q1.15, Q1.7, and Q1.3; Q1.7 PTQ (1-int, 7-frac | 0.0078 step) gave the best accuracy–memory trade-off { Q1.31 ~84% ~210kB | Q1.7 ~83% ~52kB | Q1.3 ~78% ~26kB }, achieving ~84% top-1 accuracy, <1% loss, and ≈52 KB total (≈17×3 KB RGB inputs)

  • The 3-stage pipelined systolic-array convolution core employs Processing Elements (PEs) built around MAC units composed of 8-bit signed Carry-Save Adders (CSA) and Modified Booth-Encoded (MBE) multipliers, arranged in a 2D grid for high spatial reuse and single-cycle accumulation. All 14 coefficient ROMs and 3 RGB input ROMs were auto-generated via a Python/TCL automation flow handling coefficient quantization, packing, and open-source EDA simulation. Verified bit-accurate correlation between TensorFlow FP32 and RTL fixed-point inference layer-wise; an IEEE-754 single-precision CNN variant validated numeric consistency.

  • Integrated image-processing modules (edge detection, denoising, filtering, contrast enhancement) form a Verilog-based hardware preprocessing pipeline, feeding an MLP classifier evaluated on the (E)MNIST (52+)10 ByClass datasets. The MLP shares the preprocessing and automation flow, with an additional IEEE-754 64-bit FP variant for precision benchmarking. A Tkinter GUI enables interactive character input, and preprocessing visualization via Matplotlib
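As a rough illustration of the Q1.7 format described above (1 sign/integer bit, 7 fractional bits, step 2⁻⁷ ≈ 0.0078), here is a minimal Python sketch of symmetric fixed-point quantization with saturation. The function names `quantize`, `dequantize`, and `footprint_kb` are hypothetical, and round-to-nearest with clipping is an assumption; the project's actual PTQ scripts may scale or round differently.

```python
def quantize(value, frac_bits=7):
    """Map a real value to a signed Q1.<frac_bits> integer code.

    Assumption: round-to-nearest followed by saturation. Python's round()
    resolves exact ties to the nearest even integer.
    """
    scale = 1 << frac_bits
    code = round(value * scale)
    # Q1.7 codes span [-128, 127], i.e. real values in [-1, 1 - 2**-7].
    return max(-scale, min(scale - 1, code))


def dequantize(code, frac_bits=7):
    """Convert a Q1.<frac_bits> integer code back to a real value."""
    return code / (1 << frac_bits)


def footprint_kb(fp32_kb, total_bits):
    """Scale a 32-bit model footprint to a narrower word width."""
    return fp32_kb * total_bits / 32


if __name__ == "__main__":
    for w in (0.3, -0.71, 0.999, 1.5):
        q = quantize(w)
        print(f"{w:+.4f} -> code {q:+4d} -> {dequantize(q):+.7f}")
    # ~210 kB at 32 bits shrinks to ~52.5 kB at 8 bits and ~26 kB at 4 bits.
    print(footprint_kb(210, 8), footprint_kb(210, 4))
```

The footprint helper only rescales the word width, which is where the 4× reduction from Q1.31 to Q1.7 quoted above comes from.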


High-Speed 3-Stage Pipelined Systolic Array-Based MAC Architectures

Digital Logic Design | Synthesis

  • Benchmarked six different 8-bit signed adder–multiplier architectures using PPA metrics (latency, throughput, and area) on the Sky130 PDK, and analyzed their architectural trade-offs in the context of low-power convolution workloads.

  • Designed a 3-stage pipelined systolic MAC based on a CSA–MBE multiplier structure, achieving substantial improvements over a naïve conv3 baseline: 66.3% lower delay, 3.1× higher area efficiency, and 82.2% lower typical power.

  • Implemented a 2D systolic PE-grid supporting general convolution and GEMM operations, with verified behavior under zero-padding and same-padding configurations. GEMM optimization reduced power consumption by 44.6% for matrix dimension (N = 3).

  • Integrated a 648-bit scan chain spanning all pipeline and control registers, enabling full DFT/ATPG support with only 14.5% cell-area overhead, ensuring manufacturability and high test coverage.


Technical Summary
  • Benchmarked six 8-bit signed adder and multiplier architectures for systolic-array MACs targeting CNN/GEMM workloads using a fully open-source ASIC flow (Yosys + OpenROAD/OpenLane) on the Google-SkyWater 130nm PDK (Sky130HS PDK @25°C_1.8V). Evaluated PPA (Power, Performance, Area) and latency/throughput/area metrics under a constant synthesis and layout environment with fixed constraints and floorplan parameters (FP_CORE_UTIL = 30 %, PL_TARGET_DENSITY = 0.36, 10 ns clock, CTS/LVS/DRC/Antenna enabled)

  • Adders:

    • CSA – 5.07 ns CP, 197 MHz Fmax, 2.52 k µm² core, 0.083 mW (best speed/resource trade-off)
    • Kogge–Stone – 6.21 ns, 161 MHz, 3.69 k µm² (area-heavy)
    • RCA – 7.14 ns, 140 MHz, 1.13 k µm², 0.032 mW (most power/area-efficient)
  • Multipliers:

    • MBE – 8.84 ns, 113 MHz, 9.6 k µm², 0.379 mW (best energy/area efficiency = 3.35 × 10³ pJ/op, 8.64 × 10³ ops/s/µm²)
    • Baugh–Wooley – 8.63 ns, 115.9 MHz (fastest)
    • Booth (Radix-2) – 12.5 ns, 80 MHz (highest area/power)
  • Final MAC integrates an 8-bit signed CSA adder and 8-bit signed MBE multiplier in a 3×3 convolution/GEMM core using a 3-stage pipelined systolic array (sampling → truncation/flipping → MAC accumulation). Verified via RTL testbench and post-synthesis timing across zero/same-padding modes. Automated GDS/DEF generation and PPA reporting for all architectures ensured fully reproducible, environment-consistent results

  • Comparative study (Small Scale Ops):

    • CSA-MBE pair Systolic Array Conv vs Naïve Conv (3x3 Kernel on 5x5 Image @CLK_PERIOD_20_ns)

      • Latency ↓ 66.3% | Throughput ↑ 196.6% | Speed ↑ 196.6% | Area ↓ 67.8% | Power ↓ 82.2%
    • Single MAC re-use vs Systolic 4-PE Grid (2x2 matrix multiplication)

      • Latency ↓ 0.9% | Throughput ≈ same | Speed ≈ same | Area ≈ same | Power ↓ 15% | Energy/op ↓ 14.6%
    • Single MAC re-use vs Systolic 9-PE Grid (3x3 matrix multiplication)

      • Latency ↓ 1% | Throughput ≈ same | Speed ≈ same | Area ≈ same | Power ↓ 44.6% | Energy/op ↓ 44%

DFT (Scan Chain) Add-On

  • Integrated a 648-bit full-scan chain across all pipeline, control, and output registers in the 3×3 systolic MAC/convolution datapath. Every state element is replaced with a single-bit scan DFF (SE/SI/SO), enabling serial load/unload of internal state.

  • Flattened register groups (kernel flip buffer, px16/px8 slices, ker8 slices, prod_s2 pipeline, row/col counters, valid pipe, output registers, done flag) into one contiguous chain, ensuring deterministic bit ordering and simple scan stitching.

  • Verified scan behavior through shift–capture–shift TB patterns and ensured functional transparency: scan mode (SE=1) freezes datapath updates, while normal mode (SE=0) preserves original functionality.

  • Total overhead from scan insertion is ~14.5% in standard cells, with no change to functional timing or systolic pipeline throughput at the target clock.


Repositories

ViSiON: Verilog for Image Processing and Simulation-based Inference Of Neural Networks

This repo includes all related projects as submodules in one place.

ImProVe (IMage PROcessing using VErilog): A collection of image processing algorithms implemented in Verilog, including geometric transformations, color space conversions, and other foundational operations.

NeVer (NEural NEtwork on VERilog): A hardware-implemented MLP in Verilog for character recognition on (E)MNIST, alongside a lightweight CNN for CIFAR-10 image classification.

MOVe: Math Ops in VErilog

Verilog-based implementation of the CORDIC algorithm for efficient estimation of mathematical functions
Systolic array of MAC PEs with Booth multipliers and carry-save adders, supporting both GEMM and 3×3 CNN convolutions for hardware-accelerated deep learning and linear algebra
Implements and compares 8-bit multipliers and 8-bit adders in synthesizable Verilog, analyzing their area, timing, and power characteristics in MAC datapath architectures
Posit number system: An alternative to IEEE 754 for efficient arithmetic


  • CORDIC Algorithm – Implements Coordinate Rotation Digital Computer (CORDIC) algorithms in Verilog for efficient hardware-based calculation of sine, cosine, tangent, square root, magnitude, and more.

  • Systolic Array Matrix Multiplication – Verilog implementation of matrix multiplication using systolic arrays to enable parallel computation and hardware-level performance optimization. Each processing element leverages a Multiply-Accumulate (MAC) unit for core operations.

  • Hardware Multiply-Accumulate Unit – Implements and compares 8-bit multipliers and 8-bit adders in synthesizable Verilog, analyzing their area, timing, and power characteristics in MAC datapath architectures.

  • Posit Arithmetic (Python) – Currently using fixed-point arithmetic; considering Posit as an alternative to IEEE 754 for better precision and dynamic range. Still working through the trade-off.
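To make the systolic-array matrix multiplication entry above more concrete, here is a small cycle-level Python model of an output-stationary N×N grid of MAC processing elements. It is a behavioral sketch only: the dataflow (rows of A entering from the left, columns of B from the top, each skewed by one cycle per row/column) is an assumed arrangement, and `systolic_matmul` is a hypothetical name, not code from the repository.

```python
def systolic_matmul(a, b):
    """Multiply two N x N matrices on a modeled N x N systolic PE grid.

    Assumed dataflow: output-stationary. PE(i, j) accumulates C[i][j],
    forwards its A operand to the right and its B operand downward.
    """
    n = len(a)
    acc = [[0] * n for _ in range(n)]      # per-PE accumulators
    a_reg = [[0] * n for _ in range(n)]    # A operand held in each PE
    b_reg = [[0] * n for _ in range(n)]    # B operand held in each PE

    # The last operand pair reaches PE(n-1, n-1) at cycle 3n - 3.
    for t in range(3 * n - 2):
        # Sweep bottom-right to top-left so every PE reads its
        # neighbour's value from the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(n)):
                if j == 0:
                    k = t - i              # row i is delayed by i cycles
                    a_in = a[i][k] if 0 <= k < n else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j              # column j is delayed by j cycles
                    b_in = b[k][j] if 0 <= k < n else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in   # one MAC per PE per cycle
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

A model like this can serve as a golden reference when checking RTL waveforms, since the accumulator contents per cycle are easy to dump and compare.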


Storage and Buffer Modules

  • RAM1KB – A 1KB (1024 x 8-bit) memory module in Verilog with write-once locking for even addresses. Includes a randomized testbench. Also forms the base for a ROM3KB variant to store 32×32 RGB CIFAR-10 image data.

  • FIFO Buffer – Not started. Planned as a synchronous FIFO with fixed depth, single clock domain, and standard full/empty flag logic.


RISC-V & MIPS Microarchitectures - SC / MC / Pipelined / Dual-Issue Superscalar | Link

RV32I RISC-V Core (TL-Verilog, Single-Cycle Implementation)

Tools: Makerchip | TL-Verilog | Verilator

  • Implemented an RV32I single-stage core supporting all base integer instructions across I, S, B, U, J, and R formats.
  • Designed a 32×32 register file with dual-read / single-write ports, enforcing x0 = 0 invariance.
  • Implemented ALU operations covering arithmetic, logical, shift, and compare paths with correct immediate decode (opcode / funct3 / funct7).
  • Verified via a test program summing integers 1–9, completing within ~50 cycles, updating pass/fail status in x30 and x31.
  • Integrated interactive simulation support via m4+cpu_viz(), enabling cycle-accurate register/memory visualization.

RV32I RISC-V Core (Verilog, Single-Cycle Implementation)

Tools: Verilog | Icarus Verilog | ModelSim | Quartus Prime

  • Designed a 32-bit single-cycle RV32I core with modular datapath: ALU, control, immediate generator, PC logic, instruction & data memories.
  • Implemented all 38 base instructions including full load/store support: LB, LBU, LH, LHU, LW, SB, SH, SW, with correct zero/sign extension and RMW correctness.
  • Verified correctness using self-checking ModelSim testbenches and synthesized the core on Quartus Prime.

MIPS Microarchitectures - SC / MC / 5-Stage Pipeline

Tools: Verilog | Icarus Verilog | ModelSim | GTKWave

  • Implemented three 32-bit MIPS processors:

    • Single-Cycle: CPI = 1.0, PC increments by +4 each cycle.

    • Multi-Cycle: benchmark CPI ≈ 4.1, with per-class cycle counts:

      • R-type: 4 cycles
      • I-type arithmetic: 4 cycles
      • Load: 5 cycles
      • Store: 4 cycles
      • Branch: 3 cycles
      • Jump: 3 cycles
    • Pipeline (IF–ID–EX–MEM–WB): forwarding, hazard detection, 1-cycle load-use stall, 1-cycle taken-branch flush; benchmark CPI ≈ 1.1–1.2

  • Executed Harris & Harris benchmark (18 instructions). Correctly wrote 0x00000007 to memory addresses 0x50 and 0x54.

  • Included self-checking benches, .mem loading infrastructure, full waveforms, and verification logs.
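The CPI figures above follow from a weighted average over the instruction mix. The sketch below shows that arithmetic in Python using the per-class cycle counts listed for the multi-cycle core. The instruction mix and stall counts in the example are made up for illustration (the actual benchmark mix is not given here), and the function names are hypothetical.

```python
# Per-class cycle counts of the multi-cycle core, as listed above.
CYCLES = {
    "r_type": 4,
    "i_arith": 4,
    "load": 5,
    "store": 4,
    "branch": 3,
    "jump": 3,
}


def multicycle_cpi(mix):
    """Average CPI for a dict mapping instruction class -> executed count."""
    total = sum(mix.values())
    return sum(CYCLES[kind] * count for kind, count in mix.items()) / total


def pipeline_cpi(n_instr, load_use_stalls, taken_branches):
    """Steady-state pipeline CPI, ignoring the initial fill cycles.

    Each load-use hazard and each taken branch costs one extra cycle,
    matching the 1-cycle stall and 1-cycle flush described above.
    """
    return (n_instr + load_use_stalls + taken_branches) / n_instr


if __name__ == "__main__":
    # Hypothetical mix, NOT the real benchmark's instruction counts.
    example_mix = {"r_type": 6, "i_arith": 4, "load": 2, "store": 2,
                   "branch": 3, "jump": 1}
    print(f"multi-cycle CPI: {multicycle_cpi(example_mix):.2f}")
    print(f"pipelined CPI:   {pipeline_cpi(18, 1, 2):.2f}")
```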


Dual-Issue 16-bit RISC Superscalar Processor (In-Order)

Tools: Verilog | Icarus Verilog | GTKWave

  • Implemented a two-wide in-order superscalar processor with parallel IF–ID–EX–MEM–WB lanes and independent pipeline registers per lane.
  • Instruction fetch returns 32 bits = 2×16-bit instructions; dependency checks suppress lane-1 when hazards exist.
  • Register file: 4 read ports + 2 write ports, with r0=0 hardwired.
  • Hazard handling: RAW/WAW detection, load-use stall insertion, inter-lane dependency checks, branch squashing.
  • Memory system: multi-port (three-port ARAM) enabling simultaneous instruction fetch + data access.
  • Verified using program that sums 1–10 → final register value r1 = 0x0037 (55) with correct inter-lane suppression due to true dependencies.
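The inter-lane dependency check described above can be sketched as a small predicate. This is a simplified Python model under stated assumptions: each instruction is reduced to one optional destination register and a tuple of source registers, only RAW and WAW between the two lanes are considered, and load-use and branch handling are left out. `Instr` and `can_dual_issue` are hypothetical names, not taken from the RTL.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[int]       # destination register index, None if no write
    srcs: Tuple[int, ...]     # source register indices


def can_dual_issue(lane0: Instr, lane1: Instr) -> bool:
    """Return True if lane 1 may issue in the same cycle as lane 0."""
    # r0 is hardwired to zero, so a write to it creates no dependency.
    if lane0.dest is None or lane0.dest == 0:
        return True
    raw = lane0.dest in lane1.srcs       # lane 1 reads what lane 0 writes
    waw = lane0.dest == lane1.dest       # both lanes write the same register
    return not (raw or waw)


if __name__ == "__main__":
    add_r1 = Instr(dest=1, srcs=(1, 2))      # r1 = r1 + r2
    use_r1 = Instr(dest=3, srcs=(1, 4))      # r3 = r1 + r4 (RAW on r1)
    indep = Instr(dest=5, srcs=(6, 7))       # r5 = r6 + r7
    print(can_dual_issue(add_r1, use_r1))    # lane 1 suppressed
    print(can_dual_issue(add_r1, indep))     # both lanes issue
```

In a sum loop where every instruction depends on the previous result, a check like this suppresses lane 1 on most cycles, which is the behavior the verification program above is meant to exercise.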


Repositories
  • MIPS-RISC-Microarchitectures: Single-cycle, multi-cycle, and pipelined CPU implementations with verification and CPI benchmarking
  • 16-bit Dual-Issue Superscalar RISC Processor: Two-lane in-order pipeline, forwarding, hazard detection, multi-port memory
  • RoSe RV32I Processor: TL-Verilog and Verilog RISC-V single-cycle CPU implementations

Fixed-Point CORDIC Trigonometric Soft-Core IP | Link

A synthesizable CORDIC-based trigonometric IP supporting sin, cos, and tan evaluation using a parameterized fixed-point rotation core. Includes standalone wrappers, arctan lookup ROM, and a verification environment with angle sweeps and floating-point correlation.


CORDIC Trigonometric Soft IP - Parametric Fixed-Point Core + Wrapper Functions

Duration: Individual
Tools: Verilog | Icarus Verilog | FuseSoC

  • Implemented a fully synthesizable fixed-point CORDIC core (iterative rotation mode) supporting configurable datapath width, iteration count, and shift-add update logic.
  • Created sin, cos, tan wrappers around the base CORDIC core, each applying pre-scaled initial vectors to cancel CORDIC gain and produce Q-format outputs (Q1.15 for sin/cos, Q3.28 for tan).
  • Integrated a precomputed 16-entry atan(2⁻ⁱ) ROM table using signed 32-bit constants in Q3.29 format for angle accumulator updates.
  • Designed the core using shift-add micro-rotations only (no multipliers), enabling low-resource FPGA use and ASIC DSP embedding.
  • Verification testbench sweeps angles from −1.5 to +1.5 rad in 0.1 increments, comparing fixed-point outputs with double-precision math.
    • Max error: sin ≈ 3.9×10⁻⁵, cos ≈ 4.5×10⁻⁵
    • RMS error: sin ≈ 2.0×10⁻⁵, cos ≈ 2.8×10⁻⁵
    • tan diverges as expected near ±π/2 (max ≈ 6.10).
  • Supports FuseSoC integration for one-command builds and simulation.

Technical Summary

The CORDIC soft IP implements a configurable micro-rotation datapath for evaluating trigonometric functions in fixed-point arithmetic using only shift and add operations.
Each iteration updates the state $(x_i, y_i, z_i)$ based on the rotation direction $d_i$.

The core update equations are:

$x_{i+1} = x_i - d_i \, (y_i \gg i)$

$y_{i+1} = y_i + d_i \, (x_i \gg i)$

$z_{i+1} = z_i - d_i \, \arctan(2^{-i})$

The rotation direction is defined by:

$d_i = +1$ if $z_i \ge 0$
$d_i = -1$ if $z_i < 0$

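This drives the residual angle $z_i$ toward zero (its final magnitude is bounded by the last micro-rotation angle, although $|z_i|$ need not shrink at every step) and rotates the state vector toward the target angle with deterministic iteration count and timing.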

The internal datapath uses 32-bit signed fixed-point values for $x$, $y$, and $z$. The wrappers apply standardized output formats:

  • sin/cos outputs: Q1.15
  • tan output: Q3.28

This maintains high internal precision and keeps the worst-case sine/cosine error over the tested range at roughly $4 \times 10^{-5}$ to $5 \times 10^{-5}$.

The implementation includes a 16-entry arctangent ROM storing:

$\arctan(2^{-i})$ encoded in Q3.29

for $i = 0 \dots 15$. Values range from large initial angles down to small micro-rotations.

CORDIC introduces a scale factor

$K_N = \prod_{i=0}^{N-1} \sqrt{1 + 2^{-2i}}$

which is cancelled by choosing pre-scaled starting values:

$x_0 = 1 / K_N$
$y_0 = 0$
$z_0 = \theta_{\text{input}}$

The core exposes three wrapper modules providing:

  • sin: final $y_N$
  • cos: final $x_N$
  • tan: extended-precision $(y_N / x_N)$ in Q3.28
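The update equations, arctangent ROM, and gain pre-scaling above can be modeled in a few lines of Python with integer arithmetic. This is a minimal sketch, not the RTL: it assumes Q3.29 for $x$, $y$, and $z$ alike (the source only states Q3.29 for the ROM), uses 16 iterations, and the names `ATAN_ROM`, `X0`, and `cordic_sin_cos` are hypothetical.

```python
import math

FRAC_BITS = 29                 # assumed Q3.29 for x, y and z
N_ITER = 16                    # matches the 16-entry arctangent ROM
ONE = 1 << FRAC_BITS

# arctan(2^-i) encoded as fixed-point integers, i = 0..15.
ATAN_ROM = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# CORDIC gain K_N and the pre-scaled start value x0 = 1 / K_N.
K_N = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))
X0 = round(ONE / K_N)


def cordic_sin_cos(theta):
    """Rotation-mode CORDIC; returns (sin, cos) as floats for |theta| <= 1.5."""
    x, y, z = X0, 0, round(theta * ONE)
    for i in range(N_ITER):
        d = 1 if z >= 0 else -1
        # Python's >> on negative ints is an arithmetic shift, like >>> on
        # signed values in Verilog. The tuple assignment uses the old x and y.
        x, y, z = x - d * (y >> i), y + d * (x >> i), z - d * ATAN_ROM[i]
    return y / ONE, x / ONE


if __name__ == "__main__":
    worst = 0.0
    for step in range(-15, 16):            # -1.5 to +1.5 rad in 0.1 steps
        theta = step / 10
        s, c = cordic_sin_cos(theta)
        worst = max(worst, abs(s - math.sin(theta)), abs(c - math.cos(theta)))
    print(f"worst sin/cos error over the sweep: {worst:.2e}")
```

With 16 iterations the residual angle is bounded by $\arctan(2^{-15}) \approx 3 \times 10^{-5}$ rad, which is why errors of a few $10^{-5}$ are the expected order of magnitude for this iteration count.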

Verification sweeps 31 angles from −1.5 to +1.5 radians in 0.1 increments.
Observed errors:

  • sin max error: $3.9 \times 10^{-5}$
  • cos max error: $4.5 \times 10^{-5}$
  • tan diverges near $\pm \pi/2$ as expected
  • RMS errors: sin ≈ $2 \times 10^{-5}$, cos ≈ $2.8 \times 10^{-5}$

The IP is packaged as a portable RTL component with FuseSoC metadata supporting automated builds, simulation, and SoC integration.


Repository

CORDIC Algorithm Verilog Implementation - Fixed-point soft IP core for sin/cos/tan with wrappers and verification

Peripheral Serial Communication Protocols - I2C / SPI / UART-TX | Link

Implemented a collection of peripheral serial communication interfaces in Verilog, focusing on synthesizable, parameterizable controllers suitable for FPGA/ASIC integration. Each protocol includes a cleanly modularized interface, configurable timing parameters, and testbench-driven validation with waveform inspection.

  • I2C Master Controller – Implements a single-master, multi-slave I²C bus supporting standard-mode timings, programmable SCL low/high periods, ACK/NACK handling, and clock stretching detection. Features deterministic START/STOP generation, byte-wise transfers, and address+data framing logic.

  • SPI Master (Modes 0–3) – Supports CPOL/CPHA mode selection, 8-bit full-duplex transfers, configurable SCLK division, selectable slave-select behavior, and MSB-first shifting. Designed with a compact FSM and separate TX/RX shift registers for predictable cycle behavior.

  • UART TX Soft-Core IP – Lightweight serial transmitter with baud-rate generator, start/stop framing, data-valid gating, and FIFO-less single-byte serialization. Fully synthesizable and intended as a drop-in peripheral for SoCs, teaching cores, or FPGA peripheral sets.
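For the UART transmitter above, the start/stop framing and baud-rate division can be illustrated with a short Python sketch. It assumes an 8N1 frame (one start bit, eight data bits sent LSB-first, one stop bit, no parity) and a simple integer clock divider; the 50 MHz clock in the example and the function names are hypothetical.

```python
def uart_frame(byte):
    """Return the line levels for one 8N1 frame, in transmission order."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    data_bits = [(byte >> i) & 1 for i in range(8)]   # LSB first
    return [0] + data_bits + [1]                      # start=0, stop=1


def baud_divisor(clk_hz, baud):
    """Clock cycles per bit for a counter-based baud-rate generator."""
    return round(clk_hz / baud)


if __name__ == "__main__":
    print(uart_frame(0x55))
    div = baud_divisor(50_000_000, 115_200)           # assumed 50 MHz clock
    actual = 50_000_000 / div
    print(div, f"baud error {100 * (actual - 115_200) / 115_200:+.3f}%")
```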


Repositories
  • SPI Protocol Verilog Implementation: CPOL/CPHA modes, full-duplex 8-bit transfers
  • I2C Protocol Verilog Implementation: Single-master controller with clock stretching
  • UART TX Soft-Core IP: Parameterized baud generator and 8-bit transmitter

Basic Python Tool for ISCAS’85/’89 Benchmark Analysis & Fault-Modeling | Link

A work-in-progress open-source Python package implementing foundational DFT/Fault-modeling utilities for ISCAS’85 and ISCAS’89 benchmark circuits.
Includes automatic Verilog netlist generation, random testbench creation, serial/parallel fault simulation, fault collapsing, SCOAP metric computation, and initial ATPG experimentation. PODEM implementation is under development.

  • Generates structural Verilog (.v) files and matching randomized testbenches from ISCAS netlists; injects stuck-at faults using ID-indexed assign overrides appended to the generated modules.
  • Implements serial fault simulation and parallel bit-packed fault simulation, enabling coverage estimation under random vector sets.
  • Automated reporting: coverage tables, detected/undetected fault lists, fault dictionaries, and comparison across vector batches.
  • Supports fault collapsing using dominance & equivalence relations; identifies FFR-based partitions and reduces fault sets prior to simulation.
  • Computes SCOAP controllability (CC0/CC1) and observability (CO) metrics for every internal node, enabling analysis of hard-to-control or hard-to-observe circuit regions.
  • Initial ATPG experiments performed using SCOAP-guided heuristics; the current PODEM implementation is incomplete, with recursion-termination issues still being debugged.
  • Planning to add scan-chain insertion workflows for ISCAS’89 sequential circuits, enabling full-scan ATPG comparisons with random simulation.
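As an illustration of the SCOAP controllability metrics mentioned above, the sketch below applies the standard combinational rules to a tiny hand-written netlist. It covers only AND, OR, and NOT gates, expects the gate list in topological order, and uses a made-up netlist format; it is not the package's API, and `scoap_controllability` is a hypothetical name.

```python
def scoap_controllability(inputs, gates):
    """Compute (CC0, CC1) for every net.

    inputs: iterable of primary-input names
    gates:  list of (output_name, kind, [input_names]) in topological order
    """
    cc = {name: (1, 1) for name in inputs}    # primary inputs cost 1 each
    for out, kind, ins in gates:
        cc0s = [cc[name][0] for name in ins]
        cc1s = [cc[name][1] for name in ins]
        if kind == "AND":
            # A 0 needs any single input at 0; a 1 needs all inputs at 1.
            cc[out] = (min(cc0s) + 1, sum(cc1s) + 1)
        elif kind == "OR":
            # A 0 needs all inputs at 0; a 1 needs any single input at 1.
            cc[out] = (sum(cc0s) + 1, min(cc1s) + 1)
        elif kind == "NOT":
            cc[out] = (cc1s[0] + 1, cc0s[0] + 1)
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    return cc


if __name__ == "__main__":
    netlist = [
        ("n1", "AND", ["a", "b"]),
        ("n2", "NOT", ["n1"]),
        ("y", "OR", ["n2", "c"]),
    ]
    for net, (cc0, cc1) in scoap_controllability("abc", netlist).items():
        print(f"{net}: CC0={cc0} CC1={cc1}")
```

Higher values mark nets that need more input assignments to control, which is the signal a SCOAP-guided ATPG heuristic uses when choosing which objective to back-trace first.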

Repositories
ISCAS8X Benchmark Fault-Tooling Package

FIR Accelerator for Microwatt | OpenPOWER Hardware Hackathon | Link

A parameterizable FIR filter acceleration block designed for Microwatt/OpenFrame systems. Implements a sequential multiply-accumulate datapath, programmable coefficients, and a clean Wishbone-Lite register interface for CPU control. The design was accepted for potential fabrication during the hackathon review. The final tape-out submission was not completed due to timing constraints.


FIR Accelerator - Sequential Fixed-Point Filtering Unit

Duration: Hackathon submission
Tools: Verilog | Icarus Verilog | Microwatt/OpenFrame

  • Implements an N-tap FIR filter using time-multiplexed fixed-point multiply-accumulate updates.
  • Uses a shift register sample buffer and a coefficient register file loaded by Wishbone writes.
  • Performs one multiply-add per cycle until all taps have been accumulated.
  • Produces a fixed-width output using deterministic saturation or truncation.
  • CPU interacts entirely through a compact CTRL / STATUS / DATAOUT register set.
  • Optional FIFO allows input samples to be queued for continuous operation.
  • Designed to be synthesizable for FPGA and Sky130 ASIC flows.

Technical Summary

The FIR accelerator directly implements the discrete-time convolution

$y[n] = \sum_{k=0}^{N-1} h[k] \, x[n-k]$

using a cycle-by-cycle accumulation loop driven by an internal finite-state controller.

Internal State Representation

Let

  • $X_k = x[n-k]$ sample shift register contents
  • $H_k = h[k]$ coefficient memory entries
  • $A_i$ accumulator after processing tap $i$

Sequential MAC Update

Each tap contributes $A_{i+1} = A_i + X_i \cdot H_i$.

At the start of each computation $A_0 = 0$.

Output Formation

After all $N$ taps have been processed the output is $y[n] = \text{sat}_W(A_N)$, where $\text{sat}_W(\cdot)$ denotes saturation to the configured output bit width $W$.

Control Semantics

  • CPU writes coefficients individually to indexed locations.
  • CPU writes input sample to the input register or FIFO.
  • CPU asserts START.
  • Core performs exactly $N$ MAC iterations.
  • OUT_VALID is raised when the accumulator completes.
  • CPU clears START to reset the internal DONE condition.

The number of cycles per output is exactly $N + c$, where $c$ is a small constant FSM overhead.

Fixed-Point Behavior

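Let input samples use a $d$-bit format $Q_d$ and coefficients a $c$-bit format $Q_c$ (here $c$ is the coefficient width, not the FSM overhead constant used above). The accumulator width satisfies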

$A_W \ge d + c + \lceil \log_2 N \rceil$

to avoid intermediate overflow before the final saturation step. No dynamic scaling or normalization is applied; the arithmetic is strictly linear.
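The sequential MAC loop, output saturation, and accumulator sizing described above can be captured in a short behavioral model. This Python sketch treats samples and coefficients as plain signed integers (the fixed-point scaling is left implicit), and the function names are hypothetical rather than taken from the RTL.

```python
def saturate(value, width):
    """Clamp a signed integer to a two's-complement range of `width` bits."""
    hi = (1 << (width - 1)) - 1
    lo = -(1 << (width - 1))
    return max(lo, min(hi, value))


def fir_output(samples, coeffs, out_width):
    """One FIR output: samples[k] holds x[n-k], coeffs[k] holds h[k]."""
    acc = 0                                   # A_0 = 0
    for x_k, h_k in zip(samples, coeffs):     # one multiply-add per "cycle"
        acc += x_k * h_k                      # A_{i+1} = A_i + X_i * H_i
    return saturate(acc, out_width)           # y[n] = sat_W(A_N)


def min_acc_width(d, c, n_taps):
    """Accumulator bits per the bound d + c + ceil(log2(N))."""
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 1, without floats.
    return d + c + (n_taps - 1).bit_length()


if __name__ == "__main__":
    print(fir_output([1, 2, 3], [4, 5, 6], out_width=16))
    print(fir_output([127, 127], [127, 127], out_width=8))   # saturates
    print(min_acc_width(d=8, c=8, n_taps=4))
```

Python integers never overflow, so the model only exercises the final saturation step; the width bound is what guarantees the hardware accumulator behaves the same way.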


Repository

OpenPOWER FIR Accelerator Hackathon Submission


ASIC RTL2GDS Flow Projects | Link 1 | Link 2

A collection of beginner-friendly ASIC design experiments where I am still learning the full RTL to GDS flow. I automated synthesis steps, generated schematic views, pushed complete flows through OpenLane and OpenROAD, produced layout snapshots, and verified the final GDS outputs. Both designs behave as compact test vehicles for understanding physical design stages and polishing my flow setup.


ASIC RTL2GDS Practice Projects

Duration: Individual
Tools: Yosys | OpenLane | OpenROAD | Magic | KLayout

  • Built small RTL blocks and pushed them through end-to-end ASIC flows using Sky130 PDK.
  • Automated synthesis scripts to consistently generate mapped netlists, hierarchy views, and clean schematic diagrams.
  • Drove OpenLane through floorplanning, placement, routing, and signoff runs, adjusting config files and verifying logs as I progressed.
  • Produced DEF and GDS files, inspected them in KLayout and Magic, and validated the generated layouts.
  • Exported final visuals, converted SVGs to PNGs, and organized the repository for easy inspection.
  • Used both projects as training grounds to understand timing reports, flow stages, and PDN setup.
  • Overall goal is learning, improving, and building confidence with complete ASIC implementation flows.

Repository Cards
Ripple Carry Adder 4-bit | Inverter RTL2GDS

SHA256 Core Functional Verification | Link

A focused functional verification environment built around the open-source SHA256 core from secworks. I created a structured suite of directed, random, corner-case, and negative fail-case tests, automated through a TCL-driven flow that compiles, runs, checks, and aggregates results. The goal was to push the core through a wide coverage surface, verify digest correctness, exercise interface timing, and validate the robustness of the verification infrastructure itself.


Verification Summary

Duration: Individual
Tools: Verilog | Icarus Verilog | TCL automation

  • Implemented a unified verification environment with fully self-checking testbenches.
  • Automated all compilation and simulation using TCL so that every test runs in a single command.
  • Verified correct digest generation for standard, multi-block, random, and corner-case stimuli.
  • Stressed control behavior by injecting malformed sequences, undefined values, and reversed block ordering.
  • Collected per-test logs and a consolidated summary to rapidly detect regressions.
  • Treated testbench infrastructure as a verification target by validating mismatch detection and protocol violation handling.

Technical Summary

The SHA-256 core is verified directly against the mathematical definition of its compression function.
Each testbench checks that the DUT produces digests matching the iterative hashing rule

$H^{(0)} = H_{IV}$

and for each message block $M^{(i)}$:

$H^{(i+1)} = \mathcal{F}(H^{(i)}, M^{(i)})$

Round Computation

Each round computes two temporary values:

$T_1 = h + \Sigma_1(e) + Ch(e,f,g) + K_t + W_t$

$T_2 = \Sigma_0(a) + Maj(a,b,c)$

Then the working registers update as:

$h = g$
$g = f$
$f = e$
$e = d + T_1$
$d = c$
$c = b$
$b = a$
$a = T_1 + T_2$

Message Schedule

For each block the scheduler generates:

$W_t = M_t \quad (t < 16)$

$W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16} \quad (t \ge 16)$

Verification Focus

The environment validates:

  • correct initialization of internal hash state
  • correct update propagation through all 64 rounds
  • multi-block chaining behavior for long messages
  • final digest stability when digest_valid asserts
  • robustness under random, corner-case, and adversarial stimuli
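A software reference model is one way to produce the expected digests that such a testbench compares against. The sketch below implements the round computation and message schedule given above in plain Python and cross-checks the result against `hashlib`. It is an illustrative golden model, not part of the verification repository; the round constants and initial hash are derived from prime cube and square roots with exact integer arithmetic, and all helper names are hypothetical.

```python
import hashlib
import math

MASK = 0xFFFFFFFF


def first_primes(n):
    primes, cand = [], 2
    while len(primes) < n:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes


def icbrt(n):
    """Exact integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# Fractional bits of the cube roots / square roots of the first primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
H_IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One application of F(H, M) on a 64-byte block."""
    w = [int.from_bytes(block[4 * t:4 * t + 4], "big") for t in range(16)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    # Feed-forward: add the working registers back into the chaining state.
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg):
    padded = msg + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += (8 * len(msg)).to_bytes(8, "big")
    state = H_IV
    for i in range(0, len(padded), 64):       # multi-block chaining
        state = compress(state, padded[i:i + 64])
    return b"".join(v.to_bytes(4, "big") for v in state)


if __name__ == "__main__":
    for msg in (b"", b"abc", b"a" * 200):
        assert sha256(msg) == hashlib.sha256(msg).digest()
    print(sha256(b"abc").hex())
```

Dumping the intermediate `state` after each block gives per-block reference values for the multi-block chaining checks, not just the final digest.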

TCL Automation

The TCL flow executes a deterministic verification loop:

  1. compile all testbenches
  2. run all simulations
  3. compare expected vs actual digests
  4. merge logs into a single summary report

This provides a repeatable regression pipeline for SHA-256 functional verification.


Verification Coverage

  • Single-block digest generation for standard vectors
  • Multi-block message chaining and state propagation
  • Randomized 512-bit stimuli to probe broad input space behavior
  • Corner patterns including all-zero, all-one, and alternating data
  • Mode selection behavior across supported operational modes
  • Protocol timing correctness for control signals
  • Intentional failure injection to test mismatch handling and robustness

Automation Workflow

The TCL flow executes the entire suite:

compile all tbs
run all tests
collect logs
summarize results

This produces structured outputs with test names, expected vs actual digests, and overall regression status.


Repository Card

sha256 verification repository


Analog Circuits & Device-Level Design
Device Modeling using Sentaurus TCAD | Link

Performed semiconductor device modeling using Synopsys Sentaurus for foundational structures including N-type resistors, PN diodes, and NMOS transistors. Explored how doping profiles, junction depths, geometry parameters, and physical models impact device characteristics through calibrated simulations and scripted workflows.

Overview

  • Built parameterized device structures (concentration profiles, implant energies, lateral/vertical dimensions) using Sentaurus Structure Editor and process definition files.
  • Configured Sentaurus Device with transport and recombination models (SRH, Auger, mobility models, incomplete ionization where relevant) to study semiconductor behavior under applied bias.
  • Automated simulation runs in Sentaurus Workbench using command-based .cmd and .des scripts for sweeping doping levels, voltages, and geometry parameters.
  • Analyzed simulation output with Sentaurus Visual/Inspect, examining electrostatic potential maps, electron/hole concentration distributions, electric field intensity, and I–V characteristics.
  • Extracted device metrics such as diode forward/reverse characteristics, NMOS transfer/output curves, threshold behavior, and resistance scaling for the N-type resistor.

Repository

TCAD Projects - Sentaurus device modeling for N-resistor, PN diode, NMOS with doping/geometry parameterization

CMOS Inverter Layout (Magic VLSI) & Ngspice Simulation | Link

A complete CMOS inverter implementation built using Magic VLSI (SCMOS) for physical layout and ngspice for extracted-device simulation.
Covers device construction rules under the SCMOS process, physical layout of PMOS/NMOS devices, contact/tap structures, parasitic-aware extraction, and transient analysis of inverter switching characteristics.

The layout follows the SCMOS ruleset:

  • PMOS implemented inside an n-well using p-diffusion; body tied to the well tap (VDD).
  • NMOS implemented directly in the p-substrate using n-diffusion; body tied to substrate tap (GND).
  • Poly crossing active regions forms the MOS channel; poly, metal1, and contact stack-up follows SCMOS vertical connectivity.
  • Metal1 routes input/output rails; taps ensure reverse-biased junctions and latch-up prevention.

Extraction produces a transistor-level .spice netlist including geometry-derived parasitics.
Transient simulation evaluates:

  • Static noise margins and switching point displacement due to device sizing.
  • Rise/fall asymmetry from mobility difference (μₙ > μₚ, roughly 2–3× in bulk silicon).
  • Output slew vs. load capacitance and PMOS/NMOS drive ratio.
  • Propagation delays under 1.8 V operation using level-1 MOS models.

The repository includes the Magic layout (.mag), extracted netlists, wrapper files for stimulus, and generated ngspice waveforms.


Repository
CMOS Inverter Magic + ngspice Layout/Simulation (light) CMOS Inverter Magic + ngspice Layout/Simulation (dark)

Two-Stage CMOS Operational Amplifier with Miller Compensation | Link

A two-stage CMOS op-amp designed in TSMC 180 nm, using an NMOS differential input pair with PMOS current-mirror load, followed by a common-source second stage. Frequency compensation is implemented using a Miller capacitor between the first-stage output and the second-stage output node, producing dominant-pole behavior and stable unity-gain operation.

Device dimensions were set from closed-form analog constraints:

  • Slew-rate requirement → tail bias current and overdrive allocation
  • GBW requirement → input-pair transconductance and CC relationship
  • ICMR bounds → saturation margins for the differential pair and tail device
  • Output swing → overdrive and saturation limits for the second stage
  • Pole-splitting → ratio gₘ₆/CC and non-dominant pole placement

Simulation results:

  • Open-loop gain: ~53.1 dB
  • Unity-gain bandwidth: ~4.35 MHz
  • Dominant pole: ~9.6 kHz
  • Phase margin: ~60° with Miller compensation
  • Slew rate: ~10 V/µs from Ibias/CC
  • Output swing: ~0.14 V to ~1.03 V (linear region, no distortion at 1 kHz)
  • CMRR: ~32 dB
  • PSRR: +64.6 dB / –80.8 dB
  • Power consumption: ~1 mW with ±2.5 V rails
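
As a quick consistency check on the figures above, a single-pole approximation gives GBW ≈ A₀ · f_p1. The sketch below applies it to the reported gain and dominant pole. It assumes the non-dominant pole sits well beyond unity gain, so the result is only an estimate. The function names are illustrative.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio."""
    return 10 ** (gain_db / 20)

def gbw_from_dominant_pole(gain_db: float, f_p1_hz: float) -> float:
    """Single-pole estimate: GBW = A0 * f_p1."""
    return db_to_linear(gain_db) * f_p1_hz

gbw_hz = gbw_from_dominant_pole(53.1, 9.6e3)
print(f"estimated GBW: {gbw_hz / 1e6:.2f} MHz")
```

The estimate lands within about 1% of the simulated ~4.35 MHz unity-gain bandwidth.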

Operating-point analysis confirms all MOS devices remain in saturation with expected overdrive values, and both transient and AC characteristics match analytical pole/zero predictions for a Miller-compensated two-stage topology.


Repository

Two-Stage CMOS Op-Amp Repository Card (light mode) | Design and Analysis of Two-Stage CMOS Op-Amp with Miller Compensation Two-Stage CMOS Op-Amp Repository Card (dark mode) | Design and Analysis of Two-Stage CMOS Op-Amp with Miller Compensation

5-Stage CMOS Ring-Oscillator VCO | Link

A 5-stage CMOS inverter ring used as a voltage-controlled delay line, producing oscillation whose frequency scales with the control voltage. A 3-stage buffer isolates the oscillator core and restores the internal sine-like waveform into a full-swing CMOS square wave.

The oscillator operates from 0.7–3.0 V control input and shows a monotonic delay reduction with increasing drive strength.

Measured characteristics

  • Frequency range: 0.724–1.93 GHz
  • KVCO: ~2.1 GHz/V averaged over 0.7–1.2 V (steepest part of the tuning curve)
  • Frequency saturation: begins above ~1.8 V as inverter delay approaches its minimum
  • Core waveform: ~0.3–1.7 V swing with rounded edges
  • Buffered output: 0–1.8 V square wave, ~50% duty cycle
  • Startup time: ~0.5–0.8 ns to reach steady oscillation
  • Simulation sweep: confirmed monotonic f–V relation and early compression through parametric input stepping

Frequency points

| Vctrl (V) | f (GHz) |
|-----------|---------|
| 0.7       | 0.724   |
| 0.8       | 1.107   |
| 1.0       | 1.59    |
| 1.2       | 1.76    |
| 1.5       | 1.88    |
| 2.0       | 1.92    |
| 2.5       | 1.928   |
| 3.0       | 1.9298  |
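
The tuning gain and per-stage delay can be derived from the frequency points listed above. The sketch below computes the local K_VCO between consecutive sweep points, the average over 0.7–1.2 V, and the stage delay from the usual ring-oscillator relation f = 1 / (2·N·t_d). The helper names are hypothetical; the data are the tabulated values.

```python
# (V_ctrl in volts, frequency in GHz) from the frequency points listed above
POINTS = [(0.7, 0.724), (0.8, 1.107), (1.0, 1.59), (1.2, 1.76),
          (1.5, 1.88), (2.0, 1.92), (2.5, 1.928), (3.0, 1.9298)]
N_STAGES = 5

def segment_gains(points):
    """Local K_VCO (GHz/V) between consecutive sweep points."""
    return [(f1 - f0) / (v1 - v0)
            for (v0, f0), (v1, f1) in zip(points, points[1:])]

def average_gain(points, v_lo, v_hi):
    """Average K_VCO (GHz/V) between two tabulated control voltages."""
    table = dict(points)
    return (table[v_hi] - table[v_lo]) / (v_hi - v_lo)

def stage_delay_ps(f_ghz, n_stages=N_STAGES):
    """Per-stage delay in picoseconds from f = 1 / (2 * N * t_d)."""
    return 1e3 / (2 * n_stages * f_ghz)

print(f"average K_VCO, 0.7-1.2 V: {average_gain(POINTS, 0.7, 1.2):.2f} GHz/V")
print("local K_VCO per segment:", [round(g, 3) for g in segment_gains(POINTS)])
print(f"stage delay at 3.0 V: {stage_delay_ps(1.9298):.1f} ps")
```

The per-segment gains fall steadily with control voltage, which is the compression toward the minimum inverter delay described above.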

Repository

Ring Oscillator VCO Repository Card (light mode) | Design and SPICE simulation of a 5-stage CMOS inverter-based ring VCO with buffered output and multi-GHz tunability Ring Oscillator VCO Repository Card (dark mode) | Design and SPICE simulation of a 5-stage CMOS inverter-based ring VCO with buffered output and multi-GHz tunability

Analog Function Generator with Adjustable Amplitude/Offset/Phase | Link

A multi-waveform analog function generator built using discrete op-amp blocks (TL082), passive RC networks, and a CD4051 analog multiplexer. The generator produces sine, square, and triangular outputs and exposes continuous control of amplitude, DC offset, and phase. Additional AM/PM blocks and a relaxation-oscillator VCO extend the system for modulation experiments.

The signal path is fully modular: each block is buffered to avoid inter-stage loading errors, enabling predictable behavior across a 1 kHz–500 kHz operating band.

Measured characteristics

  • Waveforms: sine, square (<200 ns rise/fall), triangle
  • Frequency range: ~1 kHz → 500 kHz (Wien-bridge tuned)
  • Amplitude control: ±10 V
  • DC offset range: ±5 V
  • Phase control: 0°–160° (first-order all-pass)
  • Square-wave performance: clean CMOS-level transitions, rise/fall < 200 ns
  • Triangular output: linear ramps from integrator with controllable slope
  • Waveform switching: CD4051 mux with low ON-resistance routing
  • Hardware validation: TI ASLK Pro bench + LTspice simulations
  • Modulation: AM/PM blocks implemented as additive/multiplicative stages
  • VCO: relaxation-oscillator variant providing voltage-to-frequency behavior

Signal-generation architecture

  • Wien-bridge core → low-distortion sine
  • Schmitt trigger → rail-to-rail square
  • Op-amp integrator → triangular
  • CD4051 multiplexer → waveform selection
  • Offset summer → adjustable vertical shift
  • RC all-pass → continuous phase control
  • Unity-gain buffers → isolate every stage and preserve amplitude accuracy

Representative measurements

  • Sine output distortion minimal across most of the band; clean 1.55 kHz fundamental (LTspice + CRO)
  • Square-wave rise/fall < 200 ns across load conditions
  • Triangle linearity maintained through full amplitude range
  • Phase shift examples captured at 64°, 90°, and ~162° using tuned RC values
  • Offset correctness demonstrated for 0 V, +1 V, –1 V injected shifts
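
For the phase-control stage, a first-order all-pass has a phase magnitude of 2·arctan(2πfRC). The sketch below inverts that relation to find the resistance needed for a target shift. The 1.55 kHz test frequency and the 10 nF capacitor are assumptions chosen for illustration, not values taken from the build.

```python
import math

def allpass_phase_deg(f_hz, r_ohm, c_farad):
    """Phase-shift magnitude of a first-order all-pass, in degrees."""
    return 2 * math.degrees(math.atan(2 * math.pi * f_hz * r_ohm * c_farad))

def resistance_for_phase(phase_deg, f_hz, c_farad):
    """Resistance giving the requested phase magnitude (0 < phase < 180)."""
    return math.tan(math.radians(phase_deg) / 2) / (2 * math.pi * f_hz * c_farad)

F_TEST_HZ = 1550.0   # assumed test frequency
C_FARAD = 10e-9      # assumed capacitor value

for target in (64.0, 90.0, 162.0):
    r = resistance_for_phase(target, F_TEST_HZ, C_FARAD)
    print(f"{target:5.1f} deg -> R = {r / 1e3:.1f} kOhm")
```

Because the phase only approaches 180° asymptotically, the required resistance grows quickly near the top of the range, which is consistent with a practical limit around 160°.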

Repository

Function Generator Repository Card (light mode) | Basic implementation of a Function Generator that can generate sine, square, and triangular waves with amplitude, phase, and DC shift modulations Function Generator Repository Card (dark mode)

Precision PID Controller Design using Operational Amplifiers | Link

An analog PID controller built using high-linearity op-amps (LT1007 / TL082) and RC networks, implemented entirely in continuous time and validated through LTspice. The design focuses on stable low-frequency integration, controlled differentiation without noise peaking, and diode-based output limiting for robust transient behavior.

Two complete controller variants were implemented: one minimal, one extended with gain scaling and anti-windup.

Measured / designed characteristics

  • Differential stage: unity-gain differential amplifier with high CMRR for clean error sensing

  • Integrator: 10 ms time constant →

    • $K_i \approx 100\ \text{s}^{-1}$
    • $f_c \approx 16\ \text{Hz}$
    • Loop-gain boost ≈ 9.5 dB
  • Derivative network: RC shaping with controlled high-frequency roll-off to prevent noise amplification

  • Output swing protection: diode clamps maintaining bounded actuation signal under large transients

  • Op-amp choices: LT1007 for low noise and precision; TL082 as a low-cost, wide-bandwidth alternative

  • Simulation: full closed-loop Bode, transient, load-step and saturation recovery tests in LTspice
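
The integrator figures follow directly from the 10 ms time constant: K_i = 1/τ and f_c = 1/(2πτ). A minimal check is sketched below. The R and C values are one hypothetical pair that yields τ = 10 ms, not the components used in the design.

```python
import math

R_OHM = 100e3      # hypothetical resistor
C_FARAD = 100e-9   # hypothetical capacitor

tau = R_OHM * C_FARAD              # integrator time constant, seconds
k_i = 1 / tau                      # integral gain, 1/s
f_c = 1 / (2 * math.pi * tau)      # unity-gain (corner) frequency, Hz

print(f"tau = {tau * 1e3:.1f} ms, K_i = {k_i:.0f} 1/s, f_c = {f_c:.1f} Hz")
```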

Second PID variant

  • 10× front-end gain for small-signal plant feedback
  • Dual-integrator configuration for deeper low-frequency suppression
  • Anti-windup: diode shunts + soft-limiting network to prevent integrator runaway
  • Stable recovery under saturation and high-error conditions

Design intent

  • preserve linearity and phase margin across low-frequency operation
  • condition derivative action to avoid overshoot due to high-frequency noise
  • offer two architectures: a clean textbook PID and a high-authority PID with controlled limiting

Repository

PID Controller Repository Card (light mode) PID Controller Repository Card (dark mode)

Robotics and ML
ANAV for Martian Surface Exploration / GNSS-Denied Environments (ISRO IRoC-U 2025) | Link

A sub-2 kg autonomous quadrotor designed for GNSS-denied navigation, visual–inertial localization, mapping, and safe-zone landing, using onboard compute, stereo sensing, and redundant measurement sources.

Duration: Team-Based (ISRO RIG), Ongoing
Tools: Jetson Nano | Pixhawk 4 | RealSense D435i | ESP32 (ESP-Now) | ORB-SLAM3 | VINS-Fusion | ROS2


Autonomous Quadrotor for GPS-Denied Operation

  • Built a <2 kg quadrotor integrating Jetson Nano for onboard processing and Pixhawk 4 for attitude/stability, targeting GNSS-denied missions requiring drift-constrained localization and controlled landing.
  • Completed ESC calibration, thrust-balancing, and regulated 5 V / 3 A power distribution using BEC modules for stable sensor/compute operation under load variations.
  • Integrated barometer, optical flow, and stereo-IMU sensing for multi-source position estimation with fallbacks against low-texture drift.
  • Fused RealSense D435i stereo + IMU using VINS-Fusion (ROS2) and evaluated against ORB-SLAM3, achieving <5 cm drift over ~5 m trajectories in indoor GNSS-denied tests.
  • Implemented ESP-Now telemetry using ESP32 modules with ~500 m LOS range for transmitting state, estimation residuals, and system health.
  • Verified autonomous landing on 1.5 m × 1.5 m clear regions, tolerating surface inclinations up to ~15°.
  • Simulated Mars-like flight (~0.38 g gravity, no-GPS) in Webots, validating drift behavior, landing accuracy, sensing degradation, and control limits.

Technical Summary
  • Integrated Jetson Nano with Pixhawk 4 for onboard computation and flight handling, with calibrated ESCs and thrust mapping ensuring stable lift and attitude control for a <2 kg platform. Power regulation used a 5 V / 3 A BEC, isolating sensor/compute loads from motor-induced voltage drops.

  • Performed extrinsic and intrinsic calibration for the RealSense D435i (stereo + IMU) and aligned timestamps between Jetson and Pixhawk sources. Evaluated VIO accuracy using VINS-Fusion and ORB-SLAM3, testing sensitivity to feature density, motion blur, low-texture floors, and illumination. Achieved <5 cm drift over ~5 m sequences with optimized IMU noise parameters and RANSAC thresholds.

  • Connected barometer, optical-flow, and external sensors to Pixhawk over I2C/UART. Configured EKF2 to combine IMU, barometer, and flow when stereo data deteriorates. Implemented consistency checks between VIO and Pixhawk position estimates; deviations above a fixed threshold (~8–10 cm) trigger reliance on flow + barometer only.

  • Implemented long-range ESP-Now telemetry between two ESP32 modules. Achieved ~500 m line-of-sight operation and <15 ms median latency. Data included estimated position, VIO confidence, EKF residuals, battery, and attitude.

  • Developed a method for landing region selection using disparity maps and IMU tilt. Evaluated a 1.5 m × 1.5 m safe area requirement; system rejected regions with irregular height profiles or slopes >15°. Confirmed consistent landings on textured and partially textured surfaces.

  • Conducted GNSS-denied simulations in Webots, setting gravity to 0.38 g to approximate Martian conditions. Assessed altitude holding, drift accumulation, and safe-area approach across multiple terrains. Logged estimator drift, thrust reserve, and landing dispersion to validate repeatability under constrained sensing.
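
The VIO/EKF consistency check described above can be sketched as a simple distance gate. This is an illustration only: the function name, the source labels, and the 10 cm threshold (the upper end of the ~8–10 cm band) are assumptions, and the flight stack's actual logic may differ.

```python
import math

FALLBACK_THRESHOLD_M = 0.10  # assumed value, upper end of the ~8-10 cm band

def select_position_source(vio_xyz, ekf_xyz, threshold_m=FALLBACK_THRESHOLD_M):
    """Return (source, deviation): fall back to flow + barometer when the
    VIO and Pixhawk EKF position estimates disagree by more than the threshold."""
    deviation = math.dist(vio_xyz, ekf_xyz)
    source = "flow+baro" if deviation > threshold_m else "vio"
    return source, deviation

print(select_position_source((0.0, 0.0, 1.0), (0.03, 0.0, 1.0)))
print(select_position_source((0.0, 0.0, 1.0), (0.20, 0.0, 1.0)))
```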


Repositories

ANAV – ISRO IRoC-U 2025 Autonomous Drone System

Repository containing all files related to ISRO IRoC-U 2025 Robotics Challenge (telemetry, simulation, flight data)
Autonomous Drone for GNSS-Denied Environments – ROS2 setup, VIO, estimation, landing logic

RU83C – Rubik’s Cube Solving Robot | Link

A computer-vision–driven Rubik’s Cube solver built around color detection, face reconstruction, and algorithmic solution generation. The system extracts cube state using calibrated imaging and solves it via a Kociemba two-phase search, which operates over the full 43,252,003,274,489,856,000 (~4.3×10¹⁹) state space of a standard 3×3 cube.

  • Live Demo: Live Demo - interactive cube visualization and solver interface.
  • Blog Series (PID – Project in Detail): Blog Series - detailed explanation of the color-space math, permutation constraints, cube group theory, and the intuition behind the solving algorithm.

Vision Processing and Cube State Extraction

  • Performed HSV-based per-face calibration with adjustable saturation/value envelopes to stabilize under variable illumination.

  • Applied contour filtering and grid isolation after morphological denoising to lock onto a valid 3×3 cell arrangement.

  • Executed homography-based perspective correction and grid segmentation, assigning colors by mean-HSV dominance.

  • Combined all six captures into a canonical 54-character cube state string, checked for:

    • valid center-orientation mapping,
    • edge/corner permutation parity,
    • orientation sum constraints (edges mod 2, corners mod 3).
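
The parity and orientation checks can be sketched at the cubie level. The example assumes the 54-character facelet string has already been converted into corner/edge permutation and orientation arrays. That conversion is omitted here, and the function names are hypothetical.

```python
def permutation_parity(perm):
    """Return 0 for an even permutation, 1 for an odd one (cycle decomposition)."""
    seen = [False] * len(perm)
    parity = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        j, length = start, 0
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        parity ^= (length - 1) & 1
    return parity

def is_solvable(corner_perm, corner_ori, edge_perm, edge_ori):
    """Check the three reachability constraints of a 3x3 cube state."""
    return (
        sorted(corner_perm) == list(range(8))
        and sorted(edge_perm) == list(range(12))
        and sum(corner_ori) % 3 == 0          # corner twists sum to 0 mod 3
        and sum(edge_ori) % 2 == 0            # edge flips sum to 0 mod 2
        and permutation_parity(corner_perm) == permutation_parity(edge_perm)
    )

solved = (list(range(8)), [0] * 8, list(range(12)), [0] * 12)
print(is_solvable(*solved))
```

A state that fails any of these checks usually indicates a color misread in one of the six captures rather than a genuinely scrambled cube.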

Solution Generation and Simulation

  • Used a Unity visualization environment for state verification, stepwise execution, and intermediate-move replay.

  • Integrated Kociemba’s two-phase algorithm with explicit details:

    • Phase 1: reduces the cube into the H subgroup by constraining edge orientation (2¹¹ states), corner orientation (3⁷ states), and UD-slice edge placement (12 choose 4 = 495). The search covers 2¹¹ × 3⁷ × 495 ≈ 2.2×10⁹ coordinate states but prunes aggressively using precomputed coordinate tables.
    • Phase 2: solves from H to the identity using a restricted move set and coordinate-based distance tables over the 8! × 8! × 4! / 2 ≈ 1.95×10¹⁰ states of H.
  • Typical generated solutions fall in the 18–22 move range (half-turn metric), with occasional optimal-length sequences for favorable states.

  • Viewer supports interactive updates, solution playback, and direct manipulation of state representations.
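
The sizes of the two search spaces can be checked with exact integer arithmetic: the phase-1 coordinate space multiplied by the size of the H subgroup must equal the full cube state count. A short sketch:

```python
from math import comb, factorial

# Phase 1 coordinates: edge orientation x corner orientation x UD-slice placement
phase1_states = 2**11 * 3**7 * comb(12, 4)

# Phase 2 (subgroup H): corner perm x non-slice edge perm x slice edge perm, parity-linked
phase2_states = factorial(8) * factorial(8) * factorial(4) // 2

TOTAL_STATES = 43_252_003_274_489_856_000

print(f"phase 1: {phase1_states:,}")
print(f"phase 2: {phase2_states:,}")
assert phase1_states * phase2_states == TOTAL_STATES
```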


Repositories
RU83C: Rubik's Cube Solving Robot - Vision processing, cube state detection, solver integration
V-RU81K5CU83: Virtual Cube Visualization and Unity-based Solver
MRI-Based Alzheimer’s & MCI Classification using 3D CNNs | Link

Implemented a full 3D medical-imaging classification pipeline for Alzheimer’s, MCI, and cognitively normal subjects using PyTorch/MONAI.
Focused on volumetric preprocessing, stable normalization across scanners, and architecture search over 3D convolutional backbones.

  • Designed a unified DICOM/NIfTI preprocessing flow with voxel-size normalization, spatial reorientation, intensity Z-scoring, Nyúl histogram standardization, and optional radiomic-feature augmentation.
  • Built data transforms with 3D affine jitter, elastic deformation, anisotropic scaling, and bias-field augmentation to model scanner variability.
  • Implemented Med3DNet-style 3D CNNs with custom heads: channel-progressive blocks, SE/CBAM attention, depth-scheduled 3D convolutions, and dropout tuned via Bayesian optimization.
  • Used MONAI’s sliding-window inference, smart-cache loading, and mixed Gaussian/Rician noise regularization for stable training on full MRI volumes.
  • Performed Bayesian hyperparameter search over learning rates, kernel schedules, convolution depths, and ensemble configurations.
  • Achieved >93% accuracy on held-out structural MRI volumes with strong stability under cross-scanner shifts due to aggressive normalization and augmentation.

Tools: PyTorch • MONAI • NiBabel • 3D CNNs • Bayesian Optimization • Medical Image Preprocessing

PPO-Based Reinforcement Learning for Autonomous Racing on AWS DeepRacer | Link



Built and fine-tuned continuous-action PPO agents on AWS SageMaker for camera-based autonomous racing.
Focused on reward shaping, action-space optimization, and stability constraints that reduce off-track drift and maximize progress-per-step.
Achieved sub-2-minute lap times, reaching top global leaderboard ranks in 2024.

  • Trained end-to-end vision policies using clipped PPO (v4) with a shallow convolutional encoder and continuous steering/speed control.
  • Designed multiple reward families emphasizing centerline stability, heading agreement, curvature-aware waypoint tracking, and velocity-weighted progress.
  • Used distance-band shaping (0.1/0.25/0.5× track-width thresholds) to stabilize early learning and suppress divergence near edges.
  • Added steering smoothness constraints to reduce high-jerk trajectories while allowing aggressive straight-line acceleration.
  • Tuned PPO hyperparameters (entropy annealing, clipping ε, GAE λ, advantage normalization) to avoid policy collapse in long-horizon tasks.
  • Evaluated robustness under simulated perturbations via waypoint jitter, curvature sweeps, and speed-limit randomization.
  • Final optimized agent consistently produced <2 min laps, outperforming default baselines.
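
The distance-band shaping and steering-smoothness ideas can be sketched as a DeepRacer reward function. The `params` keys used are standard DeepRacer inputs. The reward values per band, the 15° steering threshold, and the 0.8 penalty factor are illustrative assumptions, not the tuned values behind the reported lap times.

```python
STEERING_THRESHOLD_DEG = 15.0  # assumed smoothness threshold

def reward_function(params):
    """Band-based centerline reward with a mild steering penalty."""
    if not params["all_wheels_on_track"]:
        return 1e-3

    track_width = params["track_width"]
    distance = params["distance_from_center"]

    # Bands at 0.1 / 0.25 / 0.5 of the track width around the centerline
    if distance <= 0.1 * track_width:
        reward = 1.0
    elif distance <= 0.25 * track_width:
        reward = 0.5
    elif distance <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3

    # Discourage large steering commands to reduce zig-zag trajectories
    if abs(params["steering_angle"]) > STEERING_THRESHOLD_DEG:
        reward *= 0.8

    return float(reward)

example = {"all_wheels_on_track": True, "track_width": 1.0,
           "distance_from_center": 0.2, "steering_angle": 5.0}
print(reward_function(example))
```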

Tools: AWS SageMaker • DeepRacer Simulator • Clipped PPO • Continuous RL • Policy Gradient Optimization


Repositories
AWS DeepRacer RL Models Repo (light) AWS DeepRacer RL Models Repo (dark)
Autonomous Multi-Sensor Robot Simulation (GPS/IMU/LiDAR/2-DOF Vision) | Link

A fully simulated 4-wheel autonomous robot equipped with GPS, 9-axis IMU, 2-D LiDAR, ultrasonic distance sensors, and a 2-DOF camera system (linear + rotary actuation).
Implements global-position tracking, local mapping, object detection via camera streams, and reactive obstacle avoidance with minimal control logic.
All sensing, actuation, and navigation behaviors are implemented inside the simulation stack.

  • 4-wheel ground platform with independent velocity control for smooth turn/translation behavior.
  • GPS provides global (x,y) estimates; IMU provides orientation & angular velocity; LiDAR provides local ranging for free-space detection.
  • 2-DOF camera module (linear rail + rotary joint) models active vision for object detection and viewpoint planning.
  • Distance sensors around the chassis give short-range obstacle feedback for collision-free local motion.
  • Robot supports simple teleoperation mappings (↑ ↓ ← → for locomotion; W/S/A/D for camera actuation) and autonomous wandering modes.
  • Designed as a baseline multi-sensor testbed for evaluating classic robotics behaviors without advanced SLAM or learning methods.

Repositories
Main Autonomous Robot Simulation Repository (light) Main Autonomous Robot Simulation Repository (dark)

Differential Drive Robot Line Follower Robot
Obstacle Avoidance Robot Wall Follower Robot

Differential-Drive Kinematics & Odometry Robot

A two-wheel system where encoder increments produce linear and angular motion through standard differential-drive relations.

  • Wheel displacements follow $$\Delta s_L = r\,\Delta\phi_L,\qquad \Delta s_R = r\,\Delta\phi_R,$$ with $r = 0.025\,\text{m}$.
  • Linear and rotational increments arise from $$v = \tfrac{\Delta s_L + \Delta s_R}{2},\qquad \omega = \tfrac{\Delta s_R - \Delta s_L}{b},$$ where $b = 0.09\,\text{m}$.
  • Pose $(x,y,\theta)$ updates through $$x' = x + v\cos\theta,\qquad y' = y + v\sin\theta,\qquad \theta' = \theta + \omega.$$
  • Encoder drift directly affects integration of $(\Delta x,\Delta y,\Delta\theta)$, giving the accumulated trajectory purely from wheel motion.
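
The odometry relations above translate into a single update step per encoder sample. The sketch below uses the stated wheel radius and wheel separation. The function name and the tuple-based pose are illustrative choices.

```python
import math

WHEEL_RADIUS_M = 0.025
WHEEL_BASE_M = 0.09

def odometry_step(pose, dphi_left, dphi_right, r=WHEEL_RADIUS_M, b=WHEEL_BASE_M):
    """Advance (x, y, theta) by one pair of encoder angle increments (radians)."""
    x, y, theta = pose
    ds_left, ds_right = r * dphi_left, r * dphi_right
    v = (ds_left + ds_right) / 2          # linear increment
    omega = (ds_right - ds_left) / b      # angular increment
    return (x + v * math.cos(theta), y + v * math.sin(theta), theta + omega)

pose = (0.0, 0.0, 0.0)
pose = odometry_step(pose, 2 * math.pi, 2 * math.pi)   # one full turn on both wheels
print(pose)
```

Equal increments move the robot straight ahead by one wheel circumference. Opposite increments rotate it in place.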
Line-Follower Robot

A two-sensor contrast system that adjusts wheel velocities according to inequalities between left and right reflectance values.

  • Let $I_L$ and $I_R$ denote the two IR readings. Straight motion occurs when $$|I_L - I_R| \approx 0.$$
  • Left steering triggered by $$I_L > I_R,\qquad I_L \in [I_{\min},I_{\max}],$$ implemented by reducing or reversing the left wheel.
  • Right steering triggered by $$I_R > I_L,\qquad I_R \in [I_{\min},I_{\max}],$$ applied symmetrically to the right wheel.
  • The motion law is a simple state determined by the ordering of sensor values: $$I_L \gtrless I_R \;\Longrightarrow\; \text{turn left/right},\qquad I_L \approx I_R \;\Longrightarrow\; \text{forward}.$$
Obstacle-Avoidance Robot

A proximity-based motion rule where wheel speeds depend on whether any sensor exceeds a threshold.

  • Six sensors yield values $p_i$. Forward motion holds when $$p_i \le \tau\ \ \forall i,$$ for some threshold $\tau$.
  • If any sensor satisfies $$p_j > \tau,$$ the left wheel reverses, creating a turning motion away from the detected obstacle.
  • The velocity pair $(v_L,v_R)$ is therefore piecewise: $$(v_L,v_R)= \begin{cases} (v_{\max},\,v_{\max}), & \max_i p_i \le \tau,\\[4pt] (-v_{\max},\,v_{\max}), & \max_i p_i > \tau. \end{cases}$$
  • Exploration emerges purely from repeated evaluation of $\max_i p_i$ and switching of wheel direction.
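
The piecewise rule reduces to a few lines of code. In this sketch the function name is hypothetical, and the threshold and speed in the example calls are placeholder numbers.

```python
def obstacle_avoidance_command(readings, tau, v_max):
    """Return (v_left, v_right): reverse the left wheel if any sensor exceeds tau."""
    if max(readings) > tau:
        return (-v_max, v_max)
    return (v_max, v_max)

print(obstacle_avoidance_command([10, 20, 15, 5, 0, 12], tau=80, v_max=6.28))
print(obstacle_avoidance_command([10, 95, 15, 5, 0, 12], tau=80, v_max=6.28))
```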
Wall-Follower Robot

A proximity-driven motion rule based on simple comparisons involving front-facing and left-side sensor values.

  • Let $f$ denote the front sensor reading and $\ell$ the left sensor reading (threshold $\tau$). $$\text{front wall: } f>\tau,\qquad \text{left wall: } \ell>\tau.$$
  • Turning in place arises when $$f>\tau,$$ implemented as $$(v_L,v_R)=(v_{\max},-v_{\max}).$$
  • Forward motion occurs when $$f\le\tau,\quad \ell>\tau,$$ giving $$(v_L,v_R)=(v_{\max},\,v_{\max}).$$
  • Left steering (back toward the wall) occurs when $$f\le\tau,\quad \ell\le\tau,$$ with $$(v_L,v_R)=(\tfrac{1}{8}v_{\max},\,v_{\max}).$$
  • Position estimates use $$\Delta s = \tfrac{s_L+s_R}{2},\qquad \theta = \tfrac{s_R - s_L}{d},$$ $$x' = x + \Delta s\cos\theta,\qquad y' = y + \Delta s\sin\theta,$$ allowing detection of when $(x,y)$ enters the target region.
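
The three wall-following cases can be sketched as one decision function. The function name is illustrative, and the threshold and speed in the example are placeholders.

```python
def wall_follower_command(front, left, tau, v_max):
    """Return (v_left, v_right) for a left-wall follower."""
    if front > tau:
        # Wall ahead: spin in place (left wheel forward, right wheel backward)
        return (v_max, -v_max)
    if left > tau:
        # Wall on the left and path clear: drive straight
        return (v_max, v_max)
    # Wall lost: slow the left wheel so the robot arcs back toward it
    return (v_max / 8, v_max)

for front, left in [(90, 10), (10, 90), (10, 10)]:
    print(front, left, wall_follower_command(front, left, tau=80, v_max=8.0))
```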
Robotrix-2k25 - Stereo Vision Based 3D Hoop Control | Link

A simulation-based control project developed for the Robotrix-2k25 finals.
A ball is shot in random directions and with varying forces, and the robot must reposition a 3-axis hoop to intercept it.
Ball position cannot be accessed directly; only two stereo cameras mounted on the backboard are available.
Ball 3D position is reconstructed via color segmentation, stereo disparity, and camera→world transforms, followed by 3D trajectory prediction and PID-driven actuator control.


Stereo Vision + 3D Reconstruction + Predictive Control Workflow

  • Detect the ball using HSV color filtering on two synchronized camera frames.
  • Extract pixel centroids from both sensors and compute disparity $d = x_l - x_r$.
  • Recover depth using the stereo pinhole model $Z = (fB)/d$.
  • Reconstruct $(X,Y,Z)$ in camera coordinates and map into world frame using fixed transforms.
  • Estimate velocity from successive frames and predict future positions using projectile equations.
  • Command the hoop actuators through 3 PID controllers (X, Y, Z) to align with the predicted intercept point.
  • Loop at camera frame rate until shot completes.

Technical Summary

Ball detection uses HSV thresholding around the known orange color signature:

$\mathrm{lower}_{\mathrm{hsv}} = (h_{min}, s_{min}, v_{min})$
$\mathrm{upper}_{\mathrm{hsv}} = (h_{max}, s_{max}, v_{max})$

Contours and circle fitting yield pixel centers $(x_l,y_l)$ and $(x_r,y_r)$.

Stereo disparity:

$d = x_l - x_r$

Depth recovery:

$Z = \frac{f \cdot B}{d}$

3D coordinates in camera frame:

$X = \frac{(x - c_x) Z}{f}$
$Y = \frac{(y - c_y) Z}{f}$

Coordinates are transformed into the world frame:

$P_{world} = R P_{cam} + t$

Relative ball→hoop position:

$P_{rel} = P_{ball} - P_{hoop}$
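
The disparity, depth, back-projection, and frame-transform equations above can be combined into a short triangulation sketch. The camera parameters in the example (focal length in pixels, baseline, principal point) and the identity rotation are hypothetical values, and rectified, horizontally aligned cameras are assumed.

```python
def triangulate(x_l, y_l, x_r, f, baseline, c_x, c_y):
    """Recover (X, Y, Z) in the left-camera frame from matched pixel centroids."""
    d = x_l - x_r                      # disparity in pixels
    if d <= 0:
        raise ValueError("non-positive disparity: point at or beyond infinity")
    z = f * baseline / d
    return ((x_l - c_x) * z / f, (y_l - c_y) * z / f, z)

def to_world(p_cam, rotation, translation):
    """Apply P_world = R * P_cam + t with a 3x3 nested-list rotation."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

p_cam = triangulate(x_l=340, y_l=240, x_r=320, f=500.0, baseline=0.1, c_x=320, c_y=240)
p_world = to_world(p_cam, IDENTITY, (0.0, 0.0, 1.0))
print(p_cam, p_world)
```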

Velocity estimation (finite differences):

$v_x = (X_2 - X_1)/\Delta t$
$v_y = (Y_2 - Y_1)/\Delta t$
$v_z = (Z_2 - Z_1)/\Delta t$

Projectile prediction:

$x(t) = v_x t + x_0$
$y(t) = v_y t + y_0$
$z(t) = v_z t + z_0 - \tfrac{1}{2} g t^2$

The target hoop position is chosen where the predicted trajectory intersects the hoop’s capture volume.
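
One way to sketch the prediction step is to estimate velocity from two reconstructed positions and solve the vertical equation for the moment the ball comes back down to a chosen height. Treating the capture volume as a horizontal plane at height `z_hoop` is a simplifying assumption made here, and all names are illustrative.

```python
import math

G = 9.81  # m/s^2, gravity along -z

def estimate_velocity(p1, p2, dt):
    """Finite-difference velocity between two 3D positions."""
    return tuple((b - a) / dt for a, b in zip(p1, p2))

def predict_position(p0, v, t):
    """Ballistic position after time t."""
    return (p0[0] + v[0] * t, p0[1] + v[1] * t, p0[2] + v[2] * t - 0.5 * G * t * t)

def intercept_at_height(p0, v, z_hoop):
    """Return (t, position) where the ball crosses z_hoop on its way down, or None."""
    disc = v[2] ** 2 + 2 * G * (p0[2] - z_hoop)
    if disc < 0:
        return None                       # trajectory never reaches that height
    t = (v[2] + math.sqrt(disc)) / G      # later (descending) root
    return t, predict_position(p0, v, t)

v = estimate_velocity((0.0, 0.0, 1.0), (0.1, 0.0, 1.4905), dt=0.1)
print(intercept_at_height((0.0, 0.0, 1.0), (1.0, 0.0, 4.905), z_hoop=1.0))
```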

PID control per axis:

$u(t) = K_p e(t) + K_i \int e(t) dt + K_d \frac{de(t)}{dt}$

Errors:

$e_x = x_{target} - x_{hoop}$
$e_y = y_{target} - y_{hoop}$
$e_z = z_{target} - z_{hoop}$

Each PID output drives its respective sliding joint through velocity commands.
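
A discrete-time version of the per-axis control law is sketched below: rectangular integration and a backward-difference derivative, one controller instance per axis. The class name and the gains are placeholders, not the tuned values from the project.

```python
class PID:
    """Minimal discrete PID: u = Kp*e + Ki*sum(e*dt) + Kd*(e - e_prev)/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        # No derivative kick on the first sample, since there is no previous error
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One controller per sliding joint; gains here are placeholders
controllers = {axis: PID(kp=2.0, ki=0.1, kd=0.05) for axis in ("x", "y", "z")}
target = {"x": 0.4, "y": -0.1, "z": 1.2}
hoop = {"x": 0.0, "y": 0.0, "z": 1.0}
commands = {a: controllers[a].update(target[a] - hoop[a], dt=0.02) for a in controllers}
print(commands)
```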


Repository

robotrix-2k25 repository card (light) robotrix-2k25 repository card (dark)


Smart Vision Grocery Quality & Quantity Analysis - Flipkart GRiD 6.0 | Link

Cleared Round 1 of GRiD 6.0

A vision-based system for automated grocery-item quality and quantity assessment.
The solution integrates a unified dataset (multiple Roboflow sources aggregated and re-annotated) and a consolidated training run using a CNN-based detector (YOLOv7 backbone). The pipeline evaluates produce freshness, packaging correctness, text/OCR extraction, and item count/brand verification.


Core Capabilities Implemented

  • Constructed a merged multi-domain dataset (FMCG, produce, OTC, personal care, household items) using Roboflow pipelines; standardized annotations across label types.
  • Designed a complete smart vision quality pipeline following the GRiD specification:
    • High-resolution image acquisition with normalization, light balancing, and noise suppression.
    • Preprocessing: brightness/contrast normalization, color correction, background segmentation.
    • OCR extraction using contour-guided ROI selection for brand name, pack size, label info, MRP, expiration dates.
    • Freshness scoring for fruits/vegetables via color-shift analysis, texture deviation metrics, spoilage cue detection, and geometric deformation checks.
    • Classification using deep CNN feature embeddings + auxiliary SVM for edge cases requiring shallow decision boundaries.
    • Brand recognition & count estimation using object-level shape/size features, multi-crop inference, and IR-style logical rules (simulated) as required by the event’s Use Case 3.
  • Implemented text-driven quality/validity validation:
    • Extract label text and MRP/expiry-printed data.
    • Run OCR confidence filtering and text-normalization passes.
  • Designed a decision engine cross-checking extracted attributes against the product database (brand, freshness index, label validity, count correctness).
  • Added a continuous data logging + feedback loop as recommended in event guidelines for improving classification reliability over time.
  • Submitted simulation videos demonstrating:
    • OCR output validation
    • Freshness detection on vegetables/fruits
    • Packaging/label integrity checks
    • Count & product-category recognition

Technical Summary

The system follows the GRiD 6.0 Smart Vision architecture:

Image Acquisition:
Uniform lighting normalization, noise filtering, and contrast stabilization ensure consistent input quality.

Preprocessing Pipeline:
Images undergo intensity normalization, edge-aware smoothing, and segmentation to isolate foreground products.
This supports text regions, geometric features, and surface attributes required for OCR and quality scoring.

Feature Extraction:
Text regions are processed using OCR; geometric features (edges, contours, size ratios), color-space transformations, and texture descriptors support defect/freshness detection.
Deep CNN embeddings from the trained model are used for brand/category classification, while SVM layers assist with high-similarity items.

Classification and Decision Rules:
Outputs are checked against a product database for correctness.
Freshness of produce uses color variance, texture irregularities, bruise signatures, and abnormal shape metrics.
Count estimation uses object-level consistency checks aligned with the event’s “IR-based counting” specification.

Output & Feedback:
Detected attributes (brand, count, OCR text, expiry date, freshness index) are logged.
A feedback loop stores misclassified samples for incremental dataset improvement.


Repository

GRiD-6.0-2k24 repository card (light) GRiD-6.0-2k24 repository card (dark)


$ env --familiar | grep TOOLS

Python C C++ Java Verilog Icarus Verilog Vivado Yosys OpenROAD Skywater PDK Sentaurus TCAD Tcl/Tk Perl Bash MATLAB LTSpice NGSpice QUCS Proteus Simulink KLayout KiCad Arduino Raspberry Pi ROS OpenCV YOLO Mission Planner Webots Gazebo V-REP TensorFlow PyTorch LaTeX Markdown Vim VS Code Git Docker Linux PowerShell Blender Unity

$ git stats --all

GitHub Views Total Repos GitHub User's stars

mummanajagadeesh trophies (light) mummanajagadeesh trophies (dark)

Stats (light) Streak (light) Top Languages (light)

Stats (dark) Streak (dark) Top Languages (dark)

$ sudo wisdom

Motivational Quote (light)

Motivational Quote (dark)

Pinned

  1. inverter-rtl-2-gdsii inverter-rtl-2-gdsii Public

    Complete ASIC design flow for a Simple Inverter — from RTL (Verilog) to GDSII — using the OpenLane toolchain and the SkyWater 130nm PDK (sky130)

    Verilog 1

  2. gpbot-w gpbot-w Public

    This 4-wheeled robot is equipped with GPS, IMU, LiDAR, Distance Sensors, and a 2-DOF camera (using linear and rotary actuators). It detects objects using computer vision, avoids obstacles, and navi…

    C++ 2 1

  3. OpenPOWER-HW-Design-Hackathon OpenPOWER-HW-Design-Hackathon Public

    A hardware FIR accelerator with configurable taps and coefficients, exposing a Wishbone-Lite memory-mapped interface. Input samples are processed via a sequential MAC arch with deterministic latenc…

    Verilog 1

  4. aws-deepracer-rl-models aws-deepracer-rl-models Public

    Collection of trained reinforcement learning models for autonomous car racing on AWS DeepRacer. Made it to leaderboards with <1min lap time

    Python 1

  5. robotrix-2k25 robotrix-2k25 Public

    Basket the Ball – A robotics project built during the 24-hour IEEE Robotrix 2K25 Hackathon by NITK

    Python 1

  6. ORIGO2K25 ORIGO2K25 Public

    Virtual handout and website for ORIGO 2025 — the annual robotics workshop hosted at NIT Calicut

    HTML 2