Organizations

@harvard-acc

Statically scheduled Neural Processing Unit

Python 7 2 Updated Jun 15, 2026

Configurable low-precision floating-point and microscaling hardware in Chisel

Scala 13 Updated May 28, 2026

Generating Schedules for Robotic Workloads

Python 4 Updated Jun 8, 2026

Software kernels for the Radiance GPU

C++ 3 1 Updated Jun 8, 2026

Functional model for Radiance

Rust 4 1 Updated Jun 15, 2026

MLIR-in as a compiler stack for ucb-bar

MLIR 16 3 Updated May 28, 2026
Python 6 6 Updated Jun 1, 2026

Verilog hardware abstraction library

Python 53 8 Updated May 24, 2026
2 Updated Feb 22, 2026

SP26 Floating Point Units Generator

Scala 4 7 Updated May 6, 2026

A simple perf model for NPU.

Python 10 4 Updated May 6, 2026

This repo contains the skeleton scripts for running a full RTL2GDS flow using Cadence tools, as demonstrated in the Full RTL2GDS Demo prepared and delivered by Prof. Adam (Adi) Teman.

Assembly 77 15 Updated Oct 18, 2025

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ 980 83 Updated May 28, 2026

Allo Accelerator Design and Programming Framework (PLDI'24)

Python 386 69 Updated May 13, 2026
C++ 2 5 Updated Jun 12, 2026

A hardware–software co-design framework for developing and characterizing extended reality (XR) workloads on embedded systems-on-chip (SoCs).

2 Updated Nov 27, 2025

EE194 Lab 0: Chisel Crash Course

Jupyter Notebook 7 1 Updated May 24, 2026

Lab manual EECS151 Tapeout Decal. Public view - release to main branch only once ready.

Verilog 10 8 Updated Oct 7, 2025

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp 2,391 295 Updated Jun 15, 2026

Model-predictive control for microcontrollers (fork mapping tinyMPC to gemmini or other HW accelerators)

C++ 9 2 Updated Oct 13, 2025

A Heterogeneous GPU Platform for AI and Neural Graphics

Scala 60 4 Updated Jun 15, 2026

Autocomp: Optimize any AI kernel, anywhere.

Python 137 14 Updated Jun 15, 2026

A submodule of Chipyard https://github.com/ucb-bar/chipyard

HTML 21 19 Updated Apr 28, 2026

An open-source UCIe implementation

Scala 115 24 Updated Jun 13, 2026

Cluster-level matrix unit integration into GPUs, implemented in Chipyard SoC

Scala 56 10 Updated Jan 20, 2026
Scala 35 2 Updated Nov 6, 2024

Chisel RISC-V Vector 1.0 Implementation

Assembly 147 34 Updated Apr 23, 2026

Tool for converting PyTorch models into raw C code with minimal dependencies and some performance optimizations.

C 48 8 Updated Sep 1, 2025

LLM training in simple, raw C/CUDA

Cuda 30,226 3,646 Updated Jun 26, 2025
Verilog 2,098 491 Updated Jun 15, 2026