CASCADE: A Composable Analytical System of Chiplets for AI Devices at the Edge

Democratizing Customization for ML at the Edge through Hetero-Chiplet SiP Architectures

Authors: Matthew Joseph Adiletta, Gu-Yeon Wei, and David Brooks
Affiliation: Harvard University, Paulson School of Engineering and Applied Sciences
Publication: IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), 2025

Overview

CASCADE is an early macro-architecture Design Space Exploration (DSE) framework for System-in-Package (SiP) architectures built from heterogeneous compute chiplets. It enables rapid evaluation of chiplet compositions for machine learning workloads by combining analytical roofline models with a dynamic kernel-mapping optimizer.

The framework models composable chiplets—including GPU, Convolution, Sparse, and Attention accelerators—each representing distinct computational motifs critical to modern ML applications such as vision, language, graph, and recommendation models.

CASCADE captures the performance, bandwidth, and energy trade-offs in multi-chiplet systems, demonstrating that hetero-chiplet SiPs can achieve 3–5× performance speedups over homogeneous GPU-only baselines, depending on the workload.

Features

Analytical chiplet performance models
Compute and data-movement models based on kernel-specific roofline bounds calibrated to real hardware.
Composable chiplet menu
GPU, Convolution, Sparse, and Attention chiplets, each extending a shared analytical base model.
Dynamic kernel mapping engine
Automatically optimizes workload partitioning across heterogeneous chiplets using constrained optimization.
Fast macro-architecture DSE
Enables rapid evaluation of design points—orders of magnitude faster than cycle-accurate simulation.
Support for multi-tenant edge workloads
Evaluate single-, dual-, and multi-tenant chiplet configurations for edge AI deployments.
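The roofline bound mentioned in the feature list can be sketched in a few lines. This is a minimal sketch that assumes a single compute roof and a single memory roof per chiplet. ChipletSpec, roofline_time, and is_compute_bound are hypothetical names, not CASCADE's API, and the numbers are made up. CASCADE's calibrated, kernel-specific models are likely more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipletSpec:
    """Hypothetical peak capabilities of one chiplet (illustrative numbers only)."""
    name: str
    peak_flops: float  # peak compute throughput, FLOP/s
    mem_bw: float      # peak memory bandwidth, bytes/s


def roofline_time(flops: float, bytes_moved: float, chip: ChipletSpec) -> float:
    """Lower-bound runtime (s) of one kernel: the slower of compute and data movement."""
    compute_time = flops / chip.peak_flops
    memory_time = bytes_moved / chip.mem_bw
    return max(compute_time, memory_time)


def is_compute_bound(flops: float, bytes_moved: float, chip: ChipletSpec) -> bool:
    """True when the kernel's arithmetic intensity reaches the chiplet's ridge point."""
    intensity = flops / bytes_moved         # FLOP/byte of the kernel
    ridge = chip.peak_flops / chip.mem_bw   # FLOP/byte where the roof bends
    return intensity >= ridge


# Usage with made-up numbers: a 10 TFLOP/s, 100 GB/s chiplet has a ridge of 100 FLOP/B.
gpu = ChipletSpec("gpu", peak_flops=10e12, mem_bw=100e9)
print(roofline_time(2e9, 10e6, gpu), is_compute_bound(2e9, 10e6, gpu))
```

A kernel of 2 GFLOP over 10 MB has an intensity of 200 FLOP/B, so on this hypothetical chiplet its runtime is set by the compute roof rather than by memory bandwidth.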

Installation

CASCADE uses Python 3.10+ and common scientific libraries.

# 1. Create and activate the environment
conda create -n cascade python=3.10
conda activate cascade

# 2. Install dependencies
pip install matplotlib scipy pandas tqdm

Running Experiments

All experiments are launched from the dse/ directory.

cd dse
./run.sh

`run.sh` Overview

WORKSPACE=../
TRACE_DIR=$WORKSPACE/traces
CHIPLET_LIBRARY=$WORKSPACE/dse/chiplet-library

# Example experiment configuration (change as needed)
EXPERIMENT_DIR=$WORKSPACE/dse/experiments/gpt-j-1024-weighted.json
# EXPERIMENT_DIR=$WORKSPACE/dse/experiments/resnet50-test.json
# EXPERIMENT_DIR=$WORKSPACE/dse/experiments/ogbn-products-test.json
# EXPERIMENT_DIR=$WORKSPACE/dse/experiments/sd-test.json

python test_system_eval.py \
    --chiplet-library=$CHIPLET_LIBRARY \
    --trace-dir=$TRACE_DIR \
    --experiment=$EXPERIMENT_DIR
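run.sh evaluates one experiment at a time, and you select it by commenting lines in and out. To sweep all four example configurations instead, a small Python driver like the one below could issue the same command line for each. This is a hypothetical helper: build_command and run_all are not part of the repository. It assumes you run it from dse/ on a POSIX system and that the JSON files listed in run.sh exist.

```python
import subprocess
import sys
from pathlib import Path

# Same relative layout that run.sh assumes (run from the dse/ directory).
WORKSPACE = Path("..")
TRACE_DIR = WORKSPACE / "traces"
CHIPLET_LIBRARY = WORKSPACE / "dse" / "chiplet-library"
EXPERIMENTS = [
    "gpt-j-1024-weighted.json",
    "resnet50-test.json",
    "ogbn-products-test.json",
    "sd-test.json",
]


def build_command(experiment: str) -> list[str]:
    """Return the argument list run.sh would pass for one experiment JSON."""
    return [
        sys.executable,  # the interpreter of the active (e.g. conda) environment
        "test_system_eval.py",
        f"--chiplet-library={CHIPLET_LIBRARY}",
        f"--trace-dir={TRACE_DIR}",
        f"--experiment={WORKSPACE / 'dse' / 'experiments' / experiment}",
    ]


def run_all() -> None:
    """Run every experiment in sequence; stop at the first non-zero exit code."""
    for experiment in EXPERIMENTS:
        print(f"=== {experiment} ===", flush=True)
        subprocess.run(build_command(experiment), check=True)
```

Calling run_all() from a script in dse/ runs the four evaluations back to back. With check=True, a failed run raises subprocess.CalledProcessError instead of silently continuing. Drop it if you would rather collect partial results.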

Each experiment JSON defines:

Workload traces (from ML workloads such as GPT-J, ResNet50, Stable Diffusion, and OGBN-Products)
Target design parameters (number of chiplets, bandwidth, power, etc.)

Configuring Chiplet Selections

Chiplet counts are set in the Python evaluation script (e.g., test_system_eval.py):

# Hyperparameters for chiplet selection
# Specify a hub-and-spoke system with 6 GPU chiplets and 6 Sparse chiplets
GPU     = 6
ATTEN   = 0
SPARSE  = 6
CONV    = 0

Adjust these to control the composition of your SiP system.
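Each design point is just four counts, so a composition sweep is easy to enumerate. The generator below is a hypothetical helper, not part of CASCADE. It assumes a fixed total chiplet budget (12 here, matching the 6 GPU + 6 Sparse example) and yields every way to split that budget across the four types. It does not cover how you would feed each point to the evaluator, for example by rewriting the GPU/ATTEN/SPARSE/CONV values.

```python
from itertools import product
from typing import Iterator

CHIPLET_TYPES = ("GPU", "ATTEN", "SPARSE", "CONV")


def compositions(total: int, types: tuple[str, ...] = CHIPLET_TYPES) -> Iterator[dict[str, int]]:
    """Yield every assignment of non-negative counts per chiplet type summing to `total`."""
    # Choose counts for all but the last type; the last type takes the remainder.
    for head in product(range(total + 1), repeat=len(types) - 1):
        remaining = total - sum(head)
        if remaining >= 0:
            yield dict(zip(types, (*head, remaining)))


points = list(compositions(12))
print(len(points))  # 455
```

For four types the count is C(total + 3, 3), so it grows quickly with the budget. This is where fast analytical models pay off relative to cycle-accurate simulation.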

Each chiplet class in chiplet-library/ extends a base_chiplet model implementing:

Compute performance via roofline analysis
Memory and interconnect bandwidth models
Optimization routines for kernel sharding and workload mapping
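The sketch below illustrates what a kernel-sharding routine can look like. It splits one kernel's work across chiplets so that the slowest chiplet finishes as early as possible, which minimizes the makespan. This is a swapped-in formulation: a small linear program solved with scipy.optimize.linprog using the HiGHS backend. It is not CASCADE's actual optimizer, which may also account for interconnect transfers, memory capacity, and multi-tenant constraints. shard_kernel and the per-chiplet rates are hypothetical; in practice the rates could come from roofline bounds.

```python
import numpy as np
from scipy.optimize import linprog


def shard_kernel(work: float, rates: list[float]) -> tuple[np.ndarray, float]:
    """Split `work` units of one kernel across chiplets to minimize the makespan.

    rates[i] is chiplet i's effective throughput for this kernel (work units / s).
    Returns (work assigned to each chiplet, makespan in seconds).
    """
    n = len(rates)
    # Decision variables x = [w_0, ..., w_{n-1}, T]; the objective is to minimize T.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    # Every chiplet must finish within T:  w_i / r_i - T <= 0
    a_ub = np.zeros((n, n + 1))
    for i, r in enumerate(rates):
        a_ub[i, i] = 1.0 / r
        a_ub[i, -1] = -1.0
    b_ub = np.zeros(n)
    # All of the kernel's work must be assigned:  sum_i w_i = work
    a_eq = np.ones((1, n + 1))
    a_eq[0, -1] = 0.0
    b_eq = np.array([work])
    res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], float(res.x[-1])


shares, makespan = shard_kernel(100.0, [1.0, 3.0])
print(shares, makespan)  # approximately [25, 75] and 25.0
```

For this simple objective the optimum is the closed-form proportional split w_i = W r_i / Σ r_j, so an LP is more than strictly needed. Its advantage is that you can add further linear constraints, such as per-chiplet capacity limits or fixed per-chiplet overheads, as extra rows of A_ub without changing the solver call.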

Repository Structure

cascade-sim/
│
├── dse/
│   ├── chiplet-library/           # analytical chiplet models
│   ├── experiments/               # experiment configurations (JSON)
│   ├── lib                        # cascade simulation files
│   ├── test_system_eval.py        # core evaluation script
│   └── run.sh                     # main entry point for experiments
│
├── traces/                        # workload traces (LLM, StableDiffusion, GCN, ResNet, etc.)
└── README.md                      # this file

Example: Running GPT-J Weighted Workload

cd dse
./run.sh

This evaluates a hetero-chiplet SiP on the GPT-J model (prefill and decode phases), automatically optimizing workload partitioning across the chiplets selected in the configuration (GPU, Attention, Sparse, and/or Convolution).
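A short calculation shows why the prefill and decode phases of an LLM such as GPT-J can stress a system differently. The helper below is hypothetical and not taken from the repository. It computes the arithmetic intensity of one linear layer under these assumptions: FP16 operands (2 bytes each), a 4096 x 4096 weight matrix (GPT-J's hidden size is 4096), and every input, weight, and output element crossing memory exactly once. The 1024-token prompt is an illustrative choice.

```python
def linear_layer_intensity(tokens: int, d_in: int, d_out: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity (FLOP/byte) of a [tokens x d_in] @ [d_in x d_out] matmul."""
    flops = 2 * tokens * d_in * d_out  # one multiply and one add per MAC
    bytes_moved = bytes_per_elem * (
        tokens * d_in      # input activations
        + d_in * d_out     # weights
        + tokens * d_out   # output activations
    )
    return flops / bytes_moved


decode = linear_layer_intensity(tokens=1, d_in=4096, d_out=4096)      # one new token
prefill = linear_layer_intensity(tokens=1024, d_in=4096, d_out=4096)  # whole prompt at once
print(f"decode: {decode:.2f} FLOP/B, prefill: {prefill:.1f} FLOP/B")
# decode: 1.00 FLOP/B, prefill: 682.7 FLOP/B
```

Decode sits near 1 FLOP/B, below the ridge point of typical accelerators, so it is bound by weight bandwidth. Prefill reuses each weight across all 1024 tokens and reaches the hundreds of FLOP/B, so it is usually compute-bound. This is one reason different phases of the same model can favor different chiplets.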

Reference

If you use CASCADE or build upon this framework, please cite our IEEE JETCAS 2025 paper:

M. J. Adiletta, G.-Y. Wei, and D. Brooks,
“Democratizing Customization for ML at the Edge through Hetero-Chiplet SiP Architectures,”
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2025.
doi: 10.1109/JETCAS.2025.3592677

BibTeX

@ARTICLE{11096615,
  author={Adiletta, Matthew Joseph and Wei, Gu-Yeon and Brooks, David},
  journal={IEEE Journal on Emerging and Selected Topics in Circuits and Systems}, 
  title={Democratizing Customization for ML at the Edge through Hetero-Chiplet SiP Architectures}, 
  year={2025},
  pages={1-1},
  doi={10.1109/JETCAS.2025.3592677},
  keywords={Hetero-Chiplet SiP; Bespoke Edge-Device; Design Space Exploration; Chiplet Ecosystem; Machine Learning; Analytical models}
}

License and Usage

This repository is provided for research and educational use in alignment with IEEE publication policies.
All rights to the underlying models, figures, and results are reserved © 2025 IEEE and the authors.
