balditommaso

Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20)

Python 27 4 Updated Oct 3, 2023

oneAPI Deep Neural Network Library (oneDNN)

C++ 4,006 1,147 Updated Jun 11, 2026

ncnn is a high-performance neural network inference framework optimized for the mobile platform

C++ 23,364 4,438 Updated Jun 8, 2026

Winograd minimal convolution algorithm generator for convolutional neural networks.

Python 629 148 Updated Feb 9, 2026

A compiler for lowering quantized ML operators to AMD AI Engine (AIE) firmware.

Python 20 3 Updated Jun 10, 2026

A heterogeneous accelerator-centric compute cluster

SystemVerilog 45 18 Updated Jun 7, 2026

Firmware tools for Unitree Go2

Python 164 30 Updated May 17, 2026

Differentiable architecture search for convolutional and recurrent networks

Python 4,000 840 Updated Jan 3, 2021

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,591 907 Updated Dec 17, 2024

SAM: Sharpness-Aware Minimization (PyTorch)

Python 1,979 210 Updated Feb 21, 2024

A binary fault injector based on the TensorFI fault injector. For more details, please refer to our paper.

Python 4 1 Updated Feb 26, 2020

CGRA-Flow is an integrated framework for CGRA compilation, exploration, synthesis, and development.

Python 160 25 Updated Feb 18, 2026

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Python 5 1 Updated Aug 22, 2025

Latency performance measurement framework for ExecuTorch models with SME2 acceleration. Enables operator-level performance analysis, bottleneck identification, and automated reporting.

Python 5 1 Updated Jun 11, 2026

Generate versal system design from ONNX model. AI engine kernels. Sub-microsecond speeds for autoencoders.

C++ 18 1 Updated Dec 29, 2024

Open-source AI acceleration on FPGA: from ONNX to RTL

Python 54 7 Updated Jun 4, 2026

Run Time for AIE and FPGA based platforms

C++ 667 536 Updated Jun 10, 2026

Linux device tree generator for the Xilinx SDK (Vivado > 2014.1)

Tcl 237 204 Updated May 14, 2026

Mastering Embedded Linux Development Fourth Edition, published by Packt

C 101 35 Updated Apr 22, 2026

CHARM: Composing Heterogeneous Accelerators on Heterogeneous SoC Architecture

C++ 173 24 Updated Mar 12, 2026

MaxEVA: Maximizing the Efficiency of Matrix Multiplication on Versal AI Engine (accepted as full paper at FPT'23)

C++ 22 2 Updated Apr 17, 2024

Linux-based partitioning hypervisor

C 1,935 358 Updated May 18, 2024

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp 2,369 294 Updated Jun 11, 2026

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 2…

C 949 161 Updated Nov 27, 2024

Sleep detector using Kria KV260 AI vision and Bluecoin

Python 3 Updated Jun 11, 2025

A tool to deploy Deep Neural Networks on PULP-based SoCs

Python 94 24 Updated Aug 4, 2025

[ICML 2025 Spotlight] Enhancing Certified Robustness via Block Reflector Orthogonal Layers and Logit Annealing Loss

Python 7 1 Updated May 30, 2025

Dataflow compiler for QNN inference on FPGAs

Python 1,007 300 Updated Jun 11, 2026
Scala 18 1 Updated Jan 22, 2025