- RETIS Lab
- Pisa, Tuscany, Italy
- www.linkedin.com/in/baldi-tommaso/
- https://orcid.org/0009-0005-8673-0040
- https://retis.santannapisa.it/~tbaldi/
Stars
Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20)
oneAPI Deep Neural Network Library (oneDNN)
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Winograd minimal convolution algorithm generator for convolutional neural networks.
A compiler for lowering quantized ML operators to AMD AI Engine (AIE) firmware.
A heterogeneous accelerator-centric compute cluster
Differentiable architecture search for convolutional and recurrent networks
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
A binary fault injector based on the TensorFI fault injector. For more details, please refer to our paper
CGRA-Flow is an integrated framework for CGRA compilation, exploration, synthesis, and development.
fzi-peccia / tvm
Forked from apache/tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Latency performance measurement framework for ExecuTorch models with SME2 acceleration. Enables operator-level performance analysis, bottleneck identification, and automated reporting.
Generate a Versal system design from an ONNX model. AI Engine kernels. Sub-microsecond speeds for autoencoders.
Open-source AI acceleration on FPGA: from ONNX to RTL
Linux device tree generator for the Xilinx SDK (Vivado > 2014.1)
Mastering Embedded Linux Development Fourth Edition, published by Packt
CHARM: Composing Heterogeneous Accelerators on Heterogeneous SoC Architecture
MaxEVA: Maximizing the Efficiency of Matrix Multiplication on Versal AI Engine (accepted as full paper at FPT'23)
A machine learning accelerator core designed for energy-efficient AI at the edge.
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 2…
Sleep detector using Kria KV260 AI vision and Bluecoin
A tool to deploy Deep Neural Networks on PULP-based SoCs
[ICML 2025 Spotlight] Enhancing Certified Robustness via Block Reflector Orthogonal Layers and Logit Annealing Loss