🎯
Focusing
  • Paris, France
  • 15:44 (UTC +02:00)

Organizations

@NVIDIA @Mellanox

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 244 32 Updated Apr 6, 2026

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 506 73 Updated Apr 9, 2026

NVIDIA Inference Xfer Library (NIXL)

C++ 970 288 Updated Apr 10, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,528 1,010 Updated Apr 10, 2026

Official inference framework for 1-bit LLMs

Python 38,064 3,403 Updated Mar 10, 2026

SYCL implementation of Fused MLPs for Intel GPUs

C++ 51 11 Updated Nov 24, 2025

C++ 61 21 Updated Dec 18, 2024

Grok open release

Python 51,520 8,466 Updated Aug 30, 2024

An innovative library for efficient LLM inference via low-bit quantization

C++ 351 38 Updated Aug 30, 2024

Real-time human detection and tracking camera using YOLOV5 and Arduino

Python 10 4 Updated Nov 26, 2023

Official inference library for Mistral models

Jupyter Notebook 10,762 1,035 Updated Feb 26, 2026

SPEAR: A Simulator for Photorealistic Embodied AI Research

C++ 319 24 Updated Apr 10, 2026

Intel XeSS SDK

C 940 56 Updated Mar 9, 2026

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Python 2,612 302 Updated Apr 9, 2026

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms ⚡

Python 2,176 217 Updated Oct 8, 2024

Intel® Extension for TensorFlow*

C++ 352 45 Updated Oct 29, 2025

An Open Framework for Federated Learning.

Python 834 235 Updated Feb 21, 2026