C++ 14 8 Updated Jun 17, 2026

PyTorch training at CSCS

Jupyter Notebook 22 15 Updated Jul 4, 2025

Compression for Foundation Models

Jupyter Notebook 36 5 Updated Jul 21, 2025

Dirigent: Lightweight Serverless Orchestration

Go 44 7 Updated Aug 26, 2025

CUDA benchmarks for measuring GPU utilization and interference

Cuda 18 2 Updated Feb 11, 2025

A prototype of using ibis-substrait to compile against a substrait extension

Python 2 Updated Apr 11, 2023

Distributed Communication-Optimal LU-factorization Algorithm

C++ 12 4 Updated Aug 1, 2021

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.

Python 1,524 191 Updated Jun 18, 2026

RMG is an Open Source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real space basis and pseudopotentials.

C++ 55 18 Updated Jun 17, 2026

Neovim config for the lazy

Lua 26,667 1,798 Updated Jun 2, 2026

Spiking neuron integration for PyTorch

Python 44 5 Updated Mar 18, 2025

Google Research

Jupyter Notebook 38,155 8,431 Updated Jun 18, 2026
Python 78 6 Updated May 4, 2021

Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡

Python 529 32 Updated Jun 18, 2026
Jupyter Notebook 62 3 Updated Mar 4, 2022

Extending JAX with custom C++ and CUDA code

Python 403 23 Updated Aug 18, 2024

Long Range Arena for Benchmarking Efficient Transformers

Python 788 86 Updated Dec 16, 2023

Flax is a neural network library for JAX that is designed for flexibility.

Jupyter Notebook 7,239 818 Updated Jun 17, 2026

Making large AI models cheaper, faster and more accessible

Python 41,395 4,510 Updated May 25, 2026

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Python 9,901 1,457 Updated Jun 13, 2026

Model parallel transformers in JAX and Haiku

Python 6,374 884 Updated Jan 21, 2023

Fast and memory-efficient exact attention

Python 24,180 2,840 Updated Jun 18, 2026

Training and serving large-scale neural networks with auto parallelization.

Python 3,182 361 Updated Dec 9, 2023

Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020

Jupyter Notebook 138 35 Updated Jul 25, 2024

Trax — Deep Learning with Clear Code and Speed

Python 8,306 821 Updated Sep 26, 2025

MLPerf HPC WG implementation of Mesh-TensorFlow and build scripts for TensorFlow with MPI

Python 4 1 Updated Oct 18, 2019

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

C++ 32 11 Updated Apr 2, 2025

Distributed Communication-Optimal Shuffle and Transpose Algorithm

C++ 14 7 Updated Apr 18, 2026

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

C++ 215 33 Updated Apr 18, 2026