CodingCat

Organizations

@dmlc

Jupyter Notebook 1 Updated Jan 16, 2026

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 10,026 1,061 Updated May 7, 2026

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.

Scala 255 73 Updated Feb 21, 2023

A curated list of Rust code and resources.

Rust 58,130 3,456 Updated Jul 2, 2026

lakeFS - Data version control for your data lake | Git for data

Go 5,425 458 Updated Jun 29, 2026

An Open Standard for lineage metadata collection

Java 2,523 483 Updated Jul 2, 2026

Hexagonal hierarchical geospatial indexing system

C 6,360 590 Updated Jul 1, 2026

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

Java 335 100 Updated Sep 29, 2023

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Scala 430 117 Updated Jan 14, 2022

Koalas: pandas API on Apache Spark

Python 3,374 369 Updated Mar 20, 2024

Distributed Big Data Orchestration Service

Java 1,763 373 Updated Jun 11, 2026

🦉 Data Versioning and ML Experiments

Python 15,716 1,306 Updated Jun 29, 2026

Apache Iceberg

Java 9,003 3,369 Updated Jul 3, 2026

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Scala 349 65 Updated Jan 14, 2026

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…

Python 8,857 1,429 Updated Jan 28, 2026

BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Jupyter Notebook 2,699 730 Updated Jun 12, 2026

Avro2TF is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.

Scala 129 21 Updated May 9, 2020

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive lea…

C++ 8,684 1,925 Updated May 8, 2026

xgboost website

Ruby 21 8 Updated Aug 6, 2025

Chisel: A Modern Hardware Design Language

Scala 4,704 649 Updated Jul 2, 2026

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

Jupyter Notebook 10,965 6,971 Updated Jul 2, 2026

cuDF - GPU DataFrame Library

C++ 9,694 1,074 Updated Jul 3, 2026

A curated list of automated machine learning papers, articles, tutorials, slides and projects

4,148 681 Updated Jun 11, 2024

A Time Series Library for Apache Spark

Scala 1,165 199 Updated Jul 3, 2020

Apache Pinot - A realtime distributed OLAP datastore

Java 6,104 1,484 Updated Jul 2, 2026

An open source python library for automated feature engineering

Python 7,659 914 Updated Jun 17, 2026

A game theoretic approach to explain the output of any machine learning model.

Jupyter Notebook 25,580 3,730 Updated Jul 1, 2026

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while control…

Python 26,830 5,939 Updated Jul 3, 2026

Iceberg is a table format for large, slow-moving tabular data

Java 494 63 Updated Apr 10, 2023

Vectorized processing for Apache Arrow

483 60 Updated Feb 14, 2022