Stars
A Pythonic programming framework for orchestrating jobs in Databricks Workflows
A Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling complex pipelines to be built from simple, reusable components.
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
A curated list of awesome Deep Learning (DL) for Natural Language Processing (NLP) resources
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Machine Learning and Agentic AI Resources, Practice and Research
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as with APIs
Alluxio, data orchestration for analytics and machine learning in the cloud
VS Code extension for working with Databricks
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache TinkerPop - a graph computing framework
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics