Starred repositories
Apache Spark - A unified analytics engine for large-scale data processing
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Base classes to use when writing tests with Spark
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
An open protocol for secure data sharing
Spark package for checking data quality
Extensible Rules Engine for custom DataFrame / Dataset validation
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset