Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala · 3,622 stars · 585 forks · Updated Jun 11, 2026

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java · 12,926 stars · 3,661 forks · Updated Jun 15, 2026

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

C++ · 16,839 stars · 4,135 forks · Updated Jun 14, 2026

StAEDI (Streaming API for EDI): a Java library featuring a reader/parser, a writer/generator, and validation

Java · 147 stars · 42 forks · Updated Jun 10, 2026