
Organizations

@dssg

46 stars written in Java

The official home of the Presto distributed SQL query engine for big data

Java · 16,673 stars · 5,533 forks · Updated Apr 1, 2026

Zuul is a gateway service that provides dynamic routing, monitoring, resiliency, security, and more.

Java · 14,003 stars · 2,435 forks · Updated Apr 1, 2026

Apache Druid: a high performance real-time analytics database.

Java · 13,969 stars · 3,774 forks · Updated Apr 1, 2026

OpenRefine is a free, open source power tool for working with messy data and improving it

Java · 11,792 stars · 2,133 forks · Updated Mar 28, 2026

The Metadata Platform for your Data and AI Stack

Java · 11,750 stars · 3,416 forks · Updated Apr 1, 2026

A Camera component for React Native. Also supports barcode scanning!

Java · 9,644 stars · 3,538 forks · Updated Jun 7, 2023

Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

Java · 8,790 stars · 1,651 forks · Updated Aug 16, 2017

Use SQL to query Elasticsearch

Java · 7,020 stars · 1,533 forks · Updated Feb 21, 2026

High-quality QR Code generator library in Java, TypeScript/JavaScript, Python, Rust, C++, C.

Java · 6,493 stars · 1,252 forks · Updated Jan 23, 2025

Apache Pinot - A realtime distributed OLAP datastore

Java · 6,053 stars · 1,458 forks · Updated Apr 1, 2026

JanusGraph: an open-source, distributed graph database

Java · 5,750 stars · 1,208 forks · Updated Nov 21, 2025

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

Java · 4,379 stars · 590 forks · Updated Apr 1, 2026

Apache Parquet Java

Java · 3,045 stars · 1,521 forks · Updated Mar 27, 2026

Jenkins plugin to run dynamic agents in a Kubernetes/Docker environment

Java · 2,306 stars · 1,273 forks · Updated Apr 1, 2026

Stream summarizer and cardinality estimator.

Java · 2,265 stars · 556 forks · Updated Nov 28, 2019

A Graph Traversal Language (no longer active - see Apache TinkerPop)

Java · 1,952 stars · 229 forks · Updated Aug 16, 2021

Java · 1,939 stars · 168 forks · Updated Jul 17, 2021

Secor is a service implementing Kafka log persistence

Java · 1,857 stars · 532 forks · Updated Mar 10, 2026

A large-scale entity and relation database supporting aggregation of properties

Java · 1,789 stars · 365 forks · Updated Jun 6, 2025

Workload Automation System

Java · 1,335 stars · 230 forks · Updated Aug 27, 2024

Hopsworks - Data-Intensive AI platform with a Feature Store

Java · 1,291 stars · 156 forks · Updated Feb 10, 2025

A Java package to automatically detect anomalies in large scale time-series data

Java · 1,189 stars · 327 forks · Updated Nov 14, 2023

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Java · 1,171 stars · 159 forks · Updated Mar 27, 2026

A platform for visualization and real-time monitoring of data workflows

Java · 1,169 stars · 197 forks · Updated Jan 22, 2020

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

Java · 1,132 stars · 384 forks · Updated Apr 10, 2023

A software library of stochastic streaming algorithms, a.k.a. sketches.

Java · 951 stars · 220 forks · Updated Mar 27, 2026

Semantic Parser with Execution

Java · 841 stars · 295 forks · Updated May 1, 2023

The metric correlation component of Etsy's Kale system

Java · 709 stars · 69 forks · Updated Apr 18, 2017

Hadoop library for large-scale data processing, now an Apache Incubator project

Java · 581 stars · 132 forks · Updated Jul 8, 2014

A tool that translates augmented markdown into HTML or latex

Java · 474 stars · 31 forks · Updated Jun 19, 2022