- Hopsworks
- Stockholm
- @jim_dowling
- in/jim-dowling-206a98
Stars
🔎 Open source distributed and RESTful search engine.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Upserts, Deletes And Incremental Processing on Big Data.
Apache Kafka® running on Kubernetes
LakeSoul is an end-to-end, real-time and cloud-native Lakehouse framework with fast data ingestion, concurrent updates and incremental data analytics on cloud storage for both BI and AI applications.
Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of …
An Open Standard for lineage metadata collection
Official code repository for GATK versions 4 and up
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Hopsworks - Data-Intensive AI platform with a Feature Store
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Payara Server is an open source middleware platform that supports reliable and secure deployments of Java EE (Jakarta EE) and MicroProfile applications in any environment: on premise, in the cloud …
Truly open source API gateway with native OpenAPI support. Written in Java, it is easily extensible, supports legacy XML and SOAP, and is optimized for container deployments.
A hackable data integration and analysis tool that enables non-technical users to edit data processing jobs and visualise data on demand.
Generic Data Ingestion & Dispersal Library for Hadoop
Uniffle is a high performance, general purpose Remote Shuffle Service.
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
🆕 Find the k-nearest neighbors (k-NN) for your vector data
Rapid is a scalable distributed membership service
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Kompics - A message-passing component model for building distributed systems
Reproducing Distributed Systems and Experiments on Cloud