jsteggink

Organizations

@knowsy-nl


Starred repositories

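70 stars written in Scala. Each entry gives the repository description, followed by its language, star count, fork count, and last-update date.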

Apache Spark - A unified analytics engine for large-scale data processing

Scala 42,520 28,975 Updated Dec 21, 2025

💾 Database Tools incl. ORM, Migrations and Admin UI (Postgres, MySQL & MongoDB) [deprecated]

Scala 16,443 848 Updated Sep 1, 2022

CMAK is a tool for managing Apache Kafka clusters

Scala 11,941 2,500 Updated Aug 2, 2023

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala 8,475 1,975 Updated Dec 20, 2025

The leader in Customer Data Infrastructure

Scala 6,986 1,187 Updated Jun 4, 2025

Simple and Distributed Machine Learning

Scala 5,191 854 Updated Dec 17, 2025

State of the Art Natural Language Processing

Scala 4,083 736 Updated Dec 21, 2025

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala 3,558 576 Updated Nov 4, 2025

A Scala API for Apache Beam and Google Cloud Dataflow.

Scala 2,615 526 Updated Dec 16, 2025

A low-code machine learning personalized ranking service for articles, listings, search results, and recommendations that boosts user engagement. A friendly Learn-to-Rank engine.

Scala 2,388 107 Updated Sep 24, 2025

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Scala 2,286 970 Updated Dec 18, 2025

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Scala 2,270 401 Updated Sep 29, 2023

Streaming MapReduce with Scalding and Storm

Scala 2,131 264 Updated Jan 19, 2022

Feathr – A scalable, unified data and AI engineering platform for enterprise

Scala 1,915 238 Updated Apr 4, 2024

HTML content/article extractor in Scala, open-sourced from Gravity Labs

Scala 1,530 315 Updated Apr 18, 2017

MLeap: Deploy ML Pipelines to Production

Scala 1,528 315 Updated Dec 16, 2025

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Scala 1,489 552 Updated Dec 20, 2025

🤖 A bot that helps you keep your projects up-to-date

Scala 1,189 509 Updated Dec 18, 2025

Free Elasticsearch security plugin and Kibana security plugin: super-easy Kibana multi-tenancy, encryption, authentication, authorization, and auditing

Scala 955 163 Updated Dec 21, 2025

Chronon is a data platform for serving AI/ML applications.

Scala 953 86 Updated Dec 10, 2025

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

Scala 953 267 Updated Dec 17, 2025

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Scala 938 619 Updated Dec 16, 2025

The software used to extract structured data from Wikipedia

Scala 915 291 Updated Dec 21, 2025

An open protocol for secure data sharing

Scala 908 216 Updated Dec 18, 2025

Essential Spark extensions and helper methods ✨😲

Scala 766 151 Updated Sep 14, 2025

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text.

Scala 759 193 Updated Mar 8, 2018

Data Lineage Tracking And Visualization Solution

Scala 649 157 Updated Dec 19, 2025

Hybrid search engine combining the best features of the text and semantic search worlds

Scala 589 16 Updated Dec 3, 2025

Qubole Sparklens tool for performance tuning Apache Spark

Scala 586 143 Updated Jun 26, 2024

FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estima…

Scala 554 144 Updated Dec 19, 2017