jsteggink

Jeroen Steggink jsteggink

Freelance AI/ML Strategist & Architect

26 followers · 72 following

@knowsy-nl
Netherlands
https://nl.linkedin.com/in/jeroensteggink

Starred repositories

70 stars written in Scala

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

Scala · 42,520 stars · 28,975 forks · Updated Dec 21, 2025

prisma / prisma1

💾 Database Tools incl. ORM, Migrations and Admin UI (Postgres, MySQL & MongoDB) [deprecated]

Scala · 16,443 stars · 848 forks · Updated Sep 1, 2022

yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters

Scala · 11,941 stars · 2,500 forks · Updated Aug 2, 2023

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala · 8,475 stars · 1,975 forks · Updated Dec 20, 2025

snowplow / snowplow

The leader in Customer Data Infrastructure

Scala · 6,986 stars · 1,187 forks · Updated Jun 4, 2025

microsoft / SynapseML

Simple and Distributed Machine Learning

Scala · 5,191 stars · 854 forks · Updated Dec 17, 2025

JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing

Scala · 4,083 stars · 736 forks · Updated Dec 21, 2025

awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala · 3,558 stars · 576 forks · Updated Nov 4, 2025

spotify / scio

A Scala API for Apache Beam and Google Cloud Dataflow.

Scala · 2,615 stars · 526 forks · Updated Dec 16, 2025

metarank / metarank

A low code Machine Learning personalized ranking service for articles, listings, search results, recommendations that boosts user engagement. A friendly Learn-to-Rank engine

Scala · 2,388 stars · 107 forks · Updated Sep 24, 2025

apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Scala · 2,286 stars · 970 forks · Updated Dec 18, 2025

salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Scala · 2,270 stars · 401 forks · Updated Sep 29, 2023

twitter / summingbird

Streaming MapReduce with Scalding and Storm

Scala · 2,131 stars · 264 forks · Updated Jan 19, 2022

feathr-ai / feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

Scala · 1,915 stars · 238 forks · Updated Apr 4, 2024

GravityLabs / goose

Html Content / Article Extractor in Scala - open sourced from Gravity Labs

Scala · 1,530 stars · 315 forks · Updated Apr 18, 2017

combust / mleap

MLeap: Deploy ML Pipelines to Production

Scala · 1,528 stars · 315 forks · Updated Dec 16, 2025

apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Scala · 1,489 stars · 552 forks · Updated Dec 20, 2025

scala-steward-org / scala-steward

🤖 A bot that helps you keep your projects up-to-date

Scala · 1,189 stars · 509 forks · Updated Dec 18, 2025

sscarduzio / elasticsearch-readonlyrest-plugin

Free Elasticsearch security plugin and Kibana security plugin: super-easy Kibana multi-tenancy, Encryption, Authentication, Authorization, Auditing

Scala · 955 stars · 163 forks · Updated Dec 21, 2025

airbnb / chronon

Chronon is a data platform for serving AI/ML applications.

Scala · 953 stars · 86 forks · Updated Dec 10, 2025

NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

Scala · 953 stars · 267 forks · Updated Dec 17, 2025

apache / incubator-livy

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

Scala · 938 stars · 619 forks · Updated Dec 16, 2025

dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

Scala · 915 stars · 291 forks · Updated Dec 21, 2025

delta-io / delta-sharing

An open protocol for secure data sharing

Scala · 908 stars · 216 forks · Updated Dec 18, 2025

mrpowers-io / spark-daria

Essential Spark extensions and helper methods ✨😲

Scala · 766 stars · 151 forks · Updated Sep 14, 2025

dbpedia-spotlight / dbpedia-spotlight

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text.

Scala · 759 stars · 193 forks · Updated Mar 8, 2018

AbsaOSS / spline

Data Lineage Tracking And Visualization Solution

Scala · 649 stars · 157 forks · Updated Dec 19, 2025

nixiesearch / nixiesearch

Hybrid search engine, combining best features of text and semantic search worlds

Scala · 589 stars · 16 forks · Updated Dec 3, 2025

qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Scala · 586 stars · 143 forks · Updated Jun 26, 2024

factorie / factorie

FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estima…

Scala · 554 stars · 144 forks · Updated Dec 19, 2017

Lists (32)

AI Agents

Annotations

Apache Pulsar

Apache Spark

Big Data

Business Rules

Crawler

Data Management

Data quality

Data Science

DevOps

GraphQL

Graphs

Kubernetes

LLM

Machine Learning

Messaging

Messaging / RPC

ML - Images

MLOps

Music

NLP

OpenShift

Programming

Reverse engineering

Search

Security

UI

Video

Vision Language Models

Web

Windows
