sonalgoyal

Sonal sonalgoyal

Building Zingg - open source data mastering, deduplication and entity resolution with ML

102 followers · 67 following

https://github.com/zinggAI/zingg
India
@sonalgoyal

Stars

d3 / d3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Shell 112,287 22,820 Updated Dec 2, 2025

mrdoob / three.js

JavaScript 3D Library.

JavaScript 110,806 36,267 Updated Feb 9, 2026

opencv / opencv

Open Source Computer Vision Library

C++ 86,112 56,523 Updated Feb 9, 2026

JuliaLang / julia

The Julia Programming Language

Julia 48,339 5,734 Updated Feb 9, 2026

metabase / metabase

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊

Clojure 45,904 6,222 Updated Feb 9, 2026

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

Scala 42,789 29,059 Updated Feb 9, 2026

duckdb / duckdb

DuckDB is an analytical in-process SQL database management system

C++ 35,981 2,915 Updated Feb 9, 2026

getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Python 28,209 4,559 Updated Feb 7, 2026

antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Java 18,717 3,429 Updated Jan 1, 2026

nltk / nltk

NLTK Source

Python 14,497 2,977 Updated Jan 10, 2026

dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Python 12,223 2,267 Updated Feb 9, 2026

cleanlab / cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Python 11,310 882 Updated Jan 13, 2026

great-expectations / great_expectations

Always know what to expect from your data.

Python 11,133 1,677 Updated Feb 9, 2026

nathanmarz / storm

Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

Java 8,798 1,654 Updated Aug 16, 2017

open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…

TypeScript 8,662 1,623 Updated Feb 9, 2026

klange / toaruos

A completely-from-scratch hobby operating system: bootloader, kernel, drivers, C library, and userspace including a composited graphical UI, dynamic linker, syntax-highlighting text editor, network…

C 6,614 536 Updated Feb 9, 2026

snorkel-team / snorkel

A system for quickly generating training data with weak supervision

Python 5,938 855 Updated May 2, 2024

treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data

Go 5,143 430 Updated Feb 9, 2026

OpenTSDB / opentsdb

A scalable, distributed Time Series Database.

Java 5,064 1,240 Updated Dec 12, 2024

argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

Python 4,850 474 Updated Feb 9, 2026

octokatherine / readme.so

An online drag-and-drop editor to easily build READMEs

JavaScript 4,628 372 Updated Feb 5, 2026

nmslib / nmslib

Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

C++ 3,568 465 Updated Jan 12, 2026

twitter-archive / flockdb

A distributed, fault-tolerant graph database

Scala 3,330 251 Updated Mar 16, 2017

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

HTML 2,244 206 Updated Feb 9, 2026

pinterest / querybook

Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.

TypeScript 2,229 281 Updated Feb 9, 2026

trekhleb / promote-your-next-startup

🚀 Free resources you may use to promote your next startup

2,191 188 Updated Nov 23, 2025

MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata

Java 2,116 378 Updated Jan 24, 2026

AstraZeneca / awesome-explainable-graph-reasoning

A collection of research papers and software related to explainability in graph machine learning.

1,985 135 Updated Apr 4, 2022

datastax / jvector

JVector: the most advanced embedded vector search engine

Java 1,682 146 Updated Feb 9, 2026

capitalone / DataProfiler

What's in your data? Extract schema, statistics and entities from datasets

Python 1,541 183 Updated Sep 26, 2025
