24 Oct 25
WhatTheDuck is an open-source web application built on DuckDB. It allows users to upload CSV and Parquet files, store them in tables, and perform SQL queries on the data. WhatTheDuck is a Python library available on GitHub that serves as a high-performance bridge for seamless data transfer and integration between the DuckDB analytical database and Pandas DataFrames.
DataExpert-io/data-engineer-handbook: This is a repo with links to everything you'd ever want to learn about data engineering
The Data Engineering Handbook on GitHub is a comprehensive, open-source guide and curriculum intended to help aspiring and current professionals master the skills and tools required to become a Data Engineer.
The Fullstack_data_course GitHub repository contains the curriculum and materials for a comprehensive course designed to teach users the skills required to become a full-stack data professional.
🪄 Create rich visualizations with AI. Data-Formulator is a Microsoft-developed Python library available on GitHub designed for simple and efficient data generation and transformation, facilitating tasks like creating synthetic data and preparing datasets for analysis.
This example uses the datamapplot visualization library to create a 2D map visualization of Hacker News post data, showing clusters of related topics and discussions.
This is an “awesome list” repository that curates fully functional, click-and-run Google Colaboratory notebooks and repositories covering a broad spectrum of topics in Data Science, Deep Learning, and various AI applications.
15 Dec 22
02 Nov 15
An interactive map of data engineering tools