Incalculable-driverslicence975 / data-projects-portfolio

📊 Showcase data projects that highlight analytics, machine learning, and MLOps with reproducible code and clear business insights.

nlp finance machine-learning ai computer-vision dashboard deep-learning hadoop etl excel scikit-learn keras pandas data-visualization portfolio-project data-science-projects tableau-dashboards hiveq

Updated Dec 15, 2025

HariSekhon / Knowledge-Base

Large tech knowledge base from 20 years of DevOps, Linux, Cloud, Big Data, AWS, GCP, and international consulting, including extensive travel tips from around the world

Updated Dec 15, 2025
Shell

Vaishnavee13 / Learn-by-projects

🚀 Build your skills with hands-on programming tutorials across various languages, guiding you to create applications from scratch.

redux css java html rust open-source php django opengl hadoop cardboard blockchain shader learn-to-code hacktoberfest lynda begineer gssoc-ext

Updated Dec 15, 2025

Toxic-mofo / Talent-

🎯 Streamline talent management with this intuitive platform for tracking, recruiting, and onboarding top candidates efficiently.

mysql python git linux distributed-systems awesome hadoop nosql remote sre hacktoberfest front-end-development foreign-service-bidding pingcap tabular-methods tabular-data-benchmark tabular-data-toolkit tabular-data-machine-learning

Updated Dec 15, 2025

kardel17 / s3-fit

nodejs python java aws machine-learning big-data hive hadoop etl s3-bucket amazon-web-services emr-cluster mongodb-atlas ec2-instances amazonsagemaker

Updated Dec 15, 2025

Phantom9888 / RetailETL-Store-Data-Pipeline

📊 Streamline retail store data processing and enhance reporting with this efficient ETL pipeline.

python airflow scala sql spark hadoop reporting pandas powerbi batch-processing datawarehouse etl-pipeline dataorchestration bigdataprocessing

Updated Dec 15, 2025

Sudharsanan098 / PySpark

📚 Master PySpark in 18 days with structured lessons, hands-on tasks, and an end-to-end project, covering essential concepts and ML model training.

python boilerplate data-science big-data hadoop etl reference scikit-learn data-engineering cheatsheet sparksql spark-sql big-data-analytics etl-job pyspark-python rdds pyspark-sql ranking-functions

Updated Dec 15, 2025

Lu30-ux / hbase-68i

📊 Enhance data management with hbase-68i, a powerful tool for efficient handling and processing of large datasets on HBase.

java real-time big-data hadoop nosql fault-tolerance data-storage scalability hbase cloud-computing distributed-database data-processing data-modeling data-access apache-hbase

Updated Dec 15, 2025

rug5803 / hbase-fiy

🚀 Enhance HBase performance with advanced data handling and management tools, streamlining operations for better efficiency and reliability.

java open-source big-data hadoop nosql fault-tolerance data-storage hbase cloud-computing performance-tuning distributed-database data-management data-modeling real-time-processing scalable-systems

Updated Dec 15, 2025

jgarciadiaz10 / Vaga_Sr_Auditoria_Continua_Metodologia_e_IA

📊 Explore simulated financial transactions and AI logs for the Sr. Auditor Analytics challenge, enhancing continuous auditing through data analysis and risk indicators.

sql big-data hadoop analytics pyspark auditoria gemini-api github-config llm auditoria-continua

Updated Dec 15, 2025
C

Arbuz13 / data-portfolio

📊 Showcase data projects in engineering, machine learning, and business intelligence, emphasizing technical processes and business impacts.

react python nlp machine-learning spark hadoop jupyter-notebook pandas data-visualization seaborn recommendation-system data-analysis matplotlib portfolio-site portfolio-project react-portfolio data-science-projects hiveq

Updated Dec 15, 2025
Jupyter Notebook

4ngelojr / AI-ML-Cheatsheets

🗂️ Access essential AI and ML concepts with quick-reference cheatsheets for effective learning and project implementation.

linux computer-science data-science statistics sql deep-learning hadoop numpy excel bigdata probability plotly keras cheatsheet neural-networks pig convolutional-neural-networks dl

Updated Dec 15, 2025

demonfire2134 / calc

Calc is a simple calculator application that performs basic arithmetic operations. It features a user-friendly interface, allowing users to quickly add, subtract, multiply, and divide numbers.

react javascript html open-source sql hadoop cpp bignum complex-numbers prime-numbers calculators calc calculator-application primes

Updated Dec 15, 2025

amrajib / velox

Velox is a work-in-progress, experimental Rust-based operating system written for fun and learning.

rust arm framework big-data spark hadoop arrow neon x64 clang file-system hdfs vectorization arm64

Updated Dec 15, 2025
Rust

EX539 / docker-dev-env

A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.

docker kubernetes jenkins environment big-data hadoop cpp docker-image cuda x11 qtcreator docker-php reproducibility docker-setup

Updated Dec 15, 2025
Shell

ego-creator / hepmassClassification

PySpark pipeline for particle classification in high-energy physics (HEPMASS dataset). Includes distributed preprocessing, model training (logistic regression, decision trees), evaluation, and key visualizations. Optimized for Hadoop/Spark.

machine-learning hadoop pipeline random-forest distributed-computing data-visualization pyspark mllib hdfs logistic-regression hyperparameter-tuning hepmass

Updated Dec 15, 2025

apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

kubernetes sql spark hive hadoop jdbc thrift data-lake hacktoberfest spark-sql

Updated Dec 15, 2025
Scala

hghghgh12 / Large-Scale-Data-Pipeline-Migration

🚀 Migrate legacy mainframe data to a modern Hadoop ecosystem, automating ingestion, transformation, and validation for scalable storage and analytics.

mysql big-data spark hive hadoop oozie sqoop dataengineering etl-pipeline

Updated Dec 15, 2025
Python

mahmoudbalal784-create / Logistic-Regression

📊 Build a Logistic Regression model to predict customer churn in telecom, utilizing Python and scikit-learn for data analysis and insights.

python nba data-science machine-learning spark hadoop model numpy scikit-learn prediction pandas kaggle gbdt logistic-regression factorization-machines predictive-modeling nba-analytics nba-prediction

Updated Dec 15, 2025
Jupyter Notebook

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

java distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine iceberg datalake prestodb trino delta-lake

Updated Dec 15, 2025
Java

hadoop

Here are 3,663 public repositories matching this topic...
