📊 Showcase data projects that highlight analytics, machine learning, and MLOps with reproducible code and clear business insights.
Updated Dec 15, 2025
Large Tech Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP and international Consulting including extensive Travel Tips around the world
🚀 Build your skills with hands-on programming tutorials across various languages, guiding you to create applications from scratch.
🎯 Streamline talent management with this intuitive platform for tracking, recruiting, and onboarding top candidates efficiently.
📊 Streamline retail store data processing and enhance reporting with this efficient ETL pipeline.
📚 Master PySpark in 18 days with structured lessons, hands-on tasks, and an end-to-end project, covering essential concepts and ML model training.
📊 Enhance data management with hbase-68i, a powerful tool for efficient handling and processing of large datasets on HBase.
🚀 Enhance HBase performance with advanced data handling and management tools, streamlining operations for better efficiency and reliability.
📊 Explore simulated financial transactions and AI logs for the Sr. Auditor Analytics challenge, enhancing continuous auditing through data analysis and risk indicators.
📊 Showcase data projects in engineering, machine learning, and business intelligence, emphasizing technical processes and business impacts.
🗂️ Access essential AI and ML concepts with quick-reference cheatsheets for effective learning and project implementation.
Calc is a simple calculator application that performs basic arithmetic operations. It features a user-friendly interface, allowing users to quickly add, subtract, multiply, and divide numbers.
A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.
PySpark pipeline for particle classification in high-energy physics (HEPMASS dataset). Includes distributed preprocessing, model training (logistic regression, decision trees), evaluation, and key visualizations. Optimized for Hadoop/Spark.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
🚀 Migrate legacy mainframe data to a modern Hadoop ecosystem, automating ingestion, transformation, and validation for scalable storage and analytics.
📊 Build a Logistic Regression model to predict customer churn in telecom, utilizing Python and scikit-learn for data analysis and insights.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)