A cloud-based ETL testing and data analysis pipeline for YouTube trending video data using AWS services including Lambda, Glue, Athena, S3, and QuickSight. This project focuses on ingesting, transforming, storing, and analyzing structured and semi-structured data to generate insights based on video categories and trending metrics.
End-to-end data reporting project using Azure services: Azure Data Factory for data orchestration, Azure Synapse Analytics for data warehousing, Databricks for data transformations, and Power BI for data visualization and reporting.
This repo shows how to read and write data from/to Google Cloud Storage with PySpark. The raw data is ingested, transformed, and stored in the data lake in snapshot format.
Application of our De-identification Framework with open-source technologies, enabling enterprises to take ownership of the de-identification process and deploy it in trusted environments.
This repository is designed for an education-focused data science project, which uses a public database from a Brazilian educational research institute about the national high school exam and applies ETL and association rule mining to this dataset.