Big data training material
Updated Jun 29, 2023 · Python
Big Data: Apache server log analysis using Pig and Python
Analysis of the GDELT Project with Hadoop-based big data tools on the Microsoft Azure cloud
A web interface that requests data from a search engine and displays the results with AmMap.
Big data technologies that I have experimented with
Analysing posts from Stack Exchange on GCP clusters.
Processing and transforming data via Hadoop Ecosystem
Joining, cleaning, querying, and performing ETL on a Twitter posts dataset.
StackExchange data is cleaned with Pig and queried with HiveQL; TF-IDF is then computed to obtain the top 10 words used by the top 10 StackExchange users.
A batch analytics pipeline built on Hortonworks HDP 2.6.5. It includes loading data into HDFS, creating schemas, using Pig and Hive for transformations, running a MapReduce job, and building PySpark models for clustering, classification, and regression. NLP and sentiment analysis, feature reduction using PCA or SVD, and graph analysis are also applied.
Scalable Pig Docker image with built-in Hadoop that works with Docker Compose and Kubernetes
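One of the entries above mentions computing TF-IDF to find the top words per StackExchange user. As a rough illustration of that idea only, here is a minimal pure-Python sketch. It is not taken from that repository, which describes using Pig and Hive. The function name `top_tfidf_words`, the `docs_by_user` mapping, whitespace tokenization, and the unsmoothed `log(N / df)` weighting are all assumptions made for this example.

```python
import math
from collections import Counter


def top_tfidf_words(docs_by_user, k=10):
    """Return the k highest-scoring TF-IDF words for each user.

    docs_by_user is a hypothetical mapping of user name -> all of that
    user's post text joined into one string (one "document" per user).
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokens = {user: text.lower().split() for user, text in docs_by_user.items()}
    n_docs = len(tokens)

    # Document frequency: in how many users' documents each word appears.
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    result = {}
    for user, words in tokens.items():
        term_counts = Counter(words)
        total = len(words) or 1  # avoid division by zero for empty documents
        scores = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        }
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[user] = [word for word, _ in ranked[:k]]
    return result


# Tiny made-up sample, for illustration only.
sample = {
    "alice": "pig pig hive data",
    "bob": "spark spark hive data",
}
top_words = top_tfidf_words(sample, k=1)
```

Words shared by every user get an IDF of zero under this weighting, so only terms distinctive to a user rank highly. On a real cluster the same counts would more likely be produced with Pig or Hive aggregations.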