Lab 7

This document provides a lab guide for using Apache Pig on Cloudera QuickStart VM, focusing on data processing with Pig Latin. It outlines tasks such as starting Pig, loading sample data, and performing operations like projection, filtering, sorting, and grouping. Additionally, it includes practice problems for further application of the concepts learned.

Uploaded by

tanusinghh03

Lab 7: Getting Started with Pig on Cloudera QuickStart VM

Apache Pig is a high-level platform for processing large datasets in Hadoop. It uses a scripting
language called Pig Latin, which simplifies complex data transformations and analysis. Pig
converts these scripts into MapReduce jobs, making it easier to work with structured and
semi-structured data without writing low-level Java code. It is widely used for ETL (Extract,
Transform, Load) tasks, data preprocessing, and analytics.
Objective:
In this lab, you will learn how to start Pig in local and MapReduce modes, load data, and
perform basic operations such as projection, filtering, sorting, and grouping.
Task 1: Start Apache Pig
1. Open the Cloudera QuickStart VM.
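2. Open the Terminal and start the Pig Grunt shell in local mode:
pig -x local
To start Pig in MapReduce mode instead (the default, which reads input from HDFS), use:
pig -x mapreduce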
Task 2: Create and Load Sample Data
1. Create an input file named employees.txt in HDFS (or on the local file system if using local mode) containing the following records:
101,John,28,IT,60000
102,Alice,24,HR,55000
103,Bob,30,IT,70000
104,David,27,Finance,65000
105,Eve,29,HR,62000
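One way to create this file from the terminal is a here-document. This is a minimal sketch, assuming the file is named employees.txt (the name the LOAD statement in Task 3 expects) and that Pig will be started in local mode from the same directory:

```shell
# Write the five sample records to employees.txt in the current directory.
# Quoting 'EOF' stops the shell from expanding anything inside the block.
cat > employees.txt <<'EOF'
101,John,28,IT,60000
102,Alice,24,HR,55000
103,Bob,30,IT,70000
104,David,27,Finance,65000
105,Eve,29,HR,62000
EOF

# Confirm the file has five lines.
wc -l < employees.txt
```

In MapReduce mode the file must be in HDFS instead. Running `hdfs dfs -put employees.txt employees.txt` copies it to your HDFS home directory, which is where Pig resolves the relative path 'employees.txt'.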
Task 3: Load Data into Pig
Run the following Pig script in the Grunt shell:
EMPLOYEES = LOAD 'employees.txt' USING PigStorage(',')
AS (ID:INT, NAME:CHARARRAY, AGE:INT, DEPT:CHARARRAY,
SALARY:INT);
Verify the loaded data:
DUMP EMPLOYEES;
Task 4: Projection (Selecting Specific Columns)
To display only Name and Department:
EMP_PROJECTION = FOREACH EMPLOYEES GENERATE NAME, DEPT;
DUMP EMP_PROJECTION;
Task 5: Filtering (Employees with Salary > 60000)
HIGH_SALARY = FILTER EMPLOYEES BY SALARY > 60000;
DUMP HIGH_SALARY;
Task 6: Sorting Data by Age
SORTED_EMPLOYEES = ORDER EMPLOYEES BY AGE ASC;
DUMP SORTED_EMPLOYEES;
Task 7: Grouping Data by Department
GROUPED_BY_DEPT = GROUP EMPLOYEES BY DEPT;
DUMP GROUPED_BY_DEPT;
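The statements above can also be run non-interactively from a script file. The sketch below assumes a hypothetical script name, lab7.pig, and that employees.txt is in the current directory. The per-department COUNT and AVG are not part of the original tasks. They go one step past Task 7 to show what the grouped bags are typically used for.

```shell
# Write a small Pig Latin script (lab7.pig is an assumed, illustrative name).
cat > lab7.pig <<'EOF'
-- Load the sample records with the same schema as Task 3.
EMPLOYEES = LOAD 'employees.txt' USING PigStorage(',')
    AS (ID:INT, NAME:CHARARRAY, AGE:INT, DEPT:CHARARRAY, SALARY:INT);

-- Group by department, then reduce each group's bag to one summary row.
GROUPED_BY_DEPT = GROUP EMPLOYEES BY DEPT;
DEPT_STATS = FOREACH GROUPED_BY_DEPT GENERATE
    group AS DEPT,
    COUNT(EMPLOYEES) AS HEADCOUNT,
    AVG(EMPLOYEES.SALARY) AS AVG_SALARY;
DUMP DEPT_STATS;
EOF

# Run the script in local mode, but only if Pig and the input file are available.
if command -v pig >/dev/null 2>&1 && [ -f employees.txt ]; then
    pig -x local lab7.pig
fi
```

With the sample data, DEPT_STATS should contain one row per department: IT with 2 employees and an average salary of 65000, HR with 2 and 58500, and Finance with 1 and 65000. The order of the rows is not guaranteed unless you add an ORDER statement.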

Practice Problems:
Data (save as sales.txt):
201,John,TV,Electronics,2,50000
202,Alice,Laptop,Electronics,1,70000
203,Bob,Phone,Electronics,3,30000
204,David,Shirt,Clothing,4,2000
205,Eve,Shoes,Clothing,2,4000
206,Frank,WashingMachine,Electronics,1,25000
207,Grace,Table,Furniture,1,15000
208,Harry,Chair,Furniture,2,5000
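The practice data can be saved the same way as the employee file. The sketch below assumes the file name sales.txt, which is what the LOAD statement for SALES reads, and local mode:

```shell
# Write the eight sample transactions to sales.txt in the current directory.
cat > sales.txt <<'EOF'
201,John,TV,Electronics,2,50000
202,Alice,Laptop,Electronics,1,70000
203,Bob,Phone,Electronics,3,30000
204,David,Shirt,Clothing,4,2000
205,Eve,Shoes,Clothing,2,4000
206,Frank,WashingMachine,Electronics,1,25000
207,Grace,Table,Furniture,1,15000
208,Harry,Chair,Furniture,2,5000
EOF

# Check that every line has exactly six comma-separated fields,
# matching the six columns declared in the SALES schema.
awk -F',' 'NF != 6 { bad = 1 } END { exit bad }' sales.txt && echo "sales.txt looks well-formed"
```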

SALES = LOAD 'sales.txt' USING PigStorage(',')
AS (TID:INT, CNAME:CHARARRAY, PRODUCT:CHARARRAY,
CATEGORY:CHARARRAY, QTY:INT, PRICE:INT);
DUMP SALES;

Task 1: Projection (Selecting Specific Columns)

Display Customer Name, Product, and Price only.
Task 2: Filtering (Transactions where Quantity > 2)
Task 3: Sorting Data by Price (Descending Order)
Task 4: Grouping Transactions by Category
