Apache Pig
What is Apache Pig?
 An abstraction over MapReduce
 Used to analyze large data sets by representing them as data flows
 Performs all the data manipulation operations in Hadoop
 Provides a high-level language known as Pig Latin
 Programmers can develop their own functions for reading, writing, and
  processing data
 Scripts are internally converted to Map and Reduce tasks by the Pig
  Engine
Why Do We Need Apache Pig?
 Programmers can perform MapReduce tasks easily without having to write
  complex Java code
 Uses a multi-query approach, thereby reducing code length
 SQL-like language
 Provides many built-in operators and Data Types
Features of Pig
Rich set of operators
It provides many operators to perform operations like join,
   sort, filter, etc.
Ease of programming
Pig Latin is similar to SQL and it is easy to write a Pig script if
  you are good at SQL.
Optimization opportunities
Apache Pig optimizes execution automatically, so
  programmers need to focus only on the semantics of the
  language.
Extensibility
Using the existing operators, users can develop their own
  functions to read, process, and write data.
User-defined Functions
Users can create their own functions (e.g. in Java) and
  invoke or embed them in Pig scripts
Handles all kinds of data
Structured as well as unstructured.
Apache Pig Vs MapReduce
   Apache Pig                          MapReduce
 Data flow language                  Data processing paradigm
 High-level language                 Low-level and rigid
 Performing a join operation is      Performing a join operation
  simple                               between datasets is difficult
 Knowledge of SQL is sufficient      Exposure to Java is mandatory
 No need for compilation; every      Has a long compilation process
  Apache Pig operator is converted
  internally into a MapReduce job
Apache Pig Vs SQL
   Pig                                  SQL
 Procedural language                  Declarative language
 Schema is optional                   Schema is mandatory
 Limited opportunity for query        More opportunity for query
  optimization                          optimization
 Allows splits in the pipeline
 Allows developers to store data
  anywhere in the pipeline
 Provides operators to perform ETL
  (Extract, Transform, and Load)
  functions
Apache Pig Vs Hive
   Pig                                  Hive
 Language: Pig Latin                  Language: HiveQL
 Created at Yahoo                     Created at Facebook
 Data flow language                   Query processing language
 A procedural language that fits      A declarative language
  the pipeline paradigm
 Handles structured, unstructured,    Used mostly for structured data
  and semi-structured data
Applications of Apache Pig
 Tasks involving ad-hoc processing and quick prototyping
    To process huge data sources such as web logs
    To perform data processing for search platforms
    To process time-sensitive data loads
History
 2006 – Developed as a research project at Yahoo
 2007 – Open sourced via the Apache Incubator
 2008 – The first release of Apache Pig came
 2010 – Graduated as an Apache top-level project
Architecture
Parser
 Checks the syntax of the script and does type checking
 The output of the parser is a DAG (directed acyclic graph)
  representing the Pig Latin statements and logical operators
Optimizer
 The logical plan (DAG) is passed to the logical optimizer,
  which carries out logical optimizations such as
  projection pushdown.
Compiler and Execution engine
 The compiler compiles the optimized logical plan into a
  series of MapReduce jobs.
 Finally, the MapReduce jobs are submitted to Hadoop in
  sorted order for execution to produce the desired results.
Data Model
 Atom
    Any single value, irrespective of its data type, is known as an Atom.
    It is stored as a string and can be used as a string or a number.
    int, long, float, double, chararray, and bytearray are the atomic
     values of Pig.
    A piece of data or a simple atomic value is known as a field.
    Example: 'raja' or '30'
 Tuple
    A record formed by an ordered set of fields is known as a tuple;
     the fields can be of any type.
    A tuple is similar to a row in a table of an RDBMS.
    Example: (Raja, 30)
 Bag
    An unordered set (collection) of tuples.
    Each tuple can have any number of fields (flexible schema).
    Represented by '{}'.
    Similar to a table in an RDBMS, but it is not necessary that the
     tuples contain the same number of fields or that fields in the
     same position are of the same type.
    Example: {(Raja, 30), (Mohammad, 45)}
    A bag can be a field in a relation; it is then known as an inner bag.
    Example: {Raja, 30, {9848022338, raja@gmail.com}}
 Relation
    A bag of tuples.
    Unordered – there is no guarantee that tuples are processed in any
     particular order.
 Map
    A map (or data map) is a set of key-value pairs.
    The key needs to be of type chararray and should be unique.
    The value can be of any type. It is represented by '[]'.
    Example: [name#Raja, age#30]
Execution Modes
 Local Mode
    Runs on your local host using the local file system.
    Used for testing purposes.
 MapReduce Mode
    Loads or processes data that exists in the Hadoop File
     System (HDFS).
    A MapReduce job is invoked in the back end to perform a
     particular operation on the data.
Execution Mechanisms
Interactive Mode (Grunt shell)
Batch Mode (Script)
Embedded Mode (UDF)
  Defining our own functions (User Defined Functions) in
   programming languages such as Java, and using
   them in our script.
Invoking the Grunt Shell
$ ./pig -x local
$ ./pig -x mapreduce
  Either of these commands gives you the Grunt shell
   prompt as shown below.
  grunt>
You can exit the Grunt shell using ‘ctrl + d’.
Batch Mode
Write an entire Pig Latin script in a file and
 execute it using the -x option.
  $ pig -x local Sample_script.pig
  $ pig -x mapreduce Sample_script.pig
Shell & Utility commands
sh Command
  Invoke any shell commands
  grunt> sh shell_command parameters
  grunt> sh ls
     pig
     pig_1444799121955.log
     pig.cmd
     pig.py
fs Command
 Invoke any Hadoop File system Shell commands
 grunt> fs file_system_command parameters
 grunt> fs -ls
    Found 3 items
    drwxrwxrwx - Hadoop supergroup 0 2015-09-08 14:13 Hbase
    drwxr-xr-x - Hadoop supergroup 0 2015-09-09 14:52 seqgen_data
    drwxr-xr-x - Hadoop supergroup 0 2015-09-08 11:30 twitter_data
Utility Commands
 clear : clear the screen
    grunt> clear
 help : Provides help about the commands.
 history : Displays a list of statements executed/used so
  far since the Grunt shell was invoked.
 set : Used to show/assign values to keys used in Pig.
 quit : You can quit from the Grunt shell.
 exec/run: Can execute Pig scripts
    grunt> exec [-param param_name = param_value] [-param_file
     file_name] script
 kill : Kills a job from the Grunt shell: grunt> kill JobId
Pig Latin
 A relation is the outermost structure of the Pig Latin data model.
  It is a bag, where –
    A bag is a collection of tuples.
    A tuple is an ordered set of fields.
    A field is a piece of data.
 Processing Data:
    Statements are the basic constructs
    Statements work with relations
    Statements include operators, expressions and schemas
    Statements take a relation as input and produce another relation as
     output (except LOAD and STORE)

     student_data = LOAD 'student_data.txt' USING PigStorage(',')
         AS (id:int, firstname:chararray, lastname:chararray,
             phone:chararray, city:chararray);
 Values of all data types can be NULL; Pig treats null values in a
  similar way as SQL does
Operators
Category            Operators                           Example
Arithmetic          +, -, *, /, %                       b = (a == 1 ? 20 : 30);
                    ?: (bincond operator)
                    CASE WHEN THEN ELSE END             CASE f2 % 2
                                                          WHEN 0 THEN 'even'
                                                          WHEN 1 THEN 'odd'
                                                        END
Comparison          ==, !=, >, <, >=, <=                f1 matches '.*tutorial.*'
                    matches (pattern matching)
Type construction   Tuple construction operator: ()     (Raju, 30)
                    Bag construction operator: {}       {(Raju, 30), (Mohammad, 45)}
                    Map construction operator: []       [name#Raja, age#30]
Relational operators
Preparing Data
 In MapReduce mode, Pig reads (loads) data from HDFS and stores the
  results back in HDFS. Therefore, let us start HDFS and create the following
  sample data in HDFS.
  Load Operator
    The load statement consists of two parts divided by the "=" operator.
    On the left-hand side, we mention the name of the relation where
     we want to store the data, and on the right-hand side, we define
     how we load the data.
   Given below is the syntax of the Load operator.
        Relation_name = LOAD 'Input file path' USING function AS schema;
    Component                                    Description
Relation_name          The relation in which we want to store the data.
Input file path        Mention the HDFS directory where the file is stored
function               A function from the set of load functions provided by
                       Apache Pig (BinStorage, JsonLoader, PigStorage,
                       TextLoader).
schema                 Define the schema of the data
 We can define the required schema as follows
    (column1 : data type, column2 : data type, column3 : data type);
 Note: We can also load the data without specifying the schema.
  In that case, the columns will be addressed as $0, $1,
  etc.
 grunt> student = LOAD
  'hdfs://localhost:9000/pig_data/student_data.txt' USING
  PigStorage(',') AS (id:int, firstname:chararray,
  lastname:chararray, phone:chararray, city:chararray);
 The PigStorage() function:
    It loads and stores data as structured text files. It takes, as a
     parameter, the delimiter by which each entity of a tuple is
     separated. By default, the delimiter is '\t' (tab).
Store operator
 STORE Relation_name INTO 'required_directory_path'
  [USING function];
 Ex:
    grunt> STORE student INTO 'hdfs://localhost:9000/pig_Output/'
     USING PigStorage(',');
Diagnostic Operators
 Dump Operator
   The Dump operator is used to run the Pig Latin statements and
    display the results on the screen. It is generally used for
    debugging purposes.
 grunt> Dump Relation_Name;
  Ex: Dump student;
   Once you execute the above Pig Latin statement, it will start a
    MapReduce job to read data from HDFS.
 Describe: Used to view the schema of a relation
 grunt> Describe Relation_name;
    Ex: grunt> describe student;
   Output: student: {id: int, firstname: chararray, lastname: chararray,
   phone: chararray, city: chararray}
 Explain: Used to display the logical, physical, and MapReduce
  execution plans of a relation.
 grunt> explain Relation_name;
    Ex: grunt> explain student;
 Illustrate: Gives you the step-by-step execution of a sequence of
  statements
 grunt> illustrate Relation_name;
    grunt> illustrate student;
Group Operator
 The group operator is used to group the data in one or more relations. It
  collects the data having the same key.
 Group_data = GROUP Relation_name BY age;
    grunt> group_data = GROUP student_details by age;
    grunt> Dump group_data;
    The output contains two columns: one is age, by which we have grouped
     the relation, and the other is a bag, which contains the group of
     tuples (student records) with the respective age.
 You can see the schema of the table after grouping the data using the
  describe command as shown below.
Cogroup Operator
 The group operator is normally used with one relation, while the cogroup
  operator is used in statements involving two or more relations.
The cogroup operator groups the tuples from
 each relation according to age, where each
 group depicts a particular age value.
For example, if we consider the 1st tuple of the
 result, it is grouped by age 21. And it contains
 two bags –
   the first bag holds all the tuples from the first relation
    (student_details in this case) having age 21, and
   the second bag contains all the tuples from the
    second relation (employee_details in this case)
    having age 21.
   In case a relation doesn't have tuples with the
    age value 21, it returns an empty bag.
Join Operator
 The join operator is used to combine records from two or more relations.
 While performing a join operation, we declare one (or a group of) field(s)
  from each relation as keys.
 When these keys match, the two particular tuples are matched; otherwise,
  the records are dropped.
 Joins can be of the following types:
     Self-join
     Inner-join
     Outer-join : left join, right join, and full join
 Self-join is used to join a table with itself, as if the table
  were two relations, temporarily renaming at least one
  relation.
    Generally, in Apache Pig, to perform a self-join we
     load the same data multiple times, under different
     aliases (names), as in the sketch below.
Outer Join
 Returns all the rows from at least one of the relations. An outer join
  operation is carried out in three ways – left, right, and full.
 A left outer join returns all rows from the left relation, even if there
  are no matches in the right relation.
 A right outer join returns all rows from the right relation, even if there
  are no matches in the left relation.
 A full outer join returns rows when there is a match in either of the
  relations.
Cross Operator
 Computes the cross-product of two or more relations.
Combining and Splitting
 Union Operator :
    The UNION operator of Pig Latin is used to merge the content of two relations.
    To perform UNION operation on two relations, their columns and domains must
     be identical.
 Split : Used to split a relation into two or more relations.
Filter Operator
 Used to select the required tuples from a relation based on a condition.
Distinct Operator
 Used to remove redundant (duplicate) tuples from a relation
Foreach Operator
 Used to generate specified data transformations based on the column
  data.
Order By
 Used to display the contents of a relation in a sorted order based on one or
  more fields.
Limit Operator
 Used to get a limited number of tuples from a relation
Built-in Functions – EVAL Functions
 AVG: Used to compute the average of the numeric values within a bag,
  ignoring NULL values (see the sketch at the end of this list).
     To get the global average value, we need to perform a Group All
      operation and calculate the average value using the AVG function.
     To get the average value of a group, we group the relation using the
      Group By operator and then apply the AVG function.
 Max - Used to calculate the highest value for a column (numeric values or
  chararrays) in a single-column bag and ignores the NULL values.
 COUNT:
     Used to get the number of elements in a bag.
     While counting the number of tuples in a bag, the COUNT() function
      ignores (does not count) tuples having a NULL value in the first field.
 COUNT_STAR:
     Similar to COUNT(), but it includes the NULL values.
 Sum: to get the total of the numeric values of a column in a single-column
  bag and ignores the null values.
 DIFF:
    Used to compare two bags (fields) in a tuple.
    It takes two fields of a tuple as input and matches them.
    If they match, it returns an empty bag.
    If they do not match, it finds the elements that exist in one field (bag)
     but not in the other, and returns these elements wrapped in a bag.
    Generally, the DIFF() function compares two bags in a tuple.
 SUBTRACT :
    Used to subtract two bags.
    It takes two bags as inputs and returns a bag which contains the tuples of the first
     bag that are not in the second bag.
 IsEmpty : Used to check if a bag or map is empty.
 Size : Used to compute the number of elements based on any Pig data
  type.
 BagToString :
    Used to concatenate the elements of a bag into a string.
    While concatenating, we can place a delimiter between these values (optional).
 Concat : Used to concatenate two or more expressions of the same type.
 Tokenize :
    Used to split a string (which contains a group of words) in a single tuple and
     return a bag which contains the output of the split operation.
    As a delimiter to the TOKENIZE function, we can pass space [ ], double
     quote [" "], comma [ , ], parentheses [ () ], or star [ * ].
 Word Count Example:
    lines = LOAD 'data' AS (line:chararray);
    words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grouped = GROUP words BY word;
    wordcount = FOREACH grouped GENERATE group, COUNT(words);
    DUMP wordcount;
Load and Store functions
 Used to determine how the data goes into and comes out of Pig.
 TextLoader() loads unstructured data; it cannot be used for the store
  operation.
 BinStorage() in Pig is generally used to load and store temporary data
  generated between MapReduce jobs.
 Handling Compression: compressed files can be read using the PigStorage
  and TextLoader functions.
Bag and Tuple Functions
 TOBAG :
    Converts one or more expressions to individual tuples. And these tuples are
     placed in a bag.
 TOP : Used to get the top N tuples of a bag.
      To this function, as inputs, we pass a relation, the number of tuples
       we want, and the column whose values are being compared.
      This function returns a bag containing the required tuples.
 TOTUPLE : Used to convert one or more expressions to the data type tuple.
 TOMAP : Used to convert the key-value pairs into a Map.
String Functions
   Operator                                Description
ENDSWITH           ENDSWITH(string, testAgainst)
                   To verify whether a given string ends with a particular
                   substring
STARTSWITH         STARTSWITH(string, substring)
                   Accepts two string parameters and verifies whether the
                   first string starts with the second.
SUBSTRING          SUBSTRING(string, startIndex, stopIndex)
                   Returns a substring from a given string.
EqualsIgnoreCase   EqualsIgnoreCase(string1, string2)
                   To compare two strings, ignoring case.
INDEXOF            INDEXOF(string, 'character', startIndex)
                   Returns the first occurrence of a character in a string,
                   searching forward from a start index.
    Operator                                    Description
LAST_INDEX_OF      LAST_INDEX_OF(expression)
                   Returns the index of the last occurrence of a character in a
                   string, searching backward from the end of the string.
LCFIRST /UCFIRST   LCFIRST(expression) /UCFIRST(expression)
                   Converts the first character in a string to lower case /Upper
                   case.
REPLACE            REPLACE(string, 'oldChar', 'newChar')
                   To replace existing characters in a string with new characters.
UPPER / LOWER      UPPER(expression) / LOWER(expression)
                   Returns a string converted to upper/lower case.
STRSPLIT           STRSPLIT(string, regex, limit)
                   To split a string around matches of a given regular expression.
SPLITTOBAG         SPLITTOBAG(string, regex, limit)
                   Similar to the STRSPLIT() function, it splits the string by given
                   delimiter and returns the result in a bag.
TRIM/LTRIM/RTRIM   TRIM(expression) / LTRIM(expression) / RTRIM(expression)
                   Returns a copy of a string with both leading and trailing /
                   leading / trailing whitespace removed.
Date and Time functions
 ToDate : Used to generate a DateTime object according to the given
  parameters.
    ToDate(milliseconds)
    ToDate(userstring, format)
    ToDate(userstring, format, timezone)
Math functions
 ABS, ACOS, ATAN, ASIN, CBRT, CEIL, COS, COSH, EXP, FLOOR, LOG, LOG10,
  RANDOM, ROUND, SIN, SINH, SQRT, TAN, TANH
Running Scripts
 How to run Apache Pig scripts in batch mode:
 Comments in Pig Script :
      /* ... */ for multi-line comments; -- for single-line comments
 Executing Pig Script in Batch mode
Step 1
Write all the required Pig Latin statements in a single file. We can write all
the Pig Latin statements and commands in a single file and save it as a .pig file.
Step 2
Execute the Apache Pig script. You can execute the Pig script from the shell
(Linux) as shown below.
   $ pig -x local Sample_script.pig
 You can execute it from the Grunt shell as well using the exec command as
  shown below.
    grunt> exec /sample_script.pig
 Executing a Pig Script from HDFS :
     Suppose there is a Pig script with the name Sample_script.pig in the HDFS directory
      named /pig_data/.
     $ pig -x mapreduce hdfs://localhost:9000/pig_data/Sample_script.pig