Department of Computer Science and Engineering
10212CS210 – Big Data Analytics
Course Category : Program Elective
Credits : 4
Slot : S1 & S5
Semester : Summer
Academic Year : 2024-2025
Faculty Name : Dr. S. Jagan
School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Unit 4 Big Data Visualization and Prediction
Pig: Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User Defined Functions, Data Processing Operators. Hive: Hive Shell, Hive Services, Hive Metastore, Comparison with Traditional Databases, HiveQL, Tables, Querying Data and User Defined Functions. NoSQL Databases: Schema-less Models: Increasing Flexibility for Data Manipulation - Key-Value Stores - Document Stores - Tabular Stores - Object Data Stores - Graph Databases - Hive - Sharding - HBase - Analyzing Big Data with Twitter - Big Data for E-Commerce - Big Data for Blogs.
Introduction to PIG
• Developed by Yahoo! and now a top-level Apache project
• Makes data on a cluster immediately available to non-Java programmers via Pig Latin, a dataflow language
• Interprets Pig Latin and generates MapReduce jobs that run on the cluster
• Enables easy data summarization, ad-hoc reporting and querying, and analysis of large volumes of data
• The Pig interpreter runs on a client machine, so no administrative overhead is required on the cluster
Pig Terms
• All data in Pig is one of four types:
• An Atom is a simple data value, stored as a string but usable as either a string or a number
• A Tuple is a data record consisting of a sequence of "fields"
• Each field is a piece of data of any type (atom, tuple or bag)
• A Bag is a collection of tuples (also referred to as a 'Relation')
• Conceptually, a bag is "kind of" a table
• A Map is a mapping from keys (string literals / chararrays) to values of any data type
• Conceptually, a map is like a hash map
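• A minimal sketch of how all four types can appear together in a LOAD schema; the file name and field names here are hypothetical, not part of the original examples.
students = LOAD 'students.txt'
    AS (name:chararray,                      -- atom: a simple value
        marks:tuple(m1:int, m2:int),         -- tuple: an ordered record of fields
        courses:bag{c:(course:chararray)},   -- bag: a collection of tuples
        details:map[]);                      -- map: chararray keys to values of any type
DUMP students;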
Pig Capabilities
• Support for
• Grouping
• Joins
• Filtering
• Aggregation
• Extensibility
• Support for User Defined Functions (UDFs)
• Leverages the same massive parallelism as native
MapReduce
Pig Basics
• Pig is a client application
• No cluster software is required
• Interprets Pig Latin scripts to MapReduce jobs
• Parses Pig Latin scripts
• Performs optimization
• Creates execution plan
• Submits MapReduce jobs to the cluster
Execution Modes
• Pig has two execution modes
• Local Mode - all files are installed and run using your local host
and file system
• MapReduce Mode - all files are installed and run on a Hadoop
cluster and HDFS installation
• Interactive
• By using the Grunt shell by invoking Pig on the command line
$ pig
grunt>
• Batch
• Run Pig in batch mode using Pig Scripts and the "pig" command
$ pig -f id.pig -p <param>=<value> ...
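• The mode can also be selected explicitly with the -x flag; the script name below is hypothetical.
$ pig -x local myscript.pig       # Local Mode: local host and file system
$ pig -x mapreduce myscript.pig   # MapReduce Mode: Hadoop cluster and HDFS (the default)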
Pig Latin
• Pig Latin scripts are generally organized as follows
• A LOAD statement reads data
• A series of “transformation” statements process the data
• A STORE statement writes the output to the filesystem
• A DUMP statement displays output on the screen
• Logical vs. physical plans:
• All statements are stored and validated as a logical plan
• Once a STORE or DUMP statement is encountered, the logical plan is executed
Example Pig Script
-- Load the content of a file into a pig bag named 'input_lines'
input_lines = LOAD 'CHANGES.txt' AS (line:chararray);
-- Extract words from each line and put them into a pig bag named 'words'
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- keep only tokens made up of word characters (drops whitespace-only tokens)
filtered_words = FILTER words BY word MATCHES '\\w+';
-- create a group for each word
word_groups = GROUP filtered_words BY word;
-- count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
-- order the records by count
ordered_word_count = ORDER word_count BY count DESC;
-- Store the results (this triggers execution of the script)
STORE ordered_word_count INTO 'output';
Basic “grunt” Shell Commands
• Help is available
$ pig -h
• Pig supports HDFS commands
grunt> pwd
• put, get, cp, ls, mkdir, rm, mv, etc.
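• A short grunt session using a few of these commands; the paths are hypothetical.
grunt> pwd
grunt> ls /user/training
grunt> mkdir /user/training/demo
grunt> cp /user/training/data.txt /user/training/demo/
grunt> cat /user/training/demo/data.txt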
About Pig Scripts
• Pig Latin statements grouped together in a file
• Can be run from the command line or the shell
• Support parameter passing
• Comments are supported
• Inline comments '--'
• Block comments /* */
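• A small sketch combining both comment styles with a passed parameter; the script, file, and parameter names are hypothetical.
/* filter_lines.pig : keep lines longer than $minlen characters */
lines = LOAD '$input' AS (line:chararray);         -- $input is substituted at run time
long_lines = FILTER lines BY SIZE(line) > $minlen;
STORE long_lines INTO 'long_lines_out';
• Run it in batch mode with:
$ pig -f filter_lines.pig -p input=CHANGES.txt -p minlen=80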
Simple Data Types
Type Description
int 4-byte integer
long 8-byte integer
float 4-byte (single precision) floating point
double 8-byte (double precision) floating point
bytearray Array of bytes; blob
chararray String (“hello world”)
boolean True/False (case insensitive)
datetime A date and time
biginteger Java BigInteger
bigdecimal Java BigDecimal
Complex Data Types
Type Description
Tuple Ordered set of fields (a “row / record”)
Bag Collection of tuples (a “resultset / table”)
Map A set of key-value pairs
Keys must be of type chararray
Pig Data Formats
• BinStorage
• Loads and stores data in machine-readable (binary) format
• PigStorage
• Loads and stores data as structured, field delimited text
files
• TextLoader
• Loads unstructured data in UTF-8 format
• PigDump
• Stores data in UTF-8 format
• YourOwnFormat!
• via UDFs
Loading Data Into Pig
• Loads data from an HDFS file
var = LOAD 'employees.txt';
var = LOAD 'employees.txt' AS (id, name, salary);
var = LOAD 'employees.txt' USING PigStorage() AS (id, name, salary);
• Each LOAD statement defines a new bag
• Each bag can have multiple elements (atoms)
• Each element can be referenced by name or position ($n)
• A bag is immutable
• A bag can be aliased and referenced later
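• A sketch tying these points together, assuming a hypothetical comma-delimited file employees.csv:
emps = LOAD 'employees.csv' USING PigStorage(',')
       AS (id:int, name:chararray, salary:float);
names = FOREACH emps GENERATE name;          -- reference a field by name
first_two = FOREACH emps GENERATE $0, $1;    -- reference fields by position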
Storing Data Into Pig
• STORE
• Writes output to an HDFS file in a specified directory
grunt> STORE processed INTO 'processed_txt';
• Fails if directory exists
• Writes output files, part-[m|r]-xxxxx, to the directory
• PigStorage can be used to specify a field delimiter
• DUMP
• Write output to screen
grunt> DUMP processed;
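• For example, comma-delimited output can be written like this (the output directory name is hypothetical and must not already exist):
grunt> STORE processed INTO 'processed_csv' USING PigStorage(',');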
Relational Operators
• FOREACH
• Applies expressions to every record in a bag
• FILTER
• Filters by expression
• GROUP
• Collect records with the same key
• ORDER BY
• Sorting
• DISTINCT
• Removes duplicates
Relational Operators
• Use the FOREACH …GENERATE operator to work with
rows of data, call functions, etc.
• Basic syntax:
alias2 = FOREACH alias1 GENERATE expression;
• Example:
DUMP alias1;
(1,2,3) (4,2,1) (8,3,4) (4,3,3) (7,2,5)
(8,4,3)
alias2 = FOREACH alias1 GENERATE col1, col2;
DUMP alias2;
(1,2) (4,2) (8,3) (4,3) (7,2) (8,4)
Relational Operators
• Use the FILTER operator to restrict tuples or rows of
data
• Basic syntax:
alias2 = FILTER alias1 BY expression;
• Example:
DUMP alias1;
(1,2,3) (4,2,1) (8,3,4) (4,3,3) (7,2,5)
(8,4,3)
alias2 = FILTER alias1 BY (col1 == 8) OR (NOT (col2 + col3 > col1));
DUMP alias2;
(4,2,1) (8,3,4) (7,2,5) (8,4,3)
Relational Operators
• Use the GROUP…ALL operator to group data
• Use GROUP when only one relation is involved
• Use COGROUP when multiple relations are involved
• Basic syntax:
alias2 = GROUP alias1 ALL;
• Example:
DUMP alias1;
(John,18,4.0F) (Mary,19,3.8F)
(Bill,20,3.9F) (Joe,18,3.8F)
alias2 = GROUP alias1 BY col2;
DUMP alias2;
(18,{(John,18,4.0F),(Joe,18,3.8F)})
(19,{(Mary,19,3.8F)})
(20,{(Bill,20,3.9F)})
Relational Operators
• Use the ORDER…BY operator to sort a relation based
on one or more fields
• Basic syntax:
alias = ORDER alias BY field_alias [ASC|DESC];
• Example:
DUMP alias1;
(1,2,3) (4,2,1) (8,3,4) (4,3,3) (7,2,5)
(8,4,3)
alias2 = ORDER alias1 BY col3 DESC;
DUMP alias2;
(7,2,5) (8,3,4) (1,2,3) (4,3,3) (8,4,3)
(4,2,1)
Relational Operators
• Use the DISTINCT operator to remove duplicate tuples
in a relation.
• Basic syntax:
alias2 = DISTINCT alias1;
• Example:
DUMP alias1;
(8,3,4) (1,2,3) (4,3,3) (4,3,3) (1,2,3)
alias2= DISTINCT alias1;
DUMP alias2;
(8,3,4) (1,2,3) (4,3,3)
Relational Operators
• FLATTEN
• Used to un-nest tuples as well as bags
• INNER JOIN
• Used to perform an inner join of two or more relations based on
common field values
• OUTER JOIN
• Used to perform left, right or full outer joins
• SPLIT
• Used to partition the contents of a relation into two or more
relations
• SAMPLE
• Used to select a random data sample with the stated sample size
Relational Operators
• Use the JOIN operator to perform an inner, equi-join of two or more relations based on common field values
• The JOIN operator always performs an inner join
• Inner joins ignore null keys
• Filter null keys before the join
• JOIN and COGROUP operators perform similar
functions
• JOIN creates a flat set of output records
• COGROUP creates a nested set of output records
Relational Operators
DUMP Alias1;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)

DUMP Alias2;
(2,4)
(8,9)
(1,3)
(2,7)
(2,9)
(4,6)
(4,9)

-- Join Alias1 and Alias2 on Col1
Alias3 = JOIN Alias1 BY Col1, Alias2 BY Col1;

DUMP Alias3;
(1,2,3,1,3)
(4,2,1,4,6)
(4,3,3,4,6)
(4,2,1,4,9)
(4,3,3,4,9)
(8,3,4,8,9)
(8,4,3,8,9)
Relational Operators
• Use the OUTER JOIN operator to perform left, right, or full
outer joins
• Pig Latin syntax closely adheres to the SQL standard
• The keyword OUTER is optional
• keywords LEFT, RIGHT and FULL will imply left outer, right outer
and full outer joins respectively
• Outer joins will only work provided the relations which
need to produce nulls (in the case of non-matching keys)
have schemas
• Outer joins will only work for two-way joins
• To perform a multi-way outer join perform multiple two-way
outer join statements
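• A minimal sketch of the three outer-join forms, assuming two relations A and B that both declare a schema containing an id field:
C = JOIN A BY id LEFT OUTER,  B BY id;    -- keep all tuples of A
D = JOIN A BY id RIGHT OUTER, B BY id;    -- keep all tuples of B
E = JOIN A BY id FULL OUTER,  B BY id;    -- keep all tuples of both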
User-Defined Functions
• Natively written in Java, packaged as a jar file
• Other languages include JavaScript, Ruby, Groovy, and
Python
• Register the jar with the REGISTER statement
• Optionally, alias it with the DEFINE statement
REGISTER /src/myfunc.jar;
A = LOAD 'students';
B = FOREACH A GENERATE myfunc.MyEvalFunc($0);
DEFINE
• DEFINE can be used to work with UDFs and also
streaming commands
• Useful when dealing with complex input/output formats
/* read and write comma-delimited data */
DEFINE Y 'stream.pl' INPUT(stdin USING PigStreaming(',')) OUTPUT(stdout USING PigStreaming(','));
A = STREAM X THROUGH Y;

/* Define UDFs to a more readable format */
DEFINE MAXNUM org.apache.pig.piggybank.evaluation.math.MAX;
A = LOAD 'student_data' AS (name:chararray, gpa1:float, gpa2:double);
B = FOREACH A GENERATE name, MAXNUM(gpa1, gpa2);
DUMP B;
Hive: Data Warehousing package built on top of Hadoop
Hive Background
• Started at Facebook
• Data was collected and stored in an Oracle database
• Data grew from tens of GB (2006) to about 1 TB of new data per day (2007)
• By around 2020, data was being generated at a rate of roughly 1,024 TB per minute
Hive use case @ Facebook
What is Hive
• Data warehousing package built on top of Hadoop.
• Used for data analysis.
• Targeted towards users comfortable with SQL.
• Its query language is similar to SQL and is called HiveQL.
• For managing and querying structured data.
• No need to learn Java or the Hadoop APIs.
• Developed by Facebook and contributed to the community.
• Facebook analyzed several terabytes of data every day using Hive.
Features of Hive
• Hive is fast and scalable.
• It provides SQL-like queries (HiveQL) that are implicitly transformed into MapReduce or Spark jobs.
• It is capable of analyzing large datasets stored in HDFS.
• It allows different storage types such as plain text, RCFile, and HBase.
• It uses indexing to accelerate queries.
• It can operate on compressed data stored in the Hadoop ecosystem.
• It supports user-defined functions (UDFs), so users can plug in their own functionality.
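• A minimal HiveQL sketch of this SQL-like workflow; the table and column names are hypothetical.
hive> CREATE TABLE IF NOT EXISTS page_views (user_id BIGINT, url STRING, view_time STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      STORED AS TEXTFILE;
hive> SELECT url, COUNT(*) AS views
      FROM page_views
      GROUP BY url
      ORDER BY views DESC
      LIMIT 10;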
What is Hive
ETL: Extract, Transform, Load
Hive Architecture and components
Why go for Hive when Pig is there?

Pig Latin:
• Procedural data-flow language
• Example: A = LOAD 'mydata'; DUMP A;
• Used by programmers and researchers

Hive QL:
• Declarative SQL-like language
• Example: SELECT * FROM mytable;
• Used by analysts generating daily reports
Pig vs Hive
Features                       Hive                Pig
Language                       SQL-like            Pig Latin
Schemas/Types                  Yes (explicit)      Yes (implicit)
Partitions                     Yes                 No
Server                         Optional (Thrift)   No
User Defined Functions (UDF)   Yes (Java)          Yes (Java)
DFS Direct Access              Yes                 Yes
Join/Order/Sort                Yes                 Yes
Shell                          Yes                 Yes
Web Interface                  Yes                 No
JDBC/ODBC                      Yes                 No
Differences between Hive and Pig
Hive:
• Commonly used by data analysts.
• Follows SQL-like queries.
• Handles structured data.
• Works on the server side of an HDFS cluster.
• Slower than Pig.

Pig:
• Commonly used by programmers.
• Follows a data-flow language.
• Handles semi-structured data.
• Works on the client side of an HDFS cluster.
• Comparatively faster than Hive.
Hive Architecture
Apache Hive Installation
• Java installation: check whether Java is installed using the following command.
$ java -version
• Hadoop installation: check whether Hadoop is installed using the following command.
$ hadoop version
Steps to install Apache Hive:
• Download the Apache Hive tar file:
http://mirrors.estointernet.in/apache/hive/hive-1.2.2/
• Unzip the downloaded tar file:
tar -xvf apache-hive-1.2.2-bin.tar.gz
• Open the .bashrc file:
$ sudo nano ~/.bashrc
• Now provide the following HIVE_HOME path:
export HIVE_HOME=/home/codegyani/apache-hive-1.2.2-bin
export PATH=$PATH:/home/codegyani/apache-hive-1.2.2-bin/bin
• Update the environment variables:
$ source ~/.bashrc
• Start Hive with the following command:
$ hive
Hive Components
Metastore
Limitations of Hive
Abilities of Hive Query Language
Hive Data Models
Partitioning
Partitioning in Hive
• The partitioning in Hive means dividing the table into some parts based
on the values of a particular column like date, course, city or country.
• The advantage of partitioning is that since the data is stored in slices, the
query response time becomes faster.
• Since Hadoop is used to handle huge amounts of data, it is always important to use the best approach to deal with it.
• Partitioning in Hive is a good example of such an approach.
Partitioning in Hive
• Let's assume we have data on 10 million students studying in an institute.
• Now we have to fetch the students of a particular course.
• With a traditional approach, we would have to scan the entire dataset, which degrades performance.
• In such a case, the better approach is partitioning in Hive: divide the data into smaller datasets based on particular columns.
The partitioning in Hive can be executed in two ways -
•Static partitioning
•Dynamic partitioning
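• A sketch of both approaches, assuming hypothetical student and student_staging tables:
-- the partition column is declared separately from the data columns
hive> CREATE TABLE student (id INT, name STRING) PARTITIONED BY (course STRING);
-- static partitioning: the partition value is supplied explicitly
hive> LOAD DATA LOCAL INPATH '/tmp/java_students.txt'
      INTO TABLE student PARTITION (course = 'java');
-- dynamic partitioning: partition values are taken from the query result
hive> SET hive.exec.dynamic.partition = true;
hive> SET hive.exec.dynamic.partition.mode = nonstrict;
hive> INSERT INTO TABLE student PARTITION (course)
      SELECT id, name, course FROM student_staging;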
Bucketing
• The bucket for a record is determined by (hash of the bucketing column value) mod (total number of buckets)
Bucketing in Hive
• The bucketing in Hive is a data organizing technique.
• It is similar to partitioning in Hive with an added functionality that it divides
large datasets into more manageable parts known as buckets.
• So, we can use bucketing in Hive when the implementation of partitioning
becomes difficult.
• However, we can also divide partitions further in buckets.
Bucketing in Hive
• The concept of bucketing is based on the hashing technique.
• The modulus of the (hashed) column value and the required number of buckets is calculated, e.g. F(x) % 3.
• Based on the resulting value, the row is stored in the corresponding bucket.
Example of Bucketing in Hive
•First, select the database in which we want to create a table.
hive> use showbucket;
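• A sketch of how the example might continue, with hypothetical table names and the id column hashed into 3 buckets:
hive> CREATE TABLE emp_bucketed (id INT, name STRING, salary FLOAT)
      CLUSTERED BY (id) INTO 3 BUCKETS
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> SET hive.enforce.bucketing = true;      -- needed on older Hive versions
hive> INSERT OVERWRITE TABLE emp_bucketed SELECT * FROM emp_staging;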
SerDe - Serialization and Deserialization
Introduction to Hive SerDe
• For the purpose of I/O, Apache Hive uses the Hive SerDe interface; it handles both serialization and deserialization in Hive.
• The deserializer interprets the stored bytes as individual fields for processing.
• In addition, a SerDe allows Hive to read data from a table and to write it back out to HDFS in any custom format.
• Anyone can write their own SerDe for their own data formats.
SerDe
• HDFS files –> InputFileFormat –> <key, value> –>
Deserializer –> Row object
• Row object –> Serializer –> <key, value> –>
OutputFileFormat –> HDFS files
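• For example, a table can declare an explicit SerDe; the table name below is hypothetical, and OpenCSVSerde ships with Hive.
hive> CREATE TABLE csv_events (event_id STRING, payload STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
      STORED AS TEXTFILE;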
UDF
• User Defined Functions, also known as UDF, allow you to
create custom functions to process records or groups of
records.
• Hive comes with a comprehensive library of functions.
• There are however some omissions, and some specific cases
for which UDFs are the solution.
UDF
A UDF processes one or several columns of one row and outputs one
value. For example :
•SELECT lower(str) from table
For each row in "table," the "lower" UDF takes one argument, the value
of "str", and outputs one value, the lowercase representation of "str".
•SELECT datediff(date_begin, date_end) from table
UDF
For each row in "table," the "datediff" UDF takes two arguments, the value of
"date_begin" and "date_end", and outputs one value, the difference in time
between these two dates.
Each argument of a UDF can be:
•A column of the table.
•A constant value.
•The result of another UDF.
•The result of an arithmetic computation.
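• A sketch of how a custom Java UDF is typically registered and called from HiveQL; the jar path, class name, and table name are hypothetical.
hive> ADD JAR /tmp/my_udfs.jar;
hive> CREATE TEMPORARY FUNCTION to_title AS 'com.example.hive.ToTitleCase';
hive> SELECT to_title(name) FROM employees;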
Types of Built-in Functions in HIVE
• Collection Functions.
• Date Functions.
• Mathematical Functions.
• Conditional Functions.
• String Functions.
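• One built-in function from each category in a single hedged example, assuming a hypothetical employees table:
hive> SELECT size(phone_numbers),                  -- collection function
             year(hire_date),                      -- date function
             round(salary, 2),                     -- mathematical function
             if(salary > 50000, 'high', 'low'),    -- conditional function
             concat(first_name, ' ', last_name)    -- string function
      FROM employees;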
NoSQL – Not Only SQL
• Lightweight and open source.
• NoSQL databases are used in:
  • Big data
  • Real-time web applications
  • Log analysis
  • Social networking feeds
• Non-relational database.
• Distributed.
• No support for ACID properties.
• No fixed table schema.
NoSQL - Types
• NoSQL
• Key-value or big hash table: Dynamo, Redis, Riak
• Document: MongoDB, Apache CouchDB, MarkLogic
• Columnar: Cassandra, HBase
• Graph: Neo4j, HyperGraphDB, InfiniteGraph
NoSQL - Types
What is it?
• NoSQL databases are not relational.
• Data is stored as key-value pairs, documents, columns, or graphs.

Key-value or big hash table:
  Key         Value
  Firstname   Rahul
  Lastname    Dravid

Document oriented:
• Maintains data in collections made up of documents.
• Examples: MongoDB, Apache CouchDB, Couchbase, MarkLogic.
{
  "Book Name": "BDA",
  "Publisher": "Wiley India",
  "Year of publication": 2011
}
Column
• Column – each storage block has data from only one column.
Graph
• Graph databases are also called network databases; a graph stores data in nodes and in the relationships (edges) between them (e.g., nodes with IDs 1001, 1002 and 1003 connected by edges).
NoSQL – Types & Tools
Advantages of NoSQL
• Can easily scale up and down
• Does not require a predefined schema
• Cheap and easy to implement
• Relaxes the data consistency requirement
• Data can be replicated to multiple nodes and can be partitioned
SQL vs NoSQL
NoSQL Vendors
Company Product Most widely used by
Amazon DynamoDB LinkedIn, Mozilla
Facebook Cassandra Netflix, Twitter, Ebay
Google Big Table Adobe Photoshop
HBase
HBase is an open-source, distributed, column-oriented database built on top of HDFS, based on Google's BigTable.
HBase
• A distributed data store that can scale horizontally to thousands of commodity servers and petabytes of indexed storage.
• Designed to operate on top of the Hadoop Distributed File System (HDFS) or the Kosmos File System (KFS, aka CloudStore) for scalability, fault tolerance, and high availability.
HBase
• Distributed storage
• Table-like in data structure
• multi-dimensional map
• High scalability
• High availability
• High performance
HBase
• Development started by Chad Walters and Jim Kellerman
• 2006.11: Google releases its paper on BigTable
• 2007.2: Initial HBase prototype created as a Hadoop contrib module
• 2007.10: First usable HBase
• 2008.1: Hadoop becomes an Apache top-level project and HBase becomes a subproject
• 2008.10~: HBase 0.18 and 0.19 released
HBase
• Tables have one primary index, the row key.
• No join operators.
• Scans and queries can select a subset of available
columns, perhaps by using a wildcard.
• There are three types of lookups:
• Fast lookup using row key and optional timestamp.
• Full table scan
• Range scan from region start to end.
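• A sketch of the three lookup styles in the HBase shell; the table, column family, and row keys are hypothetical.
hbase> create 'users', 'info'
hbase> put 'users', 'u1', 'info:name', 'Rahul'
hbase> get 'users', 'u1'                                      # fast lookup by row key
hbase> scan 'users'                                           # full table scan
hbase> scan 'users', {STARTROW => 'u1', STOPROW => 'u5'}      # range scan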
HBase
• HBase is a BigTable clone.
• It is open source
• It has a good community and promise for the future
• It is developed on top of, and integrates well with, the Hadoop platform, which is convenient if you are already using Hadoop.
• It has a Cascading connector.
Analyzing Big Data with Twitter
Big Data for E-Commerce
Big Data for Blogs