
Apache Hive

Big Data

Copyright © 2016 Tech Mahindra. All rights reserved.


Topics to be covered

• Introduction to Hive
• Manipulating Data with Hive
• Partitioning and Bucketing
• Advanced Hive Features
• Hive Best Practices


Introduction to Hive



Introduction to Hive
• What Is Hive?
• Benefits of Hive
• Getting Data into Hive
• The Hive Architecture
• Creating Hive Tables
• Loading Data into Hive
• Storing Query Results in HDFS
• Creating Different Databases


What is Hive?
• Hive is an open-source data warehousing solution built on top of Hadoop.

• Hive is a way to allow non-Java programmers access to the data stored in
  Hadoop clusters.

• Used by data analysts, statisticians, scientists, etc.

• Hive supports queries expressed in a SQL-like declarative language, HiveQL,
  which are compiled into MapReduce jobs that are executed using Hadoop.

• HiveQL includes a type system with support for tables.

• HiveQL contains primitive types, collections like arrays and maps, and nested
  compositions of the same.


What is Hive? (cont'd)
• In addition, HiveQL enables users to plug custom map-reduce scripts into
  queries.

• Hive also includes a system catalog, the Metastore, that contains schemas and
  statistics, which are useful in data exploration, query optimization and query
  compilation.

• The Hive implementation at Facebook contains tens of thousands of tables
  and stores over 700TB of data. It is used extensively for both reporting
  and ad-hoc analyses by more than 200 users per month.


Benefits of Hive
• Hive is much easier to learn and use, as it is based on standard SQL.

• Writing HiveQL queries is much faster than writing the equivalent Java code.

• Many people already know SQL.
  – They can rapidly start using Hive to query and manipulate data in the cluster.


Getting Data Into Hive
• To get data into Hive, first access Hive through the Hive shell.

• Invoke the Hive shell from the command line by typing

$ hive

• The shell responds with its own command prompt

hive>

• A file containing HiveQL code can be executed using the -f option

$ hive -f myfile.hql


Getting Data Into Hive (cont'd)
• HiveQL can be used directly from the command line using the -e option

$ hive -e 'SELECT * FROM users;'

• As with most similar shells, HiveQL queries must be terminated with a
  semicolon (;).


The Hive Architecture



The Hive Architecture (cont'd)
• Metastore: stores the system catalog.

• Driver: manages the life cycle of a HiveQL query as it moves through Hive;
  also manages the session handle and session statistics.

• Query compiler: compiles HiveQL into a directed acyclic graph of map/reduce
  tasks.

• Execution engine: executes the tasks in proper dependency
  order; interacts with Hadoop.

• HiveServer: provides a Thrift interface and JDBC/ODBC for integrating other
  applications.

• Client components: include the CLI, web interface and JDBC/ODBC interface.
  Extensibility interfaces include SerDes, User-Defined Functions and User-
  Defined Aggregate Functions.


Creating a Table in Hive
• Example table definition:

CREATE TABLE movies (id INT, name STRING, year INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

• ROW FORMAT DELIMITED
  – Tells Hive to expect one record per line.

• FIELDS TERMINATED BY . . .
  – Columns will be separated by the specified character.
  – The default separator is 'Ctrl-A'.


Hive's Column Types
• Hive supports a number of different column types.
• Each type maps to a native data type in Java.

Type        Description
TINYINT     1 byte
SMALLINT    2 bytes
INT         4 bytes
BIGINT      8 bytes
FLOAT       Single precision
DOUBLE      Double precision
STRING      Sequence of characters
BOOLEAN     True/false
TIMESTAMP   YYYY-MM-DD HH:MM:SS.fffffffff


Hive's Column Types (cont'd)
• Hive has no binary column types like the BLOB data type available in RDBMSs.

• Hive is designed to deal with textual data.

• Complex column types:
  – Maps: key/value pairs
  – Arrays: lists of elements
  – Structs: user-defined structures


How Hive Stores Data
• By default, tables are stored in Hive's warehouse directory in HDFS.
  – The contents of the files in this directory are considered to be the contents
    of the table.

hive> CREATE TABLE employees (name STRING, age INT);

Now check that this table is stored as a directory in Hive's warehouse in
HDFS:

$ hadoop fs -ls /user/hive/warehouse/employees
-rw-r--r-- . . . /user/hive/warehouse/employees/data1.txt


How Hive Stores Data (cont'd)
Accessing the table using the Hive shell:
hive> SELECT * FROM employees;
OK
Steve 32
John 28
Brian 35

Check the contents of the table in HDFS using the cat command:
$ hadoop fs -cat /user/hive/warehouse/employees/data1.txt
Steve 32
John 28
Brian 35


How Hive Interprets Data
• Hive layers a table definition onto a directory.
  – It doesn't verify the data when it is loaded, but rather when a
    query is issued. This is called schema on read.

• The table definition describes the layout of the data files.
  – Typically they are delimited files (using commas, tabs, or other characters).
  – Hive's default delimiter is the Control-A character.
  – This does not have to be the case if you use a custom
    Serializer/Deserializer.

• This table definition is saved in Hive's Metastore.


Hive's Metastore
• The Hive Metastore is held in a set of tables stored in an RDBMS.
  – Typically either Derby (the default) or MySQL.

• The data held in the Metastore includes:
  – Table definitions: table name, column names, column data types, etc.
  – Information on where the table data is stored in HDFS.
  – Row format of files in the table.
  – Storage format of the files in the table.
    – Determines which Input Format and Output Format the MapReduce jobs
      will use.


Submitting a Hive Query
• When a HiveQL query is submitted, the Hive interpreter on the client machine
  converts it into one or more MapReduce jobs.

• The MapReduce jobs are then submitted to the cluster by the interpreter.
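
To see what the interpreter will do before anything runs, a query can be
prefixed with EXPLAIN, which prints the plan instead of executing it. A
minimal sketch, reusing the movies table defined earlier:

hive> EXPLAIN SELECT year, count(*) FROM movies GROUP BY year;

The output lists the stages (the MapReduce jobs) the query compiles into,
which is a quick way to see how much cluster work a statement implies.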


External Table
• It is possible to create a table which points to an existing directory rather than
  having the directory automatically created under /user/hive/warehouse.

• This is known as an external table.

CREATE EXTERNAL TABLE movies (id INT, name STRING, year INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/ian/movieFileDirectory';


External Table (cont'd)
• External tables
  – Create a table and point it to a specified directory in HDFS.
  – Hive does not use a default location for this table.
  – Very useful for existing data.
  – DROPping an external table only removes metadata.


Table operations
• VIEW table definitions
  – DESCRIBE t (to see the columns of the table).
  – DESCRIBE EXTENDED t (to see a detailed description of the table).

• DELETE the table
  – DROP TABLE t (to delete the table).

• ALTER the structure of existing tables
  – Add and remove partitions.
  – Rename a table / modify its properties.
  – Modify columns.
  – Change a table's location.


Table operations (cont'd)
• Examples
  – ALTER TABLE t ADD PARTITION (part_col='val')
  – ALTER TABLE t DROP PARTITION (part_col='val')
  – ALTER TABLE t RENAME TO x
  – ALTER TABLE t CHANGE old_name new_name new_type
  – ALTER TABLE t ADD COLUMNS (col_name type, ...)
  – ALTER TABLE t SET LOCATION new_location (changes the location of an
    external table)


Loading data into Hive
• Loading data into a Hive table involves moving the data files into the Hive
  table's warehouse directory in HDFS.

• This can be done in two ways:
  – Directly, using the hadoop fs command:

$ hadoop fs -mv /depts/finance/salesdata /user/hive/warehouse/sales/

  – Alternatively, use Hive's LOAD DATA INPATH command:

hive> LOAD DATA INPATH '/depts/finance/salesdata'
      INTO TABLE sales;


Loading Data From Files (cont'd)
• Add the LOCAL keyword to load data from the local disk.
  – Copies a local file (or directory) to the table's directory in HDFS.

hive> LOAD DATA LOCAL INPATH '/home/bob/salesdata'
      INTO TABLE sales;

• Or, alternatively, use the hadoop fs -put command:

$ hadoop fs -put /home/bob/salesdata /user/hive/warehouse/sales


Loading Data From Files (cont'd)
• OVERWRITE
  – Direct updates in Hive tables are traditionally not possible; instead, load the
    new file containing the updated records using the OVERWRITE keyword:

hive> LOAD DATA INPATH '/depts/finance/salesdata'
      OVERWRITE INTO TABLE sales;


Loading Data from a File into an External Table
• Example
  – Define the external table and point it to a directory within HDFS.

CREATE EXTERNAL TABLE p_view_stagq (hitTime INT, userid STRING,
  page_url STRING, comment STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/data/staging/p_views';

  – Copy a local file into the specified HDFS directory.

$ hadoop fs -put /tmp/pv_data.txt /user/data/staging/p_views


Hive Complex types



Complex types
• Hive supports 3 complex types:
  – Array
  – Map
  – Struct


Array type
• An array is an ordered collection of elements of the same type.
• Each value in the array has to be separated by a delimiter.
• Elements are extracted using a 0-based index.
• Syntax:
CREATE TABLE <tablename>(
  column_definitions,
  columnname ARRAY<DATA TYPE>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'symbol'
COLLECTION ITEMS TERMINATED BY 'symbol'
STORED AS file_type;


Array type example
CREATE TABLE array_tab(
id INT,
name STRING,
phone ARRAY<STRING>,
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
COLLECTION ITEMS TERMINATED BY ‘|’
STORED AS TEXTFILE;

Sample Text file


1,ramakumar,8544283421|8877727733
2,rajesh,3333333222|1122334455
3,latha,8822112233|9988776655

Copyright © 2016 Tech Mahindra. All rights reserved. 31


Data retrieval : Array type
• Data can be retrieved using the syntax below:

SELECT <columns>,array_column_name[index] FROM tablename;

• The index starts from 0.

• Example:
SELECT id,name,phone FROM array_tab;
SELECT id,name,phone[1] FROM array_tab;


Map type
• A map is a collection of (key,value) pairs.
• A value can be retrieved using its key.
• Syntax
CREATE TABLE <table_name>(
  column_definitions,
  column_name MAP<key_data_type,value_data_type>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'symbol'
COLLECTION ITEMS TERMINATED BY 'symbol'
MAP KEYS TERMINATED BY 'symbol'
STORED AS file_type;


MAP Example
CREATE TABLE rewards(
  id INT,
  name STRING,
  award MAP<STRING,INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '#'
STORED AS TEXTFILE;

Sample text file

1,rajesh,AOM#2002|POB#2002
2,radha,POB#2001|BRAVO#2009
3,rakesh,ACE#2013


Data retrieval : MAP type
• Data can be retrieved using the syntax below:

SELECT <columns>,map_column_name["key"] FROM tablename;

• The value is looked up by its key.

• Example:
SELECT * FROM rewards;
SELECT id,name,award["AOM"] FROM rewards;


STRUCT type
• A struct is a collection of 2 or more named elements placed in one column.
• Each value in the struct is delimited.
• Syntax
CREATE TABLE <table_name>(
  column_definitions,
  column_name STRUCT<elementname:data_type,
                     elementname:data_type>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'symbol'
COLLECTION ITEMS TERMINATED BY 'symbol'
STORED AS file_type;


Example Struct
CREATE TABLE employee(
  id INT,
  name STRUCT<fname:STRING,mname:STRING,lname:STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE;

Sample text file

1,rama|kumar|prabhala
2,rajesh||maddali


Data retrieval : Struct type
• Data is retrieved using the . (dot) operator.

• Examples
SELECT * FROM employee;
SELECT id,name.fname FROM employee;


DML Operations in Hive



Hive DML Operations
• Understanding ACID and transactions in Hive
  – Hive supports transactional operations on Hive tables, just like traditional
    RDBMS systems.
• Hive supports the DML operations
  – Insert
  – Update
  – Delete


Configuration for Hive DML
• Configure the options below in the Hive configuration file and restart the Hive
  server, or issue the commands below from the hive prompt.

set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=2;


Create a table with ACID Properties
• Updates are allowed only on tables that are created with the transactional
  property along with the bucketing feature.

CREATE TABLE <table_name>(
  column_list
)
CLUSTERED BY (column_name)
INTO <number> BUCKETS
STORED AS ORCFILE
TBLPROPERTIES('transactional'='true');
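
As a concrete sketch of the template above, the employee table used in the
INSERT examples on the next slide could be declared transactional like this
(the column names are assumed from those examples):

CREATE TABLE employee(
  id INT,
  name STRING,
  dob STRING,
  gender STRING
)
CLUSTERED BY (id)
INTO 2 BUCKETS
STORED AS ORCFILE
TBLPROPERTIES('transactional'='true');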


Insert Operation on Hive Table
• Inserting a single record into a Hive table
  – INSERT INTO <table_name> VALUES(…)
• Example
  – INSERT INTO employee VALUES(1,'RAMA','10-JUN-1996','MALE')
• Inserting multiple records into a Hive table
  – INSERT INTO <table_name> VALUES (…),(…),…
• Example
  – INSERT INTO employee VALUES
    (1,'RAMA','10-JUN-1996','MALE'),
    (2,'RADHA','12-MAY-1992','FEMALE')


Update operation on Hive Table
• UPDATE <table_name> SET colname=value[,…] [WHERE <condition>];
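
For example, against the transactional employee table sketched earlier (the
new value is illustrative):

UPDATE employee SET name='RAMA KUMAR' WHERE id=1;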


Delete operation on Hive Table
• DELETE FROM <table_name> [WHERE <condition>];
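
For example, again using the illustrative employee table:

DELETE FROM employee WHERE id=2;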


Sqoop – Import Data
• A frequent requirement is to load data from a table in an existing RDBMS.

• Sqoop, an open-source tool from Cloudera, helps to automate this process.

(Diagram: Sqoop moves data from MySQL into Hadoop; via auto-generated
datatype definitions, custom MapReduce programs can then reinterpret the
data.)


Sqoop Concepts
• The Sqoop program launches a map-only MapReduce job to import the data.

• Makes multiple simultaneous connections to the RDBMS.
  – Default is four.
  – Each connection imports a portion of the table.

• Sqoop uses JDBC to connect to RDBMSs.

• Inspects the table and creates a mapping from the database column types to
  the corresponding Java data types.

• Can be configured to automatically create a Hive table from the imported
  data.


Sqoop Custom Connectors
• Connectors allow data transfer at much higher speeds than the standard
  JDBC interface.

• These connectors are not open-source but are provided free of charge by
  their respective OEMs (Original Equipment Manufacturers), such as Netezza,
  Teradata and Oracle (via Quest Software).


Sqoop Syntax
• Basic Sqoop syntax to import table t1 from database db, store it as
  tab-separated files, and create a Hive table:

sqoop import \
  --username user \
  --password pass \
  --connect jdbc:mysql://dbserver.example.com/db \
  --hive-import \
  --fields-terminated-by '\t' \
  --table t1


Storing Query Results in HDFS
• Writing output into the local filesystem or HDFS
  – INSERT OVERWRITE DIRECTORY

• Writing output to an existing Hive table
  – INSERT OVERWRITE
  – INSERT INTO


Writing Output to an Existing Hive Table
• INSERT OVERWRITE
  – Copies data from one Hive table into another table.
  – Overwrites the contents of the second table.

INSERT OVERWRITE TABLE t2 SELECT * FROM t;

• INSERT INTO
  – Like INSERT OVERWRITE, it copies data from one table into another.
  – However, it appends to (rather than overwrites) the contents of the second
    table.

• INSERT INTO is only supported in Hive versions 0.8 and later.

INSERT INTO TABLE t2 SELECT * FROM t;


Writing Output - Local file system or HDFS
• INSERT OVERWRITE DIRECTORY
  – Writes query results into HDFS.
  – Can specify LOCAL to write to the local file system.
  – Data written to the file system is serialized as text.
  – Columns are separated by Ctrl-A characters.
  – Rows are separated by newlines.
  – Appropriate for extracting large amounts of data from Hive.

• Example
  – Write out the entire contents of table 't' to the local file system.

INSERT OVERWRITE LOCAL DIRECTORY '/path/file.dat'
SELECT * FROM t;


Multiple Databases in Hive
• Earlier versions of Hive placed all tables in the same namespace.

• This is still the default.

• Recent versions support the creation of multiple databases.
  – Useful on clusters with multiple users.

• The default database is named default.
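
A minimal sketch of creating and switching to a database (the database and
table names here are hypothetical):

hive> CREATE DATABASE sales_db;
hive> SHOW DATABASES;
hive> USE sales_db;
hive> CREATE TABLE regions (id INT, name STRING);

Tables created after USE sales_db live in that database's namespace rather
than in default.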


Hands-On Exercise



Manipulating Data with Hive



Data retrieval in Hive
• An Introduction to HiveQL
• Retrieving Data using the SELECT statement
• Joining Tables
• Basic Hive Functions
• More Advanced HiveQL Queries
• Statistics and Data Mining


An Introduction to HiveQL
• HiveQL is Hive's query language, a dialect of SQL.
  – Has many elements that are in SQL-92.
  – Has some additional elements that are not in SQL-92.
  – Supports JOINs, aggregates, subqueries, etc.
  – Does not support UPDATE or DELETE (except on transactional tables; see
    the DML section above).

• Includes Hive-specific extensions
  – Partitioning
  – Sampling
  – Complex data structures (structs, maps, arrays)
  – User-defined functions
  – Multi-table inserts

Retrieving Data using the SELECT Statement
• Hive uses the SELECT statement to retrieve data.

SELECT expr, expr, … FROM tablename;

• Expressions can be column names, function calls, etc.

• The FROM clause is required.

• Hive keywords are not case-sensitive.


Basic SELECT Syntax
• Tables can be aliased in Hive.
  – However, Hive only accepts the following syntax:

SELECT expr, expr, … FROM tablename alias

  – Useful to get around a bug that won't allow a SELECT from a column that
    has the same name as the table name:

SELECT alias.t FROM t alias

  – Also note that the alternative AS syntax used by some RDBMSs is not
    supported in Hive:

SELECT expr, expr, … FROM tablename AS alias


Limiting the rows returned
• Limit the rows returned from the query with the WHERE clause.

SELECT expr, expr, … FROM tablename WHERE condition

• Conditions can be any Boolean expression. Examples:
  – countryCode = 'USA'
  – id > 100000
  – url LIKE '%.example.com'
  – col1 IS NULL

• Conditions can be combined using AND/OR.
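
Combining two of the example conditions above into one query (the users
table is the same one used earlier in these slides):

SELECT * FROM users
WHERE countryCode = 'USA' AND id > 100000;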


Sorting the Rows returned
• HiveQL supports the ORDER BY expression.

• Can sort in descending order with ORDER BY expr DESC.

SELECT expr, expr, … FROM tablename
WHERE condition
ORDER BY expr;

• Often used with a LIMIT clause.
  – Limits the output to the first n rows.

SELECT expr, expr, … FROM tablename
WHERE condition
ORDER BY expr
LIMIT n;


Sorting the Rows returned (cont'd)
• HiveQL also supports the SORT BY expression.
  – Sorts the rows before feeding them to a reducer.
  – Sort order is dependent upon the column type:
    – i.e. if numeric then numeric order, if string then lexicographical
      (alphabetical) order.

SELECT expr, expr, … FROM tablename
WHERE condition
SORT BY expr;


SORT BY and ORDER BY
• SORT BY
  – May use multiple reducers for final output.
  – Sorts the data per reducer.
  – Only guarantees ordering of rows within reducers.
  – May give partially ordered final results.

• ORDER BY
  – Uses a single reducer to guarantee total order in the output.
  – A single reducer will take long to sort very large outputs.
  – One reducer writes to one file in HDFS; this could fill the disk on a
    particular node.
  – Use the LIMIT clause to minimize sort time.


DISTRIBUTE BY
• DISTRIBUTE BY
  – Distributes the rows among reducers.
  – Does not guarantee clustering or sorting properties.
  – Useful if there is a need to partition and sort the output of the query for
    subsequent queries.


DISTRIBUTE BY (cont'd)
• Example

SELECT pets FROM t1 DISTRIBUTE BY pets;

(Diagram: table t1 holds a mix of Cat, Fish, Dog and Bird rows.
DISTRIBUTE BY pets sends all rows with the same value to the same
reducer — here the Cat and Fish rows go to Reducer 1 and the Dog and
Bird rows to Reducer 2 — but the rows are not sorted within each reducer.)


CLUSTER BY
• CLUSTER BY
  – Combination of DISTRIBUTE BY and SORT BY.
  – Rows with the same keys are distributed to the same reducer.
  – Clustered in adjacent positions and sorted per reducer.


CLUSTER BY (cont'd)
• Example

SELECT pets FROM t1 CLUSTER BY pets;

(Diagram: the same rows from table t1 are distributed as before, but each
reducer also sorts its rows — Reducer 1 emits the Cat rows followed by the
Fish rows, Reducer 2 emits Bird followed by the Dog rows — so identical
keys end up adjacent in the output.)


Grouping the rows returned
• The GROUP BY clause is used to group the retrieved data by one
  or more columns.

SELECT gender, count(DISTINCT userid)
FROM users
GROUP BY gender;

• Also used with multiple aggregations:

SELECT gender, count(DISTINCT userid), count(*)
FROM users
GROUP BY gender;


Hands-On Exercise



Joining Tables



Joining Tables
• A frequent requirement is to join two or more tables together.

• Hive supports
  – Inner joins
  – Left outer joins
  – Right outer joins
  – Full outer joins
  – Left semi joins
    – Return rows from the first table when one or more matches are found
      in the second table (e.g. list the departments in a table that have at
      least one employee).


Default Join Behavior
• Reduce-side join
  – Mappers read the tables and emit the join key and join value.
  – A reducer performs the actual join.

SELECT * FROM A
JOIN B ON
A.Value=B.Value

(Diagram: in the MapReduce task, mappers read tables A and B, the
intermediate records pass through the shuffle and sort phase keyed on the
join key, and a reducer performs the join.)

Join Optimization
• Map-side join
  – Reads the smaller table into an in-memory hash table.
  – Each mapper gets its own copy of the hash table.
  – The join is performed in the mapper.
  – No reducer or sort and shuffle phases are required.

SELECT /*+ MAPJOIN(B) */ * FROM A JOIN B ON
A.Value = B.Value;


Map-side v/s Reduce-side Joins
• There are two fundamental ways to join datasets in MapReduce
  – Map-side join
  – Reduce-side join

• Map-side joins
  – Function similarly to RDBMS joins.
  – Scan the smaller table in memory for each record found in the larger table.
  – The smaller table is shipped with the job, replicated and loaded into memory.
  – The larger table can be filtered in the mapper:
    – i.e. if there is no row match, the row isn't sent across the network to the
      reducer.


Map-side v/s Reduce-side Joins (cont'd)
• Map-side join pros
  – More efficient, but requires one dataset to be small enough to be held in
    memory.
• Reduce-side join cons
  – All data is shipped over the network to the reducers.
  – Wasteful for inner joins where there are few matches between tables.

• Previous versions of Hive needed to be told if they could use a map-side join
  via a 'hint':
  – An extra clause in the HiveQL query.
• Later versions of Hive can work this out for themselves, based on the size of
  the input data.
  – To enable this, set hive.auto.convert.join = true


Joining Tables: Syntax
• Hive uses the following syntax to join tables.

SELECT cols FROM t1 JOIN t2 ON (condition) JOIN
t3 ON (condition)

• JOIN can be replaced with LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL
  OUTER JOIN or LEFT SEMI JOIN.


Join Examples
• Consider two tables:

Customer Table

custid  firstname  surname
1       Jane       Doe
2       Tom        Smith
3       Doug       Jones

Purchases Table

custid  order_date  cost
1       2010-01-15  42.99
2       2010-03-21  19.65
2       2010-04-01  1.00
1       2010-04-18  170.16

Inner Join
• An inner join returns all rows where the condition is satisfied.

SELECT * FROM Customer JOIN Purchases
ON (customer.custid = purchases.custid)

custid  firstname  surname  custid  order_date  cost
1       Jane       Doe      1       2010-01-15  42.99
1       Jane       Doe      1       2010-04-18  170.16
2       Tom        Smith    2       2010-03-21  19.65
2       Tom        Smith    2       2010-04-01  1.00


Outer join
• A left outer join returns all the rows in the first table even if there are no
  matches.

SELECT C.custid, firstname, surname, order_date
FROM Customer C LEFT OUTER JOIN Purchases P
ON (C.custid = P.custid)

custid  firstname  surname  order_date
1       Jane       Doe      2010-01-15
1       Jane       Doe      2010-04-18
2       Tom        Smith    2010-03-21
2       Tom        Smith    2010-04-01
3       Doug       Jones    NULL


Left Semi Joins
• A left semi join is an optimized IN/EXISTS sub-query.
  – Performs a map-side group by to reduce the data flowing to the reducers.
  – Allows for an early exit on the first match in the join.

SELECT A.*
FROM A LEFT SEMI JOIN B
ON (A.KEY = B.KEY AND B.VALUE > 100);

  – Equivalent IN/EXISTS subquery:

SELECT A.*
FROM A WHERE A.KEY IN
(SELECT B.KEY FROM B WHERE B.VALUE > 100);


Identifying Unmatched Records
• Outer joins are useful for finding unmatched records.

• Example: Find customers who have not made purchases.

SELECT C.custid, firstname, surname
FROM Customer C LEFT OUTER JOIN Purchases P
ON (C.custid = P.custid) WHERE P.custid IS NULL;

custid  firstname  surname
3       Doug       Jones


Joining Multiple tables
• Syntax for joining multiple tables

SELECT cols FROM
t1 JOIN t2 ON (condition)
JOIN t3 ON (condition)


Basic Hive Functions



Hive Functions
• Hive provides many built-in functions
  – Mathematical
  – Date
  – Conditional
  – String
  – Aggregate functions

• Hive also supports user-defined functions.


Numeric Functions
• Numeric functions include
  – round()
  – floor()
  – ceil()
  – rand()
  – exp()
  – log()
  – sqrt()
  – abs()
  – sin()
  – cos()
  – etc.


String Functions
• String functions include
  – length()
  – concat()
  – substr()
  – upper()
  – lower()
  – trim()
  – regexp_replace()
  – etc.


Date Functions
• Date functions include
  – unix_timestamp()
  – from_unixtime()
  – to_date()
  – year()
  – month()
  – day()
  – date_add()
  – date_sub()
  – datediff()
  – etc.


Aggregate Functions
• Aggregate functions include
  – sum()
  – avg()
  – min()
  – max()
  – stddev_pop()
  – stddev_samp()
  – percentile()
  – etc.


Multi-table insert
• Hive SELECT queries can take a long time.

• We sometimes need to extract data into multiple tables based on the same
  query.

• Hive provides an extension to SQL to do this:
  – 'Multi-table insert'
  – Only requires the query to be run once.

FROM (SELECT ...) alias
INSERT OVERWRITE TABLE t1 ...
INSERT OVERWRITE TABLE t2 ...
...


Multi-table insert: Example
• Multi-table insert syntax

FROM (SELECT * FROM movies WHERE documentary = 1) docs
INSERT OVERWRITE DIRECTORY 'docs_names'
SELECT movie_name
INSERT OVERWRITE DIRECTORY 'docs_count'
SELECT count(1);


More advanced HiveQL features


Creating tables based on existing data
• Hive supports creating a table based on a SELECT statement.
  – Often known as 'create table as select', or CTAS.

CREATE TABLE newtable AS
SELECT col1, col2 FROM existing_table

• Column definitions are derived from the existing table.

• Column names are inherited from the existing names.
  – Use aliases in the SELECT statement to specify new names.
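
As a concrete sketch using the movies table from earlier (the new table
name and the year filter are illustrative):

CREATE TABLE recent_movies AS
SELECT id, name AS title FROM movies WHERE year >= 2000;

Here the alias renames name to title in the new table, as described above.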


Subqueries
• Hive supports sub-queries in the FROM clause of the SELECT statement.
  – It does not support correlated sub-queries.

SELECT col FROM
(SELECT col_a + col_b AS col FROM t1) t2

• The sub-query must be given a name.
  – t2 in the example above.

• Hive supports arbitrary levels of sub-queries.


Views
• Hive supports views.
  – Syntax

CREATE VIEW v (col_names)
AS SELECT ...

• The column name list is optional.
  – Will use column names from the underlying table(s) if omitted.

• Views are not materialized.
  – Content changes as the underlying tables change.

• Views cannot be modified directly.
  – Cannot be used as the target for e.g. INSERT OVERWRITE.
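
A minimal sketch, reusing the Purchases table from the earlier join examples
(the view name and cost threshold are illustrative):

CREATE VIEW big_purchases AS
SELECT custid, order_date, cost
FROM Purchases
WHERE cost > 40;

SELECT * FROM big_purchases;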


Views (cont'd)
• Views may contain ORDER BY and LIMIT clauses.

• When a view is used, the underlying query is executed to retrieve the view's
  contents.

• To remove a view

DROP VIEW v


UNION
• Hive supports UNIONs
  – Combine the results from multiple SELECT statements into a single result
    set.
  – The number and names of the columns returned by each SELECT statement
    have to be the same.
  – Only UNION ALL is supported:
    – i.e. duplicates are not eliminated.

select_statement UNION ALL select_statement
UNION ALL select_statement ...
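
For instance, pooling the custid values from the two example tables used in
the join section (older Hive versions require the UNION to be wrapped in a
sub-query, as here):

SELECT u.custid FROM (
  SELECT custid FROM Customer
  UNION ALL
  SELECT custid FROM Purchases
) u;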


Statistics and Data Mining



N-gram Frequency Estimation
• What is an n-gram?
  – A contiguous sequence of n items from a given sequence of text.
  – Used to find frequently occurring related items.

• NGRAMS()
  – Finds the k most frequent n-grams from one or more sequences.

• CONTEXT_NGRAMS()
  – Expands ngrams() by specifying a 'context' string around which n-grams
    are to be estimated.
  – Example
    – Return a list of the top 100 words that follow the phrase "I love" in a
      hypothetical database of Twitter tweets.


N-gram Frequency Estimation (cont'd)
SELECT context_ngrams(sentences(lower(tweet)),
array("i","love", null), 100) FROM twitter;

• NGRAMS() example use cases
  – Find important topics in text in conjunction with a stopword list.
  – Find trending topics in text.
  – Find frequently accessed URL sequences.

• CONTEXT_NGRAMS() example use cases
  – Extract marketing intelligence around certain words (e.g., "I love …").
  – Find frequently accessed URL sequences that start or end at a
    particular URL.
  – Pre-compute common search look-aheads.


Partitioning and Bucketing



What is Partitioning?
• Partitioning a dataset means splitting it into smaller portions based on the
  value in a column.

• This is sometimes known as 'horizontal partitioning'.

• Hive partitions are stored in subdirectories of the table directory.


Why Partitioning?
• Partitioning allows Hive to filter at the input path level.
  – Non-matching data is never read.

• Splitting a table into partitions is useful if many queries refer to a specific
  column.

• E.g. a table containing log entries:
  – Much of the analysis looks at individual days.
  – By partitioning the data based on the date column, only a subset of the data
    needs to be scanned while executing the query.
  – Since each partition is stored in a different directory, only the data in the
    relevant directories needs to be read.
  – Any queries which do not specify the date will still work; however, in such
    cases the entire data set will have to be scanned.


Creating a Partitioned table in Hive
• To create a partitioned table, specify a column like this:

CREATE TABLE t (column_spec)
PARTITIONED BY (column_name datatype)
ROW FORMAT ...

• Example

CREATE TABLE logs (uid INT, action STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'


Creating a Partitioned table in Hive (cont'd)
• The partitioned column is displayed if you DESCRIBE the table:

hive> DESCRIBE logs;
OK
uid INT
action STRING
dt STRING

• However, the partition is a 'virtual column'.
  – The data does not exist in your incoming data.
  – Instead, the partition is specified when loading the data.


Loading Data into Partitions
• To load data into a partition in the table, use

LOAD DATA INPATH '/path/to/table'
INTO TABLE t
PARTITION (col=val)

• Example:

LOAD DATA INPATH '/user/ian/mylogs'
INTO TABLE logs
PARTITION (dt='2012-01-15')


Querying a Partitioned Table
• When queries include the partition column in the WHERE clause, Hive only
  needs to process the relevant subdirectories, not the entire data set.

• The filtering is done by the Hive interpreter, using data from Hive's Metastore.
  – Supports =, !=, <, <=, >, >=, and LIKE for strings.
  – Conditions can be chained together with AND and OR.
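
For example, with the logs table and the partition loaded on the previous
slides, this query reads only the dt='2012-01-15' subdirectory:

SELECT uid, action
FROM logs
WHERE dt = '2012-01-15';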


Dynamic Partition Inserts
• If your data already exists in a table, recent versions of Hive can dynamically
  insert the data into specific partitions for you.

• Syntax:

FROM employees
INSERT OVERWRITE TABLE emps_by_dept PARTITION(dept)
SELECT first_name, last_name, address, phone, dept;

• Partitions are automatically created based on the value of the last column.
  – If a partition does not already exist, it will be created.
  – If a partition does exist, it will be overwritten.


Dynamic Partition Inserts (cont'd)
• Three Hive configuration properties exist to limit this:

• hive.exec.max.dynamic.partitions.pernode
  – Maximum number of dynamic partitions that can be created by any given
    Mapper or Reducer.
  – Default 100.

• hive.exec.max.dynamic.partitions
  – Total number of dynamic partitions that can be created by one
    HiveQL statement.
  – Default 1000.

• hive.exec.max.created.files
  – Maximum total files created by Mappers and Reducers.
  – Default 100000.
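
These can be raised for a session whose insert legitimately creates many
partitions; the values below are arbitrary examples:

set hive.exec.max.dynamic.partitions.pernode=500;
set hive.exec.max.dynamic.partitions=5000;
set hive.exec.max.created.files=200000;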


Sub-Partitions
• A table can have sub-partitions.

• Example
CREATE TABLE t (col_spec)
PARTITIONED BY (dt STRING, country STRING)

• The second partition will be a subdirectory of the first partition's directory.
  – E.g. create a partition for year, a sub-partition for month and another
    sub-partition for date.

• Multiple partitions can be used during a multi-partition insert.
  – The dynamically defined partition must be the last partition in the list.


Dropping/Adding Partitions
• Hive's ALTER TABLE statement provides the ability to drop or add a partition.

• Adding multiple partitions example

ALTER TABLE page_view ADD PARTITION (dt='2008-08-08',
country='us') location '/path/to/us/part080808';

ALTER TABLE page_view ADD PARTITION (dt='2008-08-09',
country='us') location '/path/to/us/part080809';

• Removing partition example

ALTER TABLE page_view DROP PARTITION
(dt='2008-08-08', country='us');


What is Bucketing?
• Bucketing data is similar to partitioning data.

• Data is split into buckets based on the hash value of a column of the
  incoming data.

• Intended to produce an even distribution of rows across n buckets.

• Useful for jobs which need to 'sample' data from the table.
  – Jobs which just work on a 'random' portion of the data.


Creating a bucketed table
• Syntax to create a bucketed table

CREATE TABLE t (col_spec)
CLUSTERED BY (col) INTO n BUCKETS
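
A concrete sketch of the syntax (the table and column names are
illustrative):

CREATE TABLE users_bucketed (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;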


Inserting data into the bucketed table
• To insert data into the bucketed table, first insert it into a 'helper' table (with
  no bucketing) and then issue the following commands:

hive> SET mapred.reduce.tasks=<number-of-buckets>;
hive> SET hive.enforce.bucketing=true;
hive> INSERT OVERWRITE TABLE target_bucketed_table
      SELECT * FROM helper_table;


Inserting data into the bucketed table (cont'd)
• Hive implements bucketing by running a MapReduce job on the data in the
  helper table.


Sampling data from a bucketed table
• To sample from the bucketed table, use the following syntax:

SELECT * FROM bucketed_table
TABLESAMPLE (BUCKET 1 OUT OF 4 ON col)

• If the table was bucketed into 4 buckets on col, this would choose just the
  rows in bucket 1.

• If the table was bucketed into 16 buckets on col, this would choose the
  rows in buckets 1, 5, 9 and 13.


Hands-On Exercise



Advanced Hive Features



Advanced Hive Features
• Hive Variables
• The Hive CLI
• Hive and Thrift
• TRANSFORM
• Creating User-Defined Functions and SerDes
• Debugging and Troubleshooting Hive Queries


Hive Variables
• The Hive variable substitution mechanism was designed to avoid some of the
  code that was getting baked into the scripting languages on top of Hive.

• Variables can be set in two ways:
  – Through the Hive shell
  – By invoking Hive from the command line

• Within the Hive shell:

hive> SET var=val;

• To use the variable's value in a HiveQL query:

hive> SELECT * FROM tbl WHERE col=${hiveconf:var};


Hive Variables (cont'd)
• Variables can also be set when invoking Hive from the command line:

$ hive -hiveconf var=val

• This makes repetitive operations easier without modifying any HiveQL. For
  example, imagine that region.hql contains the query below; it finds all
  employees in the north region when run with the command that follows.

SELECT * FROM employees WHERE region =
'${hiveconf:REGION}';

$ hive -hiveconf REGION=north -f region.hql


Hive Command Line Interface
• Usage: hive [-hiveconf x=y]* [<-i filename>]* [<-f filename>|<-e query-string>]

• -i <filename>              Initialization SQL from a file (executed automatically
                             and silently before any other commands)
• -e 'quoted query string'   SQL from the command line
• -f <filename>              SQL from a file
• -v                         Verbose mode (echo executed SQL to the console)
• -p <port>                  Connect to the Hive Server on the given port number
• -hiveconf x=y              Use this to set Hive/Hadoop configuration variables

• -e and -f cannot be specified together. In the absence of these options, the
  shell runs interactively.
• -i can be used to execute multiple init scripts.


What is Thrift?
• Thrift is a software framework.
  – Allows for scalable cross-language services development.
  – A complete stack for creating clients and servers.
  – Builds services that work seamlessly between languages.
  – It is used as a remote procedure call (RPC) framework and was developed
    at Facebook for scalable cross-language services development.

• Allows use of other languages with Hive.
  – E.g., C++, Java, Python, PHP, Ruby, Perl, C#, JavaScript, etc.


Using TRANSFORM to Process Data Using External Scripts
• Hive allows you to transform data through external scripts or programs. These
  can be written in nearly any language.

• This is done with HiveQL's TRANSFORM ... USING construct.
  – One or more fields are supplied as arguments to TRANSFORM().
  – The external script is identified by USING.
  – It receives each record, processes it, and returns the result.

• Use ADD FILE to distribute the script to the slave nodes in the cluster.

hive> ADD FILE myscript.pl;
hive> SELECT TRANSFORM(*) USING 'myscript.pl'
      FROM employees;


Hive TRANSFORM Example
• Let's look at a complete example of using TRANSFORM in Hive.
  – The Perl script parses an e-mail address, determines to which country it
    corresponds, and then returns an appropriate greeting.
  – Here's a sample of the input data:

hive> SELECT name, email_address FROM employees;
Antoine antoine@example.fr
Kai kai@example.de
Pedro pedro@example.mx


Hive TRANSFORM Example (cont'd)
• Here's the corresponding HiveQL code:

hive> ADD FILE greeting.pl;
hive> SELECT TRANSFORM(name, email_address)
      USING 'greeting.pl' AS greeting FROM employees;


Hive TRANSFORM Example (cont'd)
• The Perl script for this example is shown below.
  – A complete explanation of this script follows on the next few slides.

#!/usr/bin/env perl
# Map e-mail country suffixes to greetings
my %greetings = ('de' => 'Hallo',
                 'fr' => 'Bonjour',
                 'mx' => 'Hola');
while (<STDIN>)
{
    chomp;
    # Hive feeds each record as tab-separated fields on stdin
    my ($name, $email) = split /\t/;
    # Extract the country suffix from the e-mail address
    my ($suffix) = $email =~ /\.([a-z]+)/;
    my $greeting = $greetings{$suffix};
    $greeting = 'Hello' unless defined($greeting);
    print "$greeting $name\n";
}


Hive TRANSFORM Example (cont'd)
• Finally, let's see the result of the transformation:

hive> ADD FILE greeting.pl;
hive> SELECT TRANSFORM(name, email_address)
      USING 'greeting.pl' AS greeting
      FROM employees;

Bonjour Antoine
Hallo Kai
Hola Pedro


Transform/MapReduce Typing
• Hive also supports the MAP and REDUCE keywords for external processing.
  – Allows you to embed custom mapper and reducer scripts.
  – The input and output formats are the same as with TRANSFORM.
  – Example usage:

FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
REDUCE map_output.dt, map_output.uid
USING 'reduce_script'
AS date, count;

Creating User-Defined Functions (UDFs)
• User-defined functions (UDFs) can be used in HiveQL queries in the same
  way as standard functions.

• UDFs are written in Java.
  – Create a class which extends the UDF class.
  – The class should contain a method named evaluate.

• Hive supports three forms of UDF:
  – Standard UDFs
  – User-Defined Aggregate Functions (UDAFs)
    – Take multiple input values, one per row, and return a single value.
  – User-Defined Table Functions (UDTFs)
    – Take a single row as input and return multiple rows as output.


Creating Custom UDFs
• First, create a new class that extends UDF, with one or more methods
  named evaluate.

• For example:

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
    // Called once per row; a null input yields a null output
    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        return new Text(s.toString().toLowerCase());
    }
}

Deploying the Jar
• Compile the code to a jar and add it to the Hive classpath. The jar will then
  be on the classpath for all jobs initiated from that session.

hive> add jar my_jar.jar;
hive> list jars;

• The final step is to register the function.

hive> create temporary function my_lower as
      'com.example.hive.udf.Lower';


Calling UDFs in Hive
• The UDF can now be used in HiveQL:

hive> SELECT my_lower(title), sum(freq) FROM titles
      GROUP BY my_lower(title);
...
Ended Job = job_200906231019_0006
OK
cmo 13.0
vp 7.0


Debugging Hive Queries
• Troubleshooting Hive queries can be difficult.
  – The Hive interpreter turns the query into a set of MapReduce jobs.
  – Those jobs are run on the cluster.
  – It can be difficult to determine what part of the query caused the problem.

• Hadoop's JobTracker Web UI displays the progress of MapReduce jobs on
  the cluster.
  – Accessed at http://job_tracker_address:50030


Thank You

Disclaimer
Tech Mahindra Limited, herein referred to as TechM provide a wide array of presentations and reports, with the contributions of
various professionals. These presentations and reports are for informational purposes and private circulation only and do not
constitute an offer to buy or sell any securities mentioned therein. They do not purport to be a complete description of the markets
conditions or developments referred to in the material. While utmost care has been taken in preparing the above, we claim no
responsibility for their accuracy. We shall not be liable for any direct or indirect losses arising from the use thereof and the viewers are
requested to use the information contained herein at their own risk. These presentations and reports should not be reproduced,
re-circulated, published in any media, website or otherwise, in any form or manner, in part or as a whole, without the express consent in
writing of TechM or its subsidiaries. Any unauthorized use, disclosure or public dissemination of information contained herein is
prohibited. Unless specifically noted, TechM is not responsible for the content of these presentations and/or the opinions of the
presenters. Individual situations and local practices and standards may vary, so viewers and others utilizing information contained
within a presentation are free to adopt differing standards and approaches as they see fit. You may not repackage or sell the
presentation. Products and names mentioned in materials or presentations are the property of their respective owners and the
mention of them does not constitute an endorsement by TechM. Information contained in a presentation hosted or promoted by
TechM is provided “as is” without warranty of any kind, either expressed or implied, including any warranty of merchantability or
fitness for a particular purpose. TechM assumes no liability or responsibility for the contents of a presentation or the opinions
expressed by the presenters. All expressions of opinion are subject to change without notice.

