19ECS442
BIG DATA
 UNIT IV
                      Introducing Hive
•     Hive provides an SQL-like interface called HiveQL, or the Hive
    Query Language.
•    This interface translates a given query into MapReduce jobs.
•   Hive can be seen as a mechanism through which one can access the
    data stored in the HDFS.
•   HiveQL enables users to perform tasks using the MapReduce
    concept but without explicitly writing the code in terms of the map
    and reduce functions.
•   The data stored in HDFS can be accessed through HiveQL, which
    contains the features of SQL but runs on the MapReduce
    framework.
• It should be noted that Hive is not a complete database and is
  not meant to be used in Online Transactional Processing
  Systems, such as online ticketing, bank transactions, etc.
• It is mostly used in data warehousing kind of applications,
  where you need to perform batch processing on a huge
  amount of data.
• Typical examples of this kind of data include Web logs, call
  data records, weather data, etc.
    The architecture of Hive consists of the following components:
• User Interface (UI)—Allows you to submit queries to the Hive
  system for execution.
• Driver—Receives the submitted queries. This driver component
  creates a session handle for the submitted query and then sends the
  query to the compiler to generate an execution plan.
• Compiler—Parses the query, performs semantic analysis on
  different query blocks and query expressions, and generates an
  execution plan.
• Metastore—Stores all the information related to the structure of the
  various tables and partitions in the data warehouse.
• It also includes column and column type information and the serializers
  and deserializers necessary to read and write data.
• It also contains information about the corresponding HDFS files where
  your data is stored.
• Execution Engine—Executes the execution plan created by the
  compiler. The plan is in the form of a Directed Acyclic Graph (DAG) to
  be executed in various stages.
• This engine manages the dependencies between the different stages of a
  plan and is also responsible for executing these stages on the appropriate
  system components.
               Getting Started with Hive
Before using Hive, we need to install it and set the HIVE_HOME
    environment variable to use Hive on Hadoop.
 The steps for installing Hive on Hadoop are as follows:
Installing Hive with embedded metastore:
 1. Download the latest version of Hive.
2. Untar the package:
      $ tar -xzvf apache-hive-0.13.1-bin.tar.gz
3. Add the following to ~/.bash_profile:
      $ sudo nano ~/.bash_profile
      export HIVE_HOME=/home/hduser/hive-0.13.1
      export PATH=$PATH:$HIVE_HOME/bin
   Here, hduser is the user name and hive-0.13.1 is the Hive directory
   extracted from the tar file.
4. Install an SQL database like MySQL on the same machine where you
    want to run Hive.
5. Add the following entries to hive-site.xml:
6. Run Hive from terminal:
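The entries for step 5 and the command for step 6 are not reproduced in these notes; the following is a minimal sketch, assuming a MySQL metastore database named metastore_db and the credentials hiveuser/hivepass (all hypothetical names):

```xml
<!-- hive-site.xml: point Hive's metastore at MySQL (names are assumptions) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>
```

After that, Hive is started from the terminal simply with `$ hive`.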
Hive commands to be used in the shell interactive mode:
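The examples described in the next sentence are not shown in the notes; they can be sketched as follows (mytable is a hypothetical table):

```sql
hive> set;                          -- prints the list of configuration variables
hive> SELECT my.* FROM mytable my;  -- displays the columns via the alias my.
hive> dfs -ls /user/hive;           -- runs the dfs command to list a directory
```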
In the preceding examples, the set command prints a list of configuration
variables, the select query displays the columns beginning with my. from
the mytable table, and dfs -ls executes the dfs command to list the
directory content.
                      4.2 Hive Services
Hive services are as follows:
• CLI—The command line interface of Hive; the shell window that we
  get after the installation of Hive.
• It is the inbuilt default service of Hive.
• Hive server—Runs Hive as an integrated server to expose a Thrift
  service, which enables access for clients written in different
  languages.
• Applications that use JDBC or ODBC connectors need a running
  Hive server to communicate with Hive.
•    Hive Web Interface (HWI)—The Hive Web Interface is the GUI
    of Hive on which we can execute the queries. It is an alternative to
    the shell.
HWI uses the following command:
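In the older Hive releases these notes describe (0.x), HWI is launched as a Hive service:

```shell
$ hive --service hwi
```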
• JAR—Analogous to the hadoop jar command; a convenient way to
  run Java applications that need both the Hadoop and the Hive
  classes on the classpath.
• You do not need to build a special JAR for such analysis; Hive
  handles it with a simple command.
• Metastore—The service that runs along with the Hive service
  whenever Hive starts. This is the default (embedded) mode.
• Using the metastore service, it is also possible to run the metastore
  as a standalone (remote) process.
• For this, you just need to set the METASTORE_PORT environment
  variable so that the server listens on the specified port.
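A sketch of starting a standalone metastore; 9083 is the conventional metastore port, but treat the value as an assumption for your setup:

```shell
$ export METASTORE_PORT=9083
$ hive --service metastore
```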
• Hive client—When you run Hive as a server (hiveserver), there are
  many different mechanisms through which applications can
  communicate with it, such as the Thrift client, the JDBC driver,
  and the ODBC driver.
                         Hive Variables
• Hive allows you to set variables that can be referred in the Hive
  script.
• For this purpose, you need to use the -d or --define option, as
  shown in the following commands:
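The commands referred to are not reproduced in the notes; a sketch, using the hypothetical names sample_database and sample_table:

```sql
$ hive -d db=sample_database -d tbl=sample_table

hive> CREATE DATABASE IF NOT EXISTS ${db};
hive> USE ${db};
hive> CREATE TABLE ${tbl} (id INT, name STRING);  -- ${tbl} is substituted
```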
In the preceding commands, a table named sample_table is created in
    the database sample_database. By default, the variable substitution
    option is enabled.
Hive Properties
•  The hive-site.xml file stores the configuration properties of Hive.
  These properties can be overwritten by the developers.
• To overwrite the properties of the hive-site.xml file, the set
  command is used.
• For example, the following command sets the path of the current
    directory to /tmp/mydir for all subsequent commands:
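The command itself is not shown in the notes; assuming the property meant is Hive's scratch-directory setting, a sketch:

```sql
hive> set hive.exec.scratchdir=/tmp/mydir;
```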
                        Hive Queries
• Hive allows you to simultaneously execute one or more queries.
  These queries can be stored in and executed from files.
• The extension of a query file in Hive is .hql or .q. Let’s take an
  example of a Hive query file, named ourquery.hql, stored in the
  …/home/weusers/queries/ folder.
• Now, type the following commands to execute the query stored in
  the ourquery.hql file:
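The command is not shown in the notes; using Hive's -f option, and assuming the folder above resolves to /home/weusers/queries, a sketch:

```shell
$ hive -f /home/weusers/queries/ourquery.hql
```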
                 4.3 Data Types in Hive
• Hive supports two kinds of data types: primitive types and
  complex types.
• Primitive data types are built-in data types, which also act as basic
  structures for building more sophisticated data types.
• Primitive data types are associated with columns of a table.
The following types of complex data types are available in Hive:
• Structs—The elements within this type can be accessed by using
  the Dot (.) operator.
• Maps—The elements in a map are accessed by using the ['element
  name'] notation.
• Arrays—The elements in an array have to be of the same data type.
  Elements can be accessed by using the [n] notation where n
  represents the index of the array.
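A sketch of a table that uses the three complex types and the access notations described above (table and column names are hypothetical):

```sql
CREATE TABLE complex_demo (
  addr   STRUCT<city:STRING, zip:INT>,  -- struct
  scores MAP<STRING, INT>,              -- map
  phones ARRAY<STRING>                  -- array
);

SELECT addr.city,       -- Dot (.) operator for structs
       scores['math'],  -- ['element name'] notation for maps
       phones[0]        -- [n] notation for arrays
FROM complex_demo;
```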
Built-in Functions
• You must already be aware of the concept of functions and the role
  they play in the development of an application.
•    A function is a group of commands used to perform a particular
    task in a program and return an outcome.
• Like every programming language, Hive also has its set of built-in
  functions (also known as pre-defined functions).
The following lists some of the built-in functions available in Hive:
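The original table of functions is not reproduced here; a few well-known Hive built-ins (aggregate functions such as count, sum, and avg also exist), as a sketch:

```sql
SELECT round(3.7),             -- rounds to the nearest integer
       floor(3.7),             -- largest integer not greater than the argument
       concat('big', 'data'),  -- string concatenation
       upper('hive'),          -- converts to uppercase
       substr('hadoop', 1, 3); -- substring starting at position 1, length 3
```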
                     4.4 Hive-DDL
• Data Definition Language (DDL) is used to describe data and data
  structures of a database.
• Hive has its own DDL, similar to SQL DDL, which is used for
  creating, altering, and dropping databases, tables, and other
  objects in a database.
• Similar to other SQL databases, Hive databases also act as
  namespaces for tables.
• Some of the main commands used in DDL are as follows:
• In the preceding command, the name of the database is added before
  the name of the table.
• Therefore, temp_table gets added in the temp_database.
• In addition, you can also create a table in the database by using the
   following commands:
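Those commands can be sketched as follows (the column list is an assumption):

```sql
USE temp_database;
CREATE TABLE temp_table (id INT, name STRING);
```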
• In the preceding commands, the USE statement is used for
  setting the current database to execute all the subsequent
  HiveQL statements. In this way, you do not need to add the name
  of the database before the table name.
• The table temp_table is created in the database temp_database.
• Furthermore, you can specify DBPROPERTIES in the form of
  key-value pairs in the following manner:
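A sketch of the WITH DBPROPERTIES syntax; the key-value pairs here are hypothetical:

```sql
CREATE DATABASE temp_database
WITH DBPROPERTIES ('creator' = 'hduser', 'created' = '2015-01-01');

DESCRIBE DATABASE EXTENDED temp_database;  -- displays the properties
```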
Viewing a Database
• You can view all the databases present in a particular path by using
  the following command:
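The command is SHOW DATABASES; a sketch:

```sql
SHOW DATABASES;
SHOW DATABASES LIKE 'temp*';  -- filter by a name pattern
```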
• Dropping a Database
• Dropping a database means deleting it from its storage location
• The database can be deleted by using the following command:
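A sketch of the DROP DATABASE command:

```sql
DROP DATABASE IF EXISTS temp_database;
-- CASCADE also drops any tables the database still contains:
DROP DATABASE IF EXISTS temp_database CASCADE;
```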
                           Creating Tables
•    You can create a table in a database by using the CREATE
  command, as discussed earlier.
• Now, let’s learn how to provide the complete definition of a table
  in a database by using the following commands:
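The commands are not reproduced in the notes; a fuller table definition, as a sketch (the table name, columns, comments, and delimiter are assumptions):

```sql
CREATE TABLE IF NOT EXISTS temp_database.employees (
  id   INT    COMMENT 'employee id',
  name STRING,
  sal  FLOAT
)
COMMENT 'employee details'
PARTITIONED BY (country STRING)            -- partition column
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```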
                          Altering Tables
Altering a table means modifying or changing an existing table.
By altering a table, you can modify the metadata associated with the table.
The table can be modified by using the ALTER TABLE
The altering of a table allows you to:
 Rename tables
 Modify columns
 Add new columns
 Delete some columns
 Change table properties
 Alter tables for adding partition
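Each operation in the list above maps to a form of the ALTER TABLE statement; a sketch (table and column names are hypothetical, and the partition form requires a partitioned table):

```sql
ALTER TABLE temp_table RENAME TO emp_table;                   -- rename table
ALTER TABLE emp_table CHANGE sal salary FLOAT;                -- modify a column
ALTER TABLE emp_table ADD COLUMNS (dept STRING);              -- add new columns
ALTER TABLE emp_table REPLACE COLUMNS (id INT, name STRING);  -- keep only these
ALTER TABLE emp_table SET TBLPROPERTIES ('notes' = 'demo');   -- table properties
ALTER TABLE emp_table ADD PARTITION (country = 'IN');         -- add a partition
```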
             4.5 Data Manipulation
After specifying the database schema and creating a database, the data
   can be modified by using a set of procedures/mechanisms defined
   by a special language known as Data Manipulation Language
   (DML).
 Data can be manipulated in the following ways:
• Loading files into tables
• Inserting data into Hive table from queries
• Updating existing tables
• Deleting records in tables
Let’s learn about each of these ways.
Loading Files into Tables
• While loading data into tables, Hive does not perform any type of
  transformations.
• The data load operations in Hive are, at present, pure copy/move
  operations, which move data files from one location to another.
• You can upload data into Hive tables from the local file system as
  well as from HDFS.
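Both load paths can be sketched as follows (paths and table name are hypothetical):

```sql
-- from the local file system, appending to the table
LOAD DATA LOCAL INPATH '/home/hduser/data/emp.txt' INTO TABLE emp;

-- from HDFS, replacing the existing data
LOAD DATA INPATH '/user/hduser/data/emp.txt' OVERWRITE INTO TABLE emp;
```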
Inserting Data into Tables
• Consider two tables, T1 and T2. We want to copy the sal column
  from T2 to T1 by using the INSERT command. It can be done as
  follows:
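The command is not shown in the notes; a sketch, assuming T1 has a single sal column:

```sql
INSERT OVERWRITE TABLE T1 SELECT sal FROM T2;
-- or append rather than overwrite:
INSERT INTO TABLE T1 SELECT sal FROM T2;
```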
Static Partition Insertion
• Static partition insertion refers to the task of inserting data into a
   table by specifying a partition column value.
Consider the following example:
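The example is not reproduced; a sketch, assuming emp is partitioned by country and staged_emp is a hypothetical source table:

```sql
-- the partition value 'US' is fixed (static), so country is not selected
INSERT OVERWRITE TABLE emp PARTITION (country = 'US')
SELECT id, name, sal FROM staged_emp WHERE cnty = 'US';
```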
Dynamic Partition Insertion
• In dynamic partition insertion, you need to specify a list of partition column
  names in the PARTITION() clause along with the optional column values.
• A dynamic partition column always has a corresponding input column in
  the SELECT statement.
• If the SELECT statement has multiple column names, the dynamic partition
  columns must be specified at the end of the columns and in the same order
   in which they appear in the PARTITION() clause.
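A sketch of dynamic partition insertion; the two SET properties are real Hive configuration settings, while the table names are assumptions:

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- country is the dynamic partition column; it comes last in the SELECT
INSERT OVERWRITE TABLE emp PARTITION (country)
SELECT id, name, sal, cnty AS country FROM staged_emp;
```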
Inserting Data into Local Files
• Sometimes, you might require to save the result of the SELECT
   query in flat files so that you do not have to execute the queries
   again and again.
Consider the following example:
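The example is not reproduced; a sketch of writing query results to a local directory (path and predicate are hypothetical):

```sql
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/results'
SELECT name, sal FROM emp WHERE sal > 50000;
```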
Creating and Inserting Data into a Table Using a Single Query
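This is typically done with a CREATE TABLE ... AS SELECT (CTAS) statement; a sketch with hypothetical names:

```sql
CREATE TABLE high_paid AS
SELECT name, sal FROM emp WHERE sal > 50000;
```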
Delete in Hive
• The delete operation is available in Hive from version 0.14 onward.
• The delete operation can be performed only on tables that support
  the ACID property.
• The syntax for performing the delete operation is as follows:
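The syntax is DELETE FROM tablename [WHERE expression]; a sketch, including the ACID table setup such a delete requires (in Hive 0.14, transactional tables must be bucketed and stored as ORC):

```sql
CREATE TABLE emp_acid (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

DELETE FROM emp_acid WHERE id = 7;
```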
                      4.6 Data Retrieval Queries
    Hive allows you to perform data retrieval queries by using the
    SELECT command along with various types of operators and
    clauses. In this section, you learn about the following:
•   Using the SELECT command
•   Using the WHERE clause
•   Using the GROUP BY clause
•   Using the HAVING clause
•   Using the LIMIT clause
•   Executing HiveQL queries
Using the SELECT Command
The SELECT statement is the most common operation in SQL.
You can filter the required columns, rows, or both.
The syntax for using the SELECT command is as follows:
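The general syntax, as a sketch:

```sql
SELECT [ALL | DISTINCT] column1, column2, ...
FROM table_name
[WHERE condition]
[GROUP BY column_list]
[HAVING condition]
[LIMIT n];
```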
Using the GROUP BY Clause
• The GROUP BY clause is used to put all the related records
  together. It can also be used with aggregate functions. Often, it is
  required to group the resultsets in complex queries.
• In such scenarios, the ‘GROUP BY’ clause can be used.
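A sketch of grouping with aggregate functions (table and column names are hypothetical):

```sql
-- number of employees and average salary per department
SELECT dept, count(*), avg(sal)
FROM emp
GROUP BY dept;
```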
Using the HAVING Clause
• The HAVING clause is used to specify a condition on the use of
  the GROUP BY clause. The use of the HAVING clause is added
  in the version 0.7.0 of Hive.
• The following query shows an example of using the HAVING
  clause:
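The query referred to is not reproduced in the notes; a sketch with hypothetical names:

```sql
SELECT dept, count(*) AS emp_count
FROM emp
GROUP BY dept
HAVING count(*) > 10;
```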
                         4.7 Using JOINS in Hive
 Hive supports joining two or more tables to aggregate information. The
   various joins supported by Hive are:
• Inner joins
• Outer joins
 Inner Joins
• In case of inner joins, only the records satisfying the given condition get
   selected. All the other records get discarded.
• Let’s take an example to describe the concept of inner joins.
   Consider two tables, order and customer
 Table 12.6 lists the data of the order table.
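An inner join over the two tables, as a sketch (the column names are assumptions):

```sql
SELECT c.name, o.order_id, o.amount
FROM customer c
JOIN `order` o                  -- order is quoted because it is a keyword
  ON (c.cust_id = o.cust_id);
```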
                              Outer Joins
• Sometimes, you need to retrieve all the records from one table and
  only some records from the other table.
• In such cases, you have to use the outer join.
• Figure 12.12 illustrates the concept of outer joins:
Outer joins are of three types:
• Right Outer Join
• Left Outer Join
• Full Outer Join
Right Outer Join
• In this type of join, all the records from the table on the right side of the
   join are retained.
Left Outer Join
• In this type of join, all the records from the table on the left side of
  the join are retained.
• Figure 12.14 illustrates the concept of left outer joins:
Full Outer Join
• In this case, all the fields from both tables are included. For the
  entries that do not have any match, a NULL value would be
  displayed.
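The three outer joins over the same customer and order tables, as a sketch (column names are assumptions):

```sql
SELECT c.name, o.amount FROM customer c
LEFT OUTER JOIN `order` o ON (c.cust_id = o.cust_id);   -- all customers kept

SELECT c.name, o.amount FROM customer c
RIGHT OUTER JOIN `order` o ON (c.cust_id = o.cust_id);  -- all orders kept

SELECT c.name, o.amount FROM customer c
FULL OUTER JOIN `order` o ON (c.cust_id = o.cust_id);   -- both sides; NULL where no match
```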
Joining Tables
• You can combine the data of two or more tables in Hive by using
   HiveQL queries.
•   For this, we need to create the tables and load their data into Hive
    from HDFS.