Hive - Introduction
   The term ‘Big Data’ refers to collections of large datasets characterized by huge
   volume, high velocity, and a wide variety of data, all growing day by day. Such
   data is difficult to process using traditional data management systems. Therefore,
   the Apache Software Foundation introduced a framework called Hadoop to solve Big
   Data management and processing challenges.
   Hadoop
   Hadoop is an open-source framework for storing and processing Big Data in a
   distributed environment. It contains two core modules: MapReduce and the Hadoop
   Distributed File System (HDFS).
            MapReduce: A parallel programming model for processing large amounts of
            structured, semi-structured, and unstructured data on large clusters of
            commodity hardware.
            HDFS: The Hadoop Distributed File System is the part of the Hadoop
            framework used to store the datasets. It provides a fault-tolerant file
            system that runs on commodity hardware.
   The Hadoop ecosystem contains various sub-projects (tools) such as Sqoop, Pig,
   and Hive that support the Hadoop modules.
            Sqoop: Used to import and export data between HDFS and relational
            database management systems (RDBMS).
            Pig: A procedural language platform used to develop scripts for
            MapReduce operations.
            Hive: A platform used to develop SQL-type scripts for MapReduce
            operations.
   Note: There are various ways to execute MapReduce operations:
            The traditional approach, using a Java MapReduce program, for structured,
            semi-structured, and unstructured data.
            The scripting approach for MapReduce to process structured and
            semi-structured data using Pig.
            The Hive Query Language (HiveQL or HQL) for MapReduce to process
            structured data using Hive.
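   As an illustration of the third approach, a single HiveQL statement can stand in
   for an entire Java MapReduce program. The sketch below assumes a hypothetical
   employees table; Hive translates the query into one or more MapReduce jobs behind
   the scenes.

      -- Hypothetical table: Hive compiles this aggregation into MapReduce job(s).
      SELECT dept, COUNT(*) AS employee_count
      FROM employees
      GROUP BY dept;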
   What is Hive
   Hive is a data warehouse infrastructure tool for processing structured data in
   Hadoop. It resides on top of Hadoop to summarize Big Data and makes querying and
   analysis easy.
   Hive was initially developed by Facebook; later, the Apache Software Foundation
   took it up and developed it further as open source under the name Apache Hive. It
   is used by many companies. For example, Amazon uses it in Amazon Elastic
   MapReduce.
   Hive is not:
            A relational database
            A design for Online Transaction Processing (OLTP)
            A language for real-time queries and row-level updates
   Features of Hive
            It stores the schema in a database and the processed data in HDFS.
            It is designed for OLAP.
            It provides an SQL-type language for querying called HiveQL or HQL.
            It is familiar, fast, scalable, and extensible.
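   To give a feel for HiveQL's SQL-type syntax, here is a minimal sketch using a
   hypothetical employees table; the statements mirror familiar SQL.

      -- Define a schema (recorded in the Metastore) over comma-delimited data.
      CREATE TABLE employees (id INT, name STRING, dept STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

      -- Query it with familiar SQL-style syntax.
      SELECT name FROM employees WHERE dept = 'sales';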
   Architecture of Hive
   The following component diagram depicts the architecture of Hive:
   This component diagram contains different units. The following list describes
   each unit:
            User Interface: Hive is data warehouse infrastructure software that
            creates interaction between the user and HDFS. The user interfaces that
            Hive supports are the Hive Web UI, the Hive command line, and Hive HD
            Insight (on Windows Server).
            Metastore: Hive chooses respective database servers to store the schema
            or metadata of tables, databases, columns in a table, their data types,
            and the HDFS mapping.
            HiveQL Process Engine: HiveQL is similar to SQL and is used for querying
            the schema information in the Metastore. It is one of the replacements
            for the traditional approach of writing a MapReduce program: instead of
            writing a MapReduce program in Java, we can write a HiveQL query for the
            MapReduce job and have Hive process it.
            Execution Engine: The conjunction of the HiveQL Process Engine and
            MapReduce is the Hive Execution Engine. The execution engine processes
            the query and generates the same results MapReduce would; internally, it
            uses the MapReduce paradigm.
            HDFS or HBase: The Hadoop Distributed File System or HBase is the data
            storage technique used to store the data in the file system.
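   The division of labor between the Metastore and HDFS can be seen in an external
   table: the schema below is recorded in the Metastore, while the data itself stays
   in HDFS at the given path. This is a minimal sketch; the table name and path are
   hypothetical.

      -- Schema goes to the Metastore; the files under LOCATION remain in HDFS.
      CREATE EXTERNAL TABLE web_logs (ip STRING, url STRING, ts STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/user/hive/logs';  -- hypothetical HDFS directory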
   Working of Hive
   The following diagram depicts the workflow between Hive and Hadoop.
   The following steps define how Hive interacts with the Hadoop framework:
            Step 1 - Execute Query: The Hive interface, such as the Command Line or
            Web UI, sends the query to the Driver (any database driver such as JDBC
            or ODBC) to execute.
            Step 2 - Get Plan: The driver takes the help of the query compiler,
            which parses the query to check the syntax and the query plan, or the
            requirements of the query.
            Step 3 - Get Metadata: The compiler sends a metadata request to the
            Metastore (any database).
            Step 4 - Send Metadata: The Metastore sends the metadata as a response
            to the compiler.
            Step 5 - Send Plan: The compiler checks the requirements and resends the
            plan to the driver. Up to this point, the parsing and compiling of the
            query is complete.
            Step 6 - Execute Plan: The driver sends the execute plan to the
            execution engine.
            Step 7 - Execute Job: Internally, the process of executing the job is a
            MapReduce job. The execution engine sends the job to the JobTracker,
            which resides in the Name node, and the JobTracker assigns the job to
            the TaskTracker, which resides in the Data node. Here, the query
            executes the MapReduce job.
            Step 7.1 - Metadata Ops: Meanwhile, during execution, the execution
            engine can perform metadata operations with the Metastore.
            Step 8 - Fetch Result: The execution engine receives the results from
            the Data nodes.
            Step 9 - Send Results: The execution engine sends those resultant values
            to the driver.
            Step 10 - Send Results: The driver sends the results to the Hive
            interfaces.
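   The plan that the compiler builds in steps 2 through 5 can be inspected directly
   with HiveQL's EXPLAIN statement. Against a hypothetical employees table, it
   prints the stages (including the MapReduce stages) that the execution engine will
   run:

      -- Show the stage plan that the driver will hand to the execution engine.
      EXPLAIN
      SELECT dept, COUNT(*) AS employee_count
      FROM employees
      GROUP BY dept;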