Hive Introduction

The document provides an introduction to Hive, a data warehouse infrastructure built on Apache Hadoop that allows for SQL-like querying of large datasets. It covers Hive's architecture, features, data types, and file formats, as well as user-defined functions and various metastore configurations. Additionally, it includes resources for further learning and contact information for the author.


INTRODUCTION TO

HIVE

© Big Data Analytics By Rashmi Benni 1


AGENDA:
❏ Overview and Architecture of Hive
❏ Hive Data Types
❏ Hive File Format
❏ Hive Query Language (HQL)
❏ RCFile Implementation
❏ User-Defined Function (UDF)



WHAT IS HIVE?
Hive is a data warehouse infrastructure built on top of
Apache Hadoop. It provides a SQL-like interface to query
data stored in various databases and file systems that
integrate with Hadoop. Hive is designed for managing and
querying large datasets stored in Hadoop Distributed File
System (HDFS) using HiveQL, a SQL-like language.
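
As a minimal sketch of what "SQL-like" means here, the query below counts page views per URL. The table page_views and its columns (user_id, url, view_date) are hypothetical and not part of the original slides; they are assumed to exist already.

```sql
-- Assumed, hypothetical table: page_views(user_id STRING, url STRING, view_date DATE)
-- Top 10 most-viewed URLs since the start of 2024.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE view_date >= DATE '2024-01-01'
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

Hive compiles a statement like this into jobs that run on the Hadoop cluster, so the user does not write the distributed processing code by hand.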

FEATURES OF HIVE:
❏ Open source – Hive is free to use and supported by a strong open-source
community.

❏ Multiple users – It supports concurrent access by multiple users for
collaborative data analysis.

❏ File formats – Hive can process data in various formats like Text, ORC,
Parquet, Avro, etc.

❏ Built-in functions – It offers a rich set of built-in functions for data
manipulation and analysis.
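
The file-format feature above can be sketched as follows: the same columns are declared once as delimited text and once as ORC, and rows are copied from one to the other. The table names sales_text and sales_orc are hypothetical, chosen only for illustration.

```sql
-- Hypothetical tables, used only to illustrate STORED AS.
-- Plain comma-delimited text files.
CREATE TABLE sales_text (
  order_id INT,
  amount   DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Same columns, stored in the columnar ORC format.
CREATE TABLE sales_orc (
  order_id INT,
  amount   DECIMAL(10,2)
)
STORED AS ORC;

-- Hive rewrites the data into the target table's format on insert.
INSERT INTO TABLE sales_orc
SELECT order_id, amount FROM sales_text;
```

The query text does not change with the storage format; only the STORED AS clause in the table definition does.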



FEATURES OF HIVE:
❏ External table – Allows linking to data stored outside Hive without
moving it into the warehouse.

❏ Fast – Optimized for fast querying of large datasets using
execution engines like Tez or Spark.

❏ Table structure – Data is organized into tables with partitions and
buckets for efficient access.
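
A rough sketch of the external-table, partition, and bucket features described above. The table name web_logs, its columns, and the HDFS path are assumptions made for illustration.

```sql
-- Hypothetical external table: Hive records the metadata, but the files
-- stay at the given location and are not deleted by DROP TABLE.
CREATE EXTERNAL TABLE web_logs (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (log_date STRING)          -- one directory per log_date value
CLUSTERED BY (user_id) INTO 8 BUCKETS     -- rows hashed on user_id into 8 files
STORED AS ORC
LOCATION '/data/external/web_logs';       -- assumed path

-- Register a partition so Hive knows about that directory.
ALTER TABLE web_logs ADD IF NOT EXISTS PARTITION (log_date = '2024-01-01');

-- Filtering on the partition column lets Hive read only that partition.
SELECT COUNT(*) FROM web_logs WHERE log_date = '2024-01-01';

-- Show which execution engine the session uses (mr, tez, or spark).
SET hive.execution.engine;
```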



FEATURES OF HIVE:
❏ ETL support – Hive is widely used for ETL (Extract, Transform,
Load) operations in data pipelines.

❏ Storage – It integrates with HDFS and other storage systems to
manage large-scale data efficiently.

❏ Ad-hoc queries – Enables users to run quick, flexible queries
without pre-defined reports or jobs.
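
One way the ETL and ad-hoc query features above might look in practice. The staging table raw_orders and the target table clean_orders are hypothetical; raw_orders is assumed to hold every field as text.

```sql
-- Assumed, hypothetical staging table:
--   raw_orders(order_id INT, customer STRING, amount STRING, order_ts STRING)
CREATE TABLE IF NOT EXISTS clean_orders (
  order_id   INT,
  customer   STRING,
  amount     DECIMAL(10,2),
  order_date DATE
)
STORED AS ORC;

-- Extract from staging, transform (trim, upper-case, cast), load into the target.
INSERT OVERWRITE TABLE clean_orders
SELECT order_id,
       TRIM(UPPER(customer)),
       CAST(amount AS DECIMAL(10,2)),
       CAST(TO_DATE(order_ts) AS DATE)
FROM raw_orders
WHERE order_id IS NOT NULL;

-- Ad-hoc question asked directly against the cleaned table.
SELECT customer, SUM(amount) AS total_spent
FROM clean_orders
GROUP BY customer;
```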

HIVE ARCHITECTURE:
▣ Hive – Acts as a data warehouse infrastructure built on top of
Hadoop for querying and analyzing large datasets.

▣ Command-line Interface (CLI) – Provides a terminal-based interface
for users to submit HiveQL queries.

▣ Hive Web Interface – A GUI that allows users to interact with Hive
through a browser.

HIVE ARCHITECTURE:
▣ Hive Server (Thrift) – Enables remote clients to execute queries via
a network using Thrift protocol.

▣ Driver (Query Compiler, Executor) – Manages the lifecycle of a
HiveQL query including parsing, compiling, optimizing, and executing
it.

▣ Metastore – Stores metadata about databases, tables, partitions,
columns, and their data types.
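
Both the Driver and the Metastore can be observed from a Hive session. The sketch below assumes a hypothetical table page_views(user_id STRING, url STRING, view_date DATE) already exists.

```sql
-- Ask the Driver for the plan it compiled, without running the query.
-- The output lists the stages Hive would execute.
EXPLAIN
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;

-- Read back what the Metastore holds for the table:
-- columns, types, storage format, location, and table properties.
DESCRIBE FORMATTED page_views;
```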
HIVE ARCHITECTURE:
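▣ JobTracker – A Hadoop MapReduce (MRv1) daemon rather than a Hive
component; it manages and schedules MapReduce jobs across the Hadoop
cluster. In Hadoop 2 and later, YARN takes over this role.

▣ TaskTracker – Executes individual tasks as assigned by the
JobTracker on worker nodes.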



HIVE ARCHITECTURE:
▣ HDFS (Hadoop Distributed File System) – Stores the actual data
files managed by Hive tables.

▣ Hadoop – The underlying distributed computing framework that Hive
uses for data processing.


EMBEDDED METASTORE:
▣ Metastore runs in the same JVM as the Hive service (Driver)
using an embedded Derby database.

▣ Suitable for single-user or test environments due to limited
concurrency.

▣ Only one Hive session can access the metastore at a time, so it
fits development setups rather than shared use.



LOCAL METASTORE:
▣ Metastore is still in the same JVM as the Hive Driver but
connects to an external database like MySQL.

▣ Allows multiple Hive sessions (Drivers) to connect to a shared
metastore database.

▣ Offers better concurrency and is suitable for small-scale
production environments.



REMOTE METASTORE:
▣ Metastore runs in a separate JVM (Metastore Server JVM) and
is accessed over a network.

▣ Multiple Hive clients (Drivers) can access the centralized
metastore concurrently.

▣ Ideal for large-scale production systems with high concurrency
and separation of services.
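
A hedged way to check which of the three metastore modes a session is using is to print the relevant configuration properties from the Hive CLI or Beeline. The property names are standard Hive settings; the example values in the comments are illustrative, and some deployments restrict which properties a client may display.

```sql
-- Empty value: embedded or local mode (metastore runs inside the Hive JVM).
-- A value such as thrift://metastore-host:9083 (hypothetical host): remote mode.
SET hive.metastore.uris;

-- JDBC URL of the backing database.
-- The embedded default is a Derby URL: jdbc:derby:;databaseName=metastore_db;create=true
-- A local metastore typically points at an external database, for example a jdbc:mysql:// URL.
SET javax.jdo.option.ConnectionURL;

-- JDBC driver class, which also reveals the backing database.
SET javax.jdo.option.ConnectionDriverName;
```

In remote mode the database settings belong to the metastore server, so the values seen from a client session may not reflect the actual backing store.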



DATA TYPES IN HIVE

NUMERIC DATA TYPES:
-- Create table with numeric types
CREATE TABLE student_numeric (
id INT,
age TINYINT,
marks FLOAT,
total_score DECIMAL(5,2)
);

-- Insert data
INSERT INTO student_numeric VALUES (1, 20, 85.5, 87.75);
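
To show how these numeric columns behave in queries, here is a small sketch against student_numeric. TINYINT holds values from -128 to 127, and DECIMAL(5,2) holds up to five digits with two after the decimal point (at most 999.99). The column aliases are illustrative.

```sql
SELECT id,
       age + 1                       AS age_next_year,   -- integer arithmetic
       ROUND(marks, 1)               AS marks_rounded,   -- FLOAT is approximate
       total_score * 2               AS doubled_score,   -- DECIMAL arithmetic is exact
       CAST(marks AS DECIMAL(5,2))   AS marks_as_decimal -- explicit conversion
FROM student_numeric;
```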

STRING DATA TYPES:
-- Create table with string types
CREATE TABLE student_string (
name STRING,
nickname VARCHAR(20),
code CHAR(5)
);

-- Insert data
INSERT INTO student_string VALUES ('Rahul', 'Rahu', 'C1234');
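
A short sketch of querying the three string types from student_string. STRING has no declared length, VARCHAR(20) holds at most 20 characters, and CHAR(5) is fixed-length. The aliases are illustrative.

```sql
SELECT name,
       LENGTH(name)             AS name_length,
       UPPER(nickname)          AS nickname_upper,
       CONCAT(code, '-', name)  AS label,
       -- Converting a longer string to VARCHAR(20) keeps only the first 20 characters.
       CAST('a string longer than twenty characters' AS VARCHAR(20)) AS truncated
FROM student_string;
```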

MISCELLANEOUS DATA TYPES:
-- Create table with miscellaneous types
CREATE TABLE student_misc (
is_active BOOLEAN,
dob DATE,
login_time TIMESTAMP,
photo BINARY
);

-- Insert data (note: BINARY values are normally loaded via LOAD or programmatically)
INSERT INTO student_misc VALUES (true, '2002-05-12', '2024-01-01 10:00:00', NULL);
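
The BOOLEAN, DATE, and TIMESTAMP columns can be used with Hive's built-in date functions, as in this sketch against student_misc. The aliases are illustrative.

```sql
SELECT dob,
       YEAR(dob)                          AS birth_year,
       DATE_ADD(dob, 30)                  AS dob_plus_30_days,
       DATEDIFF(TO_DATE(login_time), dob) AS days_from_dob_to_login,
       HOUR(login_time)                   AS login_hour
FROM student_misc
WHERE is_active;   -- a BOOLEAN column can be used directly as a filter
```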

ARRAY:
-- Table with ARRAY type
CREATE TABLE student_array (
name STRING,
subjects ARRAY<STRING>
);

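-- Insert data (many Hive versions reject complex types in INSERT ... VALUES,
-- so INSERT ... SELECT is used here)
INSERT INTO TABLE student_array
SELECT 'Ravi', ARRAY('Math', 'Physics', 'Chemistry');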
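
Reading an ARRAY column back might look like the sketch below, which uses zero-based indexing, two built-in collection functions, and LATERAL VIEW EXPLODE to produce one row per element. The aliases are illustrative.

```sql
SELECT name,
       subjects[0]                         AS first_subject,  -- indexes start at 0
       SIZE(subjects)                      AS subject_count,
       ARRAY_CONTAINS(subjects, 'Physics') AS takes_physics
FROM student_array;

-- One output row per (student, subject) pair.
SELECT name, subject
FROM student_array
LATERAL VIEW EXPLODE(subjects) s AS subject;
```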



MAP:
-- Table with MAP type
CREATE TABLE student_map (
name STRING,
subject_marks MAP<STRING, INT>
);

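-- Insert data (many Hive versions reject complex types in INSERT ... VALUES,
-- so INSERT ... SELECT is used here)
INSERT INTO TABLE student_map
SELECT 'Anu', MAP('Math', 90, 'Science', 85);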
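
A MAP column is read by key. The sketch below looks up one key, lists all keys, and explodes the map into one row per key-value pair. The aliases are illustrative.

```sql
SELECT name,
       subject_marks['Math']   AS math_marks,    -- NULL if the key is absent
       MAP_KEYS(subject_marks) AS subjects,
       SIZE(subject_marks)     AS subject_count
FROM student_map;

-- One output row per (student, subject, marks) entry.
SELECT name, subject, marks
FROM student_map
LATERAL VIEW EXPLODE(subject_marks) m AS subject, marks;
```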



STRUCT:
-- Table with STRUCT type
CREATE TABLE student_struct (
id INT,
details STRUCT<name:STRING, age:INT, grade:STRING>
);

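-- Insert data (many Hive versions reject complex types in INSERT ... VALUES,
-- so INSERT ... SELECT is used here)
INSERT INTO TABLE student_struct
SELECT 101, NAMED_STRUCT('name', 'Kiran', 'age', 22, 'grade', 'A');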
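
STRUCT fields are accessed with dot notation, both in the select list and in filters, as in this sketch against student_struct.

```sql
SELECT id,
       details.name AS student_name,
       details.age  AS student_age
FROM student_struct
WHERE details.grade = 'A';
```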



HIVE FILE FORMAT TYPES

HIVE DATA WAREHOUSE SOLUTION

TABLES IN HIVE

RESOURCES:
1. “Hive Course For Beginners”
https://youtu.be/nVI4xEH7yU8?si=xKW4RSnX-ogzPCOb

2. “Hive Query Language Tutorial | HQL | Cloudera| Hands on Training”
https://youtu.be/gVDRTqMomDs?si=aoRIhaGdVbwiTHN-

3. “Hive Query Language Tutorial | HQL | Working with Joins | Cloudera|
Hands on Training”
https://youtu.be/8Pk5X5NNLWo?si=N-5IcBtkF8To1Box



RESOURCES:
4. “Hive Static and Dynamic Table Partition | HQL | Index & View
| Cloudera| Hands on Training”
https://youtu.be/tUJmq4OnESs?si=AyZGZplLlsFDeHxr

5. “Apache Hive Tutorial For Beginners | Big Data Training |
Edureka | Big Data Rewind”
https://www.youtube.com/live/HhJX6KkdjRM?si=JdVG8QOlYf8ep4Cp



Thank YOU!
Any questions?
You can find me at
rashmi.benni@kletech.ac.in
rashmi.benni16@gmail.com



HAPPY LEARNING!
