Big Data Hadoop and Spark Developer
Lesson 4—Basics of Hive and Impala
© Simplilearn. All rights reserved.
      Learning Objectives
Identify the features of Hive and Impala
Understand the methods to interact with Hive and Impala
Basics of Hive and Impala
Topic 1: Features of Hive and Impala
                                Introduction to Hive and Impala
  Batch                                          Interactive
Processing                                           SQL
              SELECT t1.a1 as c1, t2.b1 as c2
             FROM t1 JOIN t2 ON (t1.a2=t2.b2);
                                                               •   Hive and Impala provide an SQL-like interface
                                                                   for users to extract data from the Hadoop
                                                                   system.
                   Resource Management                         •   They reside on top of Hadoop and can be used
                                                                   to query data from the underlying storage
                         Storage                                   components.
                HDFS                  HBase
     Hive and Impala: Similarities
•   Hive is very similar to Impala in the following ways:
                                  Hive and Impala: Differences
Hive was developed by Facebook around 2007.              Impala was developed by Cloudera around
                                                         2012.
           It is an Open source Apache project.          It is an incubating Apache project.
   It has a high level abstraction layer on top of       It has a high performance dedicated SQL
                  MapReduce and Apache Spark.
                                                         engine.
It uses HiveQL to query the structured data in a
                                     metastore.          It uses Impala SQL for ad hoc queries.
               It is suitable for structured data.       It is designed for high concurrency and ad
                                                     z
                                                         hoc queries.
                     Hive and Impala: Comparison
     Hive                                                       Impala
                                        •   Comprises a specialized SQL
•   Provides more features than             engine that offers five to fifty
    Impala                                  times faster performance
                                            than Hive
•   Is highly extensible
                                        •   Used mainly for interactive
                                            queries and data analysis
•   Used mostly for batch
    processing                          •   Accommodates many
                                            concurrent users
Relational Databases vs. Hive vs. Impala
             Use Case: Hive and Impala
Hive and Impala are commonly used to analyze social media coverage.
Basics of Hive and Impala
Topic 2: Interacting with Hive and Impala
   Executing a Query in Hive and Impala
        Receive SQL query           Receive SQL query
              Parse Hive QL     1   Parse Impala SQL
        Make optimizations      2   Make optimizations
             Plan execution     3   Plan execution
     Submit job(s) to cluster   4   Execute query on cluster
          Monitor progress      5   Store the data in HDFS
           Process data—
                                6
MapReduce or Apache Spark
      Store the data in HDFS    7
Hive Query Editor
                      Interfaces to Run Hive and Impala Queries
                                     Hive and Impala offer numerous interfaces to run queries:
                                     •   Command-line shell:
                                          – Impala: Impala shell
                                          – Hive: Beeline
Impala Query Editor
                                     •   Hue Web UI:
                                         – Hive Query Editor
                                         – Impala Query Editor
                                     •   Metastore Manager:
                                         – ODBC/JDBC
                     Impala Lab Access Details
               •   The steps to start Impala in lab are as follows:
Step 1                                          Step 2
         • Log in to cloud lab                           • Connect to any
           web console with                                daemon server with
           your credentials                                the help of the
                                                           command below:
                                                           •impala-shell -i
                                                           cloudera-
                                                           slavenode3.cloudlab.
                                                           com
Demo
Starting Impala Lab
Demonstrate the method to start and connect to the Impala lab from command.
Impala Lab Access Details
Connecting with Hive and Impala Shell
 •   To execute Impala commands from Impala shell:
              •   To run Hive using Beeline:
          Running Impala Queries from Command Line
To check all options of Impala using the help option:     Impala-shell –help
                                                          Impala-shell –q ‘select *
To run direct queries from shell using the –q option:     from simple’
                                                          Impala-shell –d
To issue a use database on startup using the –d option:   Simplilearn
Demo
Connecting with Hive and Impala Shell
Demonstrate the method to connect with Hive and Impala shell, along with some basic
operations.
                                 Sample Queries
                                                SELECT version();
To explore a new Impala instance:
                                                SELECT current_database();
                                                CREATE DATABASE IF NOT EXISTS
To create a database:                           sample;
To verify a database:                           SHOW databases;
To specify the location where the database is   CREATE DATABASE IF NOT EXISTS
to be created:                                  database_name LOCATION hdfs_path;
                                  Sample Queries
To switch the current session to another
                                             USE db_name;
database:
                                             CREATE TABLE stockprice
                                             (stock_id INT,
                                             date STRING,
                                             open_price FLOAT,
                                             high_price FLOAT,
                                             low_price FLOAT,
                                             close_price FLOAT,
To create a table in Parquet format:         stock_volume INT,
                                             adjclose_price FLOAT)
                                             ROW FORMAT DELIMITED
                                             FIELDS TERMINATED BY ','
                                             LINES TERMINATED BY '\n'
                                             LOCATION
                                             '/home/singh25nov_gmail/input'
                                             stored as parquet ;
                                     Sample Queries
                                                CREATE EXTERNAL TABLE stop_loss
                                                (
                                                  stock_id INT,
                                                  stock_volume FLOAT,
                                                  stock_current_rate DOUBLE,
To load csv data from local files:                stock_trigger_price DOUBLE
                                                )
                                                ROW FORMAT DELIMITED FIELDS
                                                TERMINATED BY ','
                                                LOCATION
                                                '/user/cloudera/sample_data/tab1';
To list all tables in the current database in   SHOW tables;
Impala:
                          Sample Queries
                                     INSERT INTO stockprice
                                     (date,open_price,high_price,low_price,c
To insert a single row:              lose_price,stock_volume,adjclose_price
                                     ) VALUES ('15112017',102
                                     ,105,98.6,100154711,100);
                                     impala-shell -i <impala-daemon-uri> -f
To migrate from SQL:
                                     <filename>.sql;
                         Sample Queries
                                    SELECT stockprice.open_price,
                                    MAX(stockprice.stock_volume),
                                    MIN(stop_loss.stock_current_rate)
To aggregate and join:              FROM stop_loss JOIN stockprice USING
                                    (stock_id)
                                    GROUP BY high_price ORDER BY 1
                                    LIMIT 5;
                                    DROP (DATABASE|SCHEMA) [IF EXISTS]
To drop a database:                 database_name [RESTRICT | CASCADE]
                                    [LOCATION hdfs_path];
                                Sample Queries
                             • Interactive mode:
                                   SELECT count(*) FROM stockprice;
To query the Impala table:   • Set of commands contained in a file:
                                   impala-shell-i impala-host -f <filename>.sql;
                             • Single command to the impala-shell:
                                   impala-shell-i impala-host-q 'select count(*) from stockprice‘;
          Executing Queries in the Impala Shell
localhost.localdomain:21000] > select * from webpage where page_id > 40
                              > LIMIT 5;
Query: select * from webpage where page_id > 40
LIMIT 5
+---------+--------------------------+--------------------------------------+
| page_id | name                     | assoc_files                          |
+---------+--------------------------+--------------------------------------+
| 41      | sorrento_f31l_sales.html | theme2.css,code.js,sorrento_f31l.jpg |
| 42      | sorrento_f40l_sales.html | theme1.css,code.js,sorrento_f40l.jpg |
| 43      | sorrento_f01l_sales.html | theme1.css,code.js,sorrento_f01l.jpg |
| 44      | sorrento_f11l_sales.html | theme1.css,code.js,sorrento_f11l.jpg |
| 45      | titanic_2400_sales.html | theme1.css,code.js,titanic_2400.jpg |
+---------+--------------------------+--------------------------------------+
Fetched 5 row(s) in 1.32s
Demo
Impala Queries
Demonstrate the sample Impala queries.
Running Hive Queries Using Beeline
                     •   The character “!” is used to execute Beeline
                         commands.
                     The commands used to run Beeline:
                     • !exit: Used to exit the shell
                     • !help: Shows list of all commands
                     • !verbose: Shows added details of queries
Demo
Running Hive Queries Using Beeline
Demonstrate the method to connect with Beeline and execute basic queries.
                 Running Beeline from Command Line
                                                             beeline –u … -f
To execute file using the –u option:                        simplilearn.hql
To use HiveQL directly from the command line using the -e   beeline –u ... -e 'SELECT *
option:                                                     FROM users‘
To continue running script even after an error:             beeline –u … -force=TRUE
                                   Running Hive Query
Hive> select * from device
    > LIMIT 5;
OK                                                 •    All SQL commands are terminated
1 2008-10-21 00:00:00 Sorrento F00L phone               with a semicolon “;”
2 2010-04-19 00:00:00 Titanic 2100     phone
3 2011-02-18 00:00:00 MeeToo 3.0 phone
4 2011-09-21 00:00:00 MeeToo 3.1 phone
5 2008-10-21 00:00:00 iFruit 1 phone
Time taken: 0.296 seconds, Fetched: 5 row(s)
Connecting Hive and Impala Shell with Hue
                     •   Hue can be used to write Hive and Impala
                         queries from the User Interface.
Demo
Connecting Hive and Impala Shell with Hue
Demonstrate the method to connect Hive and Impala shell using Hue.
       Hive and Impala Editors in Hue
Diagram 1                          Diagram 2
              Key Takeaways
Hive and Impala are tools to perform SQL queries on data residing on HDFS
or HBase.
Hive and Impala are easy to learn for experienced SQL developers.
Hive and Impala solve the Big Data problem but cannot replace a traditional
RDBMS.
Hive runs MapReduce or Spark jobs on Hadoop based on HiveQL statements.
Impala uses a very fast specialized SQL engine that is faster than MapReduce.
Quiz
QUIZ
          Which of the following components can be used to accept command inputs from users?
 1
     a.   Command Line Interface
     b.   Query compiler
     c.   Execution engine
     d.   Thrift server
  QUIZ
             Which of the following components can be used to accept command inputs from users?
    1
        a.   Command Line Interface
        b.   Query compiler
        c.   Execution engine
        d.   Thrift server
The correct answer is a.
The Command Line Interface is used as an input medium to accept command input from users.
QUIZ
          Hive can be accessed from Hue using ________.
 2
     a.   Impala editor
     b.   Hive Editor
     c.   File browser
     d.   YARN UI
  QUIZ
             Hive can be accessed from Hue using ________.
    2
        a.   Impala editor
        b.   Hive Editor
        c.   File browser
        d.   YARN UI
The correct answer is b.
Hive can be accessed through the Hive editor in Hue.
QUIZ
          Impala can be accessed from Hue using ________.
 3
     a.   Impala editor
     b.   Hive Editor
     c.   File browser
     d.   YARN UI
  QUIZ
             Impala can be accessed from Hue using ________.
    3
        a.   Impala editor
        b.   Hive Editor
        c.   File browser
        d.   YARN UI
The correct answer is a.
Impala can be accessed through the Impala editor in Hue.
QUIZ
          Updating an individual record is possible in______.
 4
     a.   Impala
     b.   Hive
     c.   RDBMS
     d.   All of the above
  QUIZ
             Updating an individual record is possible in______.
    4
        a.   Impala
        b.   Hive
        c.   RDBMS
        d.   All of the above
The correct answer is c.
Hive and Impala cannot update individual records, but an RDBMS can.
QUIZ
          Deleting an individual record is possible in_______.
 5
     a.   RDBMS
     b.   Hive
     c.   Impala
     d.   All of the above
  QUIZ
             Deleting an individual record is possible in_______.
    5
        a.   RDBMS
        b.   Hive
        c.   Impala
        d.   All of the above
The correct answer is a.
Hive and Impala cannot delete individual records, but an RDBMS can.
                            This concludes “Basics of Hive and Impala.”
                                    The next lesson is “Working with Hive and Impala.”
©Simplilearn. All rights reserved