SQOOP
Karthikeyan
Sanjai
3vbigdata.training@gmail.com
INTRODUCTION
Bulk data transfer tool
Imports data from a relational database into Hadoop for processing, and exports results back out
Runs as a map-only MapReduce job
Command-line tool
Integrates with Hive & HBase
Supports plugins via a connector-based architecture
SQOOP ARCHITECTURE
FILE FORMATS
Two file formats:
Delimited text
SequenceFiles
Delimited text file
Default import format; request explicitly with --as-textfile
Default delimiters are a comma (,) for fields and a newline (\n) for records
Sequence file: --as-sequencefile
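To make the default delimited format concrete, the sketch below (plain shell, not Sqoop itself) builds a file shaped like the part-00000 a default import of the patient table would produce, then splits it on the comma delimiter with awk; the rows are the sample patient records from the MySQL slides.

```shell
# Simulate what Sqoop's default delimited-text import of the patient
# table looks like: comma between fields, newline between records.
cat > part-00000 <<'EOF'
1,saravanan,avil,100
2,senthil,metacin,200
3,Gowtham,paracetamol,300
EOF

# Fields split cleanly on the default comma delimiter;
# print the name and tot_amt columns of every record.
awk -F',' '{ print $2, $4 }' part-00000
```

Because the delimiters are plain text, any downstream tool that can split on commas and newlines can consume the imported files directly.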
SUPPORTED DATABASES
Uses a JDBC compatibility layer to talk to the databases
Sample List of DBs
MySQL
MS SQL
PostgreSQL
Oracle
Vendor-specific JDBC drivers must be added to $SQOOP_HOME/lib
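Placing the driver is a single copy into Sqoop's lib directory. A minimal sketch, assuming a MySQL driver jar named mysql-connector-java.jar; here SQOOP_HOME points at a temporary stand-in directory so the commands run anywhere, and you would substitute your real install path and the jar your vendor ships.

```shell
# Sketch: put the vendor JDBC driver where Sqoop can find it.
# SQOOP_HOME and the jar name are stand-ins; use your real paths.
SQOOP_HOME=$(mktemp -d)/sqoop
mkdir -p "$SQOOP_HOME/lib"

# Pretend this is the driver jar downloaded from the vendor:
touch mysql-connector-java.jar

cp mysql-connector-java.jar "$SQOOP_HOME/lib/"
ls "$SQOOP_HOME/lib"
```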
INSTALLATION
Download sqoop-*.tar.gz
tar -xzvf sqoop-*.tar.gz
export HADOOP_HOME=/some/path/hadoop-dir
Add the vendor-specific JDBC jar to $SQOOP_HOME/lib
Change to the Sqoop bin folder
./sqoop help
SQOOP COMMANDS
sqoop help
Or use: sqoop COMMAND [ARGS]
Available commands:
  codegen              Generate code to interact with database records
  create-hive-table    Import a table definition into Hive
  eval                 Evaluate a SQL statement and display the results
  export               Export an HDFS directory to a database table
  help                 List available commands
  import               Import a table from a database to HDFS
  import-all-tables    Import tables from a database to HDFS
  list-databases       List available databases on a server
  list-tables          List available tables in a database
  version              Display version information
MYSQL CONNECTIVITY
mysql -u root -p
Enter password: root
show databases;
use test;
mysql> CREATE TABLE patient(pid INT(10), name VARCHAR(20), drug VARCHAR(20), tot_amt INT(10));
mysql> insert into patient values(1,'saravanan','avil',100);
mysql> insert into patient values(2,'senthil','metacin',200);
mysql> insert into patient values(3,'Gowtham','paracetamol',300);
mysql> select * from patient;
SQOOP EVALUATE
Evaluate a SQL statement
bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "SELECT * FROM patient"
bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "SELECT * FROM patient LIMIT 2"
bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "INSERT INTO patient VALUES(4,'amudhan','avil',400)"
SQOOP LIST
sqoop-list-databases
bin/sqoop list-databases --connect jdbc:mysql://localhost/information_schema --username root --password root
sqoop-list-tables
bin/sqoop list-tables --connect jdbc:mysql://localhost/test --username root --password root
SQOOP IMPORT - MYSQL TO HDFS
bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient -m 1
Imports the patient table into an HDFS directory
Sqoop generates a Java file (patient.java) for our use
Data is imported as text or SequenceFiles
Alternatively, we can use codegen to generate the class
bin/hadoop dfs -cat /user/username/patient/part-00000
All imported values are displayed
These files can be used as input to MR jobs
SQOOP IMPORT(CONTD)
Increasing parallelism (number of mappers):
bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --split-by pid -m 2
Specifying the target directory:
bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --target-dir /user/output -m 1
MySQL to HDFS, importing all tables:
bin/sqoop import-all-tables --connect jdbc:mysql://localhost/test --username root --password root -m 1
HIVE INTEGRATION
bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --hive-table patienthive --create-hive-table --hive-import -m 1
SQOOP - HIVE IMPORT
HBASE
bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --hbase-table patienthbase2 --column-family datasqoop --hbase-row-key pid --hbase-create-table -m 1
SQOOP HBASE IMPORT
SQOOP EXPORT
Exports a set of files from HDFS back to an RDBMS
The target table must already exist in the database
The input files are read and parsed into a set of records according to the user-specified delimiters
Does not export from HBase
HDFS to MySQL:
bin/sqoop export --connect jdbc:mysql://localhost/test --username root --password root --table patient --export-dir /user/amudhan/patient
Hive to MySQL:
bin/sqoop export --connect jdbc:mysql://localhost/test --table patient --export-dir /user/hive/warehouse/patient --username root --password root -m 1
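Because export parses the HDFS files back into rows using the delimiters, the file being exported must match the target table's column order. A sketch (plain shell standing in for the HDFS file) of a delimited file that would export cleanly into the four-column patient table, with a pre-export sanity check; the two rows here are made-up illustration values.

```shell
# Sketch: a delimited file shaped for export into patient(pid, name, drug, tot_amt).
# Default parsing splits on commas, one record per line; rows are illustrative.
cat > patient-export.csv <<'EOF'
5,kumar,crocin,150
6,priya,dolo,250
EOF

# Sanity-check before exporting: every record must have exactly 4 fields,
# otherwise the export would fail to parse the file into table rows.
awk -F',' 'NF != 4 { bad = 1 } END { exit bad }' patient-export.csv && echo "records OK"
```

Checking field counts up front is cheap insurance: a single malformed record is enough to make an export job fail partway through.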
THANK YOU!