Sqoop v1.1

Sqoop is a tool used for transferring bulk data between HDFS and relational databases. It can import data from a database into HDFS or export data from HDFS to a database. Sqoop uses MapReduce jobs and JDBC to connect to databases. It supports importing/exporting to file formats like delimited text and SequenceFiles. Common databases supported include MySQL, SQL Server, PostgreSQL and Oracle. Sqoop also integrates with Hive for importing into tables and HBase for importing into column families.

SQOOP

Karthikeyan
Sanjai

3vbigdata.training@gmail.com

INTRODUCTION

Bulk data transfer tool

Imports and exports data between a relational database and Hadoop for processing

Runs as a map-only MapReduce job (no reduce phase)

Command-line tool

Integrates with Hive and HBase

Supports plugins via a connector-based architecture


SQOOP ARCHITECTURE

(Architecture diagram: the Sqoop client submits a map-only MapReduce job that moves data between the database and HDFS over JDBC.)

FILE FORMATS

Two file formats:


Delimited text

SequenceFiles


Delimited Text File

Default import format; can be requested explicitly with --as-textfile

Default delimiters are a comma (,) for fields and a newline (\n) for records

Sequence File: --as-sequencefile
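As a quick illustration (this is not Sqoop itself, just a simulation of its output), here is the shape a row takes under the default delimited-text format, and how a downstream script might split it back into fields; the sample values mirror the patient table used later:

```shell
# Simulate one record as Sqoop's default delimited-text import would write it:
# fields separated by commas, records terminated by newlines.
printf '1,saravanan,avil,100\n' > part-00000

# A downstream consumer splits on the field delimiter; here we pull the name field.
awk -F',' '{ print $2 }' part-00000
# prints: saravanan
```

Sqoop also accepts --fields-terminated-by and --lines-terminated-by to override these defaults.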

SUPPORTED DATABASES

Uses a JDBC compatibility layer to talk to the databases

Sample List of DBs


MySQL

MS SQL

PostgreSQL

Oracle

Vendor-specific JDBC driver jars must be added to $SQOOP_HOME/lib
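A minimal sketch of dropping a driver jar into place; the jar name and version are illustrative assumptions, and a temporary directory stands in for a real Sqoop install here:

```shell
# Stand-in for a real Sqoop install tree (assumption: in practice, SQOOP_HOME
# is wherever the Sqoop tarball was extracted).
SQOOP_HOME="$(mktemp -d)"
mkdir -p "$SQOOP_HOME/lib"

# Stand-in for the vendor driver jar, e.g. MySQL Connector/J (version illustrative).
touch mysql-connector-java-5.1.49-bin.jar
cp mysql-connector-java-5.1.49-bin.jar "$SQOOP_HOME/lib/"

ls "$SQOOP_HOME/lib"
```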


INSTALLATION

Download sqoop-*.tar.gz

tar xzvf sqoop-*.tar.gz

export HADOOP_HOME=/some/path/hadoop-dir

Add the vendor-specific JDBC jar to $SQOOP_HOME/lib

Change to the Sqoop bin folder:

./sqoop help

SQOOP COMMANDS

sqoop help
Or we can use: sqoop COMMAND [ARGS]

Available commands:

codegen            Generate code to interact with database records

create-hive-table  Import a table definition into Hive

eval               Evaluate a SQL statement and display the results

export             Export an HDFS directory to a database table

help               List available commands

import             Import a table from a database to HDFS

import-all-tables  Import tables from a database to HDFS

list-databases     List available databases on a server

list-tables        List available tables in a database

version            Display version information

MYSQL CONNECTIVITY

mysql -u root -p

Enter password: root

show databases;

use test;

mysql> CREATE TABLE patient(pid INT(10), name VARCHAR(20), drug VARCHAR(20), tot_amt INT(10));

mysql> insert into patient values(1,'saravanan','avil',100);

mysql> insert into patient values(2,'senthil','metacin',200);

mysql> insert into patient values(3,'Gowtham','paracetamol',300);

mysql> select * from patient;

SQOOP EVALUATE

Evaluate a SQL statement:

bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "SELECT * FROM patient"

bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "SELECT * FROM patient LIMIT 2"

bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "INSERT INTO patient VALUES(4,'amudhan','avil',400)"

SQOOP LIST

sqoop-list-databases

bin/sqoop list-databases --connect jdbc:mysql://localhost/ --username root --password root

(lists databases such as information_schema)

sqoop-list-tables

bin/sqoop list-tables --connect jdbc:mysql://localhost/test --username root --password root

SQOOP IMPORT - MYSQL TO HDFS

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient -m 1

Imports the patient table into an HDFS directory

Sqoop generates a Java file (patient.java) for our use

Data is imported as text or SequenceFiles

Alternatively, we can use codegen

bin/hadoop dfs -cat /user/username/patient/part-00000

All values are displayed

These files can be used as input to MR jobs.
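As a toy stand-in for an MR job consuming the imported files (illustrative only, not Hadoop code), summing the tot_amt column of a part file:

```shell
# Recreate a small part file as the import above would have produced it.
printf '1,saravanan,avil,100\n2,senthil,metacin,200\n3,Gowtham,paracetamol,300\n' > part-00000

# An aggregation a real MR job might perform: sum the fourth (tot_amt) field.
awk -F',' '{ s += $4 } END { print s }' part-00000
# prints: 600
```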


SQOOP IMPORT (CONTD)

Increasing parallelism (number of mappers):

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --split-by pid -m 2

Target directory:

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --target-dir /user/output -m 1

MySQL to HDFS, all tables:

bin/sqoop import-all-tables --connect jdbc:mysql://localhost/test --username root --password root -m 1
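Roughly, with --split-by pid and -m 2, Sqoop divides the key range of pid into one sub-range per mapper, and each mapper queries only its slice. A back-of-the-envelope sketch of that split (illustrative values, not Sqoop's actual query planner):

```shell
# Assume pid ranges from 1 to 100 (illustrative) and 2 mappers.
MIN=1; MAX=100; MAPPERS=2
STEP=$(( (MAX - MIN + 1) / MAPPERS ))

# Each mapper issues a query restricted to its own slice of the key range.
for i in $(seq 0 $((MAPPERS - 1))); do
  LO=$(( MIN + i * STEP ))
  HI=$(( LO + STEP - 1 ))
  echo "mapper $i: WHERE pid >= $LO AND pid <= $HI"
done
# prints:
# mapper 0: WHERE pid >= 1 AND pid <= 50
# mapper 1: WHERE pid >= 51 AND pid <= 100
```

A badly skewed --split-by column leaves some mappers with most of the rows, so an evenly distributed key works best.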


HIVE INTEGRATION

bin/sqoop-import --connect jdbc:mysql://localhost/test --username root --password root --table patient --hive-table patientthive --create-hive-table --hive-import -m 1

SQOOP - HIVE IMPORT

(Screenshot of a Hive import run omitted.)

HBASE

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --hbase-table patienthbase2 --column-family datasqoop --hbase-row-key pid --hbase-create-table -m 1

SQOOP HBASE IMPORT

(Screenshot of an HBase import run omitted.)

SQOOP EXPORT

Exports a set of files from HDFS back to an RDBMS

The target table must already exist in the database

The input files are read and parsed into a set of records according to the user-specified delimiters

Does not export from HBase

HDFS to MySQL:

bin/sqoop export --connect jdbc:mysql://localhost/test --username root --password root --table patient --export-dir /user/amudhan/patient

Hive to MySQL:

bin/sqoop export --connect jdbc:mysql://localhost/test --table patient --export-dir /user/hive/warehouse/patient --username root --password root -m 1
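To illustrate the parsing step (not Sqoop's actual code), here is one delimited HDFS record turned back into the INSERT an export worker conceptually issues:

```shell
# One exported record in Sqoop's default delimited-text format.
printf '4,amudhan,avil,400\n' > part-00000

# Split the record on the field delimiter and render it as an INSERT statement.
IFS=',' read -r pid name drug amt < part-00000
printf "INSERT INTO patient VALUES(%s,'%s','%s',%s);\n" "$pid" "$name" "$drug" "$amt"
# prints: INSERT INTO patient VALUES(4,'amudhan','avil',400);
```

A row that does not match the expected delimiters fails to parse, which is why the delimiters used at import time must match those given at export time.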

THANK YOU!
