Importing Relational Data with Sqoop
Chapter 4
201509
Course Chapters

Introduction
  1  Course Introduction
Introduction to Hadoop and the Hadoop Ecosystem
  2  Introduction to Hadoop
  3  Hadoop Architecture and HDFS
Importing and Modeling Structured Data
  4  Importing Relational Data with Apache Sqoop
  5  Introduction to Impala and Hive
  6  Modeling and Managing Data with Impala and Hive
  7  Data Formats
  8  Data File Partitioning
Ingesting Streaming Data
  9  Capturing Data with Apache Flume
Distributed Data Processing with Spark
  10  Spark Basics
  11  Working with RDDs in Spark
  12  Aggregating Data with Pair RDDs
  13  Writing and Deploying Spark Applications
  14  Parallel Processing in Spark
  15  Spark RDD Persistence
  16  Common Patterns in Spark Data Processing
  17  Spark SQL and DataFrames
Conclusion
  18  Course Conclusion
© Copyright 2010-2015 Cloudera. All rights reserved. Not to be reproduced or shared without prior written consent from Cloudera.
Importing Relational Data with Apache Sqoop

In this chapter you will learn
§ How to import tables from an RDBMS into your Hadoop cluster
§ How to change the delimiter and file format of imported tables
§ How to control which columns and rows are imported
§ What techniques you can use to improve Sqoop's performance
§ How the next-generation version of Sqoop compares to the original
Chapter Topics

Course section: Importing and Modeling Structured Data
Chapter: Importing Relational Data with Apache Sqoop

§ Sqoop Overview (current section)
§ Basic Imports and Exports
§ Limiting Results
§ Improving Sqoop's Performance
§ Sqoop 2
§ Conclusion
§ Homework: Import Data from MySQL Using Sqoop
What is Apache Sqoop?

§ Open source Apache project originally developed by Cloudera
  – The name is a contraction of "SQL-to-Hadoop"
§ Sqoop exchanges data between a database and HDFS
  – Can import all tables, a single table, or a partial table into HDFS
  – Data can be imported in a variety of formats
  – Sqoop can also export data from HDFS to a database

(Diagram: data flows in both directions between a database and the Hadoop cluster)
How Does Sqoop Work?

§ Sqoop is a client-side application that imports data using Hadoop MapReduce
§ A basic import involves three steps, orchestrated by Sqoop
  1. Examine table details
  2. Create and submit job to cluster
  3. Fetch records from table and write this data to HDFS

(Diagram: the user runs Sqoop, which examines the table on the database server, submits a job to the Hadoop cluster, and the job writes the fetched records to HDFS)
Basic Syntax

§ Sqoop is a command-line utility with several subcommands, called tools
  – There are tools for import, export, listing database contents, and more
  – Run sqoop help to see a list of all tools
  – Run sqoop help tool-name for help on using a specific tool
§ Basic syntax of a Sqoop invocation

$ sqoop tool-name [tool-options]

§ This command will list all tables in the loudacre database in MySQL

$ sqoop list-tables \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser \
    --password pw
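The tool-name [tool-options] pattern composes mechanically, which can be handy when scripting many imports. A minimal Python sketch (the sqoop_command helper and its keyword-to-flag convention are illustrative, not part of Sqoop):

```python
def sqoop_command(tool, **options):
    """Compose a Sqoop invocation as an argument list:
    sqoop tool-name [tool-options]."""
    args = ["sqoop", tool]
    for name, value in options.items():
        # keyword foo_bar becomes the long option --foo-bar
        args.append("--" + name.replace("_", "-"))
        if value is not True:  # boolean flags take no value
            args.append(str(value))
    return args

cmd = sqoop_command(
    "list-tables",
    connect="jdbc:mysql://dbhost/loudacre",
    username="dbuser",
    password="pw",
)
print(" ".join(cmd))
# sqoop list-tables --connect jdbc:mysql://dbhost/loudacre --username dbuser --password pw
```

The resulting list could be handed to a process runner; the point here is only the invocation shape.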
Chapter Topics

Course section: Importing and Modeling Structured Data
Chapter: Importing Relational Data with Apache Sqoop

§ Sqoop Overview
§ Basic Imports and Exports (current section)
§ Limiting Results
§ Improving Sqoop's Performance
§ Sqoop 2
§ Conclusion
§ Homework: Import Data from MySQL Using Sqoop
Overview of the Import Process

§ Imports are performed using Hadoop MapReduce jobs
§ Sqoop begins by examining the table to be imported
  – Determines the primary key, if possible
  – Runs a boundary query to see how many records will be imported
  – Divides the result of the boundary query by the number of tasks (mappers)
  – Uses this to configure the tasks so that they will have equal loads
§ Sqoop also generates a Java source file for each table being imported
  – It compiles and uses this during the import process
  – The file remains after import, but can be safely deleted
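The division of the boundary-query result among mappers can be sketched roughly as follows. This is a simplification for intuition only; Sqoop's real splitter also handles non-numeric types, NULLs, and skewed ranges:

```python
def compute_splits(min_val, max_val, num_mappers):
    """Divide the [min_val, max_val] range of the split column into one
    contiguous range per mapper, mimicking how Sqoop turns a boundary
    query (SELECT MIN(col), MAX(col) ...) into per-task WHERE clauses.
    Each range is [lo, hi); the last range is closed at max_val."""
    step = (max_val - min_val) // num_mappers
    splits = []
    lo = min_val
    for i in range(num_mappers):
        hi = max_val if i == num_mappers - 1 else lo + step
        splits.append((lo, hi))
        lo = hi
    return splits

# e.g. the boundary query returned MIN(id)=1, MAX(id)=100, with 4 mappers
print(compute_splits(1, 100, 4))
# [(1, 25), (25, 49), (49, 73), (73, 100)]
```

Each tuple would become one mapper's WHERE clause on the split column, which is why an evenly distributed key yields evenly loaded tasks.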
Importing an Entire Database with Sqoop

§ The import-all-tables tool imports an entire database
  – Stored as comma-delimited files
  – Default base location is your HDFS home directory
  – Data will be in subdirectories corresponding to the name of each table

$ sqoop import-all-tables \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw

§ Use the --warehouse-dir option to specify a different base directory

$ sqoop import-all-tables \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --warehouse-dir /loudacre
Importing a Single Table with Sqoop

§ The import tool imports a single table
§ This example imports the accounts table
  – It stores the data in HDFS as comma-delimited fields

$ sqoop import --table accounts \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw

§ This variation writes tab-delimited fields instead

$ sqoop import --table accounts \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --fields-terminated-by "\t"
Incremental Imports (1)

§ What if records have changed since the last import?
  – Could re-import all records, but this is inefficient
§ Sqoop's incremental lastmodified mode imports new and modified records
  – Based on a timestamp in a specified column
  – You must ensure timestamps are updated when records are added or changed in the database

$ sqoop import --table invoices \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --incremental lastmodified \
    --check-column mod_dt \
    --last-value '2015-09-30 16:00:00'
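The selection logic behind lastmodified mode is just a timestamp comparison against the check column. A toy sketch with in-memory rows (the data and helper name are illustrative; Sqoop expresses this as a WHERE clause in the generated query, and the exact boundary handling is Sqoop's concern):

```python
from datetime import datetime

# toy rows standing in for the invoices table (illustrative data)
rows = [
    {"id": 1, "mod_dt": datetime(2015, 9, 30, 12, 0)},
    {"id": 2, "mod_dt": datetime(2015, 9, 30, 17, 30)},
    {"id": 3, "mod_dt": datetime(2015, 10, 1, 9, 15)},
]

def incremental_lastmodified(rows, check_column, last_value):
    """Select rows modified more recently than last_value, as Sqoop's
    lastmodified mode does via the --check-column timestamp."""
    return [r for r in rows if r[check_column] > last_value]

changed = incremental_lastmodified(rows, "mod_dt",
                                   datetime(2015, 9, 30, 16, 0))
print([r["id"] for r in changed])  # [2, 3]
```

Row 1 is skipped because its timestamp predates the supplied --last-value; rows 2 and 3 are picked up.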
Incremental Imports (2)

§ Or use Sqoop's incremental append mode to import only new records
  – Based on the value of the last record in a specified column

$ sqoop import --table invoices \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --incremental append \
    --check-column id \
    --last-value 9478306
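Append mode is the same idea with a monotonically growing column: take rows above the previous maximum, then remember the new maximum for the next run. A minimal sketch (helper name and data are illustrative):

```python
def incremental_append(rows, check_column, last_value):
    """Select only rows whose check column exceeds the previously
    imported maximum, as Sqoop's append mode does, and return the
    new maximum to use as --last-value on the next run."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    next_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, next_last

rows = [{"id": 9478306}, {"id": 9478307}, {"id": 9478310}]
new_rows, next_last = incremental_append(rows, "id", 9478306)
print([r["id"] for r in new_rows], next_last)
# [9478307, 9478310] 9478310
```

Saving next_last between runs is what makes repeated imports pick up only records added since the previous one.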
Exporting Data from Hadoop to RDBMS with Sqoop

§ Sqoop's import tool pulls records from an RDBMS into HDFS
§ It is sometimes necessary to push data in HDFS back to an RDBMS
  – A good solution when you must do batch processing on large data sets
  – Export results to a relational database for access by other systems
§ Sqoop supports this via the export tool
  – The RDBMS table must already exist prior to export

$ sqoop export \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --export-dir /loudacre/recommender_output \
    --update-mode allowinsert \
    --table product_recommendations
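Conceptually, export walks the delimited files under --export-dir and turns each record into a row-insertion statement against the target table. A rough sketch of that translation (the statement shape is illustrative only; real Sqoop batches and parameterizes its statements, and allowinsert mode falls back to updates for existing keys):

```python
def export_line(line, table, columns):
    """Turn one comma-delimited HDFS record into an INSERT statement,
    roughly what Sqoop's export tool does for each row it reads."""
    values = ", ".join("'%s'" % v for v in line.strip().split(","))
    return "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), values)

# hypothetical record from /loudacre/recommender_output and column names
stmt = export_line("1001,product_42,0.97",
                   "product_recommendations",
                   ["account_id", "product", "score"])
print(stmt)
# INSERT INTO product_recommendations (account_id, product, score) VALUES ('1001', 'product_42', '0.97')
```

This also makes clear why the target table must exist first: the generated statements name its columns.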
Chapter Topics

Course section: Importing and Modeling Structured Data
Chapter: Importing Relational Data with Apache Sqoop

§ Sqoop Overview
§ Basic Imports and Exports
§ Limiting Results (current section)
§ Improving Sqoop's Performance
§ Sqoop 2
§ Conclusion
§ Homework: Import Data from MySQL Using Sqoop
Importing Partial Tables with Sqoop

§ Import only specified columns from the accounts table

$ sqoop import --table accounts \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --columns "id,first_name,last_name,state"

§ Import only matching rows from the accounts table

$ sqoop import --table accounts \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --where "state='CA'"
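Both options simply narrow the SELECT that the import issues against the table. A sketch of that narrowing (the build_select helper approximates the query shape; Sqoop additionally appends its split-range conditions):

```python
def build_select(table, columns=None, where=None):
    """Approximate the SELECT statement an import issues for a table,
    narrowed by the --columns and --where options."""
    cols = ", ".join(columns) if columns else "*"
    sql = "SELECT %s FROM %s" % (cols, table)
    if where:
        sql += " WHERE %s" % where
    return sql

print(build_select("accounts",
                   columns=["id", "first_name", "last_name", "state"]))
# SELECT id, first_name, last_name, state FROM accounts
print(build_select("accounts", where="state='CA'"))
# SELECT * FROM accounts WHERE state='CA'
```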
Using a Free-Form Query

§ You can also import the results of a query, rather than a single table
§ Supply a complete SQL query using the --query option
  – You must add the literal WHERE $CONDITIONS token
  – Use --split-by to identify the field used to divide work among mappers
  – The --target-dir option is required for free-form queries

$ sqoop import \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --target-dir /data/loudacre/payable \
    --split-by accounts.id \
    --query 'SELECT accounts.id, first_name,
    last_name, bill_amount FROM accounts JOIN invoices ON
    (accounts.id = invoices.cust_id) WHERE $CONDITIONS'
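The $CONDITIONS token is the hook where each mapper inserts its own split-range predicate, which is why the literal token is mandatory. Roughly (a sketch; the helper name and range predicates are illustrative of the mechanism, not Sqoop's exact generated SQL):

```python
def per_mapper_queries(query, split_by, splits):
    """Substitute the $CONDITIONS token with each mapper's range on the
    --split-by column, roughly as free-form query imports do."""
    queries = []
    for lo, hi in splits:
        cond = "%s >= %d AND %s < %d" % (split_by, lo, split_by, hi)
        queries.append(query.replace("$CONDITIONS", cond))
    return queries

q = "SELECT id, bill_amount FROM invoices WHERE $CONDITIONS"
for sql in per_mapper_queries(q, "id", [(1, 50), (50, 100)]):
    print(sql)
# SELECT id, bill_amount FROM invoices WHERE id >= 1 AND id < 50
# SELECT id, bill_amount FROM invoices WHERE id >= 50 AND id < 100
```

Each mapper then runs its own variant of the query, fetching a disjoint slice of the result set.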
Using a Free-Form Query with WHERE Criteria

§ The --where option is ignored in a free-form query
  – You must specify your criteria using AND following the WHERE clause

$ sqoop import \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    --target-dir /data/loudacre/payable \
    --split-by accounts.id \
    --query 'SELECT accounts.id, first_name,
    last_name, bill_amount FROM accounts JOIN invoices ON
    (accounts.id = invoices.cust_id) WHERE $CONDITIONS AND
    bill_amount >= 40'
Chapter Topics

Course section: Importing and Modeling Structured Data
Chapter: Importing Relational Data with Apache Sqoop

§ Sqoop Overview
§ Basic Imports and Exports
§ Limiting Results
§ Improving Sqoop's Performance (current section)
§ Sqoop 2
§ Conclusion
§ Homework: Import Data from MySQL Using Sqoop
Options for Database Connectivity

§ Generic (JDBC)
  – Compatible with nearly any database
  – Overhead imposed by JDBC can limit performance
§ Direct mode
  – Can improve performance through the use of database-specific utilities
  – Currently supports MySQL and Postgres (use the --direct option)
  – Not all Sqoop features are available in direct mode
§ Cloudera and partners offer high-performance Sqoop connectors
  – These use native database protocols rather than JDBC
  – Connectors available for Netezza, Teradata, and Oracle
  – Download these from Cloudera's Web site
  – Not open source due to licensing issues, but free to use
Controlling Parallelism

§ By default, Sqoop typically imports data using four parallel tasks (called mappers)
  – Increasing the number of tasks might improve import speed
  – Caution: each task adds load to your database server
§ You can influence the number of tasks using the -m option
  – Sqoop views this only as a hint and might not honor it

$ sqoop import --table accounts \
    --connect jdbc:mysql://dbhost/loudacre \
    --username dbuser --password pw \
    -m 8

§ Sqoop assumes all tables have an evenly-distributed numeric primary key
  – Sqoop uses this column to divide work among the tasks
  – You can use a different column with the --split-by option
Chapter Topics

Course section: Importing and Modeling Structured Data
Chapter: Importing Relational Data with Apache Sqoop

§ Sqoop Overview
§ Basic Imports and Exports
§ Limiting Results
§ Improving Sqoop's Performance
§ Sqoop 2 (current section)
§ Conclusion
§ Homework: Import Data from MySQL Using Sqoop
Limitations of Sqoop

§ Sqoop is stable and has been used successfully in production for years
§ However, its client-side architecture does impose some limitations
  – Requires connectivity to the RDBMS from the client (the client must have JDBC drivers installed)
  – Requires connectivity to the cluster from the client
  – Requires the user to specify the RDBMS username and password
  – Difficult to integrate a CLI within external applications
§ Also tightly coupled to JDBC semantics
  – A problem for NoSQL databases

(Diagram: the Sqoop client connects directly to both the database server and the Hadoop cluster)
Sqoop 2 Architecture

§ Sqoop 2 is the next-generation version of Sqoop
  – Client-server design addresses the limitations described earlier
  – API changes also simplify development of other Sqoop connectors
§ The client requires connectivity only to the Sqoop server
  – DB connections are configured on the server by a system administrator
  – End users no longer need to possess database credentials
  – Centralized audit trail
  – Better resource management
  – The Sqoop server is accessible via CLI, REST API, and Web UI

(Diagram: the user's Sqoop client talks only to the Sqoop server, which in turn connects to the database server and the Hadoop cluster)
Sqoop 2 Status

§ Sqoop 2 is being actively developed
  – It began shipping (alongside Sqoop) starting in CDH 4.2
§ Sqoop 2 is not yet at feature parity with Sqoop
  – Implemented features are regarded as stable
  – Consider using Sqoop 2 unless you require a feature it lacks
§ We use Sqoop, rather than Sqoop 2, in this class
  – Primarily due to memory constraints in the VM
Chapter Topics

Course section: Importing and Modeling Structured Data
Chapter: Importing Relational Data with Apache Sqoop

§ Sqoop Overview
§ Basic Imports and Exports
§ Limiting Results
§ Improving Sqoop's Performance
§ Sqoop 2
§ Conclusion (current section)
§ Homework: Import Data from MySQL Using Sqoop
Essential Points

§ Sqoop exchanges data between a database and the Hadoop cluster
  – Provides subcommands (tools) for importing, exporting, and more
§ Tables are imported using MapReduce jobs
  – These are written as comma-delimited text by default
  – You can specify alternate delimiters or file formats
  – Uncompressed by default, but you can specify a codec to use
§ Sqoop provides many options to control imports
  – You can select only certain columns or limit rows
  – Supports using joins in free-form queries
§ Sqoop 2 is the next-generation version of Sqoop
  – Client-server design improves administration and resource management
Bibliography

The following offer more information on topics discussed in this chapter
§ Sqoop User Guide
  – http://tiny.cloudera.com/sqoopuserguide
§ Apache Sqoop Cookbook (published by O'Reilly)
  – http://tiny.cloudera.com/sqoopcookbook
§ A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
  – http://tiny.cloudera.com/adcc05c
Chapter Topics

Course section: Importing and Modeling Structured Data
Chapter: Importing Relational Data with Apache Sqoop

§ Sqoop Overview
§ Basic Imports and Exports
§ Limiting Results
§ Improving Sqoop's Performance
§ Sqoop 2
§ Conclusion
§ Homework: Import Data from MySQL Using Sqoop (current section)
Homework: Import Data from MySQL Using Sqoop

§ In this homework, you will
  – Use Sqoop to import web page and customer account data from an RDBMS to HDFS
  – Perform incremental imports of new and updated account data
§ Please refer to the Homework description