
HBase

HBase is a non-relational, scalable database built on HDFS. It is based on Google's BigTable and allows CRUD operations. HBase uses a master-region server architecture with Zookeeper for coordination. Data is stored in tables with rows identified by keys and columns grouped into column families that can contain many versions. HBase integrates with tools like Spark, Hive and Pig for large-scale data access and population.

Uploaded by

Bora Yüret
Copyright © All Rights Reserved

HBASE

■ Non-relational, scalable database built on HDFS
■ Based on Google’s BigTable
■ CRUD
– Create
– Read
– Update
– Delete
■ There is no query language, only CRUD APIs!
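Because all access goes through CRUD calls keyed by row, rather than through a query language, a toy in-memory stand-in can make the shape of the API concrete. This is a minimal Python sketch, not the HBase client API: the class `ToyTable` and the `info` column family are hypothetical. It does borrow the put/get/delete naming that HBase uses for its operations, since Create and Update are both a put in HBase.

```python
class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table; not a real client."""

    def __init__(self):
        # row key -> {"family:qualifier": value}
        self._rows = {}

    def put(self, row_key, cells):
        # Create and Update: both are a put of cells into a row.
        # (This toy overwrites values; HBase would store a newer cell version.)
        self._rows.setdefault(row_key, {}).update(cells)

    def get(self, row_key):
        # Read: rows are fetched by key; there is no query language to filter on.
        return dict(self._rows.get(row_key, {}))

    def delete(self, row_key):
        # Delete: drop the whole row if it exists.
        self._rows.pop(row_key, None)


table = ToyTable()
table.put("user1", {"info:name": "Ann"})  # create
table.put("user1", {"info:age": "30"})    # update: adds a column to the row
print(table.get("user1"))  # {'info:name': 'Ann', 'info:age': '30'}
table.delete("user1")
print(table.get("user1"))  # {}
```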
HBase architecture
[Diagram: Zookeeper and HMaster nodes coordinate a row of Region Servers (auto-sharding!), which store their data on HDFS]
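Auto-sharding works by splitting each table into regions, each holding a contiguous, sorted range of row keys, and spreading those regions across Region Servers. The Python sketch below shows only the lookup step: given the regions' start keys, find which region (and so which server) holds a row key. The split points and server names are made up for illustration, keys are compared as strings rather than bytes, and real clients locate regions through the hbase:meta table rather than a local list.

```python
import bisect

# Hypothetical layout: each region covers [its start key, next region's start key).
# The first region starts at the empty key, so every row key falls in some region.
REGION_STARTS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs4"]


def server_for(row_key: str) -> str:
    """Return the (hypothetical) Region Server holding row_key."""
    # bisect_right counts the start keys <= row_key; the owning region is
    # the last one that starts at or before the key.
    idx = bisect.bisect_right(REGION_STARTS, row_key) - 1
    return REGION_SERVERS[idx]


for key in ["apple", "hbase", "movie", "zookeeper"]:
    print(key, "->", server_for(key))
```

Because neighbouring keys land in the same region, a scan over a key range usually touches only one or a few servers.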
HBase data model

■ Fast access to any given ROW
■ A ROW is referenced by a unique KEY
■ Each ROW has some small number of COLUMN FAMILIES
■ A COLUMN FAMILY may contain arbitrary COLUMNS
■ You can have a very large number of COLUMNS in a COLUMN FAMILY
■ Each CELL can have many VERSIONS with given timestamps
■ Sparse data is A-OK – missing columns in a row consume no storage.
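The data model rules above (rows by key, families holding arbitrary columns, versioned cells, free missing columns) map naturally onto nested maps. A minimal Python sketch, assuming an in-memory model only; `put`, `get_latest`, and the integer timestamps are illustrative and say nothing about how HBase actually stores cells on disk.

```python
from collections import defaultdict

# row key -> column family -> column qualifier -> {timestamp: value}
# A cell that was never written has no entry at all, so sparse rows cost nothing here.
table = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(row, family, qualifier, value, ts):
    # Each write adds a version of the cell, keyed by its timestamp.
    table[row][family][qualifier][ts] = value


def get_latest(row, family, qualifier):
    # Read without creating entries; return the newest version or None.
    versions = table.get(row, {}).get(family, {}).get(qualifier)
    if not versions:
        return None
    return versions[max(versions)]


put("row1", "cf", "col", "first", ts=1)
put("row1", "cf", "col", "second", ts=2)
print(get_latest("row1", "cf", "col"))    # second
print(get_latest("row1", "cf", "other"))  # None
```

Real HBase also caps how many versions each column family retains (a per-family setting); this sketch keeps every version.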
Example: One row of a web table
Key: com.cnn.www
■ Contents column family
– Contents: <html><head>CNN… (three versions of the page)
■ Anchor column family
– Anchor:cnnsi.com = “CNN”
– Anchor:my.look.ca = “CNN.com”
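The row key com.cnn.www is the hostname www.cnn.com with its components reversed. In the BigTable paper’s web table, reversing hostnames makes pages from the same domain sort next to each other, so one contiguous key range covers a whole site. A small Python helper for building such keys; the function name `web_table_key` is hypothetical, and keeping the URL path after the reversed host follows the paper’s convention.

```python
from urllib.parse import urlsplit


def web_table_key(url: str) -> str:
    """Build a reversed-hostname row key, e.g. com.cnn.www for http://www.cnn.com/."""
    parts = urlsplit(url)
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    # Keep any path so different pages on the same host get distinct keys.
    path = parts.path if parts.path not in ("", "/") else ""
    return reversed_host + path


print(web_table_key("http://www.cnn.com/"))  # com.cnn.www
keys = sorted(web_table_key(u) for u in [
    "http://www.cnn.com/", "http://example.org/", "http://money.cnn.com/"])
print(keys)  # ['com.cnn.money', 'com.cnn.www', 'org.example']
```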
Some ways to access HBase

■ HBase shell
■ Java API
– Wrappers for Python, Scala, etc.
■ Spark, Hive, Pig
■ REST service
■ Thrift service
■ Avro service
LET’S PLAY WITH HBASE
Creating an HBase table with Python via REST
What are we doing?
■ Create a HBase table for movie ratings by user
■ Then show that we can quickly query it for individual users
■ Good example of sparse data

Column family: rating

Rating:50 Rating:33 Rating:223

UserID 1 5 5
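The table layout above can be sketched in Python: each user ID becomes a row key, and each movie the user rated becomes a column named rating:<movieID> in the rating family. Movies a user never rated get no column at all, which is what makes this data sparse. The (user ID, movie ID, rating) triples below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (user_id, movie_id, rating) triples.
ratings = [(1, 50, 4), (1, 33, 5), (2, 50, 3), (1, 223, 2)]

# Row key = user ID; one column per rated movie in the "rating" family.
rows = defaultdict(dict)
for user_id, movie_id, rating in ratings:
    rows[str(user_id)][f"rating:{movie_id}"] = str(rating)

print(rows["1"])  # {'rating:50': '4', 'rating:33': '5', 'rating:223': '2'}
print(rows["2"])  # {'rating:50': '3'}
```

Fetching everything one user rated is then a single lookup by row key.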
How are we doing it?

Python client → REST service → HBase → HDFS
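One way the Python client can talk to the REST service is by sending JSON in which row keys, column names, and values are base64-encoded, in the {"Row": [{"key": …, "Cell": [{"column": …, "$": …}]}]} shape accepted by HBase’s REST server. The sketch below only builds that payload with the standard library; it does not contact a server, and the helper names and row contents are hypothetical. Check the REST documentation for your HBase version before relying on the exact format.

```python
import base64
import json


def b64(text: str) -> str:
    # The REST server expects keys, columns, and values as base64 strings.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def cellset_json(row_key: str, cells: dict) -> str:
    """Build a JSON body for one row; cells maps 'family:qualifier' -> value."""
    row = {
        "key": b64(row_key),
        "Cell": [{"column": b64(col), "$": b64(val)} for col, val in cells.items()],
    }
    return json.dumps({"Row": [row]})


body = cellset_json("1", {"rating:50": "4", "rating:33": "5"})
print(body)
```

A body like this would be sent with an HTTP PUT and a JSON content type to the REST server, under a URL naming the table and row.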
Let’s do this
HBASE / PIG INTEGRATION
Populating HBase at scale
Integrating Pig with HBase

■ Must create HBase table ahead of time
■ Your relation must have a unique key as its first column, followed by the remaining columns in the order you want them saved in HBase
■ The USING clause (with Pig’s HBaseStorage) lets you STORE a relation into an HBase table
■ Can work at scale – HBase is transactional on rows (each single-row write is atomic)
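The rule that the first field is the row key and the remaining fields follow in order can be illustrated in Python. This is a hypothetical sketch of the mapping only, assuming a column list like the one you would give HBaseStorage; it is not Pig’s implementation, and the sample tuple and the userinfo column names are made up.

```python
def tuple_to_row(fields, columns):
    """Map a relation tuple to (row_key, cells) as the slide describes.

    fields: the tuple; fields[0] is the unique row key.
    columns: 'family:qualifier' names for the remaining fields, in order.
    """
    if len(fields) - 1 != len(columns):
        raise ValueError("need exactly one column name per non-key field")
    row_key, *values = fields
    return row_key, dict(zip(columns, values))


key, cells = tuple_to_row(("1", "24", "M", "technician"),
                          ["userinfo:age", "userinfo:gender", "userinfo:occupation"])
print(key, cells)
```

Checking the field count up front mirrors the requirement that the relation’s layout match the columns you store into.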
Let’s do this
