Hadoop Ecosystem Frameworks: Applications on Big Data using Pig, Hive and HBase
o Applications of Big Data using Pig, Hive and HBase
2. Pig :
Pig is a high-level platform or tool which is used to process large datasets.
It provides a high level of abstraction for processing over MapReduce.
It provides a high-level scripting language, known as Pig Latin, which is used
to develop data analysis code.
Applications :
Exploring large datasets with Pig scripts.
Running ad-hoc queries across large data sets.
Prototyping algorithms for processing large datasets.
Processing time-sensitive data loads.
Collecting large volumes of data, such as search logs and web crawls.
Deriving analytical insights through sampling.
3. Hive :
Hive is a data warehouse infrastructure tool to process structured data in
Hadoop.
It resides on top of Hadoop to summarize Big Data and makes querying and
analyzing easy.
It is used by different companies. For example, Amazon uses it in Amazon
Elastic MapReduce.
Benefits :
Ease of use
Accelerated initial insertion of data
Superior scalability, flexibility, and cost-efficiency
Streamlined security
Low overhead
Exceptional working capacity
4. HBase :
HBase is a column-oriented non-relational database management system
that runs on top of the Hadoop Distributed File System (HDFS).
HBase provides a fault-tolerant way of storing sparse data sets, which are
common in many big data use cases.
HBase also supports access from applications through Apache Avro, REST, and
Thrift APIs, in addition to its native Java client API.
Application :
Medical
Sports
Web
Oil and petroleum
e-commerce
Pig
o Introduction to PIG
Ø What is Apache Pig?
Apache Pig is an abstraction over MapReduce. It is a tool/platform used to
analyze large data sets by representing them as data flows. Pig is
generally used with Hadoop; we can perform all the data manipulation
operations in Hadoop using Apache Pig.
To write data analysis programs, Pig provides a high-level language known as
Pig Latin. This language provides various operators using which programmers
can develop their own functions for reading, writing, and processing data.
To analyze data using Apache Pig, programmers need to write scripts using
Pig Latin language. All these scripts are internally converted to Map and
Reduce tasks. Apache Pig has a component known as Pig Engine that accepts
the Pig Latin scripts as input and converts those scripts into MapReduce jobs.
o Execution Modes of Pig
Apache Pig Execution Modes
You can run Apache Pig in two modes, namely, Local Mode and HDFS
mode.
Local Mode
In this mode, all the files are installed and run from your local host
and local file system. There is no need for Hadoop or HDFS. This
mode is generally used for testing purposes.
MapReduce Mode
MapReduce mode is where we load or process the data that exists in
the Hadoop File System (HDFS) using Apache Pig. In this mode,
whenever we execute the Pig Latin statements to process the data, a
MapReduce job is invoked in the back-end to perform a particular
operation on the data that exists in the HDFS.
Script Execution Modes
Interactive Mode (Grunt shell) − You can run Apache Pig in
interactive mode using the Grunt shell. In this shell, you can enter
the Pig Latin statements and get the output (using Dump operator).
Batch Mode (Script) − You can run Apache Pig in Batch mode by
writing the Pig Latin script in a single file with .pig extension.
Embedded Mode (UDF) − Apache Pig provides the provision of
defining our own functions (User Defined Functions) in programming
languages such as Java, and using them in our script.
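As a hedged illustration of embedded mode, here is a minimal Java sketch using Pig's PigServer API; the input path, field names, and filter logic are assumptions of this sketch, not part of Pig itself. ExecType.LOCAL corresponds to Local Mode; ExecType.MAPREDUCE would run the same plan against HDFS.

    // A minimal sketch of Pig's embedded mode, assuming a local Pig installation
    // and an input file /tmp/input.txt with tab-separated name/age fields.
    import java.io.IOException;
    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class EmbeddedPigExample {
        public static void main(String[] args) throws IOException {
            // ExecType.LOCAL runs against the local file system; use ExecType.MAPREDUCE for HDFS
            PigServer pigServer = new PigServer(ExecType.LOCAL);
            // Each registerQuery call adds one Pig Latin statement to the logical plan
            pigServer.registerQuery("records = LOAD '/tmp/input.txt' AS (name:chararray, age:int);");
            pigServer.registerQuery("adults = FILTER records BY age >= 18;");
            // store() plays the role of STORE and triggers execution of the accumulated plan
            pigServer.store("adults", "/tmp/output");
            pigServer.shutdown();
        }
    }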
o Comparison of Pig with Databases
There are several differences between Pig Latin and SQL, and between Pig
and relational database management systems (RDBMSs) in general.
The most significant difference is that Pig Latin is a data flow programming
language, whereas SQL is a declarative programming language.
Pig Latin programs are step-by-step transformations on input relations; SQL
statements declare constraints that the output must satisfy.
RDBMSs store data in tables with predefined schemas; Pig is more relaxed
and allows optional schemas defined at runtime.
Pig operates on any source of tuples (e.g., text files) without an import
process; data is loaded from the filesystem (usually HDFS) as the first step.
o Grunt
Grunt is the interactive shell for Apache Pig.
Provides line-editing facilities like GNU Readline (e.g., Ctrl-E moves to end of
line).
Remembers command history (Ctrl-P/Ctrl-N or ↑/↓) and offers tab-
completion for Pig Latin keywords and functions.
Supports utility commands: clear, help, history, quit, set, exec, kill, run.
Allows OS commands via sh and HDFS commands via fs.
o Pig Latin
Pig Latin is a high-level data flow language with SQL-like semantics.
Abstracts Java MapReduce into textual notation; simple to learn for those
familiar with SQL.
Supports primitive types (int, long, float, double, chararray, bytearray,
boolean, datetime, biginteger, bigdecimal) and complex types (tuple, bag,
map).
Statements end with a semicolon (;), take relations as input, and produce
relations as output.
o User Defined Functions
Apache Pig provides full UDF support in Java, with limited support in Jython,
Python, JavaScript, Ruby, and Groovy.
Java UDF types:
Filter Functions − Used in FILTER statements; accept a Pig value and
return a Boolean.
Eval Functions − Used in FOREACH-GENERATE; accept a Pig value
and return a Pig result.
Algebraic Functions − Act on inner bags in FOREACH-GENERATE,
performing full MapReduce on an inner bag.
Piggybank is the Java UDF repository for sharing and reusing UDFs.
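A minimal sketch of a Java filter function; the class name IsAdult, its field layout, and the age threshold are illustrative assumptions, not part of Pig.

    // A hedged sketch of a filter UDF built on Pig's FilterFunc base class.
    import java.io.IOException;
    import org.apache.pig.FilterFunc;
    import org.apache.pig.data.Tuple;

    public class IsAdult extends FilterFunc {
        @Override
        public Boolean exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return false;                    // treat missing values as "filter out"
            }
            int age = (Integer) input.get(0);    // first field is expected to be an int
            return age >= 18;                    // keep only rows where age >= 18
        }
    }

In a Pig Latin script, such a UDF would typically be made available with REGISTER and invoked inside a FILTER ... BY expression.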
o Data Processing Operators
Loading and Storing: LOAD, STORE
Filtering: FILTER, DISTINCT, FOREACH…GENERATE, STREAM
Grouping and Joining: GROUP, COGROUP, JOIN, CROSS
Sorting: ORDER, LIMIT
Combining and Splitting: UNION, SPLIT
Diagnostics: DUMP, DESCRIBE, EXPLAIN, ILLUSTRATE
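A hedged sketch showing how several of these operators can be chained from embedded Java via PigServer; the input paths, schemas, and aliases are assumptions of this sketch.

    // A sketch chaining LOAD, JOIN, GROUP, FOREACH...GENERATE, and ORDER from embedded Java.
    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.apache.pig.data.Tuple;

    public class OperatorPipeline {
        public static void main(String[] args) throws IOException {
            PigServer pig = new PigServer(ExecType.LOCAL);
            pig.registerQuery("users  = LOAD '/tmp/users.txt'  AS (id:int, name:chararray);");
            pig.registerQuery("orders = LOAD '/tmp/orders.txt' AS (uid:int, amount:double);");
            pig.registerQuery("joined  = JOIN users BY id, orders BY uid;");
            pig.registerQuery("grouped = GROUP joined BY users::name;");
            pig.registerQuery("totals  = FOREACH grouped GENERATE group, SUM(joined.orders::amount);");
            pig.registerQuery("sorted  = ORDER totals BY $1 DESC;");
            // openIterator plays the role of DUMP: it executes the plan and streams the results
            Iterator<Tuple> it = pig.openIterator("sorted");
            while (it.hasNext()) {
                System.out.println(it.next());
            }
            pig.shutdown();
        }
    }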
Apache Hive Architecture and Installation
Architecture Components
o User Interface (UI): Provides interfaces such as Hive Web UI, Hive CLI and Hive
HDInsight for submitting queries and operations
o Hive Server (Thrift): Receives client requests (JDBC/ODBC, Beeline, Hue) and
forwards them to the Driver
o Driver: Manages client sessions, forwards queries to the Compiler, and handles
result fetching
o Compiler: Parses HiveQL, performs semantic analysis, and generates an execution
plan (DAG of MapReduce/HDFS tasks) using metadata from the Metastore
o Metastore: Central repository storing table/partition schemas, column metadata,
serializers/deserializers and HDFS file locations in an RDBMS
o Execution Engine: Executes the compiled DAG, managing task dependencies and
coordinating with Hadoop/YARN
o HDFS/HBase: Underlying storage for table data, accessed via Hive’s execution engine
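A minimal sketch of the client side of this architecture: a Java program connecting through the Thrift-based HiveServer2 endpoint via JDBC, after which the Driver, Compiler, and Execution Engine handle the query as described above. The host, port, user, and table name are assumptions, and the Hive JDBC driver jar must be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcExample {
        public static void main(String[] args) throws Exception {
            // HiveServer2 listens on port 10000 by default
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hiveuser", "");
            Statement stmt = conn.createStatement();
            // The Driver/Compiler turn this HiveQL into a DAG of jobs behind the scenes
            ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sample_table");
            while (rs.next()) {
                System.out.println("row count: " + rs.getLong(1));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }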
Installation
o Prerequisite: Install the same Hadoop version locally as used by your target cluster
o Download and unpack Hive:
      tar xzf apache-hive-x.y.z-bin.tar.gz
o Set environment variables:
      export HIVE_HOME=~/sw/apache-hive-x.y.z-bin
      export PATH=$PATH:$HIVE_HOME/bin
o Launch the Hive shell:
      hive
Hive Shell
Primary CLI for issuing HiveQL commands; heavily influenced by MySQL syntax
Features:
o Commands terminated by semicolon; supports ! prefix for OS commands and dfs for
HDFS operations
o Command history navigation via ↑/↓ or Ctrl-P/Ctrl-N; line editing (Ctrl-E to end) and
Tab-based auto-completion
o Case-insensitive keywords (except in string comparisons)
Modes:
o Interactive: Launch with hive; enter queries at hive> prompt
o Non-Interactive: Run scripts using hive -f script.q or single statements with hive -e
"SELECT …"
Hive Services
Hive CLI: Shell for executing HiveQL queries and commands
Hive Web UI: Web-based GUI alternative to CLI
Hive MetaStore: Stores metadata (schemas, serializers, HDFS mappings) in a relational DB
Hive Server (Thrift): Endpoint for remote client connections, forwarding requests to Driver
Hive Driver: Manages sessions, forwards queries to Compiler, returns results to clients
Hive Compiler: Parses HiveQL, performs semantic checks, generates execution plan using
Metastore data
Hive Execution Engine: Executes the DAG of MapReduce/HDFS tasks, handling dependencies
Hive Metastore
Central repository for table and partition definitions, column types, serializers/deserializers,
and file paths; stored in an RDBMS (e.g., MySQL, PostgreSQL)
Shared by Hive, Spark, Impala, and other tools to ensure consistent metadata across engines
Comparison with Traditional Databases
o Purpose
   Hive: Data warehousing, OLAP
   RDBMS: OLTP and OLAP
o Schema Enforcement
   Hive: Schema on read (schema applied at query time)
   RDBMS: Schema on write (schema enforced on data load)
o Scalability
   Hive: Highly scalable at low cost (HDFS)
   RDBMS: Scale-up (costly)
o Read/Write Patterns
   Hive: Write once, read many
   RDBMS: Read and write many times
o Transactions & Indexes
   Hive: Not supported
   RDBMS: Supported
o Data Organization
   Hive: Supports normalized and de-normalized data; automatic partitioning
   RDBMS: Normalized data; manual sharding
o Table Deletion
   Hive: DROP removes metadata only (external tables) or metadata plus data (managed tables)
   RDBMS: DROP removes data and metadata
HiveQL
SQL-like query language (HiveQL, or HQL) that compiles into MapReduce jobs; supports
SELECT, INSERT, and UPDATE-like operations
Extensions: Window functions (SQL:2003), multi-table inserts, TRANSFORM/MAP/REDUCE
clauses
Data Types:
o Primitive: numeric types, Boolean, string (CHAR, VARCHAR), TIMESTAMP
o Complex: ARRAY, MAP, STRUCT
File Formats: TEXTFILE, SEQUENCEFILE, ORC, RCFILE
Hive Tables
Managed (Internal) Tables: Hive owns both data and metadata; stored by default under
/user/hive/warehouse; DROP TABLE removes data+metadata
External Tables: Hive manages only metadata; data resides at user-specified HDFS/Storage
locations; DROP TABLE removes metadata only
Partitioning & Bucketing: (not detailed in provided docs)
Querying Data & User-Defined Functions
Hive supports three types of UDFs (written in Java):
o UDF: Operates on single row → single row (e.g., string/math functions)
o UDAF: Aggregates multiple rows → single result (e.g., COUNT, MAX)
o UDTF: Single row → multiple rows (table-generating, e.g., EXPLODE)
APIs:
o Simple UDF API (org.apache.hadoop.hive.ql.exec.UDF): Use for writable types,
implement evaluate()
o Generic UDF (org.apache.hadoop.hive.ql.udf.generic.GenericUDF): Use for complex
types, manually handle ObjectInspectors
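A minimal sketch using the simple UDF API named above; the class name Lowercase and its behavior are illustrative, not part of Hive.

    // A hedged sketch of a simple Hive UDF operating on writable types.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class Lowercase extends UDF {
        // Hive finds this evaluate() method by reflection
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                       // pass NULLs through unchanged
            }
            return new Text(input.toString().toLowerCase());
        }
    }

Such a UDF is typically packaged into a jar, added with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before it can be called in a query.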
Sorting & Aggregating
SORT BY: Sorts rows within each reducer by the specified columns (per-reducer order, not a
total order); use ASC/DESC; numeric vs lexicographical ordering depends on the column type
Aggregate Functions: Take a set of values and return a single value; can be used with or
without GROUP BY (commonly with GROUP BY)
MapReduce Scripts
Hive's TRANSFORM, MAP, and REDUCE clauses let a query invoke an external script or program,
in the style of Hadoop Streaming
The script reads input rows from standard input and writes result rows to standard output, so
custom record-level logic can be plugged into a HiveQL query
Best suited when the required transformation is hard to express in plain HiveQL; the extra
process and I/O overhead make it less attractive when built-in operators or UDFs suffice
Joins & Subqueries
Joins: Combine rows from multiple tables based on equality predicates; supports INNER,
LEFT OUTER, RIGHT OUTER, FULL OUTER; only equality joins; non-commutative and left-
associative
Subqueries: Hive supports subqueries in FROM and WHERE clauses (since Hive 0.13); a
subquery returns a result set used by its outer query
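A hedged sketch issuing a HiveQL outer join together with a WHERE-clause subquery over JDBC (reusing the HiveServer2 connection pattern shown earlier); the customers/orders tables and their columns are hypothetical, and the two features are combined purely to illustrate the syntax.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJoinExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://localhost:10000/default");
                 Statement stmt = conn.createStatement();
                 // LEFT OUTER JOIN keeps every customer row; the ON clause must be an equality predicate
                 ResultSet rs = stmt.executeQuery(
                         "SELECT c.name, o.total "
                       + "FROM customers c LEFT OUTER JOIN orders o ON (c.id = o.cust_id) "
                       + "WHERE c.id IN (SELECT cust_id FROM orders WHERE total > 100)")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }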
HBase Concepts
o What is HBase?
HBase is a distributed column-oriented database built on top of the Hadoop
file system. It is an open-source project and is horizontally scalable.
HBase has a data model similar to Google's Bigtable, designed to provide
quick random access to huge amounts of structured data. It leverages the
fault tolerance provided by the Hadoop File System (HDFS).
It is a part of the Hadoop ecosystem that provides random real-time
read/write access to data in the Hadoop File System.
One can store the data in HDFS either directly or through HBase. Data
consumer reads/accesses the data in HDFS randomly using HBase. HBase sits
on top of the Hadoop File System and provides read and write access.
Clients (HBase Client API)
o Class HBaseConfiguration
Adds HBase configuration files to a Configuration. This class belongs to the
org.apache.hadoop.hbase package.
static org.apache.hadoop.conf.Configuration create()
This method creates a Configuration with HBase resources.
o Class HTable (org.apache.hadoop.hbase.client)
Represents an HBase table, used to communicate with a single HBase table.
Constructors
HTable()
HTable(TableName tableName, ClusterConnection connection,
ExecutorService pool)
Using this constructor, you can create an object to access an
HBase table.
Key Methods
void close() – Releases all the resources of the HTable.
void delete(Delete delete) – Deletes the specified cells/row.
boolean exists(Get get) – Tests the existence of columns in the table,
as specified by Get.
Result get(Get get) – Retrieves certain cells from a given row.
Configuration getConfiguration() – Returns the Configuration object
used by this instance.
TableName getName() – Returns the table name instance of this
table.
HTableDescriptor getTableDescriptor() – Returns the table descriptor
for this table.
byte[] getTableName() – Returns the name of this table.
void put(Put put) – Inserts data into the table.
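A minimal sketch tying these classes together. Note that the current HBase client obtains a Table from a Connection (via ConnectionFactory) rather than constructing HTable directly; the table name "employee", the column family "personal", and the cell values here are assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // loads hbase-site.xml resources
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("employee"))) {
                // Write: row key "row1", column family "personal", qualifier "name"
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"),
                              Bytes.toBytes("Alice"));
                table.put(put);
                // Read the same cell back
                Get get = new Get(Bytes.toBytes("row1"));
                Result result = table.get(get);
                byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
                System.out.println(Bytes.toString(value));
            }
        }
    }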
Example Schema
o Table is a collection of rows.
o Row is a collection of column families.
o Column family is a collection of columns.
o Column is a collection of key value pairs.
o Example schema of a table in HBase (four column families, each with columns col1-col3):

      Rowid | Column Family 1 | Column Family 2 | Column Family 3 | Column Family 4
            | col1 col2 col3  | col1 col2 col3  | col1 col2 col3  | col1 col2 col3
      ------+-----------------+-----------------+-----------------+-----------------
        1   |                 |                 |                 |
        2   |                 |                 |                 |
        3   |                 |                 |                 |
HBase vs RDBMS
o SQL
RDBMS: It requires SQL (Structured Query Language).
HBase: SQL is not required in HBase.
o Schema
RDBMS: It has a fixed schema.
HBase: It does not have a fixed schema and allows for the addition of
columns on the fly.
o Database Type
RDBMS: It is a row-oriented database.
HBase: It is a column-oriented database.
o Scalability
RDBMS: Allows for scaling up by upgrading existing servers.
HBase: Scale-out is possible by adding new servers to the cluster.
o Nature
RDBMS: Static in nature.
HBase: Dynamic in nature.
o Data retrieval
RDBMS: Slower retrieval of data.
HBase: Faster retrieval of data.
o Rule
RDBMS: Follows ACID (Atomicity, Consistency, Isolation, Durability).
HBase: Follows CAP (Consistency, Availability, Partition-tolerance).
o Type of data
RDBMS: Handles structured data.
HBase: Handles structured, unstructured, and semi-structured data.
o Sparse data
RDBMS: Cannot handle sparse data.
HBase: Can handle sparse data.
o Volume of data
RDBMS: Determined by server configuration.
HBase: Depends on the number of machines deployed.
o Transaction Integrity
RDBMS: Guarantees transaction integrity.
HBase: No such guarantee.
o Referential Integrity
RDBMS: Supported.
HBase: No built-in support.
o Normalize
RDBMS: Data can be normalized.
HBase: Data is not normalized; no logical relationships across tables.
o Table size
RDBMS: Designed for small tables; difficult to scale.
HBase: Designed for large tables; scales horizontally.
Advanced Usage (Applications & History)
o Applications of HBase
It is used whenever there is a need for write-heavy applications.
HBase is used whenever we need to provide fast random access to available
data.
Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase
internally.
o HBase History
Nov 2006 – Google released the paper on BigTable.
Feb 2007 – Initial HBase prototype was created as a Hadoop contribution.
Oct 2007 – First usable HBase (with Hadoop 0.15.0) released.
Jan 2008 – HBase became a Hadoop sub-project.
Oct 2008 – HBase 0.18.1 released.
Jan 2009 – HBase 0.19.0 released.
Sept 2009 – HBase 0.20.0 released.
May 2010 – HBase graduated to Apache top-level project.
Schema Design
o An HBase table can scale to billions of rows and a very large number of columns;
it supports terabytes of data with high read/write throughput at low latency.
o HBase Table Schema Design General Concepts
Row key: Each table is indexed on a row key; data is sorted lexicographically
by row key. No secondary indices.
Atomicity: Avoid designs that require atomicity across multiple rows; all
operations are atomic at the row level.
Even distribution: Reads/writes should be uniformly distributed across all
nodes; design row keys so related entities are in adjacent rows.
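One common technique for even distribution is to prefix ("salt") the row key with a small hash-derived bucket so writes spread across regions while rows for the same entity remain adjacent. A minimal sketch; the bucket count and key layout are assumptions of this sketch, not an HBase API.

    import java.nio.charset.StandardCharsets;

    public class SaltedRowKey {
        private static final int BUCKETS = 16;   // assumed number of salt buckets

        // Builds a key like "07|user123|20240101"; rows for the same user stay adjacent
        public static byte[] rowKey(String userId, String timestamp) {
            int bucket = Math.floorMod(userId.hashCode(), BUCKETS);   // deterministic salt
            String key = String.format("%02d|%s|%s", bucket, userId, timestamp);
            return key.getBytes(StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            System.out.println(new String(rowKey("user123", "20240101"), StandardCharsets.UTF_8));
        }
    }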
o Size Limits
Row keys: 4 KB per key
Column families: ≤ 10 per table
Column qualifiers: 16 KB per qualifier
Individual cell values: < 10 MB
All values in a single row: ≤ 10 MB
Advanced Indexing
o Advanced indexing is triggered when the selection object (obj) is:
an ndarray of type integer or Boolean
a tuple with at least one sequence object
a non-tuple sequence object
o Advanced indexing returns a copy of data rather than a view.
o Two types of advanced indexing:
Integer Indexing: Selects arbitrary items based on N-dimensional indices.
Boolean Array Indexing: Selects items based on Boolean conditions.
ZooKeeper – Monitoring a Cluster
o Key Performance Metrics to Monitor:
Resource utilization details
Thread and JVM usage
Performance statistics
Cluster and configuration details
o Resource utilization details: Automatically discover clusters, monitor heap/non-heap
memory on ZooKeeper nodes, and collect and alert on GC iterations, heap size, system usage,
and threads.
o Thread and JVM usage: Analyze JVM thread dumps; track daemon, peak, and live
thread counts.
o Performance statistics: Measure server response time to client requests, queued
requests, connections, and network usage.
o Cluster and configuration details: Track Znode count, watchers, follower count,
leader selection stats, session times, and session growth.
Building Applications with ZooKeeper
o A Configuration Service:
ZooKeeper can act as a highly available store for configuration; clients can
retrieve or update configuration files and receive notifications of changes via
watches.
The write() method hides the difference between creating/updating a znode
by checking existence before performing the appropriate operation.
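A minimal sketch of such a write() method using the plain ZooKeeper Java client; the ConfigWriter class name and the use of UTF-8 strings are assumptions of this sketch.

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ConfigWriter {
        private final ZooKeeper zk;

        public ConfigWriter(ZooKeeper zk) {
            this.zk = zk;
        }

        // Hides the create-vs-update difference: create the znode if it is missing,
        // otherwise overwrite its data (version -1 means "any version").
        public void write(String path, String value)
                throws KeeperException, InterruptedException {
            byte[] data = value.getBytes(StandardCharsets.UTF_8);
            if (zk.exists(path, false) == null) {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } else {
                zk.setData(path, data, -1);
            }
        }
    }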
o The Resilient ZooKeeper Application:
Programs so far have assumed a reliable network; in reality failures occur,
which is why ZooKeeper operations declare InterruptedException and KeeperException.
InterruptedException: Thrown if a ZooKeeper operation is interrupted;
adheres to Java’s standard interrupt mechanism.
KeeperException: Thrown on server errors or communication failures;
subclasses like NoNodeException. Handling strategies: catch and test code or
catch specific subclasses.
KeeperException Categories:
State exceptions: Operation cannot be applied to znode tree (e.g.,
BadVersionException).
Recoverable exceptions: Connection loss; ZooKeeper reconnects to
preserve session.
Unrecoverable exceptions: Session expiration or auth failure;
ephemeral znodes lost, application must rebuild state.
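A minimal sketch of the "catch specific subclasses" handling strategy for these three categories; the read() helper, its retry-by-recursion on connection loss, and the exception it rethrows are illustrative assumptions.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    public class ReadConfig {
        // Reads a znode's data, illustrating the three KeeperException categories
        public static byte[] read(ZooKeeper zk, String path) throws InterruptedException {
            try {
                return zk.getData(path, false, null);
            } catch (KeeperException.NoNodeException e) {
                return null;                         // state exception: the znode does not exist
            } catch (KeeperException.ConnectionLossException e) {
                return read(zk, path);               // recoverable: ZooKeeper reconnects, so retry
            } catch (KeeperException.SessionExpiredException e) {
                // unrecoverable: ephemeral znodes are gone; the caller must rebuild its state
                throw new IllegalStateException("ZooKeeper session expired", e);
            } catch (KeeperException e) {
                throw new IllegalStateException("Unexpected ZooKeeper error", e);
            }
        }
    }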
o A Lock Service:
Distributed locks provide mutual exclusion; used for leader election.
Clients create sequential ephemeral znodes under a lock znode; the lowest
sequence acquires the lock.
The herd effect: all clients watching the lock znode are notified of any child
change, causing traffic spikes; the solution is to watch only the predecessor
znode.
Recoverable exceptions: Create failures due to connection loss; use
identifier-embedded znode names to detect successful creations.
Unrecoverable exceptions: Session expiration deletes ephemeral znodes;
application must detect loss of lock, clean up, and retry.
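A minimal sketch of acquiring such a lock with the plain ZooKeeper client, watching only the predecessor znode; the lock path and the simplified error handling (no connection-loss or session-expiry recovery) are assumptions of this sketch.

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SimpleLock {
        private final ZooKeeper zk;
        private final String lockPath;   // e.g. "/locks/mylock", assumed to exist already

        public SimpleLock(ZooKeeper zk, String lockPath) {
            this.zk = zk;
            this.lockPath = lockPath;
        }

        public void acquire() throws KeeperException, InterruptedException {
            // Sequential ephemeral child, e.g. /locks/mylock/lock-0000000007
            String myNode = zk.create(lockPath + "/lock-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            String myName = myNode.substring(myNode.lastIndexOf('/') + 1);

            while (true) {
                List<String> children = zk.getChildren(lockPath, false);
                Collections.sort(children);
                int myIndex = children.indexOf(myName);
                if (myIndex == 0) {
                    return;                         // lowest sequence number holds the lock
                }
                // Watch only the immediate predecessor, not the parent, to avoid the herd effect
                String predecessor = lockPath + "/" + children.get(myIndex - 1);
                CountDownLatch latch = new CountDownLatch(1);
                if (zk.exists(predecessor, event -> latch.countDown()) == null) {
                    continue;                       // predecessor already gone; re-check the children
                }
                latch.await();                      // woken when the predecessor znode goes away
            }
        }
    }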
IBM Big Data Tools Overview
1. IBM InfoSphere
What it is: A suite of data integration and governance tools.
Main Features:
o Data quality management
o Data cleansing and profiling
o ETL (Extract, Transform, Load) operations
Use Case: Used to integrate, cleanse, and prepare data from diverse sources for analytics and
reporting.
2. IBM BigInsights
What it is: IBM’s distribution of Apache Hadoop for big data analytics.
Main Features:
o Supports Hadoop ecosystem components (Hive, Pig, HBase)
o Includes advanced analytics tools (machine learning, text analytics)
o Has a web-based console for administration
Use Case: Large-scale analysis of structured and unstructured data.
3. IBM BigSheets
What it is: A web-based spreadsheet-like tool built on top of BigInsights.
Main Features:
o Lets non-programmers analyze large data sets using a familiar interface
o Uses spreadsheets to run Hadoop jobs in the background
Use Case: Business users can visually explore and analyze big data without needing to write
MapReduce code.
4. IBM Big SQL
What it is: An SQL-on-Hadoop engine by IBM.
Main Features:
o Supports ANSI SQL queries over Hadoop data (including HDFS, Hive, HBase)
o Federates queries across traditional RDBMS (DB2, Oracle) and Hadoop
o Strong security and performance optimization
Use Case: Enables analysts and DBAs to query big data using familiar SQL syntax across
hybrid data sources.