HadCom.utils

Overview

This project creates a runnable jar file that provides some common advanced functionality for Hadoop.

This first version was built for CDH 4. I will be making it work for CDH 3 shortly.

##Functionality

###Put

A collection of layered functionality for advanced putting. For details on how to use this functionality, click here. The user will be able to use all of the following:

Layer 1: Reading

CSV files, delimited files, flat files, variable-length delimited files, variable-length flat files

Layer 2: Aggregating

Many files into a few

Appending file name to every row of aggregated files
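As a rough illustration of the aggregation layer, the sketch below merges rows from several inputs and tags each row with the file it came from. It is not the project's code: the class `AggregateSketch`, the method `aggregate`, and the choice to append the file name as the last delimited field are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AggregateSketch {
    // Hypothetical helper: merges the rows of many named inputs into one list,
    // appending the source file name as a trailing field on every row.
    static List<String> aggregate(Map<String, List<String>> files, String delimiter) {
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String row : file.getValue()) {
                merged.add(row + delimiter + file.getKey());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the inputs in insertion order.
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("a.csv", Arrays.asList("1,foo", "2,bar"));
        files.put("b.csv", Arrays.asList("3,baz"));
        for (String row : aggregate(files, ",")) {
            System.out.println(row);
        }
    }
}
```

Keeping the file name on each row means the origin of a record is still recoverable after many small files have been folded into a few large ones.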

Layer 3: Threading

Run in single-threaded or multi-threaded mode

Each thread writing to a different HDFS file to increase write speed
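The idea of one output file per thread can be sketched with a plain thread pool. This is a minimal sketch under stated assumptions, not the project's implementation: it writes to a local directory as a stand-in for HDFS, and the class `ThreadedPutSketch`, the round-robin row split, and the `part-N.txt` naming are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadedPutSketch {
    // Splits rows round-robin across `threads` workers; each worker writes
    // its own part file, so no two threads share an output stream.
    static List<Path> write(List<String> rows, Path dir, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int worker = t;
                pending.add(pool.submit(() -> {
                    List<String> slice = new ArrayList<>();
                    for (int i = worker; i < rows.size(); i += threads) {
                        slice.add(rows.get(i));
                    }
                    Path part = dir.resolve("part-" + worker + ".txt");
                    Files.write(part, slice, StandardCharsets.UTF_8);
                    return part;
                }));
            }
            // Wait for every worker and collect the paths in worker order.
            List<Path> parts = new ArrayList<>();
            for (Future<Path> f : pending) {
                parts.add(f.get());
            }
            return parts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("put-sketch");
        List<Path> parts = write(Arrays.asList("r0", "r1", "r2", "r3"), dir, 2);
        for (Path part : parts) {
            System.out.println(part + " -> " + Files.readAllLines(part));
        }
    }
}
```

With a single thread the same code degenerates to one output file, which mirrors the single-thread mode described above.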

Layer 4: Listening

Report progress to console

Layer 5: Compressing

Use Snappy, Gzip, or Bzip2

Layer 6: Writing

Sequence files, Avro files, RC files, or HBase

###Route

This lets one or more directories pump files into HDFS in your favorite splittable format (sequence, avro, or rc). Like the "put" functionality, the route logic is also layered.

Layer 1: Route

Event driven

Schedule driven

Layer 2: Put Threads

Define number of put threads in the thread pool

Layer 3: Put

Get all the functionality and options from the above put command

###Get

hadoop fs -get is good, but what if you want to get a sequence, avro, or rc file, and what if you want to be able to read the results? These get methods uncompress sequence, avro, or rc files into text on your local drive.

###Out

hadoop fs -text only goes so far. This takes the next step by outputting rc files and avro files in clear text. Click here for more information.

###Env

Converting a {key}|{field}|{value} env file to an avro file with a generated schema

Converting a multiple row type file to multiple avro files each having a generated schema
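One plausible reading of the {key}|{field}|{value} conversion is sketched below: lines that share a key are pivoted into one record, and the schema is the set of field names seen. Everything here is an assumption for illustration, including the class `EnvSchemaSketch`, the choice to type every field as a nullable string, and the expectation that field names are already valid Avro names. The schema is built as JSON text with the standard library only; the project itself may generate it differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EnvSchemaSketch {
    // Groups key|field|value lines into one field -> value map per key.
    static Map<String, Map<String, String>> pivot(List<String> lines) {
        Map<String, Map<String, String>> records = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\\|", 3);
            if (parts.length < 3) {
                continue; // skip malformed lines
            }
            records.computeIfAbsent(parts[0], k -> new LinkedHashMap<>())
                   .put(parts[1], parts[2]);
        }
        return records;
    }

    // Builds an Avro record schema as JSON text, with one nullable string
    // field per distinct field name, in first-seen order.
    static String schema(String recordName, Map<String, Map<String, String>> records) {
        Set<String> fields = new LinkedHashSet<>();
        for (Map<String, String> record : records.values()) {
            fields.addAll(record.keySet());
        }
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"record\",\"name\":\"").append(recordName)
            .append("\",\"fields\":[");
        String separator = "";
        for (String field : fields) {
            json.append(separator).append("{\"name\":\"").append(field)
                .append("\",\"type\":[\"null\",\"string\"],\"default\":null}");
            separator = ",";
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1|host|web01", "1|port|80", "2|host|db01");
        Map<String, Map<String, String>> records = pivot(lines);
        System.out.println(records);
        System.out.println(schema("Env", records));
    }
}
```

Making every field nullable lets records that lack a field, such as key 2 above with no port, still fit the generated schema.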

###NonSplittableGzip

Converts a non-splittable gzip file stored in HDFS to a sequence file with your choice of compression (snappy, gzip, or bzip2)

###NonSplittableZip

Converts a non-splittable zip file stored in HDFS to one or more sequence files with your choice of compression (snappy, gzip, or bzip2). There will be one sequence file for every file in the original zip file.
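The one-output-per-entry behavior can be illustrated with the JDK's `java.util.zip` classes. This is only a sketch: it reads the archive from an in-memory stream rather than HDFS, and it collects each entry's bytes in a map where the real tool would write a sequence file. The class `ZipSplitSketch` and its `split` and `zip` methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipSplitSketch {
    // Reads a zip stream sequentially and returns one byte array per file entry.
    static Map<String, byte[]> split(InputStream zipped) throws IOException {
        Map<String, byte[]> outputs = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int read;
                while ((read = zin.read(buffer)) != -1) {
                    content.write(buffer, 0, read);
                }
                outputs.put(entry.getName(), content.toByteArray());
            }
        }
        return outputs;
    }

    // Demo helper: builds a small zip archive in memory.
    static byte[] zip(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zout.putNextEntry(new ZipEntry(file.getKey()));
                zout.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("one.txt", "first file\n");
        files.put("two.txt", "second file\n");
        Map<String, byte[]> outputs = split(new ByteArrayInputStream(zip(files)));
        for (Map.Entry<String, byte[]> output : outputs.entrySet()) {
            System.out.println(output.getKey() + ": " + output.getValue().length + " bytes");
        }
    }
}
```

A zip archive has to be read front to back by a single reader, which is why converting each entry to its own splittable file is useful before running parallel jobs over the data.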
