Skip to content

chewy6i/mapreduce

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

MapReduce tasks to process quotes

  • Checking in the 3 job types plus the experimental 4th type.
  • Testing folder contains quotes.txt, Authors.txt, words.txt and Authorwords.txt
  • Project built using Eclipse (Juno) and JDK 1.6.0_37 with Maven Note: Fix the JDK path to fix any pom.xml error complaining of missing tools.jar
  • Hadoop 1.1.1 used to build and test the code which is > 0.20
  • Uses MRv1

#1 AuthorQuotes.java this is the main class which invokes the various jobs

There are 3 job types By Author, By Word , By Author By Word

#2 AggregationByAuthor.java By Author : each author is the key and value is count mapper emits authorname, 1 combiner emits authorname, count reducer emits authorname, finalcount Output: Agatha Christie 1 Alan Bleasdale 1 Albert Camus 1 Albert Einstein 1

#3 AggregationByWord.java By Word : each word (length more than 4 with sanitization) is the keya and value is count mapper emits word, 1 combiner emits word, count reducer emits word, finalcount Output: Advice 1 After 1 Alas 1 America 1 Twenty Nine 2

#4 AggregationByAuthorWord.java By Author By Word: each author is the key and value is word list mapper emits authorname, word combiner emits authorname, wordlist {word:count.....} reducer emits authorname, wordlist {word:count.....}

Output: Agatha Christie more:1,have:1,older:1,woman:1,gets:1,archaeologist:1,best:1,husband:1,interested:1, Alan Bleasdale body:1,parallel:1,heart:1,exercise:1,study:1,resist:1,must:2,love:1,vigor:1,frigidity:1,combine:1,mind:1,keep:1,these:1, Albert Camus after:1,face:1,Alas:1,responsible:1,certain:1,every:1,

#5 AggregationAuthorWordEx.java Experimental class with custom key. goal is to use single parsing step to get both authors and words. difficulty in getting the authorword (3rd job type).Shows usage of custom keys.

#6 AuthorWordTuple.java custom key class to use single parsing step to get authors and words. Used by the AggregationAuthorWordEx.java (see #5 above).

=======================

About

mapreduce sample code

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages