GitHub - dario-ramos/apriori: Naive implementation of the Apriori data mining algorithm in Java. Nevertheless, it can process millions of records in seconds.

GPL-2.0 license

Algorithm summary
-----------------

1. First pass: For each possible item value, count its support in the transaction set.
               Let this be C1, the first candidate set, in which all itemsets have size 1.
2. k-th pass (k = 2, 3, ...): While the previous candidate set Ck-1 is not empty:
    2.1. Generate Ck, the k-size candidate set, from Ck-1 using the aprioriGen function.
        2.1.1. Join step: For each pair in the Cartesian product Ck-1 x Ck-1,
                          if the left-hand itemset's last item is smaller than
                          the right-hand itemset's last item, create a new
                          itemset by appending the right-hand itemset's
                          last item to the left-hand itemset.
                          Add that itemset to the candidate set.
        2.1.2. Prune step: For each candidate itemset in Ck, generate all of its
                           (k-1)-subsets. If any of those subsets is not in
                           Ck-1, remove the candidate from Ck.
    2.2. Count the support for each candidate in Ck.
    2.3. Remove those candidates with support below the given minimum.
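
The steps above can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the repository's code: the names `AprioriSketch`, `run`, `aprioriGen` and `filter` are invented here, items are assumed to be `String`s compared lexicographically, itemsets are assumed to be kept as sorted lists (which is what gives "last item" in the join step its meaning), and the minimum support is assumed to be an absolute transaction count. As in the standard formulation of Apriori, it also drops infrequent single items after the first pass. The join follows step 2.1.1 literally, without the classic apriori-gen requirement that both itemsets share their first k-2 items, so different pairs can yield the same candidate; collecting candidates in a `Set` absorbs those duplicates, and the prune step removes any extra candidates the looser join lets through. `List.of` and `Set.of` need Java 9 or later.

```java
import java.util.*;

public class AprioriSketch {

    /** Returns every frequent itemset (a sorted list) mapped to its support count. */
    public static Map<List<String>, Integer> run(List<Set<String>> transactions, int minSupport) {
        Map<List<String>, Integer> frequent = new LinkedHashMap<>();

        // 1. First pass: count the support of every single item (C1).
        Map<List<String>, Integer> counts = new HashMap<>();
        for (Set<String> transaction : transactions) {
            for (String item : transaction) {
                counts.merge(List.of(item), 1, Integer::sum);
            }
        }
        // Standard Apriori also drops infrequent single items at this point.
        Set<List<String>> previous = filter(counts, minSupport, frequent);

        // 2. k-th pass: repeat while the previous candidate set is not empty.
        while (!previous.isEmpty()) {
            Set<List<String>> candidates = aprioriGen(previous);   // 2.1
            counts = new HashMap<>();
            for (List<String> candidate : candidates) {            // 2.2 count support
                int support = 0;
                for (Set<String> transaction : transactions) {
                    if (transaction.containsAll(candidate)) support++;
                }
                counts.put(candidate, support);
            }
            previous = filter(counts, minSupport, frequent);       // 2.3
        }
        return frequent;
    }

    /** 2.1: builds Ck from Ck-1 with the join and prune steps. */
    public static Set<List<String>> aprioriGen(Set<List<String>> previous) {
        Set<List<String>> candidates = new HashSet<>(); // a Set absorbs duplicate joins
        // 2.1.1 Join step over the Cartesian product Ck-1 x Ck-1.
        for (List<String> left : previous) {
            for (List<String> right : previous) {
                String leftLast = left.get(left.size() - 1);
                String rightLast = right.get(right.size() - 1);
                if (leftLast.compareTo(rightLast) < 0) {
                    List<String> joined = new ArrayList<>(left);
                    joined.add(rightLast); // stays sorted: left is sorted and rightLast > leftLast
                    candidates.add(joined);
                }
            }
        }
        // 2.1.2 Prune step: drop a candidate if any (k-1)-subset is missing from Ck-1.
        candidates.removeIf(candidate -> {
            for (int i = 0; i < candidate.size(); i++) {
                List<String> subset = new ArrayList<>(candidate);
                subset.remove(i); // remove by index, leaving a (k-1)-subset
                if (!previous.contains(subset)) return true;
            }
            return false;
        });
        return candidates;
    }

    /** 2.3: keeps itemsets with support >= minSupport and records them in result. */
    static Set<List<String>> filter(Map<List<String>, Integer> counts, int minSupport,
                                    Map<List<String>, Integer> result) {
        Set<List<String>> kept = new HashSet<>();
        for (Map.Entry<List<String>, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minSupport) {
                kept.add(entry.getKey());
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter", "milk"),
                Set.of("butter", "milk"),
                Set.of("bread", "butter"));
        System.out.println(run(transactions, 2));
    }
}
```

With a minimum support of 2, `main` should report the three single items (support 3 each) and the three pairs (support 2 each); the print order of the map is not specified. The only 3-item candidate, `[bread, butter, milk]`, is produced twice by the join, stored once, and then removed in step 2.3 because only one transaction contains it.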
    
Input file format
-----------------

- Each row is a transaction.
- Items are separated by commas.
- The last row does not end with a newline.
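
A minimal reader for this format could look like the following. `TransactionReader` and `read` are hypothetical names, not taken from the repository. The sketch assumes that whitespace around items can be trimmed, that blank lines and empty fields can be skipped, and that an item repeated within one row counts once; the format description does not specify any of these. `BufferedReader.readLine()` returns the final line whether or not it ends with a newline, so the missing trailing newline needs no special handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TransactionReader {

    /** Reads one transaction per line; items within a line are comma-separated. */
    public static List<Set<String>> read(Reader in) throws IOException {
        List<Set<String>> transactions = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        // readLine() also returns a final line that has no trailing newline.
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            // LinkedHashSet keeps file order and collapses repeated items (assumption).
            Set<String> items = new LinkedHashSet<>();
            for (String item : line.split(",")) {
                String trimmed = item.trim(); // assumption: surrounding spaces are not significant
                if (!trimmed.isEmpty()) {     // skip empty fields such as in "a,,b"
                    items.add(trimmed);
                }
            }
            transactions.add(items);
        }
        return transactions;
    }

    public static void main(String[] args) throws IOException {
        // Three transactions; the last row has no trailing newline.
        String data = "bread,milk\nbread,butter,milk\nbutter,milk";
        System.out.println(read(new StringReader(data)));
        // prints [[bread, milk], [bread, butter, milk], [butter, milk]]
    }
}
```

Returning a `List<Set<String>>` lets each transaction answer `containsAll` directly during support counting.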