Introduction to Machine Learning
BY
DR. ANUPAM GHOSH                             DATE: 17.07.2023
Email: anupam.ghosh@rediffmail.com
LinkedIn Profile:
https://www.linkedin.com/in/anupam-ghosh-1504273b/
Research Profile:
https://www.researchgate.net/profile/Anupam_Ghosh14
Association Rule Mining
Rule
X → Y
•   X implies Y
•   X is the cause and Y is the effect
•   X is the antecedent and Y is the consequent
•   Y depends on X
      The Apriori Algorithm: Basics
   The Apriori Algorithm is an influential algorithm for mining frequent itemsets for Boolean
    association rules.
   Key Concepts:
   •Frequent Itemsets: the sets of items that have minimum support
   •Apriori Property: any subset of a frequent itemset must also be frequent.
   •Join Operation: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.
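The join and prune steps described above can be sketched in Python (the function name and sample sets are illustrative, not from the slides):

```python
from itertools import combinations

def apriori_join(prev_frequent, k):
    """Join step: two frequent (k-1)-itemsets are merged into a candidate
    k-itemset when their first k-2 (sorted) items agree."""
    prev = sorted(sorted(s) for s in prev_frequent)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:k - 2] == b[:k - 2]:            # first k-2 items agree
                candidates.add(frozenset(a) | frozenset(b))
    # Prune step (Apriori property): every (k-1)-subset must itself be frequent
    prev_set = {frozenset(s) for s in prev_frequent}
    return {c for c in candidates
            if all(frozenset(sub) in prev_set for sub in combinations(c, k - 1))}

# L2 = {{a,b}, {a,c}, {b,c}} yields the single surviving candidate {a,b,c}
C3 = apriori_join([{"a", "b"}, {"a", "c"}, {"b", "c"}], 3)
```

Pairs that disagree in their first k-2 items never join, which is what keeps the candidate set small.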
      The Apriori Algorithm in a Nutshell
   Find the frequent itemsets: the sets of items that have minimum support
   –A subset of a frequent itemset must also be a frequent itemset
•i.e., if {A, B} is a frequent itemset, both {A} and {B} must also be frequent itemsets
–Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
•Use the frequent itemsets to generate association rules
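The loop in the nutshell above can be sketched as a minimal implementation over a toy transaction list (all names and the sample data are illustrative):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Iteratively find all itemsets with support count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        # support count = number of transactions containing the itemset
        return sum(1 for t in transactions if itemset <= t)

    # L1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]): count(frozenset([i])) for i in items}
    frequent = {s: c for s, c in frequent.items() if c >= min_support}

    result, k, current = dict(frequent), 2, set(frequent)
    while current:
        # join L(k-1) with itself, prune via the Apriori property, then count
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if count(c) >= min_support}
        result.update({c: count(c) for c in current})
        k += 1
    return result

# toy run: {bread, milk} is the only frequent 2-itemset at min_support = 2
freq = apriori([{"bread", "milk"}, {"bread", "milk", "egg"}, {"milk"}], 2)
```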
Hint:
C3:
{I1,I2,I3} = 2
{I1,I2,I5} = 2
{I1,I2,I4} = 1
{I1,I3,I5} = 1
{I1,I2,I3,I4} = 0
{I1,I2,I3,I5} = 1
{I1,I2,I4,I5} = 0
{I2,I3,I4} = 0
{I2,I3,I5} = 1
{I2,I4,I5} = 0
C4:
{I1,I2,I3,I5} = 1
L4: Null
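The counts in this hint are consistent with the classic nine-transaction example database from Han and Kamber's textbook. Assuming that database, each support can be verified with a direct scan:

```python
# Nine-transaction database (Han & Kamber); the counts above match it.
DB = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
      {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
      {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]

def support_count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in DB if set(itemset) <= t)

print(support_count({"I1", "I2", "I3"}))        # 2, as in the hint
print(support_count({"I1", "I2", "I5"}))        # 2
print(support_count({"I1", "I2", "I3", "I5"}))  # 1 < 2, so L4 is Null
```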
Conf: X -> Y, where Conf = Supp(X ∪ Y) / Supp(X)

Hint: rules from {I1,I2,I3}              Hint: rules from {I1,I2,I5}
Subsets: {I1}, {I2}, {I3}, {I1,I3},      Subsets: {I1}, {I2}, {I5}, {I1,I5},
{I1,I2}, {I2,I3}, {I1,I2,I3}             {I1,I2}, {I2,I5}, {I1,I2,I5}
{I1} -> {I2,I3} = 2/6 = 33.3%            {I1} -> {I2,I5} = 2/6 = 33.3%
{I2} -> {I1,I3} = 2/7 = 29%              {I2} -> {I1,I5} = 2/7 = 29%
{I3} -> {I1,I2} = 2/6 = 33.3%            {I5} -> {I1,I2} = 2/2 = 100%
{I1,I3} -> {I2} = 2/4 = 50%              {I1,I5} -> {I2} = 2/2 = 100%
{I1,I2} -> {I3} = 2/4 = 50%              {I1,I2} -> {I5} = 2/4 = 50%
{I2,I3} -> {I1} = 2/4 = 50%              {I2,I5} -> {I1} = 2/2 = 100%
{I1,I2,I3} -> null                       {I1,I2,I5} -> null
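Assuming the same nine-transaction database these counts come from, the confidence values can be reproduced; `rules` is an illustrative helper, not from the slides:

```python
from itertools import combinations

DB = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
      {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
      {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]

def supp(itemset):
    return sum(1 for t in DB if set(itemset) <= t)

def rules(frequent_itemset):
    """All rules X -> Y with X a proper non-empty subset of the itemset;
    conf(X -> Y) = supp(X ∪ Y) / supp(X)."""
    items = sorted(frequent_itemset)
    out = {}
    for r in range(1, len(items)):
        for X in combinations(items, r):
            Y = tuple(i for i in items if i not in X)
            out[(X, Y)] = supp(items) / supp(X)
    return out

conf = rules({"I1", "I2", "I5"})
print(conf[(("I5",), ("I1", "I2"))])  # 1.0 — {I5} -> {I1,I2} = 2/2 = 100%
```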
      Methods to Improve Apriori’s Efficiency
   •Hash-based itemset counting: a k-itemset whose corresponding hashing bucket count is below
    the threshold cannot be frequent.
   •Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in
    subsequent scans.
   •Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of
    the partitions of DB.
   •Sampling: mine on a subset of the given data with a lower support threshold, plus a method to
    determine completeness.
   •Dynamic itemset counting: add new candidate itemsets only when all of their subsets are
    estimated to be frequent.
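As one example, transaction reduction can be sketched as a filter applied between passes (hypothetical helper and data, shown only to illustrate the idea):

```python
def reduce_transactions(transactions, frequent_k):
    """Drop transactions containing no frequent k-itemset: they cannot
    contribute to any frequent (k+1)-itemset in later scans."""
    return [t for t in transactions
            if any(s <= t for s in frequent_k)]

db = [{"a", "b", "c"}, {"a", "d"}, {"c", "e"}]
L2 = [frozenset({"a", "b"})]
print(reduce_transactions(db, L2))  # only {'a','b','c'} survives
```

Each later scan then touches fewer transactions, which is where the speedup comes from.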
            Support count threshold will be tuned from 2 to ⌈N/2⌉
            N = no. of items; N = 5; 2 to ⌈5/2⌉ = ⌈2.5⌉ = 3

                                      Assume support count = 3

  C1   support       L1   support
{a}     5          {a}     5
{b}     7          {b}     7
{c}     5          {c}     5
{d}     9          {d}     9
{e}     6          {e}     6
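A quick sketch of the C1 → L1 filtering using the slide's counts (variable names are illustrative; the supports are copied from the table, not computed):

```python
import math

C1 = {"a": 5, "b": 7, "c": 5, "d": 9, "e": 6}
N = 5                                   # number of items
MIN_SUPPORT = math.ceil(N / 2)          # upper end of the tuning range: ⌈2.5⌉ = 3

L1 = {item: c for item, c in C1.items() if c >= MIN_SUPPORT}
print(L1 == C1)  # True: every 1-itemset meets the threshold, so L1 = C1
```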
       C2 = L1 ⋈ L1

  L1   support       C2     support       L2     support
{a}     5          {a,b}     3          {a,b}     3
{b}     7          {a,c}     2          {a,d}     4
{c}     5          {a,d}     4          {a,e}     4
{d}     9          {a,e}     4          {b,c}     3
{e}     6          {b,c}     3          {b,d}     6
                   {b,d}     6          {b,e}     4
                   {b,e}     4          {c,d}     4
                   {c,d}     4          {d,e}     6
                   {c,e}     2
                   {d,e}     6
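Sketching this step with the slide's counts: the join L1 ⋈ L1 yields all ten pairs, and keeping those with support ≥ 3 gives L2 (the support dictionary is copied from the table, not computed here):

```python
from itertools import combinations

L1 = ["a", "b", "c", "d", "e"]
C2_support = {("a", "b"): 3, ("a", "c"): 2, ("a", "d"): 4, ("a", "e"): 4,
              ("b", "c"): 3, ("b", "d"): 6, ("b", "e"): 4, ("c", "d"): 4,
              ("c", "e"): 2, ("d", "e"): 6}

C2 = list(combinations(L1, 2))  # L1 ⋈ L1: all C(5,2) = 10 candidate pairs
L2 = {pair: s for pair, s in C2_support.items() if s >= 3}
print(sorted(L2))  # {a,c} and {c,e} are pruned (support 2 < 3)
```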
  L2    support       C3       support       L3       support
{a,b}    3          {a,b,d}     2          {a,d,e}     4
{a,d}    4          {a,b,e}     2          {b,d,e}     4
{a,e}    4          {a,b,c}     1
{b,c}    3          {a,b,c,d}   0            C4       support
{b,d}    6          {a,b,d,e}   2          {a,b,d,e}   2
{b,e}    4          {a,d,e}     4
{c,d}    4          {a,c,d}     1            L4       support
{d,e}    6          {a,b,c,e}   0          NULL
                    {a,c,d,e}   1
                    {b,c,d}     2
                    {b,c,e}     1
                    {b,c,d,e}   1
                    {b,d,e}     4
                    {c,d,e}     2
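The Apriori property alone rules out the candidate {a,b,d,e}: its 3-subset {a,b,d} is not in L3, so the candidate can be pruned before any counting, and its actual support (2 < 3) confirms that L4 is empty. A sketch of that subset check:

```python
from itertools import combinations

L3 = {frozenset({"a", "d", "e"}), frozenset({"b", "d", "e"})}
candidate = frozenset({"a", "b", "d", "e"})

# Apriori property: every 3-subset of a frequent 4-itemset must be in L3
subsets = [frozenset(s) for s in combinations(sorted(candidate), 3)]
survives = all(s in L3 for s in subsets)
print(survives)  # False: {a,b,d} and {a,b,e} are infrequent, so L4 = NULL
```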