                              DIGITAL ASSIGNMENT 1
  PART A
  [Handwritten answer; the text is not recoverable from the scan.]
NAME : SAI SANDHYA S
REG NO: 19MIS0232
SLOT: B1 + TB1
                Part B – Question 1
1. Using a programming language that you are
familiar with, such as C++ or Java, implement three
frequent itemset mining algorithms: (1) Apriori
[AS94b], (2) FP-growth [HPY00], and (3) Eclat
[Zak00] (mining using the vertical data
format). Compare the performance of each
algorithm with various kinds of large data sets. Write
a report to analyze the situations (e.g., data size,
data distribution, minimal support threshold setting,
and pattern density) where one algorithm may
perform better than the others, and state why.
                                        Apriori algorithm
You are given the transaction data shown in the Table below from a fast-food restaurant. There are
9 distinct transactions (order: 1 – order: 9) and each transaction involves between 2 and 4 meal
items. There are a total of 5 meal items that are involved in the transactions. For simplicity we
assign the meal items short names (M1 – M5) rather than the full descriptive names.
For all of the parts below the minimum support is 2/9 and the minimum confidence is 7/9.
Apply the Apriori algorithm to the dataset of transactions and identify all frequent k-itemsets. Show
all of your work.
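As a quick sanity check before running Apriori, the sketch below counts how many of the nine orders contain each single meal item and compares that count with a minimum support count of 2 (assuming a minimum support of 2/9 over 9 orders means "at least 2 orders"). The transaction list is the one used in load_data_set() in this report; the names orders, min_count, and item_counts are illustrative and not part of the original program.

```python
from collections import Counter

# The nine orders, as listed in load_data_set() in this report.
orders = [['M1', 'M2', 'M5'], ['M2', 'M4'], ['M2', 'M3'], ['M1', 'M2', 'M4'], ['M1', 'M3'],
          ['M2', 'M3'], ['M1', 'M3'], ['M1', 'M2', 'M3', 'M5'], ['M1', 'M2']]

min_count = 2  # assumption: support 2/9 over 9 orders is read as a count of 2

# Each item appears at most once per order, so counting occurrences
# equals counting the orders that contain the item.
item_counts = Counter(item for order in orders for item in order)

for item in sorted(item_counts):
    status = "frequent" if item_counts[item] >= min_count else "infrequent"
    print(item, item_counts[item], status)
```

Under these assumptions every single item meets the threshold, so all five items go forward to the candidate 2-itemset step.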
Code:
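from itertools import combinations
def load_data_set():
    return [['M1', 'M2', 'M5'], ['M2', 'M4'], ['M2', 'M3'], ['M1', 'M2', 'M4'], ['M1', 'M3'],
            ['M2', 'M3'], ['M1', 'M3'], ['M1', 'M2', 'M3', 'M5'], ['M1', 'M2']]
def create_C1(data_set):
    # Candidate 1-itemsets: one frozenset per distinct item
    return {frozenset([item]) for t in data_set for item in t}
def create_Ck(Lk_prev, k):
    # Join frequent (k-1)-itemsets, then prune candidates that have an infrequent (k-1)-subset
    joined = {a | b for a in Lk_prev for b in Lk_prev if len(a | b) == k}
    return {c for c in joined if all(frozenset(s) in Lk_prev for s in combinations(c, k - 1))}
def generate_Lk_by_Ck(data_set, Ck, min_support, support_data):
    # Count every candidate and keep those that meet the minimum support
    Lk = set()
    item_count = {c: sum(1 for t in data_set if c.issubset(t)) for c in Ck}
    t_num = float(len(data_set))
    for item in item_count:
        if (item_count[item] / t_num) >= min_support:
            Lk.add(item)
            support_data[item] = item_count[item] / t_num
    return Lk
def generate_L(data_set, k, min_support):
    # Frequent itemsets of sizes 1..k, plus a dict mapping each one to its support
    support_data = {}
    Lk = generate_Lk_by_Ck(data_set, create_C1(data_set), min_support, support_data)
    L = [Lk]
    for i in range(2, k + 1):
        Lk = generate_Lk_by_Ck(data_set, create_Ck(Lk, i), min_support, support_data)
        if not Lk:
            break
        L.append(Lk)
    return L, support_data
def generate_big_rules(L, support_data, min_conf):
    # Rules (A, B, conf) where A and B split a frequent itemset and conf = sup(A u B) / sup(A)
    rules = []
    for Lk in L[1:]:
        for freq_set in Lk:
            for r in range(1, len(freq_set)):
                for a in map(frozenset, combinations(freq_set, r)):
                    conf = support_data[freq_set] / support_data[a]
                    if conf >= min_conf:
                        rules.append((a, freq_set - a, conf))
    return rules
if __name__ == "__main__":
    data_set = load_data_set()
    L, support_data = generate_L(data_set, k=3, min_support=0.222)
    big_rules_list = generate_big_rules(L, support_data, min_conf=0.555)  # the task statement gives 7/9
    for Lk in L:
        print("=" * 50)
        print("frequent " + str(len(list(Lk)[0])) + "-itemsets\t\tsupport")
        print("=" * 50)
        for freq_set in Lk:
            print(freq_set, support_data[freq_set])
    print()
    print("Big Rules")
    for item in big_rules_list:
        print(item[0], "=>", item[1], "conf: ", item[2])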
4/27/22, 11:35 PM                          Untitled4 - Jupyter Notebook
                                        FP-Growth algorithm
                        # dataset
                        dataset = pd.read_csv("Market_Basket_Optimisation.csv")
Out[18]:
   shrimp          almonds    avocado  vegetables mix  green grapes  whole weat flour  yams  cottage cheese  energy drink  tomato juice  low fat yogurt
0  burgers         meatballs  eggs     NaN             NaN           NaN               NaN   NaN             NaN           NaN           NaN
1  chutney         NaN        NaN      NaN             NaN           NaN               NaN   NaN             NaN           NaN           NaN
2  turkey          avocado    NaN      NaN             NaN           NaN               NaN   NaN             NaN           NaN           NaN
4  low fat yogurt  NaN        NaN      NaN             NaN           NaN               NaN   NaN             NaN           NaN           NaN
                        # Put 1 to Each Item For Making Countable Table, to be able to perform Group By
                        df["incident_count"] = 1
                        # Initial Visualizations
                        df_table.head(5).style.background_gradient(cmap='Blues')
Out[20]:
   items      incident_count
1  eggs       1348
2  spaghetti  1306
4  chocolate  1230
[Figure: "Top 50 items" treemap. Legible labels: mineral water, french fries, milk, burgers, low fat yogurt, turkey, cake, ground beef, chocolate, eggs, chicken, cookies, frozen vegetables.]
In [22]: # Transform Every Transaction to Separate List & Gather Them into Numpy Array
         transaction = []
         for i in range(dataset.shape[0]):
             transaction.append([str(dataset.values[i, j]) for j in range(dataset.shape[1])])
Out[22]:
    asparagus  almonds  antioxydant juice  asparagus  avocado  babies food  bacon  barbecue sauce  black tea  bluebe
                        # Extract Top 30
                        dataset = dataset.loc[:,first30]
                        # printing top 10
                        res.head(10)
Out[9]:
   support   itemsets
0  0.179733  (eggs)
1  0.087200  (burgers)
2  0.062533  (turkey)
5  0.129600  (milk)
9  0.050533  (soup)
Out[10]:
   antecedents      consequents      antecedent support  consequent support  support   confidence  lift      leverage  c
0  (mineral water)  (eggs)           0.238267            0.179733            0.050933  0.213766    1.189351  0.008109
1  (eggs)           (mineral water)  0.179733            0.238267            0.050933  0.283383    1.189351  0.008109
2  (mineral water)  (spaghetti)      0.238267            0.174133            0.059733  0.250699    1.439698  0.018243
3  (spaghetti)      (mineral water)  0.174133            0.238267            0.059733  0.343032    1.439698  0.018243
4  (mineral water)  (chocolate)      0.238267            0.163867            0.052667  0.221041    1.348907  0.013623
5  (chocolate)      (mineral water)  0.163867            0.238267            0.052667  0.321400    1.348907  0.013623
Out[11]:
   antecedents      consequents      antecedent support  consequent support  support   confidence  lift      leverage  c
3  (spaghetti)      (mineral water)  0.174133            0.238267            0.059733  0.343032    1.439698  0.018243
5  (chocolate)      (mineral water)  0.163867            0.238267            0.052667  0.321400    1.348907  0.013623
1  (eggs)           (mineral water)  0.179733            0.238267            0.050933  0.283383    1.189351  0.008109
2  (mineral water)  (spaghetti)      0.238267            0.174133            0.059733  0.250699    1.439698  0.018243
4  (mineral water)  (chocolate)      0.238267            0.163867            0.052667  0.221041    1.348907  0.013623
0  (mineral water)  (eggs)           0.238267            0.179733            0.050933  0.213766    1.189351  0.008109
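The confidence, lift, and leverage columns in these tables follow from the three support columns. As a worked check, the sketch below recomputes them for rule 3, (spaghetti) => (mineral water), using the standard definitions: confidence = support / antecedent support, lift = confidence / consequent support, and leverage = support - antecedent support * consequent support. The variable names are illustrative, and the inputs are the rounded values printed in the table, so the results agree only up to rounding.

```python
# Values copied from rule 3 in the table: (spaghetti) => (mineral water).
antecedent_support = 0.174133   # support of {spaghetti}
consequent_support = 0.238267   # support of {mineral water}
support = 0.059733              # support of {spaghetti, mineral water}

# Standard association-rule metrics derived from the three supports.
confidence = support / antecedent_support
lift = confidence / consequent_support
leverage = support - antecedent_support * consequent_support

print(f"confidence={confidence:.6f} lift={lift:.6f} leverage={leverage:.6f}")
```

A lift above 1 means the two items occur together more often than they would if they were independent, which is why the (spaghetti, mineral water) pair has the highest lift of the three pairs shown.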
                                        ECLAT algorithm
Collecting pyECLAT
Out[2]:
   0              1        2           3                 4             5                 6
0  shrimp         almonds  avocado     vegetables mix    green grapes  whole weat flour  yams
4  mineral water  milk     energy bar  whole wheat rice  green tea     NaN               NaN
<class 'pandas.core.frame.DataFrame'>
dtypes: object(7)
Out[4]:
      mashed potato  pickles  cereals  spaghetti  light cream  fresh tuna  cider  energy drink  olive oil  burgers  ...  milk  s
0     0              0        0        0          0            0           0      0             0          0        ...  0
1     0              0        0        0          0            0           0      0             0          1        ...  0
2     0              0        0        0          0            0           0      0             0          0        ...  0
3     0              0        0        0          0            0           0      0             0          0        ...  0
4     0              0        0        0          0            0           0      0             0          0        ...  1
...   ...            ...      ...      ...        ...          ...         ...    ...           ...        ...      ...  ...
2996  0              0        0        0          0            0           0      0             0          0        ...  0
2997  0              0        0        0          0            0           0      0             0          0        ...  0
2998  0              0        0        0          0            0           0      0             0          0        ...  0
2999  0              0        0        0          0            0           0      0             0          1        ...  0
3000  0              0        0        0          0            0           0      0             0          0        ...  0
pickles 17
cereals 54
spaghetti 549
light cream 50
...
ham 83
water spray 3
clothes accessories 16
Out[6]: 0 7
1 3
2 1
3 2
4 5
..
2996 1
2997 2
2998 3
2999 7
3000 5
3 spaghetti 549
84 eggs 532
55 chocolate 485
Combination 2 by 2
Combination 3 by 3
Combination 4 by 4
Combination 5 by 5
Combination 6 by 6
Combination 7 by 7
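The vertical data format behind Eclat can be illustrated without pyECLAT. The sketch below is a minimal, assumed implementation (the function name eclat and its parameters are hypothetical, not the pyECLAT API): each item is mapped to the set of transaction ids that contain it, and larger itemsets are found by intersecting those id sets depth-first. It is run on the nine orders from the Apriori section rather than on the market basket data.

```python
from collections import defaultdict

def eclat(transactions, min_count):
    """Return {frozenset(itemset): support count} using tidset intersection."""
    # Vertical format: item -> set of ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs, where tidset is already
        # restricted to the transactions that contain the prefix.
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the later items' tidsets to go one level deeper.
            deeper = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, deeper)

    extend(set(), sorted(tidsets.items()))
    return frequent

orders = [['M1', 'M2', 'M5'], ['M2', 'M4'], ['M2', 'M3'], ['M1', 'M2', 'M4'], ['M1', 'M3'],
          ['M2', 'M3'], ['M1', 'M3'], ['M1', 'M2', 'M3', 'M5'], ['M1', 'M2']]
result = eclat(orders, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Because the support of an itemset is just the size of an intersection, this approach never rescans the transaction list after the first pass, which is the main design difference from Apriori's repeated candidate counting.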
Conclusion:
It is concluded that the Apriori algorithm is the
fastest algorithm for large datasets and the
FP-Growth algorithm is the fastest algorithm for
small datasets.
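A conclusion of this kind can be checked by timing each implementation on the same data at several sizes and support thresholds. The sketch below is one assumed way to do that: time_miner is a hypothetical helper that accepts any function taking (transactions, min_support), and count_single_items is only a stand-in miner used to show the harness running. It is not one of the three algorithms and produces no evidence for or against the conclusion above.

```python
import time

def time_miner(miner, transactions, min_support, repeats=3):
    """Best wall-clock time in seconds over `repeats` calls of miner(transactions, min_support)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        miner(transactions, min_support)
        best = min(best, time.perf_counter() - start)
    return best

def count_single_items(transactions, min_support):
    # Stand-in miner (assumption for illustration): frequent 1-itemsets only.
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    n = len(transactions)
    return {item: c / n for item, c in counts.items() if c / n >= min_support}

orders = [['M1', 'M2', 'M5'], ['M2', 'M4'], ['M2', 'M3'], ['M1', 'M2', 'M4'], ['M1', 'M3'],
          ['M2', 'M3'], ['M1', 'M3'], ['M1', 'M2', 'M3', 'M5'], ['M1', 'M2']]

# Replicate the orders to vary the data size, and vary the support threshold.
for scale in (1, 100, 1000):
    for min_support in (0.2, 0.5):
        elapsed = time_miner(count_single_items, orders * scale, min_support)
        print(f"size={len(orders) * scale} min_support={min_support} time={elapsed:.6f}s")
```

Replacing count_single_items with the Apriori, FP-Growth, and ECLAT implementations, wrapped to the same two-argument signature, would give comparable timings across data size and minimum support.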