
DIGITAL ASSIGNMENT 1
PART A

[Part A is a scanned handwritten answer; the OCR text is too garbled to recover.]
NAME : SAI SANDHYA S
REG NO: 19MIS0232
SLOT: B1 + TB1
Part B – Question 1
1. Using a programming language that you are familiar with, such as C++ or Java, implement three frequent itemset mining algorithms: (1) Apriori [AS94b], (2) FP-growth [HPY00], and (3) Eclat [Zak00] (mining using the vertical data format). Compare the performance of each algorithm on various kinds of large data sets. Write a report analyzing the situations (e.g., data size, data distribution, minimal support threshold setting, and pattern density) where one algorithm may perform better than the others, and state why.
Apriori algorithm
Name: Sai Sandhya S
Reg no: 19MIS0232
You are given the transaction data shown in the table below from a fast-food restaurant. There are 9 distinct transactions (order 1 – order 9) and each transaction involves between 2 and 4 meal items. There are a total of 5 meal items involved in the transactions. For simplicity we assign the meal items short names (M1 – M5) rather than their full descriptive names.

For all of the parts below, the minimum support is 2/9 and the minimum confidence is 7/9.
Apply the Apriori algorithm to the dataset of transactions and identify all frequent k-itemsets. Show all of your work.
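Before running the full program, a single support or confidence value can be checked by hand. For example, {M1, M2} appears in transactions 1, 4, 8 and 9, so support({M1, M2}) = 4/9, and confidence(M1 => M2) = support({M1, M2}) / support({M1}) = (4/9) / (6/9) = 4/6. A minimal sketch of that check (plain Python; the support() helper is illustrative, and load_data_set() is the function defined in the code below):

def support(itemset, transactions):
    # fraction of transactions containing every item in the itemset
    return sum(1 for t in transactions if set(itemset) <= set(t)) / len(transactions)

data = load_data_set()
print(support(['M1', 'M2'], data))                          # 4/9 ≈ 0.444
print(support(['M1', 'M2'], data) / support(['M1'], data))  # confidence(M1 => M2) = 4/6 ≈ 0.667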
Code:

def load_data_set():
    # 9 transactions over 5 meal items (M1–M5)
    data_set = [['M1', 'M2', 'M5'], ['M2', 'M4'], ['M2', 'M3'],
                ['M1', 'M2', 'M4'], ['M1', 'M3'], ['M2', 'M3'],
                ['M1', 'M3'], ['M1', 'M2', 'M3', 'M5'], ['M1', 'M2']]
    return data_set

def create_C1(data_set):
    # candidate 1-itemsets: every distinct item, as a frozenset
    C1 = set()
    for t in data_set:
        for item in t:
            C1.add(frozenset([item]))
    return C1

def is_apriori(Ck_item, Lksub1):
    # Apriori property: every (k-1)-subset of a frequent k-itemset
    # must itself be frequent
    for item in Ck_item:
        sub_Ck = Ck_item - frozenset([item])
        if sub_Ck not in Lksub1:
            return False
    return True

def create_Ck(Lksub1, k):
    # join step: merge pairs of frequent (k-1)-itemsets that share
    # their first k-2 items, then prune with the Apriori property
    Ck = set()
    len_Lksub1 = len(Lksub1)
    list_Lksub1 = list(Lksub1)
    for i in range(len_Lksub1):
        for j in range(i + 1, len_Lksub1):
            l1 = sorted(list_Lksub1[i])
            l2 = sorted(list_Lksub1[j])
            if l1[0:k-2] == l2[0:k-2]:
                Ck_item = list_Lksub1[i] | list_Lksub1[j]
                # pruning
                if is_apriori(Ck_item, Lksub1):
                    Ck.add(Ck_item)
    return Ck

def generate_Lk_by_Ck(data_set, Ck, min_support, support_data):
    # scan the transactions, count each candidate, and keep those
    # meeting the minimum support; record supports in support_data
    Lk = set()
    item_count = {}
    for t in data_set:
        for item in Ck:
            if item.issubset(t):
                item_count[item] = item_count.get(item, 0) + 1
    t_num = float(len(data_set))
    for item in item_count:
        if (item_count[item] / t_num) >= min_support:
            Lk.add(item)
            support_data[item] = item_count[item] / t_num
    return Lk

def generate_L(data_set, k, min_support):
    # iteratively build L1, L2, ..., Lk
    support_data = {}
    C1 = create_C1(data_set)
    L1 = generate_Lk_by_Ck(data_set, C1, min_support, support_data)
    Lksub1 = L1.copy()
    L = [Lksub1]
    for i in range(2, k + 1):
        Ci = create_Ck(Lksub1, i)
        Li = generate_Lk_by_Ck(data_set, Ci, min_support, support_data)
        Lksub1 = Li.copy()
        L.append(Lksub1)
    return L, support_data

def generate_big_rules(L, support_data, min_conf):
    # derive association rules A => B with confidence >= min_conf
    big_rule_list = []
    sub_set_list = []
    for i in range(0, len(L)):
        for freq_set in L[i]:
            for sub_set in sub_set_list:
                if sub_set.issubset(freq_set):
                    conf = support_data[freq_set] / support_data[freq_set - sub_set]
                    big_rule = (freq_set - sub_set, sub_set, conf)
                    if conf >= min_conf and big_rule not in big_rule_list:
                        big_rule_list.append(big_rule)
            sub_set_list.append(freq_set)
    return big_rule_list

if __name__ == "__main__":
    data_set = load_data_set()
    L, support_data = generate_L(data_set, k=3, min_support=0.222)
    big_rules_list = generate_big_rules(L, support_data, min_conf=0.555)

    for Lk in L:
        print("=" * 50)
        print("frequent " + str(len(list(Lk)[0])) + "-itemsets\t\tsupport")
        print("=" * 50)
        for freq_set in Lk:
            print(freq_set, support_data[freq_set])

    print()
    print("Big Rules")
    for item in big_rules_list:
        print(item[0], "=>", item[1], "conf: ", item[2])
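Hand-checking against the 9 transactions (a derivation, not captured program output), the run should report, up to ordering:

frequent 1-itemsets: {M2} 7/9 ≈ 0.778, {M1} 6/9 ≈ 0.667, {M3} 5/9 ≈ 0.556, {M4} 2/9 ≈ 0.222, {M5} 2/9 ≈ 0.222
frequent 2-itemsets: {M1, M2} 4/9, {M1, M3} 3/9, {M2, M3} 3/9, {M1, M5} 2/9, {M2, M4} 2/9, {M2, M5} 2/9
frequent 3-itemsets: {M1, M2, M5} 2/9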
FP-Growth algorithm
In [16]: %pip install pandas
%pip install numpy
%pip install plotly
%pip install mlxtend
Requirement already satisfied: pandas (1.3.4), numpy (1.20.3), plotly (5.6.0), mlxtend (0.19.0) and their dependencies in c:\users\lenovo\anaconda3\lib\site-packages
Note: you may need to restart the kernel to use updated packages.

In [17]: # importing module
import pandas as pd

# dataset
dataset = pd.read_csv("Market_Basket_Optimisation.csv")

# printing the shape of the dataset
dataset.shape

Out[17]: (7500, 20)

In [18]: # printing the columns and a few rows using head
dataset.head()

Out[18]: (the CSV has no header row, so the first transaction is read as the column labels)

   shrimp          almonds    avocado     vegetables mix    green grapes  whole weat flour  yams  ...  low fat yogurt
0  burgers         meatballs  eggs        NaN               NaN           NaN               NaN   ...  NaN
1  chutney         NaN        NaN         NaN               NaN           NaN               NaN   ...  NaN
2  turkey          avocado    NaN         NaN               NaN           NaN               NaN   ...  NaN
3  mineral water   milk       energy bar  whole wheat rice  green tea     NaN               NaN   ...  NaN
4  low fat yogurt  NaN        NaN         NaN               NaN           NaN               NaN   ...  NaN

In [19]: # importing module
import numpy as np

# gather all items of all transactions into one flat list
transaction = []
for i in range(0, dataset.shape[0]):
    for j in range(0, dataset.shape[1]):
        transaction.append(dataset.values[i, j])

# converting to numpy array
transaction = np.array(transaction)
print(transaction)

['burgers' 'meatballs' 'eggs' ... 'nan' 'nan' 'nan']


In [20]: # transform the items into a pandas DataFrame
df = pd.DataFrame(transaction, columns=["items"])

# put 1 for each item to make a countable table for group-by
df["incident_count"] = 1

# delete NaN items from the dataset
indexNames = df[df['items'] == "nan"].index
df.drop(indexNames, inplace=True)

# making a new appropriate pandas DataFrame for visualizations
df_table = df.groupby("items").sum().sort_values("incident_count", ascending=False).reset_index()

# initial visualizations
df_table.head(5).style.background_gradient(cmap='Blues')

Out[20]:
   items          incident_count
0  mineral water  1787
1  eggs           1348
2  spaghetti      1306
3  french fries   1282
4  chocolate      1230


In [21]: # importing required module
import plotly.express as px

# to have a same origin
df_table["all"] = "Top 50 items"

# creating tree map using plotly
fig = px.treemap(df_table.head(50), path=['all', "items"], values='incident_count',
                 color=df_table["incident_count"].head(50), hover_data=['items'],
                 color_continuous_scale='Blues',
                 )
# plotting the treemap
fig.show()

[Treemap of the top 50 items, tile size proportional to incident count; the largest tiles are mineral water, eggs, spaghetti, french fries and chocolate.]


In [22]: # transform every transaction into a separate list and gather them into a numpy array
transaction = []
for i in range(dataset.shape[0]):
    transaction.append([str(dataset.values[i, j]) for j in range(dataset.shape[1])])

# creating the numpy array of the transactions
transaction = np.array(transaction)

# importing the required module
from mlxtend.preprocessing import TransactionEncoder

# initializing the TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(transaction).transform(transaction)
dataset = pd.DataFrame(te_ary, columns=te.columns_)

# dataset after encoding
dataset.head()

Out[22]:
   asparagus  almonds  antioxydant juice  asparagus  avocado  babies food  bacon  barbecue sauce  black tea  ...
0  False      False    False              False      False    False        False  False           False      ...
1  False      False    False              False      False    False        False  False           False      ...
2  False      False    False              False      True     False        False  False           False      ...
3  False      False    False              False      False    False        False  False           False      ...
4  False      False    False              False      False    False        False  False           False      ...

5 rows × 121 columns
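TransactionEncoder one-hot encodes a list of transactions into a boolean matrix with one column per distinct item, which is the input format mlxtend's fpgrowth expects. A minimal self-contained sketch (toy data, illustrative names):

from mlxtend.preprocessing import TransactionEncoder
import pandas as pd

toy = [['milk', 'eggs'], ['eggs', 'bread'], ['milk']]
te = TransactionEncoder()
encoded = pd.DataFrame(te.fit(toy).transform(toy), columns=te.columns_)
print(encoded)
#    bread   eggs   milk
# 0  False   True   True
# 1   True   True  False
# 2  False  False   True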

In [8]: # select top 30 items
first30 = df_table["items"].head(30).values

# extract the top 30 columns
dataset = dataset.loc[:, first30]

# shape of the dataset
dataset.shape

Out[8]: (7500, 30)


In [9]: # importing libraries
from mlxtend.frequent_patterns import fpgrowth

# running the fpgrowth algorithm
res = fpgrowth(dataset, min_support=0.05, use_colnames=True)

# printing top 10
res.head(10)

Out[9]:
   support   itemsets
0  0.179733  (eggs)
1  0.087200  (burgers)
2  0.062533  (turkey)
3  0.238267  (mineral water)
4  0.132000  (green tea)
5  0.129600  (milk)
6  0.058533  (whole wheat rice)
7  0.076400  (low fat yogurt)
8  0.170933  (french fries)
9  0.050533  (soup)
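These supports are consistent with the raw counts computed earlier: mineral water appeared in 1787 of the 7500 transactions, and 1787 / 7500 = 0.238267; likewise eggs, 1348 / 7500 = 0.179733.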


In [10]: # importing required module
from mlxtend.frequent_patterns import association_rules

# creating association rules
res = association_rules(res, metric="lift", min_threshold=1)

# printing association rules
res

Out[10]: (the conviction column was truncated in the export)
   antecedents      consequents      antecedent support  consequent support  support   confidence  lift      leverage  ...
0  (mineral water)  (eggs)           0.238267            0.179733            0.050933  0.213766    1.189351  0.008109  ...
1  (eggs)           (mineral water)  0.179733            0.238267            0.050933  0.283383    1.189351  0.008109  ...
2  (mineral water)  (spaghetti)      0.238267            0.174133            0.059733  0.250699    1.439698  0.018243  ...
3  (spaghetti)      (mineral water)  0.174133            0.238267            0.059733  0.343032    1.439698  0.018243  ...
4  (mineral water)  (chocolate)      0.238267            0.163867            0.052667  0.221041    1.348907  0.013623  ...
5  (chocolate)      (mineral water)  0.163867            0.238267            0.052667  0.321400    1.348907  0.013623  ...

In [11]: # sort values based on confidence
res.sort_values("confidence", ascending=False)

Out[11]:
   antecedents      consequents      antecedent support  consequent support  support   confidence  lift      leverage  ...
3  (spaghetti)      (mineral water)  0.174133            0.238267            0.059733  0.343032    1.439698  0.018243  ...
5  (chocolate)      (mineral water)  0.163867            0.238267            0.052667  0.321400    1.348907  0.013623  ...
1  (eggs)           (mineral water)  0.179733            0.238267            0.050933  0.283383    1.189351  0.008109  ...
2  (mineral water)  (spaghetti)      0.238267            0.174133            0.059733  0.250699    1.439698  0.018243  ...
4  (mineral water)  (chocolate)      0.238267            0.163867            0.052667  0.221041    1.348907  0.013623  ...
0  (mineral water)  (eggs)           0.238267            0.179733            0.050933  0.213766    1.189351  0.008109  ...
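Note that lift is symmetric, lift(A => B) = support(A ∪ B) / (support(A) × support(B)), so both directions of a rule share the same value: for the spaghetti/mineral water pair, 0.059733 / (0.174133 × 0.238267) ≈ 1.4397, matching rows 2 and 3. Confidence, by contrast, is directional, which is why the sorted table interleaves the two directions.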

Eclat algorithm
In [1]: %pip install pyECLAT
%pip install numpy
%pip install pandas
%pip install plotly
Collecting pyECLAT
  Downloading pyECLAT-1.0.2-py3-none-any.whl (6.3 kB)
Requirement already satisfied: pandas (1.3.5), numpy (1.21.6), tqdm (4.64.0), plotly (5.5.0) and their dependencies in /usr/local/lib/python3.7/dist-packages
Installing collected packages: pyECLAT
Successfully installed pyECLAT-1.0.2

In [2]: # importing dataset (Example1 and Example2 are datasets bundled with pyECLAT)
from pyECLAT import Example2

# storing the dataset in a variable
dataset = Example2().get()

# printing the dataset
dataset.head()

Out[2]:
   0              1          2           3                 4             5                 6
0  shrimp         almonds    avocado     vegetables mix    green grapes  whole weat flour  yams
1  burgers        meatballs  eggs        NaN               NaN           NaN               NaN
2  chutney        NaN        NaN         NaN               NaN           NaN               NaN
3  turkey         avocado    NaN         NaN               NaN           NaN               NaN
4  mineral water  milk       energy bar  whole wheat rice  green tea     NaN               NaN

In [3]: # printing the info
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3001 entries, 0 to 3000
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       3001 non-null   object
 1   1       2315 non-null   object
 2   2       1774 non-null   object
 3   3       1374 non-null   object
 4   4       1048 non-null   object
 5   5       775 non-null    object
 6   6       581 non-null    object
dtypes: object(7)
memory usage: 164.2+ KB


In [4]: # importing the ECLAT module
from pyECLAT import ECLAT

# loading the transactions DataFrame into the ECLAT class
eclat = ECLAT(data=dataset)

# DataFrame of binary values
eclat.df_bin

Out[4]:
      mashed potato  pickles  cereals  spaghetti  ...  burgers  ...  milk  ...
0     0              0        0        0          ...  0        ...  0     ...
1     0              0        0        0          ...  1        ...  0     ...
2     0              0        0        0          ...  0        ...  0     ...
3     0              0        0        0          ...  0        ...  0     ...
4     0              0        0        0          ...  0        ...  1     ...
...   ...            ...      ...      ...        ...  ...      ...  ...   ...
2996  0              0        0        0          ...  0        ...  0     ...
2997  0              0        0        0          ...  0        ...  0     ...
2998  0              0        0        0          ...  0        ...  0     ...
2999  0              0        0        0          ...  1        ...  0     ...
3000  0              0        0        0          ...  0        ...  0     ...

3001 rows × 119 columns

In [5]: # count occurrences of each item across all transactions (column sums)
items_total = eclat.df_bin.astype(int).sum(axis=0)
items_total

Out[5]:
mashed potato            10
pickles                  17
cereals                  54
spaghetti               549
light cream              50
                       ...
low fat yogurt          170
ham                      83
water spray               3
clothes accessories      16
extra dark chocolate     31
Length: 119, dtype: int64

localhost:8889/notebooks/Untitled2.ipynb 3/5
4/27/22, 11:35 PM Untitled2 - Jupyter Notebook

In [6]: # count items in each transaction (row sums)
items_per_transaction = eclat.df_bin.astype(int).sum(axis=1)
items_per_transaction

Out[6]:
0       7
1       3
2       1
3       2
4       5
       ..
2996    1
2997    2
2998    3
2999    7
3000    5
Length: 3001, dtype: int64

In [7]: import pandas as pd

# loading the per-item counts into a DataFrame
df = pd.DataFrame({'items': items_total.index, 'transactions': items_total.values})

# cloning the pandas DataFrame for visualization purposes
df_table = df.sort_values("transactions", ascending=False)

# top 5 most popular products/items
df_table.head(5).style.background_gradient(cmap='Blues')

Out[7]:
    items          transactions
96  mineral water  711
3   spaghetti      549
84  eggs           532
55  chocolate      485
74  french fries   463

In [8]: # importing required module
import plotly.express as px

# to have a same origin
df_table["all"] = "Tree Map"

# creating tree map using plotly
fig = px.treemap(df_table.head(50), path=['all', "items"], values='transactions',
                 color=df_table["transactions"].head(50), hover_data=['items'],
                 color_continuous_scale='Blues',
                 )
# plotting the treemap
fig.show()


In [9]: # an item should appear in at least 5% of transactions
min_support = 5/100

# start from combinations of at least 2 items
min_combination = 2

# up to the maximum number of items per transaction
max_combination = max(items_per_transaction)

rule_indices, rule_supports = eclat.fit(min_support=min_support,
                                        min_combination=min_combination,
                                        max_combination=max_combination,
                                        separator=' & ',
                                        verbose=True)

Combination 2 by 2
253it [00:02, 121.98it/s]
Combination 3 by 3
1771it [00:25, 70.55it/s]
Combination 4 by 4
8855it [01:17, 113.95it/s]
Combination 5 by 5
33649it [05:16, 106.44it/s]
Combination 6 by 6
100947it [16:08, 104.18it/s]
Combination 7 by 7
245157it [41:05, 99.45it/s]
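The iteration counts above are exactly the binomial coefficients C(23, k), which suggests 23 items survived the 5% support filter and that pyECLAT enumerates every k-combination of them; this combinatorial blow-up explains the 41-minute run for k = 7. A quick check (assuming Python 3.8+ for math.comb):

from math import comb

# candidate combinations of 23 frequent items, for k = 2..7
print([comb(23, k) for k in range(2, 8)])
# [253, 1771, 8855, 33649, 100947, 245157]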

In [10]: import pandas as pd

# collecting the supports of the mined itemsets into a DataFrame
result = pd.DataFrame(rule_supports.items(), columns=['Item', 'Support'])
result.sort_values(by=['Support'], ascending=False)

Out[10]:
   Item                       Support
0  spaghetti & mineral water  0.060646
Conclusion:
In these experiments, the Apriori algorithm was the fastest on the large dataset, while the FP-Growth algorithm was the fastest on the small dataset.
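To substantiate comparisons like this, the runtimes can be measured directly on the same encoded data. A minimal sketch using mlxtend's apriori and fpgrowth (timings will vary by machine; this is an illustration, not the benchmark behind the conclusion above):

import time
from mlxtend.frequent_patterns import apriori, fpgrowth

# 'dataset' is the one-hot encoded DataFrame built earlier with TransactionEncoder
for name, algo in [("apriori", apriori), ("fpgrowth", fpgrowth)]:
    start = time.perf_counter()
    algo(dataset, min_support=0.05, use_colnames=True)
    print(name, "took", round(time.perf_counter() - start, 3), "seconds")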
