2 Tws

The document explains TF-IDF (Term Frequency - Inverse Document Frequency) as a method to measure the importance of a word in a document based on its frequency and rarity across documents. It outlines the calculation steps for TF-IDF, providing a sample calculation for the word 'cat' resulting in a TF-IDF score of 0.04436. Additionally, the document mentions topics like text classification, topic modeling techniques, and web usage data mining without detailing them.

Uploaded by

zaincreative14
Copyright: © All Rights Reserved

Long Answers:

1. Define TF-IDF and demonstrate the procedure for calculating it with a sample example.

TF-IDF
● TF-IDF stands for Term Frequency - Inverse Document Frequency.
● It measures how important a word is to a document, based on how often the word
appears in that document and how rare it is across all documents.

TF (Term Frequency): How often a word appears in a document.

IDF (Inverse Document Frequency): How rare a word is across all documents.

Steps to Calculate TF-IDF

Step 1: Calculate Term Frequency (TF)

Formula:
TF = Number of times the word appears in the document ÷ Total number of words in the document

If the word "cat" appears 2 times in a document with 10 words, then:

TF = 2 ÷ 10 = 0.2
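Step 1 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name term_frequency and the 10-word sample sentence are assumptions made for the example, and words are split on whitespace only.

```python
def term_frequency(word, document_tokens):
    """Return the share of tokens in the document that equal `word`."""
    if not document_tokens:
        return 0.0  # avoid dividing by zero for an empty document
    return document_tokens.count(word) / len(document_tokens)


# Hypothetical 10-word document in which "cat" appears twice.
doc = "the cat sat on the mat near the other cat".split()

tf_cat = term_frequency("cat", doc)
print(tf_cat)  # 0.2
```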

Step 2: Calculate Inverse Document Frequency (IDF)


Formula:
IDF = log10 (Total number of documents ÷ Number of documents containing the word)

The logarithm is taken to base 10 here. If there are 5 documents in total, and the word "cat" appears in 3 of them, then:
IDF = log10 (5 ÷ 3) ≈ 0.2218
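Step 2 can be checked with a short Python sketch. It assumes a base-10 logarithm, since that is the base that reproduces the value 0.2218; the function name inverse_document_frequency is illustrative only.

```python
import math


def inverse_document_frequency(total_docs, docs_with_word):
    """Return log10(total_docs / docs_with_word)."""
    if docs_with_word <= 0:
        # The formula is undefined when no document contains the word.
        raise ValueError("word must appear in at least one document")
    return math.log10(total_docs / docs_with_word)


# 5 documents in total, "cat" appears in 3 of them.
idf_cat = inverse_document_frequency(5, 3)
print(round(idf_cat, 4))  # 0.2218
```

A word that appears in every document gives log10(1) = 0, which is why very common words contribute nothing to the final score.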

Step 3: Multiply TF and IDF to get TF-IDF


Formula:
TF-IDF = TF × IDF

TF-IDF = 0.2 × 0.2218 = 0.04436

Final Result:

TF-IDF for the word "cat" in this document = 0.04436
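The three steps can be combined into one small Python sketch. The five-document corpus below is hypothetical: it is constructed so that the first document has 10 words with "cat" appearing twice, and "cat" occurs in 3 of the 5 documents, matching the numbers in the worked example. A base-10 logarithm and whitespace tokenization are assumed.

```python
import math

# Hypothetical corpus: "cat" appears in 3 of the 5 documents.
corpus = [
    "the cat sat on the mat near the other cat",  # 10 words, "cat" twice
    "a cat chased the mouse",
    "dogs bark at the cat",
    "birds sing in the morning",
    "fish swim in the river",
]
tokenized = [text.split() for text in corpus]


def tf_idf(word, doc_tokens, all_docs):
    """TF-IDF of `word` in one document, relative to all documents."""
    tf = doc_tokens.count(word) / len(doc_tokens)
    docs_with_word = sum(1 for d in all_docs if word in d)
    if docs_with_word == 0:
        return 0.0  # the word never occurs, so it carries no weight
    idf = math.log10(len(all_docs) / docs_with_word)
    return tf * idf


score = tf_idf("cat", tokenized[0], tokenized)
print(round(score, 5))  # 0.04437
```

The printed value differs from 0.04436 in the last digit because the hand calculation rounds IDF to 0.2218 before multiplying, while the code keeps full precision.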

Conclusion
This method identifies the words that are important to one document relative to the
rest of the collection. A word that appears often in one document but in few others
gets a high TF-IDF score. A word that appears in every document has IDF = log(1) = 0,
so its TF-IDF score is 0.
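To see this behaviour on more than one word, the following Python sketch scores every word of a single document and ranks the results. The corpus, the function name tf_idf_scores, and the base-10 logarithm are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical corpus; "the" occurs in every document, "mat" in only one.
docs = [
    text.split()
    for text in [
        "the cat sat on the mat near the other cat",
        "a cat chased the mouse",
        "dogs bark at the cat",
        "birds sing in the morning",
        "fish swim in the river",
    ]
]


def tf_idf_scores(doc, corpus):
    """Map each word of `doc` to its TF-IDF score; `doc` must be in `corpus`."""
    counts = Counter(doc)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for d in corpus if word in d)
        scores[word] = (count / len(doc)) * math.log10(len(corpus) / doc_freq)
    return scores


scores = tf_idf_scores(docs[0], docs)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:6s} {score:.4f}")
```

In this sample, "the" is the most frequent word in the first document but scores 0 because it occurs in all five documents, while "mat", which occurs in only one document, outranks "cat".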

2. Give a detailed explanation of text classification and demonstrate the process with a few relevant examples.

3. What is topic modeling? Explain two topic modeling techniques.


4. Explain web usage data mining, state its types, and give a few applications of it.
