Text Classification with ML Algorithms

The document compares four machine learning classifiers (Random Forest, XGBoost, linear SVM, and Multinomial Naive Bayes) on a small text classification problem. Each script loads labelled text from an Excel or CSV file, vectorizes it with TF-IDF, trains the classifier, and reports the accuracy and a classification report on a held-out 20% test split. Random Forest is run twice, once on each input file.

Uploaded by

surya


RANDOM FOREST ALGORITHM

ALL ALGORITHMS USING THE 21 DATASET (text3) DOCUMENT

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score, classification_report

# Read data from Excel file

file_path = '/content/ai.xlsx' # Replace with your actual file path
df = pd.read_excel(file_path)

# Assuming your Excel file has columns 'text' and 'label' for text data and labels
X = df['text'].astype(str)
y = df['label']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Vectorize the text data using TF-IDF

vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Train a Random Forest classifier

classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train_tfidf, y_train)

# Make predictions on the test set

y_pred = classifier.predict(X_test_tfidf)

# Evaluate the model and print the results, including the accuracy value

accuracy = accuracy_score(y_test, y_pred)  # Compute accuracy before printing it
print(f'Accuracy: {accuracy:.4f}')  # Display accuracy with 4 decimal places
print('\nClassification Report:')
# The classification report is computed from the same predictions
print(classification_report(y_test, y_pred))

Accuracy: 1.0000

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         2

    accuracy                           1.00         4
   macro avg       1.00      1.00      1.00         4
weighted avg       1.00      1.00      1.00         4

XGBOOST CLASSIFIER ALGORITHM

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, classification_report

# Read data from Excel file

file_path = '/content/ai.xlsx' # Replace with your actual file path
df = pd.read_excel(file_path)

# Assuming your Excel file has columns 'text' and 'label' for text data and labels
X = df['text'].astype(str)
y = df['label']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Vectorize the text data using TF-IDF

vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Train an XGBoost classifier

classifier = XGBClassifier()
classifier.fit(X_train_tfidf, y_train)

# Make predictions on the test set

y_pred = classifier.predict(X_test_tfidf)

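# Evaluate the model and print the results, including the accuracy value

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')  # Display accuracy with 4 decimal places
print('\nClassification Report:')
print(classification_report(y_test, y_pred))

Accuracy: 0.5000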

Classification Report:
              precision    recall  f1-score   support

           0       0.50      0.50      0.50         2
           1       0.50      0.50      0.50         2

    accuracy                           0.50         4
   macro avg       0.50      0.50      0.50         4
weighted avg       0.50      0.50      0.50         4

SVM CLASSIFIER ALGORITHM

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report

# Read data from Excel file

file_path = '/content/ai.xlsx' # Replace with your actual file path
df = pd.read_excel(file_path)

# Assuming your Excel file has columns 'text' and 'label' for text data and labels
X = df['text'].astype(str)
y = df['label']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Vectorize the text data using TF-IDF

vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Train a linear SVM classifier

classifier = LinearSVC()
classifier.fit(X_train_tfidf, y_train)

# Make predictions on the test set

y_pred = classifier.predict(X_test_tfidf)

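# Evaluate the model and print the results, including the accuracy value

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')  # Display accuracy with 4 decimal places
print('\nClassification Report:')
print(classification_report(y_test, y_pred))

Accuracy: 1.0000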

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         2

    accuracy                           1.00         4
   macro avg       1.00      1.00      1.00         4
weighted avg       1.00      1.00      1.00         4

NAIVE BAYES ALGORITHM

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Read data from CSV file

file_path = '/content/text3.csv' # Replace with your actual file path
df = pd.read_csv(file_path)

# Assuming your CSV file has columns 'text' and 'label' for text data and labels
X = df['text'].astype(str)
y = df['label']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Vectorize the text data using TF-IDF

vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Train a Multinomial Naive Bayes classifier

classifier = MultinomialNB()
classifier.fit(X_train_tfidf, y_train)

# Make predictions on the test set

y_pred = classifier.predict(X_test_tfidf)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

# Print the results

print(f'Accuracy: {accuracy}')
print('\nClassification Report:')
print(report)

Accuracy: 0.75

Classification Report:
              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       1.00      0.50      0.67         2

    accuracy                           0.75         4
   macro avg       0.83      0.75      0.73         4
weighted avg       0.83      0.75      0.73         4
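The figures in this report can be reproduced by hand from the four test predictions. The sketch below is a minimal illustration, not part of the original scripts: the `y_true` and `y_pred` lists are hypothetical values chosen only because they are consistent with the report above (two samples per class, one class-1 sample predicted as class 0), and `per_class_metrics` is an illustrative helper name.

```python
# Hypothetical labels, chosen to be consistent with the report above.
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 1]


def per_class_metrics(y_true, y_pred, cls):
    """Return (precision, recall, f1) for one class from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
    fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
    fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Accuracy is the fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

metrics = {cls: per_class_metrics(y_true, y_pred, cls) for cls in (0, 1)}

# The macro average is the unweighted mean of the per-class values.
macro = tuple(sum(m[i] for m in metrics.values()) / len(metrics) for i in range(3))

print(f"accuracy: {accuracy:.2f}")
for cls, (p, r, f) in metrics.items():
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("macro avg: " + " ".join(f"{v:.2f}" for v in macro))
```

Because both classes have the same support here, the weighted average equals the macro average.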

RANDOM FOREST CLASSIFIER

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Read data from CSV file

file_path = '/content/text3.csv' # Replace with your actual file path
df = pd.read_csv(file_path)

# Assuming your CSV file has columns 'text' and 'label' for text data and labels
X = df['text'].astype(str)
y = df['label']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Vectorize the text data using TF-IDF

vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Train a Random Forest classifier

classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train_tfidf, y_train)

# Make predictions on the test set

y_pred = classifier.predict(X_test_tfidf)

# Evaluate the model and print the results, including the accuracy value

accuracy = accuracy_score(y_test, y_pred)  # Compute accuracy before printing it
print(f'Accuracy: {accuracy:.1f}')  # Display accuracy with 1 decimal place
print('\nClassification Report:')
# The classification report is computed from the same predictions
print(classification_report(y_test, y_pred))

Accuracy: 1.0

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         2

    accuracy                           1.00         4
   macro avg       1.00      1.00      1.00         4
weighted avg       1.00      1.00      1.00         4
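The scripts above repeat the same load, split, vectorize, train, and evaluate steps for each classifier. A minimal sketch of running them in one loop on a shared split is given below. Several things in it are assumptions and not from the original: the inline `texts` and `labels` are made-up toy data standing in for the 'text' and 'label' columns, `compare_classifiers` is an illustrative helper name, and `stratify=labels` is added so that a very small dataset keeps both classes in each split. XGBoost is left out to keep the sketch dependent on scikit-learn only; an `XGBClassifier()` entry could be added to the `models` dictionary in the same way.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the 'text' and 'label' columns.
texts = [
    "great product works well", "excellent quality very happy",
    "love it highly recommend", "fantastic value great service",
    "wonderful experience will buy again", "terrible product broke fast",
    "awful quality very disappointed", "hate it do not recommend",
    "poor value bad service", "horrible experience never again",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def compare_classifiers(texts, labels, test_size=0.2, seed=42):
    """Fit each classifier on the same TF-IDF features; return accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed,
        stratify=labels,  # assumption: keep both classes in each split
    )

    # Fit the vectorizer on the training text only, then reuse it for the test text.
    vectorizer = TfidfVectorizer(max_features=5000)
    X_train_tfidf = vectorizer.fit_transform(X_train)
    X_test_tfidf = vectorizer.transform(X_test)

    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Linear SVM": LinearSVC(),
        "Naive Bayes": MultinomialNB(),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train_tfidf, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test_tfidf))
    return results


results = compare_classifiers(texts, labels)
for name, acc in results.items():
    print(f"{name}: {acc:.4f}")
```

Sharing one split and one fitted vectorizer means every classifier is scored on exactly the same test rows, and the accuracy is always computed from the current model's predictions.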
