SUMMER INTERNSHIP REPORT

SUBMITTED BY

ASFAQ AHAMED M(812022205007)

In partial fulfillment for the award of the Degree

BACHELOR OF ENGINEERING
in
DEPARTMENT OF INFORMATION TECHNOLOGY

(Internship Duration: Start Date to End Date)

(30 Days : 23.06.2025 - 28.07.2025)

M.A.M COLLEGE OF ENGINEERING AND TECHNOLOGY

TIRUCHIRAPPALLI – 621105

ANNA UNIVERSITY : CHENNAI-600025

JUNE - 2025
BONAFIDE CERTIFICATE

This is to certify that ASFAQ AHAMED M has satisfactorily completed the 1 month
Summer Internship at “THE MIND IT SOLUTION, TRICHY - 620002”. This
report is being submitted in partial fulfillment for the award of degree of Bachelor of
Technology in Department of Information Technology to M.A.M College of
Engineering and Technology, under my guidance.

SIGNATURE
Dr.K.Geetha,M.E.,Ph.D.,
Head of the Department
Department of IT
M.A.M College of Engineering and Technology.
Trichy-621105

SIGNATURE
Ms.T.Gurudharshini,B.E.,M.TECH.,
Assistant Professor
Department of IT
M.A.M College of Engineering and Technology.
Trichy-621105
CERTIFICATE
DECLARATION

I, ASFAQ AHAMED M, hereby declare that the Internship report submitted to
M.A.M College of Engineering and Technology in partial fulfillment of the
requirement for the award of the Degree of Bachelor of Technology in Department of
Information Technology is a record of original training undergone by me during
the period of June-2025 under the supervision and guidance of
Ms.T.Gurudharshini,B.E.,M.TECH., Assistant Professor, Department of
Information Technology, M.A.M College of Engineering and Technology, and that it has
not formed the basis for the award of any Degree / Fellowship or other similar title to any
candidate of any University.

Place:

Date : Signature of the Student

ACKNOWLEDGEMENT

With a warm heart and immense pleasure, I thank the Almighty for His grace and
blessings bestowed on me, which drove me to the successful completion of this
project. I take this opportunity to express my sincere thanks to the respected Director
Dr.M.A.Maluk Mohammed,M.E.,Ph.D., and Secretary & Correspondent Mrs.
Fathima Bathool Makul, who are the guiding lights for all activities in our college.

I express my sincere and humble thanks to our Principal Dr.X.Susan
Christina,M.E.,Ph.D., for providing me with all the facilities needed for the
successful completion of my work.

I would like to thank our Head of the Department Dr.K.Geetha,M.E.,Ph.D., for her
cooperation, advice and suggestions at every stage of my project work.

I am very proud to extend my sincere thanks and gratitude to our Supervisor
Ms.T.Gurudharshini,B.E.,M.TECH., Assistant Professor, Department of
Information Technology, M.A.M College of Engineering & Technology, for her
excellent guidance, advice and encouragement, which boosted my energy throughout
the project development.

I also thank all the teaching staff and non-teaching staff of the Department of
Information Technology, my parents, and all my friends for their help and support
to complete this project successfully.
CONTENT

S.NO   TITLE

1      Introduction to Artificial Intelligence
2      Python Fundamentals for AI
3      Control Structures and Data Handling
4      Functions, Modules, and AI Libraries
5      Working with AI Datasets
6      Machine Learning Algorithms
7      Deep Learning with Neural Networks
8      Natural Language Processing
9      Implementation
10     Model Training and Testing
11     Sample Projects
CHAPTER 1

Introduction to Artificial Intelligence

Artificial Intelligence (AI) is a multidisciplinary branch of computer science that aims to
create machines capable of simulating human-like intelligence. This includes the ability to
perceive the environment, process information, learn from experience, reason logically, and
make decisions. AI combines principles from mathematics, statistics, neuroscience,
linguistics, and computer science to design algorithms that can adapt and improve
performance over time without direct human intervention.
History and Evolution :
The roots of AI date back to the 1950s, when pioneers like Alan Turing, John McCarthy,
Marvin Minsky, and Herbert Simon laid the conceptual foundation. The term “Artificial
Intelligence” was formally introduced in 1956 at the Dartmouth Conference. The 1960s and
70s saw the development of symbolic AI systems and early neural networks, though limited
computing power restricted their capabilities. The 1980s popularized expert systems, but their
rigidity led to a decline in interest. From the 2000s onward, the explosion of big data,
advances in machine learning, and improved computational power revitalized AI, with deep
learning later matching human-level performance on specific benchmark tasks such as image
classification.
Categories of AI :
AI can be classified based on capability:
• Narrow AI – Focused on specific tasks like translation or speech recognition.
• General AI – Hypothetical AI with the ability to perform any intellectual task a human can
do.
• Super AI – A theoretical form of AI surpassing human intelligence.
Functionally, AI can also be divided into Reactive Machines, Limited Memory AI, Theory
of Mind AI, and Self-aware AI.
Applications of AI :
AI applications span healthcare, finance, education, manufacturing, and entertainment. In
healthcare, AI aids diagnosis, predicts disease risks, and assists in personalized treatment
planning. In finance, AI detects fraud, manages risk, and automates trading. In manufacturing,
AI improves production efficiency, conducts predictive maintenance, and ensures quality
control. AI also powers personal assistants, chatbots, recommendation systems, and
autonomous vehicles.
Challenges and Ethics :
Despite its benefits, AI raises concerns about data privacy, algorithmic bias, job
displacement, and decision transparency. Ensuring ethical AI involves fairness,
accountability, transparency, and human oversight. Governance frameworks and regulatory
guidelines are being developed globally to promote responsible AI use.

Core Areas of AI :

• Machine Learning

• Natural Language Processing

• Computer Vision

• Robotics

• Expert Systems
Future Scope and Trends in Artificial Intelligence
Artificial Intelligence is expected to evolve rapidly, influencing almost every sector of
human life. Its future scope extends beyond current applications, focusing on higher levels
of autonomy, intelligence, and adaptability.
CHAPTER 2
Python Fundamentals for AI
Python is the most widely used language for AI development due to its simplicity,
readability, and robust ecosystem of libraries. Its syntax resembles natural language,
reducing the learning curve and enabling faster development. Python supports rapid
prototyping, making it ideal for research and production environments.

Advantages of Python for AI

Python’s popularity in AI stems from its ease of learning, versatility, and large community
support. It integrates easily with C++, Java, and R, allowing hybrid solutions. Python’s
extensive set of AI-focused libraries eliminates the need to build algorithms from scratch,
accelerating project timelines.
Essential Python Concepts for AI :
AI development requires familiarity with Python’s core concepts, including data types
(integers, floats, strings, booleans), data structures (lists, tuples, sets, dictionaries), control
structures (if-else statements, loops), and object-oriented programming. Exception handling
ensures robustness by managing errors gracefully.
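The concepts listed above can be seen together in a few lines. The following is only a minimal sketch: the Sample class, the safe_mean helper, and the toy values are hypothetical names invented for illustration and are not part of the internship project.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One labeled example (hypothetical structure, for illustration only)."""
    features: List[float]   # list of floats
    label: str              # string


def safe_mean(values: List[float]) -> float:
    """Return the mean of values; an empty list is handled instead of crashing."""
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:
        # Exception handling: fall back to 0.0 when there is nothing to average
        return 0.0


samples = [
    Sample(features=[1.0, 2.0, 3.0], label="positive"),
    Sample(features=[], label="negative"),
]

# Dictionary built with a loop-like comprehension: label -> mean feature value
means: Dict[str, float] = {s.label: safe_mean(s.features) for s in samples}
print(means)
```

The empty feature list would normally raise ZeroDivisionError; catching it is the kind of graceful error management the paragraph refers to.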
Python Library Ecosystem for AI :
• NumPy – Numerical computations and matrix operations.
• Pandas – Data manipulation and preprocessing.
• Matplotlib & Seaborn – Visualization tools for data analysis.
• Scikit-learn – Machine learning algorithms and preprocessing utilities.
• TensorFlow & PyTorch – Deep learning model development.
• NLTK & SpaCy – Natural language processing.
• OpenCV – Computer vision tasks.
Python in the AI Workflow :
Python is used across the AI pipeline, from data collection and preprocessing to model
training, evaluation, and deployment. Its integration with web frameworks like Flask and
FastAPI enables AI model deployment as APIs for real-time applications.

AI Tools and Frameworks :

AI tools and frameworks are software platforms, libraries, and environments that
help developers build, train, and deploy artificial intelligence models efficiently.
They provide pre-built functions, algorithms, and optimization techniques,
reducing the time and complexity of development.

• TensorFlow – An open-source framework by Google for building and training
deep learning models, supporting CPU and GPU processing.
• PyTorch – Developed by Meta (Facebook), known for its flexibility and
dynamic computation graphs, making research and prototyping easier.
• Keras – A high-level API that runs on top of TensorFlow, simplifying the
process of building neural networks.
• Scikit-learn – A Python library for machine learning, offering simple tools for
classification, regression, clustering, and model evaluation.
CHAPTER 3

Control Structures and Data Handling

Control structures determine the flow of execution in a program, enabling decision-making
and iterative processing. In AI, control structures help manage dataset iterations, conditional
logic, and automated workflows.
Conditional Statements in AI
AI models often require conditional checks, such as filtering data, selecting algorithms
based on input size, or applying specific preprocessing steps. This allows dynamic
adaptation of workflows based on conditions.
Loop Structures in AI Applications
Loops allow repetitive tasks, such as training models over multiple epochs, processing large
datasets, and applying transformations to each data element. Efficient looping techniques
are essential for handling AI workloads.
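As an illustration of both ideas, the sketch below uses a conditional check to filter out an invalid row and nested loops to train a one-weight model over several epochs. The toy data, the learning rate, and the epoch count are assumptions chosen only to keep the example small.

```python
# Toy data: (x, y) pairs following y = 2x; None marks an invalid input
raw = [(1.0, 2.0), (2.0, 4.0), (None, 1.0), (3.0, 6.0)]

# Conditional check: keep only rows whose input is present
data = [(x, y) for x, y in raw if x is not None]

w = 0.0               # single model weight, prediction is w * x
learning_rate = 0.05  # assumed value, small enough for this data

for epoch in range(200):        # outer loop: one pass over the data per epoch
    for x, y in data:           # inner loop: one weight update per sample
        error = w * x - y
        # Gradient of 0.5 * error**2 with respect to w is error * x
        w -= learning_rate * error * x

print(round(w, 3))
```

Because the data follow y = 2x exactly, the weight settles very close to 2 after the loop finishes.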
Data Handling in AI
Data handling involves acquiring, cleaning, transforming, and storing data for analysis. AI
relies heavily on both structured (tabular) and unstructured (images, audio, text) data.
Effective handling ensures high-quality inputs, leading to better model performance.
Data Management Challenges
AI projects often face issues like missing values, inconsistent formats, and unbalanced
datasets. Robust preprocessing techniques and validation steps are critical to ensuring
reliable results.
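Two of the issues named above, missing values and unbalanced classes, can be checked with very little code. This is a hedged sketch in plain Python; the rows and the column names age and label are made up, and in practice a library such as Pandas would usually do this work.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows; None marks a missing value
rows = [
    {"age": 25.0, "label": "yes"},
    {"age": None, "label": "no"},
    {"age": 35.0, "label": "no"},
    {"age": 30.0, "label": "no"},
]

# Mean imputation: replace each missing age with the mean of the observed ages
observed = [r["age"] for r in rows if r["age"] is not None]
fill_value = mean(observed)
for r in rows:
    if r["age"] is None:
        r["age"] = fill_value

# Class balance check: ratio of the largest class to the smallest class
class_counts = Counter(r["label"] for r in rows)
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

print(rows[1]["age"], dict(class_counts), imbalance_ratio)
```

A ratio far above 1 signals an unbalanced dataset, which may call for resampling or class weights before training.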
CHAPTER 4

Functions, Modules, and AI Libraries

Functions, modules, and libraries form the foundation of organized AI development in
Python. Functions encapsulate specific tasks into reusable blocks of code. Modules group
related functions, classes, and variables into a single file. Libraries are collections of modules
designed to perform a wide variety of operations, including AI-specific computations. This
modular approach improves maintainability, scalability, and collaboration in AI projects.
Role of Functions in AI
Functions allow developers to break down AI workflows into manageable steps, such as data
preprocessing, model training, and performance evaluation. This modularity enhances
reusability, ensuring that the same function can be applied to different datasets or models with
minimal changes. It also improves debugging efficiency by isolating logical errors to specific
components.
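The following sketch shows one way such a workflow might be split into functions. The function names preprocess, train, and evaluate, the threshold classifier, and the data are all hypothetical; a real project would normally call library routines inside each step.

```python
from typing import List


def preprocess(values: List[float]) -> List[float]:
    """Min-max scale values to [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def train(xs: List[float], ys: List[int]) -> float:
    """Fit a one-feature threshold classifier: the midpoint of the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def evaluate(threshold: float, xs: List[float], ys: List[int]) -> float:
    """Return accuracy of predicting class 1 whenever x exceeds the threshold."""
    predictions = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)


raw = [10.0, 12.0, 11.0, 30.0, 28.0, 32.0]
labels = [0, 0, 0, 1, 1, 1]

features = preprocess(raw)
threshold = train(features, labels)
accuracy = evaluate(threshold, features, labels)
print(accuracy)
```

Each step can be tested or replaced on its own, which is the debugging and reuse benefit described above. Accuracy here is measured on the training data only, to keep the sketch short.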
Importance of Modules in AI Projects
Modules help structure AI applications by grouping related functionalities. For example, a
preprocessing module may contain all data-cleaning routines, while a model module might
include training and evaluation methods. This separation fosters a clean architecture, making
large AI projects easier to understand and extend.
Key AI Libraries
• NumPy – Fundamental for numerical computation and matrix manipulation.
• Pandas – Handles structured data efficiently.
• Matplotlib/Seaborn – Provides visualization capabilities for dataset analysis and model
results.
• Scikit-learn – Offers classical machine learning algorithms and preprocessing utilities.
• TensorFlow/Keras/PyTorch – Enable deep learning model creation and training.
• OpenCV – Supports computer vision tasks such as image recognition and object detection.
Benefits of Using Libraries in AI
Using pre-built libraries saves development time, reduces errors, and ensures optimized
performance. Libraries are maintained by large developer communities, ensuring regular
updates and compatibility with the latest technologies.
Integration of Libraries in AI Workflows
AI projects often integrate multiple libraries in a single pipeline. For example, Pandas may
be used for data cleaning, Scikit-learn for feature selection, and TensorFlow for deep learning.
This interoperability is one of Python’s strongest advantages in AI.

Modules in AI Development :

In AI programming, a module is a file or collection of files containing Python definitions,
functions, and classes that can be reused in other programs. Modules help organize code into
logical sections, making it easier to maintain, debug, and scale AI projects. They can be
built-in (such as math, os, and random) or user-defined, created to handle specific tasks like
data preprocessing or model evaluation. Modules are also widely used to import specialized
functionality from external libraries, such as NumPy for numerical computations, Pandas for
data handling, and TensorFlow or PyTorch for building and training models. By using
modules, developers can avoid rewriting common code, improve efficiency, and ensure better
project structure.
Common AI Libraries Used :
• TensorFlow – Deep learning framework for large-scale AI models
• PyTorch – Flexible framework for research and prototyping
• Scikit-learn – Machine learning algorithms and utilities
• OpenCV – Computer vision and image processing tasks
CHAPTER 5
Working with AI Datasets

AI models rely on datasets for training, validation, and testing. A dataset can be structured
(tables), unstructured (images, audio, text), or semi-structured (JSON, XML). The quality,
size, and diversity of a dataset directly affect the performance and generalization
capability of an AI model.
Data Sources
AI datasets may come from public repositories, APIs, IoT devices, or proprietary company
databases. Popular public sources include Kaggle, UCI Machine Learning Repository, and
ImageNet. Data collection should align with project objectives and ethical guidelines.
Dataset Preparation
Raw data is rarely ready for use. Preprocessing steps include cleaning (removing
duplicates, handling missing values), normalization (scaling numerical values), and
encoding categorical variables. Data must also be split into training, validation, and testing
sets so that overfitting can be detected and performance on unseen data can be measured.
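The preparation steps described above can be sketched in plain Python as follows. The records, the 60/20/20 split, and the random seed are assumptions for illustration; libraries such as Pandas and Scikit-learn offer equivalent, more complete tools.

```python
import random

colors = ["red", "green", "blue"]
records = [(colors[i % 3], float(i)) for i in range(10)]
records.append(("red", 0.0))  # deliberate duplicate of the first record

# Cleaning: remove duplicates while keeping the original order
unique = list(dict.fromkeys(records))

# Encoding: one-hot vector over the sorted category names
categories = sorted({color for color, _ in unique})


def one_hot(color: str) -> list:
    return [1 if color == c else 0 for c in categories]


# Normalization: min-max scale the numeric value to [0, 1]
values = [v for _, v in unique]
lo, hi = min(values), max(values)
dataset = [one_hot(color) + [(v - lo) / (hi - lo)] for color, v in unique]

# Splitting: shuffle with a fixed seed, then cut 60% / 20% / 20%
rng = random.Random(42)
rng.shuffle(dataset)
n_train = len(dataset) * 6 // 10
n_val = len(dataset) * 2 // 10
train_set = dataset[:n_train]
val_set = dataset[n_train:n_train + n_val]
test_set = dataset[n_train + n_val:]

print(len(train_set), len(val_set), len(test_set))
```

For simplicity the scaling range is computed on the full dataset; in a real project it is usually computed on the training split only, so that no information from the test set leaks into training.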
Data Quality and Bias
High-quality datasets are accurate, complete, and representative of the problem domain.
Biased datasets can lead to unfair or inaccurate AI predictions. Data augmentation
techniques can help improve diversity and balance within datasets.
Tools for Dataset Handling
Python libraries such as Pandas, NumPy, and OpenCV simplify dataset manipulation. For
large-scale datasets, tools like Apache Spark and Dask enable distributed processing.
Challenges in Dataset Management
Challenges include data scarcity, privacy concerns, unbalanced class distribution, and
maintaining dataset relevance over time. Addressing these challenges is critical for
developing robust AI models.
CHAPTER 6
Machine Learning Algorithms

Machine Learning (ML) is a core subset of AI that enables systems to learn patterns from
data and improve performance without being explicitly programmed. ML algorithms adapt
their behavior based on input data, making them essential for predictive analytics and
intelligent decision-making.

Types of Machine Learning :

• Supervised Learning – Uses labeled datasets for training, such as classification and
regression tasks.
• Unsupervised Learning – Works with unlabeled data to find patterns and groupings, such
as clustering.
• Reinforcement Learning – Learns through interaction with an environment by maximizing
rewards.

Popular Machine Learning Algorithms :

• Linear Regression – Predicts continuous values.
• Logistic Regression – Used for binary classification.
• Decision Trees & Random Forests – Handle complex decision-making tasks.
• Support Vector Machines (SVM) – Classifies data by finding the optimal hyperplane.
• K-Means Clustering – Groups similar data points in unsupervised learning tasks.

Applications of Machine Learning

ML powers recommendation engines, fraud detection systems, medical diagnosis tools,
speech recognition, and autonomous systems. In business, it helps in customer segmentation,
sales forecasting, and operational optimization.
Advantages and Limitations
Advantages include adaptability, automation, and the ability to uncover hidden patterns.
Limitations involve dependency on data quality, high computational requirements, and the
risk of overfitting.

The Role of ML in AI Development

Machine Learning serves as the backbone for many AI applications, bridging the gap between
raw data and actionable intelligence. It transforms datasets into predictive models capable of
handling real-world complexity.

Supervised Learning Algorithms :

Supervised learning algorithms are trained using labeled datasets, where each input has a
corresponding correct output. The model learns patterns from this data to make predictions
on new, unseen inputs. Common algorithms include Linear Regression for predicting
continuous values, Logistic Regression for classification tasks, Decision Trees and Random
Forests for structured predictions, Support Vector Machines (SVM) for separating classes,
and K-Nearest Neighbors (KNN) for instance-based learning.
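To make instance-based learning concrete, here is a minimal K-Nearest Neighbors sketch written from scratch. The knn_predict function and the labeled points are hypothetical, and the code is not the Scikit-learn implementation.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    # Sort labeled points by Euclidean distance to the query and keep the closest k
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Labeled dataset: each input point has a known correct output
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.6)))
```

KNN has no separate training phase: the labeled data itself is the model, and all the work happens at prediction time.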

Unsupervised Learning Algorithms :

Unsupervised learning algorithms work with unlabeled datasets, where the system tries to
identify hidden patterns, relationships, or structures without predefined outputs. They are
mainly used for clustering, dimensionality reduction, and association rule mining.
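Clustering can be illustrated with a small K-Means sketch. The kmeans function, the points, and the fixed starting centroids are assumptions made so that the result is deterministic; library implementations choose initial centroids more carefully.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled data: two visually separate groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (9.0, 8.0)])
print(centroids)
```

No labels are supplied anywhere; the two groups emerge only from the distances between points.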
CHAPTER 7
Deep Learning with Neural Networks

Deep Learning is a specialized branch of Machine Learning that focuses on algorithms inspired
by the structure and functioning of the human brain. It uses multi-layered neural networks,
known as deep neural networks, to automatically learn hierarchical features from large
datasets. Unlike traditional ML algorithms, deep learning can automatically extract relevant
features from raw data, reducing the need for manual feature engineering. This capability has
led to breakthroughs in image recognition, speech processing, and natural language
understanding. Deep learning thrives on large datasets and high computational power, making
it ideal for complex AI applications where traditional ML methods may fail.
Structure of Neural Networks
A neural network consists of interconnected layers of nodes, called neurons.
• Input Layer – Receives raw data, such as pixel values in images or word embeddings in text.
• Hidden Layers – Contain multiple neurons that transform input features into more abstract
representations. Each neuron applies a weighted sum of its inputs followed by a non-linear
activation function.
• Output Layer – Produces the final prediction, such as a class label or probability score.
The depth of the network refers to the number of hidden layers. Deep neural networks often
have dozens or even hundreds of layers, enabling them to model highly complex relationships.
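The layer structure above can be traced by hand with a tiny forward pass. This is only a sketch: the dense helper, the weights, and the input values are hypothetical and were picked so the arithmetic is easy to follow.

```python
import math


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One layer: each neuron takes a weighted sum of the inputs plus a bias,
    then applies a non-linear activation function."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


x = [0.5, -1.0]                                   # input layer: raw features

# Hidden layer with two neurons
# neuron 1: 1.0*0.5 + (-1.0)*(-1.0) = 1.5   -> relu -> 1.5
# neuron 2: 0.5*0.5 + 0.5*(-1.0)   = -0.25  -> relu -> 0.0
hidden = dense(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)

# Output layer with one neuron: 2.0*1.5 + (-1.0)*0.0 - 3.0 = 0.0 -> sigmoid -> 0.5
output = dense(hidden, [[2.0, -1.0]], [-3.0], sigmoid)

print(hidden, output)
```

A deeper network simply chains more calls to the same layer function, each one feeding the next.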
Training Deep Neural Networks
Training involves feeding data through the network, computing prediction errors using a loss
function, and updating weights via backpropagation and optimization algorithms like
stochastic gradient descent. Proper training requires large amounts of labeled data,
regularization techniques to prevent overfitting, and careful selection of hyperparameters such
as learning rate, batch size, and number of layers.
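The training loop can be sketched for the simplest possible case, a single sigmoid neuron learning the logical OR function with stochastic gradient descent. The data, the learning rate of 0.5, and the 1000 epochs are assumptions; a full network repeats the same gradient step layer by layer through backpropagation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Labeled data for logical OR
data = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5   # hyperparameter, assumed value


def predict(x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def mean_loss():
    """Binary cross-entropy averaged over the dataset."""
    total = 0.0
    for x, y in data:
        p = predict(x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


initial_loss = mean_loss()
for epoch in range(1000):
    for x, y in data:
        # For sigmoid output with cross-entropy loss, the gradient with respect
        # to the weighted sum is simply (prediction - target)
        grad = predict(x) - y
        weights = [w - learning_rate * grad * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * grad
final_loss = mean_loss()

predictions = [round(predict(x)) for x, _ in data]
print(predictions, final_loss < initial_loss)
```

The loss falls from its starting value as the weights are updated, which is the error-minimizing behavior the paragraph describes.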
Applications of Deep Learning
Deep learning powers many modern AI applications:
• Image Classification & Object Detection – Face recognition, medical imaging.
• Natural Language Processing – Machine translation, sentiment analysis, chatbots.
• Speech Recognition – Virtual assistants like Siri and Alexa.
• Generative AI – Creating images, music, and text.
• Autonomous Vehicles – Perception systems for navigation.

Types of Deep Learning Architectures

Different neural network architectures are suited to different tasks:
• Convolutional Neural Networks (CNNs) – Designed for spatial data such as images,
CNNs use convolutional filters to detect features like edges, textures, and patterns.
• Recurrent Neural Networks (RNNs) – Suited for sequential data such as text and time
series. Variants like LSTMs and GRUs address issues like vanishing gradients.
• Transformer Models – Highly effective in NLP tasks due to their ability to handle long-
range dependencies in text.
• Autoencoders – Used for unsupervised learning tasks like dimensionality reduction and
anomaly detection.
Challenges and Future Trends
Challenges include high computational costs, dependence on massive datasets, and the
black-box nature of neural networks. Future research focuses on explainable AI,
energy-efficient models, and integration with quantum computing for faster processing.
CHAPTER 8
Natural Language Processing (NLP)
Natural Language Processing (NLP) is a field of AI that enables machines to understand,
interpret, and generate human language. It bridges computer science, linguistics, and AI to
create systems capable of reading text, listening to speech, and responding in a human-like
manner. NLP plays a crucial role in applications such as chatbots, translation services, and
voice assistants.
Core Components of NLP
NLP involves multiple tasks, including:
• Tokenization – Breaking text into words or phrases.
• Part-of-Speech Tagging – Identifying grammatical roles of words.
• Named Entity Recognition (NER) – Detecting names of people, organizations, locations.
• Parsing – Analyzing sentence structure.
• Sentiment Analysis – Determining the emotional tone of text.
NLP Techniques and Models
Traditional NLP relied on statistical methods like Hidden Markov Models and Conditional
Random Fields. Modern NLP uses deep learning models such as:
• Word Embeddings (Word2Vec, GloVe) – Represent words in a continuous vector space.
• Transformer-based Models (BERT, GPT) – Capture context more effectively and handle
long-range dependencies.
• Sequence-to-Sequence Models – Power tasks like machine translation and text
summarization.
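The idea of representing words in a continuous vector space can be illustrated with cosine similarity. The two-dimensional vectors below are invented for this sketch; real Word2Vec or GloVe embeddings have hundreds of dimensions and are learned from large text corpora.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors: 1 means same direction, 0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (not taken from any trained model)
embeddings = {
    "king": [0.8, 0.6],
    "queen": [0.6, 0.8],
    "apple": [-0.6, 0.8],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["apple"]))
```

Words used in similar contexts end up with vectors pointing in similar directions, so their cosine similarity is close to 1.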
Applications of NLP
• Machine Translation – Google Translate, DeepL.
• Conversational Agents – Chatbots, virtual assistants.
• Information Retrieval – Search engines, document indexing.
• Text Summarization – Condensing lengthy documents into concise overviews.
• Speech-to-Text – Converting audio into written form.
Challenges in NLP
Language is inherently ambiguous, with slang, idioms, and cultural variations making
interpretation difficult. Low-resource languages often lack sufficient training data, and
biases in datasets can lead to discriminatory outputs.
Future of NLP
Research is moving toward zero-shot and few-shot learning, where models can understand
tasks without large labeled datasets. Multimodal NLP, integrating text with images or audio,
is also gaining traction.

Components of NLP :

The main components of Natural Language Processing (NLP) are designed to help machines
understand and process human language effectively. Morphological Analysis focuses on the
structure of words and their formation. Syntax Analysis examines the grammatical structure
of sentences. Semantics deals with the meaning of words and sentences, while Pragmatics
interprets language based on context and intent. Discourse Analysis studies how sentences
relate within a larger text. Additionally, Phonology handles the sound structure in speech-
based NLP. Together, these components enable NLP systems to process, interpret, and
generate language for applications like translation, sentiment analysis, and conversational
AI.

Popular NLP Algorithms :

Natural Language Processing (NLP) uses various algorithms to perform tasks like text
classification, sentiment analysis, and machine translation. Common algorithms include
Naïve Bayes Classifier, often used for spam filtering and sentiment analysis, and Support
Vector Machines (SVM) for text categorization.
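A Naïve Bayes spam filter can be sketched in a few lines. The train_nb and predict_nb functions and the four training messages are hypothetical; the sketch uses the multinomial variant with add-one (Laplace) smoothing and ignores words never seen in training.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            if w in vocab:  # words unseen in training are skipped
                # Add-one smoothing keeps unseen word/class pairs from scoring zero
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


docs = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report due monday", "ham"),
]
model = train_nb(docs)

print(predict_nb(model, "free prize offer"))
print(predict_nb(model, "monday meeting report"))
```

The "naïve" part is the assumption that words occur independently given the class, which is why the per-word log probabilities can simply be added.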
CHAPTER 9
Implementation
Implementation is the process of converting AI designs and models into fully functional
systems. It involves integrating trained models into applications, ensuring they operate
efficiently in real-world environments, and maintaining them over time.

Stages of AI Implementation

1. Requirement Analysis – Identify the problem, define objectives, and determine feasibility.

2. Data Preparation – Collect and preprocess relevant datasets.

3. Model Development – Select, train, and validate suitable AI algorithms.

4. Integration – Embed the model into the target application.

5. Testing – Evaluate performance in real-world scenarios.

6. Deployment – Launch the AI system for end-user access.

7. Monitoring and Maintenance – Ensure the model remains accurate over time.

Tools and Frameworks

AI implementation uses frameworks like TensorFlow, PyTorch, and Scikit-learn for model
training, and Flask, FastAPI, or Django for deployment as APIs. For large-scale
deployments, cloud platforms such as AWS, Azure, and Google Cloud provide scalable
infrastructure.

Challenges in Implementation

Challenges include integrating AI into existing systems, managing latency for real-time
applications, ensuring security, and complying with data privacy regulations. Additionally,
models may degrade over time due to changes in data distributions, requiring retraining.

Best Practices

Best practices involve modular design, continuous integration, automated testing, version
control for models, and user feedback loops. Explainability and interpretability are critical,
especially in regulated industries like healthcare and finance.

Future Trends in AI Implementation

Trends include edge AI, where models run on local devices to reduce latency, and AI-as-a-
Service (AIaaS) platforms that simplify implementation for businesses. MLOps (Machine
Learning Operations) is emerging as a framework for managing AI projects from
development to deployment.

Data Preprocessing :

Data preprocessing is a crucial step in AI and machine learning projects, ensuring the dataset
is clean, consistent, and suitable for analysis. It involves data cleaning (handling missing
values, removing duplicates, correcting errors), data transformation (normalization, scaling,
encoding categorical variables), and data reduction (removing irrelevant features or
dimensionality reduction). In NLP tasks, preprocessing may include tokenization, stop-word
removal, stemming, and lemmatization. Proper preprocessing improves model accuracy,
reduces noise, and enhances training efficiency. By preparing data effectively, the system
can learn meaningful patterns and deliver reliable results during both training and real-world
deployment.
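The NLP preprocessing steps mentioned above can be sketched as a small pipeline. The stop-word list and the crude_stem rules are deliberately tiny stand-ins invented for illustration; libraries such as NLTK and spaCy provide proper stop-word lists, stemmers, and lemmatizers.

```python
import re

# Hypothetical, very small stop-word list
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "all", "and", "of", "to"}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word: str) -> str:
    """Strip a few common suffixes; a simplified stand-in for a real stemmer."""
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def preprocess(text: str) -> list:
    tokens = tokenize(text)                                # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop-word removal
    return [crude_stem(t) for t in tokens]                 # stemming


print(preprocess("The batteries were lasting all day"))
```

After these steps, different surface forms of a word map to one token, which reduces noise in the features a model has to learn from.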

Model Training :

Model training is the process of feeding preprocessed data into a selected algorithm so it can
learn patterns and relationships. During training, the model adjusts its parameters to
minimize errors using optimization techniques. This step is crucial for enabling accurate
predictions on unseen data.
CHAPTER 10

CODE :
import argparse
import logging
import os
import sys
from collections import Counter
from typing import List, Tuple, Iterable

import pandas as pd
from transformers import pipeline
import spacy

import matplotlib.pyplot as plt

import seaborn as sns

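# wordcloud is optional; the word cloud step is skipped when it is not installed
try:
    from wordcloud import WordCloud
    WORDCLOUD_AVAILABLE = True
except Exception:
    WORDCLOUD_AVAILABLE = False

# Timestamped log lines on the console
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s - %(message)s",
    datefmt="%H:%M:%S"
)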
log = logging.getLogger("review_analyzer")

def load_sample_data() -> pd.DataFrame:
    """Return a small built-in set of reviews for running without an input file."""
    sample = {
        "review_text": [
            "Great quality and fast delivery! Very satisfied.",
            "Poor battery life. Not worth the price.",
            "It's okay, nothing special but works as expected.",
            "Amazing sound quality and battery backup!",
            "Delivery was late and packaging was damaged.",
            "Absolutely love the product, great value for money.",
            "Terrible service, I will never buy from here again.",
            "Battery lasts all day, camera quality is excellent.",
            "Item arrived broken and customer support was useless.",
            "The design is sleek and easy to use.",
            "Excellent build quality; feels premium.",
            "Not impressed. The app is buggy and crashes.",
            "Five stars! Superb customer service and quick refund.",
            "Mediocre performance under heavy load, overheating issue noticed.",
            "Affordable and performs well for the price."
        ]
    }
    df = pd.DataFrame(sample)
    return df


def read_csv_if_exists(path: str) -> pd.DataFrame:
    """Load reviews from a CSV file that has a 'review_text' column."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Input file not found: {path}")
    df = pd.read_csv(path)
    if 'review_text' not in df.columns:
        raise ValueError("Input CSV must contain a column named 'review_text'")
    return df


def sanitize_text_for_model(text: str, max_len: int = 512) -> str:
    """Return text cut to max_len characters; non-string input becomes ''."""
    if not isinstance(text, str):
        return ""
    return text[:max_len]


def flatten(list_of_lists: Iterable[Iterable[str]]) -> List[str]:
    """Flatten a list of token lists into one list of tokens."""
    return [item for sub in list_of_lists for item in sub]


def get_top_ngrams(words: List[str], n: int = 1, top_k: int = 10) -> List[Tuple[str, int]]:
    """Return the top_k most frequent n-grams as (ngram, count) pairs."""
    if n < 1:
        return []
    if n == 1:
        counts = Counter(words)
        return counts.most_common(top_k)
    # zip over n shifted copies of the list yields consecutive n-word tuples
    ngrams = zip(*(words[i:] for i in range(n)))
    ngram_strings = [" ".join(gram) for gram in ngrams]
    counts = Counter(ngram_strings)
    return counts.most_common(top_k)

class CustomerReviewAnalyzer:

def init(self, df: pd.DataFrame):

if 'review_text' not in df.columns:
raise ValueError("DataFrame must contain 'review_text' column")
self.df = df.copy()
self.df['review_text'] = self.df['review_text'].astype(str)
self.sentiment_pipe = None
self.nlp = None

def setup_models(self):
log.info("Loading sentiment model (transformers pipeline)...")
# Hugging Face will download the default sentiment pipeline model if not available
self.sentiment_pipe = pipeline("sentiment-analysis")
log.info("Loading spaCy model (en_core_web_sm)...")
self.nlp = spacy.load("en_core_web_sm")
log.info("Models loaded.")

def run_sentiment_analysis(self, trunc: int = 512):

if self.sentiment_pipe is None:
raise RuntimeError("Sentiment model not loaded. Call setup_models() first.")
labels, scores = [], []
log.info("Running sentiment analysis on reviews...")
for i, txt in enumerate(self.df['review_text']):
safe_txt = sanitize_text_for_model(txt, max_len=trunc)
try:
result = self.sentiment_pipe(safe_txt)[0]
labels.append(result.get('label', 'UNKNOWN'))
# transformers sometimes return 'score'
scores.append(result.get('score', None))
except Exception as e:
log.warning("Sentiment analysis failed for row %d: %s", i, str(e))
labels.append("ERROR")
scores.append(None)
self.df['sentiment_label'] = labels
self.df['sentiment_score'] = scores
log.info("Sentiment analysis complete.")

    def extract_keywords_spacy(self):
        if self.nlp is None:
            raise RuntimeError("spaCy model not loaded. Call setup_models() first.")
        log.info("Extracting keywords with spaCy...")
        kw_list = []
        for doc in self.nlp.pipe(self.df['review_text'].tolist(), disable=["ner"]):
            # Keep lowercased lemmas of alphabetic, non-stopword tokens.
            tokens = [token.lemma_.lower() for token in doc
                      if token.is_alpha and not token.is_stop]
            kw_list.append(tokens)
        self.df['keywords'] = kw_list
        log.info("Keyword extraction complete.")

    def compute_ngram_stats(self, top_k: int = 20) -> dict:
        all_keywords = flatten(self.df['keywords'].tolist())
        unigrams = get_top_ngrams(all_keywords, n=1, top_k=top_k)
        bigrams = get_top_ngrams(all_keywords, n=2, top_k=top_k)
        return {'unigrams': unigrams, 'bigrams': bigrams}

    def plot_sentiment_distribution(self, show: bool = True, save_path: str = None):
        log.info("Plotting sentiment distribution...")
        plt.figure(figsize=(8, 5))
        sns.countplot(x=self.df['sentiment_label'],
                      order=self.df['sentiment_label'].value_counts().index)
        plt.title("Customer Sentiment Distribution")
        plt.xlabel("Sentiment")
        plt.ylabel("Number of Reviews")
        plt.tight_layout()
        if save_path:
            plt.savefig(save_path)
            log.info("Saved sentiment distribution to %s", save_path)
        if show:
            plt.show()
        plt.close()

    def generate_wordcloud(self, max_words: int = 100, show: bool = True,
                           save_path: str = None):
        if not WORDCLOUD_AVAILABLE:
            log.warning("WordCloud library not found. Skipping wordcloud generation.")
            return
        all_text = " ".join(flatten(self.df['keywords'].tolist()))
        if not all_text.strip():
            log.warning("No keyword text available for wordcloud.")
            return
        wc = WordCloud(width=800, height=400, max_words=max_words).generate(all_text)
        plt.figure(figsize=(10, 5))
        plt.imshow(wc, interpolation="bilinear")
        plt.axis("off")
        plt.title("Keyword WordCloud")
        if save_path:
            plt.savefig(save_path)
            log.info("Saved wordcloud to %s", save_path)
        if show:
            plt.show()
        plt.close()

    analyzer = CustomerReviewAnalyzer(df)
    try:
        analyzer.setup_models()
    except Exception as e:
        log.error("Failed to load models: %s", str(e))
        log.error("Make sure 'transformers' and 'en_core_web_sm' are installed.")
        sys.exit(1)

    # Run analysis steps
    analyzer.run_sentiment_analysis()
    analyzer.extract_keywords_spacy()
    ngram_stats = analyzer.compute_ngram_stats(top_k=args.topk)

    # Print summary outputs to console
    log.info("=== Sentiment Counts ===")
    print(analyzer.df['sentiment_label'].value_counts().to_string(), "\n")

    log.info("=== Top Unigrams ===")
    for tok, cnt in ngram_stats['unigrams'][:args.topk]:
        print(f"{tok}: {cnt}")
    print()

    log.info("=== Top Bigrams ===")
    for tok, cnt in ngram_stats['bigrams'][:args.topk]:
        print(f"{tok}: {cnt}")
    print()

    # Plots and wordcloud
    if not args.no_plots:
        analyzer.plot_sentiment_distribution(show=True,
                                             save_path="sentiment_distribution.png")
        if not args.no_wordcloud:
            analyzer.generate_wordcloud(show=True, save_path="wordcloud.png")
        else:
            log.info("Wordcloud generation skipped by flag.")
    else:
        log.info("Plotting skipped by flag (--no-plots).")

    # Save the annotated DataFrame as CSV
    analyzer.df.to_csv(args.output, index=False)
    log.info("Processing complete. Output CSV: %s", args.output)

if __name__ == "__main__":
    main()

OUTPUT:
21:10:01 INFO - No input provided. Using embedded sample dataset.
21:10:01 INFO - Loading sentiment model (transformers pipeline)...
21:10:05 INFO - Loading spaCy model (en_core_web_sm)...
21:10:05 INFO - Models loaded.
21:10:05 INFO - Running sentiment analysis on reviews...
21:10:06 INFO - Sentiment analysis complete.
21:10:06 INFO - Extracting keywords with spaCy...
21:10:07 INFO - Keyword extraction complete.
21:10:07 INFO - === Sentiment Counts ===
POSITIVE 9
NEGATIVE 4
NEUTRAL 2
21:10:07 INFO - === Top Unigrams ===
quality: 3
delivery: 2
battery: 2
service: 2
price: 2
design: 1
sound: 1
backup: 1
support: 1
money: 1
value: 1
app: 1
issue: 1
refund: 1
performance: 1
review_text                                        sentiment_label  sentiment_score
Great quality and fast delivery! Very satisfied.   POSITIVE         0.999
Poor battery life. Not worth the price.            NEGATIVE         0.998
It's okay, nothing special but works as expected.  NEUTRAL          0.672
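
The unigram counts listed in the output come from flattening the per-review keyword lists and counting n-token windows. The following is a minimal sketch of that step, not the report's exact code: the bodies of flatten and get_top_ngrams are assumptions inferred from how the listing calls them (only the last two lines of get_top_ngrams appear in the listing), and the keyword lists are hypothetical, not the report's dataset.

```python
from collections import Counter
from itertools import chain


def flatten(list_of_lists):
    """Concatenate per-review token lists into one flat list (assumed behaviour)."""
    return list(chain.from_iterable(list_of_lists))


def get_top_ngrams(tokens, n=1, top_k=20):
    """Count contiguous n-token windows and return the top_k most common."""
    # zip over n shifted views of the token list yields every n-token window.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    ngram_strings = [" ".join(gram) for gram in ngrams]
    counts = Counter(ngram_strings)
    return counts.most_common(top_k)


# Hypothetical keyword lists, one per review (illustrative only).
keywords = [
    ["great", "quality", "fast", "delivery"],
    ["poor", "battery", "life"],
    ["good", "quality", "battery"],
]

all_tokens = flatten(keywords)
top_unigrams = get_top_ngrams(all_tokens, n=1, top_k=2)
top_bigrams = get_top_ngrams(all_tokens, n=2, top_k=20)

for term, count in top_unigrams:
    print(f"{term}: {count}")
```

Because the tokens are flattened before counting, a bigram can straddle two reviews ("delivery poor" in this toy input). Counting bigrams per review and then summing the counters would avoid that, at the cost of a slightly longer loop.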
