
AURA: Academic Utility and Resource Assistant

A Project Report
Submitted to the APJ Abdul Kalam Technological University
in partial fulfillment of requirements for the award of degree

Bachelor of Technology
in
Computer Science and Engineering (Artificial Intelligence and Machine
Learning)
by
Adithyan S Pillai (SCT22AM007)
Adrija A (SCT22AM008)
Gowri P N (SCT22AM033)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


Sree Chitra Thirunal College of Engineering
Trivandrum, Kerala
October 2023

CERTIFICATE

This is to certify that the report entitled AURA: Academic Utility and Resource
Assistant submitted by Adithyan S Pillai (SCT22AM007), Adrija A (SCT22AM008) & Gowri
P N (SCT22AM033) to the APJ Abdul Kalam Technological University in partial fulfillment of
the B.Tech. degree in Computer Science and Engineering is a bonafide record of the project
work carried out by them under our guidance and supervision. This report in any form has not
been submitted to any other University or Institute for any purpose.

Prof. Sreepriya S L
(Project Guide)
Assistant Professor
Dept. of CSE
SCT College of Engineering
Trivandrum

Prof. Syama R
(Project Coordinator)
Assistant Professor
Dept. of CSE
SCT College of Engineering
Trivandrum

Dr. HOD Name


Assistant Professor and
Head of Department of CSE
SCT College of Engineering
Trivandrum
DECLARATION

We hereby declare that the project report AURA: Academic Utility and Resource Assistant,
submitted for partial fulfillment of the requirements for the award of degree of Bachelor of Tech-
nology of the APJ Abdul Kalam Technological University, Kerala is a bonafide work done by us
under supervision of Prof. Sreepriya S L.

This submission represents our ideas in our own words and where ideas or words of others
have been included, we have adequately and accurately cited and referenced the original sources.

We also declare that we have adhered to ethics of academic honesty and integrity and have not
misrepresented or fabricated any data or idea or fact or source in our submission. We understand
that any violation of the above will be a cause for disciplinary action by the institute and/or the
University and can also evoke penal action from the sources which have thus not been properly
cited or from whom proper permission has not been obtained. This report has not previously
formed the basis for the award of any degree, diploma or similar title of any other University.

TRIVANDRUM
09-05-2024

Adithyan S Pillai
Adrija A
Gowri P N
Abstract

AURA (Academic Utility and Resource Assistant) is a React.js-based academic management
platform designed to streamline student and faculty workflows. It integrates essential features
such as attendance tracking, timetable management, assignment assistance, notes organization,
and activity points tracking. AURA includes an AI-powered assistant for academic queries, auto-
mated certificate classification, and a GitHub-integrated notes repository for seamless document
access. Its real-time notifications system ensures users stay updated on deadlines, attendance,
and assignments. The Smart Board fosters collaboration, while a collapsible sidebar and an
animated splash screen enhance user experience. With an intuitive interface and automation-
driven functionalities, AURA reduces administrative burdens and enhances academic efficiency.
By consolidating multiple tools into one seamless system, it empowers students and faculty to
manage their academic tasks effortlessly.

Acknowledgement

We take this opportunity to express our deepest sense of gratitude and sincere thanks to
everyone who helped us to complete this work successfully. We express our sincere thanks to Dr.
HOD Name, Head of Department, COMPUTER SCIENCE & ENGINEERING, SREE CHITRA
THIRUNAL COLLEGE OF ENGINEERING for providing us with all the necessary facilities and
support. We gratefully acknowledge the contributions of Prof. Syama R, our coordinator, whose
support and assistance were instrumental in the completion of this project.

We would like to place on record our sincere gratitude to our project guide Prof. Sreepriya S L,
Assistant Professor, COMPUTER SCIENCE & ENGINEERING, SREE CHITRA THIRUNAL
COLLEGE OF ENGINEERING for the guidance and mentorship throughout this work.

Adithyan S Pillai
Adrija A
Gowri P N

Contents

Abstract i

Acknowledgement ii

1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 AURA - Academic Utility and Resource Assistant . . . . . . . . . . . . . . . . . 1
1.3 Intelligent Document Management . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 AI-Powered Assistance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Real-Time Notifications and User Engagement . . . . . . . . . . . . . . . . . . . 2
1.6 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.7 Scope and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Literature Review 4
2.1 AI-Driven Academic Achievement Tracking . . . . . . . . . . . . . . . . . . . . 4
2.2 Learning Management System Utilization for Deadlines . . . . . . . . . . . . . . 5
2.3 AI-Powered Assessment Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Real-Time Data Integration and Analytics . . . . . . . . . . . . . . . . . . . . . 6

3 Methodology 8
3.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Data Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 AI Model Selection and Training . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.4 Implementation and Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.5 Evaluation and Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4 Implementation 15
4.1 Development Tools and Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.1 Faiss (v1.6.1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.1.2 Pickle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.3 Pandas (v1.1.2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.4 Streamlit (v0.62.0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.1.5 Sentence Transformers (v0.3.8) . . . . . . . . . . . . . . . . . . . . . . . 18
4.1.6 Transformers (v3.3.1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.1.7 NumPy (v1.19.2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1.8 Torch (v1.8.1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1.9 Folium (v0.2.1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.1.10 Setuptools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1.11 find_namespace_packages . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Development Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.1 Data Collection and Cleaning . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.2 Vectorizing the Text Data . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.3 Building an Index with Faiss . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2.4 Searching with User Queries . . . . . . . . . . . . . . . . . . . . . . . . 29

5 Results 31

6 Conclusion 35

References 36

Chapter 1

Introduction

1.1 Motivation
In academic environments, students and faculty often struggle with managing multiple academic
resources, deadlines, and performance tracking across various platforms. Traditional academic
management systems are often fragmented, requiring users to rely on multiple tools for assignment
tracking, attendance monitoring, and resource sharing. This inefficiency leads to disorganization,
missed deadlines, and a lack of real-time insights into academic progress. Consequently, there
is a growing need for an integrated platform that streamlines academic utilities and enhances the
overall learning experience.

1.2 AURA - Academic Utility and Resource Assistant


AURA is a comprehensive academic management system designed to centralize and optimize
academic activities for students and faculty. Built using React.js, AURA integrates essential
features such as timetable management, assignment tracking, attendance monitoring, notes orga-
nization, and activity points management. It provides an AI-driven academic assistant, real-time
notifications, and a Smart Board for collaborative learning, all within a seamless and intuitive
interface. By combining these functionalities into a single platform, AURA eliminates the need
for multiple disjointed tools, making academic management more efficient and effective.

1.3 Intelligent Document Management
AURA incorporates automated certificate classification and a structured repository for notes and
documents. Users can upload certificates, which are then classified and assigned activity points
based on predefined academic criteria. Additionally, AURA integrates with GitHub to provide
a centralized repository for storing and sharing academic materials, ensuring easy access and
organization of important documents.

1.4 AI-Powered Assistance


AURA leverages artificial intelligence to enhance user experience and academic productivity. It
features an AI-driven figure generation tool to assist with report creation, an intelligent query
system to answer academic-related questions, and automated grading support for assignments.
The AI-powered assistant ensures users receive relevant and timely academic insights, improving
efficiency and reducing manual workload.

1.5 Real-Time Notifications and User Engagement


AURA’s real-time notification system keeps students and faculty updated on assignment deadlines,
attendance records, internal marks, and academic events. The interactive dashboard provides
instant insights, enabling users to stay informed and engaged with their academic progress. Addi-
tionally, the collapsible sidebar and animated splash screen enhance user experience by offering a
modern and interactive interface.

1.6 Objectives
The primary objective of AURA is to develop an integrated academic management platform that
enhances productivity, organization, and academic performance. Key goals include:

• Centralized Academic Management: Providing a unified platform for students and faculty
to manage assignments, attendance, notes, and deadlines efficiently.

• Automation and AI Integration: Leveraging AI for certificate classification, grade calcula-


tion, and academic query assistance.

• Real-Time Academic Insights: Implementing a notification system to ensure users stay
updated with essential academic information.

• User-Friendly Interface: Designing an intuitive and engaging user experience that simplifies
academic management.

• Collaboration and Accessibility: Facilitating document sharing and collaboration through an integrated GitHub repository and Smart Board.

1.7 Scope and Limitations


Scope: The focus of AURA is to develop a functional prototype that integrates multiple academic
management features into a cohesive system. Through rigorous testing and optimization, AURA
aims to demonstrate the feasibility of an all-in-one academic platform that enhances productivity
and user engagement.
Limitations:

• Domain-Specific Implementation: The initial deployment of AURA may be tailored to specific academic institutions, requiring further adaptation for broader use cases.

• Scalability Challenges: While AURA is designed for efficiency, scaling to support larger
user bases may necessitate additional optimization and infrastructure enhancements.

• AI Accuracy: The effectiveness of AURA’s AI-driven tools depends on training data and
fine-tuning, which may require ongoing refinement to ensure optimal performance.

Chapter 2

Literature Review

In this section, we review recent advancements in academic management systems, AI-driven
learning tools, and data integration techniques that are relevant to our research. We summarize
key methodologies and findings in academic achievement tracking, learning management system
(LMS) utilization, AI-powered assessment tools, and real-time data analytics. This review serves
as a foundation for understanding the theoretical and practical landscape informing our own
contributions.

2.1 AI-Driven Academic Achievement Tracking

M. Savitha, S. Gopika, E. Jayaprakash, and G. Lashman introduced an AI-driven academic
achievement tracker that automates the process of monitoring student performance [1]. The system
integrates machine learning algorithms to analyze student progress, predict future performance,
and provide personalized recommendations for improvement. By leveraging AI, this approach en-
hances the efficiency of academic tracking and assists educators in making data-driven decisions.

The study utilizes machine learning models trained on historical academic data to identify patterns
in student performance. The process involves data preprocessing, where raw student records are
cleaned and structured. Feature selection is applied to extract relevant variables that influence
academic outcomes. Predictive modeling techniques, including regression analysis and neural
networks, are employed to forecast student performance. Additionally, reinforcement learning
strategies are used to optimize recommendations, ensuring that students receive the most effective
interventions based on their learning behavior.

While AI-driven tracking improves efficiency, it may face challenges related to data privacy,
algorithmic bias, and the accuracy of predictive models. Errors in prediction could lead to incor-
rect assessments, potentially affecting student outcomes if not properly validated. Additionally,
the reliance on historical data may not fully capture real-time changes in student performance,
necessitating periodic recalibration of the models.

2.2 Learning Management System Utilization for Deadlines

S. Lopez, A. Pham, J. Hsu, and P. Halpin explored how students bypass traditional syllabus
structures and utilize alternate LMS locations to track assignment deadlines [2]. The research
reveals that students often rely on informal sources such as discussion forums, peer interactions,
and notification-based reminders instead of the official syllabus.

The study employs surveys and system analytics to track student interactions with LMS plat-
forms. Researchers collect data on login frequency, navigation patterns, and user engagement
with assignment-related content. Statistical analysis is conducted to identify common behaviors
among students who rely on alternative tracking methods. The findings are used to develop recom-
mendations for optimizing LMS interfaces and ensuring deadline information is more prominently
displayed.

A major limitation is the inconsistency in student behavior across different LMS platforms, mak-
ing it challenging to standardize solutions. Additionally, reliance on informal tracking methods
could lead to misinformation if students do not verify deadlines with official sources. The study
also highlights the cognitive overload students experience due to fragmented information sources,
which can negatively impact their ability to manage deadlines effectively.

2.3 AI-Powered Assessment Tools

N. B. B. Booc, K. Sobremisana, A. Ybañez, R. Tolosa, S. M. Ladroma, and K. M. Caparoso
examined the use of AI-powered calculator applications in mathematics summative assessments [3].
The research emphasizes how AI-based tools can support students in solving complex problems,
enhancing their understanding of mathematical concepts.

The study analyzes AI-powered calculators that utilize symbolic computation and machine learn-
ing to assist students with problem-solving. The tools integrate natural language processing (NLP)
to interpret mathematical queries and generate step-by-step solutions. Data is collected on student
interactions, error rates, and time spent per problem. Performance metrics are analyzed to evaluate
how AI assistance influences student learning and retention.

One major concern is the potential for academic dishonesty, as students may become overly reliant
on AI tools rather than developing problem-solving skills. Additionally, the effectiveness of AI-
powered assessment tools varies depending on the subject complexity and the accuracy of AI
interpretations. Ethical concerns arise regarding the extent to which AI should be integrated into
assessments, as excessive dependence on automated tools could diminish critical thinking abilities.

2.4 Real-Time Data Integration and Analytics

A. Ambasht explored real-time data integration and analytics as a means to empower data-driven
decision-making in academic settings [4]. The study discusses how integrating real-time analytics
into educational platforms enhances decision-making processes for both students and educators.
By leveraging automated data pipelines, institutions can improve academic tracking, performance
analysis, and personalized recommendations.

This research focuses on the integration of cloud-based analytics platforms that collect and process
student performance data in real-time. The system employs streaming data frameworks to capture
continuous academic activity, which is then processed through machine learning algorithms for
trend analysis. Advanced visualization techniques are used to present insights to educators,
enabling proactive interventions for student support.

Challenges include the high computational costs associated with real-time data processing and
concerns over data security. Ensuring the accuracy and reliability of real-time analytics remains a
critical issue in large-scale implementations. Additionally, real-time data systems must handle vast
volumes of information while maintaining system performance, requiring robust infrastructure
and optimization strategies.

Chapter 3

Methodology

This chapter presents the methodology used in developing AURA (Academic Utility and Re-
source Assistant). The system is designed to improve academic management through assignment
tracking, deadline reminders, internal marks monitoring, AI-driven figure generation, real-time
notifications, activity tracking, and grade calculation. This chapter details the system architecture,
data processing pipeline, AI model training, implementation, and evaluation methods.

3.1 System Architecture


AURA follows a modular, web-based architecture to facilitate seamless interaction between stu-
dents and faculty while ensuring scalability and efficiency. The system is structured as follows:

Frontend

The frontend of AURA is developed using React.js, providing an intuitive and responsive user
interface. It features dashboards, progress visualizations, and real-time notifications to enhance
the user experience. The use of modular components ensures that new features can be added
without disrupting the existing functionality.

Backend

The backend is implemented using Node.js and Express.js, which handle business logic, data
processing, and API endpoints. The backend ensures efficient management of academic records
and interactions. Secure API communication protocols are integrated to prevent unauthorized data
access.

Database

AURA utilizes GitHub as its database for storing structured academic data, including student
records, assignment details, and activity logs. The system employs GitHub repositories to manage
data efficiently, leveraging version control for tracking changes and updates. This approach
ensures data integrity, accessibility, and easy synchronization across multiple devices and users.

AI Modules

AI components in AURA enhance automation by enabling features such as certificate classification, grade prediction, and academic data visualization. These modules leverage deep learning
frameworks such as TensorFlow and natural language processing (NLP) models to improve accu-
racy and efficiency.

Security and Authentication

To ensure data privacy and integrity, AURA enforces security measures such as role-based access
control (RBAC), encrypted data storage, and multi-factor authentication (MFA). Regular security
audits are performed to identify vulnerabilities and strengthen protection mechanisms.

3.2 Data Processing


Efficient data processing is a critical component of AURA, ensuring accuracy and reliability in
academic management. The data pipeline consists of the following stages:

Data Collection

Users upload assignments, certificates, and academic records through the frontend interface. These
inputs are securely transmitted and stored in the backend database, ensuring data integrity and
security. File validation checks are implemented to prevent corrupted or malicious uploads.

Preprocessing

Raw academic data undergoes preprocessing, including cleaning, structuring, and categorization.
AI-based techniques remove inconsistencies, standardize formats, and classify data according to
predefined academic categories. This process improves the accuracy and efficiency of downstream
processing.

Feature Extraction

Key academic parameters, such as deadlines, grades, and activity points, are extracted using NLP-
based parsing and classification models. These extracted features are used to generate insights for
students and faculty, enhancing decision-making and academic performance tracking.

Storage and Retrieval

The structured data is indexed for rapid retrieval, reducing latency in queries and improving per-
formance. Optimized search algorithms allow users to quickly access relevant academic records,
making the system highly efficient.

Data Synchronization

AURA ensures real-time synchronization of academic records across multiple users and devices.
Automated update mechanisms periodically refresh all modules, ensuring that users always have
access to the latest information.

3.3 AI Model Selection and Training


AURA integrates AI-driven automation to enhance various academic functionalities. The AI
models used in the system are carefully selected and trained for optimal performance.

Certificate Classification

AI-driven certificate classification employs NLP-based deep learning models to analyze and cat-
egorize certificates based on predefined university criteria. Transformer-based architectures such
as BERT are used to ensure high accuracy in classification tasks.

Activity Points Allocation

AURA uses AI to automate the allocation of activity points based on uploaded certificates. The
system processes certificate data, matches it against predefined university guidelines, and assigns
corresponding activity points without requiring manual intervention.

Each model is trained using academic datasets, validated through multiple test cases, and opti-
mized for accuracy, efficiency, and minimal computational overhead.
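To make the keyword-matching idea concrete, the following is a minimal, illustrative sketch of how certificate text could be mapped to a category and a point value. The category names, keywords, and point values are hypothetical placeholders, not the university's actual activity-point criteria.

# Illustrative sketch of keyword-based certificate classification.
# Categories, keywords, and point values below are hypothetical
# placeholders, not the actual university guidelines.
CATEGORIES = {
    "nptel": ("MOOC Course", 50),
    "hackathon": ("Hackathon", 20),
    "internship": ("Internship", 25),
    "workshop": ("Workshop", 10),
}

def classify_certificate(certificate_text: str) -> tuple[str, int]:
    """Return (category, points) for the first matching keyword."""
    text = certificate_text.lower()
    for keyword, (category, points) in CATEGORIES.items():
        if keyword in text:
            return category, points
    return "Unclassified", 0

print(classify_certificate("Certificate of participation - NPTEL Deep Learning"))
# -> ('MOOC Course', 50)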

3.4 Implementation and Features


AURA’s implementation integrates several functional modules that enhance the academic experi-
ence for students and faculty. The key features include:

Activity Points Tracking

The Activity Points Tracker predicts certificate points using a keyword-based classification system. Uploaded certificates are categorized into predefined types, each assigned weighted points. A React-based UI allows seamless file uploads, validation, and point tracking. User profiles dynamically update with awarded points, ensuring an intuitive and automated recognition system for achievements.

Assignment Assistant

The Assignment Assistant utilizes React.js for a dynamic UI and Cohere’s AI API for intelligent academic guidance. User inputs (topic, subject, code, pages) are processed to generate structured recommendations. Axios handles API requests, and state management ensures seamless interaction. The UI integrates Lucide icons for enhanced usability.

Attendance

The Attendance Tracker utilizes React.js for an interactive UI, managing subject-wise and daily attendance. useState controls visibility, while useMemo extracts subject mappings. Data is processed to compute attendance percentages and required classes. Conditional rendering enhances usability, and Lucide icons improve accessibility, ensuring a seamless user experience.

Chatbot

The AURA Chatbot integrates React.js for the UI, Firebase for data management, and Gemini AI for intelligent responses. It processes user queries using local logic for academic data (timetable, attendance) and an AI API for broader queries. The chatbot employs state management, API handling, and UI animations for an interactive experience.

Dashboard

The Dashboard Component in AURA dynamically retrieves and displays the user’s daily schedule using React.js. It processes timetable data, formats subject names, and renders an interactive UI with Tailwind CSS. State management and conditional rendering ensure accurate, real-time academic updates, enhancing user experience through structured data visualization and accessibility.

College Events

The Events Component in AURA dynamically fetches, parses, and displays college events using React.js, Axios, and PapaParse. It retrieves CSV data, filters, and formats events while handling errors and loading states. Tailwind CSS enhances UI responsiveness, ensuring seamless event management with real-time updates and interactive registration links.

Notes

The Notes Component in AURA fetches and displays academic notes from a GitHub repository using React.js. It enables dynamic browsing, searching, and filtering of subjects while allowing file downloads. Efficient state management, API handling, and an intuitive UI with Tailwind CSS ensure a seamless user experience and optimized content retrieval.

Smartboard

The SmartBoard Component in AURA provides quick access to an interactive online whiteboard. Built using React.js, it integrates seamless navigation by opening the SmartBoard in a new tab upon user interaction. The minimalistic UI, enhanced with Tailwind CSS, ensures an intuitive and accessible digital collaboration experience.

Timetable

The TimeTable Component in AURA dynamically displays a structured weekly schedule using React.js. It features collapsible sections for each day, enabling intuitive navigation. The integration of Tailwind CSS ensures a responsive, visually appealing UI. State management via React hooks allows seamless interaction, enhancing accessibility and user experience.

Sidebar

The Sidebar Component in AURA enhances navigation using React.js and Tailwind CSS. It dynamically toggles between expanded and collapsed states for an adaptive UI. Interactive elements, including animated color-changing logos and intuitive menu selection, improve user experience. React hooks efficiently manage state, ensuring smooth and responsive interactions.

Splashscreen

The Splash Screen in AURA uses React.js to create a smooth loading animation. It dynamically controls opacity and progress using state and requestAnimationFrame(), ensuring a seamless transition. Tailwind CSS enhances styling, while timed animations progressively reveal the AURA logo and tagline, creating an engaging and professional introduction.

3.5 Evaluation and Refinement


To ensure AURA’s efficiency and reliability, a rigorous evaluation and refinement process is
conducted. The evaluation includes:

User Testing

Students participate in structured testing sessions to assess usability, interface responsiveness,
and feature effectiveness. Test cases simulate real-world academic scenarios to evaluate system
performance.

Performance Metrics

System performance is measured based on key metrics, including response time, AI model ac-
curacy, and database query efficiency. Benchmarking ensures that the system meets performance
expectations.

Scalability Testing

AURA undergoes load testing to verify its ability to handle a growing number of users and large
volumes of academic data. Stress tests simulate peak usage conditions to ensure reliability and
stability.

Security Assessments

Periodic security audits are performed to detect vulnerabilities and implement the latest security
protocols. Encryption, authentication mechanisms, and access control policies are continuously
updated to safeguard user data.

Feedback and Iteration

User feedback is continuously collected and analyzed to refine system interactions and optimize
features. Iterative updates ensure that AURA evolves to meet the dynamic needs of students and
faculty.

Through iterative refinements, AURA continuously evolves into a more advanced and efficient
academic management platform, ensuring an optimized experience for students and faculty alike.

Chapter 4

Implementation

4.1 Development Tools and Libraries

4.1.1 Faiss (v1.6.1)

Faiss is a library developed by Facebook AI Research that provides highly efficient algorithms for
similarity search and clustering of dense vectors. The CPU version (faiss-cpu) is used for indexing
and querying the dense embeddings generated by Sentence Transformers.

Faiss enhances the capabilities of the Vector-based Search Engine by providing efficient algo-
rithms for similarity search and clustering of dense vectors. Its integration with Sentence Trans-
formers allows for the creation of a powerful and scalable search system capable of delivering
accurate and relevant search results. Faiss is utilized because of its:

Efficient Similarity Search: Faiss offers highly efficient algorithms for similarity search, allowing
the search engine to quickly retrieve documents that are similar to a given query. This is partic-
ularly important in applications where real-time or near-real-time search responses are required,
such as web search engines or recommendation systems.

Clustering of Dense Vectors: In addition to similarity search, Faiss also provides algorithms
for clustering dense vectors. Clustering is useful for organizing large collections of documents
into groups based on their semantic similarities. This can facilitate tasks such as topic modeling,
document categorization, or recommendation system personalization.

Indexing and Querying: Faiss provides functionalities for indexing and querying dense embed-
dings efficiently. The CPU version of Faiss, known as faiss-cpu, is utilized for these operations in
scenarios where GPU resources are not available or feasible. This ensures that the search engine
can be deployed on a wide range of hardware configurations, including servers without GPU
support or cloud platforms with limited GPU availability.

Integration with Sentence Transformers: Faiss seamlessly integrates with Sentence Transform-
ers, a library used for generating dense embeddings for sentences and text documents. After
generating embeddings for documents using Sentence Transformers, Faiss is employed to index
and query these embeddings for similarity search. This integration enables the search engine to
leverage the semantic information encoded in the dense embeddings to retrieve relevant documents
efficiently.
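As a minimal sketch of this workflow, the snippet below builds a flat (exact) L2 index over randomly generated stand-in embeddings and retrieves the nearest neighbours of a query vector; the dimensionality and data are placeholders rather than the project's actual embeddings.

import faiss
import numpy as np

# Stand-in for the document embeddings produced by Sentence Transformers.
dim = 768
doc_vectors = np.random.random((1000, dim)).astype("float32")

# Build an exact L2 index and add the document vectors to it.
index = faiss.IndexFlatL2(dim)
index.add(doc_vectors)

# Search: retrieve the 5 documents closest to a query vector.
query = np.random.random((1, dim)).astype("float32")
distances, positions = index.search(query, 5)
print(positions[0])  # row positions of the most similar documents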

4.1.2 Pickle

Pickle is a standard Python library used for serializing and deserializing Python objects. It is
used here to save and load data structures such as Faiss indexes or trained Sentence Transformer
models. Here’s how Pickle is utilized:

Serialization: Pickle facilitates the conversion of Python objects into a byte stream, which can
be stored in a file or transmitted over a network. When a Faiss index or a trained Sentence
Transformer model is created or modified during the development process, Pickle is employed
to serialize these objects, preserving their state and structure.

Storage: Serialized objects can be stored in files on disk or databases for long-term storage. In the
case of the Vector-based Search Engine, Pickle is utilized to save the Faiss indexes, which contain
the dense embeddings of documents, and the trained Sentence Transformer models, which encode
semantic information into fixed-length vectors.

Loading: Pickle also enables the deserialization of serialized objects, allowing them to be recon-
structed into their original Python objects. During the execution of the search engine, Pickle is
utilized to load the previously serialized Faiss indexes and trained Sentence Transformer models
into memory, restoring them to their original state.

Interoperability: Pickle ensures interoperability between different Python environments and
versions. Serialized objects can be shared between different instances of the search engine running
on different machines or platforms, enabling seamless deployment and scalability.
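The snippet below sketches this save-and-load cycle for a simple lookup table mapping index positions back to article titles; the object and file name are illustrative. (A raw Faiss index object is typically persisted with faiss.write_index, or converted with faiss.serialize_index, rather than pickled directly.)

import pickle

# Hypothetical mapping from Faiss row position to article title,
# built while indexing the dataset.
id_to_title = {
    0: "Detecting fake news with transformer models",
    1: "Misinformation diffusion on social networks",
}

# Serialization: write the object's byte stream to disk.
with open("id_to_title.pkl", "wb") as f:
    pickle.dump(id_to_title, f)

# Loading: reconstruct the original Python object from the file.
with open("id_to_title.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored == id_to_title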

4.1.3 Pandas (v1.1.2)

Pandas is a powerful library for data manipulation and analysis in Python. It is used for handling
structured data, such as loading datasets of academic articles, preprocessing, and data explo-
ration. Here’s how Pandas is utilized:

Structured Data Handling: Pandas excels in handling structured data, providing versatile data
structures like DataFrame that allow for easy manipulation and analysis of tabular data. In the
context of the search engine, Pandas is used to load datasets of academic articles, which typically
come in structured formats such as CSV or Excel files.

Data Preprocessing: Before generating embeddings or indexing documents, preprocessing of the
data is often required to clean and prepare it for further analysis. Pandas offers powerful function-
alities for data cleaning, transformation, and manipulation, allowing for tasks such as removing
duplicates, handling missing values, and filtering out irrelevant information. This ensures that
the input data is clean, standardized, and ready for processing by other components of the search
engine.

Data Exploration: Understanding the characteristics of the dataset is crucial for developing an
effective search engine. Pandas provides extensive capabilities for data exploration, including
descriptive statistics, data visualization, and summarization. These functionalities enable devel-
opers to gain insights into the dataset, identify patterns, and make informed decisions about the
preprocessing and modeling steps.

Integration with Other Libraries: Pandas seamlessly integrates with other Python libraries
commonly used in data science and machine learning, such as NumPy, Scikit-learn, and Mat-
plotlib. This interoperability allows for smooth data exchange and collaboration between different
components of the search engine, facilitating the development of a cohesive and efficient solution.
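A brief sketch of these tasks, assuming a CSV export of the article dataset with illustrative file and column names:

import pandas as pd

# Load a structured dataset of academic articles (file and column
# names are assumptions for illustration).
df = pd.read_csv("misinformation_papers.csv")

# Exploration: dataset size and number of articles per year.
print(df.shape)
print(df["year"].value_counts().sort_index())

# Preprocessing: normalize titles and drop exact duplicates.
df["title"] = df["title"].str.strip()
df = df.drop_duplicates(subset=["title"])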

4.1.4 Streamlit (v0.62.0)

Streamlit is a Python library used for building interactive web applications for machine learning
and data science. It is used here to create the user interface for the vector-based search engine,
allowing users to interact with the system through a web browser.

Streamlit plays a crucial role in the development of the Vector-based Search Engine with Sentence
Transformers and Faiss by enabling the creation of an interactive and user-friendly interface. Its
simplicity, versatility, and seamless integration with Python libraries make it an ideal choice for
building web applications for machine learning and data science tasks. Here’s how Streamlit is
leveraged:

Interactive Web Applications: Streamlit enables the creation of interactive web applications
directly from Python scripts, without the need for HTML, CSS, or JavaScript. This simplifies the
development process and allows for rapid prototyping of user interfaces for machine learning and
data science applications.

User Interface Development: Streamlit provides a simple and intuitive API for building user
interfaces using familiar Python syntax. Developers can easily create various UI components such
as buttons, sliders, text inputs, and plots to facilitate user interaction with the search engine.

Real-time Updates: Streamlit offers automatic reactive updates, allowing the user interface to
dynamically update in response to user inputs or changes in the underlying data. This enables real-
time exploration and visualization of search results, enhancing the user experience and providing
immediate feedback.

Integration with Data Processing: Streamlit seamlessly integrates with data processing libraries
such as Pandas, allowing developers to incorporate data analysis and visualization directly into
the user interface. This facilitates data exploration and interpretation within the search engine
interface, empowering users to gain insights from the search results.

Deployment: Streamlit simplifies the deployment of web applications, providing built-in support
for deploying applications to various platforms such as Streamlit Sharing, Heroku, or Docker
containers. This ensures that the vector-based search engine can be easily deployed and accessed
by users via a web browser, without the need for complex setup or configuration.
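A minimal sketch of such an interface follows; search() is a stub standing in for the Faiss-backed lookup described elsewhere in this chapter, not the project's actual implementation.

import streamlit as st

def search(query: str, k: int):
    """Stub standing in for the Faiss-backed similarity search."""
    return [(f"Result {i + 1} for '{query}'", 1.0 - 0.1 * i) for i in range(k)]

st.title("Vector-Based Academic Search")

query = st.text_input("Search query")
k = st.slider("Number of results", 1, 20, 5)

if st.button("Search") and query:
    for title, score in search(query, k):
        st.write(f"{title} (similarity: {score:.2f})")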

4.1.5 Sentence Transformers (v0.3.8)

Sentence Transformers is a Python library for generating dense embeddings for sentences and
text documents. It provides access to pre-trained Transformer models for encoding text into fixed-
length vectors. In this case, Sentence Transformers is used to generate embeddings for documents,
which are then indexed and queried using Faiss.

Sentence Transformers serves as a critical component in the development of the Vector-based
Search Engine, enabling the generation of dense embeddings that encode the semantic meaning of
text documents. Its integration with Faiss enhances the search engine’s ability to retrieve relevant
documents based on semantic similarity, providing users with an effective and intuitive search
experience. Here’s how Sentence Transformers is utilized in this context:

Dense Embeddings Generation: Sentence Transformers provides a straightforward method for
generating dense embeddings, also known as fixed-length vectors, for text documents. These
embeddings capture semantic information about the content of the documents, enabling efficient
comparison and similarity calculation.

Pre-trained Transformer Models: The library offers access to a variety of pre-trained Trans-
former models, such as BERT, RoBERTa, and DistilBERT, which are renowned for their effective-
ness in natural language processing tasks. These models have been trained on large-scale corpora
of text data and can encode the semantic meaning of sentences into dense vector representations.

Semantic Information Encoding: By utilizing pre-trained Transformer models, Sentence Transformers is capable of encoding rich semantic information into fixed-length vectors. This enables
the search engine to capture nuanced relationships between documents and accurately measure
their similarity based on semantic content rather than just keyword matching.

Efficient Integration with Faiss: After generating embeddings for documents using Sentence
Transformers, Faiss is leveraged to index and query these embeddings efficiently. Faiss provides
highly efficient algorithms for similarity search, allowing the search engine to retrieve relevant
documents quickly and accurately based on their semantic similarity.

Scalability and Performance: Sentence Transformers offers scalable solutions for generating
embeddings, allowing for the processing of large volumes of text data efficiently. This ensures that
the search engine can handle diverse datasets and deliver high-performance search capabilities to
users.
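A minimal sketch of embedding generation, using the model named later in Section 4.2.2; the sample abstracts are invented for illustration.

from sentence_transformers import SentenceTransformer

# Load the pre-trained model used later in Section 4.2.2.
model = SentenceTransformer("distilbert-base-nli-stsb-mean-tokens")

abstracts = [
    "We study the spread of misinformation on social networks.",
    "A survey of fake news detection methods using deep learning.",
]

# encode() maps each text to a fixed-length dense vector.
embeddings = model.encode(abstracts)
print(len(embeddings), len(embeddings[0]))  # 2 vectors of equal length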

4.1.6 Transformers (v3.3.1)

Transformers is a Python library developed by Hugging Face that provides access to a wide range
of state-of-the-art pre-trained Transformer models for natural language processing tasks. It is
used alongside Sentence Transformers for accessing pre-trained models like BERT, RoBERTa,
and DistilBERT. Here’s how the Transformers library is utilized in this project:

Access to Pre-trained Transformer Models: Transformers library provides a vast repository of
pre-trained Transformer models, including popular ones like BERT, RoBERTa, and DistilBERT.
These models have been trained on extensive corpora of text data and have demonstrated superior
performance across various NLP tasks, including semantic similarity assessment and text classifi-
cation.

Semantic Representation Learning: By leveraging pre-trained Transformer models, the library
enables the search engine to learn rich semantic representations of text documents. These repre-
sentations capture intricate semantic relationships between words and sentences, facilitating the
accurate encoding of document semantics into fixed-length vectors.

Fine-tuning for Specific Tasks: Transformers library offers capabilities for fine-tuning pre-trained
models on domain-specific datasets. This allows developers to adapt the pre-trained models to the
specific requirements of the search engine, enhancing their ability to generate embeddings that are
tailored to the semantics of the document collection.

Integration with Sentence Transformers: Transformers library seamlessly integrates with Sen-
tence Transformers, enabling the search engine to access and utilize pre-trained Transformer
models for generating embeddings. This integration ensures that the search engine can leverage
the latest advancements in NLP research to enhance the quality and effectiveness of its semantic
representations.

Versatility and Flexibility: Transformers library provides a flexible and versatile API that allows
developers to easily load and utilize pre-trained models for various NLP tasks. This flexibility
enables the search engine to experiment with different models and configurations to find the most
suitable approach for generating embeddings and performing similarity search.
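As a small sketch, the snippet below loads a pre-trained checkpoint by name and extracts token-level hidden states; the checkpoint name is an illustrative choice, not necessarily the one used in the project.

from transformers import AutoModel, AutoTokenizer

# Load a pre-trained Transformer and its matching tokenizer by name.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("Fake news spreads faster than true news.", return_tensors="pt")

# The first element of the output holds the token-level hidden states.
hidden_states = model(**inputs)[0]
print(hidden_states.shape)  # (batch, tokens, hidden_size)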

4.1.7 NumPy (v1.19.2)

NumPy is a fundamental package for scientific computing in Python. It is used for handling nu-
merical operations efficiently, particularly during the vectorization and indexing processes. NumPy’s role here is:

Efficient Numerical Operations: NumPy provides a powerful array object that allows for effi-
cient storage and manipulation of large datasets. It offers a wide range of mathematical functions
and operations optimized for performance, making it ideal for handling numerical computations
required during various stages of building the search engine.

Vectorization: NumPy’s vectorized operations enable the application of mathematical operations
on entire arrays of data without the need for explicit looping. This significantly improves the
efficiency and speed of computations, especially when dealing with large volumes of text data or
dense embeddings generated by Sentence Transformers.

Indexing and Data Manipulation: NumPy’s array indexing capabilities are crucial for accessing
and manipulating data stored in arrays. It provides intuitive syntax for slicing, indexing, and
reshaping arrays, allowing for seamless data manipulation tasks such as filtering documents,
extracting embeddings, or performing similarity calculations.

Interoperability with Faiss: NumPy arrays serve as the primary data structure for storing dense
embeddings generated by Sentence Transformers and indexed by Faiss. Faiss seamlessly inte-
grates with NumPy arrays, enabling efficient indexing and querying of dense vector representa-
tions for similarity search operations.

Performance Optimization: NumPy’s underlying implementation is highly optimized and written in C, ensuring fast execution of numerical operations. This performance optimization is
essential for maintaining the responsiveness and scalability of the search engine, particularly when
handling large-scale datasets and complex computations.
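A short sketch of these roles: converting embeddings to the float32 layout Faiss expects, and computing similarities over the whole array without an explicit Python loop. The data is randomly generated for illustration.

import numpy as np

# Stand-in embeddings; NumPy arrays default to float64,
# while Faiss requires float32.
embeddings64 = np.random.random((1000, 768))
embeddings = embeddings64.astype("float32")

# Vectorized cosine similarity of one query against every document.
query = embeddings[0]
norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
cosine = embeddings @ query / norms

# Indices of the five most similar rows.
print(np.argsort(cosine)[::-1][:5])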

4.1.8 Torch (v1.8.1)

Torch is a machine learning library primarily used for deep learning tasks. It serves as the
backbone for libraries like Transformers and Sentence Transformers, providing GPU acceleration
and tensor computation capabilities. Here’s how Torch is leveraged:

Deep Learning Framework: Torch is a powerful deep learning framework that provides essen-
tial tools and functionalities for building and training neural network models. It offers a wide
range of modules and utilities for constructing various types of neural architectures, including
convolutional networks, recurrent networks, and transformer-based models.

GPU Acceleration: Torch seamlessly integrates with GPU hardware, allowing for accelerated
computation of tensor operations on CUDA-enabled devices. This GPU acceleration significantly
speeds up the training and inference processes, especially when dealing with large-scale datasets
and complex models.

Tensor Computation: Torch provides efficient tensor computation capabilities, enabling the ma-
nipulation and transformation of multi-dimensional arrays commonly used in deep learning tasks.
These tensor operations are fundamental for tasks such as data preprocessing, model training, and
inference, providing a versatile and efficient foundation for building machine learning pipelines.

Integration with Transformers and Sentence Transformers: Torch serves as the backbone
for libraries like Transformers and Sentence Transformers, which rely on its tensor computation
capabilities and GPU acceleration to implement advanced natural language processing models.
These libraries utilize Torch’s deep learning primitives to construct and train transformer-based
architectures for tasks such as text encoding, semantic representation learning, and similarity
search.

Scalability and Performance: Torch’s efficient implementation and GPU support make it well-
suited for handling large-scale datasets and complex models. Its scalability and performance
optimizations ensure that the Vector-based Search Engine can efficiently process and analyze text
data, delivering high-quality search results with minimal latency.
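A brief sketch of the tensor and GPU facilities described above; the shapes are arbitrary examples.

import torch

# Select the GPU when CUDA is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensor computation: project a batch of 768-dim vectors to 256 dims.
x = torch.randn(32, 768, device=device)
w = torch.randn(768, 256, device=device)
y = x @ w
print(y.shape, y.device)  # torch.Size([32, 256]) on cuda:0 or cpu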

4.1.9 Folium (v0.2.1)

Folium is a Python library for creating interactive maps. While not directly related to the core
functionality of the Vector-based Search Engine with Sentence Transformers and Faiss, it can
enhance the search engine’s capabilities by providing visualizations of geographic data, integrating
location-based features, and improving the overall user experience. Its flexibility and ease of use
make it a valuable tool for incorporating spatial context into the search engine interface.

Visualization of Geographic Data: Folium enables the creation of interactive maps with cus-
tomizable features such as markers, polygons, and heatmaps. While the search engine primarily
deals with text data, Folium can be employed to visualize geographical information associated
with documents or search results. For instance, if the documents contain location-based metadata,
Folium can be used to plot these locations on an interactive map for visual exploration.

Integration with External Data Sources: Folium allows for the integration of external data
sources such as GeoJSON files or spatial databases. This functionality can be leveraged to
incorporate additional geographic context into the search engine interface. For example, if the
search engine retrieves documents related to specific geographic regions or landmarks, Folium
can be used to display these regions or landmarks on an interactive map alongside the search
results.

Enhanced User Experience: By incorporating interactive maps into the search engine interface,
Folium can enhance the user experience by providing visual context to the search results. Users
can visually explore the spatial distribution of documents or search results on a map, gaining
insights into geographical patterns or correlations that may not be apparent from textual represen-
tations alone.

Location-based Search Features: Folium can also be used to integrate location-based search
features into the search engine interface. For example, users may be able to specify a geographic
area of interest and retrieve documents or search results that are relevant to that location. Folium
can facilitate the visualization of search results within the specified geographic region, enabling
users to explore content based on geographical proximity.
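A small sketch of how results carrying location metadata could be plotted; the article titles and coordinates below are invented for illustration.

import folium

# Hypothetical search results with latitude/longitude metadata.
results = [
    ("Misinformation during regional floods", 8.5241, 76.9366),
    ("Rumor propagation case study", 28.6139, 77.2090),
]

# Center the map on the first result and add one marker per article.
m = folium.Map(location=[results[0][1], results[0][2]], zoom_start=5)
for title, lat, lon in results:
    folium.Marker([lat, lon], popup=title).add_to(m)

m.save("search_results_map.html")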

4.1.10 Setuptools

Setuptools is a package development library for Python that facilitates the packaging, distribution,
and installation of Python packages. It is often used in conjunction with the setup.py file to define
package metadata and dependencies. Setuptools is utilized in this context:

Package Management: Setuptools simplifies the management of project dependencies by allowing developers to specify package metadata and dependencies in a setup.py file. This metadata
includes information such as the project name, version, author, and dependencies required for
installation.

Distribution: Setuptools enables developers to create distributable packages of their Python
projects, making it easy to share code with others or deploy applications to different environments.
By running commands like python setup.py sdist or python setup.py bdist_wheel, developers can
generate source distributions or binary distributions of their project that can be easily distributed
and installed by others.

Installation: Setuptools automates the installation process of Python packages by providing the
setup.py install command. This command installs the package along with its dependencies into the
Python environment, ensuring that the necessary dependencies are resolved and installed correctly.

Versioning: Setuptools allows developers to manage package versions effectively, ensuring con-
sistency and compatibility across different environments. By specifying version constraints in the
setup.py file, developers can ensure that their package is compatible with specific versions of its
dependencies, preventing potential conflicts or compatibility issues during installation.

Integration with Package Indexes: Setuptools seamlessly integrates with Python package in-
dexes such as PyPI (Python Package Index), allowing developers to publish their packages and
make them available for installation by others. By registering their packages on PyPI and using
tools like twine, developers can easily upload their package distributions for others to discover
and install.
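An illustrative setup.py for the search engine follows; the package name and author are placeholders, while the dependency pins mirror the versions listed earlier in this chapter.

# setup.py (illustrative sketch; name and author are placeholders)
from setuptools import setup, find_packages

setup(
    name="aura-search-engine",
    version="0.1.0",
    author="AURA project team",
    packages=find_packages(),
    install_requires=[
        "faiss-cpu==1.6.1",
        "pandas==1.1.2",
        "streamlit==0.62.0",
        "sentence-transformers==0.3.8",
        "transformers==3.3.1",
        "numpy==1.19.2",
        "torch==1.8.1",
        "folium==0.2.1",
    ],
)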

4.1.11 find_namespace_packages

find_namespace_packages is a function provided by setuptools that facilitates the discovery of
Python packages within a namespace. It is useful when organizing code into hierarchical namespaces, allowing for cleaner and more modular package structures. Here’s an elaboration on how
find_namespace_packages is relevant to this project:

Hierarchical Namespace Organization: In large-scale software projects like the Vector-based
Search Engine, it’s common to organize code into hierarchical namespaces to achieve a modular
and maintainable structure. Namespace packages allow for logical grouping of related modules
and sub-packages, making it easier to navigate and manage the codebase.

Facilitates Modular Package Structures: By using namespace packages, developers can break
down the codebase into smaller, self-contained modules and sub-packages, each focusing on a
specific aspect of functionality. This modular approach promotes code reusability, extensibility,
and maintainability, as different components of the system can be developed and maintained
independently.

Cleaner Code Organization: Namespace packages help maintain a clean and organized codebase
by avoiding naming conflicts and providing clear separation between different components of
the system. This enhances readability and comprehension, as developers can easily locate and
understand the purpose of each module or sub-package within the namespace hierarchy.

Integration with Setuptools: Setuptools provides the find_namespace_packages function as
part of its API, allowing developers to specify namespace packages within their project’s setup
configuration. This enables Setuptools to correctly identify and include namespace packages
during packaging, distribution, and installation processes, ensuring that the project’s modular
structure is preserved.
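A minimal sketch of namespace-package discovery in setup.py; the aura.* package layout described in the comment is a hypothetical example.

# setup.py (illustrative sketch)
from setuptools import setup, find_namespace_packages

# Suppose code lives under aura/search/, aura/ui/, and aura/data/
# without a top-level aura/__init__.py; find_namespace_packages
# discovers these namespace packages automatically.
setup(
    name="aura-search-engine",
    version="0.1.0",
    packages=find_namespace_packages(include=["aura.*"]),
)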

4.2 Development Process

4.2.1 Data Collection and Cleaning

1. Data Acquisition and Preprocessing

For this project, we aimed to build a vector-based search engine capable of retrieving relevant
academic articles on misinformation, disinformation, and fake news. To achieve this, we acquired
a real-world dataset of research papers.

2. Data Source and Retrieval

The dataset was compiled by querying the Microsoft Academic Graph (MAG) using Orion, a tool
that facilitates large-scale data access from MAG. This approach allowed us to efficiently gather a
comprehensive collection of relevant academic literature.

3. Dataset Description

The retrieved dataset comprised 8,430 articles published between 2010 and 2020. Each article
entry included the following information:

• Abstract: A concise summary of the research presented in the article.

• Title: The article’s main title, providing a high-level overview of the research topic.

• Citations: References made by the article to other scholarly works. (This information might
not be used in the current iteration of the search engine but could be valuable for future
enhancements).

• Publication Year: The year the article was published, which can be helpful for tracking trends
in misinformation research.

• ID: A unique identifier assigned by Microsoft Academic Graph to each article.

4. Data Cleaning

We performed minimal data cleaning to ensure the quality of the dataset for the search engine’s de-
velopment. This involved removing entries lacking abstracts, as abstracts are crucial for capturing
the core content and meaning of the research.
The resulting dataset provided a solid foundation for building the vector-based search engine,
enabling it to process and analyze the semantic content of academic articles related to misinfor-
mation.
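The cleaning step itself reduces to a single pandas operation, sketched below with an assumed file and column name:

import pandas as pd

df = pd.read_csv("mag_misinformation_2010_2020.csv")  # assumed file name
before = len(df)

# Drop entries that lack an abstract, the only cleaning step applied.
df = df.dropna(subset=["abstract"])
print(f"Kept {len(df)} of {before} articles.")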

4.2.2 Vectorizing the Text Data

Once the documents are available, they must be converted into a format that a computer can
understand. This process is called vectorization. Here, a pre-trained model called Sentence
Transformers encodes the text data into numerical vectors. Sentence Transformers are a type of
deep learning model that can learn how to represent the meaning of a sentence as a vector.

After acquiring and cleaning our dataset of research articles, we embarked on the process of
vectorization. This crucial step transforms the textual content of the abstracts into numerical
representations that computers can comprehend and manipulate. Essentially, we are translating
the meaning of each abstract into a unique mathematical code.

To achieve this feat, we leverage the power of Sentence Transformers, a type of deep learning
model specifically trained for semantic tasks. These models possess the remarkable ability to
capture the essence of a sentence and express it as a vector – a multidimensional array of numbers.
This vector encapsulates the semantic relationships between words within the sentence, allowing
us to compare and analyze the meaning of different pieces of text.

For our project, we selected the "distilbert-base-nli-stsb-mean-tokens" model from the library
of pre-trained Sentence Transformers. This model excels at Semantic Textual Similarity (STS),
which aligns with our objective of finding articles with similar meanings. It also holds a
significant advantage over the original BERT model: its smaller size translates to faster
processing times, making it computationally efficient for our project.

The vectorization process involves a few key steps:

1. Initializing the Transformer: We provide the chosen model name ("distilbert-base-nli-stsb-
mean-tokens") as a string to instantiate the Sentence Transformer object.

2. Harnessing GPU Power: If a Graphics Processing Unit (GPU) is available on the system, we
can leverage its superior computational capabilities to accelerate the vectorization process.

3. Encoding the Abstracts: We employ the ".encode()" method of the Sentence Transformer.
This method takes each abstract from our dataset as input and generates a corresponding
high-dimensional vector. These vectors encapsulate the semantic meaning of the abstracts,
allowing us to establish relationships between articles based on their content, rather than
just keywords.

By successfully vectorizing our text data, we unlock a powerful tool for building our search
engine. These document vectors will serve as the foundation for the next stage – constructing
an efficient index using Faiss.
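
The sketch below illustrates these three steps with the sentence-transformers library. The
DataFrame df and its "abstract" column carry over from the cleaning sketch above and remain
assumptions about the dataset's layout.

    # Vectorization sketch; reuses the cleaned DataFrame "df" from above.
    import torch
    from sentence_transformers import SentenceTransformer

    # 1. Initialize the transformer with the chosen pre-trained model.
    model = SentenceTransformer("distilbert-base-nli-stsb-mean-tokens")

    # 2. Move the model to the GPU when one is available.
    if torch.cuda.is_available():
        model = model.to(torch.device("cuda"))

    # 3. Encode every abstract into a dense semantic vector.
    embeddings = model.encode(df["abstract"].tolist(), show_progress_bar=True)
    print(embeddings.shape)  # (number of articles, embedding dimension)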

4.2.3 Building an Index with Faiss

Faiss (Facebook AI Similarity Search) is a library that lets us efficiently search for similar
vectors within a vast dataset. This is precisely what our search engine needs: the ability to
find articles whose content is closely related to a user's query. To use Faiss, we first create
an index containing all of the document vectors; this index can then be searched for documents
similar to a query vector.

The index is a data structure that meticulously organizes the document vectors, allowing Faiss
to rapidly locate vectors similar to a new query vector and thereby retrieve relevant articles
based on their semantic meaning. A further strength of Faiss is its ability to handle datasets
of any size, even those exceeding the limitations of a computer's RAM.

Here’s a breakdown of the steps involved in constructing our Faiss index:

1. Data Type Conversion: As Faiss works with 32-bit floating-point matrices, we need to
convert the document vectors from their current format (likely float64) to this specific data
type. This ensures compatibility with Faiss’ internal operations.

2. Index Creation: We create a Faiss index object. This object serves as the central hub for
storing and searching the document vectors. We specify the dimensionality of the vectors
(the number of elements in each vector) to tailor the index for our data.

3. Unique Identification (Optional): Faiss employs the "IndexIDMap" object to enable assigning
custom IDs to the indexed vectors. In our case, we can leverage the paper IDs retrieved from
Microsoft Academic Graph to uniquely identify each document vector within the index.

4. Populating the Index: Finally, we populate the Faiss index with the transformed document
vectors and their corresponding IDs. This process essentially builds a comprehensive map
of semantic relationships between the articles in our dataset.

5. Verification: To ensure the index functions as intended, we can test it with a sample vector.
By querying the index with an already indexed vector, we expect the first retrieved document
(along with its distance) to be the query itself, at a distance of zero. This confirms the
index is functioning correctly and ready to handle user queries.

By constructing this Faiss index, we’ve laid the groundwork for the final stage of our search engine
development: implementing the search functionality itself. The index allows us to efficiently
navigate the semantic landscape of our document collection, paving the way for retrieving articles
that resonate with a user’s search intent.
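
A minimal sketch of these index-building steps with the faiss library follows, reusing the
embeddings and DataFrame from the previous sketches; the "id" column name for the MAG paper
IDs is an assumption.

    # Index-building sketch; reuses "embeddings" and "df" from above.
    import faiss
    import numpy as np

    # 1. Faiss expects 32-bit floats, so convert from float64 if needed.
    vectors = np.asarray(embeddings, dtype="float32")

    # 2. Create a flat L2 index sized to the embedding dimensionality.
    index = faiss.IndexFlatL2(vectors.shape[1])

    # 3. Wrap it in an IndexIDMap so each vector keeps its paper ID.
    index = faiss.IndexIDMap(index)

    # 4. Populate the index with the vectors and their paper IDs.
    ids = df["id"].to_numpy(dtype="int64")  # "id" column is an assumption
    index.add_with_ids(vectors, ids)

    # 5. Verify: querying with an already indexed vector should return
    #    that same vector first, at a distance of (near) zero.
    distances, result_ids = index.search(vectors[:1], 5)
    print(distances[0][0], result_ids[0][0])  # ~0.0 and ids[0]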

4.2.4 Searching with User Queries

Once the index is built, we can use it to retrieve relevant documents for a given query. The
query text is first encoded into a vector using the same Sentence Transformers model used for
the document vectors; the index is then searched for documents whose vectors are similar to
the query vector. The documents with the most similar vectors are the most relevant to the
query.

1. Encoding the User's Intent

The process begins with understanding the user's query. We take the user's search text and encode
it using the same Sentence Transformer model ("distilbert-base-nli-stsb-mean-tokens") employed
for the document vectors. This ensures consistency in how we represent both the user's intent and
the content of the articles. The encoding process creates a query vector, a numerical
representation capturing the semantic meaning of the user's search.

2. Data Type Conversion

As with document vectors, the query vector needs to conform to Faiss’ data type requirements.
Therefore, we convert the query vector from its current format (likely float64) to a 32-bit floating-
point representation (float32). This ensures seamless interaction with the Faiss index.

3. Searching the Index

We now put the Faiss index to work. We feed the encoded query vector into the index, prompting
it to search for the document vectors with the closest semantic resemblance. The index
efficiently sifts through the collection of document vectors, identifying those that share
similar meaning with the user's query.

The documents associated with the most similar vectors are considered the most relevant to
the user’s search. These documents will be presented to the user as potential answers to their
information need.
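
The sketch below ties these steps together, reusing the model and index from the earlier
sketches; the vector_search helper is illustrative rather than the project's actual
implementation.

    # Query sketch; "model" and "index" come from the earlier sketches.
    import numpy as np

    def vector_search(query, model, index, k=10):
        # Encode the query with the same Sentence Transformer used for
        # the abstracts, then cast to float32 for Faiss.
        query_vector = np.asarray(model.encode([query]), dtype="float32")
        # Retrieve the k nearest document vectors and their paper IDs.
        distances, ids = index.search(query_vector, k)
        return distances[0], ids[0]

    distances, ids = vector_search(
        "covid-19 misinformation and social media", model, index
    )
    print(ids)  # paper IDs of the ten most similar abstracts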

By seamlessly integrating user queries with the Faiss index, we empower users to navigate the vast
ocean of academic knowledge and efficiently discover articles that address their specific informa-
tion needs. This is the culmination of our efforts, a search engine that prioritizes semantic meaning
over simple keyword matching, ultimately fostering a more insightful research experience.

Chapter 5

Results

In this chapter, we present the outcomes of the implemented search engine, showcasing its ef-
fectiveness in retrieving relevant information for various queries. Through a series of sample
search results, we demonstrate the engine’s capability in addressing specific informational needs.
Each result is accompanied by an analysis of the retrieved articles, providing insights into their
relevance and significance.

Search Result 1

Figure 5.1: Search Sample for covid19 news, output news articles

The image shows a sample search result for the query "covid-19 misinformation and social
media". Here's a breakdown of the results:

• Search bar: The user has entered the query "covid-19 misinformation and social media".

• Filters: There's a filter by publication year, ranging from 2010 to 2021, allowing the search
to be refined by year of publication.

• Number of Search Results: The search returned 10 results.

• Sample Result: The first retrieved article is titled "A first look at COVID-19 information
and misinformation sharing on Twitter", published in 2020, with a citation count of 20.

Overall, the image suggests a promising initial implementation of a search engine focused on
COVID-19 misinformation on social media. By leveraging semantic search, the engine can
retrieve articles that are conceptually relevant to the user’s query, even if they don’t contain the
exact keywords.

Search Result 2

Figure 5.2: Search Sample for Haiti Earthquake fake news, output relevant articles

The image shows a sample search result for the query "What are the effects of misinformation
on social media during extreme events like the Haiti earthquake?". Here's a breakdown of the
results:

• Search bar: The user has entered the query "What are the effects of misinformation on social
media during extreme events like the Haiti earthquake?"

• Filters: Publication year can be filtered by a slider ranging from 2010 to 2021.

• Number of search results: 10 results are found for the query.

• Sample Result: The first retrieved article is titled "Human and Algorithmic Contributions to
Misinformation Online - Identifying the Culprit" (2019). Its abstract discusses who is to
blame for the spread of misinformation online: humans or algorithms.

Search Result 3

Figure 5.3: Search Sample for doubts about Brain death criteria and organ donation, output
relevant articles

The image shows a sample search result for the query "What are some ethical considerations
around expressing doubts about brain death criteria and organ donation?". Here's a breakdown
of the results:

• Search bar: The user has entered the query "What are some ethical considerations around
expressing doubts about brain death criteria and organ donation?"

• Filters: Publication year can be filtered by a slider ranging from 2010 to 2021.

• Number of search results: 10 results are found for the query.

• Sample Result: The retrieved articles relevant to this query, along with their publication
years, are shown in Figure 5.3.

Chapter 6

Conclusion

The AURA (Academic Utility and Resource Assistant) project aims to enhance academic man-
agement by integrating automation, real-time tracking, and AI-driven classification. Traditional
academic systems often face challenges in tracking student activities, managing certificates, and
ensuring seamless communication between faculty and students. AURA overcomes these lim-
itations by offering a structured platform that automates certificate classification, activity point
tracking, and real-time notifications.

By leveraging technologies such as React.js for a dynamic frontend, Node.js and Express.js for a
robust backend, and GitHub as a version-controlled database, AURA ensures a seamless academic
experience. The platform integrates WebSockets for real-time updates, enhancing engagement
and responsiveness. Additionally, AI-based certificate classification streamlines the verification
process, reducing manual workload while ensuring accuracy.

AURA revolutionizes how academic records are managed by providing a unified system that en-
sures transparency, efficiency, and automation. With its modular design and scalable architecture,
the platform can be expanded with additional features in the future. By automating essential
academic processes, AURA empowers students and faculty with a smarter, more efficient way
to track progress, manage activities, and stay informed. This project lays the foundation for
an improved academic ecosystem that prioritizes accessibility, ease of use, and technological
advancement.

