Chat Bot
NO CONTENTS
1 SYNOPSIS
2 INTRODUCTION
3 SYSTEM ANALYSIS
3.1 Existing System
3.2 Proposed System
4 SYSTEM DESIGN
4.1 Data Flow Diagram
4.2 Database
4.3 Module Description
4.4 Screen
5 HARDWARE AND SOFTWARE SPECIFICATION
6 IMPLEMENTATION
7 TESTING
7.1 Software Testing
7.2 Functional Testing
7.3 Non-Functional Testing
8 CONCLUSION AND FUTURE ENHANCEMENT
9 BIBLIOGRAPHY
10 APPENDIX
10.1 Source Code
10.2 Screen Shots
1. SYNOPSIS
- Limited Accessibility: With restrictions on physical access to the university premises due
to factors like curfews or the recent epidemiological situation, accessing information through
traditional methods may be challenging for students.
- Lack of Personalization: Traditional methods may lack the ability to provide personalized
responses to individual inquiries, leading to a less engaging and efficient interaction for
users.
3.2 PROPOSED SYSTEM
To bridge this gap, we propose the development of a chatbot for Admission and
Registration purposes. This chatbot will leverage artificial intelligence techniques such as
Natural Language Processing (NLP) to interact with users in natural language. By
integrating machine learning, the chatbot will become more comprehensive and reliable,
capable of answering a wide range of inquiries related to the university, colleges, majors,
and admission policies. This project targets Palestinian Tawjihi students as well as other
Palestinian and non-Palestinian students, aiming to provide them with easy access to
information anytime they need it. By addressing common questions and providing timely
responses, the chatbot will enhance the accessibility and efficiency of the university
admission and registration process.
- Accessibility: The chatbot will provide easy access to information about university
admissions and registration, allowing users to inquire anytime and from anywhere.
- Efficiency: By leveraging artificial intelligence and machine learning, the chatbot will be
able to handle a wide range of inquiries quickly and accurately, saving time for both users
and university staff.
- 24/7 Availability: Unlike traditional methods that may have limited operating hours, the
chatbot will be available round-the-clock, ensuring that users can get the information they
need whenever they need it.
- Personalized Interaction: Through Natural Language Processing (NLP), the chatbot will
engage users in natural language conversations, providing a personalized and user-friendly
experience.
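As a rough illustration of how an incoming question could be matched to a stored answer, the sketch below scores simple token overlap (Jaccard similarity) against a tiny hand-written FAQ. It is not the model used in this project (the appendix uses a sequence-to-sequence network); the FAQ entries, the `best_answer` function, and the 0.2 threshold are assumptions made only for this example.

```python
import re

# Hypothetical FAQ entries, invented for illustration only.
FAQ = {
    "what are the admission requirements": "Admission requirements depend on the major.",
    "when is the registration deadline": "Registration deadlines are announced each semester.",
    "how do i apply online": "Fill out the electronic enrollment application.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_answer(question, threshold=0.2):
    """Return the answer whose stored question overlaps most with the input,
    or a fallback message when no entry reaches the threshold."""
    q_tokens = tokenize(question)
    score, answer = max(
        (jaccard(q_tokens, tokenize(known)), reply) for known, reply in FAQ.items()
    )
    if score < threshold:
        return "Sorry, I do not have an answer to that yet."
    return answer


print(best_answer("When is the deadline for registration?"))
```

A word-overlap matcher like this is easy to inspect but cannot handle paraphrases, which is one reason a learned model may be preferred for the full system.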
4. SYSTEM DESIGN
A Data Flow Diagram (DFD) describes the flow of data through a system and
the processes that transform that data. It is a structured analysis and design
tool that can be used in place of, or in association with, information-oriented
and process-oriented system flowcharts. When analysts prepare the Data Flow
Diagram, they specify the user needs at a level of detail that virtually determines the
information flow into and out of the system and the required data resources. The
diagram is constructed from a set of symbols that do not imply a physical
implementation. The Data Flow Diagram is used to review the current physical system,
prepare input and output specifications, and specify the implementation plan.
Four basic symbols are used to construct data flow diagrams. They represent
data sources, data flows, data transformations, and data storage. The
points at which data are transformed are represented by enclosed figures, usually
circles, which are called nodes.
- Data Flow
- Process
- Storage
[Figure: data flow diagram showing User Interface, Chatbot Processing, and the University Admission & Registration Repository]
4.2 DATABASE
The proposed chatbot system for University Admission and Registration requires
a database that stores information about the university, admissions, registration,
majors, policies, and other relevant data. A simplified outline of the database schema follows:
1. Users Table:
- UserID (Primary Key)
- Username
- Password (encrypted)
- Email
2. Questions Table:
- QuestionID (Primary Key)
- QuestionText
- UserID (Foreign Key referencing Users Table)
- Timestamp
3. Answers Table:
- AnswerID (Primary Key)
- QuestionID (Foreign Key referencing Questions Table)
- AnswerText
- Timestamp
4. Majors Table:
- MajorID (Primary Key)
- MajorName
- Description
- AdmissionRequirements
5. Admissions Table:
- AdmissionID (Primary Key)
- MajorID (Foreign Key referencing Majors Table)
- AdmissionDeadline
- AdmissionProcessDescription
6. Registration Table:
- RegistrationID (Primary Key)
- RegistrationDeadline
- RegistrationProcessDescription
7. Policies Table:
- PolicyID (Primary Key)
- PolicyTitle
- PolicyDescription
- EffectiveDate
- ExpiryDate
9. Logs Table:
- LogID (Primary Key)
- UserID (Foreign Key referencing Users Table)
- Action (e.g., question asked, answer provided)
- Timestamp
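The outline above can be sketched as SQL tables. The example below is a minimal sketch that creates three of the tables (Users, Questions, Answers) in an in-memory SQLite database through Python's standard `sqlite3` module. The column names come from the outline; the choice of SQLite, the SQL column types, the `create_database` helper, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Column names follow the schema outline; the SQL types are assumptions.
SCHEMA = """
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE,
    Password TEXT NOT NULL,   -- never store plain text (the outline says "encrypted")
    Email    TEXT
);
CREATE TABLE Questions (
    QuestionID   INTEGER PRIMARY KEY,
    QuestionText TEXT NOT NULL,
    UserID       INTEGER NOT NULL REFERENCES Users(UserID),
    Timestamp    TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE Answers (
    AnswerID   INTEGER PRIMARY KEY,
    QuestionID INTEGER NOT NULL REFERENCES Questions(QuestionID),
    AnswerText TEXT NOT NULL,
    Timestamp  TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def create_database(path=":memory:"):
    """Open a connection, enable foreign key checks, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_database()

# Sample rows, invented for the example.
user_id = conn.execute(
    "INSERT INTO Users (Username, Password, Email) VALUES (?, ?, ?)",
    ("demo", "<password-hash>", "demo@example.com"),
).lastrowid
question_id = conn.execute(
    "INSERT INTO Questions (QuestionText, UserID) VALUES (?, ?)",
    ("When is the registration deadline?", user_id),
).lastrowid
conn.execute(
    "INSERT INTO Answers (QuestionID, AnswerText) VALUES (?, ?)",
    (question_id, "See the Registration table."),
)
conn.commit()

# Join each question with its answer.
row = conn.execute(
    "SELECT q.QuestionText, a.AnswerText "
    "FROM Questions q JOIN Answers a ON a.QuestionID = q.QuestionID"
).fetchone()
print(row)
```

The remaining tables (Majors, Admissions, Registration, Policies, Logs) would follow the same pattern.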
4.3 MODULE DESCRIPTION
- Responsible for managing user accounts, including registration, login, and profile
management.
- Allows users to create accounts, login securely, and update their profile information.
- Offers details about registration deadlines, procedures, and requirements for enrolling in
courses.
- Assists users in understanding the registration process and fulfilling necessary steps.
LOGGING MODULE
- Records user interactions with the chatbot, including questions asked, answers provided,
and timestamps.
- Helps in analyzing user behavior, improving system performance, and tracking user
engagement.
ADMINISTRATIVE MODULE
Language : Python
Processor : i3 or above
RAM : 4 GB or above
PROCESSOR : i3
RAM : 8 GB RAM
MOUSE : Logitech
Anaconda Tool
1. Package Management :
Anaconda comes with its package manager called conda, which allows users to
install, update, and manage software packages and dependencies. Conda provides a unified
interface for managing packages across different programming languages and environments.
4. Environment Management :
Anaconda allows users to create isolated environments with specific sets of packages
and dependencies, making it easy to manage different project requirements and avoid
conflicts between packages. Users can create, clone, export, and share environments using
conda commands or Anaconda Navigator.
5. Cross-Platform Compatibility :
Anaconda is available for Windows, macOS, and Linux operating systems, making it
suitable for users across different platforms.
6. Community Support :
Anaconda has a large and active community of users and contributors who provide
support, tutorials, and resources for beginners and experienced users alike. The community
forums, documentation, and online resources are valuable sources of information for
troubleshooting issues and learning new skills.
Overall, Anaconda is a powerful and versatile tool for data scientists, researchers,
educators, and developers working on data-intensive projects. Its extensive library of
packages, robust package management system, and user-friendly interface make it a popular
choice for data science and scientific computing tasks.
Languages Supported By Anaconda
Anaconda primarily supports the Python and R programming languages, which are
widely used in the field of data science, scientific computing, and machine learning.
1. Python :
Python is the primary language supported by Anaconda. It comes with a
comprehensive collection of Python packages and libraries commonly used in data analysis,
machine learning, scientific computing, and visualization. Anaconda provides tools like
conda for package management and environments to simplify Python development
workflows.
2. R:
In summary, while Python and R are the main languages supported by Anaconda,
users have the flexibility to work with other languages and tools within the Anaconda
ecosystem.
Features of Anaconda
Anaconda is a powerful tool for data science and machine learning tasks, offering a
range of features that streamline development, analysis, and deployment processes. Some key
features of Anaconda include:
1. Package Management :
Anaconda comes with a package manager called conda, which simplifies the
installation, updating, and management of software packages and dependencies. Conda
provides a unified interface for managing packages across different programming languages
and environments.
Anaconda ships with, or makes available through conda, a wide range of data science
libraries and tools such as NumPy, pandas, SciPy, Matplotlib, and scikit-learn. Deep
learning frameworks such as TensorFlow and PyTorch can be installed as additional
packages. These libraries are essential for data manipulation, analysis, visualization,
and machine learning tasks.
4. Environment Management :
Anaconda enables users to create isolated environments with specific sets of packages
and dependencies, facilitating the management of different project requirements and avoiding
conflicts between packages. Users can create, clone, export, and share environments using
conda commands or Anaconda Navigator.
5. Cross-Platform Compatibility :
Anaconda is available for Windows, macOS, and Linux operating systems, making it
suitable for users across different platforms.
6. Community Support :
Anaconda has a large and active community of users and contributors who provide
support, tutorials, and resources for beginners and experienced users alike. The community
forums, documentation, and online resources are valuable sources of information for
troubleshooting issues and learning new skills.
Anaconda provides resources and support for educational institutions, including the
Anaconda Education Program, which offers free access to Anaconda products and services
for educational purposes. It also offers training courses and certifications for individuals
looking to enhance their skills in data science and machine learning.
Anaconda integrates with cloud services such as Amazon Web Services (AWS),
Microsoft Azure, and Google Cloud Platform (GCP), allowing users to deploy and scale their
data science projects in cloud environments seamlessly.
These features make Anaconda a comprehensive and versatile tool for data scientists,
researchers, educators, and developers working on data-intensive projects. Its extensive
library of packages, robust package management system, and user-friendly interface make it a
popular choice for data science and scientific computing tasks.
Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations, and narrative text. It supports
various programming languages, including Python, R, Julia, and others.
1. Interactive Computing :
You can write and execute code in cells, which allows for an interactive computing
experience. Each cell can contain code, markdown text, or raw text.
2. Rich Output :
Jupyter Notebooks support the display of rich media output, including images, videos,
HTML, LaTeX equations, and more. This makes it suitable for data analysis, visualization,
and report generation.
3. Kernel Support :
Jupyter Notebook works with different programming languages through the concept
of kernels. Each notebook is associated with a kernel, which allows it to execute code written
in that language.
4. Notebook Sharing :
Notebooks can be shared easily with others, either by sharing the notebook file or
using online platforms like GitHub, JupyterHub, or Jupyter Notebooks hosted on cloud
services like Google Colab or Microsoft Azure Notebooks.
5. Version Control :
Since Jupyter Notebooks are stored as JSON text files (.ipynb), they can be version
controlled using Git or other version control systems, enabling collaboration and
tracking changes over time.
7. Customization :
Jupyter Notebooks offer customization options, allowing users to configure the
appearance, behavior, and extensions according to their preferences.
Jupyter Notebook has become an essential tool in data science, machine learning, research,
education, and other fields where interactive computing and documentation are required. It
provides a flexible and intuitive environment for experimentation, analysis, and
communication of results.
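The point about version control rests on the fact that a notebook file is a JSON document. The sketch below builds a simplified notebook structure by hand and round-trips it through Python's standard `json` module. The field names follow the nbformat 4 layout as commonly documented; the cell contents are invented for the example.

```python
import json

# A simplified notebook structure (nbformat 4 layout); contents are illustrative.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serializing with stable key order keeps line-based diffs in Git small.
text = json.dumps(notebook, indent=1, sort_keys=True)

# Reading the text back recovers the same cells.
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Because cell outputs are stored in the same file, clearing outputs before committing usually keeps diffs readable.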
About Python
Python is a dynamically typed, high-level programming language known for its
simplicity, readability, and versatility. It was created by Guido van Rossum and first released
in 1991. Since then, Python has grown into one of the most popular programming languages
worldwide, with a large and active community of developers.
Python's standard library is extensive and provides support for various tasks such as
file I/O, networking, database access, GUI development, and more. This rich collection of
modules and packages reduces the need for external dependencies and accelerates
development by providing ready-to-use solutions for common problems.
In addition to its standard library, Python has a vast ecosystem of third-party libraries
and frameworks contributed by the community. These libraries cover a wide range of
domains, including web development, data science, machine learning, artificial intelligence,
scientific computing, game development, and more. Popular libraries and frameworks like
NumPy, Pandas, TensorFlow, Django, Flask, and Pygame have cemented Python's position as
a leading language in these fields.
Features of Python
2. High-Level Language :
Python is a high-level programming language, which means it abstracts low-level
details like memory management and hardware interactions, allowing developers to focus on
solving problems at a higher level.
4. Dynamic Typing :
Python is dynamically typed, which means that types are attached to values rather
than to variable names, so a name can be rebound to a value of a different type at
runtime. This flexibility simplifies coding but requires careful attention to data types.
5. Multi-Paradigm :
Python supports multiple programming paradigms, including procedural, object-
oriented, and functional programming. This flexibility allows developers to choose the most
appropriate programming style for their projects.
6. Cross-Platform :
Python is largely platform-independent: code written in pure Python generally runs on
different operating systems with little or no modification. This makes it a versatile
choice for cross-platform development.
These features make Python a popular choice for a wide range of applications,
including web development, data analysis, machine learning, automation, scripting, scientific
computing, and more. Its simplicity, versatility, and strong community support have
contributed to its widespread adoption and continued success as a programming language.
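The dynamic typing and multi-paradigm features listed above can be illustrated with a short sketch. The function and class names below are invented for this example; it sums the same list in a procedural, an object-oriented, and a functional style.

```python
from functools import reduce

# Dynamic typing: the same name is rebound to values of different types.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__


def total_procedural(numbers):
    """Procedural style: an explicit loop with a running total."""
    total = 0
    for number in numbers:
        total += number
    return total


class Accumulator:
    """Object-oriented style: state and behavior live in one object."""

    def __init__(self):
        self.total = 0

    def add(self, number):
        self.total += number
        return self


def total_object_oriented(numbers):
    acc = Accumulator()
    for number in numbers:
        acc.add(number)
    return acc.total


def total_functional(numbers):
    """Functional style: fold the list with a pure function."""
    return reduce(lambda a, b: a + b, numbers, 0)


data = [1, 2, 3, 4]
print(first_type, second_type)
print(total_procedural(data), total_object_oriented(data), total_functional(data))
```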
Advantages of Python
3.Cross-Platform Compatibility:
Python is a cross-platform language, meaning that code written in Python can run
seamlessly on various operating systems such as Windows, macOS, and Linux. This
portability makes Python an ideal choice for developing applications that need to be deployed
across different platforms.
4.High-Level Language:
Python is a high-level language, which means that it abstracts away low-level details,
such as memory management and hardware interactions, allowing developers to focus on
solving problems and writing clean, concise code. This abstraction makes Python particularly
suitable for rapid development and prototyping.
5.Scalability:
Python is scalable, enabling developers to build small scripts or large-scale
applications with ease. Its scalability is evident in its usage by companies ranging from
startups to tech giants like Google, Facebook, and Netflix, which rely on Python for
developing robust and scalable systems.
6.Integration Capabilities:
Python seamlessly integrates with other languages and platforms, allowing developers
to leverage existing code written in languages such as C/C++ or Java. This interoperability
makes Python an attractive choice for building systems that require interfacing with legacy
code or integrating with external systems.
9.Deployment Flexibility:
Python offers various deployment options, including standalone executables, web
applications, microservices, and cloud-based solutions. Its versatility in deployment allows
developers to choose the most appropriate deployment strategy for their applications.
6. IMPLEMENTATION
7. TESTING
Unit testing
The first step in testing is unit testing. Individual components are tested to
ensure that they operate correctly. Each component is tested independently, without
other system components. The module interface is tested to ensure that information
flows properly into and out of the program. Boundary conditions are tested to ensure
that the module operates correctly at the boundaries established to limit or restrict
processing. Unit testing is normally considered an adjunct to the coding step. After
the code has been developed, reviewed, and verified for correct syntax, unit testing
begins. Here each module is tested to prove its correctness and validity, to identify
any missing operations, and to verify whether the objectives have been met. Errors
are noted and corrected immediately.
Unit testing is a major part of the project: because errors are isolated to a
single module, they can be rectified easily and program clarity improves. In this
project, the entire system is divided into several modules that are developed
individually. Hence, unit testing is conducted on individual modules.
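A minimal sketch of a unit test, using Python's standard `unittest` module, is shown below. The `normalize` helper is a hypothetical stand-in for one small module of the system (it is not taken from the project code); the tests exercise it in isolation, including two boundary cases.

```python
import re
import unittest


def normalize(text):
    """Lowercase, expand "what's", and collapse whitespace (illustrative helper)."""
    text = text.lower()
    text = re.sub(r"what's", "what is", text)
    return " ".join(text.split())


class NormalizeTests(unittest.TestCase):
    def test_expands_contraction(self):
        self.assertEqual(normalize("What's  the deadline?"), "what is the deadline?")

    def test_empty_input_boundary(self):
        self.assertEqual(normalize(""), "")

    def test_whitespace_only_boundary(self):
        self.assertEqual(normalize("   \n\t "), "")


# Run the tests programmatically so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```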
Integration testing
The second step in the testing process is integration testing. Integration
testing is a systematic technique for constructing the program structure while
conducting tests to uncover errors associated with integration. After the unit test,
each module is gradually integrated to form one final system.
Modules that work properly when unit tested can still fail after integration:
data passed between modules can cause errors; one module can have an inadvertent,
adverse effect on another; sub-functions, when combined, may not produce the desired
major function; global data structures can cause problems, etc.
Hence, the objective of integration testing is to take unit-tested modules and
build a final program structure. In this project, modules are combined to assess the
overall performance of the system.
Acceptance testing
The types of acceptance testing are:
User acceptance testing focuses mainly on functionality, thereby
validating the fitness-for-use of the system by the business user. The user
acceptance test is performed by the users and application managers.
Performance testing
Various client-side validations are used to ensure that only valid data is
entered. Client-side validation saves the server the time and load of handling invalid
data. Some of the checks imposed are:
- VBScript is used to ensure that required fields are filled with suitable data only.
- Maximum lengths of the form fields are appropriately defined.
- Forms cannot be submitted without the mandatory data, so that the mistake of
submitting empty mandatory fields is caught at the client side, saving server time
and load.
- Tab indexes are set according to need, taking into account the user's ease of
working with the system.
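The required-field and maximum-length checks described above can be sketched in a few lines. The example below expresses the rules in Python purely for illustration (the text above names VBScript for the client side); the field names, the length limits, and the `validate_form` function are assumptions.

```python
# Hypothetical field rules: (required, maximum length).
RULES = {
    "username": (True, 30),
    "email": (True, 100),
    "question": (False, 500),
}


def validate_form(form):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    for field, (required, max_length) in RULES.items():
        value = form.get(field, "").strip()
        if required and not value:
            errors.append(f"{field} is required")
        elif len(value) > max_length:
            errors.append(f"{field} exceeds {max_length} characters")
    return errors


print(validate_form({"username": "", "email": "a@b.org"}))
print(validate_form({"username": "x" * 31, "email": "a@b.org"}))
```

The same rules would normally be repeated on the server, since client-side checks can be bypassed.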
SERVER SIDE VALIDATION
Some checks cannot be applied at the client side. Server-side checks are necessary to
keep the system from failing and to inform the user that an invalid operation has been
performed or that the performed operation is restricted. Some of the server-side checks
imposed are:
- Server-side constraints check the validity of primary keys and foreign keys. A primary
key value cannot be duplicated; any attempt to duplicate a primary key value results in a
message informing the user. Through the forms, fields that use a foreign key can be
updated only with existing foreign key values.
- Various access control mechanisms have been built so that one user cannot interfere
with another. Access permissions for the various types of users are controlled according
to the organizational structure. Only permitted users can log on to the system and have
access according to their category. Usernames, passwords, and permissions are controlled
on the server side.
- Using server-side validation, constraints on several restricted operations are imposed.
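The primary key and foreign key checks described above can be demonstrated with a small sketch. It uses an in-memory SQLite database through Python's standard `sqlite3` module, which is an assumption (no database engine is named in this report); the `try_insert` helper and the sample rows are likewise invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Majors (MajorID INTEGER PRIMARY KEY, MajorName TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Admissions ("
    "AdmissionID INTEGER PRIMARY KEY, "
    "MajorID INTEGER NOT NULL REFERENCES Majors(MajorID))"
)
conn.execute("INSERT INTO Majors (MajorID, MajorName) VALUES (1, 'Computer Science')")
conn.commit()  # commit the seed row so later rollbacks do not undo it


def try_insert(sql, params):
    """Run an insert and return a user-facing message instead of raising."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Duplicate primary key, unknown foreign key, then a valid foreign key.
duplicate_pk = try_insert("INSERT INTO Majors (MajorID, MajorName) VALUES (?, ?)", (1, "Duplicate"))
missing_fk = try_insert("INSERT INTO Admissions (AdmissionID, MajorID) VALUES (?, ?)", (1, 99))
valid_fk = try_insert("INSERT INTO Admissions (AdmissionID, MajorID) VALUES (?, ?)", (2, 1))
print(duplicate_pk, missing_fk, valid_fk, sep="\n")
```

Letting the database reject the row and translating the error into a message keeps the constraint in one place instead of duplicating it in application code.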
8. CONCLUSION AND FUTURE ENHANCEMENT
CONCLUSION
This bot was built to respond to students' inquiries about Palestine Polytechnic
University. It covers the following topics:
- Each of the university's faculties and their specializations, with extracted
information for each specialization.
- The level exams that students take in connection with their enrollment in the
university.
- The educational qualification diploma program and the mechanism for joining it.
- The electronic enrollment application package, the locations of approved banks, and
how to fill out the application.
- The conditions and notes that must be taken into account when joining Palestine
Polytechnic University, and the mechanism for calculating grades.
- The procedures for reserving a seat and the documents required after the student is
accepted.
- The system for transferring to Palestine Polytechnic University from another
university at the undergraduate level.
- The university's teaching system and language.
- The student exchange system with other universities.
- The system of grants, exemptions, and financial aid provided to students.
- The cases in which a student loses their university seat.
- The installment refund system for new students and its conditions.
FUTURE ENHANCEMENT
Multilingual Support
- Implement support for multiple languages to cater to a diverse user base.
Voice Interaction
- Introduce voice recognition capabilities for users to interact with the chatbot through
speech.
Appointment Scheduling
- Enable users to schedule appointments with university advisors or counselors through the
chatbot.
Chatbot Analytics Dashboard
- Create an analytics dashboard to track user interactions, popular queries, and system
performance metrics for continuous improvement.
10. APPENDIX
import sys
import os
import pandas as pd
import numpy as np
import re
import nltk
from keras.layers import Input, Embedding, LSTM, TimeDistributed, Dense, Bidirectional
from keras.models import Model, load_model
INPUT_LENGTH = 20
OUTPUT_LENGTH = 20
print(os.listdir("../input"))
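# NOTE: `lines` and `conv_lines` are assumed to hold the raw dataset lines
# loaded earlier (the loading code is not shown in this listing).
# Map each line ID to its text
id2line = {}
for line in lines:
    _line = line.split(' +++$+++ ')
    if len(_line) == 5:
        id2line[_line[0]] = _line[4]  # assumed: field 0 is the line ID, field 4 the text
# Collect the list of line IDs for each conversation
convs = []
for line in conv_lines[:-1]:
    _line = line.split(' +++$+++ ')[-1][1:-1].replace("'","").replace(" ","")
    convs.append(_line.split(','))  # assumed: comma-separated line IDs
# id and conversation sample
for k in convs[300]:
    print(k, id2line[k])  # assumed loop body: show each line of the sample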
# Sort the sentences into questions (inputs) and answers (targets)
questions = []
answers = []
for conv in convs:
    for i in range(len(conv)-1):
        questions.append(id2line[conv[i]])
        answers.append(id2line[conv[i+1]])
# Compare lengths of questions and answers
print(len(questions))
print(len(answers))
def clean_text(text):
    '''Clean text by removing unnecessary characters and altering the format of words.'''
    text = text.lower()
    # Expand common contractions
    text = re.sub(r"i'm", "i am", text)
    text = re.sub(r"he's", "he is", text)
    text = re.sub(r"she's", "she is", text)
    text = re.sub(r"it's", "it is", text)
    text = re.sub(r"that's", "that is", text)
    text = re.sub(r"what's", "what is", text)
    text = re.sub(r"where's", "where is", text)
    text = re.sub(r"how's", "how is", text)
    text = re.sub(r"\'ll", " will", text)
    text = re.sub(r"\'ve", " have", text)
    text = re.sub(r"\'re", " are", text)
    text = re.sub(r"\'d", " would", text)
    text = re.sub(r"won't", "will not", text)
    text = re.sub(r"can't", "cannot", text)
    text = re.sub(r"n't", " not", text)
    text = re.sub(r"n'", "ng", text)
    text = re.sub(r"'bout", "about", text)
    text = re.sub(r"'til", "until", text)
    # Strip symbols but keep sentence punctuation (. ! ? ,)
    text = re.sub(r"[-()\"#/@;:<>{}`+=~|]", "", text)
    # text = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", text)
    # Collapse repeated whitespace
    text = " ".join(text.split())
    return text
# Clean the data
clean_questions = []
for question in questions:
    clean_questions.append(clean_text(question))
clean_answers = []
for answer in answers:
    clean_answers.append(clean_text(answer))
# Find the length of sentences (not using nltk due to processing speed)
lengths = []
# lengths.append([len(nltk.word_tokenize(sent)) for sent in clean_questions]) #nltk approach
for question in clean_questions:
    lengths.append(len(question.split()))
for answer in clean_answers:
    lengths.append(len(answer.split()))
# Create a dataframe so that the values can be inspected
lengths = pd.DataFrame(lengths, columns=['counts'])
print(np.percentile(lengths, 80))
print(np.percentile(lengths, 85))
print(np.percentile(lengths, 90))
print(np.percentile(lengths, 95))
# Remove questions and answers that are shorter than 2 words or longer than 20 words.
min_line_length = 2
max_line_length = 20
print(len(short_questions))
print(len(short_answers))
# Output:
# 138528
# 138528
r = np.random.randint(1,len(short_questions))
# We will use the first 0-80th %-tile (80%) of data for the training
training_input = short_questions_tok[:round(data_size*(80/100))]
training_input = [tr_input[::-1] for tr_input in training_input]  # reversing input seq for better performance
training_output = short_answers_tok[:round(data_size*(80/100))]
print('encoded_training_input', encoded_training_input.shape)
print('encoded_training_output', encoded_training_output.shape)
# Output:
# encoded_training_input (24000, 20)
# encoded_training_output (24000, 20)
# encoding validation set
encoded_validation_input = transform(
    encoding, validation_input, vector_size=INPUT_LENGTH)
encoded_validation_output = transform(
    encoding, validation_output, vector_size=OUTPUT_LENGTH)
print('encoded_validation_input', encoded_validation_input.shape)