Optimizing Resume Screening With Machine Learning: An NLP Approach
Abstract—In the digital hiring landscape, managers face challenges in resume screening due to time constraints, biases, and high application volumes. This study examines how natural language processing and machine learning are being used to speed up resume screening. Drawing on a survey of prior research, it shows how promising natural language processing systems can be. The methodology section describes the data set, how the data is created, and the machine learning models evaluated for classifying resumes. Using the Bag-of-Words method with the Gaussian Naive Bayes model, we achieved an accuracy of almost 99.5%. These findings indicate that NLP algorithms can effectively filter resumes, aiding recruiters in identifying top candidates efficiently while minimizing biases and saving time and costs. Ultimately, we highlight the potential of these systems to enhance the hiring process.

Index Terms—Classification, Machine Learning, Natural Language Processing, Resume Screening

I. INTRODUCTION

Nowadays, the employment market is highly competitive, and resume screening poses significant challenges for both employers and job seekers. For hiring managers and recruiters, the task of screening resumes has grown more difficult in the digital age. Sorting through large volumes of resumes is time-consuming, and it is error-prone too, with bias and mistakes creeping in. Job seekers face their own hurdle: crafting resumes that show off their skills and experience is hard, and they need to stand out in a sea of applications.

Now, things are changing. The adoption of natural language processing (NLP) and machine learning (ML) has transformed hiring. With these tools, companies can speed up hiring, find better candidates, and reduce bias in their choices by using data and language tools to make the process easier for everyone.

In this paper, we examine the fundamentals of machine learning and natural language processing and how they relate to resume screening. We explore the main obstacles that conventional resume screening techniques encounter. We also look at how machine learning models can help human recruiters by analyzing and classifying resumes automatically using already established criteria. These algorithms are trained on large data sets of past hiring outcomes.

The objective of this research is to identify an optimal ML and NLP approach for resume screening, provide a thorough examination of its effects on talent acquisition effectiveness, and expedite the hiring process.

Section I covers the problem statement and the introduction to the study, and also states the suggested approaches and their associated benefits. Section II addresses the literature survey relevant to the study and the different approaches considered by other researchers for resume classification and ranking. Section III discusses the exploratory approach or design. It includes a description of the training data, system analysis, classification techniques, feature extraction, classifiers used, and performance metrics. Section IV examines the findings of the various machine learning models and presents both graphical and tabular data for a better understanding of the performance of each model. Section V concludes the paper and discusses the scope for further research.

II. LITERATURE SURVEY

Screening resumes is difficult for recruiters and can be overwhelming. Natural language processing (NLP) helps automate the resume screening process by extracting important information such as skills, experience, and education.

With so many positions to fill, hiring managers often find it hard to pick the right candidates based on what they actually need, and this challenge is growing.

Recently, optimizing resume screening with machine learning and NLP has attracted considerable interest[1], and many new ideas and methods are emerging. Researchers have looked
Authorized licensed use limited to: BRACT's Vishwakarma Institute Pune. Downloaded on March 27,2025 at 10:41:19 UTC from IEEE Xplore. Restrictions apply.
closely at employment recommendation services. For example, Mujtaba et al. created a detailed resume classification system that uses NLP and machine learning techniques. This system shows better accuracy and efficiency when sorting resumes according to job needs.

The benefits of using an e-recruitment portal for companies are substantial. In addition, there are factors that could sway a candidate's selection, along with other key recruitment processes that also matter.

Going further, Rajath et al. tried out K-nearest neighbors (KNN) along with cosine similarity measures for classifying and ranking resumes. They showed that these methods give accurate rankings based on how well a candidate fits the job[2].

Moreover, Ramos et al. examined the Term Frequency-Inverse Document Frequency (TF-IDF) method, which is used to determine which words are important in document queries. It is still a key technique in NLP-based resume screening systems: it helps pick out important terms and makes information retrieval more accurate[3]. Roy et al. explored an approach using machine learning for automating resume recommendation systems, presenting an efficient methodology for matching resumes to job descriptions using various ML algorithms[4].

To assess job titles, Chandak et al. also used the TF-IDF vectorization approach. The authors use cosine similarity as a criterion to assess how close candidate profiles are to open positions in order to further hone their recommendations. This architecture enhances the relevance of employment recommendations given to users while also expediting the resume parsing process[6]. Further, in order to predict appropriate job responsibilities, P. V. J. et al. investigated a method utilizing a custom Convolutional Neural Network (CNN) in conjunction with word2vec. They evaluated its performance against cosine similarity, Support Vector Machines (SVM), Random Forest, and a pre-trained BERT model[7].

In order to determine the best approach for precise classification, Surendiran et al. presented a thorough solution that investigates a number of machine learning techniques, such as Random Forests, Decision Trees, K-Nearest Neighbors (KNN), and Support Vector Machines (SVM)[8].

Fig. 1. Block diagram showing all stages of the study

A. Dataset

The dataset used is a popular open-source dataset available on Kaggle. It is a collection of 960 resumes. The data consists of one independent variable and one dependent variable. The independent variable, labeled "resume," contains the resumes associated with different job categories. The dependent variable, labeled "Category," gives the job category of each resume. Table I lists the job categories in the data set together with the number of resumes in each and the label assigned to each category by label encoding.

TABLE I
RESUME INSTANCES OF EVERY JOB CATEGORY

Job Category             Resume Instances   Category Label
Advocate                 20                 0
Arts                     36                 1
Automation Testing       26                 2
Blockchain               40                 3
Business Analyst         28                 4
Civil Engineer           24                 5
Data Science             40                 6
Database                 33                 7
DevOps Engineer          55                 8
DotNet Developer         28                 9
Electrical Engineering   30                 10
ETL Developer            40                 11
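The label encoding step mentioned for Table I can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `encode_labels` is hypothetical, and the rule used here (a case-insensitive alphabetical sort of the category names) is an assumption chosen because it reproduces the labels shown in the table rows above.

```python
# Job categories copied from the rows of Table I shown above.
categories = [
    "Advocate", "Arts", "Automation Testing", "Blockchain",
    "Business Analyst", "Civil Engineer", "Data Science", "Database",
    "DevOps Engineer", "DotNet Developer", "Electrical Engineering",
    "ETL Developer",
]


def encode_labels(names):
    """Map each distinct name to an integer label.

    Assumption: labels follow a case-insensitive alphabetical order,
    which matches the category labels listed in Table I.
    """
    ordered = sorted(set(names), key=str.lower)
    return {name: index for index, name in enumerate(ordered)}


label_of = encode_labels(categories)
for name in categories:
    print(f"{name:<24}{label_of[name]}")
```

The resulting dictionary can be applied to the "Category" column to turn each category name into the integer target used by the classifiers.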
C. Natural Language Processing Models
Following pre-processing, the resume text is transformed
into a collection of features that the machine learning model
can use. Features can be extracted from text in a variety of
ways.
• Bag of words: Represents the text as a vector of word
counts.
• Term Frequency-Inverse Document Frequency (TF-IDF):
Represents the text as a vector of scores, each of which
signifies a word's importance in the document.
• Word2Vec: Represents each word as a vector of real
numbers. Word2Vec is a neural network model designed
to learn word representations that reflect the semantic
relationships between words.
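The first two feature extraction methods in the list above can be illustrated with a small, dependency-free sketch. The three toy "resumes", the function names, and the unsmoothed textbook weighting `tf * log(N / df)` are all assumptions made for illustration; library implementations typically use smoothed variants, and the paper does not state which variant was used. Word2Vec is not covered here because it requires a trained neural model.

```python
import math
from collections import Counter

# Toy "resumes" invented for illustration; they are not from the dataset.
docs = [
    "python developer with python and sql experience",
    "civil engineer with site experience",
    "sql database administrator",
]

tokenized = [doc.split() for doc in docs]
vocab = sorted({word for tokens in tokenized for word in tokens})


def bag_of_words(tokens):
    """Return one raw count per vocabulary word, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def tf_idf(tokens):
    """Weight each term frequency by log(N / document frequency)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    scores = []
    for word in vocab:
        tf = counts[word] / len(tokens)
        # df >= 1 for every word because vocab is built from the same docs.
        df = sum(1 for other in tokenized if word in other)
        scores.append(tf * math.log(n_docs / df))
    return scores


bow_matrix = [bag_of_words(tokens) for tokens in tokenized]
tfidf_matrix = [tf_idf(tokens) for tokens in tokenized]
```

In the first toy document, "python" occurs twice and appears in no other document, so it receives a higher TF-IDF score than "sql", which also appears in the third document.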
applied to ascertain the job category to which a cleaned
resume belongs.
It has two main advantages: it is simple and efficient
because it requires relatively little training data and
can be trained quickly. This makes it suitable for data
with a variety of features and speeds up the process of
computing probability. Naive Bayes classifiers are of three
types:
– Multinomial Naive Bayes: Mostly used for document
classification problems. Assumes that the features are
drawn from a multinomial distribution.
– Bernoulli Naive Bayes: Works similarly to the Multinomial
Naive Bayes classifier, but assumes that the features
are binary variables.
– Gaussian Naive Bayes: Assumes that continuous features
follow a Gaussian distribution, so the likelihood of each
continuous predictor value is computed from that
distribution rather than from discrete counts.
Here, we use the Gaussian Naive Bayes classifier for
the classification task because it gave better accuracy in our case.

TABLE II
METRIC VALUES WITH BAG OF WORDS MODEL

Model Name             Accuracy   Precision   Recall   F1-Score
Decision Tree          0.9948     0.9958      0.9948   0.9949
K-Nearest Neighbor     0.9222     0.9571      0.9222   0.9283
Logistic Regression    0.9948     0.9954      0.9948   0.9948
Gaussian Naive Bayes   0.9948     0.9956      0.9948   0.9949
Random Forest          0.9948     0.9950      0.9948   0.9947
SVM Classifier         0.9948     0.9955      0.9948   0.9949
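To make the Gaussian Naive Bayes assumption concrete, the following is a minimal from-scratch sketch rather than the implementation used in the study. The function names, the toy feature rows, and the `var_floor` smoothing constant are hypothetical. Each class gets a prior plus a per-feature mean and variance, and a row is assigned to the class with the highest log posterior.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # var_floor is an assumed constant that avoids division by zero
        # when a feature is constant within a class.
        variances = [
            sum((value - mean) ** 2 for value in col) / n + var_floor
            for col, mean in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, mean, var in zip(row, means, variances)
        )
        return math.log(prior) + log_likelihood

    return max(model, key=log_posterior)


# Toy feature rows (hypothetical word counts) for two made-up classes.
X = [[3.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 2.0]]
y = ["Data Science", "Data Science", "Civil Engineer", "Civil Engineer"]
model = fit_gaussian_nb(X, y)
print(predict(model, [3.0, 0.5]))
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied, which matters for high-dimensional text features.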
IV. IMPLEMENTATION AND RESULTS

The outcomes of the various prediction models are discussed in this section, and different metrics are used to assess the results. In the formulas below, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

A. Accuracy

It measures the fraction of all predictions, positive and negative, that the model gets right.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

B. Precision

It helps to evaluate the reliability of the model's positive predictions.

$$\text{Precision} = \frac{TP}{TP + FP}$$

C. Recall

A performance metric that assesses how many of the actual positive examples in the data set the classifier identifies.

$$\text{Recall} = \frac{TP}{TP + FN}$$

D. F1-Score

It combines precision and recall into one number, giving a balanced view of how the classifier is doing. It is the harmonic mean of precision and recall, treating them equally.

$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Table II provides a summary of the metric values for every prediction model used in conjunction with the Bag of Words model, and Figure 3 graphically compares these values.

Fig. 3. Classification models performance with Bag of Words model

Table III provides a summary of the metric values for every prediction model used in conjunction with the TF-IDF model, and Figure 4 graphically compares these values.

TABLE III
METRIC VALUES WITH TF-IDF MODEL

Model Name             Accuracy   Precision   Recall   F1-Score
Decision Tree          0.9948     0.9955      0.9948   0.9949
K-Nearest Neighbor     0.9792     0.9813      0.9792   0.9787
Logistic Regression    0.9948     0.9953      0.9948   0.9948
Gaussian Naive Bayes   0.9948     0.9953      0.9948   0.9948
Random Forest          0.9948     0.9961      0.9948   0.9950
SVM Classifier         1.0000     1.0000      1.0000   1.0000
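The four metrics defined in this section can be computed directly from confusion-matrix counts. The sketch below covers the binary case only; the function name and the example counts are hypothetical and are not results from the study. It also assumes that no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from binary counts.

    Assumes every denominator is non-zero (at least one predicted
    positive and at least one actual positive).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class task such as resume categorization, these per-class values are usually combined by some averaging scheme; which scheme the reported tables use is not stated in the text.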
Fig. 4. Classification models performance with TF-IDF model

Table IV provides a summary of the metric values for every prediction model used in conjunction with the Word2Vec model, and Figure 5 graphically compares these values.

TABLE IV
METRIC VALUES WITH WORD2VEC MODEL

Fig. 6. Confusion Matrix with Bag of Words model
real-world. The algorithm achieved an accuracy of about 99.5% using the Bag-of-Words model and the Gaussian Naive Bayes model for making predictions. This shows that NLP can help filter resumes effectively for job openings. Recruiters can be more confident that they are finding the best candidates, and they can save time and money using the suggested system.

The suggested system has several practical implications. First, by automating the resume screening procedure, it can save recruiters time and money. Second, by assessing resumes on the basis of their qualifications rather than on unimportant details like the candidate's name, gender, or age, it can aid in reducing prejudice in the screening process. Third, by taking a broader look at a candidate's background, education, and experience, among other things, recruiters may be able to find more qualified applicants.

This research project can be extended in two ways. First, the effectiveness of the model can be increased using better data cleaning approaches. Second, companies can use deep learning models to categorize applicants for jobs and optimize the hiring process.

REFERENCES

[1] G. Mujtaba, I. Ali, J. Ahmed, N. Mughal and Z. H. Khand, "Resume Classification System using Natural Language Processing and Machine Learning Techniques," Mehran University Research Journal of Engineering and Technology, Vol. 1, 65-79, January 2022.
[2] Rajath V, Riza Tanaz Fareed, Sharadadevi Kaganurmath, "Resume Classification and Ranking using KNN and Cosine Similarity," International Journal of Engineering Research & Technology (IJERT), Volume 10, Issue 08, August 2021.
[3] Ramos, J., et al., "Using tf-idf to determine word relevance in document queries," Proceedings of the first instructional conference on machine learning, Piscataway, NJ, pp. 133–142.
[4] Pradeep Kumar Roy, Sarabjeet Singh Chowdhary, Rocky Bhatia, "A Machine Learning approach for automation of Resume Recommendation system," Procedia Computer Science, Volume 167, 2020, Pages 2318-2327, ISSN 1877-0509.
[5] S. Pujari, "Resume Screening with Natural Language Processing in Python," Department of Computer Engineering, Vidyalankar Institute of Technology, Mumbai, India, September 2023.
[6] A. V. Chandak, H. Pandey, G. Rushiya and H. Sharma, "Resume Parser and Job Recommendation System using Machine Learning," 2024 International Conference on Emerging Systems and Intelligent Computing (ESIC), Bhubaneswar, India, 2024, pp. 157-162, doi: 10.1109/ESIC60604.2024.10481635.
[7] P. V. J, S. N. J. P, S. Gopinath, U. S and K. C.R., "Resume Analyzer and Skill Enhancement Recommender System," 2024 Asia Pacific Conference on Innovation in Technology (APCIT), Mysore, India, 2024, pp. 1-6, doi: 10.1109/APCIT62007.2024.10673530.
[8] B. Surendiran, T. Paturu, H. V. Chirumamilla and M. N. R. Reddy, "Resume Classification Using ML Techniques," 2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT), Karaikal, India, 2023, pp. 1-5, doi: 10.1109/IConSCEPT57958.2023.10169907.
[9] Gopal Kamineni, Kandula Akhil Sai, G. SivaNageswara Rao, "Resume Classification using Support Vector Machine," 2023 3rd International Conference on Pervasive Computing and Social Networking (ICPCSN), 2023.
[10] Srijita Chakraborty, Amiya Ranjan Panda, Priyal Vadiya, Sayam Samal, Ishita Gupta, Niranjan Kumar Ray, "Predicting Diabetes: A Comparative Study of Machine Learning Models," 2023 OITS International Conference on Information Technology (OCIT), 2023.
[11] D. Jagan Mohan Reddy, S. Regella and S. R. Seelam, "Recruitment Prediction using Machine Learning," 2020 5th International Conference on Computing, Communication and Security (ICCCS), Patna, India, 2020, pp. 1-4.
[12] Bhushan Kinge, Shrinivas Mandhare, Pranali Chavan, S. M. Chaware, "Resume Screening using Machine Learning and NLP: A proposed system," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 2, pp. 253-258, March-April 2022.
[13] A. Jivtode, K. Jadhav, and D. Kandhare, "Resume Analysis using Machine Learning and NLP," in International Research Journal of Modernization in Engineering Technology and Science, 2023.
[14] A. Jivtode, K. Jadhav, and D. Kandhare, "Resume Analysis using Machine Learning and NLP," in International Research Journal of Modernization in Engineering Technology and Science, 2023.