DeepMatch is an AI-powered pipeline that extracts structured data from resumes and job descriptions using NER and transformer embeddings to compute semantic similarity and streamline candidate-job matching.

prakadeesh01/deepmatch-x

💼 DeepMatch: Transformer-Based Resume–JD Matching

DeepMatch is a pipeline for matching resumes to job descriptions using Named Entity Recognition (NER) and transformer-based embeddings. It extracts structured data, computes semantic similarity, and automates candidate screening. 🚀


🌟 Overview

DeepMatch uses NER models to extract structured entities (e.g., skills, experience, degrees) from resumes and job descriptions, then compares them using dense vector embeddings. It supports semantic matching, skill relevance scoring, and automated candidate ranking, which makes it suited to HR automation workflows.


🔍 Features

🧠 Named Entity Recognition (NER)

Extracts structured information from resumes and job descriptions.

Models Used:

  • spaCy

    • en_core_web_sm (pretrained, lightweight)
    • en_core_web_trf (transformer-based, high accuracy)
    • Custom-trained spaCy model on resume NER data
  • Hugging Face Transformers

    • bert-base-cased
    • distilbert-base-uncased

Entities Extracted:

  • Name
  • Email
  • Phone
  • Location
  • Degree
  • Designation
  • Company
  • Years of Experience
  • Skills
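
Some of these entities, such as Email and Phone, follow regular surface patterns. The sketch below is a rule-based stand-in using only the standard library; it is not the spaCy or Hugging Face models listed above, and the function name and regular expressions are illustrative assumptions rather than code from this repository.

```python
import re

# Deliberately simple patterns for illustration; real resumes need broader rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contact_entities(text: str) -> dict[str, list[str]]:
    """Return Email and Phone candidates found in raw resume text."""
    return {
        "Email": EMAIL_RE.findall(text),
        "Phone": [match.strip() for match in PHONE_RE.findall(text)],
    }


sample = "Jane Doe | jane.doe@example.com | +1 (555) 010-2345 | Chennai"
print(extract_contact_entities(sample))
```

Entities without a fixed surface form (Skills, Designation, Company) are where a trained NER model is needed; patterns like these only complement it.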

📊 Embedding Models

Converts entity-level text into dense vectors for semantic comparison.

Models Supported:

  • all-MiniLM-L6-v2
  • paraphrase-MiniLM-L12-v2
  • sentence-t5-base
  • sentence-t5-large

Embedding Modes:

  • Per-Entity: Individual embeddings for each entity
  • Combined: Joint embeddings for concatenated entities
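
The difference between the two modes is mainly in how the input text is assembled before encoding. The following is a minimal sketch under stated assumptions: the function names, the `" | "` separator, and `toy_encode` are hypothetical, and the encoder is passed in as a callable so the example runs without downloading a model.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Encoder = Callable[[list[str]], Sequence[Vector]]


def per_entity_embeddings(entities: dict[str, str], encode: Encoder) -> dict[str, Vector]:
    """Per-Entity mode: one vector per entity field."""
    labels = list(entities)
    vectors = encode([entities[label] for label in labels])
    return dict(zip(labels, vectors))


def combined_embedding(entities: dict[str, str], encode: Encoder) -> Vector:
    """Combined mode: one vector for the concatenated entity text."""
    joined = " | ".join(f"{label}: {value}" for label, value in entities.items())
    return encode([joined])[0]


def toy_encode(texts: list[str]) -> list[list[float]]:
    """Stand-in encoder (text length, vowel count); swap in a real model."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t.lower()))] for t in texts]


resume = {"Skills": "Python, spaCy", "Degree": "B.Tech"}
print(per_entity_embeddings(resume, toy_encode))
print(combined_embedding(resume, toy_encode))
```

With the sentence-transformers package installed, `SentenceTransformer("all-MiniLM-L6-v2").encode` can be passed as `encode`, since it accepts a list of strings and returns one vector per string.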

📏 Similarity Scoring

Measures alignment between resumes and job descriptions.

Metrics:

  • Cosine Similarity (default)
  • Dot Product (alternative)
  • Euclidean Distance (optional)
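
For reference, the three metrics can be written in a few lines of plain Python. This is a sketch of the standard definitions, not the project's implementation (the Tech Stack lists scikit-learn and SciPy for this step); the function names are chosen here for illustration.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; sensitive to vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return dot_product(a, b) / norm if norm else 0.0


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; lower means closer, unlike the two scores above."""
    return math.dist(a, b)


resume_vec = [0.2, 0.7, 0.1]
jd_vec = [0.3, 0.6, 0.2]
print(cosine_similarity(resume_vec, jd_vec))
print(dot_product(resume_vec, jd_vec))
print(euclidean_distance(resume_vec, jd_vec))
```

Note that cosine similarity and dot product coincide when both vectors are normalized to unit length, and that Euclidean distance must be inverted or negated before it can be used to rank matches from best to worst.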

Scoring Options:

  • Per-entity similarity for granular insights
  • Joint profile-level comparison for overall match
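
The two scoring options can be sketched as follows, assuming per-entity vectors are stored in dictionaries keyed by entity label and each profile also has one combined vector. The names `per_entity_scores` and `joint_score` are hypothetical, and cosine similarity is used because it is the stated default.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def per_entity_scores(resume_vecs: dict[str, Vector], jd_vecs: dict[str, Vector]) -> dict[str, float]:
    """Score each entity label that appears in both the resume and the JD."""
    shared = resume_vecs.keys() & jd_vecs.keys()
    return {label: cosine(resume_vecs[label], jd_vecs[label]) for label in sorted(shared)}


def joint_score(resume_combined: Vector, jd_combined: Vector) -> float:
    """Single profile-level score from the two combined embeddings."""
    return cosine(resume_combined, jd_combined)


# Toy vectors for illustration only, not real model output.
resume_vecs = {"Skills": [1.0, 0.0], "Degree": [0.0, 1.0]}
jd_vecs = {"Skills": [2.0, 0.0], "Degree": [1.0, 0.0], "Location": [1.0, 1.0]}
print(per_entity_scores(resume_vecs, jd_vecs))
print(joint_score([0.5, 0.5], [0.4, 0.6]))
```

Per-entity scores show which fields drive a match or mismatch, while the joint score gives one number that is easier to sort candidates by.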

🧪 Example Outputs

  • Sample NER Output
  • Similarity Score Heatmap

(Files available in the output/ folder.)


⚙️ Setup Instructions

Get DeepMatch up and running with these simple steps:

🔁 1. Clone the Repository

git clone https://github.com/prakadeesh01/deepmatch-x.git
cd deepmatch-x

📦 2. Install Dependencies

pip install -r requirements.txt

▶️ 3. Run the Notebooks

jupyter notebook


📁 Data Notes

Input:
Place resumes and job descriptions in the data/ folder.
Supported Formats: .pdf, .docx

Output:
NER results, embeddings, and similarity scores are saved in output/.

Privacy:
To protect personal information, most of the repository contains no real resume data.
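
As a small illustration of the input convention above, the sketch below collects the supported files from a data folder. The helper names are assumptions made for this example, and the notebooks may discover files differently.

```python
from pathlib import Path

# Formats listed under "Supported Formats" above.
SUPPORTED_SUFFIXES = {".pdf", ".docx"}


def is_supported(path: str) -> bool:
    """True if the file extension is one of the supported formats (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def find_input_documents(data_dir: str = "data") -> list[Path]:
    """Return supported files under data_dir, sorted; empty if the folder is missing."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and is_supported(p.name))


for document in find_input_documents("data"):
    print(document)
```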


💼 Use Cases

DeepMatch powers a range of HR and recruitment solutions:

  • ✅ Resume Screening Systems: Automate candidate evaluation with precision.
  • ✅ Job Recommendation Engines: Match candidates to ideal roles.
  • ✅ Candidate–Job Fit Matching: Rank candidates by semantic alignment.
  • ✅ Automated Skill Gap Analysis: Identify areas for upskilling.
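
Two of these use cases reduce to simple operations once skills and similarity scores are available. The sketch below is an assumption-laden simplification: `skill_gap` compares skills by exact string match after normalization (an embedding-based comparison would also catch synonyms), and `rank_candidates` sorts hypothetical candidate scores from best to worst.

```python
def skill_gap(candidate_skills: list[str], required_skills: list[str]) -> list[str]:
    """Required skills the candidate does not list (case-insensitive exact match)."""
    have = {skill.strip().lower() for skill in candidate_skills}
    need = {skill.strip().lower() for skill in required_skills}
    return sorted(need - have)


def rank_candidates(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Candidates ordered from highest to lowest similarity score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(skill_gap(["Python", " spaCy "], ["python", "SQL", "Docker"]))
print(rank_candidates({"candidate_a": 0.4, "candidate_b": 0.9, "candidate_c": 0.7}))
```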

🛠️ Tech Stack

  • Languages: Python 3.9+
  • NER: spaCy, Hugging Face Transformers
  • Embeddings: SentenceTransformers, T5
  • Similarity Metrics: scikit-learn, SciPy
  • Environment: Jupyter Notebooks, VS Code

📜 License

This project is licensed under the MIT License.


👨‍💻 Author

Prakadeesh K S
GitHub: @prakadeesh01


🙏 Acknowledgements

  • spaCy for robust NER capabilities
  • Hugging Face Transformers for pretrained language models
  • SentenceTransformers for efficient semantic embeddings
