key-r-code/drexel-co-op-matcher: Streamline your co-op search with LLMs!

About

Tired of manually scrolling through every co-op posting on DrexelOne within a single week? Now you don't have to!

                          ,     \    /      ,
                         / \    )\__/(     / \
                        /   \  (_\  /_)   /   \
     __________________/_____\__\@  @/___/_____\_________________
     |                          |\../|                          |
     |                           \VV/                           |
     |             D-R-E-X-E-L C-O-O-P M-A-T-C-H-E-R            |                      
     |__________________________________________________________|
                   |    /\ /      \\       \ /\    |
                   |  /   V        ))       V   \  |
                   |/     `       //        '     \| 
                   `              V                '

How it works

1. Web Scraping

- Uses Selenium to log into DrexelOne and scrape the available co-op postings.
- Saves each page as a static HTML file in a directory.
- Uses BeautifulSoup to parse the saved HTML pages into a single JSON file.
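The parsing step can be sketched as follows. This is a minimal example, not the code in parsing_htmls.py. The CSS selectors (`h1`, `.job-description`, `.qualifications`), the `source_file` key and the function names are hypothetical placeholders, because the real DrexelOne markup is not shown here.

```python
import json
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical CSS selectors: replace them with ones matching the real DrexelOne pages.
SELECTORS = {
    "title": "h1",
    "description": ".job-description",
    "qualifications": ".qualifications",
}


def parse_posting(html: str) -> dict:
    """Extract one posting's fields from a saved HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    posting = {}
    for field, selector in SELECTORS.items():
        element = soup.select_one(selector)
        # A missing element becomes an empty string instead of raising.
        posting[field] = element.get_text(" ", strip=True) if element else ""
    return posting


def parse_directory(html_dir: str, output_file: str) -> list[dict]:
    """Parse every .html file in html_dir and write all postings to one JSON file."""
    postings = []
    for path in sorted(Path(html_dir).glob("*.html")):
        posting = parse_posting(path.read_text(encoding="utf-8"))
        posting["source_file"] = path.name  # hypothetical key, useful for debugging
        postings.append(posting)
    Path(output_file).write_text(json.dumps(postings, indent=2), encoding="utf-8")
    return postings
```

For example, `parse_directory("html_pages", "postings.json")` would parse every `.html` file in a folder named `html_pages` (a hypothetical name) into one JSON list. Because missing fields become empty strings, one unusual page does not stop the whole run.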

2. LLM Pipeline

Data Preparation:
- Reads the scraped co-op postings from the JSON file and extracts key information (title, description, qualifications).
- Reads a user's resume from a PDF file, extracting all text content.
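Reading the postings back can be sketched like this. The sketch assumes the JSON file is a list of objects with `title`, `description` and `qualifications` keys; those key names and the `load_postings` helper are hypothetical. Resume extraction is not included because the PDF library the project uses is not named here.

```python
import json
from pathlib import Path

# Hypothetical key names: match them to the JSON produced by parsing_htmls.py.
KEY_FIELDS = ("title", "description", "qualifications")


def load_postings(json_path: str) -> list[str]:
    """Return one text blob per posting, built from its key fields."""
    postings = json.loads(Path(json_path).read_text(encoding="utf-8"))
    texts = []
    for posting in postings:
        # Skip missing or empty fields so they do not add blank sections.
        parts = [
            f"{field.capitalize()}: {posting[field]}"
            for field in KEY_FIELDS
            if posting.get(field)
        ]
        texts.append("\n".join(parts))
    return texts
```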
Embedding & Indexing:
- Uses the Google Gemini embedding model (models/embedding-001) to generate vector embeddings for both the resume and the co-op postings.
- Splits the text into chunks before embedding it. This yields more meaningful representations than embedding each long document in one piece.
- Averages the chunk embeddings of each co-op posting into a single vector.
- Builds a FAISS index from the embeddings of all the scraped co-op postings.
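Here is a minimal sketch of chunking and averaging, assuming fixed-size character windows. The parameter names follow `chunk_size` and `chunk_overlap` from the TO DO list, but the default values and the splitting strategy are assumptions. `embed` is a hypothetical callable that stands in for a call to the Gemini embedding model.

```python
from typing import Callable, Sequence

Vector = list[float]


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def average_embedding(chunks: Sequence[str], embed: Callable[[str], Vector]) -> Vector:
    """Embed every chunk and return the element-wise mean vector."""
    if not chunks:
        raise ValueError("need at least one chunk")
    vectors = [embed(chunk) for chunk in chunks]
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Averaging gives each posting one fixed-length vector, however many chunks it was split into. The postings can then be stored one row each in the index.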
Similarity Search:
- Uses the FAISS index to perform a similarity search, finding the top k co-op postings that are most similar to the resume embedding.
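To show what the similarity search computes, here is an exhaustive nearest-neighbour search in plain Python. It is a stand-in for FAISS, not the project's code. It assumes a flat L2 index, since the index type used in the notebook is not stated. `top_k_similar` is a hypothetical name.

```python
import heapq
from typing import Sequence


def top_k_similar(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 5
) -> list[tuple[int, float]]:
    """Return (index, squared L2 distance) pairs for the k vectors nearest to query.

    Every stored vector is compared with the query (exhaustive search). This is
    what a flat, non-approximate FAISS index such as IndexFlatL2 does.
    """

    def squared_l2(vector: Sequence[float]) -> float:
        if len(vector) != len(query):
            raise ValueError("all vectors must have the same dimension as the query")
        return sum((q - x) ** 2 for q, x in zip(query, vector))

    scored = ((index, squared_l2(vector)) for index, vector in enumerate(vectors))
    # nsmallest returns the k lowest-distance pairs, nearest first.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

FAISS's `IndexFlatL2` also reports squared L2 distances. It performs the same comparison with optimized vector code, which matters once the number of postings grows.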
LLM Ranking:
- Constructs a prompt for a Google Gemini Pro model, including:
  - The full text content of the user's resume.
  - The top k co-op postings that are most similar to the user's resume based on the FAISS search.
- Outputs the top positions the user should apply for.
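The ranking prompt can be assembled with plain string formatting. The wording below, the `build_ranking_prompt` name and the `top_n` parameter are hypothetical, and the notebook's actual prompt may differ. The call that sends the prompt to the Gemini Pro model is omitted.

```python
def build_ranking_prompt(resume_text: str, postings: list[str], top_n: int = 5) -> str:
    """Combine the resume and the retrieved postings into one ranking prompt."""
    # Number the postings so the model can refer to them unambiguously.
    numbered = "\n\n".join(
        f"Posting {i}:\n{posting}" for i, posting in enumerate(postings, start=1)
    )
    return (
        "You are helping a Drexel student choose co-op positions.\n\n"
        f"Resume:\n{resume_text}\n\n"
        f"Candidate co-op postings:\n{numbered}\n\n"
        f"List the {top_n} postings the student should apply for, "
        "best match first, with one sentence explaining each choice."
    )
```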

Usage

Clone this repository:

git clone https://github.com/key-r-code/drexel-co-op-matcher.git
cd drexel-co-op-matcher

Set up a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate  # On Windows

Install dependencies:

pip install -r requirements.txt

Create a .env file containing your Gemini API key (see .env.example for the format):

touch .env
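The key in .env has to reach the code at run time. Projects often use the python-dotenv package for this. The sketch below is a standard-library stand-in that only handles simple `KEY=VALUE` lines. The variable name `GEMINI_API_KEY` is an assumption, so check .env.example for the name the project expects.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ (existing values win)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and lines without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip surrounding quotes, e.g. KEY="abc" becomes abc.
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def get_api_key(name: str = "GEMINI_API_KEY") -> str:
    """Return the API key, or fail with a clear message if it is missing."""
    load_env_file()
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```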

Place your resume PDF in the repository directory.
Add your Drexel credentials to main.py.
In main.py, create a list of the majors you are interested in. See majors.json for all the major abbreviations used by the portal.
dragonScraper.py currently uses the Safari WebDriver. Uncomment lines 17-23 to use Chrome or Firefox instead.
Run main.py:

python3 main.py

This creates a subdirectory and saves all the static HTML files in it.

If you have used the scraper before, change the name of the HTML directory in dragonScraper.py before running it again.
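If you would rather not rename the output directory by hand, one option is to give each run its own timestamped directory. This is a hypothetical helper, not part of dragonScraper.py. The `coop_htmls` base name is a placeholder.

```python
from datetime import datetime
from pathlib import Path


def fresh_output_dir(base: str = "coop_htmls") -> Path:
    """Create and return a directory whose name has not been used yet."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    candidate = Path(f"{base}_{stamp}")
    suffix = 1
    # Two runs within the same second would collide, so append a counter.
    while candidate.exists():
        candidate = Path(f"{base}_{stamp}_{suffix}")
        suffix += 1
    candidate.mkdir(parents=True)
    return candidate
```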

Run parsing_htmls.py:

python3 parsing_htmls.py

This will parse all the HTML files and create a single JSON file.

Finally, run the gemini-analysis-starter-nb.ipynb notebook.

TO DO:

dragonScraper.py

~~Add pagination handling~~
~~Add upcoming co-op postings and previously applied co-ops~~
Replace all time.sleep() calls with self.wait.until
~~Add Chrome and Firefox support (currently only supports Safari WebDriver)~~

LLM-pipeline

Add a geminiPipeline class
Find optimal chunk_size and chunk_overlap

CLI App

Add API key handling
Add A/B/C round navigation
Add major navigation
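The planned CLI features could map onto command-line flags along these lines. This is a sketch using the standard-library argparse module. The program name, flag names and defaults are hypothetical, and the navigation logic behind them is not shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command-line interface for the planned CLI app."""
    parser = argparse.ArgumentParser(prog="coop-matcher")
    # argparse rejects any round other than A, B or C.
    parser.add_argument(
        "--round", choices=["A", "B", "C"], default="A",
        help="co-op round to browse",
    )
    parser.add_argument(
        "--majors", nargs="+", default=[],
        help="major abbreviations, as listed in majors.json",
    )
    parser.add_argument(
        "--api-key-env", default="GEMINI_API_KEY",
        help="name of the environment variable that holds the Gemini API key",
    )
    return parser
```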

Contributions

Contributions are welcome! To contribute:

1. Fork the repository.
2. Create a new branch (git checkout -b feature/my-new-feature).
3. Make your changes.
4. Commit your changes (git commit -am 'Add new feature').
5. Push to the branch (git push origin feature/my-new-feature).
6. Create a pull request.
