Find the job you actually want using AI.
Access here: https://intersect.streamlit.app
Intersect (web app) is a job-search tool that uses NLP to reorder job postings by semantic similarity rather than by traditional keyword matching. Unlike lexical search (BM25), which relies on exact word matches, semantic search uses dense vectors to represent meaning (Boykis, 2023; Mitchell, 2019; Schmidt, 2015), so results can be personalized to user-provided text. By offering several information retrieval methods (original ranking, semantic search, lexical search, semantic delta, reranking), Intersect aims to improve job discovery and reduce manual effort.
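To make the contrast with keyword matching concrete, here is a minimal sketch of reordering by embedding similarity. The three-dimensional vectors and job titles are made-up stand-ins for real embeddings, and `rank_by_similarity` is a hypothetical helper, not a function from Intersect's codebase.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query_vec, job_vecs):
    """Return job indices ordered from most to least similar to the query."""
    scores = [dot(query_vec, vec) for vec in job_vecs]
    return sorted(range(len(job_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors (assumed values for illustration, not real model output).
titles = ["Backend Engineer", "Data Scientist", "ML Engineer"]
job_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
cv_vec = [1.0, 0.0, 0.0]  # stand-in for the embedded user text

for i in rank_by_similarity(cv_vec, job_vecs):
    print(titles[i], round(dot(cv_vec, job_vecs[i]), 2))
```

If the vectors have unit length, the dot product equals cosine similarity, so no separate normalization step is needed.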
Intersect uncovers non-obvious job opportunities by enhancing traditional search methods with NLP. The methods produce noticeably different orderings, which suggests that a hybrid approach combining keyword, semantic, and reranking techniques could yield the best results.
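One possible way to combine rankings is reciprocal rank fusion (RRF). This is a swapped-in illustration, not something Intersect is stated to implement: the function name, the job IDs, and the constant `k=60` (a commonly used value) are all assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of job IDs into one ranking.

    Each job earns 1 / (k + position) from every list it appears in,
    with positions starting at 1; higher totals rank first.
    """
    scores = {}
    for ranking in rankings:
        for position, job_id in enumerate(ranking, start=1):
            scores[job_id] = scores.get(job_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda job_id: scores[job_id], reverse=True)


# Hypothetical orderings from a lexical and a semantic search.
lexical = ["job_a", "job_b", "job_c"]
semantic = ["job_c", "job_a", "job_b"]
fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

RRF needs only rank positions, so it sidesteps the problem that BM25 scores and embedding similarities live on different scales.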
Intersect involves:
- Fetching job listings via APIs (currently the Reed API) and vectorizing results with OpenAI's `text-embedding-3-small`.
- Capturing user input (text or PDF CV) and reordering results by computing similarity via dot product.
- Visualizing clusters using UMAP and HDBSCAN.
- Displaying original ranking from the job API.
- Reordering results using BM25 (lexical search).
- Reordering results using semantic search (embedding similarity).
- Identifying semantic delta (jobs that rank differently between lexical and semantic search).
- Reranking with Cohere's cross-encoder.
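
The semantic delta step in the list above can be sketched as a comparison of rank positions. How Intersect defines the delta exactly is not stated here, so the signed position difference below is an assumption, and `semantic_delta` is a hypothetical name.

```python
def semantic_delta(lexical_ranking, semantic_ranking):
    """Return (job_id, delta) pairs, largest delta first.

    delta = lexical position - semantic position, so a positive value
    means semantic search ranks the job higher than lexical search does.
    """
    lex_pos = {job: i for i, job in enumerate(lexical_ranking)}
    sem_pos = {job: i for i, job in enumerate(semantic_ranking)}
    # Only jobs present in both rankings can be compared.
    deltas = {job: lex_pos[job] - sem_pos[job] for job in lex_pos if job in sem_pos}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings of the same four postings.
lexical = ["job_a", "job_b", "job_c", "job_d"]
semantic = ["job_d", "job_a", "job_b", "job_c"]
print(semantic_delta(lexical, semantic)[0])
```

Jobs at the top of this list are the ones a keyword search would bury but an embedding comparison surfaces.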

It is built with:
- web development
  - `uv`: environment and dependency management
  - `streamlit`: web framework (frontend and backend) and hosting
  - `pypdf`: PDF CV parsing
- data science
  - semantic search: OpenAI's `text-embedding-3-small`
  - lexical search: `bm25s` (Lucene method)
    - preprocessing (tokenizer, stemmer, stop words)
  - visualization: UMAP + HDBSCAN (`umap-learn`, `hdbscan`)
  - reranker: Cohere's reranking model (`rerank-v3.5`)
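
The lexical search entry above can be illustrated with a small pure-Python BM25 scorer. This is a sketch rather than the `bm25s` implementation: the formula follows the Lucene variant as it is commonly described, `k1=1.5` and `b=0.75` are assumed parameter values, the stop-word list is a tiny placeholder, and stemming is left out for brevity.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # placeholder list


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query (assumes a non-empty corpus)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Longer-than-average documents are penalized through this term.
        length_norm = k1 * (1 - b + b * len(toks) / avg_len)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical job titles standing in for full postings.
docs = ["python data scientist", "senior java developer", "python developer"]
print(bm25_scores("python", docs))
```

A posting that never mentions a query term scores zero here, which is exactly the limitation that semantic search is meant to address.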

References:
- Boykis, V. (2023). What are embeddings? Retrieved from https://github.com/veekaybee/what_are_embeddings
- Mitchell, M. (2019). Artificial Intelligence: A Guide for Thinking Humans. Pelican Books.
- Sanseviero, O. (2024). Sentence Embeddings. Cross-encoders and Re-ranking. hackerllama. Retrieved from https://osanseviero.github.io/hackerllama/blog/posts/sentence_embeddings2/
- Schmidt, B. (2015). Vector Space Models for the Digital Humanities. Bookworm. Retrieved from https://bookworm.benschmidt.org/posts/2015-10-25-Word-Embeddings.html
- Sun, W., Yan, L., Ma, X., Wang, S., Ren, P., Chen, Z., Yin, D., & Ren, Z. (2024). Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents (arXiv:2304.09542). arXiv. https://doi.org/10.48550/arXiv.2304.09542