The number of job vacancies is growing rapidly. To fill them, companies hire HR managers to recruit specialists, but sorting through thousands of resumes by hand is difficult and slow. To solve this problem, we proposed our own algorithm for matching resumes with vacancies at the Talent Match hackathon.
💰 The first-place prize was 75,000 rubles. Second and third places received 50,000 and 25,000 rubles.
We received two datasets: one with labeled data and one with unlabeled data. "case_2_data_for_members.json" contained structured data in the form of a dictionary, with 5-30 suitable and unsuitable resumes for each vacancy.
- Resume columns: uuid, first_name, last_name, birth_date, country, city, about, key_skills, experienceItem, educationItem;
- Vacancy columns: uuid, name, keywords, description, comment;
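As a rough illustration, a dictionary like this could be flattened into labeled (vacancy, resume) pairs as sketched below. The record keys `vacancy`, `confirmed_resumes`, and `failed_resumes`, the list-of-records layout, and the helper `build_pairs` are assumptions made for this example; the actual layout of case_2_data_for_members.json may differ.

```python
from typing import Dict, List, Tuple


def build_pairs(records: List[Dict]) -> List[Tuple[str, str, int]]:
    """Flatten vacancy records into (vacancy_uuid, resume_uuid, label) triples.

    Assumed (hypothetical) layout of each record:
      {"vacancy": {...}, "confirmed_resumes": [...], "failed_resumes": [...]}
    """
    pairs = []
    for record in records:
        vacancy_id = record["vacancy"]["uuid"]
        for resume in record.get("confirmed_resumes", []):
            pairs.append((vacancy_id, resume["uuid"], 1))  # suitable resume
        for resume in record.get("failed_resumes", []):
            pairs.append((vacancy_id, resume["uuid"], 0))  # unsuitable resume
    return pairs


# Tiny made-up sample, only to show the shape of the output.
sample = [
    {
        "vacancy": {"uuid": "v1", "name": "Data Engineer", "keywords": "SQL"},
        "confirmed_resumes": [{"uuid": "r1", "key_skills": "Python, SQL"}],
        "failed_resumes": [{"uuid": "r2", "key_skills": "Sales"}],
    }
]
pairs = build_pairs(sample)
```

Each triple can then be joined back to the resume and vacancy fields listed above to build the text fed to the model.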
Uniqueness: before the hackathon, we collected a large dataset of resumes and vacancies for triplet loss training. It was also used by the key_words function, which extracts the main words from vacancies (filling in gaps).
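For readers unfamiliar with triplet loss, its standard form is max(0, d(a, p) - d(a, n) + margin), where a is an anchor (for example a vacancy), p is a matching resume, and n is a non-matching one. Here is a minimal plain-Python sketch of that formula. The Euclidean distance, the margin of 1.0, and the toy vectors are assumptions for illustration, not necessarily the settings used in our training.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value, for illustration only
) -> float:
    """Standard triplet loss: zero once the negative is farther than the positive by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
far_point = [3.0, 4.0]   # distance 5 from the anchor
near_point = [0.0, 1.0]  # distance 1 from the anchor

# Positive is far and negative is near: the triplet is violated, loss is positive.
hard = triplet_loss(anchor, positive=far_point, negative=near_point)
# Positive is near and negative is far: the triplet is already satisfied, loss is zero.
easy = triplet_loss(anchor, positive=near_point, negative=far_point)
```

Minimizing this loss pulls matching vacancy-resume pairs together in embedding space and pushes non-matching ones apart.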
We tried several methods and models 🌎, but settled on the Siamese model 👽 described in an article. How our solution works:
- Preprocessing, lemmatization, tokenization
- Passing the data through distil-roBerta to obtain embeddings
- Training a Siamese neural network with 2 branches and a final classifier layer
Conclusion: a good core model that gave Precision=0.33, Recall=0.8, Validation loss=0.68.
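The idea behind the last step of the pipeline above can be sketched as a forward pass in plain Python. Both inputs go through one shared branch, and a final classifier layer turns the comparison into a match probability. This is a toy sketch under stated assumptions: in the real pipeline the inputs are distil-roBerta embeddings and the weights are learned. Here the vectors and weights are hand-picked, and combining the branches by absolute difference followed by a sigmoid is an assumed design, not a description of our exact architecture.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def branch(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One branch: a linear layer followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def siamese_score(
    vacancy_emb: Vector,
    resume_emb: Vector,
    weights: Matrix,
    bias: Vector,
    clf_weights: Vector,
    clf_bias: float,
) -> float:
    """Return a match probability in (0, 1) for a vacancy/resume pair."""
    # Both inputs pass through the SAME weights: this sharing is what makes it Siamese.
    v = branch(vacancy_emb, weights, bias)
    r = branch(resume_emb, weights, bias)
    # Assumed way to merge the two branches: element-wise absolute difference.
    diff = [abs(a - b) for a, b in zip(v, r)]
    # Final classifier layer: linear + sigmoid.
    logit = sum(w * d for w, d in zip(clf_weights, diff)) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hand-picked toy parameters (hypothetical, not trained).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
clf_w = [-2.0, -2.0]  # larger differences lower the match probability
clf_b = 1.0

vacancy = [1.0, 0.5, 0.0]
close_resume = [1.0, 0.5, 0.3]
far_resume = [0.0, 3.0, 0.0]

p_close = siamese_score(vacancy, close_resume, W, b, clf_w, clf_b)
p_far = siamese_score(vacancy, far_resume, W, b, clf_w, clf_b)
```

With these toy weights the resume whose projected embedding is close to the vacancy gets the higher probability. Training adjusts the shared weights and the classifier so that this holds for real labeled pairs.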
We built a base NN for further upgrades and in the end took 2nd place 🥈.
(One job description and the top 2 suitable resumes returned by the model 🔎)
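Picking the top resumes for a vacancy amounts to scoring every candidate and keeping the best k. Here is a minimal sketch of that idea, assuming cosine similarity between embeddings as a stand-in for the model's match score. The helper names `cosine` and `top_k_resumes` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_resumes(
    vacancy_emb: Sequence[float],
    resume_embs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k resume ids with the highest score, best first."""
    scored = [(rid, cosine(vacancy_emb, emb)) for rid, emb in resume_embs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings, for illustration only.
vacancy = [1.0, 0.0]
resumes = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [1.0, 1.0]}
top2 = top_k_resumes(vacancy, resumes, k=2)
top2_ids = [rid for rid, _ in top2]
```

Swapping `cosine` for the probability produced by a trained classifier gives the same ranking logic with a learned score.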