A Python toolbox for gaining geometric insights into high-dimensional data
Updated Jul 10, 2025 (Python)
This project is an unsupervised NLP-based recipe recommender system designed to provide personalized recipe suggestions. The system employs content-based filtering techniques, utilizing cosine similarity to measure the resemblance between user inputs and a database of recipes.
The project covers text vectorization and handling big data by merging and cleaning the text and selecting the required columns, while boosting performance through feature extraction and parameter tuning for a neural network. It compares the performance of different models, treating the problem as both classification and regression.
📖 Uses Bi-Normal Separation to build document vectors, which are then used to compute similarity between shorter sentences.
Given a document, identifies the closest documents within a list of documents using a TF-IDF matrix and cosine similarity.
🚀 Course Recommendation System is a machine learning-powered web application designed to recommend similar courses from Coursera's vast dataset of over 3,000 courses. Built using Python, Scikit-learn, and Streamlit, the app preprocesses course data, applies text vectorization, and leverages cosine similarity to offer personalized recommendations.
Comment Sentiment Analysis using Deep Learning
Word Factor Vectors
Experiments in sentiment analysis using ML algorithms (Logistic Regression and Naive Bayes) with TF-IDF, one-hot encoding, and bag-of-words vectorization; MLP and RNN models (LSTM, GRU, and Bidirectional LSTM); and, lastly, the state-of-the-art BERT model.
A Taylor Swift song recommender built as a project for a Natural Language Processing course. It draws on topics such as sentiment analysis of text, text vectorization, and stopword removal.
Evaluation of the accuracy of vectorization and text classification methods
This project analyzes the content of a collection of BBC news articles to extract key concepts and identify the major themes and topics discussed across them.
A simple Python script for transforming a corpus of documents into text vectors suitable for visualization
A deep learning project that helps classify toxic comments, determining whether a comment is positive or not.
Text Classification of Legitimate and Rogue Online Privacy Policies: A manual analysis and an experimental procedure
Movie recommender based on content-based filtering.
Resume Matcher: A Streamlit app that compares resumes with job descriptions, highlights matched and missing skills, and calculates text similarity and skill coverage.
A diploma project focused on vectorizing scientific texts using the Top2Vec algorithm, with the aim of analyzing thematic groups, identifying trends, and visualizing the dynamics of interest in various topics in the field of computer science.
The repository contains notebooks created for collecting and preprocessing the corpus of diary entries and for experiments on creating models for predicting gender, age groups of authors and the time period of text creation.
Homework and final project for the Infosearch course
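Several of the projects listed above describe the same core step: turn each text into a TF-IDF vector, then rank other texts by cosine similarity. The sketch below illustrates that idea using only the Python standard library. It is not taken from any of the listed repositories; the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `closest_documents`), the whitespace tokenizer, the smoothed IDF formula, and the sample texts are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant): terms found in every document
    # still keep a small positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_documents(query_index, docs, top_k=2):
    """Rank all other documents by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [(i, cosine_similarity(query, vec))
              for i, vec in enumerate(vectors) if i != query_index]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical sample texts, standing in for recipes, courses, or articles.
docs = [
    "chicken curry with rice",
    "spicy chicken curry",
    "chocolate cake with cream",
]
ranked = closest_documents(0, docs)
print(ranked)  # list of (document index, similarity score), best match first
```

Real projects would typically swap the hand-rolled pieces for a library vectorizer and a proper tokenizer with stopword removal; the sketch only shows how the vectorization and ranking steps fit together.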