Implementations of Popular Static Word Embedding Techniques
Updated Jun 5, 2025 - Jupyter Notebook
Lightweight semantic chunking library. Plug in any embedding provider/API. Batches embeddings for efficiency and to handle API rate limits.
RemEz is a descriptive-question-based learning platform built for students in highly theoretical subjects. The frontend and backend of this platform are built with the MERN stack and Tailwind. This repository contains NLP code for PDF processing and descriptive QA generation via an LLM, along with a similarity assessment of two descriptive answers.
Sentence Transformer model fine-tuned on the mtsb dataset to generate embeddings of Nepali sentences. Try it here:
Word mini-game: guess the secret word! Play here:
Learning project: modular RAG pipeline for legal document search & Q&A using SBERT, Pinecone, and FastAPI.
Data Collection repository for Reverse Search Engine
Sentiment analysis using machine learning classifiers SVM and MLP to investigate potential gender biases in the provided dataset.
Dive into LangChain, a powerful platform that lets you interact with your data like never before. This guide offers insights on its unique capabilities, helping you tap into your data in conversational ways.
Molecular substructure graph attention network for molecular property identification in drug discovery. This is the starting point for my thesis project and is a fork of the repository for the paper https://doi.org/10.1016/j.patcog.2022.108659
✍️ Convert class slides into beautiful markdown notes!
A chatbot that parses your PDF files and answers your questions around that file using GPT
Demonstrating RAG with Streamlit.
A Streamlit application for anyone who wants to chat with their documents.
Multilingual toolkit for evaluating LLMs using embeddings
Analysis of embeddings and age biases in image generation models using CLIP, DINO, ResNet and Stable Diffusion XL
Performed feature extraction and similarity lookup on Caltech101.
A research project on visualising and processing large collections of sentiment data.
A Python-based semantic search system to find relevant transcript chunks based on user queries. Supports TF-IDF and Hugging Face LLM (llm2) search methods, with a Streamlit web interface and CLI for interactive querying. Outputs results in the format [timestamp], <chunk> and logs them to output/output.txt.
Exploring how to build an application in which an LLM is prompted with additional context from a custom-managed knowledge bank of data.
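Many of the projects listed above rank items by how similar their embedding vectors are. A minimal sketch of that kind of lookup follows, assuming cosine similarity as the metric and using made-up three-dimensional vectors. The function names (`cosine_similarity`, `top_k`) and the toy data are hypothetical and are not taken from any of the listed repositories.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=2):
    """Return the k (label, score) pairs most similar to the query vector."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy embeddings; real ones would come from an embedding model.
corpus = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
for label, score in results:
    print(f"{label}: {score:.3f}")
```

In practice the vectors would have hundreds of dimensions and the linear scan would usually be replaced by a vector index, but the ranking logic stays the same.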