Programs from the "Natural Language Processing and Computer Vision" subject of CDAC Pune's PGDAI course, conducted in November - December 2025
Updated Dec 16, 2025 - Jupyter Notebook
Heuristic matching of records in large databases by fuzzy criteria such as addresses
Turn messy survey responses into clean research insights. Dual-model pipeline: Claude Opus 4.5 extracts themes and assigns participants, GPT-5.1 writes executive summaries. Tuned temperatures for precision where it matters.
Data analysis of a wine dataset.
A comprehensive news aggregation and text analysis system that leverages advanced machine learning techniques to process Vietnamese news articles.
A modern tool for detecting intra-domain textual ambiguities using word sense disambiguation techniques. Built with FastAPI (backend) and Next.js (frontend) for a modular and modern developer experience.
Repository of Natural Language Processing project at Polytechnic of Milan. Generative chatbots, with audio, images and RAG.
TopicGPT integrates the benefits of LLMs into topic modelling
Bilibili channel "AI日日新": irregularly updated examples of machine learning, deep learning, and data science tasks completed with Python frameworks
Modular pipeline for text clustering, classification, and evaluation using TF-IDF and unsupervised ML techniques
The repository contains files (notebooks, data) for the course work of the 2nd course: "Topic modeling for text document analysis".
SLS: Neural Information Retrieval (IR)-based semantic search model
Parallel clustering-based Topic Modeling
GSDMM Short Text Clustering via Dirichlet Mixture Models
Automated Leaderboard System for Hackathon Evaluation Using Large Language Models
Performs unsupervised clustering on text documents.
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
The repository provides a pipeline for preprocessing text data, extracting features, and applying clustering algorithms like K-means, DBSCAN, or hierarchical clustering.
FastThresholdClustering is an efficient vector clustering algorithm based on FAISS, particularly suitable for large-scale vector data clustering tasks. The algorithm features intuitive and easy-to-select hyperparameters, uses cosine similarity as its distance metric, and supports GPU acceleration.
Materials for the practical part of the Text Mining lecture
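Several of the entries above describe the same basic pipeline: preprocess text, extract TF-IDF features, then group documents by cosine similarity. The following is a minimal standard-library sketch of that idea, not the implementation of any repository listed here. The function names (`tokenize`, `tfidf_vectors`, `threshold_cluster`), the greedy single-pass clustering rule, the smoothed IDF formula, and the `0.2` threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption): terms found everywhere keep a small weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def threshold_cluster(vectors, threshold=0.2):
    """Greedy single-pass clustering (illustrative only): a document joins the
    first cluster whose seed document is similar enough, else starts a new one."""
    seeds = []   # index of the first document in each cluster
    labels = []
    for vec in vectors:
        for cluster_id, seed in enumerate(seeds):
            if cosine(vec, vectors[seed]) >= threshold:
                labels.append(cluster_id)
                break
        else:
            seeds.append(len(labels))      # current document becomes a new seed
            labels.append(len(seeds) - 1)
    return labels


docs = [
    "The cat sat on the mat",
    "A cat and a kitten sat on a mat",
    "Stock markets fell sharply today",
    "Markets and stock prices fell again",
]
labels = threshold_cluster(tfidf_vectors(docs))
print(labels)  # prints [0, 0, 1, 1]
```

The result depends on document order and on the threshold, which is why the listed projects typically rely on K-means, DBSCAN, hierarchical clustering, or FAISS-backed nearest-neighbor search for larger collections.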