Repositories starred by kavgan

Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, …

Jupyter Notebook 1,184 789 Updated Dec 2, 2020

Robustness Gym is an evaluation toolkit for machine learning.

Python 445 36 Updated Jun 28, 2022

Curated List of Blog Posts From Opinosis Analytics

2 1 Updated Aug 14, 2021

Python word cloud library for use within Jupyter notebooks and Python apps.

Jupyter Notebook 49 14 Updated May 15, 2024

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Python 16,999 3,720 Updated Jun 2, 2023

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Python 6,990 2,254 Updated Oct 14, 2025

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used with languages other than English.

Python 131 45 Updated Jul 15, 2019
Jupyter Notebook 28 13 Updated Sep 30, 2016
Jupyter Notebook 46 45 Updated Feb 25, 2018

A few exercises for use at events.

Jupyter Notebook 1,438 669 Updated Apr 27, 2021

CNN text classification using Keras

Python 16 6 Updated Nov 27, 2017

ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.

Java 218 37 Updated Apr 9, 2020

Cool links & research papers related to Machine Learning applied to source code (MLonCode)

6,521 838 Updated Dec 3, 2020

Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical no…

25 11 Updated Jan 22, 2018

OpinRank Dataset. Contains user reviews of cars and hotels: full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).

44 12 Updated May 28, 2021

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to t…

Java 1,023 350 Updated Feb 6, 2026

Examples of code in Spark

Python 10 6 Updated Dec 2, 2017

RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.

15 8 Updated Jan 24, 2020

This repo contains code and dataset for the Opinosis Summarization Framework

51 18 Updated Nov 14, 2019

Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...

Java 2,737 416 Updated Jun 1, 2022