k-Met is a phonetic clustering algorithm for grouping words by their approximate pronunciation. It uses fuzzy matching techniques and the double metaphone indexing algorithm.
Updated Feb 16, 2012 - Python
Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of intelligence now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
NaiveSumm is a naive summarization approach based on Luhn's 1958 paper "The Automatic Creation of Literature Abstracts". It uses the frequencies of words in the document to score and extract the sentences that contain the most frequent words.
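The frequency-based idea described for NaiveSumm can be sketched roughly as follows. This is not NaiveSumm's actual code: the function names, the tiny stop-word list, and the scoring rule (average word frequency per sentence, a simplification of Luhn's significant-word clusters) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be far larger.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "it", "that", "on"}


def split_sentences(text):
    # Naive split: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Document-wide word frequencies.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Assumed scoring rule: mean document frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    sample = "Cats chase mice. Cats sleep. Dogs bark loudly at night."
    print(summarize(sample, n_sentences=2))
```

Averaging by sentence length keeps long sentences from winning simply because they contain more words; summing the raw frequencies instead would be an equally plausible choice.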
Natural logic inference engine in Python (Stanford research, 2012)
Turkish Natural Language Toolkit
Metaphone is a phonetic algorithm, published in 1990, for indexing words by their English pronunciation. It improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and …
all-paths graph kernel for protein-protein interaction extraction
ETH "Projects in Machine Learning" team using Mechanical Turk and active learning to solve a word-sense disambiguation task
IPython Notebook for Sentiment Classification
Get subject of simple sentences
Textual steganography using n-grams and biblical verses: Anoint thy data!
Uses the Jensen-Shannon (JS) divergence to compare two documents, identifying the words that most strongly represent each document in comparison to the document set.
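As a rough illustration of the JS-divergence comparison described above, the sketch below computes the divergence between the word distributions of two tokenized documents and splits it into per-word contributions. It is not taken from the repository: the function names are hypothetical, the input is assumed to be pre-tokenized, and it compares only two documents rather than a whole document set.

```python
import math
from collections import Counter


def word_distribution(tokens):
    # Relative frequency of each word in one document.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def js_contributions(tokens_a, tokens_b):
    """Per-word contribution to the Jensen-Shannon divergence (log base 2)."""
    p = word_distribution(tokens_a)
    q = word_distribution(tokens_b)
    contributions = {}
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution M = (P + Q) / 2
        term = 0.0
        if pw > 0:
            term += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            term += 0.5 * qw * math.log2(qw / mw)
        contributions[word] = term
    return contributions


def js_divergence(tokens_a, tokens_b):
    # The divergence is the sum of the per-word contributions.
    return sum(js_contributions(tokens_a, tokens_b).values())


def most_distinctive(tokens_a, tokens_b, top_n=3):
    # Words ranked by how much they contribute to the divergence.
    contributions = js_contributions(tokens_a, tokens_b)
    return sorted(contributions, key=contributions.get, reverse=True)[:top_n]


if __name__ == "__main__":
    doc_a = "the cat sat on the mat".split()
    doc_b = "the dog sat on the log".split()
    print(js_divergence(doc_a, doc_b))
    print(most_distinctive(doc_a, doc_b))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared words), which makes scores comparable across document pairs.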
A bunch of Python NLP shenanigans. An AGH-UST project.
CS 224D Final Project
A simple implementation of an n-gram-based language identifier
Hierarchical Paragraph Vectors
Ruben's master thesis
Temporal and Causal Relation extraction module for the Newsreader project.
Created by Alan Turing