- George Mason University
- Zurich, Switzerland
- https://sinaahmadi.github.io/
- https://orcid.org/0000-0001-7904-6551
- @sina_ahm
- in/sina-ahmadi-aba470287
Starred repositories
A library for efficient similarity search and clustering of dense vectors.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Unsupervised text tokenizer for Neural Network-based text generation.
A C++ standalone library for machine learning
Fast inference engine for Transformer models
Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.
Unsupervised text tokenizer focused on computational efficiency
MARISA: Matching Algorithm with Recursively Implemented StorAge
UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
Fast and customizable text tokenization library with BPE and SentencePiece support
GIZA++ is a statistical machine translation toolkit that is used to train IBM Models 1-5 and an HMM word alignment model. This package also contains the source for the mkcls tool which generates th…
pyhunspell / pyhunspell
Forked from smathot/pyhunspell. (Official repo for pypi package) Python bindings for the Hunspell spellchecker engine
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps such as changing case that you can all use…
Extracts highlighted text from PDF documents.
Editor for aligned parallel texts (personal desktop application).
Code to reproduce experiments in "A Grounded Unsupervised Universal Part-of-Speech Tagger for Low-Resource Languages"