- Rochester, NY
- https://hksung.github.io
Stars
CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07 million words, and 8.1 million cognates
Lexicons for the Multilingual UCREL Semantic Analysis System
Quick-and-dirty dump of 25k English words from wordfreq
List of permanently free LLM APIs (API keys)
Lightweight, open-source AI agent for your tools, chats, and workflows.
Korean Sejong corpus: download and simple analysis
Pre-trained word vectors of 30+ languages
A Large-Scale Open-Domain Sign Language Translation Dataset (ASL-English)
Code to compute AnthroScore, a computational linguistic measure of anthropomorphism in text
Ghostbuster: Detecting Text Ghostwritten by Large Language Models (NAACL 2024)
The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)
Training and implementation of chatbots that use a GPT-like architecture via the aitextgen package to enable dynamic conversations.
An Ollama-based chatbot. We use the Llama 3 8B model via Groq for this project.
https://sharedtask.duolingo.com
Real-time webcam demo with SmolVLM and llama.cpp server
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
A dataset containing human-human knowledge-grounded open-domain conversations.
A web app for ranking computer science departments according to their research output in selective venues, and for finding active faculty across a wide range of areas.
Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
An Argument Structure Construction Treebank