AragonerUA

46 results for source starred repositories

Code to obtain the CNN / Daily Mail dataset (non-anonymized) for summarization

Python 662 308 Updated Jun 16, 2022

Links to Korean datasets

908 104 Updated Oct 14, 2024

Train the smallest LM you can that fits in 16MB. Best model wins!

Python 3,462 1,907 Updated Mar 21, 2026

A small GUI library for Wayland client programming - rendered with Skia

C 31 Updated Mar 10, 2026

AI-powered penetration testing using autonomous agents — directly in your terminal.

TypeScript 199 34 Updated Mar 22, 2026

List of Spanish autonomous communities, provinces, and municipalities with their corresponding INE codes

HTML 19 15 Updated Mar 14, 2017

Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.

Python 580 51 Updated Mar 11, 2026

A collection of raw text from various Greek dialects

Jupyter Notebook 7 Updated Feb 22, 2026

🐹 Deep clean and optimize your Mac.

Shell 43,388 1,260 Updated Mar 22, 2026

SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.

Python 150 16 Updated Oct 22, 2022

Find, verify, and analyze leaked credentials

Go 25,157 2,267 Updated Mar 23, 2026

Romanian Named Entity Corpus (RONEC) version 2.0

Python 68 16 Updated Nov 19, 2022

Give your agents the power of the Hugging Face ecosystem

Python 9,716 587 Updated Mar 20, 2026

The open-source voice synthesis studio

TypeScript 14,014 1,653 Updated Mar 22, 2026

Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.

Swift 1,728 221 Updated Mar 23, 2026

NLP Shared tasks (NER, SRL) using NSML

Python 183 66 Updated Jan 3, 2019

Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference

Rust 2,834 251 Updated Mar 21, 2026

Useful datasets, specific to Georgia

62 7 Updated Mar 13, 2022

Distillation of KoBERT from SKTBrain (Lightweight KoBERT)

Python 198 23 Updated Sep 6, 2023

Portuguese pre-trained BERT models

Python 865 132 Updated Jun 17, 2024

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

Python 1,564 248 Updated Jun 12, 2025

Distributed, offline-first bug tracker embedded in git

Go 9,721 295 Updated Mar 22, 2026

Korean GPT-2 pretrained cased (KoGPT2)

558 104 Updated Oct 3, 2024

Korean HateSpeech Dataset

395 39 Updated Jul 18, 2020

PaperBanana: Automating Academic Illustration For AI Scientists

Python 5,345 360 Updated Mar 22, 2026

Korean corpus repository

Python 745 79 Updated Oct 3, 2022

This API provides programmatic access to the AlphaGenome model developed by Google DeepMind.

Python 1,818 249 Updated Mar 18, 2026

Pretrained ELECTRA Model for Korean

Python 630 136 Updated Feb 19, 2024

The missing tiktoken training code

Rust 401 38 Updated Jan 3, 2026