oroszgy

Organizations

@ec-doris @huspacy

Stars

NLP tools

174 repositories

Hierarchy-Aware Global Model for Hierarchical Text Classification

Python 225 43 Updated Nov 28, 2022

Code for Analyzing Redundancy in Pretrained Transformer Models accepted at EMNLP 2020

Python 14 2 Updated Oct 6, 2020

fast python port of arc90's readability tool, updated to match latest readability.js!

Python 2,870 356 Updated May 3, 2025

This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.

Python 76 25 Updated Sep 13, 2023

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

Python 7,881 567 Updated Jul 11, 2025

a sklearn wrapper for Google's BERT model

Jupyter Notebook 301 70 Updated Oct 26, 2022

ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

Python 9,517 705 Updated Dec 8, 2025

multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks.

Python 372 52 Updated Nov 21, 2022

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,142 31,507 Updated Dec 22, 2025

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Python 37,489 3,305 Updated Aug 17, 2024

Port of OpenAI's Whisper model in C/C++

C++ 45,244 5,031 Updated Dec 18, 2025

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"

Python 1,063 78 Updated Mar 7, 2024

BERT for Multitask Learning

Jupyter Notebook 546 125 Updated Apr 12, 2023

The BiLSTM-CRF model implementation in Tensorflow, for sequence labeling tasks.

JavaScript 704 255 Updated Nov 21, 2022

NER with BERT-BiLSTM+CRF (pytorch_lightning version)

Python 46 6 Updated Dec 16, 2022

A library to synthesize text datasets using Large Language Models (LLM)

Python 152 8 Updated Jan 17, 2023

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

Python 3,332 434 Updated Jul 10, 2025

Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023

Python 109 9 Updated Oct 24, 2023

🦙 Integrating LLMs into structured NLP pipelines

Python 1,355 106 Updated Jan 8, 2025

Easy Natural Language Processing

Python 144 23 Updated Apr 5, 2025

Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition, EACL 2021"

Python 396 43 Updated May 11, 2023

Multi-task model for named-entity recognition, relation extraction, entity mention detection and coreference resolution.

Python 45 6 Updated Jun 26, 2024

PyTorch code for SpERT: Span-based Entity and Relation Transformer

Python 711 150 Updated Feb 1, 2024

Python 2 1 Updated Oct 10, 2024

A Python implementation of Farasa toolkit

Python 138 23 Updated Sep 11, 2025

A blazingly fast and lightweight language detection library for Rust

Rust 429 21 Updated Dec 15, 2025

QLoRA: Efficient Finetuning of Quantized LLMs

Jupyter Notebook 10,791 869 Updated Jun 10, 2024

https://arxiv.org/pdf/1909.04054

Python 78 27 Updated Nov 2, 2022

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

Python 451 44 Updated Nov 15, 2023

Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.

Python 860 112 Updated Feb 19, 2023