Stars
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model
The RST-LSTM module refines predictions of the sequential text classifier (BERT) on documents with complex discourse
Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University
Official Repository for paper "Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model"
Code for "Text Generation from Knowledge Graphs with Graph Transformers"
Official implementation of the CODI 2025 paper "Bridging Discourse Treebanks with a Unified RST Parser"
This repository contains the official code and data for the ACL 2023 Findings paper "End-to-End Argument Mining over Varying Rhetorical Structures".
This is our solution for the RuCoCo-23 shared task described in "Light Coreference Resolution for Russian with Hierarchical Discourse Features"
The official code and data for the ACL 2024 Findings paper "Bilingual Rhetorical Structure Parsing with Large Parallel Annotations".
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
A large-scale dataset for summarizing academic papers
Updating collection of summarization datasets in 100+ languages, based on our paper "The State and Fate of Summarization Datasets: A Survey".
AI video dubbing / dubber / video dubbing / AI dubbing
Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles
Fuzzy-Pattern Tsetlin Machine: a paradigm shift in the Tsetlin Machine family of algorithms.
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
Overview of pipelines related to PDF to Markdown document processing.
This repository delivers end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for re…
Rhetorical Structure Theory (RST) Corpus of online learning discussion messages
Rhetorical Structure Theory analyses of listserv messages exchanged during a scholarly debate
Official code for "Large Language Models as Optimizers"
This repository contains text data relevant to speech disorders. The dataset comprises unclear sentences from people with a language disorder (Broca's aphasia), their intended and the corrected version of…
Code and data repository for SIGDIAL 2023: What’s Hard in English RST Parsing? Predictive Models for Error Analysis
2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.
The #1 open-source SWE-bench Verified implementation
Code for the paper "SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation" (ICLR 2025)