gerzz.inc
- shanghai
- dubbing-ai.com dubbingai.io
Stars
Eureka-Audio: A 1.7B lightweight audio–language model that matches 7B–30B models on ASR, audio understanding, and paralinguistic reasoning.
Chrome extension & CLI to let agents control your browser. Runs Playwright snippets in a stateful sandbox. Available as CLI or MCP
Official JAX implementation of End-to-End Test-Time Training for Long Context
MiroThinker is a deep research agent optimized for complex research and prediction tasks. Our latest models, MiroThinker-1.7, achieve 74.0 and 75.3 on BrowseComp and BrowseComp Zh, respectively.
This repository contains tools to download, crawl, and process French political speeches from the vie-publique.fr public dataset. It allows for the collection of speech metadata and the scraping of…
General plug-and-play inference library for Recursive Language Models (RLMs), supporting various sandboxes.
MAI-UI: Real-World Centric Foundation GUI Agents ranging from 2B to 235B
DeepTutor: Agent-native, open-source personalized tutoring. https://deeptutor.info/
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
A collection of sample agents built with Agent Development Kit (ADK)
A highly optimized engine for the neutts-air model that generates minutes of audio in seconds. Over 200x realtime on modern hardware!
A family of efficient speech models for multilingual phone recognition
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
Generate audio signals corresponding to moving sources/receivers in a shoebox-shaped room (Python)
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
A new dataset that includes long audio, captions of local audio events, and temporal boundaries
SpikeMamba presents a novel integration of spiking neural networks (SNNs) with the Mamba state space model architecture, investigating the potential for biologically-inspired temporal dynamics in l…
Resources to develop programming and software development skills
Extracted system prompts from Anthropic - Claude Fable 5, Opus 4.8, Claude Code, Claude Design. OpenAI - ChatGPT 5.5 Thinking, GPT 5.5 Instant, Codex. Google - Gemini 3.5 Flash, 3.1 Pro, Antigravit…
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/MCP/Docker/Zotero
Official implementation: "AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation"
[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"
Towards Fine-grained Audio Captioning with Multimodal Contextual Cues