Stars
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
Chinese NER problem that needs to capture 18 types of entities in medical conversation text. The process is divided into 4 parts that are encapsulated in high-level abstract classes. We control the…
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Ke-Omni-R is an advanced audio reasoning model that achieved SOTA on MMAU.
Browser automation CLI for AI agents
This is a database of 300,000+ symbols containing Equities, ETFs, Funds, Indices, Currencies, Cryptocurrencies and Money Markets.
FLM-Audio is an audio-language subversion of RoboEgo/FLM-Ego -- an omnimodal model with native full duplexity.
CodMate is a macOS SwiftUI app for managing CLI AI sessions: browse, search, organize, resume, and review work produced by Codex, Claude Code, and Gemini CLI. It focuses on speed, a compact three-c…
A high-performance, 100% client-side tool for removing Gemini AI watermarks. Built with pure JavaScript, it leverages a mathematically precise Reverse Alpha Blending algorithm rather than unpredict…
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a ca…
Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Production-ready toolkit to run AI locally
🦄️ 🎃 👻 Clash Premium rule sets (RULE-SET), compatible with ClashX Pro, Clash for Windows, and other clients based on the Clash Premium core.
Roo Code gives you a whole dev team of AI agents in your code editor.
Mustango: Toward Controllable Text-to-Music Generation
A Cloudflare Worker that integrates with a Telegram Bot to filter spam and manage silence consensus polls.
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation