- San Francisco Bay Area
- https://philosyang.com/
Stars
A feature-rich command-line audio/video downloader
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
aider is AI pair programming in your terminal
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monito…
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
SoftVC VITS Singing Voice Conversion
Build resilient language agents as graphs.
Download market data from Yahoo! Finance's API
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Faster Whisper transcription with CTranslate2
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
Python app to work with pictures and associated metadata from Apple Photos on macOS. Also includes a package to provide programmatic access to the Photos library, pictures, and metadata.
An unofficial PyTorch implementation of the audio LM VALL-E
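Honkai: Star Rail automation | Honkai: Star Rail automatic overworld farming | Honkai: Star Rail overworld farming | automatic overworld farming | based on simulated key presses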
Cross-platform CLI and Python drivers for AIO liquid coolers and other devices
🍀️ Clover Pinyin input schema: aiming to be the most usable open-source Simplified Chinese Pinyin input schema based on Rime!
A big progressive questing modpack for Minecraft 1.7.10 balanced around the mod GregTech.
[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation
Japanese Riichi Mahjong AI agent. (Feel free to extend this agent or develop your own agent)