Organizations

@fraunhofer-iais @Modalities

Toolkit for linearizing PDFs for LLM datasets/training

Python 17,100 1,371 Updated Mar 25, 2026
Python 19 Updated Dec 1, 2025

"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"

Python 15,077 2,030 Updated Mar 3, 2026

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 13,007 1,857 Updated Dec 19, 2025

Official implementation of paper: Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech

Python 41 7 Updated Sep 18, 2024

A lightweight LMM-based Document Parsing Model

Python 6,580 458 Updated Apr 1, 2026

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python 8,884 754 Updated Mar 25, 2026

Text-audio foundation model from Boson AI

Python 8,001 613 Updated Jan 18, 2026

ACE-Step: A Step Towards Music Generation Foundation Model

Python 4,260 535 Updated Feb 15, 2026

Super-fast Structured Outputs

Rust 724 59 Updated Apr 1, 2026

DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference

Python 617 38 Updated Nov 24, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 4,017 331 Updated Aug 14, 2025

tiny vision language model

Python 9,532 753 Updated Nov 14, 2025

OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871

Jupyter Notebook 4,039 23 Updated Mar 20, 2026

Converts text to speech in realtime

Python 3,837 383 Updated Apr 1, 2026

Foundational lecture notes for Tonmeister (sound engineering) students (2000)

TeX 6 Updated Jul 13, 2025
Python 66 3 Updated Jan 27, 2025

Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin9…

Python 25,792 2,876 Updated Mar 16, 2026

FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, Comfy…

JavaScript 2,663 364 Updated Mar 30, 2026
19 2 Updated Jun 13, 2024

A python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.

Python 3,182 546 Updated Feb 5, 2026

[TIP2026] Official codes of CCSRv2 and CCSRv1: Improving the Stability and Efficiency of Diffusion Models for Content Consistent Super-Resolution

Python 597 45 Updated Jul 17, 2025

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

Python 85,775 9,919 Updated Apr 3, 2026

Machine Learning for Imbalanced Data, published by Packt

Jupyter Notebook 279 85 Updated Mar 2, 2026

Instructional notebooks on music information retrieval.

Jupyter Notebook 1,269 414 Updated Mar 25, 2026

Understanding Deep Learning - Simon J.D. Prince

Jupyter Notebook 9,309 2,196 Updated Feb 24, 2026

[ECCV 2024] codes of DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

Python 4,055 351 Updated Jul 29, 2025