Starred repositories
Security guard for AI agents — blocks malicious skills, prevents data leaks, protects secrets. 24 detection rules, runtime action evaluation, trust registry.
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Awesome Unified Multimodal Models
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
Lets your 🦞 bot shout and speak with a "human" vibe
A survey on MM-LLMs for long video understanding: From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU)
Robust Speech Recognition via Large-Scale Weak Supervision
📝A simple and elegant markdown editor, available for Linux, macOS and Windows.
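This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications in practice).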
This project implements neural-network-based AI image/video processing, including but not limited to denoising, restoration, enhancement, and super-resolution.
This repository collects the state-of-the-art algorithms for video/image enhancement using deep learning (AI) in recent years, including super resolution, compression artifact reduction, deblocking…
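A deep learning interview handbook (covering mathematics, machine learning, deep learning, computer vision, natural language processing, SLAM, and related areas)
"Build a Large Language Model (From Scratch)" is an e-book that explores the principles and implementation of large language models in depth. It suits learners who want a thorough understanding of the architecture, training process, and application development of GPT-style large models. To make this valuable textbook accessible to more Chinese readers, I decided to translate it into Chinese and share it as open source on GitHub.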
🔥[CVPR 2025 Highlight, Official Code] for paper "Rethinking Personalized Aesthetics Assessment: Employing Physique Aesthetics Assessment as An Exemplification". Official Weights and Demos provided.…
MirrorMetrics: How to evaluate Stable Diffusion LoRAs. A visual diagnostic tool to detect overfitting, check dataset quality, and fix training settings using InsightFace biometrics.
Computational aesthetic analysis of visual media
Convolutional Neural Networks to predict the aesthetic and technical quality of images.
A great tutorial that gives you a sufficient understanding of recommendation systems
Official implementation of "HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment"
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
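Illustrated guides to computer networks, operating systems, computer organization, and databases: 1,000 diagrams plus 500,000 characters of text that demystify obscure computer science fundamentals, so that no rote interview question is hard to understand! 🚀 Read online: https://xiaolincoding.com
A search tool for a comprehensive collection of Linux commands, covering Linux command manuals, detailed explanations, learning material, and collected references. https://git.io/linux
Configuration files for FongMi影视 and tvbox. If you like them, please fork them for your own use. Read the repository notes carefully before use; once you use them, you are deemed to have understood those notes.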