Stars
Official Implementation of SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
A high-throughput and memory-efficient inference and serving engine for LLMs
Revolutionary new technology that turns any image into Obama
Fast and memory-efficient exact attention
A curated list of balanced multimodal learning methods.
A curated collection of papers on multimodal content understanding (MCU), including Fine-Grained Visual Recognition/Classification (FGVR/FGVC), Content Moderation with Collaborative Large and Small …
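I hope the whole world can use this system, making AI truly zero-barrier so that everyone can experience its benefits rather than leaving it in the hands of a few. Supports thousands of vertical scenarios, with customized development of AI models and AI algorithms. Deep integration, empowering intelligent vision for everything: EasyAIoT builds an efficient access and management network for IoT devices (especially large numbers of cameras). We deeply integrate real-time streaming media transmission with cutting-edge artificial intelligence (AI) to create a unified service core. This solution not only enables interconnection among heterogeneous devices, but also takes high-definition video…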
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Bitwuzla-MachBV at SMT-COMP 2025
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A linear estimator on top of CLIP to predict the aesthetic quality of pictures
📄 Tongji University Undergraduate Thesis Template | Overleaf / Mac / Linux / Windows / Workshop / Docker
CS-BAOYAN / CS-BAOYAN-2024
Forked from CS-BAOYAN/CS-BAOYAN-2023
2024 postgraduate recommendation (baoyan) experience posts and related materials
Tongji University UIC User Interaction Technology + speech recognition final project
A school hospital access management system that not only implements the complete access function but also displays information on consultation rooms at surrounding hospitals, so…
The IDE for competitive programming 🎉 | Fetch, Code, Compile, Run, Check, Submit 🚀