Stars
Chinese NLP solutions (large models, data, models, training, inference)
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Cross-platform named pipe for Python
AdalFlow: The library to build & auto-optimize LLM applications.
My study notebook, deployed at https://yindaheng98.github.io/
[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.
Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥
Official implementation for the arXiv paper "Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields"
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
An open-source, end-to-end VLM-based GUI agent
[arXiv 2023.02] Code for my paper "DirectMHP: Direct 2D Multi-Person Head Pose Estimation with Full-range Angles"
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
Official PyTorch implementation of 6DRepNet: 6D rotation representation for unconstrained head pose estimation.
This repository has been moved. The new location is in https://github.com/TexasInstruments/edgeai-tensorlab
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
Accepted by CVPR Workshop 2024
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
Application to distinguish real faces from fake faces in real time with computer vision and deep learning
Implementation of 3D mask face anti-spoofing with remote photoplethysmography
3D Passive Face Liveness Detection (Anti-Spoofing) & Deepfake detection. Only a single image is needed to compute the liveness score. 99.67% accuracy on our dataset and perfect scores on multiple public dat…
[CVPR 2024 🔥] Official implementation of the paper "⏳ Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation"