Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Multimodal 7B model for image, video, and text understanding tasks
Designed for text embedding and ranking tasks
Qwen3-Next: 80B instruct LLM with ultra-long context up to 1M tokens
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 201
Foundational Models for State-of-the-Art Speech and Text Translation
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Text-to-3D, image-to-3D, and mesh export with NeRF + diffusion
Learning embeddings for classification, retrieval and ranking
Official code for Style Aligned Image Generation via Shared Attention
Implementation of the Surya Foundation Model for Heliophysics
Open-source large language model family from Tencent Hunyuan
tiktoken is a fast BPE tokeniser for use with OpenAI's models
The official PyTorch implementation of our paper
A method to increase the speed and lower the memory footprint
Uncommon Objects in 3D dataset
Encoder of greater-than-word length text trained on a variety of data
VaultGemma: 1B DP-trained Gemma variant for private NLP tasks
VibeVoice: Open-source multi-speaker long-form text-to-speech model
Learning to Act by Watching Unlabeled Online Videos
Chinese and English multimodal conversational language model
Official implementation of Watermark Anything with Localized Messages
Summarization model fine-tuned on CNN/DailyMail articles
Efficient English embedding model for semantic search and retrieval