-
Sharry Cloud
- Jinan, Shandong
Stars
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
A lightweight LMM-based Document Parsing Model
Klavis AI (YC X25): MCP integration platforms that let AI agents use tools reliably at any scale
The open source platform for AI-native application development.
Nexent is a zero-code platform for auto-generating agents — no orchestration, no complex drag-and-drop required. Nexent also offers powerful capabilities for agent running control, data processing …
[ICLR 2025] Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.
Agent-ready RPA suite with out-of-the-box automation tools. Built for individuals and enterprises.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.
https://dev.to/answeryt/the-demo-spell-and-production-dilemma-of-ai-agents-how-i-built-a-self-learning-agent-system-4okk
PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation.
Res-SAM Framework for GPR Underground Hazard Detection
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
🛠️ DeepAgent: A General Reasoning Agent with Scalable Toolsets
Multi-Agent System Framework For Complex Tasks
Framework that enables fine-tuning of vision-language grounding models on custom datasets
A powerful baseline for image classification, face recognition and image retrieval with PyTorch
YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework (Supports RGBT detection for all YOLO series from YOLOv3 to YOLOv13, as well as RTDETR. 【Ultralytics YOLOv…
"LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"
[BIRD-INTERACT] Re-imagines Text-to-SQL evaluation through the lens of dynamic interactions.
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral
INFTY Engine: An Optimization Toolkit to Support Continual AI