Northeastern University
- Boston
Stars
[VLDB '25] Synthesizing High-quality Text-to-SQL Data at Scale. SynSQL-2.5M is the first million-scale cross-domain text-to-SQL dataset.
A framework for few-shot evaluation of language models.
Official Implementation for "ATP: All-in-one tuning and structural pruning for domain-specific llms"
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
AdaCode is licensed under CC BY-NC-SA 4.0 https://creativecommons.org/licenses/by-nc-sa/4.0/
[CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"
Large-scale text-video dataset. 10 million captioned short videos.
Stable Diffusion web UI
ykk648 / AnimateDiff-I2V
Forked from guoyww/AnimateDiff
AnimateDiff I2V version.
A retrain of AnimateDiff to be conditional on an init image
A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
High-efficiency floating-point neural network inference operators for mobile, server, and Web