Stars
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
MCP server for Atlassian tools (Confluence, Jira)
🤗 smolagents: a barebones library for agents that think in code.
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
Papers and codes about Quantized Networks for easier survey and reference.
A repository for research on medium sized language models.
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practice for training LLaMA models in Megatron-LM
Reference implementation for DPO (Direct Preference Optimization)
Tools for managing datasets for governance and training.
Identifying the language of input text using character-level n-grams, with support for 45 languages
LLM training code for Databricks foundation models
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
This project is an attempt to create a common metric to test LLMs for progress in eliminating hallucinations, which is the most serious current problem in widespread adoption of LLMs for many real…
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
A collection of libraries to optimise AI model performance
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
COYO-700M: Large-scale Image-Text Pair Dataset
Ask Me Anything language model prompting
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
A dataset of crowdsourced ratings for machine-generated image captions
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
Train Dense Passage Retriever (DPR) with a single GPU
A repository containing datasets and tools to train a watermark classifier.
PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retrieval (Lerner et al., ECIR'24)
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Summarizing and Exploring Tabular Data in Conversational Search (SIGIR '20)