- Meta Superintelligence
- SF
- http://jacobmarks.github.io
- in/jacob-marks
- https://medium.com/@jacob_marks
Stars
An open source implementation of CLIP.
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Refine high-quality datasets and visual AI models
🧙 Build, run, and manage data pipelines for integrating and transforming data.
OCR model that handles complex tables, forms, handwriting with full layout.
TripoSR: Fast 3D Object Reconstruction from a Single Image
OpenPCDet Toolbox for LiDAR-based 3D Object Detection.
Image to prompt with BLIP and CLIP
[ICCV2023 Best Paper Finalist] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cross-encoders and more. Created by Prithivi Da, open for PRs & C…
A Library for Differentiable Logic Gate Networks
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)
Code for CVPR 2024 paper: DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
Liquid Audio - Speech-to-Speech audio models by Liquid AI
Code for ICML 2023 paper, "PFGM++: Unlocking the Potential of Physics-Inspired Generative Models"
Search docs.voxel51.com with an LLM!
[NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample and compute efficient than reinforcement learning methods…
AI assistant that can query visual datasets, search the FiftyOne docs, and answer general computer vision questions
Official repository for the ICCV 2023 paper "Global Features are All You Need for Image Retrieval and Reranking"
PyTorch implementation of CLIP Maximum Mean Discrepancy (CMMD) for evaluating image generation models.
ACL 2025: Synthetic data generation pipelines for text-rich images.
Open source AI/ML capabilities for the FiftyOne ecosystem