- École des Ponts ParisTech / Inria
- https://lucasventura.com/
- @Lucas__Ventura
- @lucasventura.com
- in/lucasventurar
Stars
Code for MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency
[SIGGRAPH Asia 2025 - TOG] Official implementation of MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and te…
The simplest, fastest repository for training/finetuning small-sized VLMs.
Multi-Camera Hand-Eye Calibration Framework for calibrating a camera network with respect to a robot arm
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
(EarthVision 2025 - CVPR Workshop) Official repository of DAFA-LS, a dataset of satellite image time series for the task of archaeological looting detection.
HORT: Monocular Hand-held Objects Reconstruction with Transformers, ICCV 2025
[CVPR 2025 - Spotlight] Official PyTorch implementation of MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views
Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Code for "Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation"
A Python project for generating Spot It! (a.k.a Dobble) cards.
Official implementation of "Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy."
Handwritten Text Recognition and Character Detection
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Expressive Neural Network: A Neural Network Model with DCT Adaptive Activation Functions
Research code for ACL2024 paper: "Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline"
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Reliability in Semantic Segmentation: Can We Use Synthetic Data? (ECCV 2024)
Implementation of the multi-temporal UTAE for the task of satellite image time series semantic change detection (SITS-SCD)