PlayStation
- Japan
- https://awkrail.github.io/nishimura/
Stars
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)
Tesseract Open Source OCR Engine (main repository)
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
🫱 🎥 Yet another shot detector implemented in C++, which aims to be a fast alternative to PySceneDetect.
[ICCVW'25] Annotation, metadata, and description of EgoOops dataset proposed in "EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts".
[EMNLP2024 Demo], [ICASSP 2025], [ICASSP 2026] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.
A video reader for extracting motion vectors and residuals from encoded H.264 videos.
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
[ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers
Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
DocBank: A Benchmark Dataset for Document Layout Analysis
(CVPR 2023) Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
Code for "Neural 3D Reconstruction in the Wild", SIGGRAPH 2022 (Conference Proceedings)
S3D Text-Video model trained on HowTo100M using MIL-NCE
An implementation of a small TCP/IP protocol stack for learning.
Implementation of "Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model"
Using LLMs and pre-trained caption models for super-human performance on image captioning.
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis
An official implementation (base) of recipe generation from unsegmented cooking videos [Nishimura+ arXiv22]. Joint learning approach of event selector and sentence generator.
mlvlab / VT-TWINS
Forked from KoDohwan/VT-TWINS
Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022)
ChangeIt dataset with more than 2600 hours of video with state-changing actions published at CVPR 2022
Official implementation of state-aware video procedural captioning (ACM MM 2021)
A Lisp interpreter implemented in Conway's Game of Life