Stars
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Multilingual Document Layout Parsing in a Single Vision-Language Model
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
Get your documents ready for gen AI
Official Implementation of TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Simulation platform for general-purpose robotics & embodied AI learning.
ROS 2 for Doosan Robot
Simple Python-based kinematics solver for a robot arm
http://vlsiarch.eecs.harvard.edu/research/recommendation/
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Code for "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", Gupta et al, CVPR 2018
Official implementation for (Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching, AAAI-2021)
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
Practice on cifar100 (ResNet, DenseNet, VGG, GoogleNet, InceptionV3, InceptionV4, Inception-ResNetv2, Xception, Resnet In Resnet, ResNext, ShuffleNet, ShuffleNetv2, MobileNet, MobileNetv2, SqueezeNet…
CS231N 2017 video subtitles translation project for Korean Computer Science students
[arXiv] What-If Motion Prediction for Autonomous Driving ❓🚗💨