Visually informed embedding of word (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms. (Python; updated Sep 18, 2016)
Generate captions for images using a CNN-RNN model trained on the Microsoft Common Objects in Context (MS COCO) dataset
High-resolution Networks for the Fully Convolutional One-Stage Object Detection (FCOS) algorithm
Implementation of models in our EMNLP 2019 paper: A Logic-Driven Framework for Consistency of Neural Models
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
[ECCV 2020] Boundary-preserving Mask R-CNN
A TensorFlow implementation of MobileNetV3 CenterNet, which can be easily deployed on Android (MNN) and iOS (Core ML).
VarifocalNet: An IoU-aware Dense Object Detector
SWA Object Detection
This is an official implementation for "Contextual Transformer Networks for Visual Recognition".
A simple Python API (built on top of TensorFlow) for neural image captioning with MSCOCO data.
A deep-learning object detection project pre-trained on the COCO dataset
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
An ongoing research project on image entropy assessment using machine learning.
Video Platform for Action Recognition and Object Detection in PyTorch
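Nearly all of the repositories above consume MS COCO annotation files. As a minimal sketch of that format (assuming the standard `instances_*.json` schema with `images`, `annotations`, and `categories` sections; the tiny inline dictionary below is a stand-in for a real annotation file), here is how annotations can be grouped per image without any external dependencies:

```python
from collections import defaultdict

# Minimal stand-in for an MS COCO "instances" annotation file;
# real files follow the same top-level schema, just much larger.
coco = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "height": 480, "width": 640},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixel coordinates.
        {"id": 10, "image_id": 1, "category_id": 18, "bbox": [10, 20, 100, 150]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [200, 50, 80, 220]},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 18, "name": "dog"},
    ],
}

# Index annotations by image id, similar to what pycocotools' COCO
# class builds internally when it loads an annotation file.
anns_by_image = defaultdict(list)
for ann in coco["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

# Map category ids to human-readable names.
cat_names = {c["id"]: c["name"] for c in coco["categories"]}

for img in coco["images"]:
    labels = [cat_names[a["category_id"]] for a in anns_by_image[img["id"]]]
    print(img["file_name"], labels)
```

Running this prints the file name of each image together with the category names of its annotations; the same indexing pattern applies whether the task is detection, segmentation, or captioning (captions files replace `category_id`/`bbox` with a `caption` string).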