This repository is the PyTorch implementation of Entity-enhanced Adaptive Reconstruction Network (EARN) for Weakly Supervised Referring Expression Grounding.
- Python 3.5
- PyTorch 0.4.1
- CUDA 8.0
- Please refer to MAttNet to install mask-faster-rcnn, REFER and refer-parser2. Follow Steps 1 & 2 in Training to prepare the data and features.
- Calculate semantic similarity as supervision information.
- Download the GloVe word embeddings to cache/word_embedding.
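A minimal sketch of this step, assuming the commonly used glove.840B.300d archive (the exact GloVe variant the repo expects is an assumption):

```python
# Hedged sketch: fetch GloVe vectors into cache/word_embedding.
# The URL/variant below is an assumption; adjust to the embedding file you use.
import os
import urllib.request
import zipfile

GLOVE_URL = "http://nlp.stanford.edu/data/glove.840B.300d.zip"  # assumed variant
DEST_DIR = "cache/word_embedding"

os.makedirs(DEST_DIR, exist_ok=True)
zip_path = os.path.join(DEST_DIR, "glove.zip")
urllib.request.urlretrieve(GLOVE_URL, zip_path)  # download the archive
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(DEST_DIR)  # unpacks the .txt embedding file
os.remove(zip_path)
```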
- Generate the semantic similarity and word embedding files:
```
python tools/prepro_sub_obj_wds.py --dataset ${DATASET} --splitBy ${SPLITBY}
python tools/prepro_sim.py --dataset ${DATASET} --splitBy ${SPLITBY}
```
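For intuition, the supervision signal is a word-level semantic similarity derived from the embeddings. Below is a minimal sketch of such a similarity under GloVe; load_glove and cosine_sim are illustrative helpers, not the actual interface of prepro_sim.py:

```python
# Illustrative only: cosine similarity between two words under GloVe vectors.
import numpy as np

def load_glove(path):
    """Parse a GloVe .txt file into a {word: vector} dict (hypothetical helper)."""
    emb = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            emb[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return emb

def cosine_sim(emb, w1, w2):
    """Cosine similarity between the embeddings of two words."""
    v1, v2 = emb[w1], emb[w2]
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# emb = load_glove("cache/word_embedding/glove.840B.300d.txt")
# cosine_sim(emb, "man", "person")  # higher for semantically close words
```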
- Train EARN with ground-truth annotation:
```
sh train.sh
```
- Evaluate EARN with ground-truth annotation:
```
sh eval.sh
```
We gather the referring expressions with higher-order or multi-entity relationships (selected mainly by the length of the referring expression and the number of entities) from the original RefCOCO, RefCOCO+ and RefCOCOg validation and test sets, in order to evaluate how well models reason about complex relationships. You can download this validation set into cache/prepro/.
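The selection rule is only loosely specified above; as a picture of the idea, here is a hypothetical filter, with illustrative thresholds and an entity list assumed to come from a parser such as refer-parser2 (not the exact rule used to build the released split):

```python
# Hypothetical filter for complex-relationship expressions; thresholds are
# illustrative, and `entities` is assumed to come from refer-parser2-style output.
def is_complex(tokens, entities, min_len=6, min_entities=2):
    """Keep expressions that are long or that mention several entities."""
    return len(tokens) >= min_len or len(entities) >= min_entities

refs = [
    {"tokens": ["man", "on", "the", "left"], "entities": ["man"]},
    {"tokens": ["woman", "holding", "the", "umbrella", "next", "to", "the", "dog"],
     "entities": ["woman", "umbrella", "dog"]},
]
complex_refs = [r for r in refs if is_complex(r["tokens"], r["entities"])]  # keeps the 2nd
```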
- Examples: referring expressions with higher-order or multi-entity relationships can be seen in visualization.ipynb.
- Our performance. Below we report the number (num) and percentage (ratio) of expressions with complex relationships in the original validation and test sets, along with an accuracy (IoU > 0.5) comparison between max-context pooling (mcxtp) and soft-context pooling (scxtp). RefCOCOg has longer queries, so its proportion of complex-relationship cases is much higher. The results show that soft-context pooling performs better at complex relational reasoning.
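Accuracy here counts a prediction as correct when its box overlaps the ground-truth box with IoU > 0.5. A minimal sketch of that check (assuming [x1, y1, x2, y2] box coordinates):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

hit = iou([10, 10, 60, 80], [12, 8, 58, 85]) > 0.5  # counts toward accuracy
```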