This repo contains the evaluation code for Tasks 1, 2, and 3 of the ICText challenge.
- The image paths and ids will be provided through data/images.json. Your docker image (either in TensorFlow or PyTorch) MUST read this file to locate the images for inference.
- Then, you MUST write the results to output/result.json, so that the result JSON can be loaded by the evaluation docker and evaluated against evaluation/gt.json for the final score (see the sketch after this list).
- All the mentioned JSON files are only examples to help you understand how the overall evaluation pipeline works. Please remember to follow the correct output result JSON format so that your submission will be evaluated successfully.
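The sketch below illustrates this read/predict/write flow inside a model container. The result keys shown (e.g. "polys", "aesthetic") and the fields of data/images.json are assumptions for illustration only; always match your output against the sample output/result.json in this repo.

```python
import json

# Read the image list provided by the evaluation pipeline.
with open("data/images.json") as f:
    images = json.load(f)

results = []
for img in images:
    # Run your model on the image referenced by this entry; dummy values
    # are used below purely for illustration.
    results.append({
        "image_id": img["id"],                      # assumes an "id" field per entry
        "category_id": 1,                           # predicted character class
        "polys": [10, 10, 50, 10, 50, 50, 10, 50],  # [x1,y1,...,x4,y4] polygon (assumed key)
        "score": 0.90,                              # detection confidence
        "aesthetic": [1, 0, 0],                     # aesthetic labels, length 3 (assumed key)
    })

# Write results where the evaluation docker expects them.
with open("output/result.json", "w") as f:
    json.dump(results, f)
```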
Set up and run this repo for Task 1 and 2 by running:

```bash
$ bash run_task1_2.sh
```

Make sure Docker is set up to use the GPU through nvidia-docker, then run this for Task 3:

```bash
$ bash run_task3.sh
```

The flow of evaluation for Task 3 is as follows:
- Start Timer
- Start Algorithm in TensorFlow or PyTorch
- Start Evaluation when Algorithm finishes
They are run in parallel through command chaining.
```
.
├── data (Folder to store images)
│   └── .gitkeep
├── evaluation (Contains evaluation code)
│   ├── evaluation/Dockerfile
│   ├── evaluation/coco.py
│   ├── evaluation/cocoeval.py
│   ├── evaluation/gt.json
│   ├── evaluation/ictext_eval.py
│   ├── evaluation/main.py
│   └── evaluation/requirements.txt
├── output (Output of the model should be saved here as result.json)
│   ├── output/.gitkeep
│   └── output/result.json
├── tensorflow (Sample folder to store code)
│   ├── tensorflow/Dockerfile
│   └── tensorflow/main.py
├── timer (Contains code to get FPS and used GPU memory size for Task 3 evaluation)
│   ├── timer/Dockerfile
│   └── timer/main.py
├── torch (Sample folder to store code)
│   ├── torch/Dockerfile
│   └── torch/main.py
├── utilization (Contains log file to keep track of GPU usage every second)
│   └── utilization/log.csv
├── README.md
├── run_task1_2.sh
└── run_task3.sh
```
You can find the main evaluation code at evaluation/ictext_eval.py. The evaluation algorithm is taken from pycocotools with a few changes:
- Polygon coordinates [x1,y1,...,x4,y4] are used in place of the bbox [x,y,w,h] for evaluation.
- Submissions with empty aesthetic labels, aesthetic labels whose length is not 3, or labels that are not one-hot encoded will be rejected. Please find the relevant code under the function 'loadRes' in coco.py.
- The multi-label score is calculated with a matching criterion of IoU > 0.5, over all area regions, with matching done from ground truth to prediction. Please find the relevant code under the function 'evaluateImg' in cocoeval.py.
- All legible ground truth instances will be considered, and a default value of [0,0,0] will be assigned to any ground truth instance that has no matching detection. Please find the relevant code under the function 'evaluateImg' in cocoeval.py.
- We will use the F-2 score instead of the F-1 score, as we want to prioritize recall over precision (see the first sketch after this list). Please find the relevant code under the function 'accumulate' in cocoeval.py. A score of 0 is given for zero-division cases. Only if all elements of both the ground truth and the prediction are zero will we flip them to 1.
- We set the default FPS to 30 and the default GPU memory to 4000MB. The 3S score is calculated with the following formula (see the second sketch after this list): 3S = 0.2 × normalised speed + 0.2 × (1 − normalised size) + 0.6 × normalised score
- We will only consider the 3S score when the mAP (Task 3.1), or the mAP and F-score (Task 3.2), are greater than or equal to 0.5. This is the lower-bound performance that we set for our dataset.
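As a rough illustration of the F-2 computation described above, here is a minimal sketch for a single pair of ground-truth and predicted aesthetic label vectors, including the zero-division and all-zero flipping rules. The actual implementation lives in 'accumulate' in cocoeval.py and may differ in detail; this is not the official code.

```python
import numpy as np

def f2_score(gt, pred):
    """Illustrative per-instance multi-label F-2 score between a ground-truth
    and a predicted aesthetic label vector (e.g. [1, 0, 0])."""
    gt = np.asarray(gt, dtype=float)
    pred = np.asarray(pred, dtype=float)

    # If both vectors are all zeros, flip them to all ones so the instance
    # counts as a perfect match instead of a zero division.
    if gt.sum() == 0 and pred.sum() == 0:
        gt = np.ones_like(gt)
        pred = np.ones_like(pred)

    tp = float((gt * pred).sum())
    precision = tp / pred.sum() if pred.sum() > 0 else 0.0
    recall = tp / gt.sum() if gt.sum() > 0 else 0.0

    # F-beta with beta = 2 weights recall more heavily than precision.
    beta2 = 2 ** 2
    denom = beta2 * precision + recall
    return 0.0 if denom == 0 else (1 + beta2) * precision * recall / denom

print(f2_score([1, 0, 0], [1, 0, 0]))  # 1.0 (perfect match)
print(f2_score([1, 1, 0], [1, 0, 0]))  # ~0.56 (missed label lowers recall-weighted score)
```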
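Similarly, a minimal sketch of the 3S formula, assuming that speed and size are normalised against the defaults (FPS 30, GPU memory 4000MB) and clipped to 1.0; the exact normalisation is defined by the evaluation code, so treat this only as an illustration.

```python
def three_s_score(fps, gpu_memory_mb, score,
                  default_fps=30.0, default_memory_mb=4000.0):
    """Illustrative 3S computation following the formula above.
    Normalisation by the defaults with clipping to 1.0 is an assumption."""
    norm_speed = min(fps / default_fps, 1.0)
    norm_size = min(gpu_memory_mb / default_memory_mb, 1.0)
    norm_score = score  # mAP / F-score already lies in [0, 1]
    return 0.2 * norm_speed + 0.2 * (1.0 - norm_size) + 0.6 * norm_score

print(three_s_score(fps=25, gpu_memory_mb=2000, score=0.6))  # ~0.63
```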