Yolo v3 object detection mAP metric

Posted July 15 by Rokas Balsys

##### Understanding the mAP (mean Average Precision) Evaluation Metric for Object Detection

In this tutorial, you will figure out how to use the mAP (mean Average Precision) metric to evaluate the performance of an object detection model. I will cover in detail what is mAP, how to calculate it and I will give you an example of how I use it in my YOLOv3 implementation.

The mean average precision (mAP) or sometimes simply just referred to as AP is a popular metric used to measure the performance of models doing document/information retrieval and object detection tasks. So if you time to time read new object detection papers, you may always see that authors compare mAP of their offered methods to most popular ones.

There are multiple deep learning algorithms that exist for object detection like RCNN's: Fast RCNN, Faster RCNN, YOLO, Mask RCNN, etc. All of these models solve two major problems: Classification and Localization:

• Classification: Identify if an object is present in the image and the class of the object;
• Localization: Predict the coordinates of the bounding box around the object when an object is present in the image. Here we compare the coordinates of ground truth and predicted bounding boxes.

While measuring mAP we need to evaluate the performance of both, classifications as well as localization of using bounding boxes in the image.

For object detection, we use the concept of Intersection over Union (IoU). IoU measures the overlap between 2 boundaries. We use that to measure how much our predicted boundary overlaps with the ground truth (the real object boundary): • Red - ground truth bounding box;
• Green - predicted bounding box.

In simple terms, IoU tells us how well predicted and the ground truth bounding box overlap. You'll see that in code we can set a threshold value for the IoU to determine if the object detection is valid or not.

For COCO, AP is the average over multiple IoU (the minimum IoU to consider a positive match). AP@[.5:.95] corresponds to the average AP for IoU from 0.5 to 0.95 with a step size of 0.05. For the COCO competition, AP is the average over 9 IoU levels on 80 categories (AP@[.50:.05:.95]: start from 0.5 to 0.95 with a step size of 0.05). The following are some other metrics collected for the COCO dataset: Image source

And, because my tutorial series is related to YOLOv3 object detector, here is AP results from authors paper: Source to YOLOv3 paper

In the figure above, AP@.75 means the AP with IoU=0.75.

mAP (mean average precision) is the average of AP. In some contexts, we compute the AP for each class and average them. But in some context, they mean the same thing. For example, under the COCO context, there is no difference between AP and mAP. Here is the direct quote from COCO:

AP is averaged over all categories. Traditionally, this is called "mean average precision" (mAP). We make no distinction between AP and mAP (and likewise AR and mAR) and assume the difference is clear from context.

Ok, let's get back to the beginning, where we need to calculate mAP. First, we need to set a threshold value for the IoU to determine if the object detection is valid or not. Let's say we set IoU to 0.5, in that case:

• If IoU ≥0.5, classify the object detection as True Positive(TP);
• If Iou <0.5, then it is a wrong detection and classifies it as False Positive(FP);
• When ground truth is present in the image and the model failed to detect the object, we classify it as False Negative(FN);
• True Negative (TN): TN is every part of the image where we did not predict an object. This metrics is not useful for object detection, hence we ignore TN.

If we set the IoU threshold value to 0.5 then we'll calculate mAP50, if IoU=0.75, then we calculate mAP75. Sometimes we can see these as mAP@0.5 or mAP@0.75, but this is actually the same.

We use Precision and Recall as the metrics to evaluate the performance. Precision and Recall are calculated using true positives(TP), false positives(FP) and false negatives(FN): To get mAP, we should calculate precision and recall for all the objects presented in the images.

It also needs to consider the confidence score for each object detected by the model in the image. Consider all of the predicted bounding boxes with a confidence score above a certain threshold. Bounding boxes above the threshold value are considered as positive boxes and all predicted bounding boxes below the threshold value are considered as negative. So, the higher the confidence threshold is, the lower the mAP will be, but we'll be more confident with accuracy.

So, how to calculate general AP? It's quite simple. For each query, we can calculate a corresponding AP. A user can have as many queries as he/she likes against his labeled database. The mAP is simply the mean of all the queries that the use made.

To see, how we get an AP you can check voc_ap function on my GitHub repository. When we have Precision(pre) and Recall(rec) lists, we use the following formula:

for i in list:
ap += ((rec[i]-rec[i-1])*pre[i])


We should run this above function for all classes we use.

To calculate the general AP for the COCO dataset, we must loop the evaluation function for IoU[.50:.95] 9 times. Here is the formula from wikipedia: Here N will be 9 and AP will be the sum of AP50, AP55, …, AP95. This may take a while to calculate these results, but this is the way how we need to calculate the mAP.

##### Practical YOLOv3 mAP implementation:

First, you should move to my YOLOv3 TensorFlow 2 implementation on GitHub. There is a file called evaluate_mAP.py, the whole evaluation is done in this script.

While writing this evaluation script, I focused on the COCO dataset, to make sure it will work on it. So in this tutorial I will explain how to run this code to evaluate the YOLOv3 model on the COCO dataset.

First, you should download the COCO validation dataset from the following link: http://images.cocodataset.org/zips/val2017.zip. Also in the case for some reason you want to train the model on the COCO dataset, you can download and train dataset: http://images.cocodataset.org/zips/train2017.zip. But it's already 20GB, and it would take really a lot of time to retrain model on COCO dataset.

In TensorFlow-2.x-YOLOv3/model_data/coco/ is 3 files, coco.names, train2017.txt, and val2017.txt files. Here I already placed annotation files, that you won't need to twist your head where to get these files. Next, you should unzip the dataset file and place the val2017 folder in the same directory, it should look following: TensorFlow-2.x-YOLOv3/model_data/coco/val2017/images...

Ok, next we should change a few lines in our yolov3/configs.py:
- You should link TRAIN_CLASSES to 'model_data/coco/coco.names'
- If you wanna train on COCO dataset, change TRAIN_ANNOT_PATH to 'model_data/coco/train2017.txt'
- To validate the model on COCO dataset change TEST_ANNOT_PATH to 'model_data/coco/val2017.txt'

- To change input_size of model, change YOLO_INPUT_SIZE and TEST_INPUT_SIZE to your needed, for example to 512. Now we have all settings set for evaluation. Now I will explain the evaluation process in a few sentences. The whole evaluation process can be divided into 3 parts:

• In the first part, the script creates a mAP folder in the local directory in which it creates another ground-truth folder. Here in creates .json file for every ground-truth image bounding box;
• In the second part, most part is done by our YOLOv3 model, it runs prediction on every image. Similar way as in the first parts, it creates .json file for every class we have and puts the detection bounding box accordingly;
• In the third part, we already have detected and ground-truth bounding boxes. We calculate the AP for each class with a voc_ap function. When we have AP of each class, we average it and we receive mAP.

Here is the output of evaluate_mAP.py script, when we call it with score_threshold=0.05 and iou_threshold=0.50 parameters:

81.852% = aeroplane AP
12.615% = apple AP
31.034% = backpack AP
27.652% = banana AP
60.591% = baseball-bat AP
52.699% = baseball-glove AP
91.895% = bear AP
73.863% = bed AP
43.250% = bench AP
52.618% = bicycle AP
47.743% = bird AP
42.745% = boat AP
23.295% = book AP
52.714% = bottle AP
55.977% = bowl AP
31.840% = broccoli AP
81.399% = bus AP
49.839% = cake AP
52.939% = car AP
2.849% = carrot AP
87.444% = cat AP
46.828% = cell-phone AP
52.618% = chair AP
74.931% = clock AP
57.715% = cow AP
58.847% = cup AP
47.931% = diningtable AP
79.716% = dog AP
24.626% = donut AP
80.199% = elephant AP
79.170% = fire-hydrant AP
46.861% = fork AP
82.098% = frisbee AP
80.181% = giraffe AP
4.545% = hair-drier AP
27.715% = handbag AP
79.503% = horse AP
18.734% = hot-dog AP
73.431% = keyboard AP
39.522% = kite AP
27.280% = knife AP
75.999% = laptop AP
69.728% = microwave AP
67.991% = motorbike AP
79.698% = mouse AP
4.920% = orange AP
58.388% = oven AP
81.872% = parking-meter AP
71.647% = person AP
33.640% = pizza AP
53.462% = pottedplant AP
72.852% = refrigerator AP
37.403% = remote AP
23.826% = sandwich AP
43.898% = scissors AP
62.098% = sheep AP
67.579% = sink AP
73.147% = skateboard AP
46.783% = skis AP
56.955% = snowboard AP
64.119% = sofa AP
27.465% = spoon AP
47.323% = sports-ball AP
68.481% = stop-sign AP
60.784% = suitcase AP
64.247% = surfboard AP
60.950% = teddy-bear AP
78.744% = tennis-racket AP
55.898% = tie AP
31.905% = toaster AP
83.340% = toilet AP
39.502% = toothbrush AP
44.792% = traffic-light AP
88.871% = train AP
48.374% = truck AP
73.030% = tvmonitor AP
67.164% = umbrella AP
59.283% = vase AP
54.945% = wine-glass AP
84.508% = zebra AP
mAP = 55.311%


##### Conclusion:

That's it for this tutorial part. I did this tutorial because it's valuable to know how to calculate the mAP of your model. You can use this metric to check how accurate is your custom trained model with validation dataset, you can check how mAP changes when you add more images to your dataset, change threshold, or IoU parameters. This is mostly used when you want to squeeze as much as possible from your custom model. I thought about implementing mAP into the training process to track it on Tensorboard, but I couldn't find an effective way to do that, so if someone finds a way how to do that effectively I would accept pull request on my GitHub, see you in a next tutorial part!