Yolo v3 object detection mAP metric
Posted July 15 by Rokas Balsys
Understanding the mAP (mean Average Precision) Evaluation Metric for Object Detection
In this tutorial, you will learn how to use the mAP (mean Average Precision) metric to evaluate the performance of an object detection model. I will cover in detail what mAP is and how to calculate it, and I will give you an example of how I use it in my YOLOv3 implementation.
The mean average precision (mAP), sometimes simply referred to as AP, is a popular metric used to measure the performance of models on document/information retrieval and object detection tasks. So if you read new object detection papers from time to time, you will often see that the authors compare the mAP of their proposed methods against the most popular ones.
There are multiple deep learning algorithms for object detection, like the R-CNN family (Fast R-CNN, Faster R-CNN, Mask R-CNN), YOLO, etc. All of these models solve two major problems: classification and localization:
- Classification: Identify if an object is present in the image and the class of the object;
- Localization: Predict the coordinates of the bounding box around the object when an object is present in the image. Here we compare the coordinates of ground truth and predicted bounding boxes.
While measuring mAP, we need to evaluate the performance of both the classification and the localization of the bounding boxes in the image.
For object detection, we use the concept of Intersection over Union (IoU). IoU measures the overlap between 2 boundaries. We use that to measure how much our predicted boundary overlaps with the ground truth (the real object boundary):
- Red - ground truth bounding box;
- Green - predicted bounding box.
In simple terms, IoU tells us how well the predicted and the ground-truth bounding boxes overlap. You'll see that in code we can set a threshold value for the IoU to determine whether the object detection is valid or not.
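As a small illustration, here is how IoU can be computed for two axis-aligned boxes given as (x1, y1, x2, y2) corners. The helper name and the example boxes are illustrative, not taken from the repository:

```python
# A minimal sketch of Intersection over Union for two axis-aligned boxes,
# each given as (x1, y1, x2, y2). Names and boxes are illustrative only.
def iou(box_a, box_b):
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 (perfect overlap)
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

Perfect overlap gives 1.0, no overlap gives 0.0, and everything in between measures how close the prediction is to the ground truth.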
For COCO, AP is the average over multiple IoU thresholds (the minimum IoU to consider a positive match). AP@[.5:.95] corresponds to the average AP for IoU from 0.5 to 0.95 with a step size of 0.05. For the COCO competition, AP is the average over 10 IoU levels on 80 categories (AP@[.50:.05:.95]: from 0.5 to 0.95 with a step size of 0.05). The following are some other metrics collected for the COCO dataset:
And, because my tutorial series is related to the YOLOv3 object detector, here are the AP results from the authors' paper:
In the figure above, AP@.75 means the AP with IoU=0.75.
mAP (mean average precision) is the average of AP. In some contexts, we compute the AP for each class and average them; in other contexts, AP and mAP mean the same thing. For example, in the COCO context, there is no difference between AP and mAP. Here is the direct quote from COCO:
AP is averaged over all categories. Traditionally, this is called "mean average precision" (mAP). We make no distinction between AP and mAP (and likewise AR and mAR) and assume the difference is clear from context.
Ok, let's get back to the beginning, where we need to calculate mAP. First, we need to set a threshold value for the IoU to determine whether the object detection is valid. Let's say we set the IoU threshold to 0.5; in that case:
- If IoU ≥ 0.5, classify the object detection as a True Positive (TP);
- If IoU < 0.5, then it is a wrong detection, and we classify it as a False Positive (FP);
- When a ground truth is present in the image and the model fails to detect the object, we classify it as a False Negative (FN);
- True Negative (TN): TN is every part of the image where we did not predict an object. This metric is not useful for object detection, hence we ignore TN.
If we set the IoU threshold value to 0.5, we calculate mAP50; if IoU=0.75, we calculate mAP75. Sometimes these are written as mAP@0.5 or mAP@0.75, but they mean the same thing.
We use Precision and Recall as the metrics to evaluate the performance. Precision and Recall are calculated from the true positives (TP), false positives (FP) and false negatives (FN):
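The two formulas are Precision = TP / (TP + FP) and Recall = TP / (TP + FN). A minimal sketch, using made-up counts rather than real model results:

```python
# Precision: of everything we predicted, how much was correct.
# Recall: of everything that exists, how much we found.
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# e.g. 8 correct detections, 2 wrong detections, 4 missed objects
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # 8 / 12 ≈ 0.667
```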
To get mAP, we should calculate precision and recall for all the objects present in the images.
We also need to consider the confidence score for each object the model detects in an image. Consider all of the predicted bounding boxes with a confidence score above a certain threshold: bounding boxes above the threshold are considered positive and all predicted bounding boxes below it are considered negative. So, the higher the confidence threshold is, the lower the mAP will be, but we can be more confident in the detections that remain.
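The confidence filtering step can be sketched like this; the detection tuples and scores below are made-up examples, not the repository's actual format:

```python
# Each detection here is (class_name, confidence_score, bounding_box).
detections = [
    ('dog', 0.92, (10, 10, 60, 80)),
    ('dog', 0.30, (12, 8, 58, 82)),
    ('cat', 0.55, (100, 40, 150, 90)),
]

# Keep only boxes whose confidence is at or above the threshold.
score_threshold = 0.5
positives = [d for d in detections if d[1] >= score_threshold]
print(len(positives))  # 2
```

Raising score_threshold discards more detections, which lowers recall (and hence mAP) but leaves only the predictions the model is most sure about.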
So, how do we calculate the general AP? It's quite simple. For each query, we can calculate a corresponding AP. A user can run as many queries as he/she likes against the labeled database, and the mAP is simply the mean of the APs over all the queries that the user made.
To see how we get an AP, you can check the voc_ap function in my GitHub repository. When we have the Precision (prec) and Recall (rec) lists, we use the following formula:
for i in range(1, len(rec)): ap += (rec[i] - rec[i-1]) * prec[i]
We should run this function for every class we use.
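Putting that formula together, here is a hedged sketch of a VOC-style AP computation. It assumes the recall and precision lists are cumulative and sorted by descending confidence, as the evaluation script would produce them; it follows the common voc_ap recipe, not necessarily the exact code on GitHub:

```python
# A sketch of VOC-style AP: pad with sentinels, make precision
# monotonically decreasing (interpolated precision), then sum the
# rectangle areas wherever recall changes.
def voc_ap(rec, prec):
    mrec = [0.0] + list(rec) + [1.0]
    mpre = [0.0] + list(prec) + [0.0]
    # Interpolated precision: each point takes the max precision to its right
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap

print(voc_ap([0.5, 1.0], [1.0, 0.5]))  # 0.75
```

The result is the area under the (interpolated) precision-recall curve for one class.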
To calculate the general AP for the COCO dataset, we must loop the evaluation function over IoU[.50:.95] 10 times. Here is the formula from Wikipedia:
Here N will be 10 and the overall AP will be the mean of AP50, AP55, …, AP95. It may take a while to calculate these results, but this is the way we need to calculate the mAP.
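The averaging over IoU thresholds can be sketched as follows. The ap_at_iou function is a hypothetical stand-in for rerunning the whole evaluation at one threshold; here it only returns a dummy value:

```python
# The COCO metric averages AP over the IoU thresholds 0.50, 0.55, ..., 0.95
# (10 values in total).
thresholds = [0.5 + 0.05 * k for k in range(10)]

def ap_at_iou(t):
    # Placeholder: the real per-threshold evaluation would go here.
    return 1.0 - t

coco_ap = sum(ap_at_iou(t) for t in thresholds) / len(thresholds)
print(len(thresholds), round(coco_ap, 3))  # 10 0.275
```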
Practical YOLOv3 mAP implementation:
First, head over to my YOLOv3 TensorFlow 2 implementation on GitHub. There is a file called evaluate_mAP.py; the whole evaluation is done in this script.
While writing this evaluation script, I focused on the COCO dataset to make sure it works on it. So in this tutorial I will explain how to run this code to evaluate the YOLOv3 model on the COCO dataset.
First, you should download the COCO validation dataset from the following link: http://images.cocodataset.org/zips/val2017.zip. Also, in case you want to train the model on the COCO dataset for some reason, you can download the training dataset: http://images.cocodataset.org/zips/train2017.zip. But it's 20 GB, and it would take a really long time to retrain the model on the COCO dataset.
In TensorFlow-2.x-YOLOv3/model_data/coco/ there are three files: coco.names, train2017.txt, and val2017.txt. I have already placed the annotation files there, so you won't need to wonder where to get them. Next, you should unzip the dataset file and place the val2017 folder in the same directory; it should look as follows: TensorFlow-2.x-YOLOv3/model_data/coco/val2017/images...
Ok, next we should change a few lines in our yolov3/configs.py:
- Link TRAIN_CLASSES to 'model_data/coco/coco.names';
- If you want to train on the COCO dataset, change TRAIN_ANNOT_PATH to 'model_data/coco/train2017.txt';
- To validate the model on the COCO dataset, change TEST_ANNOT_PATH to 'model_data/coco/val2017.txt';
- To change the input size of the model, change YOLO_INPUT_SIZE and TEST_INPUT_SIZE to the size you need, for example 512.

Now we have all the settings set for evaluation. I will explain the evaluation process in a few sentences. The whole evaluation process can be divided into 3 parts:
- In the first part, the script creates a mAP folder in the local directory, in which it creates another ground-truth folder. There it creates a .json file with the ground-truth bounding boxes for every image;
- In the second part, most of the work is done by our YOLOv3 model: it runs prediction on every image. In a similar way to the first part, it creates a .json file for every class we have and puts the detection bounding boxes in it accordingly;
- In the third part, we already have the detected and ground-truth bounding boxes. We calculate the AP for each class with the voc_ap function. When we have the AP of each class, we average them and receive the mAP.
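The final averaging step of part three is simple. A minimal sketch, with made-up per-class AP values rather than real results:

```python
# Once the AP of every class is known, mAP is their unweighted mean.
average_precisions = {'person': 0.716, 'car': 0.529, 'dog': 0.797}
mAP = sum(average_precisions.values()) / len(average_precisions)
print(f"mAP = {mAP:.3%}")
```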
Here is the output of the evaluate_mAP.py script when we call it with the score_threshold=0.05 and iou_threshold=0.50 parameters:
```
81.852% = aeroplane AP
12.615% = apple AP
31.034% = backpack AP
27.652% = banana AP
60.591% = baseball-bat AP
52.699% = baseball-glove AP
91.895% = bear AP
73.863% = bed AP
43.250% = bench AP
52.618% = bicycle AP
47.743% = bird AP
42.745% = boat AP
23.295% = book AP
52.714% = bottle AP
55.977% = bowl AP
31.840% = broccoli AP
81.399% = bus AP
49.839% = cake AP
52.939% = car AP
2.849% = carrot AP
87.444% = cat AP
46.828% = cell-phone AP
52.618% = chair AP
74.931% = clock AP
57.715% = cow AP
58.847% = cup AP
47.931% = diningtable AP
79.716% = dog AP
24.626% = donut AP
80.199% = elephant AP
79.170% = fire-hydrant AP
46.861% = fork AP
82.098% = frisbee AP
80.181% = giraffe AP
4.545% = hair-drier AP
27.715% = handbag AP
79.503% = horse AP
18.734% = hot-dog AP
73.431% = keyboard AP
39.522% = kite AP
27.280% = knife AP
75.999% = laptop AP
69.728% = microwave AP
67.991% = motorbike AP
79.698% = mouse AP
4.920% = orange AP
58.388% = oven AP
81.872% = parking-meter AP
71.647% = person AP
33.640% = pizza AP
53.462% = pottedplant AP
72.852% = refrigerator AP
37.403% = remote AP
23.826% = sandwich AP
43.898% = scissors AP
62.098% = sheep AP
67.579% = sink AP
73.147% = skateboard AP
46.783% = skis AP
56.955% = snowboard AP
64.119% = sofa AP
27.465% = spoon AP
47.323% = sports-ball AP
68.481% = stop-sign AP
60.784% = suitcase AP
64.247% = surfboard AP
60.950% = teddy-bear AP
78.744% = tennis-racket AP
55.898% = tie AP
31.905% = toaster AP
83.340% = toilet AP
39.502% = toothbrush AP
44.792% = traffic-light AP
88.871% = train AP
48.374% = truck AP
73.030% = tvmonitor AP
67.164% = umbrella AP
59.283% = vase AP
54.945% = wine-glass AP
84.508% = zebra AP
mAP = 55.311%
```
That's it for this tutorial part. I wrote this tutorial because it's valuable to know how to calculate the mAP of your model. You can use this metric to check how accurate your custom-trained model is on a validation dataset, and you can check how the mAP changes when you add more images to your dataset or change the score threshold or IoU parameters. This is mostly used when you want to squeeze as much as possible out of your custom model. I thought about implementing mAP into the training process to track it on TensorBoard, but I couldn't find an effective way to do that, so if someone finds an effective way, I would accept a pull request on my GitHub. See you in the next tutorial part!