Yolo v3 with TensorFlow 2

Posted May 10 by Rokas Balsys

Training custom YOLO v3 object detection model

YOLOv3 is one of the most popular real-time object detectors in computer vision. If you have heard of a more popular one, I would like to hear about it.

In my previous tutorials, I showed you how to use YOLO v3 object detection in a TensorFlow 2.x application and how to train a custom object detector on the MNIST dataset. At the end of that tutorial, I promised to show you how to train a custom object detector. It was a challenging task, but I found a way to do it. However, before training a custom object detector, we need to know where to get a custom dataset and how to label it. So in this tutorial, I will show you where to get a labeled dataset, how to prepare it for training, and finally how to train on it.

In this step-by-step tutorial, I will start with a simple case: training a 7-class object detector (the same method can be used to get a dataset for almost any detector you need). Specifically, I will build a detector for Vehicle registration plate, Traffic sign, Traffic light, Car, Bus, Truck, and Person.

1. Dataset:

As with any deep learning task, the first and most important step is to prepare the training dataset. The dataset is the fuel that runs any deep learning model.

As in my past tutorials, I will use images from Google's OpenImagesV6 dataset, which is publicly available online. It is a very large dataset with more than 600 object categories, and it contains bounding box, segmentation, and relationship annotations for these objects. As a whole, the dataset is more than 600 GB in size, but we will download only the images and classes needed for our custom detector.

So we have a link to this large dataset, but it isn't explained how to download only the images and labels we need. Should we download them one by one? No: there is an amazing OIDv4 ToolKit on GitHub, with a full explanation of how to use it.

This toolkit makes life much easier when we want to train a custom object detection model on popular objects and don't know where to get labeled images. It downloads images from OID v6 seamlessly, its installation is easy and clearly explained in the readme file, and it comes with a variety of options. For example, it lets us download almost any class of interest from the dataset. You can explore the dataset to check whether it contains the classes you need for your custom detector.

The toolkit can be downloaded from the link mentioned above or cloned with the following command:
git clone https://github.com/pythonlessons/OIDv4_ToolKit.git

If you have already installed the requirements from my original project, you can skip the next step. Otherwise, first install the necessary packages:
pip install -r OIDv4_ToolKit/requirements.txt

2. Using the toolkit:

On first run, the toolkit will ask you to download three files into the OID/csv_folder directory: class-descriptions-boxable.csv, which contains the names of all 600+ classes with their corresponding 'LabelName', plus test-annotations-bbox.csv and train-annotations-bbox.csv, in which each row holds the coordinates of one bounding box (bbox for short) together with that bbox's LabelName and the ID of the image it belongs to.
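To illustrate how these CSV files relate, here is a minimal Python sketch that maps a human-readable class name to its LabelName and then collects that label's boxes per image. It assumes the standard Open Images layouts: class-descriptions-boxable.csv has no header and two columns (LabelName, DisplayName), while the bbox files have a header row that includes ImageID, LabelName and normalized XMin, XMax, YMin, YMax columns. The sample rows, label IDs and function names are made up for the example.

```python
import csv
import io

# Made-up sample data in the assumed Open Images CSV layouts (real files are far larger
# and have extra columns, which this code simply does not use).
CLASS_CSV = "/m/example_car,Car\n/m/example_bus,Bus\n"
BBOX_CSV = (
    "ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax\n"
    "img001,xclick,/m/example_car,1,0.10,0.50,0.20,0.60\n"
    "img001,xclick,/m/example_bus,1,0.55,0.90,0.30,0.80\n"
    "img002,xclick,/m/example_car,1,0.00,0.25,0.40,0.70\n"
)


def load_class_map(f):
    """Map DisplayName -> LabelName from the header-less, two-column class file."""
    return {name: label for label, name in csv.reader(f)}


def boxes_for_label(f, label_name):
    """Group normalized (XMin, YMin, XMax, YMax) boxes by ImageID for one LabelName."""
    boxes = {}
    for row in csv.DictReader(f):
        if row["LabelName"] == label_name:
            box = tuple(float(row[k]) for k in ("XMin", "YMin", "XMax", "YMax"))
            boxes.setdefault(row["ImageID"], []).append(box)
    return boxes


class_map = load_class_map(io.StringIO(CLASS_CSV))
car_boxes = boxes_for_label(io.StringIO(BBOX_CSV), class_map["Car"])
print(car_boxes)
# {'img001': [(0.1, 0.2, 0.5, 0.6)], 'img002': [(0.0, 0.4, 0.25, 0.7)]}
```

With the real files you would pass `open(path, newline="")` file objects instead of `io.StringIO`; `csv.DictReader` streams rows, so even the large train annotation file does not have to fit in memory.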

3. Downloading database images:

First, we should check whether the dataset contains the image classes we need. Usually, I go to the OIDv6 page, click Explore, and use the search tab to look for the class. In my example, I search for "Vehicle registration plate", "Traffic sign", "Traffic light", "Car", "Bus", "Truck", and "Person". To download all of them, I can simply use OIDv4_ToolKit: open a terminal, move into the toolkit directory with cd OIDv4_ToolKit, and run the following command:
python main.py downloader --classes 'Vehicle registration plate' 'Traffic sign' 'Traffic light' Car Bus Truck Person --type_csv train --limit 2000

This command downloads up to 2000 training images for each class and places them in the train folder. As mentioned before, if you are running it for the first time, it will first download the train-annotations-bbox.csv file and then the requested images for the specified classes.

Once the training set has finished downloading, do the same for the test set (here with a limit of 200 images per class):
python main.py downloader --classes 'Vehicle registration plate' 'Traffic sign' 'Traffic light' Car Bus Truck Person --type_csv test --limit 200
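If you prefer to script both downloads, a small Python sketch can build the two commands above. It also shows why the shell version needs quotes: each multi-word class name must reach main.py as a single argument. The helper name downloader_args is hypothetical, it uses only the flags shown above, and the actual subprocess call is left commented out.

```python
import shlex

CLASSES = ["Vehicle registration plate", "Traffic sign", "Traffic light",
           "Car", "Bus", "Truck", "Person"]


def downloader_args(classes, type_csv, limit):
    """Build the argument list for the OIDv4_ToolKit downloader command shown above."""
    return ["python", "main.py", "downloader", "--classes", *classes,
            "--type_csv", type_csv, "--limit", str(limit)]


train_cmd = downloader_args(CLASSES, "train", 2000)
test_cmd = downloader_args(CLASSES, "test", 200)

# shlex.join re-adds the quotes that keep multi-word class names together in a shell.
print(shlex.join(test_cmd))

# To actually run both downloads from inside the OIDv4_ToolKit directory:
# import subprocess
# for cmd in (train_cmd, test_cmd):
#     subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` bypasses the shell entirely, so no quoting is needed in that case.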

After downloading the train and test datasets, the folder structure should look like this:

│    ...
│    train.py
│    detection_custom.py
│    ...
└─── OIDv4_ToolKit
    └─── OID
        ├─── csv_folder
        │   ├─── class-descriptions-boxable.csv
        │   ├─── test-annotations-bbox.csv
        │   └─── train-annotations-bbox.csv
        └─── Dataset
            ├─── train
            │   └─── Bus, Car, Person, ...
            └─── test
                └─── Bus, Car, Person, ...
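Before converting anything, it can be worth checking that every class actually downloaded. A minimal sketch, assuming images are stored as .jpg files directly inside each class folder of a split (the toolkit keeps its label files in a separate subfolder, which this deliberately does not count); count_images is a hypothetical helper name.

```python
from pathlib import Path


def count_images(split_dir):
    """Return {class_name: number of .jpg files} for one split folder (train or test)."""
    split_dir = Path(split_dir)
    return {
        class_dir.name: sum(1 for _ in class_dir.glob("*.jpg"))  # non-recursive: skips subfolders
        for class_dir in sorted(split_dir.iterdir())
        if class_dir.is_dir()
    }


# Example usage (adjust the path to your own train folder):
# print(count_images("path/to/Dataset/train"))
```

A class with far fewer images than the --limit value usually means the dataset simply has fewer labeled images for it, which is worth knowing before training.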

4. Converting label files to XML:

If you open one of the label files, you will see lines in the format "name_of_the_class left top right bottom", where each coordinate is denormalized: the four values are actual pixel positions in the related image.
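Because some class names contain spaces ("Vehicle registration plate"), a label line cannot simply be split on whitespace into five fields. A minimal sketch, assuming every line ends with exactly four numeric coordinates, splits from the right instead; parse_label_line is a hypothetical helper name.

```python
def parse_label_line(line):
    """Split 'name_of_the_class left top right bottom' into (name, (left, top, right, bottom))."""
    # Split at most 4 times from the right so spaces inside the class name survive.
    name, *coords = line.strip().rsplit(maxsplit=4)
    if len(coords) != 4:
        raise ValueError(f"expected a class name and 4 coordinates, got: {line!r}")
    left, top, right, bottom = (float(c) for c in coords)
    return name, (left, top, right, bottom)


print(parse_label_line("Vehicle registration plate 12.5 40.0 88.0 61.25"))
# ('Vehicle registration plate', (12.5, 40.0, 88.0, 61.25))
```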

So we need to convert these text annotations to XML. Why XML? The XML (Pascal VOC) format is quite popular and is used by many other object detection algorithms, so I wrote a script to do this conversion. If you have followed this tutorial and have the same file structure as above, you will find the oid_to_pascal_voc_xml.py script in the tools folder. Simply run it, and once the conversion finishes you should find the generated .xml files.
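For reference, a Pascal VOC annotation is one small XML document per image. The sketch below writes a minimal one with Python's standard xml.etree.ElementTree. It is not the oid_to_pascal_voc_xml.py script, only an illustration of the target format; the function name and the chosen subset of fields are assumptions (real VOC files often also carry fields such as folder, pose, truncated and difficult).

```python
import xml.etree.ElementTree as ET


def to_voc_xml(filename, width, height, objects, depth=3):
    """Build a minimal Pascal VOC annotation string.

    objects: list of (class_name, (xmin, ymin, xmax, ymax)) in pixel coordinates.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            # rounded to whole pixels here, as in the original VOC dataset
            ET.SubElement(box, tag).text = str(int(round(value)))
    return ET.tostring(root, encoding="unicode")


print(to_voc_xml("img001.jpg", 640, 480, [("Car", (12.4, 40.0, 88.6, 61.0))]))
```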

5. Converting XML to YOLO v3 file structure:

First, to train a YOLO v3 model, the annotation file must follow these rules:

  • One row for one image;
  • Row format: image_file_path box1 box2 … boxN;
  • Box format: x_min,y_min,x_max,y_max,class_id x_min2,y_min2,…;
  • Here is an example:
    path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
    path/to/img2.jpg 120,300,250,600,2
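To make the format concrete, here is a minimal Python sketch that reads one such row back into an image path and a list of boxes, with a basic sanity check on each box. It assumes the image path contains no spaces (the format uses spaces as separators) and that coordinates are integer pixels; parse_annotation_row is a hypothetical name.

```python
def parse_annotation_row(row):
    """Parse 'path x_min,y_min,x_max,y_max,class_id ...' into (path, [box, ...])."""
    path, *box_fields = row.split()
    boxes = []
    for field in box_fields:
        values = [int(v) for v in field.split(",")]
        if len(values) != 5:
            raise ValueError(f"bad box {field!r}: expected x_min,y_min,x_max,y_max,class_id")
        x_min, y_min, x_max, y_max, class_id = values
        if x_min >= x_max or y_min >= y_max:
            raise ValueError(f"degenerate box {field!r}")
        boxes.append((x_min, y_min, x_max, y_max, class_id))
    return path, boxes


print(parse_annotation_row("path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3"))
# ('path/to/img1.jpg', [(50, 100, 150, 200, 0), (30, 50, 200, 120, 3)])
```

Running every line of a generated annotation file through a check like this is a cheap way to catch malformed rows before a long training run.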

By now, we should have the .xml files prepared in step 4 (or images we labeled manually). To train our custom object detection model, we need an annotations file and a classes file. Both are created by the XML_to_YOLOv3.py script in the tools folder; as in step 4, simply run it. After the conversion finishes, you should find the Dataset_names.txt, Dataset_train.txt, and Dataset_test.txt files in the model_data folder.
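The core of such a conversion can be sketched in a few lines: read each object's name and bndbox from a VOC file, look up the class ID by the class's position in the names list, and emit one annotation row. This is an illustrative sketch, not the XML_to_YOLOv3.py script; the function name and the class order in CLASS_NAMES are assumptions, so in practice always use the order in the Dataset_names.txt the script generates.

```python
import xml.etree.ElementTree as ET

# Assumed class order: the line index in the names file becomes the class_id.
CLASS_NAMES = ["Vehicle registration plate", "Traffic sign", "Traffic light",
               "Car", "Bus", "Truck", "Person"]


def voc_to_row(xml_text, image_path, class_names):
    """Convert one Pascal VOC annotation into 'image_path x_min,y_min,x_max,y_max,class_id ...'."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        class_id = class_names.index(obj.findtext("name"))  # ValueError for unknown classes
        bb = obj.find("bndbox")
        coords = [int(float(bb.findtext(t))) for t in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(",".join(str(v) for v in coords + [class_id]))
    return " ".join([image_path] + boxes)


sample = """<annotation><filename>img1.jpg</filename>
  <object><name>Car</name><bndbox><xmin>50</xmin><ymin>100</ymin><xmax>150</xmax><ymax>200</ymax></bndbox></object>
  <object><name>Person</name><bndbox><xmin>30</xmin><ymin>50</ymin><xmax>200</xmax><ymax>120</ymax></bndbox></object>
</annotation>"""

print(voc_to_row(sample, "path/to/img1.jpg", CLASS_NAMES))
# path/to/img1.jpg 50,100,150,200,3 30,50,200,120,6

# A names file in the same spirit as coco.names: one class name per line.
names_file_text = "\n".join(CLASS_NAMES) + "\n"
```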

6. Changing the YOLO v3 configuration:

Now change the configs.py file as follows:

# YOLO options
YOLO_DARKNET_WEIGHTS        = "./model_data/yolov3.weights"
YOLO_COCO_CLASSES           = "./model_data/coco.names"
YOLO_STRIDES                = [8, 16, 32]
YOLO_INPUT_SIZE             = 416
YOLO_ANCHORS                = [[[10,  13], [16,   30], [33,   23]],
                               [[30,  61], [62,   45], [59,  119]],
                               [[116, 90], [156, 198], [373, 326]]]

# Train options
#TRAIN_CLASSES               = "./mnist/mnist.names"
TRAIN_CLASSES               = "./model_data/Dataset_names.txt"
#TRAIN_ANNOT_PATH            = "./mnist/mnist_train.txt"
TRAIN_ANNOT_PATH            = "./model_data/Dataset_train.txt"
TRAIN_LOGDIR                = "./log"
TRAIN_BATCH_SIZE            = 8
TRAIN_INPUT_SIZE            = 416
TRAIN_DATA_AUG              = True
#TRAIN_TRANSFER              = False
TRAIN_TRANSFER              = True  # start from the pre-trained Darknet weights
TRAIN_LR_INIT               = 1e-4
TRAIN_LR_END                = 1e-6
#TRAIN_EPOCHS                = 30
TRAIN_EPOCHS                = 100

# TEST options
#TEST_ANNOT_PATH             = "./mnist/mnist_test.txt"
TEST_ANNOT_PATH             = "./model_data/Dataset_test.txt"
TEST_BATCH_SIZE             = 4
TEST_INPUT_SIZE             = 416
TEST_DATA_AUG               = False
TEST_IOU_THRESHOLD          = 0.45
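The YOLO options above are tied together. The small pure-Python sketch below derives the shape of each YOLO v3 output scale from YOLO_INPUT_SIZE, YOLO_STRIDES and YOLO_ANCHORS: each stride divides the 416-pixel input into a square grid, each scale uses the three anchors on its row of YOLO_ANCHORS, and every anchor in every cell predicts 4 box values, 1 objectness score and one score per class. The helper name output_shapes is hypothetical, and the class count of 7 assumes Dataset_names.txt lists our seven classes.

```python
YOLO_STRIDES = [8, 16, 32]
YOLO_INPUT_SIZE = 416
YOLO_ANCHORS = [[[10,  13], [16,   30], [33,   23]],
                [[30,  61], [62,   45], [59,  119]],
                [[116, 90], [156, 198], [373, 326]]]


def output_shapes(input_size, strides, anchors, num_classes):
    """Return the (grid, grid, anchors_per_scale, 5 + num_classes) shape of each output scale."""
    if len(strides) != len(anchors):
        raise ValueError("need one row of anchors per stride")
    shapes = []
    for stride, scale_anchors in zip(strides, anchors):
        if input_size % stride:
            raise ValueError(f"input size {input_size} is not divisible by stride {stride}")
        grid = input_size // stride
        # per anchor: x, y, w, h, objectness, then one score per class
        shapes.append((grid, grid, len(scale_anchors), 5 + num_classes))
    return shapes


for shape in output_shapes(YOLO_INPUT_SIZE, YOLO_STRIDES, YOLO_ANCHORS, num_classes=7):
    print(shape)
# (52, 52, 3, 12)
# (26, 26, 3, 12)
# (13, 13, 3, 12)
```

The stride-8 scale (52 × 52 grid) is paired with the smallest anchors and handles small objects such as registration plates, while the stride-32 scale with the largest anchors handles large objects. This is also why the input size must be a multiple of 32.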

Now start the training process by running the following terminal command from the main TensorFlow-2.x-YOLOv3 folder:
python train.py

After a while, I recommend checking TensorBoard to track the training process:
tensorboard --logdir=log

7. Transfer learning or training from scratch?

Transfer learning is a technique for reusing an already trained model on a new problem. It's currently very popular in deep learning because it makes it possible to train deep neural networks with comparatively little data and in much less time. This is very useful, since most real-world problems don't have millions of labeled data points to train such complex models.

In transfer learning, the knowledge of an already trained machine learning model is applied to a different but related problem. For example, if you trained a simple classifier to predict whether an image contains a car, you could use the knowledge that the model gained during its training to recognize other objects like a truck.

With transfer learning, we basically try to exploit what has been learned in one task to improve generalization in another. We transfer the weights that a network has learned at "task A" to a new "task B."

The general idea is to use the knowledge a model has learned from a task with a lot of available labeled training data in a new task that doesn't have much data. Instead of starting the learning process from scratch, we start with patterns learned from solving a related task.

Transfer learning is widely used in computer vision and in natural language processing tasks such as sentiment analysis, because training these models from scratch requires a huge amount of computational power. It has become especially popular in combination with neural networks that need huge amounts of data and computation.

In configs.py, the TRAIN_TRANSFER option controls whether transfer learning is used. This is how transfer learning looks in code:

if TRAIN_TRANSFER:
    Darknet = Create_Yolov3(input_size=input_size)
    load_yolo_weights(Darknet, Darknet_weights) # use darknet weights

yolo = Create_Yolov3(input_size=input_size, training=True, CLASSES=TRAIN_CLASSES)

if TRAIN_TRANSFER:
    # copy the Darknet weights layer by layer into the custom model
    for i, l in enumerate(Darknet.layers):
        layer_weights = l.get_weights()
        if layer_weights:  # skip layers without weights
            try:
                yolo.layers[i].set_weights(layer_weights)
            except ValueError:
                # shapes are incompatible (e.g. output layers sized for other classes)
                print("skipping", yolo.layers[i].name)

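First, we create a model and load the original Darknet weights into it; then we create our custom model and copy the weights of every layer whose shape matches from the original model into the custom one. Layers whose shapes differ, such as the output layers, whose size depends on the number of classes, are skipped.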
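Why some layers are skipped can be shown without TensorFlow. Each YOLO v3 output layer is a convolution with 3 × (5 + number_of_classes) filters, so the COCO-trained Darknet outputs (80 classes) have 255 channels, while our 7-class model needs 36. The sketch below mimics the copy loop with plain shape tuples and copies a layer only when all of its weight shapes match. The layer names and shapes are made up, and layers are matched by name for readability, whereas the loop above pairs them by index. In TF 2.x Keras, `set_weights()` raises a ValueError on such a mismatch.

```python
def head_filters(num_classes, anchors_per_scale=3):
    """Filters in a YOLO v3 output conv: anchors * (4 box values + 1 objectness + classes)."""
    return anchors_per_scale * (5 + num_classes)


# Hypothetical layers, described only by the shapes of their weight arrays
# (a conv kernel (kh, kw, in_ch, out_ch), optionally a bias (out_ch,)); [] means no weights.
darknet_layers = {
    "conv_backbone": [(3, 3, 32, 64)],
    "leaky_relu": [],
    "conv_output": [(1, 1, 1024, head_filters(80)), (head_filters(80),)],
}
custom_layers = {
    "conv_backbone": [(3, 3, 32, 64)],
    "leaky_relu": [],
    "conv_output": [(1, 1, 1024, head_filters(7)), (head_filters(7),)],
}


def transfer(source, target):
    """Return (copied, skipped) layer names; a layer is copied only if all weight shapes match."""
    copied, skipped = [], []
    for name, shapes in source.items():
        if not shapes:               # nothing to copy for weightless layers
            continue
        if target[name] == shapes:
            copied.append(name)      # in Keras: target_layer.set_weights(source_layer.get_weights())
        else:
            skipped.append(name)     # in Keras: set_weights() would raise ValueError here
    return copied, skipped


print(head_filters(80), head_filters(7))  # 255 36
print(transfer(darknet_layers, custom_layers))
# (['conv_backbone'], ['conv_output'])
```

So only the class-dependent output layers start from random initialization; everything the backbone learned on COCO is reused.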

To better understand why we use transfer learning, I trained two models (on a GTX 1080 Ti) on the same dataset ('Vehicle registration plate', 'Traffic sign', 'Traffic light', Car, Bus, Truck, Person).

First, here are the TensorBoard results without transfer learning:


To evaluate our model's performance, we look at the total validation loss; this is why we need test data during training. As you can see from this chart, we achieved the best result (the lowest loss) at the 43rd validation step, with a value of 7.724. Getting this model took almost 27 hours of training.

Here are the TensorBoard results with transfer learning:


With transfer learning, we achieved the best result at the 7th validation step, with a value of 6.614, and it took only 3 hours. As you can see, after this point the validation curve begins to rise, which means the model has started overfitting and further training no longer improves it. To improve it further, we would need more training data, more data augmentation, or other regularization techniques.
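Picking "the best result" from these curves amounts to finding the lowest validation loss, and the rise after it is exactly what early stopping watches for. A minimal sketch, assuming you have the per-validation-step losses as a Python list (for example, read off TensorBoard); the loss values below are made-up toy numbers, not the results above, and the patience of 3 steps is an arbitrary assumption you would tune.

```python
def best_step(val_losses):
    """Return (step, loss) of the lowest validation loss, using 1-based step numbers."""
    index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return index + 1, val_losses[index]


def should_stop(val_losses, patience=3):
    """Early stopping: True once the loss has not improved for `patience` consecutive steps."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_index >= patience


toy_losses = [12.0, 9.5, 8.1, 7.9, 8.3, 8.6, 9.0]  # toy numbers: improves, then starts overfitting
print(best_step(toy_losses))    # (4, 7.9)
print(should_stop(toy_losses))  # True
```

Keeping the checkpoint from the best step, rather than the last one, gives you the model before overfitting set in.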

To test your custom-trained model, use the detection_custom.py script; just change image_path in the code to point to your image.


In this tutorial, we learned where and how to download custom training data, how to prepare it for training, and how to choose the best model. In the next part, I will show you how to label your own data and train a custom object detector on it.