Preparing YOLO v3 Custom Data

Posted August 22, 2019 by Rokas Balsys

Preparing YOLO v3 Custom training data

YOLOv3 is one of the most popular real-time object detectors in Computer Vision.

In my previous tutorial, I showed how to use YOLO v3 in a TensorFlow application. At the end of that tutorial I wrote that I would try to train a custom object detector on YOLO v3 using Keras. It is a really challenging task, but I found a way to do it. However, before training a custom object detector, we need to know where to get a custom dataset and how to label it, so this tutorial is about dataset preparation.

In this step-by-step tutorial, I will start with a simple case: preparing data for a 4-class object detector (the same method can be used to get a dataset for any detector you want to build). I will build a car, bus, fire hydrant and traffic light object detector.

1. Dataset:

As with any deep learning task, the first and most important step is to prepare the dataset. The dataset is the fuel that runs any deep learning model.

I will use images from Google’s OpenImagesV5 dataset, which is publicly available online. It is a very big dataset with 600 different object classes. The dataset contains bounding box, segmentation and relationship annotations for these objects. As a whole, the dataset is more than 600 GB in size, but we will download only the images and classes we need.

So we have a link to the dataset, but it is not explained how to download just the images and labels we need. Should we download them one by one? No, there is an amazing OIDv4 ToolKit on GitHub with a full explanation of how to use it.

This toolkit really makes life easier when we want to train a custom object detection model on popular objects. It lets us download images from OID v5 seamlessly. The installation is easy and clearly explained in the readme file. The toolkit comes with a variety of options; for example, it allows us to download almost any particular class of interest from the database.

The toolkit can be downloaded from the link mentioned above or cloned with the following command:

git clone

When you have downloaded or cloned the toolkit, the first thing you should do is install the necessary packages:

pip install -r requirements.txt

2. Using toolkit:

On first start you will be asked to download three files to the OID/csv_folder directory: class-descriptions-boxable.csv (contains the names of all 600 classes with their corresponding ‘LabelName’), test-annotations-bbox.csv and train-annotations-bbox.csv (each row of these files contains the coordinates of one bounding box (bbox for short) in one image, together with this bbox's Label Name and the image's ID in the corresponding set of OIDv5).
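
To see which 'LabelName' code belongs to a class, class-descriptions-boxable.csv can be read with a few lines of Python. This is only a minimal sketch: it assumes each row holds a LabelName code followed by the class name, with no header row. The helper load_class_map is a hypothetical name, and the codes in the sample are made-up placeholders, not real OID codes.

```python
import csv
import io


def load_class_map(csv_file):
    """Map class names (e.g. 'Bus') to their LabelName codes.

    Assumes two columns per row: LabelName code, then class name.
    """
    return {row[1]: row[0] for row in csv.reader(csv_file) if len(row) >= 2}


# In-memory stand-in for class-descriptions-boxable.csv.
# The codes below are placeholders, not real OID LabelName values.
sample = io.StringIO("/m/aaa,Bus\n/m/bbb,Fire hydrant\n")
class_map = load_class_map(sample)
print(class_map["Fire hydrant"])  # /m/bbb
```

With the real file you would pass an open file handle instead of the in-memory sample.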

3. Downloading database:

First of all, how do we check whether the image class we need is available? I usually go to the OIDv5 page, click on explore and try to find the class I need in the search tab. In my example, I will search for "Bus", "Car", "Fire hydrant", and "Traffic light". To download all of them I can simply use OIDv4_ToolKit with the following command in cmd:

python main.py downloader --classes 'Fire hydrant' 'Traffic light' Car Bus --type_csv train --limit 400

With this command I download 400 training images for each class and place them in the train folder. As mentioned above, if you are using the toolkit for the first time, it will first download the corresponding CSV file (train-annotations-bbox or test-annotations-bbox) and then download the requested images of the specified classes.

After downloading my test (not done in the video tutorial) and train datasets, my folder structure looks like this:

│    ...
└─── OID
    └─── csv_folder
    │   │
    │   └─── class-descriptions-boxable.csv
    │   │
    │   └─── test-annotations-bbox.csv
    │   │
    │   └─── train-annotations-bbox.csv
    └─── Dataset
        └─── train
            └─── Bus
            └─── Car
            └─── Fire hydrant
            └─── Traffic light
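
Before going further, it can be useful to check that the download really produced the expected number of images per class. The snippet below is a small sketch of such a check, not part of the toolkit: count_images is a hypothetical helper, and it assumes the images are .jpg files stored directly inside each class folder.

```python
import os


def count_images(split_dir):
    """Return {class_folder: number of .jpg files} for one split folder."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_path = os.path.join(split_dir, class_name)
        if os.path.isdir(class_path):
            counts[class_name] = sum(
                1 for f in os.listdir(class_path) if f.lower().endswith(".jpg")
            )
    return counts


if __name__ == "__main__":
    # Path assumed from the folder structure shown above
    train_dir = os.path.join("OID", "Dataset", "train")
    if os.path.isdir(train_dir):
        for name, n in count_images(train_dir).items():
            print(f"{name}: {n} images")
```

A class with fewer images than the requested limit usually just means the dataset does not contain that many images for it.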

The toolkit also offers other optional arguments for downloading images that meet particular requirements, such as objects that extend beyond the boundary of the image or are occluded by another object. More information about these features can be found on the official GitHub page or obtained in the following way:

python main.py -h

4. Converting label files to XML:

If you open one of the label files, you will see the class and the box coordinates in this form: "name_of_the_class left top right bottom", where each coordinate is denormalized. So the four values are actual pixel positions in the related image.
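
To make the label format concrete, here is a short sketch that parses one such line. The helper parse_label_line is a hypothetical name and the sample values are made up for illustration. It assumes the last four fields are always the coordinates, so class names of any number of words are handled.

```python
def parse_label_line(line):
    """Split 'class name left top right bottom' into (class_name, box)."""
    parts = line.split()
    # The last four fields are the coordinates; everything before is the class name
    coords = tuple(int(float(v)) for v in parts[-4:])
    class_name = "_".join(parts[:-4])
    return class_name, coords


print(parse_label_line("Fire hydrant 120.5 64.0 310.25 400.0"))
# ('Fire_hydrant', (120, 64, 310, 400))
```

Joining the class name with underscores avoids spaces in class names, which would otherwise clash with the space-separated formats used later.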

If you want to convert the text annotation format to XML (for example, to train with the TF object detection API), below is a little script that does it for you.

How to use:

If you are testing this script and running it from the original OIDv4 ToolKit path, you should uncomment this line:

#os.chdir(os.path.join("OID", "Dataset"))

I recommend that once your images are downloaded, you copy them to the folder where you plan to train your object detection model. For example, copy the images to the 'Dataset/images/' folder and then use os.chdir('Dataset') instead. The script will create an .xml file with the same name as each image, in the format that we'll use later.

import os
from tqdm import tqdm
import cv2
from lxml import etree

os.chdir(os.path.join("OID", "Dataset"))
DIRS = os.listdir(os.getcwd())

for DIR in DIRS:
    if os.path.isdir(DIR):
        os.chdir(DIR)
        print("Currently in Subdirectory:", DIR)

        # Replace spaces in class folder names ("Fire hydrant" -> "Fire_hydrant")
        CLASS_DIRS = os.listdir(os.getcwd())
        for CLASS_DIR in CLASS_DIRS:
            if " " in CLASS_DIR:
                os.rename(CLASS_DIR, CLASS_DIR.replace(" ", "_"))

        CLASS_DIRS = os.listdir(os.getcwd())
        for CLASS_DIR in CLASS_DIRS:
            if os.path.isdir(CLASS_DIR):
                os.chdir(CLASS_DIR)
                print("\n" + "Creating PASCAL VOC XML Files for Class:", CLASS_DIR)

                # Read labels from OIDv4 ToolKit: the .txt files are expected
                # in a "Label" subfolder next to the images
                # Create PASCAL XML
                for filename in tqdm(os.listdir("Label")):
                    if filename.endswith(".txt"):
                        filename_str = os.path.splitext(filename)[0]

                        annotation = etree.Element("annotation")

                        folder = etree.Element("folder")
                        folder.text = os.path.basename(os.getcwd())
                        annotation.append(folder)

                        filename_xml = etree.Element("filename")
                        filename_xml.text = filename_str + ".jpg"
                        annotation.append(filename_xml)

                        path = etree.Element("path")
                        path.text = os.path.join(os.getcwd(), filename_str + ".jpg")
                        annotation.append(path)

                        source = etree.Element("source")
                        annotation.append(source)

                        database = etree.Element("database")
                        database.text = "Unknown"
                        source.append(database)

                        size = etree.Element("size")
                        annotation.append(size)

                        width = etree.Element("width")
                        height = etree.Element("height")
                        depth = etree.Element("depth")

                        img = cv2.imread(filename_xml.text)

                        try:
                            width.text = str(img.shape[1])
                        except AttributeError:
                            # cv2.imread returns None if the image is missing or unreadable
                            continue
                        height.text = str(img.shape[0])
                        depth.text = str(img.shape[2])

                        size.append(width)
                        size.append(height)
                        size.append(depth)

                        segmented = etree.Element("segmented")
                        segmented.text = "0"
                        annotation.append(segmented)

                        with open(os.path.join("Label", filename), 'r') as label_original:
                            lines = label_original.readlines()

                        # Labels from OIDv4 Toolkit: name_of_class X_min Y_min X_max Y_max
                        for line in lines:
                            line = line.strip()
                            if not line:
                                continue
                            l = line.split(' ')
                            class_name = l[0]
                            try:
                                xmin_l = str(int(float(l[1])))
                                add1 = 0
                            except ValueError:
                                # Two-word class name, e.g. "Fire hydrant"
                                class_name = l[0]+"_"+l[1]
                                add1 = 1

                            xmin_l = str(int(float(l[1+add1])))
                            ymin_l = str(int(float(l[2+add1])))
                            xmax_l = str(int(float(l[3+add1])))
                            ymax_l = str(int(float(l[4+add1])))

                            obj = etree.Element("object")
                            annotation.append(obj)

                            name = etree.Element("name")
                            name.text = class_name
                            obj.append(name)

                            pose = etree.Element("pose")
                            pose.text = "Unspecified"
                            obj.append(pose)

                            truncated = etree.Element("truncated")
                            truncated.text = "0"
                            obj.append(truncated)

                            difficult = etree.Element("difficult")
                            difficult.text = "0"
                            obj.append(difficult)

                            bndbox = etree.Element("bndbox")
                            obj.append(bndbox)

                            xmin = etree.Element("xmin")
                            xmin.text = xmin_l
                            bndbox.append(xmin)

                            ymin = etree.Element("ymin")
                            ymin.text = ymin_l
                            bndbox.append(ymin)

                            xmax = etree.Element("xmax")
                            xmax.text = xmax_l
                            bndbox.append(xmax)

                            ymax = etree.Element("ymax")
                            ymax.text = ymax_l
                            bndbox.append(ymax)

                        # write xml to file, next to the image
                        s = etree.tostring(annotation, pretty_print=True)
                        with open(filename_str + ".xml", 'wb') as f:
                            f.write(s)

                # back to the subdirectory (e.g. train)
                os.chdir("..")

        # back to the Dataset folder
        os.chdir("..")



5. Converting XML to YOLO v3 file structure:

First, to train a YOLO model, the annotation file must meet these requirements:

  • One row for one image;
  • Row format: image_file_path box1 box2 ... boxN;
  • Box format: x_min,y_min,x_max,y_max,class_id (no space).
  • Here is an example:
        path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
        path/to/img2.jpg 120,300,250,600,2
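
The row format above can be sketched in a few lines of Python. The helper format_row is a hypothetical name used only for illustration, and the sketch assumes the image path contains no spaces, since spaces separate the boxes.

```python
def format_row(image_path, boxes):
    """Build one annotation row.

    boxes: iterable of (x_min, y_min, x_max, y_max, class_id) tuples.
    """
    box_strings = [",".join(str(int(v)) for v in box) for box in boxes]
    return " ".join([image_path] + box_strings)


row = format_row("path/to/img1.jpg", [(50, 100, 150, 200, 0), (30, 50, 200, 120, 3)])
print(row)  # path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
```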

By now we should have our .xml files prepared in step 4 above, or we have labeled our images manually according to my tutorial.

How to use:

First of all, to train a YOLO v3 object detection model we need an annotations file and a classes file. Both will be created by the script below; you only need to change the first two of these settings:

  • 1. dataset_train - the location of your downloaded images with their xml files;
  • 2. dataset_file - the output file that will be created with the annotations prepared for YOLO training;
  • 3. classes_file - no need to change this; this file will list all classes used in the xml files. It gets the same name as dataset_file, with '_classes' added before the .txt extension.

Therefore, for my example the file structure will look like this (the script from step 4 has replaced the spaces in folder names with underscores):

└─── OID
    └─── Dataset
        └─── train
            └─── Bus
            └─── Car
            └─── Fire_hydrant
            └─── Traffic_light

Code to convert .xml files to YOLO v3 annotations:

import os
import xml.etree.ElementTree as ET

dataset_train = os.path.join('OID', 'Dataset', 'train')
dataset_file = '4_CLASS_test.txt'
classes_file = dataset_file[:-4]+'_classes.txt'

# Class names are taken from the folder names; their order defines class_id
CLS = os.listdir(dataset_train)
classes = [os.path.join(dataset_train, CLASS) for CLASS in CLS]

def test(fullname):
    bb = ""
    tree = ET.parse(fullname)
    root = tree.getroot()
    for i, obj in enumerate(root.iter('object')):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        # Skip unknown classes and objects marked as difficult
        if cls not in CLS or int(difficult)==1:
            continue
        cls_id = CLS.index(cls)
        xmlbox = obj.find('bndbox')
        b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text))
        bb += (" " + ",".join([str(a) for a in b]) + ',' + str(cls_id))

        # Workaround kept from the original script: every Traffic_light box
        # (plus any boxes collected before it) is written as its own row
        if cls == 'Traffic_light':
            with open(dataset_file, 'a') as list_file:
                file_string = str(fullname)[:-4]+'.jpg'+bb+'\n'
                list_file.write(file_string)
            bb = ""

    # Write the remaining boxes of this image as one row
    if bb != "":
        with open(dataset_file, 'a') as list_file:
            file_string = str(fullname)[:-4]+'.jpg'+bb+'\n'
            list_file.write(file_string)

for CLASS in classes:
    for filename in os.listdir(CLASS):
        if not filename.endswith('.xml'):
            continue
        fullname = os.path.join(os.getcwd(), CLASS, filename)
        test(fullname)

# Write one class name per line; the line index matches class_id
with open(classes_file, 'a') as list_file:
    for CLASS in CLS:
        file_string = str(CLASS)+"\n"
        list_file.write(file_string)
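
Once the annotation file exists, a quick sanity check can catch malformed rows before training. The following is only a sketch under stated assumptions: check_annotations is a hypothetical helper, image paths are assumed to contain no spaces, and the sample rows are made up to show one deliberately wrong class id.

```python
def check_annotations(lines, num_classes):
    """Return a list of (line_number, problem) for malformed rows."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 2:
            problems.append((line_no, "no boxes"))
            continue
        # fields[0] is the image path, the rest are boxes
        for box in fields[1:]:
            try:
                x_min, y_min, x_max, y_max, class_id = (int(v) for v in box.split(","))
            except ValueError:
                problems.append((line_no, f"bad box: {box}"))
                continue
            if x_min >= x_max or y_min >= y_max:
                problems.append((line_no, f"empty box: {box}"))
            if not 0 <= class_id < num_classes:
                problems.append((line_no, f"class id out of range: {box}"))
    return problems


sample = [
    "path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3",
    "path/to/img2.jpg 120,300,250,600,7",  # class id 7 does not exist with 4 classes
]
print(check_annotations(sample, num_classes=4))
# [(2, 'class id out of range: 120,300,250,600,7')]
```

For the real file, pass the lines read from the annotation file and the number of lines in the classes file.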

6. Summary:

1. Download the toolkit for downloading images
2. Use the toolkit to download images from Google’s OpenImagesV5 dataset
3. Convert the label files to XML using the first script
4. Convert the XML files to the YOLO v3 annotation format with the second script
5. In my example we get 4_CLASS_test.txt and 4_CLASS_test_classes.txt, which we'll use to train the YOLO v3 model


Now we have prepared our dataset and annotation file for further YOLO v3 object detection training.