Preparing YOLO v3 Custom Data

In this step-by-step tutorial, I will show you how to prepare data for your own custom YOLO v3 object detector.

YOLOv3 is one of the most popular real-time object detectors in computer vision. In my previous tutorial, I showed how to use YOLO v3 with TensorFlow. At the end of that tutorial, I wrote that I would try to train a custom object detector on YOLO v3 using Keras. It is a really challenging task, but I found a way to do it. However, before training a custom object detector, we need to know where to get a custom dataset and how to label it, so this tutorial is about dataset preparation.

I will start with a simple case: preparing data for a 4-class object detector (you can use the same method to get a dataset for any detector you want to train). I will build a car, bus, fire hydrant, and traffic light object detector.

1. Dataset:

As with any deep learning task, the first and most important step is to prepare the dataset. The dataset is the fuel that runs any deep learning model.

I will use images from Google’s OpenImagesV5 dataset, which is publicly available online. It is a very big dataset with 600 different classes of objects. The dataset contains bounding box, segmentation, and relationship annotations for these objects. As a whole, the dataset is more than 600GB in size, but we will download only the images and classes we need.

The dataset page does not explain how to download only the images and labels we need. Should we download them one by one? No, there is an amazing OIDv4 ToolKit on GitHub with a full explanation of how to use it.

This toolkit really makes our life easier when we want to train a custom object detection model with popular objects. It allows downloading images from OID v5 seamlessly. The installation is easy and clearly explained in the readme file. The toolkit comes with a variety of options; for example, it allows us to download almost any particular class of interest from the dataset.

The toolkit can be downloaded from the link I mentioned above or cloned by the following command:

git clone

When you have downloaded or cloned the toolkit, the first thing you should do is install the necessary packages:

pip install -r requirements.txt

2. Using toolkit:

On the first start, you will be asked to download three files to the OID/csv_folder directory: class-descriptions-boxable.csv (contains the names of all 600 classes with their corresponding ‘LabelName’), test-annotations-bbox.csv, and train-annotations-bbox.csv. In the two annotation files, each row holds the coordinates of one bounding box (bbox for short) together with this bbox’s LabelName and the ID of the image it belongs to in the corresponding (test or train) set of OIDv5.
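
To look up the LabelName of a class, class-descriptions-boxable.csv can be read with a few lines of Python. This is only a minimal sketch: it assumes each row has the form LabelName,ClassName with no header row, load_class_ids is a hypothetical helper name, and the sample IDs below are made up for illustration.

```python
import csv
import io


def load_class_ids(csv_file):
    """Map each class name to its LabelName (assumed row layout: LabelName,ClassName)."""
    ids = {}
    for row in csv.reader(csv_file):
        # Ignore empty or malformed rows
        if len(row) >= 2:
            ids[row[1]] = row[0]
    return ids


# Made-up sample rows; for the real file use
# open(os.path.join("OID", "csv_folder", "class-descriptions-boxable.csv"), newline="")
sample = io.StringIO("/m/aaa,Bus\n/m/bbb,Fire hydrant\n")
class_ids = load_class_ids(sample)
print(class_ids["Fire hydrant"])
```

The same dictionary can be used to check that a class name is spelled exactly as the dataset expects before starting a download.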

3. Downloading database:

First of all, how do we check whether the image classes we need are available? I usually go to the OIDv5 page, click on Explore, and look for the class in the search tab. In my example, I will search for "Bus", "Car", "Fire hydrant", and "Traffic light". To download all of them, I can simply use OIDv4_ToolKit with the following command in cmd:

python main.py downloader --classes 'Fire hydrant' 'Traffic light' Car Bus --type_csv train --limit 400

With the above command, I will download 400 training images for each class and place them in the train folder. As I mentioned above, if you are using the toolkit for the first time, it will first download the corresponding CSV file (train-annotations-bbox.csv or test-annotations-bbox.csv) and then download the requested images of the specified classes.

After downloading my test (not doing this in the video tutorial) and train dataset, my folder structure looks like this:

│    ...
└─── OID
    └─── csv_folder
    │   │
    │   └─── class-descriptions-boxable.csv
    │   │
    │   └─── test-annotations-bbox.csv
    │   │
    │   └─── train-annotations-bbox.csv
    └─── Dataset
        └─── train
            └─── Bus
            └─── Car
            └─── Fire hydrant
            └─── Traffic light
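
A quick way to confirm that every class folder received its images is to count the .jpg files per class. The snippet below is a small sketch, not part of the toolkit: count_images is a hypothetical helper, and the demo runs on a temporary folder so it works without the dataset.

```python
import os
import tempfile


def count_images(train_dir):
    """Return {class_folder: number_of_jpg_files} for every class folder."""
    counts = {}
    for class_name in sorted(os.listdir(train_dir)):
        class_path = os.path.join(train_dir, class_name)
        if os.path.isdir(class_path):
            counts[class_name] = sum(
                1 for f in os.listdir(class_path) if f.lower().endswith(".jpg")
            )
    return counts


# Demo on a temporary folder that imitates the train layout
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n in (("Bus", 2), ("Car", 3)):
        os.makedirs(os.path.join(tmp, class_name, "Label"))
        for i in range(n):
            open(os.path.join(tmp, class_name, f"{i}.jpg"), "w").close()
    demo_counts = count_images(tmp)

print(demo_counts)
```

For the real dataset, call count_images(os.path.join("OID", "Dataset", "train")) from the toolkit folder.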

The toolkit also offers other optional arguments to download images that meet various requirements, such as objects that extend beyond the boundary of the image or are occluded by another object. More information about the various features can be found on the official GitHub page or obtained in the following way:

python main.py -h

4. Converting label files to XML:

If you open one of the label files, you will see the class and the box coordinates in this format: "name_of_the_class left top right bottom", where each coordinate is denormalized. In other words, the four values are actual pixel positions in the related image.
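
To make the label format concrete, here is a minimal sketch of how one label line can be split into a class name and a box. parse_label_line is a hypothetical helper, and the sample line is made up; the only assumption is that the last four fields are always the coordinates, so a class name with a space in it (like "Fire hydrant") is joined with an underscore.

```python
def parse_label_line(line):
    """Split 'name_of_the_class left top right bottom' into (name, box)."""
    parts = line.split()
    # The last four fields are the coordinates; everything before is the class name
    name = "_".join(parts[:-4])
    left, top, right, bottom = (int(float(v)) for v in parts[-4:])
    return name, (left, top, right, bottom)


print(parse_label_line("Fire hydrant 12.8 40.0 230.4 512.0"))
```

Converting through float first matters because the coordinates are stored as decimal numbers, and int() alone would fail on a string like "12.8".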

If you want to convert the text annotations to XML format (for example, to train with the TF Object Detection API), below is a little script that does it for you.

How to use:

The script changes into the dataset folder before it starts. If you are testing this script and starting it from the original OIDv4 ToolKit path, use this line:

os.chdir(os.path.join("OID", "Dataset"))

I recommend that once your images are downloaded, you copy them to the folder where you plan to train your object detection model. For example, copy the images to the Dataset/images/ folder and then use os.chdir('Dataset') instead. The script creates an .xml file with the same name as each image, in the format that we'll use later.

import os

import cv2
from lxml import etree
from tqdm import tqdm

# Started from the OIDv4 ToolKit path; use os.chdir('Dataset') if you copied the images elsewhere
os.chdir(os.path.join("OID", "Dataset"))
DIRS = os.listdir(os.getcwd())

for DIR in DIRS:
    if not os.path.isdir(DIR):
        continue
    print("Currently in Subdirectory:", DIR)

    # Replace spaces in class folder names, e.g. "Fire hydrant" -> "Fire_hydrant"
    for CLASS_DIR in os.listdir(DIR):
        if " " in CLASS_DIR:
            os.rename(os.path.join(DIR, CLASS_DIR),
                      os.path.join(DIR, CLASS_DIR.replace(" ", "_")))

    for CLASS_DIR in os.listdir(DIR):
        class_path = os.path.abspath(os.path.join(DIR, CLASS_DIR))
        # Read labels from OIDv4 ToolKit: they are stored in the "Label" subfolder
        label_path = os.path.join(class_path, "Label")
        if not os.path.isdir(label_path):
            continue
        print("\n" + "Creating PASCAL VOC XML Files for Class:", CLASS_DIR)

        # Create PASCAL XML
        for filename in tqdm(os.listdir(label_path)):
            if not filename.endswith(".txt"):
                continue
            filename_str = os.path.splitext(filename)[0]
            image_path = os.path.join(class_path, filename_str + ".jpg")

            # Skip labels whose image is missing or cannot be read
            img = cv2.imread(image_path)
            if img is None:
                continue

            annotation = etree.Element("annotation")

            folder = etree.SubElement(annotation, "folder")
            folder.text = os.path.basename(class_path)

            filename_xml = etree.SubElement(annotation, "filename")
            filename_xml.text = filename_str + ".jpg"

            path = etree.SubElement(annotation, "path")
            path.text = image_path

            source = etree.SubElement(annotation, "source")
            database = etree.SubElement(source, "database")
            database.text = "Unknown"

            # Image size is read from the image itself: shape is (height, width, depth)
            size = etree.SubElement(annotation, "size")
            width = etree.SubElement(size, "width")
            width.text = str(img.shape[1])
            height = etree.SubElement(size, "height")
            height.text = str(img.shape[0])
            depth = etree.SubElement(size, "depth")
            depth.text = str(img.shape[2])

            segmented = etree.SubElement(annotation, "segmented")
            segmented.text = "0"

            # Labels from OIDv4 Toolkit: name_of_class X_min Y_min X_max Y_max
            with open(os.path.join(label_path, filename), "r") as label_original:
                for line in label_original:
                    l = line.strip().split(" ")
                    if len(l) < 5:
                        continue
                    # The last four fields are coordinates; a class name with spaces is joined with "_"
                    class_name = "_".join(l[:-4])
                    coords = [str(int(float(v))) for v in l[-4:]]

                    obj = etree.SubElement(annotation, "object")

                    name = etree.SubElement(obj, "name")
                    name.text = class_name

                    pose = etree.SubElement(obj, "pose")
                    pose.text = "Unspecified"

                    truncated = etree.SubElement(obj, "truncated")
                    truncated.text = "0"

                    difficult = etree.SubElement(obj, "difficult")
                    difficult.text = "0"

                    bndbox = etree.SubElement(obj, "bndbox")
                    for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), coords):
                        etree.SubElement(bndbox, tag).text = value

            # write xml to file, next to the image
            s = etree.tostring(annotation, pretty_print=True)
            with open(os.path.join(class_path, filename_str + ".xml"), "wb") as f:
                f.write(s)
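
After the conversion, it can be worth checking that no image was left without an .xml file (for example, because its label file was missing). The following is only a sketch of such a check: images_without_xml is a hypothetical helper, and the demo uses a temporary folder instead of a real class folder.

```python
import os
import tempfile


def images_without_xml(class_dir):
    """Return the sorted .jpg file names in class_dir that have no matching .xml file."""
    names = os.listdir(class_dir)
    xml_stems = {os.path.splitext(n)[0] for n in names if n.endswith(".xml")}
    return sorted(
        n for n in names
        if n.endswith(".jpg") and os.path.splitext(n)[0] not in xml_stems
    )


# Demo: a.jpg has an annotation, b.jpg does not
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "a.xml", "b.jpg"):
        open(os.path.join(tmp, name), "w").close()
    missing = images_without_xml(tmp)

print(missing)
```

For the real data, the function would be called once per class folder, for example on os.path.join("OID", "Dataset", "train", "Bus").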



5. Converting XML to YOLO v3 file structure:

First, to train a YOLO model, the annotation file must meet these requirements:

  • One row for one image;
  • Row format: image_file_path box1 box2 ... boxN;
  • Box format: x_min,y_min,x_max,y_max,class_id (no space).
  • Here is an example:
        path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
        path/to/img2.jpg 120,300,250,600,2
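
The row format above can be sketched in a few lines of Python. format_row is a hypothetical helper used only for illustration; it takes the image path and a list of (x_min, y_min, x_max, y_max, class_id) tuples and reproduces the first example row.

```python
def format_row(image_path, boxes):
    """Build one annotation row: image path followed by space-separated boxes."""
    # Each box becomes "x_min,y_min,x_max,y_max,class_id" with no spaces inside
    box_strings = [",".join(str(v) for v in box) for box in boxes]
    return " ".join([image_path] + box_strings)


row = format_row("path/to/img1.jpg", [(50, 100, 150, 200, 0), (30, 50, 200, 120, 3)])
print(row)  # path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
```

Because spaces separate the boxes, there must be no space inside a box.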

By now, we should have our .xml files prepared in step 4 above, or we have manually labeled our images according to my tutorial.

How to use:

First of all, to train a YOLO v3 object detection model, we need an annotations file and a classes file. Both will be created by the script below; you only need to change the first two of these three lines:

  • 1. dataset_train - this is the location of your downloaded images with XML files;
  • 2. dataset_file - this is the output file that will be created with the prepared annotations for YOLO training;
  • 3. classes_file - you don't need to change this; this file will be created with all the classes that were used in the XML files. It gets the same name as dataset_file, but with '_classes' added before the extension.

Therefore, for my example, the file structure will look like this:

└─── OID
    └─── Dataset
        └─── train
            └─── Bus
            └─── Car
            └─── Fire_hydrant
            └─── Traffic_light

Code to convert .xml files to YOLO v3 annotations:

import os
import xml.etree.ElementTree as ET

dataset_train = os.path.join('OID', 'Dataset', 'train')
dataset_file = '4_CLASS_test.txt'
classes_file = dataset_file[:-4]+'_classes.txt'

# Class names are taken from the class folder names
CLS = os.listdir(dataset_train)
classes = [os.path.join(dataset_train, CLASS) for CLASS in CLS]

def write_row(fullname, bb):
    # Append one row: image path followed by its boxes
    with open(dataset_file, 'a') as list_file:
        list_file.write(str(fullname)[:-4]+'.jpg'+bb+'\n')

def test(fullname):
    bb = ""
    tree = ET.parse(fullname)
    root = tree.getroot()
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        # Skip unknown classes and objects marked as difficult
        if cls not in CLS or int(difficult)==1:
            continue
        cls_id = CLS.index(cls)
        xmlbox = obj.find('bndbox')
        b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text))
        bb += (" " + ",".join([str(a) for a in b]) + ',' + str(cls_id))

        # we need this because I don't know overlapping or something like that:
        # the row is written out right after each Traffic_light box
        if cls == 'Traffic_light':
            write_row(fullname, bb)
            bb = ""

    if bb != "":
        write_row(fullname, bb)

for CLASS in classes:
    for filename in os.listdir(CLASS):
        if not filename.endswith('.xml'):
            continue
        fullname = os.path.join(os.getcwd(), CLASS, filename)
        test(fullname)

# One class name per line; the zero-based line index is the class_id
with open(classes_file, 'w') as list_file:
    for CLASS in CLS:
        list_file.write(str(CLASS)+"\n")
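
Once both files exist, a short sanity check is to count how many boxes each class has in the annotation file. This is a rough sketch rather than part of the training code: count_boxes is a hypothetical helper, it assumes image paths contain no spaces, and the demo rows and class list are the illustrative ones from this tutorial.

```python
from collections import Counter


def count_boxes(annotation_lines, class_names):
    """Count boxes per class in rows of the form 'image_path box1 box2 ... boxN'."""
    counts = Counter()
    for line in annotation_lines:
        # The first token is the image path, the rest are boxes
        for token in line.split()[1:]:
            fields = token.split(",")
            if len(fields) == 5:
                counts[class_names[int(fields[4])]] += 1
    return counts


demo_rows = [
    "path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3",
    "path/to/img2.jpg 120,300,250,600,2",
]
demo_classes = ["Bus", "Car", "Fire_hydrant", "Traffic_light"]
box_counts = count_boxes(demo_rows, demo_classes)
print(dict(box_counts))
```

With the real files, the rows would come from open('4_CLASS_test.txt') and the class names from the lines of 4_CLASS_test_classes.txt.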

6. Summary:

  1. Download the toolkit for downloading images;
  2. Use the toolkit to download images from Google’s OpenImagesV5 dataset;
  3. Convert label files to XML using the first script;
  4. Convert XML to the YOLO v3 file structure with the second script;
  5. With my example, we'll get 4_CLASS_test.txt and 4_CLASS_test_classes.txt, which we'll use to train the YOLO v3 model.


Now we have prepared our dataset and annotation file for further YOLO v3 object detection training.