Demonstrating YOLO v3 object detection with WebCam

Posted November 19, 2019 by Rokas Balsys



YOLOv3 web cam detection

Welcome to another YOLO v3 object detection tutorial. Many of you asked me how to make YOLO v3 work with a webcam. I thought it was obvious, but after receiving around the tenth email asking "how to make it work with webcam", I decided to invest 20 minutes of my time and record a short tutorial about it.

So here are step-by-step instructions. In the video tutorial you can see that I started from the code of the 4th part of my YOLO object detection tutorial series. If you want to try it yourself, you can find the code on my GitHub, or just clone the whole repository with:

git clone https://github.com/pythonlessons/YOLOv3-object-detection-tutorial.git

We will be working in the "YOLOv3-custom-training" directory. In the video tutorial I work on the "image_detect.py" file, but you can now find the finished "webcam_detect.py" in the same directory.

Now download the YOLOv3 weights from the YOLO website, or use the wget command:

wget https://pjreddie.com/media/files/yolov3.weights

Copy the downloaded weights file to the model_data folder.
Convert the Darknet YOLO model to a Keras model:

python convert.py model_data/yolov3.cfg model_data/yolov3.weights model_data/yolo_weights.h5

To measure how fast we can capture frames from our webcam, we need to import time at the top of the script. By this step you should already have all the needed files in the 'model_data' directory, so change the default parameters to the following:

"model_path": 'model_data/yolo_weights.h5',
"anchors_path": 'model_data/yolo_anchors.txt',
"classes_path": 'model_data/coco_classes.txt',
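Before starting the webcam, it can save time to confirm that these three files really exist. The following is only a small sketch and is not part of the repository code: the helper name missing_model_files and the defaults dictionary are made up for illustration, while the paths are the ones listed above.

```python
import os


def missing_model_files(paths):
    """Return the parameter names whose files do not exist on disk."""
    return [key for key, path in paths.items() if not os.path.isfile(path)]


# The same default parameters as above (the dictionary itself is hypothetical).
defaults = {
    "model_path": "model_data/yolo_weights.h5",
    "anchors_path": "model_data/yolo_anchors.txt",
    "classes_path": "model_data/coco_classes.txt",
}
```

Calling missing_model_files(defaults) from the "YOLOv3-custom-training" directory should return an empty list once the weights have been converted; any names it returns point at files that still need to be downloaded or generated.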

Now we need to comment out the "image = cv2.imread(image, cv2.IMREAD_COLOR)" line in the "def detect_img(self, image):" function. That line reads an image stored on disk, whereas here we pass frames captured by the camera directly.

Now let's move to the final part, "if __name__=="__main__":".

As in previous tutorials, I use the FPS-measuring parts. I will not explain them, because it's a standard procedure.
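The FPS code itself is not shown in this post, so here is a minimal sketch of one common way to do it with the time module. It is not the code from the repository: the FPSMeter class, its window parameter and the injectable clock argument are all assumptions made for this example.

```python
import time
from collections import deque


class FPSMeter:
    """Rolling average of frames per second over the last `window` frames."""

    def __init__(self, window=30, clock=time.perf_counter):
        self.clock = clock                    # function returning seconds
        self.stamps = deque(maxlen=window)    # timestamps of recent frames

    def tick(self):
        """Record one processed frame and return the current FPS estimate."""
        self.stamps.append(self.clock())
        if len(self.stamps) < 2:
            return 0.0                        # not enough data yet
        elapsed = self.stamps[-1] - self.stamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals
        return (len(self.stamps) - 1) / elapsed
```

In the webcam loop you would create one meter before "while True:" and call tick() once per iteration, after the detection step, then print or draw the returned value.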

OpenCV provides a video capture object which handles everything related to opening and closing the webcam. All we need to do is create that object and keep reading frames from it. The following code opens the webcam, captures frames, scales them by a factor of 1 (that is, leaves their size unchanged), runs the YOLO model to detect objects in each frame, and then displays the result in a window. You can press the "q" key to exit:

yolo = YOLO()

# we create the video capture object cap
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise IOError("We cannot open webcam")

while True:
    ret, frame = cap.read()
    if not ret:
        # stop if the webcam did not return a frame
        break
    # resize our captured frame if we need
    frame = cv2.resize(frame, None, fx=1.0, fy=1.0, interpolation=cv2.INTER_AREA)

    # detect object on our frame
    r_image, ObjectsList = yolo.detect_img(frame)

    # show us frame with detection
    cv2.imshow("Web cam input", r_image)
    if cv2.waitKey(25) & 0xFF == ord("q"):
        cv2.destroyAllWindows()
        break

cap.release()
cv2.destroyAllWindows()
yolo.close_session()

Short explanation:

As we can see in the preceding code, we use OpenCV's VideoCapture class to create the video capture object cap. Once it's created, we start an infinite loop and keep reading frames from the webcam until the "q" key is pressed. The first line within the while loop is:

ret, frame = cap.read()

Here, "ret" is a Boolean value returned by the read function, and it indicates whether or not the frame was captured successfully. If the frame is captured correctly, it's stored in the variable frame. This loop will keep running until we press the "q" key, so we check for a key press in the following lines:

if cv2.waitKey(25) & 0xFF == ord("q"):
    cv2.destroyAllWindows()
    break
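To see what the "& 0xFF" part does without opening a window, here is a small standalone sketch. The function name is_quit_key is made up for this example. It relies on two facts: cv2.waitKey returns -1 when no key was pressed during the wait, and in Python "&" binds tighter than "==", so the original condition is evaluated as (key & 0xFF) == ord("q").

```python
def is_quit_key(key_code, quit_char="q"):
    """Return True if a value returned by cv2.waitKey matches quit_char."""
    if key_code < 0:
        # waitKey returns -1 when no key was pressed within the delay
        return False
    # keep only the lowest 8 bits; some platforms set extra high bits
    return (key_code & 0xFF) == ord(quit_char)
```

For example, is_quit_key(ord("q")) is True, is_quit_key(-1) is False, and a key code with extra high bits set, such as 0x100000 | ord("q"), still counts as "q" after masking.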

Every time we capture a new frame, we apply YOLO detection to that frame:

r_image, ObjectsList = yolo.detect_img(frame)

Conclusion:

So as you can see, there's no magic in using your webcam with YOLO object detection. Editing the code so that it works with a webcam took me around 10 minutes.

Also, to make it more interesting, we compared FPS on CPU and GPU. On the CPU I was getting around 3 frames per second, and with the GPU it was 11 frames per second. So object detection on the GPU was almost four times faster in this test. Our frame rate was limited by OpenCV's "cap.read()" function. I believe there are faster ways to capture frames, but that wasn't the goal of this tutorial.
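One commonly used way to take frame reading out of the main loop is to read from the camera on a background thread and always hand the newest frame to the detector. The sketch below is an assumption about how that could look, not something measured in this tutorial: the ThreadedFrameReader class is hypothetical, and it accepts any object with a read() method that returns (ok, frame), such as cv2.VideoCapture(0).

```python
import threading


class ThreadedFrameReader:
    """Reads frames on a background thread and keeps only the newest one."""

    def __init__(self, source):
        # source: any object with read() -> (ok, frame), e.g. cv2.VideoCapture(0)
        self.source = source
        self.lock = threading.Lock()
        self.latest = None
        self.running = False
        self.thread = None

    def start(self):
        """Start the background reading thread and return self."""
        self.running = True
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()
        return self

    def _loop(self):
        while self.running:
            ok, frame = self.source.read()
            if not ok:
                # the source has no more frames or the camera failed
                self.running = False
                break
            with self.lock:
                self.latest = frame

    def read(self):
        """Return the newest frame, or None if nothing was captured yet."""
        with self.lock:
            return self.latest

    def stop(self):
        """Ask the thread to finish and wait briefly for it."""
        self.running = False
        if self.thread is not None:
            self.thread.join(timeout=1.0)
```

In the webcam loop you would create the reader with ThreadedFrameReader(cap).start(), replace "ret, frame = cap.read()" with "frame = reader.read()", skip the iteration while the frame is still None, and call reader.stop() before cap.release(). Whether this actually raises the FPS depends on how much of each iteration is spent waiting for the camera rather than running detection.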

The full code of "webcam_detect.py" is on my GitHub page.