People on the Internet are more or less familiar with CAPTCHAs, those annoying images containing text that you have to type in before accessing a website. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. In other words, it is a test used in computing to determine whether or not the user is human, and its primary purpose is to stop bots from automating actions on the Internet.
A CAPTCHA is usually just text that has been altered with noise, different colors, rotated symbols, or other distortions to make it harder for a computer to recognize. Sometimes it is hard even for a human to tell what is written in the image, so building bots that can break these images is not easy.
If you are reading this tutorial, you probably know that, with the rise of deep learning and computer vision, it is already possible to solve a CAPTCHA no matter how hard it is. But you probably don't know how to do that, so keep reading to find out.
No matter how much CAPTCHAs evolve, there will always be people who come up with methods to break them. One of the best-known methods is a machine learning approach, and our main focus will be a specific type of neural network called a Convolutional Neural Network (CNN).
A CNN works similarly to the way our brain recognizes things and tells one object from another. To build some intuition: when you look at the picture below, you can immediately tell that these two animals are not the same species. But if someone asked you how, you would probably just say that it is obvious…
We are going to apply the same idea to our CAPTCHA-detection neural network. Well, not exactly the same, because a computer does not perceive a picture the way we do. It sees arrays of numbers, each indicating the color intensity of a particular pixel. An RGB image, for example, can be stored as a three-dimensional array of shape height × width × 3, with one value per pixel for each of the red, green, and blue channels (an RGBA image adds a fourth, alpha channel for transparency). CNN layers are special because they are organized in three dimensions: width, height, and depth. This is what allows us to feed a picture directly into the network. The final, fully connected layer outputs the network's prediction.
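To make the "width, height, and depth" idea more concrete, here is a minimal NumPy sketch of what a single convolutional layer computes. It is an illustration written for this explanation, not the model used later in the series: the function name `conv_layer`, the stride of 1, the absence of padding, and the ReLU activation are all assumptions. Like most deep learning frameworks, it actually computes a cross-correlation, which is what CNN libraries call "convolution". Each filter spans the full depth of the input, and stacking the responses of several filters produces a new width × height × depth volume.

```python
import numpy as np

def conv_layer(image, filters, biases):
    """Hypothetical stride-1, no-padding convolution layer followed by ReLU.

    image:   (height, width, depth) array, e.g. depth 3 for an RGB image
    filters: (num_filters, kh, kw, depth) array; each filter spans the full depth
    biases:  (num_filters,) array
    returns: (height - kh + 1, width - kw + 1, num_filters) array
    """
    h, w, c = image.shape
    k, kh, kw, kc = filters.shape
    if c != kc:
        raise ValueError("each filter must span the full input depth")
    out = np.zeros((h - kh + 1, w - kw + 1, k))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]
            # Multiply the patch element-wise with every filter and sum,
            # giving one response per filter at this (i, j) location
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2])) + biases
    return np.maximum(out, 0.0)  # ReLU: keep only positive responses

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))                # a tiny 5 x 5 RGB "image" with values in [0, 1)
filters = rng.standard_normal((4, 3, 3, 3))  # 4 random filters of size 3 x 3 x 3
biases = np.zeros(4)

feature_maps = conv_layer(image, filters, biases)
print(image.shape, "->", feature_maps.shape)  # (5, 5, 3) -> (3, 3, 4)
```

Note how the depth changes from 3 (color channels) to 4 (one feature map per filter). In a trained network, the filter values are learned rather than random, and a real CAPTCHA model would stack several such layers before the fully connected one.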
To make everything more straightforward, here is an example of a photo as we see it and the same image as the computer sees it. In this photo, we see a puppy:
To see a picture the way a computer does, we can run this simple script on the image:
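```python
from PIL import Image

# Open the photo and convert it so every pixel has exactly three channels (R, G, B)
im = Image.open("puppy.jpg").convert("RGB")
# getdata() returns the pixels row by row; list() turns them into (R, G, B) tuples
pix_val = list(im.getdata())
print(pix_val)
```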
As a result, we get thousands of numbers: an (R, G, B) value for every pixel in the photo:
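A flat list of tuples is not yet in the height × width × depth shape a CNN expects. The sketch below shows one common way to get there, assuming the list is row-major (as Pillow's `getdata()` returns it) and that we want values scaled to [0, 1]. The helper name `pixels_to_array` and the made-up 3 × 2 image are hypothetical, used only for illustration.

```python
import numpy as np

def pixels_to_array(pix_val, width, height):
    """Hypothetical helper: turn a flat, row-major list of (R, G, B) tuples
    into a height x width x 3 float array with values scaled to [0, 1]."""
    arr = np.asarray(pix_val, dtype=np.float32)  # shape (width * height, 3)
    if arr.shape != (width * height, 3):
        raise ValueError("expected width * height RGB tuples")
    return arr.reshape(height, width, 3) / 255.0

# A made-up image, 3 pixels wide and 2 pixels tall:
# top row red, green, blue; bottom row white, black, grey
pix_val = [(255, 0, 0), (0, 255, 0), (0, 0, 255),
           (255, 255, 255), (0, 0, 0), (128, 128, 128)]
x = pixels_to_array(pix_val, width=3, height=2)
print(x.shape)  # (2, 3, 3)
print(x[0, 1])  # [0. 1. 0.]  (the green pixel)
```

Scaling to [0, 1] is a conventional choice that tends to make training more stable; it is not required by the CNN itself.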
Creating a structured model to break CAPTCHA:
Let's look at the CAPTCHA again. Let's assume that it is made up of characters drawn from the 26 letters of the English alphabet and the digits 0–9. By the end, our method will be able to solve CAPTCHAs with different numbers of symbols.
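With 26 letters and 10 digits, each position in a CAPTCHA is one of 36 classes. The tutorial does not yet say how labels will be encoded, so the following is only one possible sketch: each character position becomes a one-hot row, and a hypothetical extra "blank" class pads shorter CAPTCHAs so that labels of different lengths share the same shape. The lowercase-only character set, `max_len`, and the helper names are all assumptions.

```python
import string

# Assumed character set: 26 English letters (case-insensitive) + digits 0-9
CHARSET = string.ascii_lowercase + string.digits  # 36 symbols
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}
BLANK = len(CHARSET)  # hypothetical padding class, index 36

def encode_label(text, max_len):
    """One-hot encode a CAPTCHA string as max_len rows of len(CHARSET) + 1 columns.
    Positions past the end of the text are filled with the BLANK class.
    Raises KeyError for characters outside CHARSET."""
    if len(text) > max_len:
        raise ValueError("text is longer than max_len")
    rows = []
    for pos in range(max_len):
        idx = CHAR_TO_INDEX[text[pos].lower()] if pos < len(text) else BLANK
        row = [0] * (len(CHARSET) + 1)
        row[idx] = 1
        rows.append(row)
    return rows

def decode_label(rows):
    """Inverse of encode_label: take the highest-scoring class per row, skip blanks.
    Also works on a model's per-position probability rows."""
    chars = []
    for row in rows:
        idx = max(range(len(row)), key=row.__getitem__)
        if idx != BLANK:
            chars.append(CHARSET[idx])
    return "".join(chars)

label = encode_label("a7x", max_len=5)
print(len(label), len(label[0]))  # 5 37
print(decode_label(label))        # a7x
```

Because `decode_label` only takes a per-row maximum, the same function could turn a network's per-position probabilities back into text, which is one reason this fixed-length, padded layout is a popular starting point.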
Any machine learning system needs training data, so the first step is to collect labeled CAPTCHA images. To break a CAPTCHA system, we want a trained model that works like this:
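The dataset itself is covered in the next part, so this is only a sketch of one common convention for collecting training pairs: each CAPTCHA image is saved under a file name equal to the text it shows (for example `3bx7q.png`). That naming scheme, the accepted file extensions, and the helper name `collect_samples` are assumptions, not the author's actual dataset layout.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumed image formats

def collect_samples(folder):
    """Hypothetical loader: return sorted (path, label) pairs, assuming each
    image file is named after the CAPTCHA text it shows."""
    samples = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_EXTENSIONS:
            samples.append((path, path.stem))  # 'a7x9k.png' -> 'a7x9k'
    return samples

# Demo with empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ["3bx7q.png", "k9m2a.jpg", "notes.txt"]:
        (Path(tmp) / name).touch()
    labels = [label for _, label in collect_samples(tmp)]

print(labels)  # ['3bx7q', 'k9m2a']
```

Keeping the label in the file name means no separate annotation file is needed; the non-image file is simply skipped.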
Conclusion:
By the way, when I searched for out-of-the-box CAPTCHA-solving models, I couldn't find any, so I decided to build one myself. When I finish this tutorial series, you will be able to download the entire code. If you have TensorFlow and its dependencies installed, you will be able to give the model a CAPTCHA and get the result back.
That's it for this part. I hope you now have a good understanding of our approach. Next, we will work with our CAPTCHA image dataset and train a CNN using TensorFlow. I will go through it step by step, so you can either train your own CAPTCHA-breaking model or use mine.