Do you still remember one of our recent articles, where we talked about image processing and computer vision? If not, we encourage you to read it first. As it turns out, these disciplines can be beneficial not only to the motor industry or medicine, but to office work, car park owners, and even police as well. Text extraction from an image is a technique that uses machine learning to extract the text directly from the picture with no human assistance. How will it change the way we work? How can text extraction from images using machine learning be beneficial to contemporary companies?
Generally speaking, text extraction is about teaching artificial intelligence algorithms how to read. The first step of this task is to teach the algorithm to see the text (text recognition), and the next is to process it and transform it into a different form, for instance, a text file. We will look closer at both stages of the text extraction process.
Text recognition with machine learning
As you know, you need to teach the computer to recognize what we know is text. The task is a bit simpler when we talk about high-quality, legible pictures, where the text is clearly visible, and so are all the letters and digits. But what about pictures or scans of more mediocre quality? This is where the challenge begins. Let’s see how exactly machine learning text recognition works.
We begin with the most common text recognition technique: OCR, or Optical Character Recognition. OCR yields outstanding results only in very specific use cases, and in general, it is still considered a challenging problem. Optical Character Recognition is a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.
Let’s say we have a piece of paper: a high school diploma. You can use your scanning device to put it into a computer, but the resulting image is not editable with, for instance, Microsoft Office tools. You need much more advanced graphics software to edit it. That takes time and requires specific skills. If you want to extract and repurpose data from this scanned document, you need OCR software that singles out letters, puts them into words, and then puts the words into sentences. This allows you to access and edit the document’s contents at once.
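To make the "letters into words" step more tangible, here is a minimal sketch in Python. It assumes the recognizer has already returned each character together with its horizontal position on one line of text. The `CharBox` type, the `group_into_words` function, and the `max_gap` threshold are hypothetical names chosen for this illustration, not part of any particular OCR product.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    char: str   # character the recognizer decided on
    x: int      # left edge of the character, in pixels
    width: int  # width of the character box, in pixels


def group_into_words(boxes, max_gap=5):
    """Join recognized characters into words based on horizontal spacing."""
    words = []
    current = ""
    prev_right = None
    for box in sorted(boxes, key=lambda b: b.x):
        # A gap wider than max_gap is treated as a space between two words.
        if prev_right is not None and box.x - prev_right > max_gap:
            words.append(current)
            current = ""
        current += box.char
        prev_right = box.x + box.width
    if current:
        words.append(current)
    return words


# One line of the scanned diploma, as hypothetical recognizer output.
line = [
    CharBox("H", 0, 8), CharBox("I", 10, 8), CharBox("G", 20, 8), CharBox("H", 30, 8),
    CharBox("S", 50, 8), CharBox("C", 60, 8), CharBox("H", 70, 8), CharBox("O", 80, 8),
    CharBox("O", 90, 8), CharBox("L", 100, 8),
]
sentence = " ".join(group_into_words(line))
print(sentence)  # HIGH SCHOOL
```

Real OCR engines use far richer cues than a fixed pixel gap, but the idea of assembling characters into larger units is the same.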
The most advanced OCR systems focus on replicating natural human recognition. They are based on three main principles: integrity, purposefulness, and adaptability. First, the observed object always has to be considered as one entity comprising many interrelated parts. In our case, the diploma is such an entity. Second, any interpretation of data must always serve some purpose. And finally, the OCR program has to be capable of self-learning.
The usage of the OCR software
OCR software is by no means a single, uniform application that always serves the same purpose. OCR applications are used for many different tasks. We can start with “reading” a printed page from a book or a random image with text (for instance, graffiti or an advertisement), and then go on to reading street signs, car license plates, and even captchas. OCR software takes into consideration the following factors and attributes:
- Text density. On a printed page, the text is dense. However, given an image of a street with a single street sign, text is sparse. The OCR software has to recognize both.
- Text structure. Text on a page is usually structured, mostly in strict rows, while text in the wild may be scattered everywhere, in different rotations, shapes, fonts, and sizes.
- Font. While computer fonts are quite easy to recognize, handwriting is much more inconsistent and, therefore, harder to read.
- Artifacts. There are almost none of them on a perfectly-scanned page, but what about outdoor pictures? This is a completely different story, and you have to keep that in mind when using OCR.
Now, let’s consider two major examples of real-world, outdoor conditions: house numbers and car license plates. House numbers are extremely important, for instance, to Google Street View and Google Maps. Street View imagery is a massive source of different house numbers, and researchers at Stanford University used it to create the SVHN (Street View House Numbers) dataset. SVHN incorporates over 600,000 digit images and is aimed at developing machine learning and object recognition algorithms.
Another widespread application of OCR is car license plate recognition. This also has a lot of possible applications, from police databases (data obtained from speed cameras) to private parking lots that open the barrier after a license plate is verified.
The car parks with text recognition and machine learning technology
One of the companies that manage private car parks is Unipark. This is a company that operates in several European countries, such as Poland, Lithuania, Latvia, Estonia, and Belarus. It uses text recognition and extraction to manage cars driving in and driving out. When a vehicle approaches the barrier, the camera (similar to speed cameras used in Poland) takes a picture of its license plate, sends it to the company’s central database, and the barrier automatically opens.
When the text recognition part is done, the software extracts the car’s number plate and converts it into plain, editable text.
When a given car owner wants to leave the car park, they have to go to the ticket machine (or ATVM) and choose their number plate from the list. Right after the payment, the barrier management software receives a signal that the given car can leave the parking lot. When the car approaches the barrier, its license plate is scanned again, and if the scanned number matches the already-paid numbers list–the barrier opens.
This is an example of how machine learning text recognition can be extremely helpful in day-to-day situations. The car owner doesn’t have to worry about a printed ticket or figure out where to keep it so as not to lose it. Everything happens within the software, and all the driver has to do is pay for their stopover.
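The barrier logic described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the `CarPark` class, its method names, and the sample plate are hypothetical, and the recognized plate text is simply passed in as a string instead of coming from a real camera.

```python
def normalize_plate(raw):
    """Uppercase the recognized text and drop spaces, dashes, and other separators."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


class CarPark:
    def __init__(self):
        self.parked = set()  # plates of cars currently inside
        self.paid = set()    # plates whose stay has been paid for

    def enter(self, recognized_text):
        """Register the plate read at the entry barrier."""
        self.parked.add(normalize_plate(recognized_text))

    def pay(self, plate):
        """Called by the ticket machine once the driver has paid."""
        plate = normalize_plate(plate)
        if plate in self.parked:
            self.paid.add(plate)

    def try_exit(self, recognized_text):
        """Open the exit barrier only if the scanned plate is on the paid list."""
        plate = normalize_plate(recognized_text)
        if plate in self.paid:
            self.paid.discard(plate)
            self.parked.discard(plate)
            return True
        return False


park = CarPark()
park.enter("wa 1234x")                 # plate as read on the way in
blocked = park.try_exit("WA-1234X")    # not paid yet, barrier stays closed
park.pay("WA 1234X")                   # driver picks the plate and pays
allowed = park.try_exit("WA 1234X")    # paid, barrier opens
```

Normalizing the plate before every comparison matters here, because the same plate may be read with slightly different spacing or separators on the way in and on the way out.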
Google Lens text recognition
Here’s another example. As you already know, Google Lens is an app that uses image processing techniques along with machine learning technologies to give you more information about the object you’re pointing at. But what happens if a printed document is the object in question? Google Lens fires up its text recognition algorithm and allows you to translate the text directly from the original language into the target one.
Let’s go back to our high school diploma example. Let’s say it has been issued in a language you don’t understand, but you have to get it translated. You have two possibilities–you can either type out every section of the diploma, for instance, into Google Translate, or you can use the Google Lens app. In the second case, all you have to do is to point your smartphone’s camera onto the diploma, take a picture, and ask Google Lens to translate it. The app will do that but also put translated text into the same place where the original one is! This is exactly how text recognition and extraction work.
Text extraction from images using machine learning
With the text recognition part done, we can switch to text extraction. You see, at the end of the first stage, we still have an uneditable picture with text rather than the text itself. To solve this problem, the next step is to extract the text from the image. Right after text recognition, the localization process is performed, and all the relevant features of the image are gathered. Then, text extraction and enhancement methods are applied with the help of machine learning algorithms. Finally, the extracted text is collected from the image and transferred to the given application or a specific file type. There are many types of text extraction algorithms and techniques that are used for various purposes. We can divide them into five main methods.
REGION BASED METHOD
This method uses a sliding window to detect text in any kind of image. The approach relies on several kinds of features, such as color, edge, shape, contour, and geometry.
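A sliding window is easy to illustrate in Python. In the sketch below, a small window moves across a tiny binary image (1 is a dark pixel, 0 is background) and reports the positions where enough of the window is covered with "ink". The share of dark pixels is only a stand-in that we assume here for the color, edge, shape, contour, and geometry features a real detector would use, and `find_text_windows` and `min_ink` are hypothetical names.

```python
def find_text_windows(image, win_h, win_w, min_ink=0.75, step=1):
    """Return (row, col) of every window whose share of dark pixels is at least min_ink."""
    hits = []
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - win_h + 1, step):
        for c in range(0, cols - win_w + 1, step):
            # Count the dark pixels covered by the window at this position.
            ink = sum(image[r + dr][c + dc] for dr in range(win_h) for dc in range(win_w))
            if ink / (win_h * win_w) >= min_ink:
                hits.append((r, c))
    return hits


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
hits = find_text_windows(image, win_h=2, win_w=2)
print(hits)  # [(1, 2)]
```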
TEXTURE BASED METHOD
This method uses various kinds of texture and their properties to extract text from an image.
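One simple texture property is how much the brightness varies within a small tile: flat background varies very little, while a tile containing strokes of text varies a lot. The Python sketch below computes this for a tiny grayscale image. Using variance as the texture measure is our assumption for illustration, and `texture_map` is a hypothetical helper, not a function from any library.

```python
from statistics import pvariance


def texture_map(gray, block):
    """Brightness variance of each non-overlapping block x block tile."""
    rows, cols = len(gray), len(gray[0])
    result = []
    for r in range(0, rows - block + 1, block):
        result.append([
            pvariance([gray[r + dr][c + dc] for dr in range(block) for dc in range(block)])
            for c in range(0, cols - block + 1, block)
        ])
    return result


# Left tile: flat background. Right tile: alternating dark and bright pixels.
gray = [
    [200, 200, 10, 240],
    [200, 200, 240, 10],
]
variances = texture_map(gray, block=2)

# Tiles with a high variance are candidates for containing text.
text_tiles = [
    (r, c)
    for r, row in enumerate(variances)
    for c, value in enumerate(row)
    if value > 1000
]
```

In this example the left tile has a variance of 0 and the right one a variance of 13225, so only the right tile is flagged.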
HYBRID METHOD
It’s a combination of the previous two techniques. First, the region-based approach is used to detect the text. Then, the texture-based method extracts all the features from the text region.
EDGE BASED METHOD
As its name indicates, this method is based on the detection of the edges of every letter and digit. This method is used to develop a high-level contrast between the text and the background.
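Edge detection comes down to looking for sharp jumps in brightness between neighbouring pixels. The Python sketch below marks such jumps along each row of a small grayscale image (0 is black, 255 is white). Real systems typically use more robust operators. The `horizontal_edges` function and its `threshold` value are assumptions made for this illustration.

```python
def horizontal_edges(gray, threshold=100):
    """Mark pixels where brightness jumps sharply compared with the left neighbour."""
    edges = []
    for row in gray:
        edge_row = [0]  # the first column has no left neighbour
        for left, right in zip(row, row[1:]):
            edge_row.append(1 if abs(right - left) >= threshold else 0)
        edges.append(edge_row)
    return edges


# A dark vertical stroke (value 20) on a bright background (value 250).
gray = [
    [250, 250, 20, 20, 250, 250],
    [250, 250, 20, 20, 250, 250],
]
edges = horizontal_edges(gray)
for row in edges:
    print(row)  # [0, 0, 1, 0, 1, 0]
```

The two marked columns are the left and right borders of the stroke, which is exactly the contrast between text and background that this method looks for.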
MORPHOLOGICAL BASED METHOD
This method is used to extract all the text-related features from the processed image.
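Morphological methods are built on operations such as dilation and erosion. As a hedged illustration of one common use, the Python sketch below dilates a single row of a binary image so that separate letter strokes lying close to each other merge into word-sized blobs. The helper names `dilate_horizontal` and `runs` are hypothetical, and this is just one example of a morphological operation, not a complete method.

```python
def dilate_horizontal(binary, radius=1):
    """Binary dilation with a horizontal structuring element of width 2 * radius + 1."""
    out = []
    for row in binary:
        n = len(row)
        out.append([
            1 if any(row[j] for j in range(max(0, i - radius), min(n, i + radius + 1))) else 0
            for i in range(n)
        ])
    return out


def runs(row):
    """Return (start, end) column spans of consecutive 1s in a row."""
    spans, start = [], None
    for i, value in enumerate(row + [0]):  # trailing 0 closes a run at the row's end
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i - 1))
            start = None
    return spans


# Five thin strokes: three close together, then a wider gap, then two more.
row = [0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]
before = runs(row)
after = runs(dilate_horizontal([row])[0])
print(len(before), after)  # 5 [(0, 6), (9, 13)]
```

Before dilation there are five separate strokes. Afterwards, they form two blobs that roughly correspond to two words.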
The text extraction from images using machine learning software
There are many programs, algorithms, and applications that make text extraction from an image accessible. The list is very long, and it comprises several dozen apps and programs. Most of them are paid, but we have two free and handy tools on our list as well!
- Altair Monarch (according to G2.com, it is the fastest and easiest way to extract data from any source)
- Webhose.io (this app specializes in providing access to structured data from millions of web sources, even from the deep and dark web)
- Import.io (it’s a SaaS product that enables users to convert the mass of data on websites into structured, machine-readable data)
- DocuClipper (it’s a cloud solution to extract fields and tables from scanned documents)
- Photo Scan (it is a free Windows 10 OCR app you can download from the Microsoft Store. It recognizes text in photo files but also directly from a PC’s webcam)
- Microsoft OneNote (as it turns out, this free Windows 10 tool can also extract text from a multi-page printout with one click! It works on both pictures and handwritten text).
Do you think that text extraction from images using machine learning might be beneficial to your company or speed your work up? Don’t hesitate to call us or send us an e-mail. We are very keen to talk with you about implementing text extraction solutions in your business. Or perhaps you’re interested in artificial intelligence in general? There is no better place for you. Let’s chat about your needs. Go straight to the contact section!
 National Seminar on FutureTrends & Innovations in Computer Engineering (NSFTICE’2015), Text Extraction Techniques. Authors: Yash Gupta, Shivani Sharma, Tushina Bedwal, I.M.S. Ghaziabad, India