An Overview of Industrial Text Recognition
November 21, 2023
Introduction
Image processing is widely used in the manufacturing industry for quality assurance. Its applications range from automated optical inspection of printed circuit boards to verifying serial numbers and manufacture dates on products. More creative applications, such as combining image processing with thermography to perform predictive maintenance, have also been published recently.
While some factories already employ such technology in their production lines, information on these implementations is typically treated as a closely guarded secret. Many factories in developing nations still rely on overly manual processes, which leaves them at a competitive disadvantage. This is a shame because the field of image processing is mature and features highly polished code libraries that could be used to easily build such implementations.
The application I'm primarily concerned with in this review is the validation of product batch numbers and manufacture dates in the food manufacturing industry. This information is critical for quality assurance and traceability. Text that is unclear or not human-readable can be quite costly in the event of defects or product recalls.
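To make the idea of validation a little more concrete, here is a minimal sketch of the kind of check that could run on a string once it has been read off a product. The formats are assumptions made up for illustration: a batch code of the letter "B" followed by six digits, and a manufacture date written as DD/MM/YYYY. The function names are hypothetical too, and real products will use whatever layout the factory has chosen.

```python
import re
from datetime import date, datetime

# Hypothetical batch code layout: the letter "B" followed by six digits.
BATCH_PATTERN = re.compile(r"B\d{6}")


def is_valid_batch(text: str) -> bool:
    """Return True if `text` matches the assumed batch code layout exactly."""
    return BATCH_PATTERN.fullmatch(text.strip()) is not None


def parse_manufacture_date(text: str, today: date):
    """Parse an assumed DD/MM/YYYY date.

    Returns None if the text is malformed, is not a real calendar date,
    or lies in the future relative to `today`.
    """
    try:
        parsed = datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        return None
    return parsed if parsed <= today else None
```

A check like this only says whether the recognized text is plausible. It cannot tell a smudged print from a misread one, which is why the recognition step itself matters so much.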
This application demands high performance, as hundreds of manufactured items pass through each inspection step every minute. Ideally, adding an image processing quality check would not require slowing down the production line or introducing significant process changes.
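A quick back-of-the-envelope calculation shows what this means for the software. The line rate of 300 items per minute below is an assumed figure chosen purely for illustration, not a measurement from a real line.

```python
def per_item_budget_ms(items_per_minute: float) -> float:
    """Return the time available to capture and inspect one item, in milliseconds."""
    if items_per_minute <= 0:
        raise ValueError("items_per_minute must be positive")
    return 60_000.0 / items_per_minute


# Assumed line rate for illustration: 300 items per minute.
budget = per_item_budget_ms(300)
print(f"{budget:.0f} ms per item")  # prints "200 ms per item"
```

Image capture, preprocessing, recognition and validation all have to fit inside that window if the line is not to be slowed down.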
My goal is to eventually develop an open source image processing solution that can be affordably implemented in a production line to perform quality assurance tasks. The first step of this process is to review the techniques and compute infrastructure involved in industrial image processing.
High Level Techniques
Instead of diving into potential candidates for software libraries, I'd like to review the high level features and techniques that would be required in the typical text processing pipeline.
Preprocessing
Images captured in a real operating environment are often not ideal in terms of text legibility. This can be due to lighting conditions, camera hardware or the relative speed of the object being captured. To combat these issues, various preprocessing techniques are employed to boost image clarity. Such techniques include denoising, rotation, colour correction and sharpening. While these techniques are valuable and have proven their worth in production applications, they often require manual tuning.
This manual tuning can be time consuming and is often specific to a particular scenario. For example, if the speed of the production line increases, or the lighting changes, the image processing application in question would suffer from reduced performance unless the preprocessing stages could account for these changes.
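As a small illustration of one of these steps, here is a sketch of denoising with a median filter, written in plain Python so that it has no dependencies. The function name and the `radius` parameter are my own choices for this example. In practice an image processing library would be used instead, but the tuning problem is the same: `radius` is exactly the sort of value that has to be adjusted by hand for each camera and lighting setup.

```python
from statistics import median


def median_denoise(image, radius=1):
    """Apply a square median filter of side (2 * radius + 1) to a grayscale image.

    `image` is a list of rows of integer pixel intensities (0-255).
    Border pixels use only the neighbours that fall inside the image.
    A new image is returned; the input is left unchanged.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            # The median of an even-sized window can be fractional; truncate it.
            row.append(int(median(window)))
        result.append(row)
    return result


# A flat dark patch with a single bright "salt" pixel in the middle.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
clean = median_denoise(noisy)  # the bright outlier is replaced by 10
```

A larger radius removes more noise but also starts to erase thin strokes in the printed text, which is the trade-off that makes tuning scenario-specific.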
Thankfully there exist modern text recognition algorithms that are becoming increasingly robust to environmental changes. Ideally, we'd be able to deploy one of these algorithms to reduce the impact on the day to day operations within a factory.
Glyph Detection
When it comes to text recognition, the simplest approach would be to search within an image for objects that look like glyphs, specifically letters and digits. This has historically been done with techniques such as simple image correlation, which effectively involves searching through an image pixel-by-pixel for the set of pixels that is the closest match to your particular glyph. For large images, this becomes too computationally expensive. However, techniques have been developed to mitigate this computational cost, such as those that transform images to the frequency domain.
The biggest issue with this class of algorithms is that they are highly dependent on preprocessing. For example, the number 1 could be mistaken for a lowercase L with enough smudging or image skewing. The accuracy of these algorithms depends on preprocessing to address any potential image irregularities that occur in the field.
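The pixel-by-pixel search described above can be sketched in a few lines of plain Python. This is a simplified stand-in rather than true image correlation: it works on binary images and scores each position by the fraction of pixels that agree with the template. The 3x3 "T" template and the test image are made up for illustration.

```python
def best_match(image, template):
    """Slide `template` over `image` and return (score, row, col) of the best fit.

    Both arguments are lists of rows containing 0 (background) or 1 (ink).
    The score is the fraction of template pixels that agree with the image.
    The cost grows with image area multiplied by template area.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = (-1.0, 0, 0)
    for top in range(img_h - tpl_h + 1):
        for left in range(img_w - tpl_w + 1):
            agree = sum(
                image[top + r][left + c] == template[r][c]
                for r in range(tpl_h)
                for c in range(tpl_w)
            )
            score = agree / (tpl_h * tpl_w)
            if score > best[0]:
                best = (score, top, left)
    return best


# Hypothetical 3x3 template for the letter "T".
T_TEMPLATE = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

# A small image containing that "T" with its top-left corner at row 1, column 2.
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(best_match(IMAGE, T_TEMPLATE))  # prints (1.0, 1, 2)
```

The four nested loops make the cost problem obvious, and the exact pixel comparison shows why this approach is so sensitive to skew and smudging: a glyph that is slightly rotated or printed at a different size no longer lines up with the template.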
Feature Detection
The way to improve upon whole glyph detection would be to decompose glyphs into features and then search for combinations of these features within the image. For example, the uppercase letter T can be decomposed into a vertical and a horizontal line that meet at a certain angle and distance. Searching for features allows this improved class of algorithms to be more robust to variations in image quality and even the style of the font that is being detected.
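Here is a toy sketch of that idea, using only the T example from above plus an uppercase L for contrast. It finds the longest horizontal and vertical strokes in a binary glyph and labels the glyph by how the two strokes meet. The rules and function names are hand-written assumptions for this post and nowhere near a real feature set, but they show why features cope with changes in size where a fixed template would not.

```python
def longest_run(lines):
    """Return (length, line_index, start) of the longest run of 1s across `lines`."""
    best = (0, 0, 0)
    for index, line in enumerate(lines):
        run_start = None
        # A trailing 0 acts as a sentinel that closes any run reaching the edge.
        for pos, value in enumerate(list(line) + [0]):
            if value and run_start is None:
                run_start = pos
            elif not value and run_start is not None:
                length = pos - run_start
                if length > best[0]:
                    best = (length, index, run_start)
                run_start = None
    return best


def classify(glyph):
    """Label a binary glyph as "T", "L" or "?" from its two dominant strokes."""
    h_len, h_row, h_start = longest_run(glyph)              # horizontal stroke
    v_len, v_col, v_start = longest_run(list(zip(*glyph)))  # vertical stroke
    if h_len < 2 or v_len < 2:
        return "?"
    bar_centre = h_start + (h_len - 1) / 2
    if h_row == v_start and abs(v_col - bar_centre) <= 0.5:
        return "T"  # bar sits on top of the stem and the stem meets it mid-way
    if h_row == v_start + v_len - 1 and v_col == h_start:
        return "L"  # bar sits at the foot of the stem and starts at the stem
    return "?"


SMALL_T = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]
LARGE_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
SMALL_L = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

print(classify(SMALL_T), classify(LARGE_T), classify(SMALL_L))  # prints T T L
```

Both the small and the large T are recognized by the same rule, because the rule describes the relationship between strokes and not their exact pixel positions.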
There has been much published work on sets of features that can be utilized to perform text recognition with high accuracy. These feature sets range from being completely hand-tuned to entirely automatically obtained through machine learning algorithms. It is these deep neural network machine learning algorithms that represent the state of the art today. While deep neural networks can be computationally expensive to execute, hardware vendors have developed hardware accelerators capable of achieving real-time execution.
Computing & Infrastructure
Given that industrial image processing should ideally not require significant changes to the factory floor plan or operating process, there are restrictions on the computing infrastructure that can be used. It would be a large undertaking for many factories to install rack-mount server hardware, and while relying on cloud computing resources seems appealing, the latency involved in sending data off-premises would make real-time operation difficult.
Compact, fan-less industrial computers have become prevalent since they offer sufficient compute capacity, are easy to position and are ideal for long term deployments due to their resistance to dust build-up. Many vendors offer such solutions, but one of the more interesting options is the Raspberry Pi. This single board computer has grown in popularity for industrial applications due to its low cost and support for advanced camera peripherals. We will explore the Raspberry Pi and its hardware capabilities in-depth in another blog post.