March 02, 2022

Building Computer Vision Applications


Edwin Lisowski

CSO & Co-Founder

Computer vision is a rapidly growing field that is generating a lot of excitement. According to Grand View Research, the global computer vision market was valued at $11.32 billion in 2020 and is expected to grow at a compound annual growth rate (CAGR) of 7.3% from 2021 to 2028 [1]. AI-powered computer vision has applications across virtually every industry, and it has even reached our homes through smart TVs, home automation, and facial recognition on smartphones.

In today’s article, we are going to show you what computer vision applications are all about and how to approach building them.

Computer vision defined

Computer vision is a subset of artificial intelligence that gives machines the ability to see. Using machine learning, and deep learning in particular, computers can detect, identify, and categorize objects in visual data. They then act on that information or make recommendations based on it.

Computer vision models use three types of data:

  • 2D images and video
  • 3D images and video
  • Sensor data, such as satellite imagery

The evolution of computer vision

Larry Roberts is widely regarded as the father of computer vision. In his Ph.D. thesis at MIT, he explored the potential of deriving 3D geometric information from 2D perspective views of blocks [2]. Other researchers later built on his work and studied computer vision through this "blocks world" perspective.


Early computer vision applications could perform only limited tasks and depended heavily on manual coding and human intervention. Over time, advances in machine learning allowed computing systems to learn from data. This made it possible to build small applications that used statistical learning algorithms to detect objects or recognize patterns.

The first commercial computer vision software was released back in the 1970s. Later, advances in artificial intelligence marked a significant shift: classical machine learning algorithms were largely replaced by deep learning and hybrid models built on neural networks. Since then, computer vision applications have advanced rapidly. From precision farming and contactless food delivery to smartphone apps that identify animals, the technology is disrupting entire industries.

Other computer vision applications include:

  • Security and surveillance
  • Patient diagnosis
  • Autonomous driving
  • Research on disease evolution
  • Predictive maintenance

The Process of Building Computer Vision Applications

The most reliable strategy for implementing computer vision solutions is to build a scalable pipeline for developing and deploying models. The following steps will guide you along the way.

Define the business problem

The key to successful computer vision projects is to have a clear business objective and benefit in mind. Make sure you can describe the objective and benefit in one or two sentences. The objective, for example, could be to minimize the number of stockouts on your shelves.

If you cannot arrive at a clear business problem, consider brainstorming with the relevant stakeholders. For the benefit, estimate the costs that could be avoided or the revenue that could be gained if the problem were solved. A well-quantified business impact makes it much easier to attract funding and to get authorization to move your proof of concept (POC) into production.

Get quality datasets

Successful computer vision applications need quality visual data, and plenty of it. According to the analyst firm Cognilytica, over 80 percent of AI project time is spent on data preparation and data engineering tasks [3].

Computer vision systems learn relationships from the training data and use them to make decisions and recommendations. The higher the quality and quantity of the data sets, the more likely a computer vision project is to succeed.

Here are the qualities of a good data set:

  • Variance: The data set should cover a wide range of objects of interest, for example trucks, motorcycles, SUVs, sedans, and minivans. For roadway scenes, it should also span settings such as rural roads, highways, and city streets.
  • Density: Images should contain many target objects, as real-world scenes do. If most images show only one or two objects, the data set needs denser examples.
  • Quality: Use high-resolution images.
  • Quantity: The more images available, the better. Models generally learn more robust patterns from larger training sets.
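
The four qualities above can be checked mechanically before training. The sketch below is a minimal illustration, assuming each image comes with its resolution and a list of object labels; the `ImageRecord` structure and the 720-pixel threshold are hypothetical choices, not part of any particular tool. It reports class counts for variance, average objects per image for density, low-resolution files for quality, and the image count for quantity.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical per-image metadata: file path, resolution, and object labels."""
    path: str
    width: int
    height: int
    labels: list[str] = field(default_factory=list)


def audit_dataset(records: list[ImageRecord], min_side: int = 720) -> dict:
    """Summarize a data set along four qualities: variance, density, quality, quantity."""
    class_counts = Counter(label for r in records for label in r.labels)
    objects_per_image = [len(r.labels) for r in records]
    return {
        # Variance: which object classes appear, and how often each one does.
        "class_counts": dict(class_counts),
        # Density: average number of labeled objects per image.
        "mean_objects_per_image": (
            sum(objects_per_image) / len(records) if records else 0.0
        ),
        # Quality: images whose shorter side is below the assumed resolution threshold.
        "low_res_images": [
            r.path for r in records if min(r.width, r.height) < min_side
        ],
        # Quantity: total number of images.
        "num_images": len(records),
    }
```

A heavily skewed `class_counts` or a low `mean_objects_per_image` points to where more data collection is needed.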

So how do you acquire data? There are three main methods:

  • Use open data. Open data sets are created by individuals, organizations, and governments and are published online.
  • Create your own data set.
  • Hire a third party to gather the data on your behalf.

Data annotation

In supervised learning, the data sets need to be enriched, or annotated. Annotation means tagging, labeling, or marking the data with the characteristics you want your machine learning system to learn to identify, so that it learns to predict the desired outcome. It is the most time-consuming activity in a computer vision project [4]. Once the model is deployed, you want it to identify those features on its own and take appropriate action.

For example, autonomous cars need more than images of the road. They also require images in which every car, pedestrian, and road sign is labeled. Similarly, in language applications, sentiment analysis projects need labels that teach algorithms to recognize when a person uses irony or slang. Because computer vision projects need large amounts of data for accurate results, you will need trained annotators to label the images.

The following are common annotation techniques applied in computer vision projects:

  • Bounding box: Draws a box around a target object in an image or video frame.
  • Landmarking: Plots key points in the data, such as the nose, eyes, and ears in images used for facial recognition systems.
  • Masking: Hides or highlights specific areas of an image.
  • Wireframe: Labels geometric features such as vertical lines and their junctions, which together form 3D structures.
  • Polygon: Traces the outline of the target object by placing points at the vertices along its edges. It is especially useful for irregularly shaped objects such as landscapes or houses.
  • 3D cuboids: Uses 3D bounding boxes to label an object and measure several points on its outer surface.
  • Object tracking: Labels a target object and follows its movement across video frames.
  • Polyline: Draws lines made of one or more segments. It suits open shapes such as power lines, sidewalks, and road lane markings.
  • Transcription: Identifies text that appears in images or videos. It can be done manually or automated.
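
Several of these annotation types reduce to simple geometry. As a sketch, assuming boxes are stored in the COCO-style `[x, y, width, height]` pixel convention and polygons as lists of `(x, y)` vertices (the helper names here are hypothetical), the code below derives a bounding box from a polygon and computes intersection over union (IoU). Comparing two annotators' boxes for the same object with IoU is one simple way to spot inconsistent labels.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned box in COCO-style [x, y, width, height] pixel coordinates."""
    x: float
    y: float
    width: float
    height: float


def polygon_to_bbox(vertices: list[tuple[float, float]]) -> BBox:
    """Return the tightest axis-aligned box that encloses a polygon annotation."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return BBox(min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for boxes that do not overlap."""
    overlap_w = max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    overlap_h = max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    intersection = overlap_w * overlap_h
    union = a.width * a.height + b.width * b.height - intersection
    return intersection / union if union > 0 else 0.0
```

For example, two annotators who box the same car at `BBox(0, 0, 10, 10)` and `BBox(5, 0, 10, 10)` agree with an IoU of only 1/3, which would be worth reviewing.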

Model evaluation and training

You will need to evaluate different models and architectures to see which ones perform well on your specific data sets. Some models are suited to text tasks, such as text classification and translation. Others are suited to images, for example classification, localization, or detection models.

After model evaluation, the next step is model training. Before training, the prepared data set is split into three subsets:

  • Training set (75%)
  • Validation set (10%)
  • Test set (15%)

The actual distribution may vary based on the quantity of prepared data available. Training data is used to train your computer vision algorithm or model so it can correctly predict the desired outcome. Validation data is applied to evaluate and inform your selection of the algorithm and parameters of the model you’re creating. Both training and validation data are used during model training.
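
A minimal sketch of the 75/10/15 split, assuming the items are image file names or IDs held in memory (the `split_dataset` helper is hypothetical). Shuffling with a fixed seed keeps the split reproducible across experiments, and any rounding remainder goes to the test set so that no item is lost.

```python
import random


def split_dataset(items, train=0.75, val=0.10, test=0.15, seed=42):
    """Shuffle items and split them into training, validation, and test subsets."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # remainder goes to test, so no item is dropped
    )


train_set, val_set, test_set = split_dataset([f"img_{i:04d}.jpg" for i in range(200)])
```

If the data contains near-duplicate images, such as consecutive frames from one video, you may prefer to split by video or scene so that similar frames do not appear in both the training and test sets.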


After training, the model's accuracy is measured on the test data. The model is judged on how accurately it predicts outcomes for data it did not see during training. Building computer vision applications is an iterative process of trial and error. If your model is not producing accurate results, start troubleshooting in the following areas:

  • Confirm that your annotations are accurate and consistent. Inconsistent labeling or tagging makes it difficult for the neural network to learn the features required for object detection or classification.
  • Consider increasing the quality and quantity of your training data.
  • Consider trying different algorithms. Algorithms differ in inference speed and in their ability to identify objects correctly.
  • Try different learning rates. An unsuitable learning rate can prevent the neural network from converging, resulting in poor model performance.
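
To illustrate why the learning rate matters, here is a toy sweep. It is a simplified stand-in, not a real training loop: instead of a neural network, it runs plain gradient descent on the one-parameter loss (w - 3)^2, and the candidate rates are arbitrary. A rate that is too small barely makes progress in the allotted steps, a moderate rate converges, and a rate that is too large overshoots on every step and diverges.

```python
def train_toy_model(learning_rate: float, steps: int = 50) -> float:
    """Minimize the toy loss (w - 3)^2 with plain gradient descent; return the final loss."""
    w = 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)  # derivative of (w - 3)^2 with respect to w
        w -= learning_rate * gradient
    return (w - 3.0) ** 2


def sweep_learning_rates(rates: list[float]) -> dict[float, float]:
    """Train once per candidate learning rate and record the final loss for each."""
    return {lr: train_toy_model(lr) for lr in rates}


if __name__ == "__main__":
    for lr, loss in sweep_learning_rates([0.001, 0.1, 1.1]).items():
        print(f"learning rate {lr}: final loss {loss:.3g}")
```

With a real model you would run the same kind of sweep using your framework's optimizer and compare validation metrics rather than training loss.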


Once you have your best-performing model, often called the champion model, tie it back to the first step by asking yourself two questions:

  • Does the champion model tackle the identified business problem?
  • And does it provide the expected business value?

If it satisfies both questions, the model can be integrated with existing business processes. At this point, you can solicit feedback from end-users. The deployment of the champion model serves as the quality baseline for future training iterations.
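
The two questions above can be written down as an explicit promotion rule. This is a hypothetical sketch: `EvaluationResult`, the accuracy and savings thresholds, and the idea of expressing business value as an estimated annual saving are illustrative assumptions that you would replace with the metrics agreed on when defining the business problem. A candidate is promoted only if it meets both targets and outperforms the currently deployed baseline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationResult:
    """Hypothetical summary of one model's performance on the test set."""
    name: str
    accuracy: float
    estimated_annual_savings: float  # assumed business-value estimate, e.g. from fewer stockouts


def should_promote(
    candidate: EvaluationResult,
    baseline: Optional[EvaluationResult],
    min_accuracy: float,
    min_savings: float,
) -> bool:
    """Promote only if the candidate meets the business targets and beats the deployed baseline."""
    # Does it tackle the business problem well enough, and does it deliver the expected value?
    meets_targets = (
        candidate.accuracy >= min_accuracy
        and candidate.estimated_annual_savings >= min_savings
    )
    # The deployed champion is the quality baseline; with no baseline, the targets alone decide.
    beats_baseline = baseline is None or candidate.accuracy > baseline.accuracy
    return meets_targets and beats_baseline
```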

Periodic retraining

Computer vision projects require periodic updates even after deployment. Why? Because of the nature of training data and the way machine learning models use it to learn patterns and relationships and make predictions.

Training data is static, while real-world conditions keep changing. As a result, a model trained on that data can become less accurate within a short time; in other words, the training data may no longer be a sound basis for correct predictions. This loss of accuracy is known as model decay, and the underlying change in the data is known as data drift.
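
One common way to watch for data drift is to compare the distribution of an input statistic in production against the training data. The sketch below uses the population stability index (PSI) on a scalar feature. Treating something like mean image brightness as that feature, the 10 equal-width bins, and the 0.25 alert threshold are all assumptions you would tune for your own data.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI = sum over bins of (q - p) * ln(q / p); 0.0 means identical binned distributions."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if every value is identical

    def proportions(values):
        counts = [0] * bins
        for v in values:
            index = min(int((v - lo) / width), bins - 1)  # put the maximum value in the last bin
            counts[index] += 1
        # Floor empty bins at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(reference), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def drift_detected(reference, current, threshold=0.25):
    """Flag drift when the PSI exceeds an assumed alert threshold."""
    return population_stability_index(reference, current) > threshold
```

A drift alert like this does not say the model is wrong, only that its inputs no longer look like the training data, which is a signal to evaluate and possibly retrain.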

Because this challenge is inevitable, you need to retrain your model regularly using new or updated data. You have two options:

  • Manual retraining: Repeat your original training process with updated data inputs. You decide how and when to introduce new data to the algorithm.
  • Continual learning: The model learns continuously from new streams of data generated in the environment where it is deployed.
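
The difference between the two approaches can be shown with a deliberately simple model. In this sketch, a toy nearest-centroid classifier over feature vectors stands in for a real computer vision model; the class, the shelf labels, and the feature values are hypothetical. `retrain` rebuilds the model from the full data set, as in manual retraining, while `update` folds in each new example as it arrives, as in continual learning.

```python
import math


class CentroidClassifier:
    """Toy nearest-centroid classifier over fixed-length feature vectors."""

    def __init__(self) -> None:
        self.sums: dict[str, list[float]] = {}
        self.counts: dict[str, int] = {}

    def retrain(self, examples: list[tuple[list[float], str]]) -> None:
        """Manual retraining: discard learned state and rebuild it from the full data set."""
        self.sums, self.counts = {}, {}
        for features, label in examples:
            self.update(features, label)

    def update(self, features: list[float], label: str) -> None:
        """Continual learning: fold one new labeled example into the running class sums."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1

    def predict(self, features: list[float]) -> str:
        """Return the label whose centroid is closest in Euclidean distance."""
        def distance(label: str) -> float:
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(centroid, features)
        return min(self.sums, key=distance)
```

For this toy model the two approaches reach the same state on the same data; with neural networks, continual learning is harder because new updates can overwrite what the model learned earlier.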

Final thoughts on computer vision applications

The quality and quantity of your training data are the keys to successful computer vision projects. Also, how the training data is labeled affects the precision and performance of your model. The trick is to find labelers who have domain experience pertinent to your use case.

Building your computer vision applications is a complex process. The way you annotate your training data evolves the more you train, validate, and experiment with CV models. Depending on the predictive outcomes, you will need to create new datasets for better algorithm results.

If you want to find and implement the best computer vision solution for your business, drop us a line today!


[1] Grandviewresearch.com. Industry Analysis: Computer Vision Market. Accessed February 24, 2022.
[2] Y. Aloimonos (ed.). Special Issue on Purposive and Qualitative Active Vision, CVGIP B: Image Understanding. Vol. 56 (1992).
[3] Data Engineering Preparation Labelling for AI. Accessed February 24, 2022.
[4] How Image Annotation Teaches Machines to See. Accessed February 24, 2022.

