
March 28, 2022

What is Image Annotation? A Guide by Addepto


Edwin Lisowski

CSO & Co-Founder


Computer vision is a branch of artificial intelligence that gives machine learning models the ability to see and deduce meaningful information from visual inputs. It has helped unlock technology that once seemed out of reach. Typical examples include facial recognition, autonomous vehicles, and unmanned drones. But none of these computer vision technologies would have seen the light of day without image annotation. Remember, your computer vision model is only as good as your training data. The training dataset should contain accurately annotated images that the machine learning model can learn from. The higher the quality of your image annotations, the higher the accuracy of your computer vision model.

This guide will take you through everything you need to know about image annotation, from its definition to image annotation types and techniques and its use cases.

Read on for more insight about computer vision solutions. 

What is Image Annotation?

Image annotation, also known as tagging, is the human-driven task of assigning labels to multimedia objects using text or annotation tools and techniques[1]. Annotating an image entails adding metadata to it so that your model can understand what the image contains and make accurate inferences.

It is easier to find annotated images using keyword-based search compared to non-annotated images, especially in large databases[2]. This is because image annotation labels the features you want your computer vision model to detect, recognize, and classify.



Let’s say you’re training your computer vision model to identify cars in multiple contexts. In this case, it is not enough to only show your machine learning model images that have cars in them. Remember, the images may also contain other objects like pets, phones, and roads, among other things. Your model does not have the innate ability to differentiate all these things unless you show it. That’s where tagging comes in.

Essentially, your training dataset should identify which section of the image contains a car. With adequate image annotations, your model starts to create its own rules regarding what a car looks like.

The model will make car predictions and compare them to the image annotations. Where the two disagree, the model is adjusted so that its future predictions are more accurate. Further rounds of learning and testing improve the model’s accuracy, and before long it will be able to identify cars in other non-annotated images as well.
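To make the idea of comparing a prediction with an annotation concrete, here is a minimal sketch of Intersection over Union (IoU), one widely used overlap score for boxes. The article does not name a specific metric, so treating IoU as the comparison is an assumption, and the two boxes below are made-up values in (x_min, y_min, x_max, y_max) form.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so boxes that do not overlap give an empty intersection.
    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: a human-drawn car box and a model's prediction.
annotated_car = (50, 40, 150, 120)
predicted_car = (60, 50, 160, 130)
score = iou(annotated_car, predicted_car)
print(f"IoU between annotation and prediction: {score:.2f}")
```

A score of 1.0 means the prediction matches the annotation exactly, and 0.0 means the two boxes do not overlap at all.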

Types of image annotation

You can train your machine learning model using at least four main types of image annotation. Each type of image annotation is unique regarding how it portrays specific attributes or regions pictured in the image.

Of course, your choice of image annotation will depend on the data you want your computer vision algorithms to see.

Let’s get right into it.

Image Classification

Image classification aims to recognize the presence of comparable objects captured in images across the entire training dataset. It reduces the whole image to a single label. It can teach your model to answer the question: does the image contain a dog or not? However, image classification can’t answer the question: what is the location or size of the dog? An example of image classification is tagging interior photos of a home with labels such as “living room” or “kitchen”.

Object Detection/Recognition

Object detection trains the model to detect different types of objects in natural settings. It identifies whether an object exists, where it is located, and how many objects an image contains. A trained detector can then identify those objects in non-annotated images on its own.

A bounding box is a common technique for labeling various objects within one photo or video. Take the example of an image of a street scene. It may feature pedestrians, sidewalks, bikes, cars, and trucks. You can tag each of these objects separately in the same picture or video to train your model to identify them.
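As a rough illustration of what such a tagged street scene might look like as data, the sketch below stores one label and one box per object. The record layout, file name, and coordinates are hypothetical; the `[x, y, width, height]` box order follows the convention used by COCO-style datasets, and your annotation tool may export something different.

```python
from collections import Counter

# Hypothetical annotation record for a single street-scene image.
street_scene = {
    "image": "street_001.jpg",  # made-up file name
    "objects": [
        {"label": "pedestrian", "bbox": [34, 120, 40, 110]},
        {"label": "pedestrian", "bbox": [210, 118, 38, 105]},
        {"label": "bike", "bbox": [300, 160, 80, 60]},
        {"label": "truck", "bbox": [420, 90, 200, 150]},
    ],
}


def count_by_label(annotation):
    """Count how many annotated objects of each class the image contains."""
    return Counter(obj["label"] for obj in annotation["objects"])


counts = count_by_label(street_scene)
for label, number in counts.items():
    print(label, number)
```

Because every object carries its own label and box, the same record answers all three questions object detection cares about: whether a class is present, where each object is, and how many there are.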



Segmentation

As a more advanced form of image annotation, segmentation analyzes visual inputs to ascertain how objects within a photo are similar or different. It divides the image into segments that can then be used for tasks like image classification and object recognition. This type of image annotation forms the basis of multiple computer vision projects.

Segmentation falls under three types: semantic, instance, and panoptic. We discuss them in detail below:

  • Semantic segmentation classifies images with pixel-wise labeling of objects, such as a car, person, flower, and so on. It treats several objects with the same class label as one entity. While it can reveal the presence and location of each class, it does not separate individual objects, so it cannot tell you the size or shape of any single instance when objects of the same class touch or overlap. This method is suitable when you want to classify similar objects together, especially objects that you don’t have to track or count across different images.



  • Instance segmentation visualizes several objects of the same class as unique individual instances. Essentially, it segments each object instance in an input image. The method can track and count the number, location, size, and shape of image objects. With instance segmentation, you can label every pixel inside the image in a process called pixel-wise labeling. You can also label the borders, making the border coordinates countable.
  • Panoptic segmentation merges the concepts of semantic and instance segmentation. It assigns two labels to each image pixel: a semantic label and an instance ID. Because every pixel belongs to exactly one segment, no segments can overlap. It is the most comprehensive form of segmentation, since merging the other two forms yields a highly granular and meticulous description of the actual image. Panoptic segmentation is employed, for instance, with satellite imagery to identify changes in restricted conservation areas. This enables scientists to track changes in tree health and growth to ascertain how events like forest fires and construction have impacted the area.
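The difference between the three segmentation types is easier to see on a tiny example. The sketch below uses hand-written grids for an imaginary 3×6 image with two cars; the class and instance numbering (0 for background, 1 for "car", cars numbered 1 and 2) is an assumption made purely for illustration.

```python
# Semantic mask: one class ID per pixel (0 = background, 1 = car).
semantic = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Instance mask: one object ID per pixel (0 = no object, 1 and 2 = two cars).
instance = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

# Panoptic labels: each pixel carries both a class ID and an instance ID.
panoptic = [
    [(cls, obj) for cls, obj in zip(sem_row, inst_row)]
    for sem_row, inst_row in zip(semantic, instance)
]


def pixel_counts(grid):
    """Count how many pixels carry each value in a 2-D grid."""
    counts = {}
    for row in grid:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return counts


# Semantic view: all car pixels are lumped together.
car_pixels = pixel_counts(semantic)[1]

# Instance view: each car can be measured and counted separately.
per_car = {obj: n for obj, n in pixel_counts(instance).items() if obj != 0}

print("car pixels (semantic):", car_pixels)
print("pixels per car (instance):", per_car)
print("panoptic label at row 0, column 4:", panoptic[0][4])
```

The semantic mask only knows that eight pixels are "car", while the instance mask can report two cars of four pixels each, and the panoptic labels keep both pieces of information for every pixel.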




Boundary Recognition

Boundary image recognition seeks to train computer vision models to recognize boundaries or lines of objects in images. Boundaries may include:

  • Borders of specific objects
  • Topographical areas depicted in an image
  • Man-made boundaries in an image

Self-driving cars, for example, rely on boundary recognition to identify traffic lanes, sidewalks, and land boundaries[4]. And drones are able to follow a specific course and steer clear of potential hurdles like power lines, thanks to boundary recognition. In the medical field, annotators can tag the borders of cells in medical images to discover abnormalities.
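For the cell-border example, one simple way to obtain a boundary from an already annotated region is to keep only the mask pixels that touch something outside the mask. This is a minimal sketch under that assumption, using a hand-written binary mask; real annotation tools may store boundaries directly as lines or polygons instead.

```python
def boundary_pixels(mask):
    """Return the (row, column) positions of mask pixels that lie on the border.

    A pixel counts as border if at least one of its four direct neighbours
    is outside the mask or outside the image.
    """
    height, width = len(mask), len(mask[0])
    border = set()
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside_image = not (0 <= nr < height and 0 <= nc < width)
                if outside_image or not mask[nr][nc]:
                    border.add((r, c))
                    break
    return border


# Hypothetical mask of a single cell: 1 marks pixels inside the cell.
cell_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

edge = boundary_pixels(cell_mask)
print(sorted(edge))
```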

Image annotation techniques

Once you’ve picked your annotation type, the next step is choosing an image annotation technique. While the type of image annotation is the outcome you want to achieve for your visual data, the image annotation technique is how you produce that label, and it depends on what your data annotation tool supports. Often, the image annotation technique you choose is dictated by your use case.

Bounding Box

Bounding box entails drawing a square or rectangle around the target object. The boxes can either be 2-D or 3-D. This is the most basic image annotation technique owing to its simplicity and versatility. It is commonly used on objects that are somewhat symmetrical, like road signs, vehicles, and pedestrians.
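A 2-D box is usually stored as four numbers, but tools disagree on which four. The sketch below converts between two common layouts, `(x, y, width, height)` and `(x_min, y_min, x_max, y_max)`; the function names and the road-sign coordinates are hypothetical.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, width, height = box
    return (x, y, x + width, y + height)


def xyxy_to_xywh(box):
    """Convert (x_min, y_min, x_max, y_max) to (x, y, width, height)."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)


def box_area(box_xyxy):
    """Area in pixels of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box_xyxy
    return (x_max - x_min) * (y_max - y_min)


# Hypothetical annotation of a road sign: top-left corner plus size.
road_sign = (100, 50, 40, 40)
corners = xywh_to_xyxy(road_sign)
print(corners, box_area(corners))
```

Mixing up the two layouts is an easy mistake to make when moving annotations between tools, so it is worth checking which one your export uses.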


Landmarking

Also called “dot annotation”, this image annotation technique involves plotting small dots across the image. The technique has several use cases. It is applicable in facial recognition to recognize facial features, expressions, and emotions. Landmarking can also label body position and alignment, as well as capture the relationship between different parts of the body.
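Dot annotations are typically stored as named points. As a small sketch, the code below holds five made-up facial landmarks and rescales them by the distance between the eyes, which is one simple way to compare faces of different sizes. The landmark names, coordinates, and the choice of normalization are all assumptions for illustration.

```python
import math

# Hypothetical facial landmarks as (x, y) pixel coordinates.
face_landmarks = {
    "left_eye": (120, 80),
    "right_eye": (180, 80),
    "nose_tip": (150, 120),
    "mouth_left": (130, 150),
    "mouth_right": (170, 150),
}


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def normalized(landmarks, origin="left_eye", reference="right_eye"):
    """Express every landmark relative to `origin`, scaled by the
    distance between `origin` and `reference`."""
    scale = distance(landmarks[origin], landmarks[reference])
    ox, oy = landmarks[origin]
    return {
        name: ((x - ox) / scale, (y - oy) / scale)
        for name, (x, y) in landmarks.items()
    }


relative = normalized(face_landmarks)
print(relative["nose_tip"])
```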


Polygon

The polygon image annotation technique draws a polygon around the target object’s outline. Thus, it defines the object’s boundaries more accurately than a box. It is applicable where objects are irregularly shaped, such as houses, cars, land areas, or animals.
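To see why a polygon describes an irregular object more tightly than a box, the sketch below computes the area of a polygon with the shoelace formula and compares it with the area of the smallest box that encloses it. The house outline is a made-up example.

```python
def polygon_area(points):
    """Area of a simple polygon from its vertices, using the shoelace formula."""
    total = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def enclosing_box(points):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical outline of a house: rectangular walls with a pitched roof.
house = [(10, 40), (10, 20), (25, 5), (40, 20), (40, 40)]

area = polygon_area(house)
x_min, y_min, x_max, y_max = enclosing_box(house)
box_area = (x_max - x_min) * (y_max - y_min)
print(f"polygon covers {area / box_area:.0%} of its bounding box")
```

The remaining share of the box is background that a bounding box annotation would wrongly attribute to the object.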


Masking

Image masking is used to draw attention to specific areas in an image while hiding the other, unwanted areas.
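In its simplest form, a mask is a grid of the same size as the image that says which pixels to keep. The sketch below applies such a mask to a tiny made-up grayscale image; the pixel values and the choice of 0 as the fill value are assumptions for illustration.

```python
def apply_mask(image, mask, fill=0):
    """Keep pixels where the mask is 1 and replace the rest with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# Hypothetical 3x3 grayscale image and a plus-shaped mask.
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
mask = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

masked = apply_mask(image, mask)
for row in masked:
    print(row)
```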


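Tracking

Tracking image annotation assigns labels to the target object and plots its movement across several video frames. Interpolation is a commonly used tool in tracking. It enables the annotator to label the object in a few individual frames and have the tool estimate its position in the frames in between, instead of labeling every frame by hand.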
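Interpolation can be sketched in a few lines. The version below assumes straight-line (linear) motion between two hand-labeled frames and boxes in (x_min, y_min, x_max, y_max) form; the function name and the frame values are hypothetical, and actual annotation tools may use more elaborate schemes.

```python
def interpolate_boxes(start_frame, start_box, end_frame, end_box):
    """Linearly interpolate a box for every frame between two labeled frames.

    Boxes are (x_min, y_min, x_max, y_max). Returns a dict mapping each
    frame number, both ends included, to its estimated box.
    """
    span = end_frame - start_frame
    if span <= 0:
        raise ValueError("end_frame must come after start_frame")

    boxes = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the start, 1.0 at the end
        boxes[frame] = tuple(a + t * (b - a) for a, b in zip(start_box, end_box))
    return boxes


# Hypothetical car labeled by hand in frame 0 and frame 4 only.
track = interpolate_boxes(0, (10, 20, 50, 60), 4, (50, 20, 90, 60))
for frame, box in track.items():
    print(frame, box)
```

The annotator then only needs to correct the frames where the estimate drifts away from the object, for example when the car turns or changes speed.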


Polyline

The polyline image annotation technique entails plotting unbroken lines comprising one or several segments. It works best for highlighting crucial features that have a linear shape. A typical use case is in the context of self-driving cars, as the technique can define sidewalks, road lanes, or power lines.
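A polyline is simply an ordered list of points joined by straight segments. As a small sketch, the code below stores a made-up lane marking that way and measures its length in pixels; the coordinates are hypothetical.

```python
import math


def polyline_length(points):
    """Total length of a polyline given as an ordered list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical lane marking traced with four points (three segments).
lane_marking = [(0, 0), (30, 40), (30, 100), (70, 130)]

segments = len(lane_marking) - 1
length = polyline_length(lane_marking)
print(f"{segments} segments, {length:.0f} pixels long")
```

Unlike a polygon, the line is not closed, so the last point is never joined back to the first.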



Use cases of image annotation

Image annotation has led to the creation of futuristic technologies that have revolutionized our lives today. These include:

  • Facial recognition technology uses annotated images of human faces to identify facial features and differentiate between various faces. Most smartphones have a facial recognition feature.
  • Image annotation is important in the security and surveillance industry. It helps with object detection, such as spotting suspicious-looking bags. It also helps detect suspicious human behavior.
  • In agriculture, it helps with detecting crop diseases. This is achieved by labeling images of both healthy and diseased plants.
  • The medical field also employs image annotation. For example, images of healthy tissue and malignant tumors are annotated using pixel-precise annotation techniques. Hence, doctors can make timely and more precise diagnoses.
  • In wildlife conservation, drones rely on annotated images to identify wildfires and poaching.
  • In the robotics field, robots rely on image annotation to carry out tasks like planting seeds, arranging parcels, and mowing outdoor fields.

Final thoughts

Computer vision in artificial intelligence is popping up in various untapped fields. It has also improved the efficiency of existing industries[5]. To make computer vision models accurately perceive target objects in their natural habitat, they need to be trained using annotated images. Annotated images are created based on various types of image annotation and techniques.

And since you now understand what image annotation is, the various types of image annotation and techniques, as well as their use cases, you should be well equipped to take your business to unprecedented heights.


[1] What is Image Annotation: An intro to 5 Image Annotation Services. URL: Accessed March 21, 2022.
[2] Rodden, K. (1999). How do people organize their photographs? In BCS IRSG 21st Ann. Colloq. on Info. Retrieval Research, 1999. Accessed March 21, 2022.
[3] Cloudfactory.com. Image Annotation Guide. URL: Accessed March 21, 2022.
[4] AI Solutions: Self-driving. URL: Accessed March 21, 2022.
[5] URL: Accessed March 22, 2022.

