How image augmentation improves computer vision performance

This article presents "image blending", an image augmentation method. It explains how images can be blended seamlessly to generate realistic training data for object detection. A practical application is illustrated with the detection of starfish in coral reef images.

Hugo Naya, Data Scientist

What is data augmentation?

Machine learning models occupy an increasingly important place in research and industry, thanks to impressive progress, particularly on tasks previously reserved for human experts. To learn, models go through a training phase for which they generally need labeled data. A general rule is that the more data is available, the better the model performs, since it can capture more phenomena (patterns) during training. This observation is all the more true when the problem and the data are complex.

Example of a labeled image

In the context of object detection tasks, it is not uncommon for the training set to be small or even non-existent. Manually labeling large volumes of images is indeed expensive, especially when specific expertise is required (in medical imaging, for example).

One of the most widely used methods to address this problem is Data Augmentation (DA), in other words the artificial increase of the dataset's size using image manipulation methods. This increase is conventionally achieved by performing operations that modify the appearance of an image without modifying its semantics: for example, changing the brightness, applying a rotation or mirror effect, changing the scale, or adding noise.
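To make these classic operations concrete, here is a minimal sketch in Python using OpenCV and NumPy; the parameter values (rotation angle, brightness gain, noise level) are illustrative choices, not recommendations.

```python
import cv2
import numpy as np

def classic_augmentations(image: np.ndarray) -> list[np.ndarray]:
    """Apply a few appearance-only transforms that preserve semantics."""
    augmented = []
    # Mirror effect: horizontal flip.
    augmented.append(cv2.flip(image, 1))
    # Brightness/contrast change: out = alpha * image + beta, clipped to [0, 255].
    augmented.append(cv2.convertScaleAbs(image, alpha=1.2, beta=15))
    # Rotation by 15 degrees around the image center.
    h, w = image.shape[:2]
    rotation = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)
    augmented.append(cv2.warpAffine(image, rotation, (w, h)))
    # Additive Gaussian noise.
    noise = np.random.normal(0, 10, image.shape)
    augmented.append(np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8))
    return augmented
```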

In this article, we will focus on a more advanced augmentation method: image blending. Image blending is the process of transferring an image from a source domain to a target domain while making the transformed pixels conform to the target domain, so that the result stays consistent.

The idea is to increase the number of images available for training an object detection model by directly inserting the target object into different backgrounds. We thus obtain labeled images very easily: knowing where the object was introduced, we can create the corresponding bounding box (which serves as the label) at the same time.
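Since we choose the insertion point ourselves, the label comes for free. A minimal sketch of this idea, assuming a binary object mask and a known paste position in the background (the names here are illustrative):

```python
import cv2
import numpy as np

def bbox_from_insertion(mask: np.ndarray, top_left: tuple[int, int]) -> tuple[int, int, int, int]:
    """Bounding box (x, y, w, h) of an object pasted at `top_left`.

    `mask` is the binary mask of the object; the box is the tight
    rectangle around its non-zero pixels, shifted into the coordinate
    frame of the background image.
    """
    x, y, w, h = cv2.boundingRect(mask)
    return (top_left[0] + x, top_left[1] + y, w, h)
```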


Principle of image blending

More specifically, we will focus on gradient-domain blending, which produces a homogeneous blend of the two images. The goal is to generate images with no visible boundary between the background and the added object, so that they are realistic enough to be useful for training a detection model.

The principle, inspired by this work, consists of solving the Poisson equation associated with the gradient of the image over the area, defined by a mask, where the blending must take place. The mask specifies where the object of interest is located in the image, and defines the region over which the gradient is used to smooth the transition between the two images.
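For readers who want the underlying math: in the standard gradient-domain formulation, we look for pixel values f over the masked region Ω whose gradient stays as close as possible to a guidance field v (the gradient of the object image), while agreeing with the background f* on the boundary. Minimizing that objective amounts to solving a Poisson equation:

```latex
\min_{f} \iint_{\Omega} \lVert \nabla f - \mathbf{v} \rVert^{2}
\quad \text{with} \quad f\big|_{\partial\Omega} = f^{*}\big|_{\partial\Omega}
\qquad \Longrightarrow \qquad
\Delta f = \operatorname{div} \mathbf{v} \ \text{over} \ \Omega
```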

The algorithm thus takes four elements as input:

  • a background image,
  • an image containing the object to add,
  • the coordinates where to add it,
  • the mask describing the shape of the object.

The output is a new, already-labeled image containing the object of interest. By repeating the process on a sufficiently large number of background images, we can create a new database ready to be used for training an object detection algorithm such as YOLO.
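OpenCV ships an implementation of this gradient-domain blending (cv2.seamlessClone), so a minimal sketch of the whole step, given the four inputs listed above, could look like the following; the labeling logic mirrors the earlier bounding-box sketch and relies on seamlessClone centering the mask's bounding rectangle at the given point.

```python
import cv2
import numpy as np

def blend_object(background: np.ndarray,
                 obj: np.ndarray,
                 mask: np.ndarray,
                 center: tuple[int, int]) -> tuple[np.ndarray, tuple[int, int, int, int]]:
    """Blend `obj` into `background` at `center`; return (image, bbox label).

    `background` and `obj` are 8-bit BGR images, `mask` is an 8-bit
    single-channel mask of the object inside `obj`, and `center` is the
    (x, y) point in `background` where the object center should land.
    """
    # Gradient-domain (Poisson) blending: pixels inside the mask are
    # recomputed so the transition to the background is seamless.
    blended = cv2.seamlessClone(obj, background, mask, center, cv2.NORMAL_CLONE)

    # Label: tight box around the mask, re-centered at `center`.
    x, y, w, h = cv2.boundingRect(mask)
    bbox = (center[0] - w // 2, center[1] - h // 2, w, h)
    return blended, bbox
```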

An example of an object of interest and the corresponding mask: the white area roughly defines the position of the object in the image.


Application: CoTS

The application example we are going to study comes from a dataset made available for a Kaggle competition on detecting a species of starfish (Crown-of-Thorns Starfish, or CoTS) in video images of Australia's Great Barrier Reef.

The available dataset contains 23,000 images, of which around 5,000 include the objects to be detected. The idea is to use the remaining 18,000 seabed images, which contain no starfish, as backgrounds for this augmentation method.

The first step consists of building a dataset, in our case around fifty entries, of objects to insert onto the backgrounds in order to simulate a diversity of cases. Most of the manual work lies at this stage: collecting the images and defining the corresponding masks. This dataset is then augmented using the classic techniques (mirror, rotations, contrast, etc.) in order to obtain the most varied object base possible.

But generating images by randomly positioning the object on a background is not enough: to obtain a model usable in real cases, the generated images must be plausible examples of reality. Imagine, for example, that we are developing a bicycle detection algorithm for road scenes. If we use this method to augment an existing dataset by inserting bicycle images onto background images containing a road, we want to insert the 'bicycle' objects on the road, not at the top of a tree by the roadside.

Thus, the second step consists of analyzing the real images of the dataset in order to draw inspiration from them. We are particularly interested in the size of the objects to be detected and their spatial distribution in the real images.

Position density of CoTS on real images (left) and distribution of object sizes (right)

The distributions thus obtained will allow us to construct images whose diversity comes as close as possible to reality.
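One simple way to implement this, assuming the real annotations have already been collected into 1-D arrays (object sizes, x positions, y positions), is to resample from their empirical histograms; the bin count below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def build_sampler(samples: np.ndarray, bins: int = 30):
    """Return a function drawing values that follow the empirical
    distribution of `samples`, via histogram-based inverse sampling."""
    hist, edges = np.histogram(samples, bins=bins)
    prob = hist / hist.sum()
    def sample() -> float:
        i = rng.choice(len(prob), p=prob)           # pick a bin
        return rng.uniform(edges[i], edges[i + 1])  # draw inside it
    return sample

# Hypothetical arrays extracted from the real annotations:
# sizes, xs, ys = np.array([...]), np.array([...]), np.array([...])
# sample_size, sample_x, sample_y = map(build_sampler, (sizes, xs, ys))
```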

Finally, we move on to generating the new images (a sketch of the full loop follows the list):

  1. As input, we take a background image containing no object;
  2. We randomly select one of the images from the previously built object dataset;
  3. We generate a random object size and position, with probabilities based on the densities described above, computed from the real images;
  4. We apply the blending algorithm described above;
  5. The output is an image containing an object of interest, already labeled thanks to knowing the position where the object was introduced.
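Putting the pieces together, one iteration of this loop could look like the sketch below, reusing blend_object and the sample_* functions from the earlier sketches (all names are illustrative):

```python
import random
import cv2

def generate_image(background, objects, sample_x, sample_y, sample_size):
    """One iteration of the generation loop: returns a labeled image.

    `objects` is a list of (object_image, mask) pairs built in step one;
    the `sample_*` callables follow the empirical distributions of the
    real images.
    """
    # Step 2: pick a random object from the object dataset.
    obj, mask = random.choice(objects)

    # Step 3: draw a target size and position from the real distributions.
    scale = sample_size() / max(mask.shape[:2])
    obj = cv2.resize(obj, None, fx=scale, fy=scale)
    mask = cv2.resize(mask, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_NEAREST)
    center = (int(sample_x()), int(sample_y()))

    # Step 4: apply the blending algorithm (see blend_object above).
    # Step 5: the returned image is already labeled via the bounding box.
    return blend_object(background, obj, mask, center)
```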

Below are two before/after examples of augmented images with the corresponding label, which a detection model could use as a training set.


Example 1


Example 2


Going further

Even after analyzing the real images to understand their context, it is not impossible to generate aberrant images. For example, in the images above, we could inadvertently introduce objects to be detected in areas where real objects cannot occur (such as areas where there is only water). Without additional verification, introducing such unrealistic data into the training set could cause the model to learn bad patterns.

One solution is to start by training a first model on the non-augmented dataset. Then, on a principle similar to that of GANs, we can use this model to detect the objects present in our new images: if the model trained on real data manages to identify the new "false" objects, we can infer that they are realistic enough to be usable as training data.
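In code, this verification could be a simple filter, assuming the first model is exposed as a detect_fn callable returning predicted boxes; the IoU threshold is an illustrative choice.

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def keep_synthetic(image, true_bbox, detect_fn, threshold: float = 0.5) -> bool:
    """Keep a synthetic image only if the model trained on real data
    finds the inserted object roughly where we placed it."""
    predictions = detect_fn(image)  # list of (x, y, w, h) boxes
    return any(iou(true_bbox, box) >= threshold for box in predictions)
```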


Conclusion

In conclusion, data augmentation is an essential component of the process of improving predictive models. In particular, image blending allows us to go further in areas where little labeled data is available. However, we must not fall into the trap of augmenting without understanding the context of the data and the purpose of the machine learning model, so as not to introduce aberrant data into our model.

Do you have a use case in mind involving automatic image analysis, but think your training dataset is too small? Contact us to find out how data augmentation could help.
