April 21, 2020

Adversarial Machine Learning


Edwin Lisowski

CSO & Co-Founder

Today, we want to tackle an entirely different side of machine learning: adversarial machine learning, a technique used for testing and research. Like every other human technology, machine learning is not perfect. It has weak points, errors, and other defects. Adversarial machine learning is all about finding these defects and, if possible, eliminating them. It’s an issue of paramount importance, as these defects can have a significant influence on our safety, and sometimes on our lives as well. How so?

According to Wikipedia, adversarial machine learning is a technique employed in the field of machine learning that attempts to fool machine learning models through malicious input. This technique can be applied for a variety of reasons, the most common being to attack or cause a malfunction in standard machine learning models[1].

Perhaps one of the very first questions that comes to your mind is, “What for?” The answer is very simple: to avoid problems in real-life conditions. Adversarial machine learning has an extensive scope of work, and it can be used to verify almost any other ML algorithm or application.

Adversarial Machine Learning: How Simple Changes Can Fool ML Algorithms

One of the major problems with the vast majority of ML apps and programs is that they struggle to do their job when the input data changes even slightly. The easiest way to show this issue is with an example:

Let’s say we have a given image file. Many ML algorithms work with images (image processing, computer vision, image analysis, and face recognition, to mention just a few). If you change just a couple of pixels of your image (deliberately, not randomly), the picture still looks pretty much the same to the human eye, but it can become something totally different to an ML classifier. And this effect can be obtained with just the simplest perturbations!

As a result, it takes just a few straightforward changes to fool ML algorithms into ‘thinking’ that the object they analyze is something entirely different!
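
To make the idea concrete, here is a minimal sketch in Python. It assumes a hypothetical toy classifier (a weighted sum of four "pixels" with made-up weights), not any real vision model. Every pixel is nudged by at most 0.1 in the direction that raises the score, and that is enough to flip the predicted class.

```python
# Hypothetical toy classifier: class 1 if the weighted sum of pixels is positive.
def predict(weights, bias, pixels):
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else 0

def sign(value):
    return (value > 0) - (value < 0)

def perturb(weights, pixels, epsilon):
    # Shift each pixel by epsilon in the direction that increases the score.
    # The direction is chosen from the weights, so the change is not random.
    return [p + epsilon * sign(w) for w, p in zip(weights, pixels)]

# Made-up weights and a made-up 4-pixel "image", for illustration only.
weights = [0.5, -0.25, 0.75, -0.5]
bias = -0.1
image = [0.2, 0.4, 0.1, 0.3]

adversarial = perturb(weights, image, epsilon=0.1)
original_class = predict(weights, bias, image)
adversarial_class = predict(weights, bias, adversarial)
largest_change = max(abs(a - p) for a, p in zip(adversarial, image))
```

The two images differ by no more than 0.1 per pixel, yet the classifier assigns them to different classes. Real attacks follow the same logic, using the gradient of a much larger model instead of four hand-picked weights.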

Adversarial Machine Learning in Action: Real-World Examples

Here are three interesting examples of how ML algorithms can be fooled by small, barely noticeable disturbances:

  • Tesla: It turned out that AI models used by Tesla can be fooled simply by putting some stickers (adversarial patches) on the road, which the car interprets as the lane diverging, causing it to drive into oncoming traffic[2]. Furthermore, the researchers were able to trigger the auto wipers just by projecting noise on an electronic display placed in front of the vehicle, thus fooling the visual sensor of the system[3].
  • MIT: Researchers at the MIT Computer Science & Artificial Intelligence Laboratory produced a 3D-printed toy turtle that the ML algorithm misclassified as a rifle! On another occasion, a baseball was classified as an espresso, no matter what angle the neural network viewed it from[4]!

According to Anish Athalye, a Ph.D. candidate behind this project:

“This work clearly shows that something is broken with how neural networks work and that researchers who develop these systems need to be spending a lot more time thinking about defending against these sorts of so-called ‘adversarial examples’.”

  • KU Leuven[5]: Researchers at this research university in Belgium managed to fool AI systems that detect people. They did it simply by printing a specific picture (an adversarial patch) and holding it against their body[6]!

Potential Threats

We suppose you already see how dangerous these defects can be. The technique from the third example could be used by intruders to get past security cameras and enter a building unnoticed! The technique from the second example could be used in wartime, with similar patches redirecting an attack to different targets. And the Tesla example? It could lead to hundreds or thousands of car accidents! And the list of potential threats by no means ends here! That’s why it’s crucial to improve ML algorithms constantly in order to make them secure against such attacks.

However, it’s not nearly as easy to do as it sounds. Let us explain why.

Why is it so difficult to devise the defence mechanisms?

The core problem is that it’s difficult, not to say impossible, to construct a model of the adversarial example crafting process. Crafting an adversarial example means solving an optimization problem that is non-linear and non-convex for many ML models, including neural networks. As a result, we don’t have the necessary tools to ‘describe’ what such an attack would look like, or to cover all of its potential forms. Therefore, it’s challenging to devise a versatile defence mechanism.

As OpenAI[7] states: Every strategy we have tested so far fails because it is not adaptive: It may block one kind of attack, but it leaves other possibilities open to an attacker who knows about the defence being used. Designing a defence that can protect against a powerful, adaptive attacker is an important research area.

As it turns out, even the specialized, ‘attack-proof’ algorithms can be broken by giving more computational firepower to the attacking mechanism. Currently, no ML algorithm is secure against every possible attack, and this situation is unlikely to change for years to come. But that doesn’t mean we can or should throw in the towel. Our safety, and sometimes even our lives, are at stake. That’s why data scientists and ML specialists should work tirelessly to improve the safety of ML algorithms to every possible extent.

We could end here, but we think it’s vital to take you backstage and present some of the techniques and attacks used to train and test machine learning algorithms.

Classification of the attacks

In general, there are two ways to classify forms of an attack:

The first way is based on the knowledge of the attacked system:

  • Black-box attack – the attacker has no knowledge of the ML model they are attacking
  • White-box attack – the attacker has access to the model or its parameters

The second way is based on the purpose of an attack:

  • Targeted attack – the attacker wants the ML model to output a specific, chosen class instead of the real one
  • Untargeted attack – the attacker wants the ML model to output any class other than the real one (which one doesn’t matter)

To simplify this a bit, let’s use one of the previously mentioned examples. In the KU Leuven example, if the attacker used a targeted attack, the machine learning algorithm would report precisely the object the attacker wants it to report, let’s say a cat instead of a human. If the attacker used an untargeted attack, it wouldn’t really matter whether the algorithm reported a cat, a fish, or a tank. What matters is that it fails to recognize a human.

As you can see, when it comes to classifying attack forms, it’s the attacker’s intention and knowledge that are essential.
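
The difference between the two goals comes down to how success is measured. The short Python sketch below illustrates this with made-up class scores for an attacked image of a person. The function names and the numbers are assumptions chosen for illustration.

```python
def predicted_class(scores):
    # The model's answer is the class with the highest score.
    return max(scores, key=scores.get)

def untargeted_success(scores, true_class):
    # Any wrong answer counts as a win for the attacker.
    return predicted_class(scores) != true_class

def targeted_success(scores, target_class):
    # Only the attacker's chosen class counts as a win.
    return predicted_class(scores) == target_class

# Made-up scores returned by a model for an attacked picture of a human.
scores_after_attack = {"human": 0.10, "cat": 0.25, "fish": 0.60, "tank": 0.05}

fooled_at_all = untargeted_success(scores_after_attack, true_class="human")
fooled_into_cat = targeted_success(scores_after_attack, target_class="cat")
```

Here the model answers "fish", so the untargeted attack has succeeded, while a targeted attack aiming for "cat" has not.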

Now, let’s examine some of the most popular forms of an attack used to fool the machine learning algorithms.

Forms of an attack

There are at least four popular forms of an adversarial attack. Let’s take a closer look at each one of them:

The physical attack

It’s the most straightforward type of attack. Here, you simply add something physical to the scene the camera captures (as in our KU Leuven example). It can be a scarf, eyeglasses, or even a different hairstyle. Naturally, not every ML model can be fooled by such a small change, but facial recognition algorithms can struggle to identify such a “modified” person.
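
A physical attack happens in front of the camera, but its effect on the captured image can be simulated digitally. The sketch below is an illustrative assumption, with images as nested lists of grayscale values and a hypothetical `paste_patch` helper. It overwrites a small region of the scene with a patch, which is roughly what a printed patch does to the pixels a model receives.

```python
def paste_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` written over it at (top, left)."""
    patched = [row[:] for row in image]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            patched[top + r][left + c] = value
    return patched

# A made-up 6x6 grayscale scene and a 2x2 patch, for illustration only.
scene = [[10] * 6 for _ in range(6)]
patch = [[255, 0], [0, 255]]

attacked = paste_patch(scene, patch, top=2, left=3)
changed_pixels = sum(
    a != s
    for row_a, row_s in zip(attacked, scene)
    for a, s in zip(row_a, row_s)
)
```

Only the pixels under the patch change. The rest of the scene stays untouched, which is why such attacks are easy to carry out in the real world.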

The noise attack

Perhaps you are familiar with the noise visible in photos. Here, it works almost exactly the same. The noise attack is all about adding noise, a random set of pixels containing no information, to the picture in question. As it turns out, adding even a small amount of noise to an image can make the ML algorithm think that it presents something else. As a rule of thumb, the more sophisticated your ML algorithm is, the more noise it takes to fool it.
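
As a rough sketch, adding noise can be as simple as the Python function below. It assumes an image stored as a flat list of 8-bit grayscale values. The `add_noise` helper and the amplitude value are illustrative choices, not taken from any specific attack library.

```python
import random

def add_noise(pixels, amplitude, seed=0):
    """Add uniform random noise in [-amplitude, amplitude] to every pixel."""
    rng = random.Random(seed)
    noisy = []
    for p in pixels:
        value = p + rng.randint(-amplitude, amplitude)
        noisy.append(min(255, max(0, value)))  # keep values in the 0..255 range
    return noisy

image = [120] * 16  # a made-up flat gray image
noisy_image = add_noise(image, amplitude=8)
largest_change = max(abs(n - p) for n, p in zip(noisy_image, image))
```

Whether a given amount of noise is enough to change the prediction depends on the model, so in practice the amplitude is raised step by step until the classifier's answer changes.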

The semantic attack

This attack is based on negating the image, that is, inverting its colors. It succeeds if the negated image presents something different in the eyes of the machine learning algorithm. In other words, if the negated image loses the characteristic features of the original image, the ML algorithm can be easily fooled.
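
For 8-bit images, negation simply replaces every pixel value v with 255 - v. A minimal sketch, assuming a grayscale image stored as nested lists:

```python
def negate(image):
    """Invert an 8-bit grayscale image: dark becomes bright and vice versa."""
    return [[255 - pixel for pixel in row] for row in image]

image = [[0, 255], [100, 30]]  # a made-up 2x2 image
negative = negate(image)       # [[255, 0], [155, 225]]
```

Shapes and edges are preserved, so a person still recognizes the object, while the raw pixel values the model sees are completely different.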

The DeepFool attack

This attack aims to change the original image with the smallest perturbation possible. As a result, the modified image looks exactly the same to the human eye, with no visible differences. The machine learning algorithm is nevertheless fooled and assigns the image to the class that’s closest to the original one (but still different). For instance, a diaper can be classified as a plastic bag (the closest class, which looks very similar), and a train as a truck.
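
The idea of a minimal perturbation is easiest to see in the simplest case: a linear classifier with two classes, where the closest point on the decision boundary has a closed form. The sketch below covers only that simplified case, with made-up weights. The full attack repeats a similar step on a linearized version of a non-linear, multi-class model.

```python
import math

def score(weights, bias, x):
    # The sign of the score decides the class.
    return sum(w * v for w, v in zip(weights, x)) + bias

def minimal_perturbation(weights, bias, x, overshoot=0.02):
    """Smallest L2 step that moves x just across the boundary w.x + b = 0."""
    f = score(weights, bias, x)
    norm_sq = sum(w * w for w in weights)
    # Project onto the boundary, then overshoot slightly to actually cross it.
    scale = -(1 + overshoot) * f / norm_sq
    return [scale * w for w in weights]

# Made-up linear classifier and input, for illustration only.
weights, bias = [3.0, 4.0], -1.0
x = [1.0, 1.0]

r = minimal_perturbation(weights, bias, x)
x_adv = [v + dv for v, dv in zip(x, r)]
perturbation_size = math.sqrt(sum(dv * dv for dv in r))
```

The score changes sign between `x` and `x_adv`, so the predicted class flips, and no shorter step (apart from the small overshoot) would have achieved that.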


Adversarial Machine Learning: How to defend?

Unfortunately, most of the techniques used to devise a versatile defence system are not effective against adversarial attacks. According to OpenAI[8], only two forms of defence have so far proven themselves on the battlefield and turned out to be effective:

Adversarial training. The machine learning algorithm is exposed to a lot of adversarial examples and trained not to be fooled by them. In other words, the ML model is trained on adversarial images along with regular images.
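
A minimal sketch of this idea in Python, using a tiny logistic regression model and made-up 2D data instead of real images. Each training step uses the clean example and an adversarial copy of it. The adversarial copies here are generated with the fast gradient sign method, which is an assumption chosen for brevity; any attack could play that role.

```python
import math

def sign(value):
    return (value > 0) - (value < 0)

def predict_proba(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y, eps):
    # For logistic regression, the loss gradient w.r.t. the input is (p - y) * w.
    err = predict_proba(w, b, x) - y
    return [xi + eps * sign(err * wi) for wi, xi in zip(w, x)]

def train(data, eps, epochs=200, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            # One gradient step on the clean example, one on its adversarial copy.
            for sample in (x, fgsm(w, b, x, y, eps)):
                err = predict_proba(w, b, sample) - y
                w = [wi - lr * err * si for wi, si in zip(w, sample)]
                b -= lr * err
    return w, b

def accuracy(w, b, data, eps):
    # Accuracy on examples attacked with strength eps (eps=0 means clean data).
    hits = 0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, eps)
        hits += (predict_proba(w, b, x_adv) > 0.5) == (y == 1)
    return hits / len(data)

# Made-up, linearly separable toy data: (features, label).
data = [([2.0, 2.0], 1), ([-2.0, -2.0], 0), ([3.0, 1.0], 1),
        ([-3.0, -1.0], 0), ([2.5, 3.0], 1), ([-1.5, -2.5], 0)]

w, b = train(data, eps=0.5)
clean_accuracy = accuracy(w, b, data, eps=0.0)
robust_accuracy = accuracy(w, b, data, eps=0.5)
```

On this toy data the trained model classifies both the clean points and their attacked versions correctly. With real models the robustness usually holds only for the kinds of attack seen during training.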

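Defensive distillation. The machine learning model is trained to output probabilities of the different classes, rather than hard decisions about which class to output. The probabilities are supplied by an earlier model, trained on the same task using hard class labels.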
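
The difference between hard decisions and class probabilities can be sketched as follows. The logits and the temperature value are made up for illustration, and the snippet shows only how soft labels are produced and compared, not a full training loop.

```python
import math

def softmax(logits, temperature=1.0):
    # A higher temperature spreads probability more evenly across classes.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs))

# Made-up outputs of a first model for one image with three classes.
teacher_logits = [8.0, 2.0, 0.0]
hard_label = [1.0, 0.0, 0.0]                             # a hard decision
soft_label = softmax(teacher_logits, temperature=20.0)   # class probabilities

# The second model is trained to match the soft label, not the hard one.
student_logits = [1.0, 0.5, 0.2]
student_probs = softmax(student_logits, temperature=20.0)
loss = cross_entropy(soft_label, student_probs)
```

The soft label keeps the same ranking of classes as the hard one but also records how similar the other classes looked to the first model, and this is the information the second model learns from.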

There is also a third method, called Random Resizing and Padding[9], in which the picture is randomly resized and padded before it’s classified. According to the paper “Mitigating Adversarial Effects Through Randomization”, published by Johns Hopkins University:

“The input image first goes through the random resizing layer with a random scale applied. Then the random padding layer pads the resized image in a random manner. The resulting padded image is used for classification.”
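
The two layers described in the quote can be sketched in a few lines of Python. The version below is a simplified illustration that makes several assumptions: square grayscale images stored as nested lists, nearest-neighbour resizing, and zero padding. The helper names are hypothetical.

```python
import random

def resize_nearest(image, new_size):
    """Resize a square image to new_size x new_size with nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [[image[r * old_h // new_size][c * old_w // new_size]
             for c in range(new_size)]
            for r in range(new_size)]

def random_resize_and_pad(image, min_size, out_size, rng):
    # Random resizing layer: pick a random size between min_size and out_size.
    new_size = rng.randint(min_size, out_size)
    resized = resize_nearest(image, new_size)
    # Random padding layer: place the resized image at a random offset
    # inside a zero-filled canvas of the final size.
    pad_total = out_size - new_size
    top = rng.randint(0, pad_total)
    left = rng.randint(0, pad_total)
    padded = [[0] * out_size for _ in range(out_size)]
    for r in range(new_size):
        for c in range(new_size):
            padded[top + r][left + c] = resized[r][c]
    return padded

image = [[1] * 4 for _ in range(4)]  # a made-up 4x4 image
defended_input = random_resize_and_pad(image, min_size=4, out_size=6,
                                       rng=random.Random(0))
```

Because the size and the offset change on every call, an attacker cannot know in advance exactly which pixels the classifier will see.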

This method has also proven its usefulness. But there is still a lot to do to make machine learning models and algorithms resistant to as many forms of attack as possible.

If you are interested in implementing AI or machine learning consulting, drop us a line! We will gladly show you all the possibilities that await you. It can be a whole new step in your company’s growth. Let’s take it together!


[1] Adversarial machine learning. Apr 18, 2021. URL: Accessed Apr 21, 2020.
[2] Sigal Samuel. It’s disturbingly easy to trick AI into doing something deadly. Apr 8, 2019. URL: Accessed Apr 21, 2020.
[3] Nir Morgulis, Alexander Kreines, Shachar Mendelowitz, Yuval Weisglass. Fooling a Real Car with Adversarial Traffic Signs. URL: Accessed Apr 21, 2020.
[4] Adam Conner-Simons. Fooling neural networks w/3D-printed objects. Nov 2, 2017. URL: Accessed Apr 21, 2020.
[5] KU Leuven. Apr 15, 2021. URL: Accessed Apr 21, 2020.
[6] Bob Yirka. Using a printed adversarial patch to fool an AI system. Apr 24, 2019. URL: Accessed Apr 21, 2020.
[7] Ian Goodfellow, Nicolas Papernot, Sandy Huang, Rocky Duan, Pieter Abbeel & Jack Clark. Attacking Machine Learning with Adversarial Examples. Feb 24, 2017. URL: Accessed Apr 21, 2020.
[8] Ibid.
[9] Arunava Chakraborty. Introduction to Adversarial Machine Learning. Oct 16, 2019. URL: Accessed Apr 21, 2020.

