I have summarized emerging security issues (in a series of blog posts) based on the following references. You should read the papers and listen to the recent talks listed below. All credit truly belongs to these researchers. Fascinating stuff!
- Adversarial Examples in Machine Learning – Patrick McDaniel & Nicolas Papernot, 2017
- Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks (IEEE Symposium on Security and Privacy) – Nicolas Papernot, 2016
- Crafting Adversarial Input Sequences for Recurrent Neural Networks – Nicolas Papernot, Patrick McDaniel, Ananthram Swami and Richard Harang, 2016
- Ensemble Adversarial Training: Attacks and Defenses – Florian Tramèr, 2017
- Explaining and Harnessing Adversarial Examples – Ian Goodfellow, 2015
Part 1: Adversarial Examples in ML
The most popular reference image to illustrate the problem of adversarial examples is shown below.
On the left is an image of a panda. If you feed it into a state-of-the-art machine learning model, the model should recognize it as a panda (or, more accurately, let's say it is “pretty sure that is a panda”, with 57.7% confidence). So it seems the model has learned something, because it classifies images correctly. However, you can also very easily find an imperceptible perturbation, shown in the middle, and add it to the image on the left to produce the image on the right. To the human eye the right image looks exactly the same, and yet the same model will tell you with near certainty (higher confidence than before) that it is a gibbon (“I’m certain that is a gibbon”, 99.3% confidence).
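The panda figure comes from the Goodfellow paper listed above, where the perturbation is generated with the fast gradient sign method (FGSM): x' = x + eps * sign(grad_x J(theta, x, y)), with J the model's training loss. Below is a minimal, hedged sketch that assumes a toy logistic-regression classifier in plain Python instead of a deep network, so the input gradient, (p - y) * w, can be written by hand. The model, its weights and eps are made-up illustrative values, not taken from the papers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(w, b, x):
    """Linear score w.x + b of a toy logistic-regression classifier (hypothetical model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_proba(w, b, x):
    """Probability that x belongs to class 1 (say, "panda")."""
    return sigmoid(logit(w, b, x))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: x' = clip(x + eps * sign(grad_x J), 0, 1).

    For logistic regression with cross-entropy loss J and label y in {0, 1},
    the gradient of J with respect to the input is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    # Move every feature by exactly eps in the direction that increases the loss,
    # then clip back to the valid "pixel" range [0, 1].
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

# Usage: 1,000 inputs, half with weight +0.1 and half with weight -0.1.
n = 1000
w = [0.1 if i % 2 == 0 else -0.1 for i in range(n)]
b = 2.0
x = [0.5] * n
x_adv = fgsm(w, b, x, y=1, eps=0.04)
print(round(predict_proba(w, b, x), 3))      # 0.881
print(round(predict_proba(w, b, x_adv), 3))  # 0.119
```

Each feature moves by only 0.04, but because every change is aligned with the sign of its weight, the logit shifts by eps * sum(|w_i|) = 0.04 * 100 = 4, enough to flip a confident prediction. This accumulation across many input dimensions is the intuition behind the linearity argument in the Goodfellow paper.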
Manipulating (attacking) ML systems
The next example below shows an autonomous vehicle that uses a deep neural network (DNN) to recognize a stop sign with high probability. As the figure shows, the system works correctly in this case.
Now assume an adversary wants to manipulate the system. Say I have some way of manipulating the image, either by painting something on the stop sign or by getting into the car's data pipeline to modify the incoming images. As the adversary, I want to control what the vehicle does; in other words, I want the car to misunderstand what it is seeing, as illustrated below. Here, we are trying to induce the machine learning system to make the wrong decision.
This is an example of an attack: the adversary can control, at least to some degree, which wrong decision the system makes. There exist attack vectors that an adversary can use to harm these machine learning systems.
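The stop-sign scenario describes a targeted attack: the adversary wants one particular wrong answer (say, "yield" instead of "stop"), not just any error. A common way to do this, swapped in here for illustration rather than taken from the references above, is an iterative, targeted version of the gradient sign method: repeatedly step against the gradient of the loss computed for the attacker's chosen class, while keeping every feature within eps of the original input. This minimal sketch assumes a hypothetical three-class linear softmax model in plain Python; the weights, inputs, class labels and function names are all illustrative.

```python
import math

def logits(W, b, x):
    """Class scores W_k . x + b_k of a toy linear softmax classifier (hypothetical model)."""
    return [sum(wj * xj for wj, xj in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])

def sign(v):
    return (v > 0) - (v < 0)

def targeted_attack(W, b, x, target, eps, alpha, steps):
    """Iterative targeted gradient sign attack.

    Each step moves x against the gradient of the cross-entropy loss for the
    attacker's target class, making the model more confident in `target`.
    Every feature stays within eps of the original and inside [0, 1].
    """
    x_adv = list(x)
    for _ in range(steps):
        if predict(W, b, x_adv) == target:
            break  # stop as soon as the model outputs the attacker's class
        p = softmax(logits(W, b, x_adv))
        # Gradient of the loss w.r.t. x: sum_k (p_k - [k == target]) * W_k
        grad = [
            sum((p[k] - (1.0 if k == target else 0.0)) * W[k][j] for k in range(len(W)))
            for j in range(len(x_adv))
        ]
        x_adv = [
            min(1.0, max(0.0, min(xo + eps, max(xo - eps, xa - alpha * sign(g)))))
            for xa, xo, g in zip(x_adv, x, grad)
        ]
    return x_adv

# Usage: classes 0 = "stop", 1 = "yield", 2 = "speed limit" (toy labels).
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
b = [0.0, 0.0, 0.0]
x = [0.6, 0.4, 0.3, 0.5]
print(predict(W, b, x))      # 0
x_adv = targeted_attack(W, b, x, target=1, eps=0.15, alpha=0.05, steps=10)
print(predict(W, b, x_adv))  # 1
```

The difference from an untargeted attack is only the sign of the step and the label used in the loss: instead of pushing away from the true class, the attacker pulls toward the class of their choosing, which is what gives them control over the wrong decision.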
ML security problems across domains
Images: There has been a great deal of work in the industry showcasing these problems with image classifiers. This is because the problem is very easy to show visually: we can’t tell the difference between the inputs, yet the machine makes dramatic misclassifications.
Physical Objects: Some interesting recent work has shown that these input perturbations can be translated to the physical world. You can perturb the pixels of an image so that the model misclassifies it, print the image out, scan it and feed it back into the model, and it is still misclassified. Other work has shown that an image of a man wearing specially crafted glasses can fool facial recognition models into identifying him as a woman or a famous actress. This can be problematic for self-driving cars, because the perturbations introduced by some stickers on a street sign can induce the DNN in the car to misclassify it as a different sign. In real life, these models see objects from many different angles and distances and aggregate all of that information, which one might expect to wash out a small perturbation. However, there is evidence that you can craft perturbations robust enough to fool such machine learning models anyway.
This has now gone beyond the realm of images, and adversarial examples have been demonstrated for:
Malware Classifiers: These classifiers are closer to tangible security problems. For example, you can change the decision of an Android malware classifier just by modifying a few entries in the app's manifest file. It is surprising that such a significant change can be achieved just by changing metadata that does not really affect the behavior of the application.
Text Understanding: You can fool DNN systems for text understanding. For example, a particular kind of adversarial example called an adversarial sequence can mislead Recurrent Neural Networks (RNNs) into producing erroneous outputs.
Speech: You can also fool ML systems for audio or speech recognition using similar techniques.
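To see why a few manifest edits can be enough to evade a malware classifier, here is a minimal sketch assuming a toy logistic-regression detector over binary features (1 means a given permission or manifest entry is present). It greedily switches on absent features that lower the malware score the most. Restricting the attack to adding entries, so the app's actual behavior is untouched, is an assumption made for illustration; the detector, its weights and the function names are hypothetical and are not the classifiers studied in the referenced work.

```python
import math

def malware_score(w, b, x):
    """Probability of "malware" under a toy logistic-regression detector (hypothetical)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def evade_by_adding_features(w, b, x, max_changes, threshold=0.5):
    """Greedily switch on absent binary features (0 -> 1) until the score drops below threshold.

    Features are only added, never removed: an assumption meant to keep the
    app's original manifest entries, and therefore its functionality, intact.
    """
    x_adv = list(x)
    changed = []
    for _ in range(max_changes):
        if malware_score(w, b, x_adv) < threshold:
            break  # already classified as benign
        # For a linear model, switching feature j from 0 to 1 changes the logit by w[j],
        # so only absent features with negative weight can help the attacker.
        candidates = [j for j in range(len(x_adv)) if x_adv[j] == 0 and w[j] < 0]
        if not candidates:
            break
        best = min(candidates, key=lambda j: w[j])  # largest drop in the score
        x_adv[best] = 1
        changed.append(best)
    return x_adv, changed

# Usage: six made-up manifest features; the app declares features 0 and 1.
w = [2.0, 1.5, -0.5, -1.0, -2.0, 0.3]
b = -0.6
x = [1, 1, 0, 0, 0, 0]
print(round(malware_score(w, b, x), 3))      # 0.948
x_adv, changed = evade_by_adding_features(w, b, x, max_changes=3)
print(changed)                               # [4, 3]
print(round(malware_score(w, b, x_adv), 3))  # 0.475
```

In this toy setting, two added entries move the app from a 0.948 malware score to 0.475, below the 0.5 detection threshold, without removing anything it originally declared.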
These examples suggest real problems for the secure deployment of machine learning models in practice, and this lack of robustness is fairly widespread across machine learning models today.