Adversarial Machine Learning
- Overview
Machine learning (ML) is a field within artificial intelligence (AI) that focuses on the ability of computers to learn from the data they are provided without being explicitly programmed for a specific task.
Adversarial machine learning (AML) is the process of extracting information about the behavior and characteristics of an ML system and/or learning how to manipulate the inputs of an ML system to achieve preferred outcomes.
Although AI includes a variety of knowledge-based systems, the data-driven approach to ML introduces additional security challenges during the training and testing (inference) phases of system operation.
AML focuses on the design of ML algorithms that can withstand security challenges, studying attacker capabilities, and understanding the consequences of attacks.
- Applications of Adversarial Machine Learning
Adversarial machine learning (AML) uses deceptive inputs to trick ML models; it can also involve learning how to manipulate inputs to obtain a preferred outcome.
AML techniques appear in many applications, but they are most commonly used to cause a malfunction in, or carry out an attack against, an ML system.
AML includes:
- Generating adversarial examples
- Intentionally designed inputs that are created to mislead the model into making inaccurate predictions
- Detecting adversarial examples
- The process of identifying inputs that are specially created to deceive classifiers
- Training models to be robust against adversarial examples
- The process of training models on adversarial examples so that they learn to resist this kind of manipulation
One example of an adversarial attack on an ML system is placing stickers on a stop sign that a car's ML system is trained to recognize, causing the sign to be misidentified.
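To make the first item above (generating adversarial examples) concrete, here is a minimal sketch of the fast gradient sign method (FGSM). It assumes Python with PyTorch; the toy model, input image, label, and epsilon value are placeholders chosen for illustration and are not specified anywhere in this text.

```python
# Minimal FGSM sketch: nudge the input in the direction that increases the
# classifier's loss, by a small amount epsilon, so the prediction may flip.
# The model, image, label, and epsilon below are illustrative assumptions.
import torch
import torch.nn as nn

def fgsm_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 8 / 255) -> torch.Tensor:
    """Return a slightly perturbed copy of x intended to be misclassified."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()                                      # gradient of the loss w.r.t. the input
    perturbation = epsilon * x_adv.grad.sign()           # small step that increases the loss
    return (x_adv + perturbation).clamp(0, 1).detach()   # keep pixel values in a valid range

# Hypothetical usage with a toy stand-in classifier and a random "image":
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)   # one fake 32x32 RGB image
y = torch.tensor([3])          # assumed true label
x_adv = fgsm_example(model, x, y)
print(model(x).argmax(1), model(x_adv).argmax(1))  # the two predictions may differ
```

The perturbation is bounded by epsilon per pixel, which is why such examples can remain imperceptible to a human while still changing the model's output.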
- Adversarial Attacks in AI
Adversarial Attacks in AI typically involve making small, imperceptible changes to input data, such as images or text, in order to deceive the machine learning model.
These changes are carefully designed to exploit the model's weaknesses and lead to incorrect predictions or biased results.
Most existing ML classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data that has been modified very slightly in a way that is intended to cause an ML classifier to misclassify it.
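One complementary line of work, listed earlier as detecting adversarial examples, tries to flag suspicious inputs before they reach the classifier. The sketch below is a heavily simplified heuristic inspired by the published "feature squeezing" idea (comparing predictions on the raw input and on a bit-depth-reduced copy); it is not a method described in this text, and the model, squeezing depth, and threshold are illustrative assumptions.

```python
# Simplified detection heuristic inspired by feature squeezing: compare the
# model's output on the raw input with its output on a bit-depth-reduced copy;
# a large disagreement flags the input as suspicious.
import torch
import torch.nn as nn

def reduce_bit_depth(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Quantize pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def looks_adversarial(model: nn.Module, x: torch.Tensor, threshold: float = 0.5) -> bool:
    """Flag x if predictions on the raw and squeezed inputs diverge too much."""
    with torch.no_grad():
        p_raw = torch.softmax(model(x), dim=1)
        p_squeezed = torch.softmax(model(reduce_bit_depth(x)), dim=1)
    # L1 distance between the two probability vectors; the threshold is arbitrary here
    return (p_raw - p_squeezed).abs().sum().item() > threshold

# Hypothetical usage with a toy stand-in classifier:
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
print(looks_adversarial(model, x))
```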
- Techniques of Adversarial Attacks
Adversarial attacks intentionally manipulate a model's inputs in order to subvert its decision-making process and cause misclassifications or faulty outputs.
Adversarial attacks work by:
- Analyzing the parameters of a machine learning model, such as a neural network
- Calculating changes to an input that cause a misclassification
- Making small, imperceptible changes to input data
- Taking advantage of the model's vulnerabilities to cause misclassifications or faulty outputs
- Adversarial attacks can occur during the model's inference stage. For example, an attacker can craft an adversarial example against a DNN-based image classifier and then feed it to that classifier at prediction time.
- Models can learn to better detect and resist adversarial attacks by incorporating adversarial examples during the training process.
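As a rough illustration of the last bullet, the sketch below mixes FGSM-perturbed copies of each training batch into the loss, which is the basic shape of adversarial training. PyTorch, the toy model, the data, the optimizer settings, and the equal weighting of the two loss terms are all assumptions made for the example.

```python
# Minimal adversarial-training sketch: each batch is augmented with
# FGSM-perturbed copies so the model also learns from adversarial inputs.
import torch
import torch.nn as nn

def fgsm(model, x, y, epsilon=8 / 255):
    """One-step FGSM perturbation of a batch (same idea as the earlier sketch)."""
    x_adv = x.clone().detach().requires_grad_(True)
    nn.functional.cross_entropy(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    """Train on the clean batch and on its adversarial counterpart."""
    x_adv = fgsm(model, x, y)               # craft adversarial copies of the batch
    optimizer.zero_grad()
    loss_clean = nn.functional.cross_entropy(model(x), y)
    loss_adv = nn.functional.cross_entropy(model(x_adv), y)
    loss = 0.5 * (loss_clean + loss_adv)    # equal weighting is an arbitrary choice here
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with toy data:
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
print(adversarial_training_step(model, optimizer, x, y))
```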
One example of an adversarial attack is placing stickers on a stop sign, fooling a car's ML system into reading it as a 45-mph speed limit sign.
The Carlini & Wagner (C&W) attack generates adversarial examples by searching, through optimization, for the smallest perturbation to the input data that causes the target model to misclassify it.
The C&W attack has been shown to defeat state-of-the-art defenses, such as defensive distillation and adversarial training.
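The sketch below captures the core idea of a C&W-style L2 attack: treat the perturbation itself as an optimization variable and minimize its L2 size plus a margin term that pushes the true class below the best competing class. It deliberately simplifies the published attack (for example, it omits the tanh change of variables and the binary search over the constant c), and the model, data, and hyperparameters are illustrative assumptions.

```python
# Simplified C&W-style L2 attack sketch: optimize a perturbation delta that is
# as small as possible (L2 term) while driving the true-class logit below the
# best competing logit (margin term).
import torch
import torch.nn as nn

def cw_l2_sketch(model, x, y, c=1.0, steps=100, lr=0.01):
    delta = torch.zeros_like(x, requires_grad=True)           # perturbation to optimize
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model((x + delta).clamp(0, 1))
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        other_logit = logits.scatter(1, y.unsqueeze(1), float("-inf")).max(1).values
        margin = (true_logit - other_logit).clamp(min=0)       # > 0 while still classified correctly
        loss = (delta ** 2).sum() + c * margin.sum()           # small perturbation + misclassification pressure
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (x + delta).clamp(0, 1).detach()

# Hypothetical usage with a toy stand-in classifier:
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(2, 3, 32, 32), torch.tensor([1, 7])
x_adv = cw_l2_sketch(model, x, y)
print(((x_adv - x) ** 2).sum().sqrt().item())  # L2 size of the final perturbation
```

The constant c trades off perturbation size against attack success; in the published attack it is tuned per input, while here it is fixed for simplicity.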
[More to come ...]