How to Safeguard AI Systems from Malevolent Attacks?
From driver assistance to medical diagnostics and conversational chatbots, AI systems have become integral to many facets of our lives. However, their ubiquitous presence raises growing concerns about their security and reliability in the face of malicious attacks. The National Institute of Standards and Technology (NIST) and its collaborators recently shed light on this issue by identifying vulnerabilities and the tactics attackers use to manipulate the behavior of AI systems.
In their publication titled "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations" (NIST.AI.100-2), NIST researchers and their partners comprehensively describe different types of attacks that AI systems face, vulnerabilities in AI and machine learning (ML), and strategies to mitigate these threats. Their goal is to raise awareness among AI developers and users about potential risks and help them formulate effective defenses.
Attacks on AI systems often exploit the inherent vulnerability of these systems: their reliance on data. AIs are trained on vast datasets, and any alteration or introduction of corrupted data during training or after deployment can compromise their functionality. The report provides an example of chatbots that could learn to respond with offensive or racist language when their safeguards are circumvented by carefully crafted malicious prompts.
Apostol Vassilev, a computer scientist at NIST and one of the authors of the publication, comments:
"We provide an overview of attack techniques and methodologies that encompass all types of AI systems. We also describe current mitigation strategies reported in the literature, but these available defenses currently lack strong assurances that they fully mitigate risks. We encourage the community to find better defenses."
He adds:
"Most software developers need more people to use their product so it can improve with exposure. But there is no guarantee that the exposure will be good. A chatbot may spit out incorrect or toxic information when prompted with carefully designed language."
The researchers identify four main types of attacks against AI systems:
- Evasion Attacks: These attacks occur after an AI system is deployed and attempt to alter its response by modifying inputs. For example, misleading road markings could induce an autonomous vehicle to make a wrong navigation decision.
- Poisoning Attacks: These attacks occur during the training phase, introducing corrupted data to influence the AI model. For instance, examples of inappropriate language can be injected into a chatbot's training data, degrading its ability to interact appropriately with users (a minimal illustrative sketch follows this list).
- Privacy Attacks: These attacks aim to obtain sensitive information about the AI or the data on which it was trained, often with the goal of compromising its security or leveraging it for malicious purposes.
- Abuse Attacks: These attacks involve inserting incorrect information into a legitimate but compromised source, such as a webpage or online document, which the AI later absorbs, with the goal of repurposing the system away from its intended use.
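To make the poisoning scenario concrete, here is a minimal sketch (not taken from the NIST report) of a label-flipping poisoning attack on a toy scikit-learn classifier: an attacker who controls only a small number of training samples flips their labels, and the model's accuracy is compared before and after. The dataset, model, and poisoning rate are illustrative assumptions, not details from the publication.

```python
# Minimal, hypothetical sketch of a label-flipping poisoning attack.
# Assumes scikit-learn and NumPy are available; all choices below are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Toy binary classification dataset standing in for real training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

def train_and_score(X_tr, y_tr):
    """Train a simple classifier and report its accuracy on the held-out test set."""
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_test, model.predict(X_test))

# Clean baseline: model trained on unmodified data.
print(f"clean accuracy:    {train_and_score(X_train, y_train):.3f}")

# "Poison" a handful of training samples by flipping their labels,
# mimicking an attacker who controls only a small slice of the training set.
n_poison = 50  # a few dozen samples out of 1,000 training points
idx = rng.choice(len(y_train), size=n_poison, replace=False)
y_poisoned = y_train.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]

# Retrain on the tampered data and compare.
print(f"poisoned accuracy: {train_and_score(X_train, y_poisoned):.3f}")
```

Running the script prints the test accuracy of the clean and poisoned models side by side; the gap between the two numbers gives a rough sense of how much damage even a small amount of corrupted training data can do in this toy setting.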
According to Alina Oprea, a professor at Northeastern University and co-author of the report:
"Most of these attacks are quite easy to mount and require minimal knowledge of the AI system and limited adversarial capabilities. Poisoning attacks, for example, can be mounted by controlling a few dozen training samples, which would represent a very small percentage of the entire training set."
The authors then break down each of these attack classes into subcategories and provide approaches to mitigate them, while acknowledging that defenses designed so far by AI experts to counter these adversarial attacks are at best incomplete.
According to Apostol Vassilev, it is crucial for developers and organizations seeking to deploy and use AI technology to be aware of these limitations.
He states:
"Despite significant progress made by AI and machine learning, these technologies are vulnerable to attacks that can cause spectacular failures with disastrous consequences. There are theoretical issues related to securing AI algorithms that simply have not been resolved yet. If someone says otherwise, they're selling snake oil."