Software security has come a long way in the last couple of decades. It is hard to believe now, but there was a time when penetration testing was done only at the host/network layer and security teams were largely unaware of application-level attacks like SQL injection and Cross-Site Scripting (XSS). As a result, attackers started bypassing traditional defenses such as network firewalls by attacking the application layer, and controls like Web Application Firewalls (WAFs), AppSec reviews, and secure coding practices appeared in response. Today software security has evolved to cover technologies like serverless, API controls, and pipeline security via the DevSecOps movement.

What is the point of this history lesson? To recap that today's software sits on top of the technology lessons of the past, and that we are in danger of repeating the same mistakes by leaving artificial intelligence (AI) based systems out of security reviews and penetration testing.

The emerging threat

Hackers are a notoriously clever bunch, as security teams know. As soon as one layer of security is enforced in an application, they move on to targeting newer, emerging technologies that do not have the same level of maturity. With this in mind, the artificial intelligence based systems of today are the web applications of the past: a completely new attack surface that cyber-security teams are largely unaware of.

For companies implementing AI-based systems, cyber-security teams typically conduct penetration tests of the software stack right up to the application layer but miss one critical point:

An Artificial Intelligence (AI) based system's most critical feature is its ability to make decisions based on the data provided to it. If that data or decision-making capability is compromised or stolen, then the entire AI ecosystem is compromised.

Unfortunately, most cyber-security teams today are not aware of the new types of attacks that AI systems introduce. In these attacks, the attacker manipulates the unique characteristics of how AI systems work to further their malicious intentions. Many commercial models have already been manipulated or tricked, and this type of attack is only set to increase with the massive adoption of AI going forward.

Unique types of Artificial Intelligence attacks

The new threat surface that AI introduces is quite diverse. Data can be “poisoned”, either intentionally or unintentionally, leading to manipulated decisions. Similarly, a model's logic or training data can be “inferred”, leading to data extraction, and a model can be “evaded” once attackers figure out its underlying decision-making logic.


If this sounds a bit too vague, the table below goes into more detail on these attacks (a toy code sketch of one of them follows the table):

Data Poisoning: The attacker poisons the training data used to train the machine learning model. By contaminating this data source, the attacker creates a “backdoor”: they know the model has been trained on faulty data and know how to take advantage of it. This can facilitate further attacks, such as the model evasion described below.

Model Poisoning: Like the previous attack, but this time the attacker targets the model instead of the data. A pre-trained model is compromised and injected with backdoors which the attacker can use to bypass its decision-making process. Most companies do not build models from scratch but use commonly available pre-trained models such as ResNet from Microsoft or CLIP from OpenAI. These models are stored in a “model zoo”, which is a common way for open-source frameworks and companies to organize their machine learning and deep learning models. This is similar to a software supply chain attack, in which an attacker can poison the well for many users.

Data Extraction: The attacker queries the model to understand what training data was used in its learning. This can result in the compromise of sensitive data, as the attacker can infer the data used in the model's training, and it is especially dangerous if sensitive data was involved. This type of attack, also called “membership inference”, does not require access to the model's internals and can be done just by observing the model's outputs.

Model Extraction: The attacker creates an offline copy of the model by repeatedly querying it and observing its behavior. The fact that most models expose their APIs publicly and do not properly sanitize their outputs facilitates these attacks. This technique allows the attacker to deeply analyze the offline copy and work out how to bypass the production model.

Model Evasion: The attacker tricks the model by providing a specific input which results in an incorrect decision being made. This is usually accomplished by observing the model in action and understanding how to bypass it. For example, an attacker can attempt to trick AI-based anti-malware systems into not detecting their samples, or bypass biometric verification systems.
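
To make the extraction side of this concrete, below is a toy sketch in Python of how model extraction works. Everything in it is illustrative: the “victim” is a local scikit-learn model standing in for a remote prediction API, and the query budget and model choices are arbitrary, so treat it as a sketch of the idea rather than a real attack recipe.

```python
# Toy model-extraction sketch: the "victim" below is a local scikit-learn model
# standing in for a remote prediction API; nothing here targets a real service.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Train the "victim" model that the attacker can only query, not inspect.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
victim = RandomForestClassifier(random_state=0).fit(X, y)

def query_victim(inputs: np.ndarray) -> np.ndarray:
    """Stand-in for calling a public prediction API: only labels come back."""
    return victim.predict(inputs)

# The attacker sends synthetic queries and records the responses...
rng = np.random.default_rng(0)
stolen_X = rng.normal(size=(5000, 10))
stolen_y = query_victim(stolen_X)

# ...then trains an offline surrogate on the query/response pairs.
surrogate = LogisticRegression(max_iter=1000).fit(stolen_X, stolen_y)

# Agreement between surrogate and victim shows how much behaviour was copied.
probe = rng.normal(size=(1000, 10))
agreement = (surrogate.predict(probe) == query_victim(probe)).mean()
print(f"Surrogate agrees with the victim on {agreement:.0%} of probe inputs")
```

The takeaway is that prediction outputs alone, queried at scale, can be enough to rebuild a usable copy of a model's behaviour.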

Make no mistake: as AI adoption increases, the above attacks will become as common as SQL injection is today and will be referred to in the same breath. One of the best defenses against these attacks is early detection, by incorporating them into current penetration testing practices.

Pen testing Artificial Intelligence applications

The good news is that you don't have to start from scratch. If you have ever been involved in penetration testing or red teaming as part of a cyber-security team, then you might be familiar with the MITRE ATT&CK framework, a publicly accessible knowledge base of adversary tactics and techniques based on real-world observations.

It is used globally by public and private sector organizations in their threat models and risk assessments. Anyone can access it and understand the tactics and techniques attackers will use to target a particular system, which makes it very useful for people involved in penetration testing or red teaming.

This popular framework was used as a model to create MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems), which is described as 

“a knowledge base of adversary tactics, techniques, and case studies for machine learning (ML) systems based on real-world observations, demonstrations from ML red teams and security groups, and the state of the possible from academic research.”

ATLAS follows the same structure as ATT&CK, so it is very easy for cyber-security practitioners to study and adopt its techniques when they want to test their internal AI systems for vulnerabilities and security risks. It also helps raise awareness of these risks within the cyber-security community, as they are presented in a format practitioners are already familiar with.

AI Penetration testing tools

Standard security tools usually do not have AI-specific testing techniques built in that can assess a model's vulnerability to risks like model inference or evasion. Thankfully, there are free tools available that cyber-security teams can use to supplement their existing penetration testing toolkits. These tools are open source, but you can look for commercial alternatives as well.

Whichever type you prefer, make sure the tools have the following features:

  1. Model agnostic: It should be able to test all types of models and not be restricted to any specific one. 
  2. Technology agnostic: It should be able to test AI models hosted on any platform, whether on-cloud or on-prem. 
  3. Integrates with your existing toolkits: It should have command-line capabilities so that scripting and automation are easy for your security teams (see the sketch after this list). 
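
To make point 3 concrete, here is a minimal sketch of the kind of scriptable, black-box check such a tool enables. It treats the model as an opaque prediction endpoint, which also covers points 1 and 2; the URL, JSON payload shape, response field, and failure threshold are assumptions made up for illustration, not a real API.

```python
# Minimal, scriptable robustness check that treats the model as a black box.
# The endpoint URL, payload shape, response field, and threshold are illustrative assumptions.
import sys

import numpy as np
import requests

SCORING_URL = "https://example.internal/model/predict"  # hypothetical endpoint

def predict(batch: np.ndarray) -> np.ndarray:
    """Call the hosted model; only predictions come back, nothing platform-specific."""
    response = requests.post(SCORING_URL, json={"instances": batch.tolist()}, timeout=30)
    response.raise_for_status()
    return np.array(response.json()["predictions"])  # assumed response field

def flip_rate(samples: np.ndarray, noise_scale: float = 0.05) -> float:
    """Fraction of predictions that change under small random perturbations."""
    clean = predict(samples)
    noisy = predict(samples + np.random.normal(scale=noise_scale, size=samples.shape))
    return float((clean != noisy).mean())

if __name__ == "__main__":
    samples = np.load(sys.argv[1])        # input batch supplied by your pipeline
    rate = flip_rate(samples)
    print(f"Decision flip rate under noise: {rate:.1%}")
    sys.exit(1 if rate > 0.10 else 0)     # fail the job above an assumed threshold
```

Because the script only needs a predict call and an exit code, it can slot into whatever CI/CD or scheduled-scan tooling your team already uses.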

Some of the free tools you can find are:

  • Counterfit by Microsoft: Described by Microsoft as “an automation tool for security testing AI systems as an open-source project. Counterfit helps organizations conduct AI security risk assessments to ensure that the algorithms used in their businesses are robust, reliable, and trustworthy”. Counterfit provides a great way to automate and test attacks against AI systems and can be used in red teams and penetration tests. It contains preloaded AI attack patterns which security professionals can run from the command line via scripts, and it can integrate with existing toolkits. 
  • Adversarial Robustness Toolbox (ART): Described as “a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks”. A short usage sketch of ART follows this list. 
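As a rough illustration of how ART fits into a test script, the sketch below trains a small scikit-learn classifier as a stand-in target and runs ART's black-box HopSkipJump evasion attack against it. The dataset, model, and attack parameters are placeholder choices, and the class and argument names reflect recent ART releases, so verify them against the version you install.

```python
# Minimal ART sketch: craft evasion inputs against a stand-in scikit-learn model.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import HopSkipJump

# Stand-in target: in a real assessment this would be the model under test.
X, y = load_iris(return_X_y=True)
model = SVC(probability=True).fit(X, y)

# Wrap the model so ART can drive it; clip_values bound the feature ranges.
classifier = SklearnClassifier(model=model, clip_values=(X.min(), X.max()))

# Black-box evasion: only the model's predictions are used, as an outside attacker would.
attack = HopSkipJump(classifier=classifier, max_iter=10, max_eval=1000)
x_adv = attack.generate(x=X[:5])

original = classifier.predict(X[:5]).argmax(axis=1)
adversarial = classifier.predict(x_adv).argmax(axis=1)
print("Original predictions:   ", original)
print("Adversarial predictions:", adversarial)
```

In a real engagement you would point ART at the production model's prediction interface instead of a local stand-in and record how far the crafted inputs had to move before the decision flipped.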

For these tools to be effective, make sure you map their findings to the ATLAS framework so that you can align them with a common standard. You can use these tools both for red teaming / penetration testing and for conducting vulnerability assessments of AI systems. Use them to regularly run scans of your AI assets and build a tracker of AI-specific risks. By tracking these risks over time, you can demonstrate improvements in your security posture and monitor progress. 
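
As one possible shape for such a tracker, here is a small Python sketch that records findings against ATLAS techniques. The asset names, technique labels, and severity scale are illustrative placeholders, so confirm the exact technique names and IDs on the ATLAS site before using them in reports.

```python
# Sketch of a simple AI risk tracker keyed to ATLAS techniques.
# Asset names, technique labels, and severities are illustrative placeholders;
# confirm exact technique names and IDs on the ATLAS site before reporting.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AIRiskFinding:
    asset: str               # the model or pipeline under test
    atlas_technique: str     # e.g. "Craft Adversarial Data" (verify on the ATLAS site)
    tool: str                # Counterfit, ART, manual testing, etc.
    severity: str            # reuse your existing rating scale
    status: str = "open"
    found_on: date = field(default_factory=date.today)

tracker: list[AIRiskFinding] = [
    AIRiskFinding("fraud-scoring-model", "Craft Adversarial Data", "ART", "high"),
    AIRiskFinding("fraud-scoring-model", "Exfiltration via ML Inference API", "manual", "medium"),
]

# Re-run scans on a schedule, append new findings, and compare open counts over time.
open_by_asset: dict[str, int] = {}
for finding in tracker:
    if finding.status == "open":
        open_by_asset[finding.asset] = open_by_asset.get(finding.asset, 0) + 1
print(open_by_asset)
```

Keeping the ATLAS technique as a first-class field is what lets you compare findings across different tools and report against a common standard.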

Another valuable resource for better context and awareness of attacks is the ATLAS case studies page listed here. It is a listing of known attacks on production AI systems and can be used by security teams to better understand the impact on their own systems. 

I hope this gives you a starting point for adding AI penetration testing to your cyber-security assurance activities. Rest assured, this field is going to explode in popularity in the coming years due to the massive increase in AI adoption and the interest cyber-criminals are taking in misusing AI.

Good luck on your AI security journey! If you are interested in learning more, then check out my book on AI security here and my course on Udemy on the same topic. More articles on Artificial Intelligence can be found here.