Cloudflare has introduced Firewall for AI, an AI-powered security solution designed to protect large language models from misuse and attacks. The purpose of Firewall for AI is to provide a layer of protection against problematic requests entering the models, preventing their abuse. It serves as an advanced web application firewall (WAF) specifically designed for large language models, offering vulnerability detection and visual tools for managing the security of AI applications.
As more businesses adopt large language models, the integration of these models with applications and the internet introduces new vulnerabilities that can be exploited by malicious attackers. Cloudflare highlights that beyond the traditional web and API application security threats such as injection and data leaks, large language models face uncertainties in their operations and need protection against issues like model hijacking and unauthorized execution.
Firewall for AI by Cloudflare provides protection against three types of model attacks: Prompt Injection, Model Denial of Service, and Sensitive Information Disclosure. By offering these protective measures, it aims to reduce the risk of abuse.
The deployment of Firewall for AI follows the same principles as traditional web application firewalls. Each API request containing prompts for large language models is scanned by the firewall to detect potential attack patterns and characteristics. Firewall for AI can be deployed either on the Cloudflare Workers AI platform or in front of models hosted on third-party infrastructures. It can be used in conjunction with Cloudflare AI Gateway, allowing control and configuration of Firewall for AI through the web application firewall control plane.
Similar to volumetric attacks faced by traditional applications, Model Denial of Service involves consuming a significant amount of resources, leading to degraded service quality or increased operational costs. Firewall for AI mitigates this risk by implementing rate limiting policies to control the rate of dialogue requests, thereby reducing the risk of denial of service.
In terms of sensitive information detection, Firewall for AI enables users to employ Sensitive Data Detection (SDD) web application firewall management rules to identify personally identifiable information (PII) returned by the models in their responses. Users can review the matching status of SDD in web application firewall security events.
Currently, SDD provides a set of management rules that scan for financial information like credit card numbers and sensitive data such as API keys. Cloudflare allows users to create custom rules. Additionally, Cloudflare plans to extend SDD to prevent users from sharing PII or other sensitive information with external large language model providers like OpenAI or Anthropic.
Model abuse encompasses various categories, including Prompt Injection, where attackers manipulate language models through specially crafted inputs. It also includes submitting requests that result in delusional, inaccurate, offensive responses, or requests deviating from the intended topic. Cloudflare emphasizes the importance of preventing AI applications from generating toxic or offensive responses.
Firewall for AI conducts a series of detections for prompt injection and various abuses. The firewall automatically identifies prompts embedded in HTTP requests or JSON bodies. Once enabled, the system analyzes each prompt and assigns a score based on the likelihood of malicious intent, along with bot scores or attack scores. Users can create rules to block requests that meet specific scoring criteria, effectively protecting large language applications.
To utilize Firewall for AI, enterprise users need to subscribe to the Application Security Advanced plan. The advanced rate limiting and SDD features are currently available, while the prompt validation feature for Firewall for AI is still under development and expected to be officially launched in a few months.