
LLM application VAPT for real-world AI risk
Validating AI guardrails against real-world adversarial input

An LLM application penetration test is a manual security assessment that tests how applications built on large language models (LLMs) fail under real attack conditions. It focuses on risks unique to LLM-enabled systems, including prompt injection, insecure output handling, and excessive agency. The core risk is that untrusted inputs can cross trust boundaries through prompts, retrieval pipelines, plugins, memory, or agent workflows, then trigger unsafe actions in connected systems. This differs from a standard web application penetration test, which focuses on HTTP-layer flaws and does not address the LLM-specific attack surface. Swarmnetics conducts LLM application VAPT through Offensive Security Certified Professional (OSCP) and CREST Registered Penetration Tester (CRT) certified consultants, aligned to the Open Web Application Security Project (OWASP) Top 10 for LLM Applications.

When LLM applications become breach pathways
Don’t let prompts become attack paths

In September 2025, security researchers at Noma Security disclosed ForcedLeak — a critical indirect prompt injection vulnerability in Salesforce’s Agentforce platform rated CVSS 9.4. By placing malicious instructions inside a routine web-to-lead contact form, an external attacker could cause the AI agent to exfiltrate sensitive CRM data — including customer records, sales pipeline details, and internal communications — without any user interaction. An LLM application VAPT would have identified the indirect prompt injection vulnerability and absent LLM trust boundary controls before attackers could exploit them.
For organisations deploying LLM-enabled systems, the issue is not only model misuse but whether retrieved content, tool outputs, and downstream consumers can be manipulated to bypass intended controls. Generative AI deployments must have demonstrable safeguards over personal data processed by AI systems. Regular testing provides the documented evidence of control effectiveness that regulators and auditors expect.

Testing the trust boundaries attackers target
Because guardrails only matter if they hold under pressure

Swarmnetics assesses the LLM application against the OWASP Top 10 for LLM Applications. For most organisations, a LLM application VAPT is the clearest way to validate how those controls hold up under abuse. The framework is designed for machine learning systems and covers attack vectors that standard web application testing methodologies do not. That includes system prompt handling, retrieval-augmented generation flows, tool calling, memory and state handling, agent permissions, and the way model outputs are consumed by downstream applications.
In a black-box engagement, consultants test exposed interfaces as an unauthenticated external attacker. They look for exploitable vulnerabilities in prompt handling, output flows, and plugin behaviour without prior knowledge of the system. In a grey-box engagement, which is often needed for meaningful coverage of internal prompts, agent logic, retrieval pipelines, and tool permissions, we use test credentials and supporting documentation to assess authenticated attack vectors, access controls, and data protection mechanisms that are not visible from the outside. Both approaches use Burp Suite Professional, custom LLM testing scripts, and specialised tooling to evaluate prompt behaviour, output handling, orchestration logic, and trust-boundary enforcement. Consultants also craft prompts to exploit identified vulnerabilities and verify real-world impact.
Yes, we are CREST accredited
Our core team is based in Singapore and consists of CREST certified penetration testers who are also Offensive Security Certified Professional (OSCP) certified. The team has delivered numerous penetration testing projects for customers in Singapore and other locations, from large multinational enterprises to small and medium business, and across various industries.

Inside the LLM application attack surface
What gets tested

A LLM application VAPT covers the following scope, drawn from the OWASP Top 10 for LLM Applications:
- Direct and indirect prompt injection via user inputs and external content sources
- Insecure plugin configurations and insufficient access controls on APIs and model assets
- Training data, retrieval corpus, and supply chain poisoning that could influence model behaviour or downstream decisions
- Model theft via systematic querying and output analysis
- Output handling failures leading to cross-site scripting or downstream code execution
- Sensitive data exposure via LLM responses leaking personal or system information
- Denial-of-service via LLM resource exhaustion and rate-limit bypass
- Conventional vulnerabilities, including SQL injection and code execution, when LLM output is passed into downstream system calls without adequate validation or control
- Excessive agency and unauthorised actions from weak access controls
- Emerging threats from agentic LLM deployments, including cross-agent privilege escalation


