Why We Built Sentinel: Prompt Injection Defence for AI Agents That Actually Process Documents

Update — March 2026

The LangChain, CrewAI, Haystack, and AutoGen adapters mentioned as future work in this post are now live in v0.9.0. See the Framework Adapters docs →

It started with our own agents

One of us was running three AI agents on OpenClaw. Around the clock. They read emails, scanned documents, pulled data from Google Sheets, browsed the web for research. Friends started doing the same thing. Pretty quickly there were a dozen agents between us, all processing external content as part of real work.

Then we started seeing prompt injection attempts in the wild. Hidden instructions buried in web pages. Encoded payloads in scraped content. One attack split across multiple tool calls so no single request looked suspicious on its own.

The moment you want to use agents for actual business, this becomes a serious problem. An agent that can send emails can be tricked into sending emails. One that can read your files can be tricked into exfiltrating them. Every capability is also an attack surface.

We went looking for something to fix it. What we found was not good enough.

The tools that exist miss the actual threat

Every prompt injection tool on the market does the same thing. You send it a prompt, it tells you if it looks dodgy. That is fine as far as it goes. But it does not go very far.

Your agent opens an Excel file with hidden instructions in a cell comment. It reads a Word doc with white-on-white text containing commands. It parses an email with encoded payloads in the HTML. None of these tools ever see any of that. They only check what the agent sends to the model, not what the agent reads beforehand.

Lakera is the market leader. VC-backed, enterprise pricing. It checks prompts against their cloud API. Three problems: it never sees the documents your agent processes, your data has to leave your infrastructure for every check, and once you pass the free tier you are talking to a sales team.

NVIDIA built NeMo Guardrails. It is a conversational flow framework with prompt injection as a side feature. It needs an LLM call for classification, so you are paying for an extra model inference on every single request just to check if the input is safe.

Then there are the open source binary classifiers. LLM Guard, Rebuff, a few others. They answer one question: is this text injection, yes or no. No document scanning, no format support, no session monitoring. Rebuff hasn't had a meaningful commit since 2023.

All of them protect the front door. None of them check the windows.

What Sentinel actually does

Sentinel scans 20+ content formats. Excel, Word, PowerPoint, PDF, CSV, emails, calendar invites, image metadata, Google Docs, Google Sheets, Google Slides, OneDrive. Each scanner knows how to extract hidden content from its format. Cell comments, speaker notes, invisible HTML, document metadata. Everything gets run through the detection engine.

The detection engine is pattern-based, not classifier-based. 60+ detection patterns, no LLM call needed, no external API, zero dependencies for core scanning. It catches hidden HTML elements, encoding bypass attempts, instruction pattern matching, obfuscated exec commands. It strips suspicious content and rescans what remains, catching attacks that use invisible characters to split injection phrases across what looks like normal text.

Then there is shard detection. Most tools look at individual requests. But attackers can split instructions across multiple tool calls. Fragment A arrives in one document, fragment B in another, and they combine in the context window. Sentinel watches the full session and correlates patterns across calls in real time. We found a 29-call blind spot in our own systems before we built this.

On top of all that, there is a policy engine for tool calls. Domain allowlists, parameter filtering, append-only request logging. If an injection gets through the content scanner and tries to make your agent call a tool it shouldn't, the policy engine catches it.

Why we built on OpenClaw first

We didn't build a product and go looking for users. We were the users. OpenClaw agents read emails, browse the web, process spreadsheets, handle calendar invites. A growing group of us had agents running on real workloads, for real businesses.

Building there first meant testing against real attacks on real data. The 800+ tests in the suite are not synthetic benchmarks. They come from actual attack patterns we observed and techniques documented in security research.

The antivirus comparison

You wouldn't run a computer without antivirus. You shouldn't run an AI agent without prompt injection defence.

Like antivirus, it needs to be always on, not a one-time scan. It needs to update continuously as new attack vectors appear. It needs to run locally so your data stays on your infrastructure. And it needs to be affordable. We charge five pounds a month (ten dollars outside the UK). Security shouldn't be an enterprise-only thing.

The alternative is building it yourself, which takes months and needs constant maintenance. Or sending every prompt to an enterprise API, which is expensive and still doesn't scan documents.

What comes next

Sentinel is live and protecting agents in production now. We are working on framework adapters for LangChain, CrewAI and AutoGen so it drops into any agent stack. CI/CD integration to scan prompt templates in pull requests. PII detection and redaction.

The bit that gets interesting: we are building AI agents that defend other AI agents. Scout and Analyst agents that research new attack vectors and generate detection rules automatically. A security system that learns and improves on its own instead of waiting for someone to write a new regex. The technology defending itself with itself.

Try Sentinel

Get Sentinel Standard for continuous protection at five pounds a month.

Assess Your Exposure Get Sentinel Standard