Content filter bypass via prompt injection
A customer-facing chatbot for a financial services firm used a system prompt to enforce content restrictions.
We bypassed the content filter entirely using indirect prompt injection embedded in user-supplied input, forcing the model to reveal its internal instructions and produce policy-violating outputs. The filter was subsequently redesigned with input sanitisation and output validation layers.
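The summary does not describe how the redesigned layers were built, so the following is only a minimal sketch of what an input sanitisation step and an output validation step around a model call might look like. Every name here (`INJECTION_PATTERNS`, `SYSTEM_PROMPT_CANARY`, `guarded_reply`, `call_model`) is hypothetical, and the two regular expressions stand in for a much broader, maintained rule set. Pattern matching of this kind is easy to evade on its own, which is why the sketch pairs it with a check on the model's output.

```python
import re
from typing import Callable

# Hypothetical patterns for illustration only; not the firm's actual rules.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"(reveal|print|repeat|show).{0,40}(system prompt|instructions)", re.IGNORECASE),
]

# Hypothetical canary string assumed to be planted in the system prompt,
# so that any reply containing it indicates leaked instructions.
SYSTEM_PROMPT_CANARY = "CANARY-7f3a"

REFUSAL = "Sorry, I can't help with that request."


def looks_like_injection(user_input: str) -> bool:
    """Input layer: flag text that matches a known injection phrasing."""
    return any(p.search(user_input) for p in INJECTION_PATTERNS)


def output_is_safe(model_output: str) -> bool:
    """Output layer: reject replies that leak the system-prompt canary."""
    return SYSTEM_PROMPT_CANARY not in model_output


def guarded_reply(user_input: str, call_model: Callable[[str], str]) -> str:
    """Run the model only on input that passes the input check,
    and return its reply only if it passes the output check."""
    if looks_like_injection(user_input):
        return REFUSAL
    reply = call_model(user_input)
    if not output_is_safe(reply):
        return REFUSAL
    return reply
```

Here `call_model` is assumed to be any function that takes the user's text and returns the model's reply, so the two checks stay independent of the model provider. The output check is the more important of the two: it still applies when an injection is phrased in a way the input patterns miss.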