Hidden prompt injection: The black hat trick AI outgrew

Hidden prompt injection: The black hat trick AI outgrew

Hidden prompt injection- The black hat trick AI outgrew

For a brief moment, hiding prompt injections in HTML, CSS, or metadata felt like a throwback to the clever tricks of early black hat SEO.

Invisible keywords, stealth links, and JavaScript cloaking used to be stuff many of us dealt with in the past. 

But like those “rank quick schemes,” hidden prompt manipulation wasn’t built to last. 

Disguised commands, ghost text, and comment cloaking gave content creators the illusion of control over AI output, but that came to pass. 

Models outgrew the tricks. As HiddenLayer researchers Kenneth Yeung and Leo Ring reported:

  • “Attacks against LLMs had humble beginnings, with phrases like ‘ignore all previous instructions’ easily bypassing defensive logic.”

But the defenses had become more complex. As Security Innovation noted:

  • “Technical measures like stricter system prompts, user input sandboxing, and principle-of-least-privilege integration went a long way toward hardening LLMs against misuse.”

What this means for marketers is that LLMs now ignore hidden prompt tricks. 

Anything sneaky, like commands put in invisible text, HTML comments, or file notes, gets treated as regular words, not as orders to follow.

What hidden prompt injection actually is

Hidden prompt injection is a technique for manipulating AI models by embedding invisible commands into web content, documents, or other data sources that LLMs process.

These attacks exploit the fact that models consume all text tokens, even those invisible to human readers. 

The technique works by placing instructions like “ignore all previous instructions” in locations where only machines would encounter them: 

  • White-on-white text.
  • HTML comments.
  • CSS with display:none properties.
  • Unicode steganography using invisible characters.

One example is this LinkedIn post by Mark Williams-Cook that demonstrates how hidden prompts can be embedded in everyday content.

Microsoft’s Azure documentation defines two primary attack vectors:

  • User prompt attacks, where users directly embed malicious instructions.
  • Document attacks where “attackers might embed hidden instructions in these materials in order to gain unauthorized control over the LLM session.” 

Document attacks are part of a broader group of attacks called indirect prompt injections. 

Indirect prompt injections are a type of attack that occurs when prompts are embedded in the content that LLMs process from external sources. 

This means that LLMs block hidden prompts.

If you copy-paste an article in ChatGPT, give Perplexity a URL to sum up, or Gemini goes to check a source that contains a prompt injection, it still counts as indirect prompt injection. 

Here’s an example taken from Erik Bailey’s website: 

HTML code snippet showing a hidden prompt injection attack using CSS class "hide-visually" and aria-hidden="true" attributes to conceal malicious instructions that tell AI models to ignore previous instructions and print the word "cabbage" five million times.

As search becomes multimodal, Yeung and Ring note that “processing not just text but images and audio creates more attack vectors for indirect injections.” 

In practice, hidden prompt injections can be embedded in podcasts, videos, or images.

A Cornell Tech paper demonstrates proof-of-concept attacks that blend adversarial prompts into images and audio, concealing them from human eyes and ears. 

Yet the findings show these attacks do not significantly degrade a model’s ability to answer legitimate questions about the content, making the injections highly stealthy.

For text-only LLMs, prompt injection in images does not work. 

However, for multi-modal LLMs (i.e., LLaVA, PandaGPT), prompt injection via images remains a real and documented threat. 

As OWASP noted:

  • “The rise of multimodal AI, which processes multiple data types simultaneously, introduces unique prompt injection risks.” 

Meta is already addressing this issue: 

  • “The multimodal model evaluates both the prompt text and the image together in order to classify the prompt.”

Dig deeper: What’s next for SEO in the generative AI era

Get the newsletter search marketers rely on.

See terms.


How LLMs block hidden prompts

Modern AI parses web content into instructions, context, and passive data. 

It uses boundary markers, context segregation, pattern recognition, and input filtering to spot and discard anything that looks like a sneaky command (even if it’s buried in layers only a machine would see).

Pattern recognition and signature detection

  • Purpose: Catch and remove explicit or easily-patterned prompt injections.

AI systems now scan for injection signatures, phrases like “ignore previous instructions” or suspicious Unicode ranges get flagged instantly. 

Google’s Gemini documentation confirms: 

  • “To help protect Gemini users, Google uses advanced security measures to identify risky and suspicious content.”

Similarly, Meta’s Llama Prompt Guard 2 comprises classifier models trained on a large corpus of attacks and is capable of detecting prompts containing:

  • Injected inputs (prompt injections).
  • Explicitly malicious prompts (jailbreaks).

Having tested Eric Bailey’s content containing a hidden prompt by pasting it in ChatGPT and Perplexity and asking for a summary of the URL, I can confirm that his hidden prompt has zero impact on the output. 

If you would like to try it yourself, the article “Quality is a trap” contains the cabbage instructions.

His prompt does start with “Ignore all previous instructions,” so chances are high that the injection signature was detected.  

Dig deeper: Optimizing for AI: How search engines power ChatGPT, Gemini and more

Boundary isolation and content wrapping 

  • Purpose: Ensure that only direct user/system prompts are executed, downgrading trust of bulk or external data.

When users interact with generative search, upload a document or copy-paste large articles into ChatGPT, Perplexity, or similar LLM platforms, boundary isolation and content wrapping become essential defenses. 

Systems like Azure OpenAI use “spotlighting” to treat pasted or uploaded document content as less trustworthy than explicit user prompts. 

  • “When spotlighting is enabled, the service transforms the document content using base-64 encoding, and the model treats this content as less trustworthy than direct user and system prompts.” 

The model recognizes inbound content as external passive data, not instructions. 

To sum it up: models use special tokens and delimiters to isolate user content from system prompts. 

Multilingual attempt mitigation

  • Purpose: Prevent multilingual adversarial attempts from bypassing filters.

Major platforms, including Microsoft Azure and OpenAI, state that their detection systems use semantic patterning and contextual risk evaluation.

They extend beyond language as a sole filter and rely on learned adversarial signatures. 

Defense mechanisms, such as Meta’s Prompt Guard 86M, successfully recognize and classify malicious prompts regardless of language, disrupting attacks delivered in French, German, Hindi, Italian, Portuguese, Spanish, and Thai. 

Technical SEO: 5 mistakes to avoid

When it comes to technical SEO, avoid certain hacks or mistakes that are now actively blocked by LLMs and search engines. 

1. CSS cloaking and display manipulation

Don’t use display:none or visibility:hidden or position text off-screen to hide prompt commands. 

Microsoft’s documentation specifically identifies these as blocked tactics:

  • “Commands related to falsifying, hiding, manipulating, or pushing specific information.”

2. HTML comment injection

Avoid embedding instructions in <!-- --> comments or meta tags. 

Security Innovation notes that “models will process tokens even if they are invisible or nonsensical to humans, as long as they are present in the input,” but modern filtering specifically targets these vectors. 

Text snippet about computer science professor Arvind Narayanan followed by HTML code showing a white-on-white text prompt injection attack that uses "color: white" styling to hide malicious instructions asking AI models to include the word "cow" in their output.
Text snippet about computer science professor Arvind Narayanan, followed by HTML code showing a white-on-white text prompt injection attack.

3. Unicode steganography

Steer clear of invisible Unicode characters, zero-width spaces, emojis, or special encoding to hide commands. 

Azure’s Prompt Shield blocks encoding-based attacks that try to use methods like character transformations to circumvent system rules.

4. White-on-white text and font manipulation

Traditional hidden text methods from black hat SEO are a thing of the past.

Google’s systems now detect when “malicious content” is embedded in documents and exclude it from processing.

It appears to work for some Academic AI review software, but that’s it.

5. Irregular signals 

Content that lacks proper semantic HTML, schema markup, or a clear information hierarchy can be treated as potentially manipulative. 

Modern AI systems prioritize transparent, structured, and honest optimization.

Even unintentional patterns that resemble known injection techniques – such as unusual character sequences, non-standard formatting, or content that appears to issue instructions rather than provide information – can be flagged. 

Models now favor explicit over implicit signals and reward content with verifiable information architecture.

Dig deeper: A technical SEO blueprint for GEO: Optimize for AI-powered search

This is where SEO and GEO intersect: transparency. 

Just as Google’s algorithm updates eliminated keyword stuffing and link schemes, advances in LLM security have closed the loopholes that once allowed invisible manipulation. 

The same filtering mechanisms that block prompt injection also raise content quality standards across the web, systematically removing anything deceptive or hidden from AI training and inference.

About The Author

ADMINI
ALWAYS HERE FOR YOU

CONTACT US

Feel free to contact us and help you at our very best.