Noroboto and the PDF that lied twice

You have probably heard of, are considering, or have already set up a pipeline that ingests, triages and classifies documents on the basis of an LLM call, and you probably use that pipeline for important business. Maybe your system allocates certain non-material matters to juniors in your team; maybe it routes approvals depending on specific thresholds; maybe you have a quarterly dashboard that uses the underlying documents as the source of truth; maybe you are using AI for due diligence summaries; or maybe your legacy document management software provider has told you that its solution is now “powered by AI”.

In that endeavor, until now, your main reliability concern was probably “hallucinations”.

What this article addresses is much more concerning than “hallucinations”, which, most of the time, at least in my personal view and experience, are due to an undisciplined human operator. What we discuss here is the second episode of a series that shows how your AI use cases may be rendered useless unless you have set up appropriate defenses.

Last week, the LegalQuants RED TEAM revealed Noroboto, and showed how a malicious font definition embedded in a Word document could fool you by tampering with the information that you get from your AI (please see here: https://tritium.legal/blog/noroboto).

This week, we decided to take this school of thought a bit further up the trust chain, by experimenting with the document that most people use as the final and auditable version in most corporate processes: the PDF.

This article opens with (1) a short “PDF anatomy” to unpack what this format actually entails. It then addresses (2) the attack mechanism, (3) the empirical observations, and (4) the updates we have made to Noroboto itself.

1. Anatomy of a PDF

Below are two facts about the PDF format that make this vulnerability possible, and which are not widely known outside specific technical circles.

A PDF stores drawing commands, not text. If a PDF says “Hello”, the text “Hello” is not stored as the bytes “H e l l o”. Rather, to keep it simple, it is stored as a sort of code that tells your PDF reader what to draw for you to see (which in this case would be visible to you as the letters “Hello”).

A second, optional table exists for text extraction. The rendered shape (“H”) and the underlying human meaning (“the letter H”) have no inherent link in PDF: from its perspective, the renderer simply draws glyph shapes, as instructed by the code. PDF therefore defines an optional table for the benefit of copy-paste, screen readers, accessibility tooling, and… your downstream AI text-extractor. This table, attached to each font, is called the “/ToUnicode CMap”.

Figure 1 (created with Claude Code): The data flows out of a font dictionary. The renderer consults the glyph mapping. The extractor consults the /ToUnicode CMap. The PDF specification defines both; it does not state a consistency relationship between them.
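The two lookups can be modelled in a few lines of Python. This is a toy sketch, not a PDF parser: the character codes, the `font` dictionary layout and the `render` / `extract` helpers are all illustrative assumptions, chosen only to show that the renderer and the extractor read from different tables.

```python
# Toy model of a PDF font dictionary: two independent lookup tables.
# Codes, table names and helpers are illustrative, not taken from a real file.
font = {
    # Consulted by the renderer: character code -> glyph shape to draw
    "glyphs": {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"},
    # Consulted by text extraction: character code -> Unicode (/ToUnicode CMap)
    "to_unicode": {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"},
}

# The content stream holds character codes, not the string "Hello".
content_stream = [0x01, 0x02, 0x03, 0x03, 0x04]


def render(codes, font):
    """What a human sees: each code is drawn as a glyph shape."""
    return "".join(font["glyphs"][c] for c in codes)


def extract(codes, font):
    """What copy-paste or a text extractor gets: each code looked up in /ToUnicode."""
    return "".join(font["to_unicode"].get(c, "\ufffd") for c in codes)


print(render(content_stream, font))
print(extract(content_stream, font))
```

In this honest toy file both tables agree, so both calls return the same string. Nothing in the model forces them to agree, which is the property the next section relies on.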

2. The attack mechanism

Build a PDF such that the page visibly contains “the Settlement amount is $1,400,000.” drawn with a standard font, while the font dictionary’s /ToUnicode CMap maps the same character codes to “$400.” followed by six characters that the extracting AI discards.

Nothing in the PDF specification requires the extracted text and the drawn glyphs to match. So nothing is out of order in the document: no invisible characters appear in the content stream, and nothing leaks into the visible rendering. Nothing about the document looks suspicious, even upon a thorough inspection of the page itself.

Figure 2 (created with Claude Code): The /ToUnicode desynchronization attack illustrated. The glyph mapping draws “$1,400,000”, while the /ToUnicode CMap reports “$400.” plus six characters that are discarded by your AI.

This means that AI pipelines that pre-extract text are exposed when /ToUnicode is consulted (they see “$400.”) whereas pipelines that read PDFs as images are not exposed to this specific attack (they see the $1,400,000 that you see).
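The desynchronization can be sketched as a toy Python model. Several things here are assumptions made for illustration and are not taken from our test file: each position gets its own character code (so the same glyph shape can be reported as different text at different positions), the six discarded characters are modelled as zero-width spaces, and `render` / `extract` are hypothetical helpers standing in for a viewer and a text-extracting pipeline.

```python
VISIBLE = "$1,400,000."             # 11 glyphs drawn on the page
FILLER = "\u200b"                   # zero-width space (assumed filler character)
HIDDEN = "$400." + FILLER * 6       # 5 reported characters + 6 discarded ones

# Assumption: one distinct character code per position in the string.
codes = list(range(1, len(VISIBLE) + 1))
glyphs = dict(zip(codes, VISIBLE))      # renderer's table: code -> glyph shape
to_unicode = dict(zip(codes, HIDDEN))   # extractor's table: code -> Unicode


def render(codes):
    """What you (or a vision-based pipeline) see on the page."""
    return "".join(glyphs[c] for c in codes)


def extract(codes):
    """What a text-extracting pipeline passes on after dropping the filler."""
    raw = "".join(to_unicode[c] for c in codes)
    return raw.replace(FILLER, "")


print("page shows :", render(codes))
print("AI receives:", extract(codes))
```

The same eleven codes produce two different amounts depending on which table is consulted, and neither table is malformed on its own.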

The assumption of this experiment was that AI pipelines pre-extract text for cost and latency reasons: text tokens are an order of magnitude cheaper than image tokens and, at scale, this makes the difference material.

The optimization of AI costs and speed is the potential vulnerability that our experiment set out to (l)exploit.

3. Empirical observations

We created a PDF with a visible $1,400,000 figure and an extractable $400 figure.

The same document was uploaded to each product’s public interface (web app, native mobile app, or through OpenClaw). Every observation is reported below with the prompt.

Note, however, that in a separate test conducted via terminal, GPT 5.5 (“high effort”) and Opus 4.7 (also “high effort”) both caught the issue on the first pass.

Figure 3 (summary created by Claude Code, experiments done by humans): One observation per product/tier. Ten observations across nine products (ChatGPT was tested on both its free and paid tiers, on different days under different prompts). This is a sample, not a statistical study. We encourage you to try it yourself and run experiments in a controlled environment.

Three of the observations are worth detailing here as they carry interesting surface features:

ChatGPT (free tier)

Figure 4: On ChatGPT, the citation surface attaches a grounding signal to the lying figure: any downstream automation that treats source citations as evidence of extraction-correctness inherits the lie.

Gemini Flash (free tier)

Figure 5: On Gemini Flash (mobile app), no anomaly note, no follow-up.

Grok (free tier)

Figure 6: On Grok, the diagnosis is generalised ("formatting/OCR error") rather than specifically naming /ToUnicode CMap desynchronisation, but the user receives both the correct figure and a signal that something is off in the parser. Grok is the only product in our observations that did so, along with Claude Opus 4.7 and GPT 5.5 in “high effort” (which were tested in a parallel experiment through a terminal).

B2B legal-vertical products

The paid B2B legal-vertical product that was fooled surfaced the “wrong” figure with a source-citation badge.

From the perspective of any downstream automation or human reviewer who treats source citations as evidence of grounding, the “wrong” figure has been “laundered” through the AI's citation surface and now appears to be a verified extraction. Citations amplify whatever the upstream extraction produced.

Without a defense at the ingestion layer, vendors are forced to trust the upstream parser, and the vulnerability described here may be a reason to break that trust.
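As a rough illustration of what an ingestion-layer check could look like (this is our own hedged sketch, not a description of any vendor's mitigation), one option is to compare the dollar amounts found in the extracted text layer against those found in an independent reading of the rendered page. The sketch assumes that second reading already exists as a string, for example from OCR or a vision model; `amounts` and `layers_disagree` are hypothetical helper names.

```python
import re

# Matches amounts such as $400, $1,400,000 or $12.50
AMOUNT = re.compile(r"\$\d[\d,]*(?:\.\d+)?")


def amounts(text):
    """Return the set of dollar amounts found in text, as floats."""
    return {
        float(match.replace("$", "").replace(",", ""))
        for match in AMOUNT.findall(text)
    }


def layers_disagree(extracted_text, rendered_text):
    """True when the text layer and the rendered page report different amounts.

    rendered_text is assumed to come from an independent reading of the
    page image (OCR or a vision model), not from the PDF text layer.
    """
    return amounts(extracted_text) != amounts(rendered_text)


text_layer = "The Settlement amount is $400."
page_image = "The Settlement amount is $1,400,000."
print(layers_disagree(text_layer, page_image))
```

A check like this only covers figures that match the pattern, and it reintroduces part of the image-processing cost that text extraction was meant to avoid, so it is best seen as a targeted control for high-stakes fields rather than a complete defense.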

4. Noroboto updates

The Noroboto family of attacks, originally developed by Drew Miller of Tritium Legal and the LQ RED TEAM, rewrites the TrueType glyph-cmap of an embedded font so that ASCII codepoints in the content stream map to glyphs for arbitrary other characters. The visible page renders correctly because the font's cmap draws the intended glyph; the extracted text yields Private Use Area noise because the codepoints in the underlying text are PUA values.
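The effect can be approximated with a simplified Python model. It is not the code in noroboto.py: the fixed offset into the Private Use Area, the ASCII-only input and the `obfuscate` / `render` helpers are assumptions made purely for illustration.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # Unicode Private Use Area (BMP)


def obfuscate(text):
    """Return (stored_text, cmap) for ASCII input.

    stored_text holds PUA codepoints; cmap is the rewritten font table
    that maps each PUA codepoint back to the glyph of the original letter.
    The fixed-offset scheme is an illustrative assumption.
    """
    cmap = {}
    stored = []
    for ch in text:
        pua = chr(PUA_START + ord(ch))
        cmap[pua] = ch
        stored.append(pua)
    return "".join(stored), cmap


def render(stored_text, cmap):
    """What the page shows: the font's cmap picks the intended glyph."""
    return "".join(cmap.get(ch, ch) for ch in stored_text)


stored, cmap = obfuscate("Settlement")
print(render(stored, cmap))              # readable on the page
print([hex(ord(ch)) for ch in stored])   # what an extractor receives
```

In this model the reader sees the intended word while any consumer of the underlying codepoints receives only Private Use Area values, which carry no standard meaning.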

The original public release applied to DOCX documents only. We have now added a PDF extension to Noroboto which consists of a single new code path in noroboto.py that, given an arbitrary input PDF, obfuscates it.

You can try it here: https://noroboto.io.

Only total-document obfuscation is shipped publicly. Surgical (partial-targeted) and replacement variants are deliberately withheld for ethical reasons, until a mitigation system is broadly deployed at the ingestion layer.

We remain available to discuss such mitigation systems.