5 Things You Didn't Know About PDF Files

PDFs are everywhere — so common that we take them for granted. But there's more to this format than meets the eye.

1. A PDF can contain executable code

Yes, really. PDFs can include JavaScript that runs when you open the file. Most PDF viewers sandbox or restrict this code, but scripted PDFs have been exploited in the past to deliver malware. That "innocent" PDF attachment in your email? It could run code on your machine.
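Curious whether a file carries any scripting? Here is a minimal triage sketch, not a real scanner. It simply counts a few name tokens in the raw bytes that commonly accompany scripts or auto-run actions. The function name and the token list are our own choices for illustration, and the check has a big blind spot: names can be hex-escaped or tucked inside compressed object streams, so zero hits does not prove a file is safe.

```python
import re

# Name tokens that commonly accompany scripts or auto-run actions.
# This list is illustrative, not exhaustive.
SUSPICIOUS_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")


def find_active_content(data: bytes) -> dict:
    """Count occurrences of each suspicious name token in raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead keeps a short name such as /AA from matching
        # the start of a longer, unrelated name.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        hits = len(re.findall(pattern, data))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts


# A made-up fragment standing in for a real file's bytes.
SAMPLE = (
    b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj "
    b"2 0 obj << /S /JavaScript /JS (app.alert('hi');) >> endobj"
)
print(find_active_content(SAMPLE))
```

For a real file, you would pass in the result of open(path, "rb").read() instead of SAMPLE.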

2. "Deleted" text might still be there

When you use a black rectangle to "redact" text in a PDF, the original text often stays in the file — just hidden behind the rectangle. Anyone with a basic PDF editor can remove the rectangle and read everything. Proper redaction requires actually removing the text data, not just covering it.
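To see why a rectangle is not redaction, consider a toy example. The content stream below is hypothetical and uncompressed: it draws a text string, then paints a filled black box over the same spot. Both helper functions are our own illustrative names. Real pages are usually compressed and use more text operators and encodings than this sketch handles, so treat it as a picture of the problem rather than a redaction checker.

```python
import re

# Hypothetical page content: show some text, then cover it with a
# filled black rectangle ("re" builds the rectangle, "f" fills it).
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 700 Td (Account 12345678) Tj ET
0 g 70 695 120 16 re f
"""


def shown_strings(stream: bytes) -> list:
    """Return simple literal strings passed to the Tj (show text) operator."""
    matches = re.findall(rb"\(([^()]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


def has_filled_rectangle(stream: bytes) -> bool:
    """True if a rectangle operator is immediately followed by a fill."""
    return re.search(rb"\bre\s+f\b", stream) is not None


print(has_filled_rectangle(CONTENT_STREAM))  # the black box is there...
print(shown_strings(CONTENT_STREAM))         # ...and so is the text under it
```

The rectangle only changes what gets painted on screen. The string is still sitting in the file for anyone who extracts the text.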

3. PDFs can be over 10 GB

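The often-quoted 10 GB figure is not a hard cap on the format. It comes from the classic cross-reference table, which stores each object's byte offset as a 10-digit decimal number, so offsets top out just under 10^10 bytes (roughly 10 GB). Files that use cross-reference streams, introduced in PDF 1.5, are not bound by that limit. Engineering firms routinely work with multi-gigabyte PDFs containing detailed technical drawings. DocInspector handles these large files through streaming, processing them without loading everything into memory.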
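What does streaming look like in practice? The sketch below is a generic illustration of the idea, not DocInspector's implementation. It counts a marker in a file of any size while holding only one chunk in memory. The function name and chunk size are assumptions, and it assumes the marker cannot overlap with itself (true for a token like endobj).

```python
import io


def count_marker(stream, marker: bytes, chunk_size: int = 1 << 20) -> int:
    """Count occurrences of marker while reading the stream chunk by chunk."""
    if not marker:
        raise ValueError("marker must be non-empty")
    keep = len(marker) - 1  # bytes carried over so split markers are still seen
    tail = b""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer = tail + chunk
        total += buffer.count(marker)
        # The tail is shorter than the marker, so it can never contain a
        # complete match on its own and nothing is counted twice.
        tail = buffer[-keep:] if keep else b""
    return total


# Tiny in-memory stand-in for a large file, read 4 bytes at a time.
data = b"1 0 obj\nendobj\n2 0 obj\nendobj\n"
print(count_marker(io.BytesIO(data), b"endobj", chunk_size=4))
```

With a real file you would pass open(path, "rb") in place of the BytesIO object. Memory use stays near chunk_size no matter how large the file is.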

4. Every PDF has a creation fingerprint

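Open a PDF's properties and you'll usually find the software that created it, the creation and modification dates, and often the author's name and organization. These fields are optional, but most authoring tools fill them in automatically. Because the metadata is stored inside the file itself, it persists through email forwards, cloud sharing, and file copies.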
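You can often spot this fingerprint without a PDF viewer at all. The sketch below looks for a few standard document information keys (Producer, Creator, Author, CreationDate, ModDate) in raw bytes. The helper names and the sample values are made up for illustration. It assumes the fields are stored as plain, uncompressed literal strings without nested parentheses, and that dates carry the full D:YYYYMMDDHHmmSS prefix, which will not hold for every file.

```python
import re
from datetime import datetime

INFO_FIELDS = ("Producer", "Creator", "Author", "CreationDate", "ModDate")


def read_info_fields(data: bytes) -> dict:
    """Pull simple literal-string metadata values out of raw PDF bytes."""
    found = {}
    for field in INFO_FIELDS:
        pattern = rb"/" + field.encode("ascii") + rb"\s*\(([^()]*)\)"
        match = re.search(pattern, data)
        if match:
            found[field] = match.group(1).decode("latin-1")
    return found


def parse_pdf_date(value: str) -> datetime:
    """Parse the D:YYYYMMDDHHmmSS prefix, ignoring any timezone suffix."""
    return datetime.strptime(value[2:16], "%Y%m%d%H%M%S")


# Made-up metadata standing in for a real file's information dictionary.
SAMPLE = (
    b"<< /Producer (ExampleWriter 1.0) /Author (J. Doe) "
    b"/CreationDate (D:20240115103000Z) >>"
)
info = read_info_fields(SAMPLE)
print(info)
print(parse_pdf_date(info["CreationDate"]))
```

For anything beyond a quick look, a dedicated PDF library is the safer choice, since metadata can also live in compressed streams or in an embedded XMP packet.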

5. A single corrupted byte can make a PDF unreadable

PDFs rely on a cross-reference table near the end of the file to locate all objects. If this table, or the startxref offset that points to it, gets corrupted, a reader that trusts it can no longer find anything, even if 99.99% of the data is intact. DocInspector's repair module rebuilds this cross-reference table from scratch, recovering otherwise "dead" files.
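How can a table be rebuilt from scratch? One common approach, sketched here as a general illustration rather than a description of DocInspector's repair module, is to ignore the damaged table and scan the file for object headers of the form "N G obj", recording where each one starts. The function names and sample bytes are assumptions. A production tool would also need to cope with compressed object streams and with binary data that happens to look like a header.

```python
import re

# An object header is "<number> <generation> obj"; "endobj" does not match
# because whitespace is required before "obj".
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def rebuild_offsets(data: bytes) -> dict:
    """Map (object number, generation) to the byte offset of its header."""
    offsets = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # A later definition replaces an earlier one, which mirrors how
        # appended updates override older objects.
        offsets[key] = match.start()
    return offsets


def format_xref_entry(offset: int, generation: int) -> bytes:
    """One in-use entry in classic table syntax: exactly 20 bytes."""
    return b"%010d %05d n \n" % (offset, generation)


# A made-up two-object file body with no cross-reference table at all.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< >>\nendobj\n"
)
for (number, generation), offset in sorted(rebuild_offsets(SAMPLE).items()):
    print(number, format_xref_entry(offset, generation))
```

The 10-digit offset field in each entry is the same field that gives the classic table its size ceiling.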