PDF Repair & Hardening – User Guide

This section explains, for each button in the “PDF Repair and Hardening” group, how folder trees are processed, which file types are touched, and what output you get. Everything here runs on top of your existing folders without changing the originals.

High-level behaviour

All tools in this group work with an Input folder and an Output folder. The suite traverses the Input tree, mirrors the same subfolder structure under Output and processes only the targeted file types (PDF / Office). Other files are either ignored or copied untouched, depending on the button.

🧩
Rebuild + OCR PDF
Normalize problematic or scanned PDFs at a controlled DPI and rebuild them with a fresh text layer.
What this button does
When you select an Input folder and an Output folder and press “Rebuild + OCR PDF”, DocInspector Pro walks the folder tree, finds all PDFs, recreates each one from page images at the chosen DPI, and then runs OCR so the new PDFs stay searchable.
Ideal for corrupt PDFs, legacy scans, faxes or files with broken text layers.
Folder tree & file types
  • Recursively scans the selected Input folder for PDF files.
  • Computes each file’s relative path inside the Input root.
  • Recreates the same subfolder structure under the Output folder.
  • Processes only PDFs – other file types are not touched by this action.
Example: C:\cases\Matter01\A\doc1.pdf → D:\output\Matter01\A\doc1.pdf.
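The mirroring rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DocInspector Pro's actual code: the function name `plan_mirrored_outputs` is hypothetical, and the sketch assumes PDFs are recognized by a case-insensitive `.pdf` extension.

```python
from pathlib import Path


def plan_mirrored_outputs(input_root: Path, output_root: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF under input_root with its mirrored path under output_root."""
    pairs = []
    for source in sorted(input_root.rglob("*")):
        # Assumption: a PDF is any regular file whose extension is .pdf (any case).
        if source.is_file() and source.suffix.lower() == ".pdf":
            # The path relative to the Input root is reused unchanged under Output.
            relative = source.relative_to(input_root)
            pairs.append((source, output_root / relative))
    return pairs
```

With an Input root of C:\cases and an Output root of D:\output, the file C:\cases\Matter01\A\doc1.pdf would be paired with D:\output\Matter01\A\doc1.pdf. Non-PDF files never appear in the returned list, which matches the "only PDFs" rule.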
Processing pipeline
For each PDF:
  1. Each page is rendered to an image at the DPI you set in the main GUI.
  2. Temporary images are stored under the configured TMP folder while processing.
  3. An OCR engine analyzes the images (typically Romanian + English) and extracts text.
  4. A new PDF is built by combining the rasterized pages with the OCR text layer.
DPI, OCR language and job parallelism are controlled from the main app settings.
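For readers who want to picture how the four steps fit together, here is a minimal Python skeleton. It is a sketch under stated assumptions, not the product's implementation: the three stage functions (`render_pages`, `ocr_page`, `assemble_pdf`) are hypothetical callables that you would back with a real renderer, OCR engine and PDF writer, and the `"ron+eng"` default assumes a Tesseract-style language string for Romanian + English.

```python
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def rebuild_with_ocr(
    source_pdf: Path,
    target_pdf: Path,
    dpi: int,
    tmp_root: Path,
    render_pages: Callable[[Path, int, Path], Sequence[Path]],
    ocr_page: Callable[[Path, str], str],
    assemble_pdf: Callable[[Sequence[Path], Sequence[str], Path], None],
    languages: str = "ron+eng",  # assumption: Tesseract-style language codes
) -> int:
    """Run render -> OCR -> assemble for one PDF and return the page count."""
    tmp_root.mkdir(parents=True, exist_ok=True)
    target_pdf.parent.mkdir(parents=True, exist_ok=True)
    # Steps 1-2: page images live in a scratch folder under TMP only while the job runs.
    with tempfile.TemporaryDirectory(dir=tmp_root) as work_dir:
        images = render_pages(source_pdf, dpi, Path(work_dir))
        # Step 3: extract text for each rendered page.
        texts = [ocr_page(image, languages) for image in images]
        # Step 4: combine the page images and the OCR text into the new PDF.
        assemble_pdf(images, texts, target_pdf)
    return len(images)
```

Passing the stages in as functions keeps the sketch independent of any particular library, and the scratch folder is removed automatically once the new PDF has been written.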
Resulting output
  • Each input PDF gets a corresponding “normalized” PDF at the same relative path in the Output tree; the original in Input is left unchanged.
  • Page count is preserved; pages become more consistent and easier to render.
  • Text is searchable again thanks to the OCR step.
Recommended as a first pass for problematic bundles before any further processing.
🖼️
Flatten to Image-Only PDF
Strip all text and interactive layers and keep only bitmap images of each page.
What this button does
“Flatten to Image-Only PDF” converts every page of each input PDF into an image and rebuilds a new PDF that contains only those images – no searchable text, no form fields, no comments, no hidden objects.
Use when you want to eliminate any hidden or removable layers and keep only what is visible.
Folder tree & file types
  • Recursively scans the Input folder for PDFs.
  • Recreates the same folder structure under Output.
  • Only PDFs are flattened; non-PDF files in the tree are ignored.
Works well for final disclosure sets where you don’t want any live text layer.
Processing pipeline
  1. Each page is rendered to a high-quality image at the selected DPI.
  2. Images are optionally compressed to balance size vs. quality.
  3. A new PDF is assembled from the image sequence only (no OCR step).
Resulting PDFs behave like scanned documents: viewable and printable, but not text-searchable.
Resulting output
  • Output PDFs are fully “flattened” and resistant to most annotation / text-layer tricks.
  • Any redactions baked into images cannot be trivially removed.
  • File size depends heavily on DPI and compression settings chosen.
A good option when security and simplicity are more important than text search.
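To get a feel for how DPI drives file size, the arithmetic can be sketched as follows. PDF page sizes are expressed in points (72 points per inch), so the pixel dimensions of a rendered page follow directly from the DPI. The function names are illustrative only, and the byte count is the uncompressed 24-bit RGB size, so real output will usually be smaller after compression.

```python
POINTS_PER_INCH = 72  # PDF user space unit: 1 point = 1/72 inch


def rendered_pixels(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page of the given size (in points) rendered at dpi."""
    return (
        round(width_pt / POINTS_PER_INCH * dpi),
        round(height_pt / POINTS_PER_INCH * dpi),
    )


def uncompressed_rgb_bytes(width_pt: float, height_pt: float, dpi: int) -> int:
    """Raw size of one rendered page at 3 bytes per pixel (8-bit RGB), before compression."""
    width_px, height_px = rendered_pixels(width_pt, height_pt, dpi)
    return width_px * height_px * 3
```

For a US Letter page (612 × 792 points), 300 DPI gives 2550 × 3300 pixels, or about 25 MB of raw RGB data per page. Doubling the DPI to 600 quadruples that figure, which is why the compression setting matters so much for large bundles.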
📄
Office PDF Converter
Convert Word and Excel files to PDFs while preserving the original folder layout.
What this button does
“Office PDF Converter” scans the Input folder for Word and Excel documents and exports them to PDF, writing the resulting PDFs into the mirrored Output tree. Original Office documents remain unchanged.
Helps bring mixed bundles (DOCX, XLSX) into a single, PDF-only evidence set.
Folder tree & file types
  • Recursively enumerates common Office formats (DOC, DOCX, XLS, XLSX).
  • Builds the same relative paths under the Output folder.
  • Existing PDFs or other formats are not altered by this tool.
Example: C:\case\Input\docs\letter1.docx → D:\Output\docs\letter1.pdf.
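The path mapping for Office files differs from the PDF tools only in that the extension changes to .pdf. A small sketch, assuming the four extensions listed above and using a hypothetical helper name `office_pdf_target`:

```python
from pathlib import PureWindowsPath
from typing import Optional

# Extensions taken from the list above; matched case-insensitively (an assumption).
OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx"}


def office_pdf_target(input_root: str, output_root: str, source_file: str) -> Optional[PureWindowsPath]:
    """Return the mirrored .pdf path for an Office file, or None for any other file."""
    source = PureWindowsPath(source_file)
    if source.suffix.lower() not in OFFICE_SUFFIXES:
        return None  # existing PDFs and other formats are left alone
    relative = source.relative_to(PureWindowsPath(input_root))
    # Same relative folder and file name, with the extension swapped for .pdf.
    return PureWindowsPath(output_root) / relative.with_suffix(".pdf")
```

Note that under this simple rule two files such as report.doc and report.docx in the same folder would map to the same report.pdf. This guide does not say how such collisions are handled, so it is worth checking for them before a large run.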
Processing pipeline
  1. Ensures the Output and temporary folders exist.
  2. Opens each document in the appropriate Office application.
  3. Exports or prints the document to PDF using standard export options.
  4. Saves the PDF into the mirrored Output path.
For workbooks with multiple sheets, the behaviour depends on Office export settings (all sheets vs. active sheet).
Resulting output
  • A consistent set of PDFs that match the original Word/Excel files.
  • Same folder layout, ready to be passed into other DocInspector modules.
  • Original Office files stay in place under the Input folder.
Recommended as a preparatory step before running rebuild / flatten / watermark.
🏷️
Add Watermark
Apply consistent text watermarks across a batch of PDFs in one run.
What this button does
“Add Watermark” opens each PDF in the Input tree and overlays a configurable text watermark (for example “CONFIDENTIAL”, a case reference or a recipient name) on every page, writing the result to the Output tree.
Text, position, angle, size and opacity are chosen from the GUI before starting the run.
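As an illustration of what such a set of watermark options might look like, here is a small Python sketch. The field names, defaults and validation rules are assumptions made for this example and are not taken from the DocInspector Pro GUI; the helper shows one common way to pick an angle so that the text follows the page diagonal.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WatermarkSettings:
    """Hypothetical container for the options chosen before a run."""
    text: str
    position: str = "center"    # assumed label; the real GUI may offer other choices
    angle_deg: float = 45.0
    font_size_pt: float = 48.0
    opacity: float = 0.3        # 0 = invisible, 1 = fully opaque

    def __post_init__(self) -> None:
        if not self.text.strip():
            raise ValueError("watermark text must not be empty")
        if not 0.0 < self.opacity <= 1.0:
            raise ValueError("opacity must be greater than 0 and at most 1")
        if self.font_size_pt <= 0:
            raise ValueError("font size must be positive")


def diagonal_angle_deg(page_width_pt: float, page_height_pt: float) -> float:
    """Angle (in degrees) of the line from the bottom-left to the top-right corner."""
    return math.degrees(math.atan2(page_height_pt, page_width_pt))
```

Validating the settings once, before the batch starts, avoids discovering a bad value halfway through a large folder tree.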
Folder tree & file types
  • Walks the Input folder recursively and targets PDF files only.
  • Recreates the same subfolders in the Output path.
  • Writes watermarked copies to Output; originals in Input remain unchanged.
Lets you keep a “clean” source bundle and a separate, clearly marked disclosure bundle.
Resulting output
  • All PDFs in the Output tree carry the configured watermark on each page.
  • File names and folder structure are preserved.
  • Watermarks are baked into page content, not just annotations.
Useful when evidence must clearly indicate classification or recipient without editing each file manually.