Rebuild + OCR PDF
Normalize problematic or scanned PDFs at a controlled DPI and rebuild them with a fresh text layer.
What this button does
When you select an Input folder and an Output folder and press
“Rebuild + OCR PDF”, DocInspector Pro walks the Input folder tree,
finds every PDF and recreates it from page images at the chosen DPI. It then
runs OCR on those images so the new PDFs are searchable.
Folder tree & file types
- Recursively scans the selected Input folder for PDF files.
- Computes each file’s relative path inside the Input root.
- Recreates the same subfolder structure under the Output folder.
- Processes only PDFs – other file types are not touched by this action.
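The folder mirroring described above can be sketched in a few lines of Python. This is only an illustration, not DocInspector Pro's actual code: the function names plan_outputs and ensure_parent_dirs are hypothetical, and matching the ".pdf" extension case-insensitively is an assumption.

```python
from pathlib import Path


def plan_outputs(input_root: Path, output_root: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF under input_root with its mirrored path under output_root."""
    pairs = []
    for src in sorted(input_root.rglob("*")):
        # Only PDFs are picked up; the case-insensitive suffix match is an assumption.
        if src.is_file() and src.suffix.lower() == ".pdf":
            rel = src.relative_to(input_root)  # path inside the Input root
            pairs.append((src, output_root / rel))
    return pairs


def ensure_parent_dirs(pairs: list[tuple[Path, Path]]) -> None:
    """Recreate the Input subfolder structure under the Output folder."""
    for _, dst in pairs:
        dst.parent.mkdir(parents=True, exist_ok=True)
```

Calling plan_outputs(Path("in"), Path("out")) on a tree containing in/a/b/scan.pdf would pair it with out/a/b/scan.pdf; non-PDF files never appear in the plan, so they are not touched.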
Processing pipeline
For each PDF:
- Each page is rendered to an image at the DPI you set in the main GUI.
- Temporary images are stored under the configured TMP folder while processing.
- An OCR engine analyzes the images (typically Romanian + English) and extracts text.
- A new PDF is built by combining the rasterized pages with the OCR text layer.
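As a rough illustration of the steps above, the sketch below rasterizes, OCRs and reassembles one PDF. The document does not say which libraries DocInspector Pro uses, so pdf2image, pytesseract and pypdf are stand-ins chosen for this example, and the names pixels_at_dpi and rebuild_with_ocr are hypothetical. The language string "ron+eng" is the Tesseract spelling of Romanian + English. The helper also shows what the DPI setting means for image size: PDF pages are measured in points (1/72 inch), so a 612 x 792 pt Letter page becomes 2550 x 3300 pixels at 300 DPI.

```python
import tempfile
from pathlib import Path


def pixels_at_dpi(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Convert a page size in points (1/72 inch) to pixels at the given DPI."""
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


def rebuild_with_ocr(src: Path, dst: Path, dpi: int, tmp_root: Path,
                     langs: str = "ron+eng") -> int:
    """Rasterize src at dpi, OCR each page, write a searchable PDF to dst.

    Returns the number of pages written. The library choice is an assumption.
    """
    # Third-party imports are local so the helper above works without them.
    from io import BytesIO
    from pdf2image import convert_from_path
    from pypdf import PdfReader, PdfWriter
    import pytesseract

    writer = PdfWriter()
    # Temporary page images live under tmp_root and are removed afterwards.
    with tempfile.TemporaryDirectory(dir=tmp_root) as work:
        images = convert_from_path(str(src), dpi=dpi, output_folder=work, fmt="png")
        for image in images:
            # Tesseract returns a one-page PDF: the page image plus an
            # invisible text layer positioned over it.
            page_pdf = pytesseract.image_to_pdf_or_hocr(
                image, lang=langs, extension="pdf", config=f"--dpi {dpi}")
            writer.add_page(PdfReader(BytesIO(page_pdf)).pages[0])
            image.close()  # release the temp file before cleanup
        dst.parent.mkdir(parents=True, exist_ok=True)
        with open(dst, "wb") as fh:
            writer.write(fh)
    return len(writer.pages)
```

Because one output page is appended per rendered image, the page count of the rebuilt file matches the source. Higher DPI values improve OCR accuracy on small print but grow the temporary images and the final PDF roughly with the square of the DPI.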
Resulting output
- Each input PDF gets a “normalized” counterpart at the same relative path in the Output tree.
- Page count is preserved; every page becomes a raster image at the chosen DPI, so pages are uniform and render predictably.
- Text is searchable again thanks to the OCR step.
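After a run, a quick way to confirm that every input PDF has a counterpart in the Output tree is a small check like the one below. It is a sketch under assumptions: the function name missing_outputs is hypothetical, and it assumes the output file keeps the same file name as the input, which the description above implies but does not state outright.

```python
from pathlib import Path


def missing_outputs(input_root: Path, output_root: Path) -> list[Path]:
    """Return relative paths of input PDFs with no counterpart in the Output tree."""
    missing = []
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() == ".pdf":
            rel = src.relative_to(input_root)
            # Assumption: the rebuilt file keeps the input's relative path and name.
            if not (output_root / rel).is_file():
                missing.append(rel)
    return missing
```

An empty list means the Output tree covers every PDF found under the Input folder; any paths returned point at files that were skipped or failed during processing.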