Before you read this, take a look at how to make a PDF searchable.
This article explains how to run OCR on PDFs (or entire folders of PDFs) to make scanned text selectable and searchable. Optical Character Recognition (OCR), or text recognition, translates scanned PDF documents into searchable data.
Note: OCR is only available in Bluebeam Revu eXtreme.
How to use Bluebeam Revu eXtreme’s OCR technology to transform scanned PDFs into text searchable and selectable files
- Go to Document > OCR or press CTRL+SHIFT+O. The OCR dialog box appears.
- Alternatively, go to Batch > OCR.
- The OCR function will also be invoked when the Create PDF from Scanner or Camera function in Revu is used, opening the OCR dialog box automatically.
The active PDF, if any, is automatically added to the process. To add more PDFs, click Add and use one or more of the following methods:
- Files: Adds individual files from a network or local drive. Selecting this option will cause the Open dialog box to appear. Navigate to the appropriate location and select the desired files.
- Open Files: Adds all files currently open in Revu.
- Open Set: Adds all files contained in the current Set.
- Folder: Adds all files in a selected folder on a network or local drive, but not files contained in any of its subfolders. Selecting this option will cause the Select Folder dialog box to appear. Navigate to the desired folder and select it.
- Folder and Subfolders: Adds all files in a selected folder on a network or local drive as well as all files within any of its subfolders. Selecting this option will cause the Select Folder dialog box to appear. Navigate to the desired folder and select it.
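The difference between the Folder and the Folder and Subfolders options can be illustrated with a short Python sketch. This is not how Revu is implemented; the function name `collect_pdfs` and the choice to keep only files ending in `.pdf` are assumptions made for the illustration.

```python
from pathlib import Path


def collect_pdfs(folder, include_subfolders=False):
    """Return sorted PDF paths in a folder, optionally descending into subfolders."""
    root = Path(folder)
    # glob("*") stays in the top-level folder (like "Folder");
    # rglob("*") also walks every subfolder (like "Folder and Subfolders").
    candidates = root.rglob("*") if include_subfolders else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")


# Example usage with a hypothetical path:
# top_level_only = collect_pdfs("scans")
# everything = collect_pdfs("scans", include_subfolders=True)
```

A listing like this can be a quick way to check how many files a batch will pick up before choosing between the two options.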
To run the process on specific pages only for one or more of the PDFs, select the desired PDF and choose one of the following from its Pages dropdown:
- All Pages: Sets the range to all pages.
- Current: Sets the range to the current page only. The current page number will appear in parentheses, for example, Current (2) if page 2 is the current page.
- Selected: Sets the range to the current selection. This option only appears if pages were selected prior to invoking the command.
- Custom: Sets the range to a custom value.
When this option is selected, the list becomes a text box. To enter a custom range:
- Use a dash between page numbers to define those two pages and all pages in between.
- Use a comma to define pages that are separated.
For example: 1-3, 5, 9 will include pages 1, 2, 3, 5 and 9.
- Even Pages: Limits the process to only even pages.
- Odd Pages: Limits the process to only odd pages.
- Landscape Pages: Limits the process to only landscape-oriented pages.
- Portrait Pages: Limits the process to only portrait-oriented pages.
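To make the Custom range syntax concrete, here is a minimal Python sketch that expands a string such as 1-3, 5, 9 into page numbers. It only illustrates the dash and comma rules described above; the function name `parse_page_range`, the `page_count` parameter, and the error handling are assumptions, not Revu's own behavior.

```python
def parse_page_range(spec, page_count):
    """Expand a range string such as "1-3, 5, 9" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a stray comma
        if "-" in part:
            # A dash means both end pages and everything in between.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # A single number is a one-page range.
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_range("1-3, 5, 9", page_count=10))  # [1, 2, 3, 5, 9]
```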
Set the OCR configuration Options, as desired:
- Language: Select the languages used by the OCR process. Multiple languages can be used on the same PDF.
- Document Type: Use to optimize the OCR process for the selected document type. The CAD Drawing setting tends to ignore text formatting, for example, while the Text Document setting does not.
- Optimize For: Choose whether to optimize the OCR process for Accuracy or Speed.
- Correct Skew: Enable to correct angular deviations in scanned documents.
- Detect Orientation: Enable to detect the page orientation (90, 180 and 270 degrees) of each page and correct it if needed.
- Detect Vertical Text: Enable to detect text that is oriented vertically.
- Detect Text in Pictures and Drawings: Enable to detect text in graphics.
- Skip Vector Pages: Enable to skip processing of pages with vector content.
- Page Chunk Size: Use to determine the maximum number of pages sent to the OCR engine at one time. Increasing chunk size can increase speed, but will also consume more of the computer’s resources.
- Max Vector Size: Use to set the maximum vector size that will be analyzed during the OCR process; any vectors larger than this setting will be discarded in pre-processing. Decreasing this value can increase speed, but might also cause larger text (for example, larger fonts) to be inadvertently ignored.
- Click OK to run OCR.
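The idea behind Page Chunk Size can be sketched in a few lines of Python: the pages to process are split into groups no larger than the chunk size, and each group is handed to the engine in turn. How Revu batches pages internally is not described here, so `chunk_pages` is a hypothetical helper that only illustrates the concept.

```python
def chunk_pages(pages, chunk_size):
    """Split a list of page numbers into consecutive groups of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Step through the list in strides of chunk_size; the last group may be shorter.
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]


# Seven pages with a chunk size of 3 give three batches.
print(chunk_pages(list(range(1, 8)), 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```

A larger chunk size means fewer batches but more pages held in memory at once, which matches the speed versus resources trade-off noted for this option.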
Drawbacks of using Bluebeam OCR
Making a PDF searchable in Bluebeam can be hit or miss. For the best results, text recognition needs to be set up specifically for the font and character set that you are using.