Comparisons of scanned PDFs may contain inconsistencies (2024)

When you compare a scanned PDF, some of the changes identified by the comparison may appear illogical or unexpected. This happens because Optical Character Recognition (OCR) has been performed on your PDF.

A regular PDF contains text that can be selected, copied and edited. A scanned PDF contains only images of content embedded in the PDF file; it has no actual text content.

To run a comparison on a scanned PDF, the images must first be converted into editable text. This conversion process, OCR, is imperfect.

Workshare automatically runs OCR when you select a scanned PDF for comparison and uses the converted version of the document for the comparison. This means that the document Workshare actually compares may not be exactly the same as the document you selected.

For example, suppose a scanned PDF is selected as the original document. Workshare converts it to a text-based PDF and then runs the comparison using this converted version, which you cannot see. Consequently, the comparison results may not match what you can see in the original and modified documents.

Why this may cause inconsistencies

While the conversion attempts to be as accurate as possible, some content may be converted incorrectly, for example when the scanned PDF is a document that has been photocopied multiple times or includes handwritten notes. The comparison may then indicate that text has changed when you can see that it has not.

Imagine the original document is a scanned rental agreement where the rent has been filled in by hand as £50.00, and the modified document is a regular PDF with the rent as £50.00.

The OCR process converts the scanned PDF and mistakenly reads the handwritten £50.00 as 450.00.

The comparison shows a change when you can see there is none. Clicking the change shows that 450.00 has been deleted and £50.00 added. This seems strange until you remember that the comparison used the converted text, not the page image you are looking at.
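The effect can be reproduced with a minimal Python sketch. It is not Workshare's comparison engine; it simply uses the standard-library `difflib` module to compare word lists. The sentence text and the `word_changes` helper are assumptions made up for illustration, with a rent figure like the one in the example above.

```python
import difflib


def word_changes(original: str, modified: str):
    """Return (operation, old_words, new_words) for every span that differs."""
    a, b = original.split(), modified.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# What a reader sees on the scanned page (handwritten amount).
visible_original = "The monthly rent is £50.00 payable in advance."
# Hypothetical OCR output for the same page: the "£" is misread as "4".
ocr_original = "The monthly rent is 450.00 payable in advance."
# The modified document is a regular PDF, so its text is read directly.
modified = "The monthly rent is £50.00 payable in advance."

print(word_changes(visible_original, modified))  # no differences
print(word_changes(ocr_original, modified))      # one spurious replacement
```

Comparing what the reader sees yields no changes, while comparing the OCR output reports `450.00` replaced by `£50.00`, even though nothing on the page differs.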

How to distinguish a scanned PDF from a regular PDF

One way to tell whether your PDF is a scanned, image-based PDF is to try to select some text. In a scanned PDF you cannot select text; you can only select an area of the image. In a regular PDF, you can select and copy text.
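The same check can be approximated in code by asking whether any page has a text layer. The sketch below is a heuristic under stated assumptions, not a Workshare feature: `looks_scanned`, `extract_page_texts` and the `min_chars` threshold are hypothetical names and values, and reading a real file relies on the third-party `pypdf` package.

```python
from typing import Iterable, List


def looks_scanned(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Heuristic: True if no page yields at least `min_chars` non-whitespace characters."""
    return all(len("".join(text.split())) < min_chars for text in page_texts)


def extract_page_texts(path: str) -> List[str]:
    """Read each page's text layer with pypdf (third-party: pip install pypdf)."""
    from pypdf import PdfReader  # imported lazily so looks_scanned works without pypdf

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

With `pypdf` installed, `looks_scanned(extract_page_texts("agreement.pdf"))` would classify a file (the filename is a placeholder). A scanned PDF that has already been through OCR carries a text layer, so this check would report it as a regular PDF.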

How side-by-side comparison helps you deal with scanned PDFs

When Workshare compares a scanned PDF, a notification appears across the top of your comparison. It alerts you that OCR was performed before the comparison.

You can review your changes in the usual way: hover over a change to learn more about it. Most of your changes will be accurate, but a link to an explanation is provided in case the results look inconsistent.

Remember that OCR is imperfect, so comparisons of scanned documents need more review time. Side-by-side comparison makes it easy to review both the original and modified documents, as they are shown in one workspace and stay synchronised as you scroll through them.

FAQs

Is it possible to compare two PDF documents for differences?

Yes. In Adobe Acrobat, open the two versions of the file that you want to compare and then, from the All tools menu, select Compare files. To choose a different version as the old file or the new file, select Change File and then select the desired version.

How to compare scanned PDF files?

How to compare PDF files:
  1. Open Acrobat for Mac or PC and choose “Tools” > “Compare Files.”
  2. Click “Select File” at left to choose the older file version you want to compare.
  3. Click “Select File” at right to choose the newer file version you want to compare.
  4. Click the Compare button.
  5. Review the Compare Results summary.

What is the difference between a scanned PDF and a normal PDF?

When you zoom in on a scanned page, the image becomes blurry as soon as the scan resolution is exceeded. By contrast, in a native PDF the graphics are vector-based and remain smooth at any zoom level. The text in particular remains perfectly drawn.

What is the difference between a true PDF and an OCR PDF?

During the OCR process, the software interprets each character in the image as text and adds a text layer on top of the image layer. PDFs made searchable in this way behave like true PDFs, but how well they can be searched depends on the quality of the image and how recognisable the writing is.

How to compare two documents for differences?

How to compare documents in Word
  1. Open a new document in Word.
  2. In the upper ribbon, click Review.
  3. Click Compare, and then click Compare Documents.
  4. In the Compare Documents window that appears, update the following fields: Original document: Choose the first document you want to use in your comparison. ...
  5. Click OK.

How to compare two PDF documents for differences for free?

How to compare PDFs
  1. Choose or drop the two PDFs that you would like to compare.
  2. Click on 'Compare' below.
  3. After a few seconds the differences of the two files will be displayed.
  4. Select the view type to compare the files side by side or inline.

Can you improve the quality of a scanned PDF?

The best and easiest way to sharpen a PDF image is to simply scan the original document again. Often, blurry pages result from scanning errors, such as a bump to the machine or a dirty scanning plate. No amount of image editing and noise reduction will ever make such an image resolve more clearly.

Can you tell if a scanned PDF has been edited?

If you open the document properties of a PDF file (Ctrl+D or Command+D) and the relevant metadata is available, they list the date and time the file was created and the date and time it was last modified. This can help you determine whether a PDF file has been modified since creation.
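The same dates can be read programmatically. The sketch below is an illustration under stated assumptions, not a forensic tool: the function names are hypothetical, and reading a real file relies on the third-party `pypdf` package, whose `metadata` object exposes `creation_date` and `modification_date`. Metadata may be missing or may itself have been edited, so the result is a hint, not proof.

```python
from datetime import datetime
from typing import Optional, Tuple


def modified_after_creation(
    created: Optional[datetime], modified: Optional[datetime]
) -> Optional[bool]:
    """True if modified is later than created, False if not, None if unknown."""
    if created is None or modified is None:
        return None
    try:
        return modified > created
    except TypeError:
        # One value is timezone-aware and the other is not; they cannot be compared.
        return None


def read_pdf_dates(path: str) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (creation date, modification date) from the PDF metadata, if present."""
    from pypdf import PdfReader  # third-party, imported lazily: pip install pypdf

    info = PdfReader(path).metadata  # None when the file has no metadata
    if info is None:
        return None, None
    return info.creation_date, info.modification_date
```

With `pypdf` installed, `modified_after_creation(*read_pdf_dates("agreement.pdf"))` would report on a file (the filename is a placeholder).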

How do I convert a scanned PDF to a normal PDF?

Open the scanned PDF file in Acrobat. From the All tools menu, select Edit a PDF. Acrobat automatically applies OCR to your document and converts it to a fully editable PDF copy.

How to tell if a scanned document has been altered?

Under the 'Description' tab of the document properties, you will find the date and time the document was created and the date and time it was last modified. These should be the same if the PDF has not been modified after creation. If the file is a scanned document that has been emailed to you, look for discrepancies in the fonts.

How do you tell if a PDF is PDF/A-compliant?

To determine whether an input PDF document is PDF/A-compliant, ensure that the DDX document contains the PDFAValidation element within a DocumentInformation element. The PDFAValidation element instructs the Assembler service to return an XML document that specifies whether the input PDF document is PDF/A-compliant.

How can I tell if a PDF has OCR?

There are several ways to check whether OCR has been applied to your PDF. Open the PDF and check whether you can search for a word in the file or select any of the text. If you cannot search the PDF or select text, it is probably just a scanned image.

Can you compare two PDF documents in Adobe Reader?

The Compare Files tool helps you quickly and accurately detect differences between two versions of a PDF. You can compare documents in a side-by-side view or choose single page view to review all changes in your latest PDF document.

What is the AI that compares PDF files?

When you need to compare two documents side by side, iDox.ai is the fastest and simplest way to do it. Just upload your PDF or Word documents and see detailed differences in just seconds! With our platform, you can compare any two versions of your document to ensure that the final is what you expect.

How to compare two PDF documents and highlight differences online?

Use the file selection boxes at the top of the page to select the files you want to compare. Change the settings if necessary. Start the comparison by pressing the corresponding button. PDF24 then processes the files and displays the result so that you can see the differences.

Author: Virgilio Hermann JD