What does OCR Text Recognition not protect?

Frequently asked question for OCR Text Recognition.

It extracts text only; it does not create a searchable PDF layer. Output quality varies with scan resolution, language selection, and font clarity. When the reported recognition confidence falls below 70%, expect word-level errors.
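
The 70% guidance can be applied per word to decide what to review by hand. The following is a minimal sketch, not the tool's actual implementation. It assumes each recognized word arrives as an object with a text field and a confidence field on a 0-100 scale; the RecognizedWord type and the partitionByConfidence name are hypothetical and chosen for illustration.

```typescript
// Hypothetical shape for one recognized word (an assumption, not a confirmed API).
// Confidence is assumed to be on a 0-100 scale.
interface RecognizedWord {
  text: string;
  confidence: number;
}

// Threshold taken from the guidance above: below 70, expect word-level errors.
const CONFIDENCE_THRESHOLD = 70;

// Split words into those that are likely usable as-is and those to review.
function partitionByConfidence(
  words: RecognizedWord[],
  threshold: number = CONFIDENCE_THRESHOLD,
): { trusted: RecognizedWord[]; needsReview: RecognizedWord[] } {
  const trusted: RecognizedWord[] = [];
  const needsReview: RecognizedWord[] = [];
  for (const word of words) {
    if (word.confidence < threshold) {
      needsReview.push(word);
    } else {
      trusted.push(word);
    }
  }
  return { trusted, needsReview };
}
```

Words that land in needsReview can be highlighted for manual correction rather than silently accepted.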

What this does not protect

  • Tesseract.js runs in a Web Worker. Each page consumes ~50-100MB of RAM during processing. Documents over 50 pages may cause memory pressure on devices with less than 4GB free.
  • Handwritten text is poorly supported. Tesseract is designed for printed text. Expect less than 30% accuracy on handwriting.
  • Multi-column layouts are partially supported. Tesseract reads left-to-right by default and may interleave columns on complex layouts.
  • Tables are not preserved structurally. Cell contents are extracted as text, but row/column relationships are lost.
  • It cannot fix compromised devices, accounts, or unsafe sharing channels.
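
The memory figures in the first item above suggest processing long documents in batches. The sketch below is one way to size those batches. It assumes, as a simplification, that memory grows linearly with the number of pages held in memory at once; the constants come from the 50-100MB per page estimate, and the function names (estimateMemoryMB, maxPagesForBudget, chunk) are hypothetical helpers, not part of Tesseract.js.

```typescript
// Per-page estimates from the list above (approximate, in megabytes).
const MB_PER_PAGE_LOW = 50;
const MB_PER_PAGE_HIGH = 100;

// Rough memory range for a given number of pages held in memory together.
function estimateMemoryMB(pagesInFlight: number): { low: number; high: number } {
  return {
    low: pagesInFlight * MB_PER_PAGE_LOW,
    high: pagesInFlight * MB_PER_PAGE_HIGH,
  };
}

// Largest batch that fits the free-memory budget under the pessimistic
// per-page estimate. Always allows at least one page.
function maxPagesForBudget(freeMB: number): number {
  return Math.max(1, Math.floor(freeMB / MB_PER_PAGE_HIGH));
}

// Split a list of pages into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) {
    throw new RangeError("size must be at least 1");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For example, chunk(pages, maxPagesForBudget(2048)) groups pages into batches of up to 20 when roughly 2GB is free. Under these assumptions a 50-page document held in memory all at once lands between 2500MB and 5000MB, which is consistent with the warning for devices with less than 4GB free.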