What does OCR Text Recognition not protect?
It extracts text only; it does not create a searchable PDF layer. Output quality varies with scan resolution, the selected recognition language, and font clarity. When recognition confidence falls below 70%, expect word-level errors.
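The confidence threshold can be applied word by word. The sketch below assumes each recognized word arrives with its text and a confidence score on a 0-100 scale, which is how Tesseract reports confidence. The `OcrWord` interface, `wordsNeedingReview`, and `meanConfidence` are hypothetical names, and the exact fields and nesting of the real Tesseract.js result vary between versions.

```typescript
// Hypothetical word shape: text plus a 0-100 confidence score.
interface OcrWord {
  text: string;
  confidence: number; // 0-100
}

// Threshold from the guidance above: below 70, expect word-level errors.
const REVIEW_THRESHOLD = 70;

// Returns the words whose confidence is strictly below the threshold,
// e.g. so they can be highlighted for manual review.
function wordsNeedingReview(
  words: OcrWord[],
  threshold: number = REVIEW_THRESHOLD,
): OcrWord[] {
  return words.filter((w) => w.confidence < threshold);
}

// Mean confidence across a page; returns 0 for a page with no words.
function meanConfidence(words: OcrWord[]): number {
  if (words.length === 0) return 0;
  const total = words.reduce((sum, w) => sum + w.confidence, 0);
  return total / words.length;
}

// Illustrative input, not real OCR output.
const sample: OcrWord[] = [
  { text: "Invoice", confidence: 96 },
  { text: "N0.", confidence: 58 },
  { text: "2024-117", confidence: 81 },
];

console.log(wordsNeedingReview(sample).map((w) => w.text)); // [ 'N0.' ]
console.log(meanConfidence(sample).toFixed(1)); // 78.3
```

Flagging individual words rather than rejecting a whole page keeps usable text while pointing the reader at the likely errors.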
What this does not protect
- Tesseract.js runs in a Web Worker, and each page uses roughly 50-100 MB of RAM during processing. Documents over 50 pages may cause memory pressure on devices with less than 4 GB of free memory.
- Handwritten text is poorly supported. Tesseract is designed for printed text. Expect less than 30% accuracy on handwriting.
- Multi-column layouts are partially supported. Tesseract reads left-to-right by default and may interleave text from adjacent columns on complex layouts.
- Tables are not preserved structurally. Cell contents are extracted as text, but row/column relationships are lost.
- It cannot fix compromised devices, accounts, or unsafe sharing channels.
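The memory guidance above reduces to simple arithmetic: peak usage is roughly the per-page figure multiplied by the number of pages held in memory at once. The sketch below turns the stated figures (50-100 MB per page, more than 50 pages, less than 4 GB free) into a pre-flight check. `checkMemory` and its `concurrentPages` parameter are hypothetical, and the constants are this section's estimates, not measurements.

```typescript
// Estimates from the guidance above, not measured values.
const PER_PAGE_MB_LOW = 50;
const PER_PAGE_MB_HIGH = 100;
const LARGE_DOC_PAGES = 50; // "over 50 pages"
const LOW_FREE_MEMORY_GB = 4; // "less than 4 GB free"

interface MemoryCheck {
  peakMbLow: number; // optimistic peak estimate in MB
  peakMbHigh: number; // pessimistic peak estimate in MB
  warn: boolean; // true when the document matches the memory-pressure case
}

// concurrentPages (hypothetical): how many pages are processed at the same time.
function checkMemory(
  pageCount: number,
  freeGb: number,
  concurrentPages = 1,
): MemoryCheck {
  // Never count more pages in flight than the document has, and at least one.
  const inFlight = Math.min(pageCount, Math.max(1, concurrentPages));
  return {
    peakMbLow: inFlight * PER_PAGE_MB_LOW,
    peakMbHigh: inFlight * PER_PAGE_MB_HIGH,
    warn: pageCount > LARGE_DOC_PAGES && freeGb < LOW_FREE_MEMORY_GB,
  };
}

// A 120-page scan, 4 pages at a time, on a device with 3 GB free.
const result = checkMemory(120, 3, 4);
console.log(result.peakMbLow, result.peakMbHigh, result.warn); // 200 400 true
```

Such a check only warns; it cannot make a large document fit, so the practical lever is keeping `concurrentPages` low.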
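To see why columns can interleave, the toy example below compares a naive line-by-line reading order with a column-first order. It is not how Tesseract's layout analysis actually works; the `Box` shape and the assumption that the column boundary `gutterX` is already known are simplifications for illustration.

```typescript
// Hypothetical word box: text plus the top-left corner of its bounding box (pixels).
interface Box {
  text: string;
  x: number;
  y: number;
}

// Naive order: top-to-bottom by line, then left-to-right across the full width.
// On a two-column page this alternates between the columns.
function rowMajor(boxes: Box[]): string {
  return [...boxes]
    .sort((a, b) => a.y - b.y || a.x - b.x)
    .map((b) => b.text)
    .join(" ");
}

// Column-first order, assuming the gutter position is known:
// read the whole left column, then the whole right column.
function columnMajor(boxes: Box[], gutterX: number): string {
  const left = boxes.filter((b) => b.x < gutterX);
  const right = boxes.filter((b) => b.x >= gutterX);
  return [rowMajor(left), rowMajor(right)]
    .filter((s) => s.length > 0)
    .join(" ");
}

// Two columns, two lines each.
const page: Box[] = [
  { text: "Alpha", x: 10, y: 0 },
  { text: "one", x: 10, y: 20 },
  { text: "Beta", x: 300, y: 0 },
  { text: "two", x: 300, y: 20 },
];

console.log(rowMajor(page)); // Alpha Beta one two
console.log(columnMajor(page, 200)); // Alpha one Beta two
```

Real pages rarely have a single clean gutter, which is why complex layouts remain only partially supported.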