Forensic Analyzer

Scan a PDF for identity leaks, tracking vectors, and security risks.

Category: Privacy

Processing: On-device

Quota bucket: Core

Open interactive tool: /tools/analyze

How to use Forensic Analyzer

Drop a PDF into the analyzer. Scanning takes 1-5 seconds depending on file size and embedded image count.
Read the risk badge: Critical (GPS, JavaScript), High (author PII, edit history), Medium (software fingerprints), Low (generic metadata).
Expand each finding category to see raw field values. Author, Creator, Producer, and ModDate are the most common identity leaks.
Check the EXIF section — embedded images may contain GPS coordinates, camera serial numbers, and timestamps independent of the PDF metadata.
Use the direct links to the scrubber, the flatten tool, or the privacy pipeline to neutralize what was found, then run the analyzer again on the cleaned output to verify.
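The badge levels above can be read as a lookup from finding type to severity, with the overall badge taken from the most severe finding present. Below is a minimal TypeScript sketch under that assumption. The finding-kind names and the worst-finding rule are illustrative. They are not the analyzer's actual identifiers or logic.

```typescript
// Hypothetical finding kinds; names are illustrative, not the analyzer's real identifiers.
type Severity = "critical" | "high" | "medium" | "low";

type FindingKind =
  | "gps"
  | "javascript"
  | "authorPii"
  | "editHistory"
  | "softwareFingerprint"
  | "genericMetadata";

// Severity per kind, following the badge levels described in the steps above.
const SEVERITY_OF: Record<FindingKind, Severity> = {
  gps: "critical",
  javascript: "critical",
  authorPii: "high",
  editHistory: "high",
  softwareFingerprint: "medium",
  genericMetadata: "low",
};

// Lower rank means more severe.
const RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Assumed rule: the badge shows the most severe finding; null when nothing was found.
function riskBadge(kinds: FindingKind[]): Severity | null {
  let worst: Severity | null = null;
  for (const kind of kinds) {
    const sev = SEVERITY_OF[kind];
    if (worst === null || RANK[sev] < RANK[worst]) {
      worst = sev;
    }
  }
  return worst;
}

console.log(riskBadge(["softwareFingerprint", "authorPii"])); // high
console.log(riskBadge([])); // null
```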

Tips

The most common leak is an Author field containing your Windows username or company email. Microsoft Office typically copies the document's Author property, which defaults to your Office user name, into the PDF when you export.
GPS coordinates in embedded JPEG images are the highest-severity finding: a single photo with EXIF GPS tags can reveal exactly where it was taken.
JavaScript in a PDF can phone home as soon as the document is opened. The analyzer flags it as critical because it enables tracking on document open.
Run analyze → scrub → analyze as a verification loop. The second analysis should show zero or near-zero findings.
Software fingerprints (Creator: Microsoft Word, Producer: LibreOffice) reveal what tools you used. This matters for source protection but is low-severity for most users.
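The Author, Creator, Producer, and ModDate values mentioned in these tips are stored in the PDF Info Dictionary as entries like /Author (jdoe). Below is a minimal raw-byte sketch of pulling them out. It is a simplification that assumes plain literal strings without escaped or nested parentheses, and an Info dictionary that is not inside a compressed object stream. The helper name scanInfoFields is hypothetical. Per the FAQ, the analyzer itself uses pdf-lib for structure parsing.

```typescript
// Sketch only: scanInfoFields is a hypothetical helper, not the analyzer's API.
const INFO_KEYS = ["Author", "Creator", "Producer", "ModDate"];

// Finds entries such as /Author (jdoe) anywhere in the raw bytes.
// Assumes plain literal strings with no escaped or nested parentheses; hex strings
// (<...>) and entries inside compressed object streams are not handled.
// Returns the first occurrence of each key; incrementally updated files can hold several.
function scanInfoFields(bytes: Uint8Array): Map<string, string> {
  // Single-byte decoding: every byte becomes one character.
  const text = new TextDecoder("latin1").decode(bytes);
  const found = new Map<string, string>();
  for (const key of INFO_KEYS) {
    const match = new RegExp(`/${key}\\s*\\(([^()]*)\\)`).exec(text);
    if (match) found.set(key, match[1]);
  }
  return found;
}

const demo = new TextEncoder().encode("<< /Author (jdoe@example.com) /Producer (LibreOffice 7.6) >>");
console.log(scanInfoFields(demo).get("Author")); // jdoe@example.com
```

Values written as hex strings or UTF-16 are skipped by this sketch, so an empty result does not prove the fields are absent.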

Privacy: Your files never leave your browser. All processing runs on-device.

Frequently asked questions

What does Forensic Analyzer check?

It parses the PDF Info Dictionary, XMP metadata, embedded image EXIF headers, and raw byte patterns. Findings are categorized by severity (critical, high, medium, low) and grouped by type (identity, location, tracking, structure).
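XMP metadata is an XML packet that a PDF usually stores in a Metadata stream referenced from the document catalog. The sketch below finds the packet by its <x:xmpmeta> root element and reads simple element-form properties with a regular expression. It assumes the stream is stored uncompressed, which is common but not guaranteed. The helper names extractXmp and xmpProperty are hypothetical. A fuller implementation would use an XML parser, since properties can also appear as attributes or inside rdf:Seq lists.

```typescript
// Sketch only: extractXmp and xmpProperty are hypothetical helpers.
// Assumes the Metadata stream is uncompressed so the packet is visible in the raw bytes.
function extractXmp(bytes: Uint8Array): string | null {
  // Single-byte decoding keeps string indices equal to byte offsets.
  const text = new TextDecoder("latin1").decode(bytes);
  const start = text.indexOf("<x:xmpmeta");
  if (start === -1) return null;
  const endTag = "</x:xmpmeta>";
  const end = text.indexOf(endTag, start);
  if (end === -1) return null;
  // XMP is normally UTF-8, so decode just the packet's bytes as UTF-8.
  return new TextDecoder("utf-8").decode(bytes.subarray(start, end + endTag.length));
}

// Reads a property written in element form, e.g. <xmp:CreatorTool>Word</xmp:CreatorTool>.
// Attribute-form properties and rdf:Seq lists (as used by dc:creator) are not handled.
function xmpProperty(xmp: string, name: string): string | null {
  const match = new RegExp(`<${name}>([^<]*)</${name}>`).exec(xmp);
  return match ? match[1] : null;
}
```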

Is Forensic Analyzer private by default?

Everything runs locally in your browser: pdf-lib parses the document structure, and a byte-level scan looks for EXIF markers. No PDF bytes are uploaded or transmitted. The analyzer runs inside the same privacy boundary as all other tools.
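A byte-level EXIF scan can be sketched in three steps. First, look for a JPEG APP1 marker (FF E1) followed by the Exif identifier. Next, read the TIFF header that comes after it. Finally, check whether the first IFD contains the GPSInfo pointer tag 0x8825. This assumes embedded JPEGs keep their EXIF segments intact inside DCTDecode streams. The function names are hypothetical, and this is an illustration rather than the analyzer's actual code.

```typescript
// Sketch only: function names are hypothetical, not the analyzer's API.
// Assumes embedded JPEGs (DCTDecode streams) keep their APP1/EXIF segments verbatim.

const EXIF_ID = [0x45, 0x78, 0x69, 0x66, 0x00, 0x00]; // "Exif\0\0"
const GPS_INFO_TAG = 0x8825; // IFD0 tag pointing at the GPS IFD

// Read unsigned 16/32-bit integers in the TIFF block's byte order.
function readU16(b: Uint8Array, off: number, le: boolean): number {
  return le ? b[off] | (b[off + 1] << 8) : (b[off] << 8) | b[off + 1];
}

function readU32(b: Uint8Array, off: number, le: boolean): number {
  const v = le
    ? b[off] | (b[off + 1] << 8) | (b[off + 2] << 16) | (b[off + 3] << 24)
    : (b[off] << 24) | (b[off + 1] << 16) | (b[off + 2] << 8) | b[off + 3];
  return v >>> 0; // bitwise ops yield signed 32-bit values; convert to unsigned
}

// True if the TIFF block starting at `tiff` has a GPSInfo entry in IFD0.
function tiffHasGps(b: Uint8Array, tiff: number): boolean {
  if (tiff + 8 > b.length) return false;
  const le = b[tiff] === 0x49 && b[tiff + 1] === 0x49; // "II" = little-endian
  const be = b[tiff] === 0x4d && b[tiff + 1] === 0x4d; // "MM" = big-endian
  if (!le && !be) return false;
  if (readU16(b, tiff + 2, le) !== 42) return false; // TIFF magic number
  const ifd0 = tiff + readU32(b, tiff + 4, le); // offset is relative to the TIFF header
  if (ifd0 + 2 > b.length) return false;
  const count = readU16(b, ifd0, le);
  for (let i = 0; i < count; i++) {
    const entry = ifd0 + 2 + i * 12; // each entry: tag, type, count, value (12 bytes)
    if (entry + 12 > b.length) return false;
    if (readU16(b, entry, le) === GPS_INFO_TAG) return true;
  }
  return false;
}

// Scan every APP1 marker (FF E1) that is followed by the Exif identifier.
function findExifGps(bytes: Uint8Array): { exifSegments: number; withGps: number } {
  let exifSegments = 0;
  let withGps = 0;
  for (let i = 0; i + 10 <= bytes.length; i++) {
    if (bytes[i] !== 0xff || bytes[i + 1] !== 0xe1) continue;
    // Bytes i+2..i+3 hold the segment length; the identifier starts at i+4.
    if (!EXIF_ID.every((v, k) => bytes[i + 4 + k] === v)) continue;
    exifSegments++;
    if (tiffHasGps(bytes, i + 10)) withGps++;
  }
  return { exifSegments, withGps };
}
```

A non-zero withGps count corresponds to the critical GPS finding. Most camera serial numbers and capture timestamps (BodySerialNumber, DateTimeOriginal) live in the separate Exif sub-IFD (pointer tag 0x8769), which this sketch does not follow.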

What does Forensic Analyzer not detect?

It detects metadata and structural risks but does not analyze visible text content. A PDF with 'John Smith' written in the body text will not flag that as a finding — only metadata fields like Author, Creator, and EXIF tags are scanned.

Limitations

Analysis is read-only. It does not modify the PDF in any way — use the scrubber for that.
Encrypted or malformed PDFs fall back to byte-level pattern matching. Coverage is reduced, but EXIF markers and JavaScript signatures are still detected.
Printer tracking dots (Machine Identification Codes) are not detected by the current analyzer. Use the MIC decoder research tool for that; it requires high-resolution page rendering.
The analyzer checks PDF metadata and embedded image EXIF. It does not analyze text content for PII (names, addresses in the visible text).
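A byte-level fallback for JavaScript can be sketched as a search for PDF name tokens such as /JS, /JavaScript, /OpenAction, and /AA (additional actions). PDF encryption applies to strings and streams but not to dictionary keys, so these names usually remain visible in encrypted files. The sketch assumes the names are not packed into compressed object streams and are not obfuscated with #xx escapes (for example /J#53 for /JS). scanRiskyNames and hasScript are hypothetical names.

```typescript
// Sketch only: scanRiskyNames and hasScript are hypothetical names.
type RiskyName = "JavaScript" | "JS" | "OpenAction" | "AA";

// Counts name tokens that signal scripts or automatic actions, straight from raw bytes.
// Misses names inside compressed object streams and #xx-escaped names (e.g. /J#53).
function scanRiskyNames(bytes: Uint8Array): Record<RiskyName, number> {
  // Single-byte decoding: every byte becomes one character.
  const text = new TextDecoder("latin1").decode(bytes);
  const counts: Record<RiskyName, number> = { JavaScript: 0, JS: 0, OpenAction: 0, AA: 0 };
  // A PDF name ends at whitespace, a delimiter character, or the end of input,
  // so tokens like /JSON or /AAPL are not counted.
  const pattern = /\/(JavaScript|JS|OpenAction|AA)(?=[\s()<>\[\]{}\/%]|$)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    counts[match[1] as RiskyName]++;
  }
  return counts;
}

// Script presence is what would map to the critical JavaScript finding.
function hasScript(counts: Record<RiskyName, number>): boolean {
  return counts.JavaScript > 0 || counts.JS > 0;
}
```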

Related tools

Remove Metadata from PDF

Strip metadata, forms, annotations, and risky interactive PDF elements. No upload required.

Flatten to Image PDF

Rasterize every page to destroy all hidden structure: fonts, layers, metadata, EXIF, scripts, forms.

Visual Redaction

Draw redaction boxes on pages and burn them irreversibly by rasterizing.

Privacy Pipeline

Chain multiple privacy operations in sequence with preset workflows for maximum protection.