Benford's Law

In real-world datasets — expense reports, population figures, tax returns — the leading digit "1" appears about 30% of the time. Not the 11% you'd expect if the nine digits were evenly distributed. This logarithmic pattern was first noted by astronomer Simon Newcomb in 1881, rediscovered by physicist Frank Benford in 1938, and is now standard in forensic accounting.

We measure the Mean Absolute Deviation (MAD) between your data's leading-digit frequencies and the expected Benford distribution, then score it against Nigrini's conformity thresholds. The same method the IRS uses.
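The check itself fits in a few lines. This is a minimal sketch, not the tool's actual implementation; function names are illustrative, and the cutoffs in the comment are Nigrini's commonly cited first-digit conformity bands, which may differ from the exact thresholds used here.

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading nonzero digit of a number, or None if there isn't one."""
    s = str(abs(x)).lstrip("0.")
    return int(s[0]) if s and s[0].isdigit() else None

def benford_mad(values):
    """Mean Absolute Deviation between observed and expected digit frequencies.

    Commonly cited Nigrini bands for first-digit tests (an assumption here):
    MAD < 0.006 close conformity, < 0.012 acceptable, < 0.015 marginal.
    """
    digits = [d for d in (first_digit(v) for v in values) if d]
    if not digits:
        return None
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - p) for d, p in BENFORD.items()) / 9
```

Powers of two are a classic Benford-conforming sequence, so `benford_mad([2 ** k for k in range(100)])` lands well inside the conformity band, while nine uniformly distributed digits do not.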

Duplicate detection

Exact-match row hashing catches the obvious copies. We also flag columns where an unusually high percentage of values are identical — which can indicate placeholder data or incomplete records that got duplicated to fill gaps.
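Both checks can be sketched in a few lines, assuming rows arrive as tuples of cell values (names and the field separator are illustrative, not the tool's internals):

```python
import hashlib
from collections import Counter

def row_hash(row):
    """Stable digest of a row for exact-match duplicate detection.

    Cells are joined with a unit separator so ("ab", "c") and ("a", "bc")
    hash differently.
    """
    return hashlib.sha256("\x1f".join(map(str, row)).encode()).hexdigest()

def find_duplicate_rows(rows):
    """Indices of rows whose hash was already seen earlier in the file."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        h = row_hash(row)
        if h in seen:
            dupes.append(i)
        seen.add(h)
    return dupes

def repeated_value_ratio(column):
    """Share of a column occupied by its single most common value."""
    counts = Counter(column)
    return max(counts.values()) / len(column)
```

A `repeated_value_ratio` near 1.0 on a column that should vary (IDs, timestamps, amounts) is the placeholder-data signal described above.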

Outlier analysis

A single outlier method always generates false positives. So we run two — Z-score (parametric, assumes normal-ish data) and IQR (non-parametric, doesn't) — and only raise a flag when both methods independently agree a value is unusual. This cuts the noise substantially.

Data integrity

The boring but critical stuff. Missing values and their distribution across columns. Numbers accidentally stored as text strings. Date columns with three different formats mixed together. Invisible leading and trailing whitespace that breaks lookups. These are the errors that silently break downstream analysis.
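These checks are mechanical enough to sketch directly. The patterns below are illustrative examples of each category, not the tool's actual rule set:

```python
import re

# Three common date layouts; mixing more than one in a column is the red flag.
DATE_PATTERNS = [
    r"\d{4}-\d{2}-\d{2}",       # ISO: 2024-01-31
    r"\d{1,2}/\d{1,2}/\d{4}",   # slashed: 1/31/2024
    r"\d{1,2}\.\d{1,2}\.\d{4}", # dotted: 31.1.2024
]

def integrity_report(column):
    """Count common silent-corruption patterns in a column of raw cells."""
    report = {"missing": 0, "numeric_as_text": 0, "whitespace": 0,
              "date_formats": set()}
    for cell in column:
        if cell is None or str(cell).strip() == "":
            report["missing"] += 1
            continue
        s = str(cell)
        if s != s.strip():                      # invisible padding
            report["whitespace"] += 1
        stripped = s.strip()
        if isinstance(cell, str) and re.fullmatch(r"-?\d+(\.\d+)?", stripped):
            report["numeric_as_text"] += 1      # a number stored as a string
        for i, pat in enumerate(DATE_PATTERNS):
            if re.fullmatch(pat, stripped):
                report["date_formats"].add(i)   # which layouts appear
    return report
```

A cell like `" 7 "` trips two counters at once — whitespace padding and a number stored as text — which is exactly how these problems compound in practice.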

About your data

Your file is parsed in memory on Cloudflare's edge network, analyzed, and the results are sent back to your browser. The file content is not written to disk, not logged, not stored anywhere. When the response is delivered, the data is gone.