Lab 1 publishes the biggest-ever content-level analysis of breached datasets, finding:
- Half of all data breaches contain U.S. Social Security Numbers
- Half of breaches leak bank statements, increasing fraud for employees and customers
- Customer and corporate PII exposed at concerningly high rates, with HR data and customer care data present in 82% and 67% of incidents
- The average attack blast radius has increased by 61% in three years
LONDON, July 22, 2025 (GLOBE NEWSWIRE) -- Lab 1, the AI-powered Exposed Data Intelligence platform, today publishes the biggest-ever content-level analysis of breached datasets. The analysis reveals the monumental risk of fraud to organizations, their employees and customers, with nearly all breached datasets including financial, HR and customer data.
Lab 1 uses AI agents to scrape breached datasets and analyze every file exposed, including unstructured files such as PDFs, emails, spreadsheets, and code files. Although these files are typically overlooked in data breach analysis, the information they contain can be leveraged for sophisticated cyberattacks, social engineering attacks, and fraud against companies and their customers.
Analyzing 141 million files leaked in the public domain from 1,297 data breach incidents, the first annual Anatomy of a Breach Report reveals:
Financial documents are exposing companies and their customers to fraud
Financial documents appear in 93% of incidents and account for 41% of all exposed files. Sensitive financial information types were also highly prevalent, revealing how personal data, as well as commercial information, is being leaked into the public domain. Bank statements, which enable identity fraud, were present in 49% of incidents, and IBANs, which can be used for mandate scams and payment redirection, were included in 36% of breached datasets.
Customer and corporate PII exposed in nearly all breaches
Human Resources data – often containing personally identifiable information (PII), payroll and resumes – appeared in 82% of breaches. Two-thirds (67%) involved communications and records concerning customer service interactions and support. Emails were the most prevalent exposed sensitive information type, leaked in 86% of all data breaches. Perhaps most concerningly, half of all incidents analyzed (51%) included U.S. Social Security Numbers.
Exposure of PII can lead to targeted phishing, identity theft, and regulatory violations under laws like GDPR or the FTC Act, opening organizations up to the risk of substantial fines, legal action, and erosion of customer trust.
Unstructured files are exposing new cyberattack avenues
While exposed in a smaller proportion of incidents, cryptographic keys (SSH and RSA keys), which enable attackers to bypass authentication and access secure systems, were present in 18% of all incidents. Cloud and infrastructure indicators, such as AWS S3 paths and virtual hosts, each featured in around one-fifth of breaches (20% and 23% respectively) and can facilitate data exfiltration or the discovery of unsecured cloud storage endpoints. Code files, which were exposed in 87% of incidents and account for 17% of all exposed files, also introduce vulnerabilities to the Software Bill of Materials by undermining the integrity and trustworthiness of the software supply chain.
Attack blast radius has increased by 61% in three years
The content-level analysis exposes the full blast radius of organizations implicated in these incidents, many of which may have nth-party relations to the breached company and be unaware of their potential exposure. The median exposure across all breaches analyzed was 482 organizations. The median rose by 61% over three years, from 257 organizations in 2022 to 414.5 in 2025.
Robin Brattel, Co-founder and CEO of Lab 1, said: “Rather than focus on mega data dumps of structured and primarily credential-based information, we’ve focused on the huge risks associated with unstructured files that often hold high-value information, such as cryptographic keys, customer account data, or sensitive commercial contracts.
“With cybercriminals now behaving like data scientists to unearth these valuable insights to fuel cyberattacks and fraud, unstructured data cannot be ignored. We’ve refined a scientific approach to analyzing unstructured breach contents and today share our findings, which underline the need to move towards a content-aware approach to breach analysis. Ultimately, organizations must understand what information has been leaked, how it can be used, and who might be affected. And faster than it can be used against them.”
Note for the editor
The dataset used in the Anatomy of a Breach Report comprises 141,168,340 individual file records sourced from 1,297 ransomware and data breach incidents, all of which are in the public domain and were reconstructed from forensic acquisitions of compromised systems. The methodology can be found in the full report.
About Lab 1
Lab 1 is the first platform to apply AI and data science at scale to identify and analyze exposure to data breaches. Its AI Intel Agent continuously scans breached datasets across the surface, deep, and dark web, extracting and categorizing exposed files. These are safely previewed within the Lab 1 Platform, eliminating the need to download potentially dangerous files for lengthy manual analysis. Organizations receive AI-generated alerts of exposure and summaries of the information revealed, enabling them to understand and act on their exposure quickly and securely. Backed by information security leaders from Goldman Sachs, Credit Suisse, UBS, and Revolut, Lab 1 has already analyzed over 160 million exposed files.
For more information, visit https://lab-1.com/