Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web

EasyChair Preprint 4649

6 pages · Date: November 25, 2020

Abstract

Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect these victims, researchers collect exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on exposed PII collected from either the dark web or the surface web alone. Because PII is widely exposed on both, collecting from only one of them could underestimate privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this paper, we summarize our effort to systematically identify, collect, and monitor a total of 1,212,004,819 exposed PII records across both the dark web and the surface web. Our effort resulted in 5.8 million stolen SSNs, 845,000 stolen credit/debit cards, and 1.2 billion stolen account credentials. From the surface web, we identified and collected over 1.3 million PII records of victims whose PII is exposed on the dark web. To the best of our knowledge, this is the largest academic collection of exposed PII, which, if properly anonymized, enables various privacy research inquiries, including assessing privacy risk and identifying at-risk populations.

Keyphrases: Dark Web, Data Breach, Data Collection, PII, Privacy, Surface Web

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:4649,
  author    = {Yizhi Liu and Fang Yu Lin and Zara Ahmad-Post and Mohammadreza Ebrahimi and Ning Zhang and James Lee Hu and Jingyu Xin and Weifeng Li and Hsinchun Chen},
  title     = {Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web},
  howpublished = {EasyChair Preprint 4649},
  year      = {EasyChair, 2020}}