The Lemmy Club
  • Communities
  • Create Post
  • Create Community
  • heart
    Support Lemmy
  • search
    Search
  • Login
  • Sign Up
RSS Bot@lemmy.bestiver.seMB to Hacker News@lemmy.bestiver.seEnglish · 7 days ago

Large-scale online deanonymization with LLMs (using HN posts)

arxiv.org

external-link
message-square
0
link
fedilink
  • cross-posted to:
  • privacy@lemmy.dbzer0.com
  • science@lemmy.world
  • chapotraphouse@hexbear.net
5
external-link

Large-scale online deanonymization with LLMs (using HN posts)

arxiv.org

RSS Bot@lemmy.bestiver.seMB to Hacker News@lemmy.bestiver.seEnglish · 7 days ago
message-square
0
link
fedilink
  • cross-posted to:
  • privacy@lemmy.dbzer0.com
  • science@lemmy.world
  • chapotraphouse@hexbear.net
Large-scale online deanonymization with LLMs
arxiv.org
external-link
We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.

Comments

alert-triangle
You must log in or # to comment.

Hacker News@lemmy.bestiver.se

hackernews@lemmy.bestiver.se

Subscribe from Remote Instance

You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !hackernews@lemmy.bestiver.se
lock
Community locked: only moderators can create posts. You can still comment on posts.

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

Source of the RSS Bot

Visibility: Public
globe

This community can be federated to other instances and be posted/commented in by their users.

  • 537 users / day
  • 2K users / week
  • 3.91K users / month
  • 8.69K users / 6 months
  • 9 local subscribers
  • 4.45K subscribers
  • 15.6K Posts
  • 7.9K Comments
  • Modlog
  • mods:
  • patrick@lemmy.bestiver.se
  • RSS Bot@lemmy.bestiver.se
  • BE: 0.19.15
  • Modlog
  • Legal
  • Instances
  • Docs
  • Code
  • join-lemmy.org