Large-scale online deanonymization with LLMs

Homer_Simpson [they/them]@hexbear.net · 6 days ago


Llituro [he/him, they/them]@hexbear.net · 6 days ago

you should assume that to sufficiently motivated megacorps, palantir, and the u.s. federal government, your best attempts at online anonymity can probably be circumvented by one failure point or another. i’ve taken relative pains to separate my username from my real life, and i know it’s still not really meaningfully anonymous if the right company wants to figure it out. if they want ya, they got ya. everyone should post accordingly.

chgxvjh [he/him, comrade/them]@hexbear.net · 5 days ago

There isn’t that much traffic on hexbear. An ISP could pretty easily correlate the timing of TCP connections to Hexbear with when accounts post stuff.
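To make the timing argument concrete, here is a minimal sketch of that kind of correlation. Everything in it is an assumption for illustration: the function name `timing_match_score`, the 5-second window, and the toy timestamps are hypothetical, and a real observer would have far noisier data.

```python
from bisect import bisect_left


def timing_match_score(connection_times, post_times, window=5.0):
    """Fraction of posts that appear within `window` seconds after
    one of a subscriber's connections (all times in seconds)."""
    conns = sorted(connection_times)
    if not post_times:
        return 0.0
    hits = 0
    for t in post_times:
        # First connection at or after (t - window) ...
        i = bisect_left(conns, t - window)
        # ... counts as a hit if it also happened no later than the post.
        if i < len(conns) and conns[i] <= t:
            hits += 1
    return hits / len(post_times)


# Hypothetical toy data: connection times for two subscribers,
# and the public timestamps of one account's posts.
subscriber_a = [100.0, 1000.0, 5000.0]
subscriber_b = [300.0, 2500.0, 7000.0]
posts = [102.0, 1003.5, 5001.0]

score_a = timing_match_score(subscriber_a, posts)
score_b = timing_match_score(subscriber_b, posts)
```

On a low-traffic site, few subscribers will line up with an account's posts by chance, which is why the score separates the two toy subscribers so cleanly here.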

ComradeRat [he/him, they/them]@hexbear.net · 6 days ago

I always figure there's little point having better opsec than my org (we use google lol 💀)

☂️-@lemmy.ml · edit-2 5 days ago

yes this is known, and not even that new. the way you type here can be linked to the same patterns elsewhere.

and lemmy is a very public forum. some time ago there was news facebook was scraping us, and there is probably more of them doing it.

private encrypted and secure messaging exists for now, but not here and not like this.

chgxvjh [he/him, comrade/them]@hexbear.net · 5 days ago

Was that about Threads adding activitypub?

☂️-@lemmy.ml · 5 days ago

luckily, most good instances defederated from meta.

unluckily, spinning something covert up or scraping the fediverse other ways is perfectly doable.

it leaked they were scraping some instances, but iirc not exactly how they were doing it.

this applies to everything you write though so i dunno how safe we are tbh.

BountifulEggnog [she/her]@hexbear.net · 6 days ago

I wondered about this and had an idea for a (similar but worse) pipeline, very interesting paper.

Wonder when this will be on GitHub and every nerd has a copy running on their computer.

RNAi [he/him]@hexbear.net · 5 days ago

The Gestapo 3.0 will be crowdfunded by the worst people you know

quarrk [he/him]@hexbear.net · 5 days ago

Even if it’s possible, don’t make it easy. Too many people post hyper-specific details here, like “my grandfather with <rareDisease> has an adopted daughter from Bangladesh who became a rural doctor.”

Don’t make things too personal here, even if most of us are friendly

electric_nan@lemmy.ml · 5 days ago

Salt your posts with disinformation as well. Mention things about yourself that aren’t true.

Liketearsinrain@lemmy.ml · 5 days ago

This is the way. I do it by having bad takes on purpose.

Tabitha ☢️[she/her]@hexbear.net · 5 days ago

Stylometric surveillance is already here and does not depend on the target telling the same story twice. I’m not even sure LLMs help the surveillors, but as a community we should look further into adversarial stylometry.
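For anyone unfamiliar with the term, a very rough sketch of a classic stylometric comparison is below: character trigram counts compared by cosine similarity. This is a textbook baseline picked for illustration, not the method from the paper, and the function names `char_ngrams` and `cosine` are just labels chosen here.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing
    and collapsing runs of whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical usage: compare writing samples from two accounts.
sample_1 = char_ngrams("tbh i dunno how safe we are, iirc it leaked")
sample_2 = char_ngrams("iirc that leaked a while ago, dunno tbh")
similarity = cosine(sample_1, sample_2)
```

Note that nothing here looks at what the text says, only at habits like abbreviations, punctuation, and spacing. Adversarial stylometry is about deliberately shifting those habits so a score like this stops being informative.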

TankieTanuki [he/him]@hexbear.net · 6 days ago

How is it possible to validate the results?

BountifulEggnog [she/her]@hexbear.net · edit-2 6 days ago

The paper has several different datasets and explains how they got them, but for their test data they already knew the link existed. I think this one is probably the most relevant for actual attacks. They split accounts, leaving a one-year gap in the post history to simulate an abandoned account, etc., and added some fake profiles that didn’t have a match.

If you mean running this yourself, you can’t, they didn’t post prompts or anything. Just an overview of their pipeline. Sorry, at first I thought you meant how could they validate that the users were the same person.
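The split-account setup described above can be sketched roughly as follows. This is only a guess at the general shape, since the paper's actual code and prompts are not public: `split_account`, `build_benchmark`, the way the gap is centered on the account's lifetime, and treating the unmatched profiles as queries are all assumptions made here.

```python
from datetime import datetime, timedelta


def split_account(posts, gap=timedelta(days=365)):
    """posts: list of (timestamp, text). Returns (early, late) halves
    separated by at least `gap`; posts inside the gap are dropped."""
    posts = sorted(posts)
    if not posts:
        return [], []
    mid = posts[0][0] + (posts[-1][0] - posts[0][0]) / 2
    early = [p for p in posts if p[0] <= mid - gap / 2]
    late = [p for p in posts if p[0] >= mid + gap / 2]
    return early, late


def build_benchmark(accounts, distractors):
    """Build queries, candidates, and ground truth. The link is known
    by construction because both halves come from the same account."""
    queries, candidates, truth = {}, {}, {}
    for name, posts in accounts.items():
        early, late = split_account(posts)
        if early and late:
            queries["q_" + name] = early
            candidates["c_" + name] = late
            truth["q_" + name] = "c_" + name
    for name, posts in distractors.items():
        queries["q_" + name] = posts
        truth["q_" + name] = None  # correct answer is "no match"
    return queries, candidates, truth
```

Any matcher can then be scored against `truth`, including on whether it correctly answers "no match" for the profiles that have none.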

TankieTanuki [he/him]@hexbear.net · 6 days ago

Oh I see, they stripped the usernames and matched the comments. I thought they were claiming to have matched usernames to legal identities.

BountifulEggnog [she/her]@hexbear.net · 6 days ago

They did that too, with Hacker News and LinkedIn accounts, as well as some Anthropic interviewees. I’m less sure how impressive that is, because the accounts were linked by the owner. So they obviously don’t care about opsec, and they’re probably less careful than they otherwise would be. The paper isn’t a super hard read if you’re interested. Guess we’ll all have to see how well this works in practice.

XxFemboy_Stalin_420_69xX [none/use name]@hexbear.net · 6 days ago

:shocked-pikachu:

SnakeEyes [comrade/them]@hexbear.net · edit-2 5 days ago

That just means we gotta start talking like LLMs so it gets matched to millions of other accounts, no?