A new community rule for language model disclosure

taco_shale032@lemmy.ml · 19 hours ago

Voxel@feddit.uk · 8 hours ago

I 100% agree with this proposal. I think it’s a good approach.

Off-topic: May I ask why the “promotion” (quite vague) of closed-source software is forbidden?

SuspiciousCarrot78@aussie.zone · edit-2 11 hours ago

OK, seeing you asked for pushback.

TL;DR: Tool disclosure is a poor proxy for doing your own due diligence.

“Forced disclosure about AI use in projects” sits sorta funny for a privacy based group, doesn’t it? Kinda “Papers, please”. Smells bad.

How would you even verify “did this project use an LLM” anyway? If I don’t disclose, what’s the back up, pistols at dawn? Read the code (if available), or get a third party checker…like an LLM? Do you have capacity to audit? Or is it just “trust me, bro” (which if you’re actually concerned about due diligence, isn’t enough).

More to the point: disclosure tag doesn’t change whether the code is accurate, safe or good. Shitty code is shitty either way, so the tag doesn’t touch the actual harm you’re concerned about.

What it does do is create two classes: labeled projects get extra scrutiny, unlabeled ones get a free pass, “no disclosure, must be hand-written, must be fine.” Backwards. Honest disclosure gets tarnished as slop, staying quiet gets rewarded. (Go check !selfhosted right now for an example.)

Better footing: assume ALL software in 2026 has had AI assistance, and review it on its merits.

There are better quality signals than hand on bible “are you now or have you ever been” oaths or performative humiliation for the FuckAI crowd.

For what it’s worth, I use an LLM to write code because I’ve got osteoarthritis and typing all day isn’t free. But if you think that means logging into Claude and telling it “make this for me, no mistakes”, you couldn’t be more wrong.

I define the project, I pseudo code it with pen and paper (hurts my hands less), I scope every ticket (yes, I make the LLM go thru a 3 stage ticket review), I review outputs, I smoke test and I even call in outside reviewers to spot check sometimes. I’m an absolute bastard to it in QA. I do that because when I’m done, I can stand in front of it and honestly tell you I made this, even if my fingers didn’t type most of it. And if it’s fucked, that’s on me, not “hallucinations”.

So, what box do I tick - “AI-assisted”? “Vibe slop”?

That tells you nothing about who’s accountable or how it was made. It carries no nuance and silently resolves to “ignore this one, a robot wrote it,” … which is backwards for projects where the human did more QA than most “fully human” teams ever do.

As always, ICBW and YMMV.

taco_shale032@lemmy.ml · 11 hours ago

How would you even verify “did this project use an LLM”?

There are different ways. Checking whether a CLAUDE.md, AGENTS.md or SKILLS.md file is present is often enough. Obviously this isn’t bulletproof, but it’s better than no disclosure in my opinion.
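
As a rough sketch of that kind of check (the file list and the function name are my own assumptions, not an established tool), something like this would flag a checked-out repo that ships agent instruction files:

```python
from pathlib import Path

# Assumed list of agent-instruction filenames, compared case-insensitively.
AGENT_FILES = {"claude.md", "agents.md", "skills.md"}


def find_agent_files(repo_root):
    """Return sorted relative paths of agent-instruction files under repo_root."""
    root = Path(repo_root)
    hits = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip git internals; only the working tree is of interest here.
        if ".git" in rel.parts:
            continue
        if path.is_file() and path.name.lower() in AGENT_FILES:
            hits.append(rel.as_posix())
    return sorted(hits)


if __name__ == "__main__":
    import sys

    # Usage: python check_agents.py /path/to/repo
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for hit in find_agent_files(target):
        print(hit)
```

An empty result proves nothing, since these files are trivial to gitignore or delete before publishing, so treat a hit as a hint and not as a verdict.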

disclosure tag doesn’t change whether the code is accurate, safe or good.

I didn’t say it has to be a tag. What I had in mind was a simple disclosure in the post description explaining how you used AI for the project (or just a simple “this project is AI assisted” if you don’t know the extent, e.g. projects that aren’t yours).

I don’t necessarily have an issue with experienced developers using AI to write the code for them; that’s the distinction I meant by “when not used correctly”. I do take issue with inexperienced developers that create privacy related software without proper knowledge of what their code actually does (AKA vibe-coding) and go around promoting it as “privacy-friendly” and “secure” while that may not be the case.

Maybe there are better ways to go about this though, which is partly why I created this post.

SuspiciousCarrot78@aussie.zone · edit-2 9 hours ago

Cmon now…leaving Agents.md in the repo is bush-league :)

You can bet your bottom dollar if the claude.md or agents.md hasn’t been added to the gitignore, then it’s either intentional, or actual slop (which you can more easily tell in 2 seconds of looking at the readme.md).

didn’t say it has to be a tag, what I had in mind was a simple disclosure in the post description explaining how you used AI

Same issue as before though, whether the actual disclosure is a tag or a statement.

I do take issue with inexperienced developers that create privacy related software without proper knowledge of what their code actually does (AKA vibe-coding) and going around promoting it as “privacy-friendly” and “secure” while that may not be the case.

Slop is galling for sure but if we’re talking about trust…well…why trust anyone based on what they say (or don’t say)?

“Trust but verify” means I still verify. If the thing is mission critical or important to you, then you SHOULD verify, always. Hell, if the threat profile is high, sandbox it and sniff the packets it sends.

Personally, I think you having to look at the porn I look at is sufficient punishment for snooping on me :)

Some of this is social engineering. “I have nothing I want to show” works even better when I literally can’t (because X isn’t on my phone or Y doesn’t run on my PC)

Maybe there are better ways to go about this though, which is partly why I created this post.

I think so.

Beyond the obvious slop (which is exceedingly obvious), you’re going to waste a lot of cognitive bandwidth trying to sniff out AI.

May as well assume AI is used by default and then do the due diligence on the privacy aspects that are of concern to you.

That holds true whether the project is hand coded or AI assisted. If it’s important, poke it.

Assume all software is “guilty until proven innocent”

But please don’t fall into the FuckAI mindset because llm=bad.

Most devs aren’t going to perform contrition for AI use to appease a vocal minority. They’re just not. There’s no upside for them and it reads desperate.

I’m happy to tell you if asked because IDGAF if you use my shit or not. If I’m sharing it, it’s free, open source and shared out of love. I have no brand or portfolio I’m trying to boost. If you can’t see the USP, it’s probably not for you - and that’s fine.

It also usually means I made it for me first, so I’m probably not out to steal bitcoin or nudes. Still, do your own due diligence and poke it. I would.

BlackJerseyGiant@lemmy.world · 17 hours ago

Language is, in addition to being a basis for communication, a set of tools for thought. Each word can enable the contemplation of whole concepts. Try to explain or think about time without using the word time, for example. The AI we have is a map of our language, of some of the tools we use for thought. This map is being sold as the territory. AI is being sold as thought.

At a minimum, privacy requires understanding, feeling, and veracity. AI can provide none of these things, being absent of thought as it is, and as such has no place in this space.

FineCoatMummy@sh.itjust.works · 13 hours ago

One way that LLMs harm privacy is through training on, well, everything the tech co can get its hands on. Which can include your posts, and anything you disclosed IN those posts. Not to mention anything you typed into most of the big LLMs on the web.

Once that info is trained into the model, you can’t just go delete it! If it was a file on a disk, in theory you can remove that. OK, sometimes that’s hard in practice, but in theory you can. When it’s baked into model weights, that’s different. You can’t un-bake it into the model!

People have found that commercial LLMs will give back personal info about them. Their phone numbers. Where they work. Sometimes even health info, if somehow the model got trained on that! The model does not 1-for-1 recall everything it got trained on. But that data does get represented in the model, and sometimes can turn up later, accurate or not. LLMs are also good at analyzing unstructured data. So even if you never gave your name, if there are enough tidbits to collect, they can de-anonymize people. I read something about that. I will try to find the link and post it if I can.

I do not think LLMs are 100% bad. They have good uses, valid uses. But an ass ton of risks and drawbacks too! I’m not sure society is ready for it. Or ready for more and more social media being bot posts. And those bots becoming harder and harder to detect.

It’s possible to run some LLMs locally if you have a good GPU. That helps with SOME, not all, just some of the privacy issues. Doesn’t help with many of the other risks tho.

FineCoatMummy@sh.itjust.works · 12 hours ago

I read something about that. I will try to find the link and post

Ha! Found it!

Large-scale online deanonymization with LLMs

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms.