Judge on Meta’s AI training: “I just don’t understand how that can be fair use”

Nemeski@lemm.ee · 2 days ago

Riskable@programming.dev · 2 days ago (edited)

I’m glad the judge is really only considering the use of AI. Because it’s obviously not copyright infringement to train AI with whatever TF materials you want. It’s the output that matters.

Copyright infringement can’t happen at all until the copyrighted material gets distributed somehow. If you make a thousand copies of a song at home you have not violated copyright law. If—however—you share that song with someone else (or the entire Internet) you just violated the copyright of whoever owned that song (assuming it’s just a regular track and not something licensed in some special way).

Is it ethical to pirate a billion books to train AI? I don’t know. Meta never really intended for humans to read those copies. So, to me, it’s not that much different from the “making a thousand copies of a song at home” point. They haven’t deprived the authors of anything at that point.

When the AI is used it may violate an author’s copyright but only if it’s close enough to the original work that a judge would say, “yeah, that’s definitely derivative.”

Randomgal@lemmy.ca · 2 days ago

So you think the people at fault are NOT the billion dollar corporations that stole from all of humanity to create and sell a for-profit product?

But instead the random people who use it?

Are you stupid or just a paid shill?

Riskable@programming.dev · 1 day ago (edited)

OK, let’s get this out of the way: Copying is not the same as stealing. Not in law or ethics.

So let me reword what you wrote to better represent what you’re saying:

“So you think the people at fault are NOT the billion dollar corporations that copied much of humanity’s creative works into their servers to create and sell a for-profit product?”

Let me ask you this: What is the actual consequence of copying something onto a computer? Loading it into RAM. Performing analysis on it. Doing whatever with that data internally, without ever sharing it or creating a product or anything like that.

Imagine that AI doesn’t exist yet and some billion dollar company deems it worth their time to archive the entire Internet’s worth of copyrighted content. They don’t distribute. They don’t share it. They don’t even tell anyone.

What is the actual human consequence of that? There is none. No one was deprived of anything. They have misrepresented no one. They have not created anything at all. No one is reading it. No one is consuming it. It’s just sitting there, on a billion dollar corporation’s servers.

Now let’s change the scenario slightly: Suddenly Mega Corp decides to use it internally, to analyse how all this content is related. They look at all the links and references within it in order to figure out how cheese-related any given bit of content is. They announce the cheese search engine.

Is that a problem? They’re literally storing and indexing all the world’s content on their servers! They didn’t license it! They didn’t ask for permission!

What I’m saying, my argument in its purest form, is that it’s the use of the content that matters. How is it used? Is the use depriving someone of something? Do people lose access to cheese because of the existence of the cheese search engine?

Now let’s take it further: Mega Corp decides to transform the cheese data and allow people to request semi-random cheese recipes. Some of these recipes are nearly identical to patented and trademarked cheese products!

Do cheese makers now have a legal right to sue? Do they have an ethical argument to make?

Maybe.

What I’m saying is that merely collecting the data and screwing around with it is irrelevant. It doesn’t matter until that data is distributed somehow. Until that point it’s just bits on a machine somewhere, not impacting anything.

“But instead the random people who use it?”

Yes! If I make oil paintings for a living and someone asks me to copy someone’s copyrighted work, it’s on me to make sure I don’t do that. Now think about it as a copier: Someone walks up to a Xerox machine and copies a book. Do we sue Xerox for providing that capability?

That’s what’s at stake here: Do we treat the AI like the artist or do we treat it like the Xerox machine?