• 4 Posts
  • 80 Comments
Joined 1 year ago
Cake day: June 12th, 2023

  • And as said they didn’t “train ChatGPT on a piracy site”; the scraping algorithm put some stuff from there in the training data. There is no person doing that.

    “Your honour, my program that I created to slurp up data from the internet, using my paid-for internet connection, into my AI model that I own and control, happened to slurp up copyrighted data… I, um, it’s not my fault it slurped up copyrighted data, even though I put no checks in place on what it was slurping up or from where.”

    That is the argument you are putting forth.

    Do you think any judge/court of law would view that favourably?




  • No it doesn’t, the training data isn’t inside the LLM.

    This is factually incorrect. You can extract the data. How do you think the legal cases are being brought?

    For example

    The model has to contain the data in order to produce works.

    Wholesale commercial copyright infringement, where you’re profiting off of others’ work on a large scale, is a whole different ball game.

    They’re training their models on large amounts of pirated content and profiting off it.

    Of course the rights holders are going to say “wait a minute, why are you making money off my content without my permission? And how much of my work did you pirate to use?”

    You cannot hand-wave away the mass piracy used to train their models, and then distribute said models built on an act of mass copyright infringement.

    Do you not understand the basics of the law?

    It’s idiotic to think that it’s reasonable to demand such a thing.

    Again, the law is the law. If they mass-pirate a bunch of media, and the model then contains chunks of it, they are breaking the law.

    I can’t believe this is a hard concept for someone to understand.