• Melllvar@startrek.website
    English
    arrow-up
    17
    arrow-down
    7
    ·
    1 year ago

    I’m sympathetic to the NYT, even if it’s not reproducing their IP verbatim.

    AI companies need to acknowledge that their LLMs would be worthless without training data and compensate/credit the sources appropriately.

    • SexyVetra@lemmy.world
      arrow-up
      8
      ·
      1 year ago

      Thanks for reading the article and addressing the claims instead of making up stuff to be mad about…

      Oh wait… 🙄

    • Urist@lemmy.blahaj.zone
      arrow-up
      6
      ·
      1 year ago

      Large language models are just like humans.

      …humans don’t accidentally plagiarize whole articles. They also understand the difference between theft and fair use, and AI has been shown to not respect that distinction multiple times. You can also sue humans for damages when they steal from you. Apparently LLMs are immune to legal liability because oopsie poopsie mistakes happen uwu.

      LLMs are cool and useful, but if they’re harming the data sources they wouldn’t exist without, shouldn’t we do something?

      • Axiochus@lemmy.world
        arrow-up
        5
        ·
        1 year ago

        I’ve been teaching academic writing for the last ten years and would strongly object to your first two assertions 😄

        • Urist@lemmy.blahaj.zone
          arrow-up
          1
          ·
          edit-2
          1 year ago

          Lmao yeah, fair enough.

          Edit: I think the important word is “accidentally” on that first point. 😉

  • givesomefucks@lemmy.world
    English
    arrow-up
    17
    arrow-down
    12
    ·
    1 year ago

    It’s not just that it circumvents the paywall, it makes up random nonsense and then claims the NYT said it.

    I’ve never got why people don’t see this about AI. When it “works”, it’s just spitting out what a human was paid (a very low wage) to write; when it has to come up with something that hasn’t been written, it just slaps nonsense together.

    It’s not real AI, it’s just a next-generation search engine that gives unreliable results.

    You just don’t notice if you don’t already know what you’re asking.

    • Hotzilla@sopuli.xyz
      arrow-up
      3
      ·
      1 year ago

      Even though these LLMs work by just figuring out the next word (token) that makes sense, they are still able to generate things that no human has ever written before. They aren’t just copy-pasting stuff together.
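
      A toy sketch of that point, under heavy assumptions: a bigram table stands in for the model (a real LLM is a neural network conditioned on far more context), and the three-sentence corpus and the helper names `build_bigrams` and `generate` are made up for illustration. Even so, picking one next token at a time can produce sentences that never appear in the training text.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus; real LLMs train on vastly more text.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

def build_bigrams(sentences):
    """Map each token to the list of tokens seen right after it."""
    table = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev].append(nxt)
    return table

def generate(table, rng, max_tokens=12):
    """Repeatedly pick a next token given only the previous one."""
    token, out = "<s>", []
    for _ in range(max_tokens):
        token = rng.choice(table[token])
        if token == "</s>":  # end-of-sentence marker: stop generating
            break
        out.append(token)
    return " ".join(out)

table = build_bigrams(CORPUS)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = {generate(table, rng) for _ in range(200)}

# Sentences the sampler produced that are not verbatim in the corpus.
novel = sorted(s for s in samples if s not in CORPUS)
print(novel[:5])
```

      Every adjacent word pair in the output was seen in training, yet whole sentences such as "the dog" followed by a different ending can be new combinations. This says nothing either way about intelligence; it only shows that next-token prediction is not the same as copy-pasting.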

      I use GPT-4 on a daily basis for coding, and the way it spits out complex code templates/snippets that are unique to the problem is just not possible without the model having some level of intelligence. Of course it hallucinates now and then, but so do most coders.

  • mindbleach@sh.itjust.works
    arrow-up
    4
    ·
    1 year ago

    Never gonna happen.

    The NYT might win some money based on what Microsoft published, but only to the same extent as if a human wrote that and Microsoft published it. Copyright will never be an issue for training data because training is just scanning text and guessing the next token. Consuming an entire library to make up anything you ask for is pretty goddamn transformative.

    Oh, does the model know the names of characters in a popular book? So do Google and Wikipedia. Try framing a law that’s cool with Google having a whole searchable plain-text copy of a book, so it can go ‘this book?’ when you search for a quote, but forbids OpenAI from having the essence of that book distilled somewhere in its terabyte of inscrutable numbers.

    This fight is over.