cross-posted from: https://lemmy.world/post/76533

One of the arguments made for Reddit’s API changes is that Reddit is now the go-to place for LLM training data (e.g. for ChatGPT).

https://www.reddit.com/r/reddit/comments/145bram/addressing_the_community_about_changes_to_our_api/jnk9izp/?context=3

I haven’t seen much discussion around this and would like to hear people’s opinions. Are you concerned about your posts being used for LLM training? Do you not care? Would you prefer that your comments be available to train open-source LLMs?

(I will post my personal opinion in a comment so it can be upvoted or downvoted separately.)

  • msage@programming.dev
    1 year ago

    Scraping open content is OK. Search engines have always done that; it’s their main job.

    LLMs wouldn’t exist without large inputs, hehe, and the internet is a good source of a large volume of language, most of which even makes sense.

    Setting aside their bogus claims, I don’t think Reddit should be against LLMs. I do hope, at least, that GitHub doesn’t share private and licenced repos.