cross-posted from: https://lemmy.world/post/76533

One of the arguments made for Reddit’s API changes is that Reddit is now the go-to source for LLM training data (e.g. for ChatGPT).

https://www.reddit.com/r/reddit/comments/145bram/addressing_the_community_about_changes_to_our_api/jnk9izp/?context=3

I haven’t seen a whole lot of discussion around this and would like to hear people’s opinions. Are you concerned about your posts being used for LLM training? Do you not care? Do you prefer that your comments are available to train open source LLMs?

(I will post my personal opinion in a comment so it can be up/down voted separately)

  • flibbertigibbet@feddit.de
    1 year ago

    I think the claim is nonsense. If that were their concern, they would instead change the usage agreement and maybe take some of the LLM companies to court.

    What they actually did was everything in their power to drive mobile users to their own app. They want old-fashioned user tracking data for advertising and resale, together with more in-app ads.

  • msage@programming.dev
    1 year ago

    Scraping open content is OK. Search engines have been doing that all along; it’s their main job.

    LLMs wouldn’t exist without large inputs, hehe, and the internet is a good source for a big volume of language, most of which even makes sense.

    Setting aside their bogus claims, I don’t feel like Reddit should be against LLMs. At least I hope GitHub doesn’t share private and licenced repos.

  • jmk1ng@programming.dev
    1 year ago

    I think Reddit does have a legitimate argument that the scales have tipped, and that it isn’t fair for Reddit to eat the costs of “whales” abusing its APIs for for-profit use cases without being compensated at all.

    3P apps using the API at no cost while simultaneously monetizing Reddit’s content by showing their own ads does seem to be taking advantage.

    That said, the way Reddit approached this was so scorched-earth and boneheaded.

    For example, Reddit gets tens of millions of dollars’ worth of free content moderation from volunteers. The moderators of all their biggest subreddits rely on 3P moderation tools since Reddit’s own are so poor.

    So with the new API policy, they’re asking their unpaid moderators to PAY them for the privilege. It’s such a slap in the face.

    Finally, to address the original question: Reddit should absolutely block API consumers who are just training their glorified chatbots to regurgitate plagiarized content.