• Saryn@lemmy.world
    arrow-up
    74
    ·
    2 months ago

    Content scraping is harming the information business in ways that could not have been foreseen.

    What an absolutely ridiculous thing to say.

    • REDACTED@infosec.pub
      English
      arrow-up
      14
      arrow-down
      2
      ·
      2 months ago

      To be fair, the archive did get heavily abused as a way to simply read without paywalls. I know this is a controversial opinion, but seeing comments on other threads like “Remember to support news media”, then “use archive to bypass paywalls”, then anger towards said companies for caring about getting paid or growing, makes one question where exactly Lemmy draws the line between pirating and paying for content. Or are we simply against sites like 404Media altogether, just because of paywalls?

      • Saryn@lemmy.world
        arrow-up
        13
        ·
        2 months ago

        That’s not the point. The point is content scraping (and crawling) is the cornerstone of the contemporary information environment. It’s how we got to this technological paradigm in the first place.

        This whole “people are bypassing paywalls” is a badly evidenced non-issue, and all too convenient. What these companies are really saying is “Content scraping is bad when others do it. Only I and other big fish get to do it and profit billions out of it. Fuck ordinary citizens. Fuck everyone and everything but me and my dreams of endless wealth and power.”

        To be fair.

        • gagcar@lemmus.org
          English
          arrow-up
          3
          arrow-down
          1
          ·
          2 months ago

          You say bypassing paywalls is a non-issue, but it is basically the only use I have seen people recommend it for on social media. You can have your problems with data harvesting, but don’t pretend that getting around paywalls was not what the average individual user was using it for.

  • CombatWombat@feddit.online
    English
    arrow-up
    45
    arrow-down
    1
    ·
    2 months ago

    I’m certain they’ve wanted to do this for a long time, and AI is a convenient way to justify it, rather than admitting they don’t want humans using it to circumvent the paywall. It does solidify for me personally that the LA Times is the paper of record for the United States going forward, rather than the New York Times.

    • gAlienLifeform@lemmy.world
      arrow-up
      16
      ·
      2 months ago

      Unfortunately, the LA Times also blocks the Internet Archive. I’d recommend PBS, NPR, ProPublica, or some other nonprofit organization for your US paper of record.

      • CombatWombat@feddit.online
        English
        arrow-up
        7
        ·
        2 months ago

        Ugh. Thanks for the heads-up. I’ve definitely posted archive links before without noticing they’re blocked. PBS and NPR have really gone downhill with the budget cuts. ProPublica is great, but their coverage is pretty narrow, so there are a lot of stories they don’t cover at all. It’s getting harder and harder to find a quality source.

        • cecinestpasunbot@lemmy.ml
          English
          arrow-up
          2
          ·
          2 months ago

          Unfortunately, I think most quality sources with broad coverage aren’t free. Even the paid sources almost always have a corporate bias. Of those, the Financial Times probably editorializes the least. Beyond that, I think you just have to find independent journalists or outlets with a narrower investigative focus that you can trust.

    • hector@lemmy.today
      arrow-up
      7
      ·
      2 months ago

      I just got a gift subscription to the NYTimes, my first since I quit in 2018, and it’s really gone downhill. I am learning about more big scoops from The Guardian through Lemmy posts than I see in their paper. I think Israel’s final solution for Gaza broke their brain: they had an identity crisis and sided with Israel and fascism over all the fourth estate democracy mumbo jumbo.

      They haven’t broken a single big story that I recall in the past year. Not a single one. Even The Wall Street Journal published Epstein’s birthday letter from the president. The NYTimes gave up; they are no longer the paper of record. Whatever their problems before, they covered events more thoroughly and had the courage to break big stories, and now they don’t.

      • teslekova@sh.itjust.works
        arrow-up
        3
        ·
        2 months ago

        That’s actually pretty sad. It’s also a serious problem for the USA. The NYT, for all its faults, really was the best one.

  • green_goglin
    arrow-up
    35
    ·
    2 months ago

    Nobody tell the NYT that you can add another “.” after “.com” to bypass their paywall.

  • tackleberry
    arrow-up
    28
    ·
    2 months ago

    Fuck Reddit. That website has been selling our data and using it to train AI… I say fuck 'em

      • tackleberry
        arrow-up
        4
        ·
        2 months ago

        Great catch! You can actually see the AI slop when it pops up. Reddit is dead, and you should delete your data from that cesspool.

    • Buddahriffic@lemmy.world
      arrow-up
      4
      ·
      2 months ago

      FYI, any data on Lemmy can be used for the same purpose, for free. The federation infrastructure can even be used to give AI models more direct access than Reddit is likely giving them. I mention this in case anyone assumes that being community-run keeps the data away from those companies. It isn’t being sold, but the same entities can access it if they want it.

        • Buddahriffic@lemmy.world
          arrow-up
          2
          ·
          2 months ago

          Oh yeah, I’m not saying they are generally equivalent, just in that one particular aspect: access to comment data for any purpose.

          • brucethemoose@lemmy.world
            arrow-up
            3
            ·
            2 months ago

            Yep.

            TBH I think it’s kind of silly for the Fediverse to try and block scraping, as long as that scraping isn’t effectively a DDoS. It’s public.

  • M0oP0o@mander.xyz
    arrow-up
    24
    ·
    2 months ago

    A few days ago, while looking into Americans leaving loaded firearms in ovens, I noticed that we are losing archived news. I would find an article or story that is just missing now; all that remains is a headline linking to nowhere. And I have seen this trend everywhere: we are losing knowledge, for no other reason than the possibility of an extra dollar at some point. Take this and mix in the overwhelming amount of LLM-generated bullshit pretending to be information, tailored to people’s perceived interests (if you live in a religious area, for example, you see more religious bullshit), and we have almost inescapable silos.

    I don’t think I need to explain how dangerous this is.

    • AlexLost@lemmy.world
      arrow-up
      19
      ·
      2 months ago

      Remember to support your local library. Only physically written words are going to be safe in the coming age.

      • M0oP0o@mander.xyz
        arrow-up
        6
        ·
        2 months ago

        As someone who has been on my local library board… I’ve got bad news for you on that one. Libraries are culling books like never before, facing licensing issues like never before, and facing funding issues like never before.

  • SpicyLizards@reddthat.com
    arrow-up
    10
    arrow-down
    3
    ·
    2 months ago

    Buuuut they all say that we need to donate to save free speech! It can’t be a lie, right?

    Mainly pointing at The Guardian here, as they are sliding down the same slope that the other two slops did.