• skisnow@lemmy.ca
    3 days ago

    What’s currently pickling my noggin is how I’ve been seeing “new model smashes benchmarks by an unexpectedly huge factor” headlines every month for the last two years, and yet somehow, no matter how many models suddenly score 99% on tasks they used to score 20% on, I’ve not actually found the damn thing any more helpful or reliable than it was in 2023 for anything real-world. I’m starting to think all these supposed breakthroughs they keep having are being hugely overstated.