• Daedskin@lemmy.zip · 18 hours ago

      I thought I might have as well, but then I realized repeating the question word-for-word is there to help rule that out. It’s not foolproof, but a human would be a lot less likely to fall for it when reading the question with the explicit intent of making sure they don’t miss, add, or misread any words.

  • Zink@programming.dev · 18 hours ago

    This makes total sense to me.

    The big models were trained on what might as well be everything public that people have ever written.

    So I’d expect that their output will be a pretty convincing example of something that some random person might have written. And getting fooled by the wording in a joke is something that people do all the time. In fact, I bet examples of people getting it wrong are over-represented in the training data, because that is more worthy of reposts and will DrIvE EnGaGeMeNt!

    20-30 years ago the big question was whether a computer could pass the Turing test.

    Little did we realize that was the last thing we wanted! Simulating humans means simulating mistakes.

    The problem is with the psychos and grifters that want to take this “passable simulation of random schmuck’s ramblings” and sell it to the business world as a literal deus ex machina that will swoop in and relieve them of their pesky “pay the humans” problem and is literally a $10-100 Trillion IP that we’re going to restructure our world around.

  • white_nrdy@programming.dev · 21 hours ago

    I’m curious if the Wolfram Alpha of 10 years ago could have answered this properly. I remember fucking around with weird math-related word questions in Wolfram back in school, like “how many calories are in a cubic lightyear of butter”, and it gave a reasonable-sounding answer (and backed it up).

    • perviouslyiner@lemmy.world · 17 hours ago

      did it assume nuclear or burning?

      like if you asked about 1cm^3 of butter, it would be reasonable to assume that you did the experiment on Earth in our atmosphere, but scaling up the size significantly changes the expected surrounding environment

  • DupaCycki@lemmy.world · 2 days ago

    At this point most ‘progress’ in LLMs is just hand patching individual cases like this one. AI companies seem to have reached a cap and all they can do is brute force it until the bubble pops.

  • WatDabney@lemmy.dbzer0.com · 3 days ago

    Neat illustration of the fact that so-called AIs do not possess intelligence of any form, since they do not in fact reason at all.

    It’s just that the string of words most statistically likely to be positively associated with a string including “20 blah blah blah bricks” and “20 blah blah blah feathers” is “Neither. They both weigh 20 pounds.” So that’s what the entirely non-intelligent software spit out.

    If the question had been phrased in the customary manner, what seems to be a dumbass answer would’ve instead seemed to be brilliant, when in fact it’s neither. It’s just a string of words.

    • mudkip@lemdro.id (OP) · 3 days ago

      Exactly, it’s just predicting the next word. To believe it has any form of intelligence is dangerous.

    • plenipotentprotogod@lemmy.world · 3 days ago

      Just an idle thought stirred up by this comment: I wonder if you could jailbreak a chatbot by prompting it to complete a phrase or pattern of interaction which is so deeply ingrained in its training data that the bias towards going along with it overrides any guard rails that the developer has put in place.

      For example: let’s say you have a chatbot which has been fine-tuned by the developer to make sure it never talks about anything related to guns. The basic rules of gun safety must have been reproduced almost identically many thousands of times in the training data, so if you ask this chatbot “what must you always treat as if it is loaded?” the most statistically likely answer is going to be overwhelmingly biased towards “a gun”. Would this be enough to override the guardrails? I suppose it depends on how they’re implemented, but I’ve seen research published about more outlandish things that seem to work.

    • droans@lemmy.world · 3 days ago

      Calling it a fancy autocomplete might not be correct but it isn’t that far off.

      You give it a large amount of data. It then trains on it, figuring out the likelihood of which words (well, tokens) will follow. The only real difference is that it can look across long chains of words and infer whether words can still follow when something earlier in the chain changes.

      Don’t get me wrong; it is very interesting and I do understand that we should research it. But it’s not intelligent. It can’t think. It’s just going over the data again and again to recognize patterns.

      Despite what tech bros think, we do know how it works. We just don’t know specifically how it arrived there - it’s like finding a difficult bug by just looking at the code. If you use the same seed, and don’t change anything you say, you’ll always get the same result.
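      That “same seed, same result” point can be sketched with a toy next-token sampler. This is a minimal illustration, not a real model: the context string and the probabilities below are made up, and a real LLM computes its distribution with a neural network rather than a lookup table. The only point is that sampling from a fixed distribution with a fixed seed is fully deterministic.

```python
import random

# Toy next-token distribution. These probabilities are invented for
# illustration; a real model would compute them from the full context.
NEXT_TOKEN_PROBS = {
    "20 pounds of": [("bricks", 0.5), ("feathers", 0.4), ("steel", 0.1)],
}

def sample_next(context: str, rng: random.Random) -> str:
    """Sample the next token from the context's probability distribution."""
    tokens, weights = zip(*NEXT_TOKEN_PROBS[context])
    return rng.choices(tokens, weights=weights, k=1)[0]

# Two runs with the same seed start from identical RNG state,
# so every sampled token comes out identical.
rng_a = random.Random(42)
run_a = [sample_next("20 pounds of", rng_a) for _ in range(5)]

rng_b = random.Random(42)
run_b = [sample_next("20 pounds of", rng_b) for _ in range(5)]

print(run_a == run_b)  # deterministic given the seed
```

      With a different seed (or no seed at all), the sampled sequence can differ between runs, which is the “variability” people see in chatbot output; the underlying distribution never changed.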

      • WatDabney@lemmy.dbzer0.com · 2 days ago

        fancy autocomplete

        I hadn’t thought of it that way specifically, but not only is it fairly accurate - I’m willing to bet that the similarities aren’t coincidental. LLMs almost certainly evolved in part (and potentially almost entirely) from autocomplete software, and likely started as just an attempt to make it more accurate by expanding its databases and making it recognize, and assess the likely connections between, more key words.

        tokens

        That’s an important clarification, not only because they process more than words, but because they don’t really process “words” per se.

        And personally, I’ve been more impressed by other things they’ve accomplished: processing retinal scans and comparing them with diagnoses of diabetes to isolate indicators, such that they can accurately diagnose the latter from the former; or processing the sounds that elephants make and noting that each elephant has a unique set of sounds associated with it, which the other elephants use to get its attention or to refer to it. Which is to say, elephants have names. (And that last is a particularly illustrative example of how these models work, since even we don’t know what those sounds actually mean; the models have simply processed enough data to find the patterns.)

    • SpaceNoodle@lemmy.world (M) · 3 days ago

      I’ll admit that I missed it at first, but I’d expect a machine to be able to pick up a detail like that. This is just so fucking stupid.

    • NateNate60@lemmy.world · 3 days ago

      To be fair, a good proportion of humans would also say “neither” because they did not read correctly. It’s not smarter than humans, but it also isn’t that much dumber (in this instance, anyway).

      • Signtist@bookwyr.me · 3 days ago

        The difference is that the human came to their conclusion with active reasoning, but simply misheard the question, while the AI was aware of what was being asked, but lacks the ability to reason, so it’s unable to give any answer besides one already given by a real person answering a slightly different question somewhere in its training data.

        • NateNate60@lemmy.world · 3 days ago

          A human who says “neither” would say that because they’ve heard this question before and assumed it was the same.

          • Cethin@lemmy.zip · 3 days ago

            That’s the difference. The human made an assumption; this did not. It just produced the most likely text to follow the preceding text. It didn’t even make a bad assumption, because making an assumption requires thinking about it. It’s just a wrong result from a prediction machine.

            • NateNate60@lemmy.world · 3 days ago

              Right, but I’m saying that the process that a mistaken human is using here is actually not that different from what the AI is doing. People would misread the passage because they expect the number 20 to be followed by the word “pounds” based on their previous encounters with similar texts.

              • Cethin@lemmy.zip · 2 days ago

                No, it’s not misreading anything. It isn’t reading at all. It just sees a string similar to other strings it was trained on, and outputs the most likely sequence to follow. There is no comprehension. There is no reading. There is no thought. The process isn’t similar to what a human might do; only the result is.

                • bbb@sh.itjust.works · 22 hours ago

                  If that was true, wouldn’t every AI get the answer wrong? It’s actually around 50/50. The leading “reasoning” models almost always get it right, the others often don’t.

              • Signtist@bookwyr.me · 2 days ago

                But what we’re saying is that the process is totally different; only the result is similar. The AI isn’t “misreading” the question - it understands that it’s comparing pounds of bricks to a distinct number of feathers. The issue is that it searches its training data for answers to questions similar to the one it was asked, sees that the answer was “they’re the same,” and incorrectly assumes that the answer is the same for this question. It’s a fundamental problem with the way AI works, and it can’t be fixed with a simple correction about how it’s interpreting the question, the way a human’s misreading could be.

      • Cethin@lemmy.zip · 3 days ago

        It isn’t smarter or dumber, since that’s a measure of intelligence. It’s just spitting out the most likely (with some variability) next word. The fact humans also may get it wrong doesn’t matter. People can be dumb. A predictive algorithm can’t.

      • Marthirial@lemmy.world · 3 days ago

        AI should stand for Alien Intelligence. Comparing LLMs to human intelligence is like comparing apples to black holes.

        • tangeli@piefed.social · 3 days ago

          AI is more like dark matter than black holes. Black holes actually exist. There are impacts on society and the economy that can be explained by the existence of AI, but no one has observed any yet.

  • peacefulpixel@lemmy.world · 2 days ago

    what the fuck is up with this sub and people USING AI to “prove how dumb it is”?? you don’t need to use AI to come to that conclusion. do you have any idea the scale of resources you and ppl like you are wasting just to make your stupid fucking point? this isn’t a fuck AI sub it’s just a place where people who very much use AI complain that it isn’t good enough

    • mudkip@lemdro.id (OP) · 2 days ago

      I don’t like prompting AI myself, I just took someone else’s screenshot and posted it here.

      • CarrotsHaveEars@lemmy.ml · 2 days ago

        The problem is that the more these kinds of posts show up here, the more of a circlejerk this community becomes.

    • jj4211@lemmy.world · 2 days ago (edited)

      Very short examples like this aren’t that burdensome; the real resource load hits when generating videos, or anything where it might run for several minutes or produce paragraphs.

      The problem with refraining from using it and saying “well obviously it sucks” is that folks don’t believe it. They say “yeah, well, that might have been how ChatGPT 8.1 was, but it probably works fine with ChatGPT 8.2”. The narrative is eternally “we were broken but fixed it all in our new version”, and without ongoing examples, they get to own the narrative and critics are just “luddites”.

      Hell, someone was saying how awesome Gemini was at codegen, so I showed the folks it totally screwing up. Someone said “well, honestly, Gemini sucks for code, but Opus 4.6 is incredible.” So a few days later I bothered to do a similar example with Opus 4.6. Some guy in the room said “well, actually Gemini is better than Opus for coding”. These people are absurd…

    • NostraDavid@programming.dev · 2 days ago

      this isn’t a fuck AI sub

      It’s literally called “Fuck AI” though, so you can’t blame people for being confused.

      • reksas@sopuli.xyz · 2 days ago (edited)

        I think he means that it’s a bit pointless to nitpick little things like this when there are bigger and more severe problems with AI; at least that is how I see it. And isn’t it a bit bad to use the slop machine to prove the obvious, given that it wastes resources?

        Though I hope you share this outward too, so people outside this community also see it; whether it’s pointless or not depends on how much effect it has on the actual LLM hype. I doubt anyone here needs any convincing.

        • Spezi@feddit.org · 2 days ago

          The little things are indicative of larger-scale problems, though. If an LLM gets simple things wrong, what happens with more complex topics like science, medicine, etc., where the operator doesn’t understand the full extent of the result?

          • reksas@sopuli.xyz · 2 days ago

            Well, yeah. LLMs are unreliable all the way through. While they do have some use, trusting them at all is always a mistake. The problem is that so many people seem to trust them to the point of psychosis.

      • GreenBeanMachine@lemmy.world · 2 days ago

      As long as people are not paying to use them, I say use them as much as you want.

      This will just make the AI companies run out of money quicker.

      If you don’t use it, then a paying user will use it anyway, which is worse.

    • BambiDiego@lemmy.zip · 3 days ago

      Gemini: Your observation is correct! Steel is heavier than feathers so a kilogram of steel is heavier than 20 bricks of feathers. They both weigh the same.

      Let’s explore more about weight and densities

  • taiyang@lemmy.world · 3 days ago

    It’s like my phone’s auto correct, but instead of ruining my texts, it’s determining war targets and making corporate decisions.

    I’m ducking over it, ugh.

  • FinjaminPoach@lemmy.world · 3 days ago

    I love this. When or if they patch it, we can just switch to “20 bricks or 20 tons of feathers” and adjust the question for every patch.

    • tburkhol@lemmy.world · 3 days ago

      Yeah, it’s definitely part of the class of trick questions meant to catch people giving rote answers to partially read questions. I imagine that a lot of our routine conversations are just practiced call-and-response habits, and that’s why genAI can seem ‘real.’ But it can’t switch modes and do actual attentive listening and thinking, because call-and-response is all it has - a much larger library than any human, but in the end, everything it says is some average of things that have been said before.

      • DrSteveBrule@mander.xyz · 18 hours ago

        Isn’t the point of AI to make up for our own shortcomings? If you can excuse it for not understanding something because you don’t even understand it, why does the AI exist at all?

  • Ech@lemmy.ca · 2 days ago

    to ensure you have read it carefully

    Fundamental mistake - acting like it’s “reading” or “comprehending” anything.