• loathsome dongeater@lemmygrad.ml
    1 day ago

    Does anyone use local LLMs? I don’t use LLMs myself, just the occasional shooting of the shit with z.ai, but in mainstream discussion local LLMs are almost never brought up except as a potential hedge against the AI bubble bursting, usually by people who have used local models for less than five minutes in their entire lives.

    The hardware requirements make local models unlikely for most people. Everyone who talks about trying (not using) local models seems to have a MacBook Pro. If the bubble bursts, the future will probably be large open source models that can be vendored by anyone.

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
      1 day ago

      I run local models on a MacBook Pro, incidentally. I find a 32-billion-parameter model can do a lot of useful stuff. Progress on making models smaller and faster has been very rapid, and I fully expect we’ll reach a point within a few years where you can run the equivalent of current frontier models on a local machine. On top of that, we’re seeing ASIC chips being developed that implement the model directly in hardware. These could become similar to GPUs: chips you just plug into your computer.

      The tech industry has gone through many cycles of going from mainframe to personal computer over the years. As new tech appears, it requires a huge amount of computing power to run initially. But over time people figure out how to optimize it, hardware matures, and it becomes possible to run this stuff locally. I don’t see why this tech should be any different.

      • CriticalResist8@lemmygrad.ml
        1 day ago

        That ASIC chip prototype is pretty impressive. You can try it at https://chatjimmy.ai/ without an account; ask it to write something big like an essay or guide. Literally the longest you’ll wait is to get connected to the API, because the answer itself appears instantly.

        The only limitation right now is that they put a small Llama 8B model on the chip, but it’s a prototype and proof of concept, of course. I’m sure China will soon print a full DeepSeek model on such a chip lol.

        Right now there isn’t much interest in making AI more efficient to run, but yeah, there’s no reason we won’t find advances there. China is already doing a lot to squeeze models into smaller hardware.

        I don’t run LLMs locally because what I’m limited to isn’t great (context size especially), but the way things are going, I think we’ll definitely start to see open options open up. If only because academia requires it.

        • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
          1 day ago

          That’s what I’m thinking too. There’s no reason you couldn’t make a chip like this for a full-blown DeepSeek model, and then when new models come out you just print new chips for them. The really nice part is that their approach doesn’t need DRAM either, because the state of each transistor acts as memory; it just needs a bit of SRAM, which we don’t have a shortage of.

          I’m fully convinced the whole AI-as-a-service business model is going to be very short-lived. Ultimately, nobody really likes their data going out to some company, or having to pay subscription fees to use the models. If we start getting these kinds of specialized chips, they’re going to be a game changer.

      • Assian_Candor [comrade/them]@hexbear.net
        1 day ago

        I was surprised to learn I could run SAM2 locally, with acceptable inference times, on an upgraded T470 with 32GB of RAM. It made me question the wisdom of shelling out $20 a month for Claude. Especially as usage ramps up, an old GPU would probably pay for itself pretty quickly and open up a universe of high-quality models for orchestration.

        Also, in the case of SAM2, the utility of custom tuning can’t be overstated.

        • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
          1 day ago

          And another aspect is that, at least in the realm of coding, we’re trying to get these models to write code the way humans do. I’d argue that’s not really an optimal approach, because models have different strengths. Their biggest limitation is that they struggle with large contexts, but given a small and focused task, even small models can handle it well. So we could move to structuring programs out of small, isolated components that can be reasoned about independently. There are already tools like workflow engines that do this sort of thing; they just never caught on with human coders because they require more ceremony. But I think viewing a program as a state graph would be a really nice way for humans to tell whether the semantics are correct, and then the LLM could implement each node in the graph as a small, isolated task that can be verified fairly easily.
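          To make the idea concrete, here is a minimal Python sketch of that structure (all names are hypothetical, not from any real tool): the graph is plain data, each node is a small function that only sees the shared state, and a tiny runner walks the edges. A human can check the graph’s shape; each node is small enough for an LLM to implement and verify in isolation.

          ```python
          # Hypothetical sketch: a program as a state graph of small, isolated steps.
          # Each node is a small function of the shared state dict; the graph itself
          # is just data mapping a node name to (function, next node).

          def fetch(state):
              state["raw"] = [3, 1, 2]  # stand-in for real input
              return state

          def clean(state):
              state["clean"] = sorted(state["raw"])
              return state

          def summarize(state):
              state["total"] = sum(state["clean"])
              return state

          # The whole program's control flow, visible at a glance.
          GRAPH = {
              "fetch": (fetch, "clean"),
              "clean": (clean, "summarize"),
              "summarize": (summarize, None),  # None terminates
          }

          def run(graph, start):
              state, node = {}, start
              while node is not None:
                  fn, node = graph[node]
                  state = fn(state)
              return state

          result = run(GRAPH, "fetch")
          ```

          Because each node only reads and writes named fields of the state, every node can be tested on its own with a hand-built state dict, without any of the surrounding context.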

          • Assian_Candor [comrade/them]@hexbear.net
            1 day ago

            Love this approach. It’s a unique way of thinking about this, and I haven’t seen it elsewhere.

            Ran across this yesterday, not sure if you’ve seen it. A bit esoteric, but the author argues for using Go instead of Python for LLM coding. It’s stuck with me to the point that I’m thinking of doing a full refactor on a project of mine, even though I don’t know Go as well. The bloat of the monorepo from having all these Python dependencies is definitely food for thought. Your construct takes it a step further, I think, by simplifying the code base for machines.

            https://lifelog.my/episode/why-i-vibe-in-go-not-rust-or-python

            • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
              1 day ago

              Right, languages can provide a lot of guard rails, and Go is a pretty good candidate, being a fairly simple language with types keeping the code on track. I’ve played a bit with LLMs writing it, and the results seem pretty decent overall. But then there’s the whole architecture layer on top of that, and that seems to be an area that’s largely unexplored right now.

              I think the key is focusing on the contract. The human has to be able to tell that the code is doing what’s intended, and the agent needs clear requirements and fixed context to work in. Breaking the program up into small isolated steps seems like a good approach for getting both these things. You can review the overall logic of the application by examining the graph visually, and then you can check the logic of each step independently without needing a lot of context for what’s happening around it.

              I’ve actually been playing with the idea a bit. Here’s an example of what this looks like in practice. The graph is just a data structure showing how different steps connect to each other:

              and each node is a small bit of code with a spec around its input/output that the LLM has to follow:

              It’s been a fun experiment to play with so far.
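              Since the screenshots don’t carry over here, a rough Python sketch of the node-plus-spec idea (the names and spec format are hypothetical illustrations, not the actual experiment): a node declares the shape of its input and output, checks both at runtime, and the body in between is the small task the LLM implements against that contract.

              ```python
              # Hypothetical sketch of one graph node with an input/output contract.
              # The spec pins down the data shape; the body is the small, isolated
              # task an LLM would implement and that a human can verify alone.

              SPEC = {
                  "input":  {"numbers": list},   # required keys and their types
                  "output": {"mean": float},
              }

              def check(payload, schema):
                  """Verify the payload carries every key the schema demands, with the right type."""
                  for key, typ in schema.items():
                      assert isinstance(payload.get(key), typ), f"bad field: {key}"

              def mean_node(inputs):
                  check(inputs, SPEC["input"])            # contract on the way in
                  outputs = {"mean": sum(inputs["numbers"]) / len(inputs["numbers"])}
                  check(outputs, SPEC["output"])          # contract on the way out
                  return outputs
              ```

              The appeal is that the reviewer only has to trust the spec, not read the surrounding program: if every node honors its contract, the graph composes.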

    • certified sinonist@lemmygrad.ml
      1 day ago

      I use local models. To me, the entire point of AI falls apart unless you can run it independently.

      I just think generating videos and stuff is more exciting and sensational. But if you get any use out of LLMs, it’s a no-brainer to set one up instead of paying a subscription.