• SouffleHuman@lemmy.ml
    8 hours ago

    Llama 3.1 8B seems like a pretty weird choice to me, given that it’s already pretty outdated at this point. I know the Qwen team will be launching a new 9B model soon, so maybe they’ll switch to that before long.

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
      7 hours ago

      I’m guessing they implemented it as a proof of concept because it’s a well-known model with a simple architecture. I’m really looking forward to full-blown 600B+ parameter chips. That’s where shit gets real. And just imagine this stuff applied to robotics. Those Unitree robots with a DeepSeek chip would basically be Star Wars droids.