Llama 3.1 8B seems like a pretty weird choice to me, given that it’s already pretty outdated at this point. I know the Qwen team will be launching a new 9B model soon, so maybe they’ll switch to that soonish.
I’m guessing they implemented it as a proof of concept because it’s a well-known model with a simple architecture. I’m really looking forward to full-blown 600+ bln param chips. That’s where shit gets real. And just imagine this stuff applied to robotics. Those Unitree robots with a DeepSeek chip would basically be Star Wars droids.


