• chiisana@lemmy.chiisana.net
    2 days ago

    The Deepseek referred to here seems to be V3, not R1. While the linked article didn't seem to have info on parameter size, the fact that they state it's a sparse MoE architecture suggests it should run pretty quickly compared to dense models of a similar parameter count, so that's cool.
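
    The speed point is that with sparse MoE, only a few "experts" activate per token, so per-token compute scales with the number of routed experts, not the full parameter count. A toy sketch of top-k expert routing (all sizes and weights here are made-up illustrations, not Deepseek's actual architecture):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    D = 8   # toy hidden size
    E = 16  # total experts in the layer
    K = 2   # experts actually run per token (the "sparse" part)

    # Each expert is just a small weight matrix in this sketch.
    experts = rng.standard_normal((E, D, D))
    router = rng.standard_normal((D, E))

    def moe_forward(x):
        """Route token vector x through only K of the E experts."""
        logits = x @ router                 # router score for each expert
        topk = np.argsort(logits)[-K:]      # pick the K highest-scoring experts
        # softmax over just the selected experts' scores
        w = np.exp(logits[topk] - logits[topk].max())
        w /= w.sum()
        # weighted sum of the chosen experts' outputs;
        # the other E-K experts are never touched
        return sum(w[i] * (x @ experts[e]) for i, e in enumerate(topk))

    y = moe_forward(rng.standard_normal(D))
    ```

    So even if total parameters are huge (E experts' worth), each token only pays for K expert matmuls, which is why MoE models tend to be fast for their size.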