ByteDance has officially launched its latest Doubao large model, 1.5 Pro (Doubao-1.5-pro), which demonstrates strong comprehensive capabilities across a range of fields, surpassing well-known industry models such as GPT-4o and Claude 3.5 Sonnet. The release of this model marks an important step forward for ByteDance in the field of artificial intelligence. Doubao 1.5 Pro adopts a sparse MoE (Mixture of Experts) architecture, activating only a small subset of its parameters during pre-training. This design's innovation...
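For anyone unfamiliar with the idea: in a sparse MoE layer, a learned router sends each token to only a few "expert" sub-networks, so just a fraction of the total parameters is active on any forward pass. Here's a minimal toy sketch in PyTorch of top-k routing in general; the layer sizes, expert count, and k value are invented for illustration, and this is not a description of Doubao's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy sparse MoE layer: route each token to its top-k experts."""

    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                 # x: (tokens, dim)
        scores = self.gate(x)              # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the k selected experts run per token, so most of the
        # model's parameters stay inactive on any given forward pass.
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

x = torch.randn(10, 64)
print(SparseMoE()(x).shape)  # torch.Size([10, 64])
```

The point of the design is that total parameter count (and thus capacity) can grow with the number of experts while per-token compute stays roughly constant, since only k experts fire per token.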
I think power usage is still a legitimate concern, but on top of that the unreliability is a huge factor. LLMs hallucinate all the time, by design, so if you're using them for anything where correctness matters, you're bound for failure. There will always be hallucination unless we overfit the model, but an overfit model with no hallucination just reproduces its training data, so it has no more functionality than a search engine with vastly higher energy requirements. These things have applications, but really only for approximating stuff where no other approach could do it well. IMO any other use case is a mistake.
People are actively working on different approaches to address reliability. One that I like in particular is the neurosymbolic type of model, where deep neural networks are used to classify data and find patterns, and a symbolic logic engine is used to actually reason about them. This basically gives you the best of both worlds. https://arxiv.org/abs/2305.00813
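To make the split concrete, here's a tiny self-contained sketch of that division of labor: a neural model handles perception (stubbed out below with canned outputs, since a real one would be a trained network) and a symbolic forward-chaining engine does the reasoning. The predicates and rules are invented for illustration; see the linked paper for actual systems:

```python
def neural_perception(image_id):
    # Stand-in for a neural classifier: a real system would run a
    # trained network that maps raw input to symbolic facts.
    fake_outputs = {
        "img1": [("is_bird", "tweety"), ("has_wings", "tweety")],
    }
    return fake_outputs[image_id]

RULES = [
    # (premises, conclusion): if all premises hold for X, conclude the head.
    ([("is_bird", "X")], ("can_fly", "X")),
    ([("can_fly", "X"), ("has_wings", "X")], ("is_flying_animal", "X")),
]

def forward_chain(facts):
    """Apply rules repeatedly until no new facts are derived (a fixpoint)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, (head_pred, _) in RULES:
            # Find entities that satisfy every premise predicate.
            candidates = None
            for pred, _ in premises:
                ents = {e for p, e in facts if p == pred}
                candidates = ents if candidates is None else candidates & ents
            for e in candidates or set():
                if (head_pred, e) not in facts:
                    facts.add((head_pred, e))
                    changed = True
    return facts

facts = forward_chain(neural_perception("img1"))
print(facts)  # includes ('can_fly', 'tweety') and ('is_flying_animal', 'tweety')
```

The appeal is that the reasoning step is deterministic and auditable: given the same facts, the logic engine always derives the same conclusions, so the "hallucination surface" is confined to the perception layer.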