ByteDance has officially launched its latest Doubao large model, 1.5 Pro (Doubao-1.5-pro), which demonstrates outstanding overall capability across a range of fields and surpasses the well-known GPT-4o and Claude 3.5 Sonnet. The release marks an important step forward for ByteDance in artificial intelligence. Doubao 1.5 Pro adopts a sparse MoE (Mixture of Experts) architecture, using a smaller set of activated parameters during pre-training. The innovation of this design...
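For readers unfamiliar with the term, "a smaller set of activated parameters" refers to sparse routing: a gating network sends each token to only a few experts, so most of the model's parameters sit idle on any given forward pass. Below is a minimal, generic sketch of a top-k routed MoE layer in PyTorch; the layer sizes and routing scheme are illustrative assumptions, not Doubao's actual design.

```python
# Minimal sketch of a sparsely activated Mixture-of-Experts layer.
# Only the top-k experts chosen by the router run for each token,
# so most parameters are not activated for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)          # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                     # x: (tokens, d_model)
        gate_logits = self.router(x)                          # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)   # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                      # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(SparseMoE()(tokens).shape)                              # torch.Size([16, 512])
```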
I think you have a fundamental misunderstanding of how neural network based LLMs work.
Let’s say you give it the prompt “tell me if capitalism is a good or a bad system”. In a very simplistic sense, what it does is query the words and sentences associated with “capitalism” and “good”, as well as “capitalism” and “bad”, which it has been trained on from the entire internet’s data, and from there it spews out seemingly coherent sentences and paragraphs about why capitalism is good or bad.
It does not have the capacity to reason or evaluate whether capitalism as an economic system itself is good or bad. These LLMs are instead very powerful statistical models that can reproduce coherent human language based on word associations.
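Concretely, the “word association” here is next-token prediction: the model assigns a probability to every possible next token given the text so far, and the answer is built one sampled token at a time. A rough sketch of that loop, using GPT-2 via Hugging Face transformers as a small stand-in for a production-scale LLM:

```python
# Rough sketch of the autoregressive loop: the model never "evaluates"
# capitalism, it just keeps picking a statistically likely next token.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("Tell me if capitalism is a good or a bad system.", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(40):                                # generate 40 tokens, one at a time
        logits = model(ids).logits[0, -1]              # scores for every possible next token
        probs = torch.softmax(logits, dim=-1)          # turn scores into a distribution
        next_id = torch.multinomial(probs, 1)          # sample one token from it
        ids = torch.cat([ids, next_id.unsqueeze(0)], dim=-1)

print(tokenizer.decode(ids[0]))
```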
What is groundbreaking about the transformer architecture in natural language processing is that it allows the network to retain these associations far longer than earlier approaches like LSTMs and seq2seq models could; those would start spewing out garbled text after a few sentences because their architectures could not retain memory over long spans (the vanishing gradient problem). Transformer-based models solved that and enabled the reproduction of entire paragraphs, even essays, of seemingly coherent human-like writing because of their strong memory retention. Impressive as that is, the model does not understand grammatical structures or rules. Train it on a pile of broken English and it will spew out broken English.
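The mechanism behind that “memory retention” is attention rather than recurrence: every position can read every earlier position directly in a single step, so nothing has to survive being passed through hundreds of recurrent updates the way it does in an LSTM. A toy sketch of causal scaled dot-product attention, just to show the shape of it (random weights, no training):

```python
# Each query token attends directly to every earlier token in one matrix
# product; token 999 reads token 0 without 999 intermediate steps.
import math
import torch

def causal_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.transpose(-2, -1) / math.sqrt(k.shape[-1])    # token-to-token affinities
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))             # can't look at future tokens
    return torch.softmax(scores, dim=-1) @ v                     # weighted mix of past tokens

seq_len, d = 1000, 64
x = torch.randn(seq_len, d)
Wq, Wk, Wv = (torch.randn(d, d) / math.sqrt(d) for _ in range(3))
print(causal_attention(x, Wq, Wk, Wv).shape)                     # torch.Size([1000, 64])
```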
In other words, the output you’re getting from LLMs (“capitalism good or bad?”) is simply word association it has been trained on from input collected across the entire internet, not actual thinking coming from an internal mental framework or a real-world model that could comprehend causality and reasoning.
The famous case of Google’s AI telling people to put glue on their pizza is a good example of this; it can be traced back to a joke post on Reddit. The LLM itself doesn’t understand anything; it simply reproduces what it has been trained on. Garbage in, garbage out.
No amount of “neurosymbolic AI” is going to solve the fundamental issue of LLMs not being able to understand causality. The “chain of thought” process lets researchers tune the model by inspecting the specific path by which it arrives at its answer, but it is not remotely comparable to a human going through their thought process.
I understand how LLMs work perfectly fine. What you don’t seem to understand is that neurosymbolic AI is a combination of an LLM for parsing and categorizing inputs with a symbolic logic engine for doing the reasoning. If you had bothered to actually read the paper I linked, you wouldn’t have wasted your time writing this comment.
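To spell out that division of labor, here is a toy sketch of the pipeline shape: the LLM turns free text into structured facts, and a separate symbolic engine does the inference over them. The “LLM” is stubbed with a fixed output, and the rule engine is a trivial forward chainer; this is illustrative only, not the architecture from the linked paper.

```python
# Toy neurosymbolic pipeline: an (stubbed) LLM parses text into predicates,
# then a symbolic forward-chaining engine derives new facts from them.

def llm_parse(text: str) -> list[tuple]:
    # In a real system this would be an LLM call constrained to emit predicates.
    return [("raises", "tariffs", "import_prices"),
            ("raises", "import_prices", "consumer_prices")]

def forward_chain(facts: set[tuple], transitive_relation: str) -> set[tuple]:
    # Classic forward chaining: apply the transitivity rule until no new facts appear.
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (r1, a, b) in list(derived):
            for (r2, b2, c) in list(derived):
                if r1 == r2 == transitive_relation and b == b2:
                    new_fact = (transitive_relation, a, c)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

facts = set(llm_parse("Tariffs raise import prices, and import prices raise consumer prices."))
print(forward_chain(facts, "raises"))
# includes ("raises", "tariffs", "consumer_prices"), derived symbolically, not by word association
```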