ByteDance has officially launched its latest Doubao large model, 1.5 Pro (Doubao-1.5-pro), which is reported to show strong all-round capabilities across many fields and to surpass the well-known GPT-4o and Claude 3.5 Sonnet. The release marks an important step forward for ByteDance in the field of artificial intelligence. Doubao 1.5 Pro adopts a sparse MoE (Mixture of Experts) architecture and is pre-trained with a smaller number of activated parameters. This design's innovation...
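For readers unfamiliar with the term, here is a minimal sketch of what "sparse" means in a Mixture of Experts layer: a router scores every expert, but only the top k are actually run for a given input. This is generic top-k routing in plain Python, not ByteDance's implementation; the names `moe_forward`, `router_logits`, and the toy scalar experts are assumptions made purely for illustration.

```python
import math
from typing import Callable, Sequence

def softmax(xs: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> tuple[float, list[int]]:
    """Run only the k highest-scoring experts and mix their outputs.

    Hypothetical toy: real layers work on vectors and learn the router.
    """
    # Pick the indices of the k experts with the largest router scores.
    ranked = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Renormalise the scores of the chosen experts so the weights sum to 1.
    weights = softmax([router_logits[i] for i in chosen])
    # Only the chosen experts are evaluated; the rest cost nothing for this input.
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen

# Four toy "experts"; in a real model each would be a feed-forward network.
toy_experts = [lambda v: v + 1, lambda v: v * 2, lambda v: -v, lambda v: v ** 2]
toy_logits = [0.1, 2.0, -1.0, 2.0]  # assumed router scores for one input
output, chosen = moe_forward(3.0, toy_logits, toy_experts, k=2)
```

With k=2 out of 4 experts, half of the expert parameters are never touched for this input, which is the sense in which the activated parameter count is smaller than the total.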
How do you measure the performance of an LLM? Ask it how many 'r's there are in 'strawberry' and count how many times you have to say 'no, that's wrong' until it gets 3.
Basically speed and power usage to process a query. Also, there’s been tangible progress in doing reasoning with unsupervised learning seen in DeepSeek R1 and approaches such as neurosymbolics. These types of models can actually explain the steps they take to arrive at the answer, and you can correct them.
I suspect “reasoning” models are just taking advantage of the law of averages. You could get much better results from prior llms if you provided plenty of context in your prompt. In doing so you would constrain the range of possible outputs which helps to reduce “hallucinations”. You could even use llms to produce that context for you. To me it seems like reasoning models are just trained to do that all in one go.
Neurosymbolic models use symbolic logic to do the reasoning on the data that’s parsed and classified using a deep neural network. If you’re interested in how this works in detail, this is a good paper https://arxiv.org/abs/2305.00813
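To make that split concrete, here is a toy sketch of the idea, not the method from the linked paper. A stand-in for the neural stage turns classifier confidences into symbols, and a small forward-chaining rule engine reasons over them while logging each step, which is what makes the result inspectable and correctable. The labels, rules, threshold, and function names are all made up for illustration.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    premises: frozenset[str]
    conclusion: str

def perceive(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Stand-in for the neural stage: keep labels the classifier is confident about."""
    return {label for label, p in scores.items() if p >= threshold}

def forward_chain(facts: set[str], rules: list[Rule]) -> tuple[set[str], list[str]]:
    """Symbolic stage: apply rules until nothing new is derived, logging every step."""
    known = set(facts)
    trace: list[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                trace.append(f"{' & '.join(sorted(rule.premises))} -> {rule.conclusion}")
                changed = True
    return known, trace

# Hypothetical classifier output for one image.
scores = {"has_feathers": 0.97, "has_beak": 0.91, "has_fur": 0.03}
rules = [
    Rule(frozenset({"has_feathers", "has_beak"}), "is_bird"),
    Rule(frozenset({"is_bird"}), "lays_eggs"),
]

known, trace = forward_chain(perceive(scores), rules)
for step in trace:
    print(step)
```

If a conclusion is wrong, you can read the trace to see which rule or which perceived fact caused it and fix that one piece, rather than retraining the whole network.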
I appreciate the link but I stand by my point. As far as I’m aware, “reasoning” models like R1 and o3 are not architecturally very different from DeepSeek V3 or GPT-4, which have already integrated some of the features mentioned in that paper.
Also as an aside, I really despise how compsci researchers and the tech sector borrow language from neuroscience. They take concepts they don’t fully understand and then use them in obscenely reductive ways. It ends up heavily obscuring how LLMs function and what their limitations are. They of course can’t speak plainly about these things, otherwise the financial house of cards built up around LLMs would collapse. As such, I guess we’re just condemned to live in the fever dreams of tech entrepreneurs who are, at their core, used car salesmen with god complexes.
Don’t get me wrong, LLMs and other kinds of deep generative models are useful in some contexts. It’s just their utility is not at all commensurate with the absurd amount of resources expended to create them.
The way to look at models like R1 is as layers on top of the LLM architecture. We’ve basically hit a limit of what generative models can do on their own, and now research is branching out in new directions to supplement what the GPT architecture is good at doing.
The potential here is that these kinds of systems will be able to do tasks that fundamentally could not be automated previously. Given that, I think it’s odd to say that the utility is not commensurate with the effort being invested into pursuing this goal. Making this work would effectively be a new industrial revolution. The reality is that we don’t actually know what’s possible, but the rate of progress so far has been absolutely stunning.
R1 has an identical architecture to V3 though, right? They just used reinforcement learning to fine-tune the base model. There are no extra layers, just a few additional steps in its production.
The potential here is that these kinds of systems will be able to do tasks that fundamentally could not be automated previously.
Sure but the technology has honestly been a bit more evolutionary than revolutionary as far as I’m concerned. The biggest change was the amount of compute and data used to train these models. That only really happened because it seems capital had nowhere else to go and not because LLMs are uniquely promising.
Making this work would effectively be a new industrial revolution.
Sure, but how exactly are the companies investing in “AI” going to make it work? To me it just seems like they’re dumping resources into a dead end because they have no other path forward. Tech companies have been promising a new Industrial Revolution since their inception. However, even their “AI” products have yet to have a meaningful impact on worker productivity. It’s worth interrogating why that is.
As I stated before, I think they all fundamentally misunderstand how human cognition works, perhaps willfully. That’s why I’m confident tech companies as they exist will not deliver on the promise of “AGI”, a lovely marketing term created to make up for the fact that their “AIs” are not very intelligent.
You’re right that R1 does the tuning up front as opposed to dynamically, but I’d still consider that a layer on top of the base LLM.
Sure but the technology has honestly been a bit more evolutionary than revolutionary as far as I’m concerned. The biggest change was the amount of compute and data used to train these models. That only really happened because it seems capital had nowhere else to go and not because LLMs are uniquely promising.
I’m not suggesting LLMs are uniquely promising, it’s just the approach that’s currently popular and we don’t know how far we can push it yet. What’s appealing about GPT architecture is that it appears to be fairly general and adaptable in many domains. However, I do think it will end up being combined with other approaches going forward. We’re already seeing that happening with stuff like neurosymbolic architecture.
My main point is that the limitations of the approach that people keep fixating on don’t appear to be inherent in the way the algorithm works, they’re just an artifact of people still figuring out how to apply this algorithm in an efficient way. The fact that massive improvements have already been found suggests that there’s probably a while yet before we run out of ideas.
Sure, but how exactly are the companies investing in “AI” going to make it work? To me it just seems like they’re dumping resources into a dead end because they have no other path forward. Tech companies have been promising a new Industrial Revolution since their inception. However, even their “AI” products have yet to have a meaningful impact on worker productivity. It’s worth interrogating why that is.
I don’t really care about AI companies myself. I want to see open source projects like DeepSeek and ultimately state level funding which we’ll likely see happening in China. It’s also a fallacy to extrapolate from the fact that something hasn’t happened that it won’t happen. Companies often hype and overpromise, but that doesn’t mean that the goals themselves aren’t achievable.
As I stated before, I think they all fundamentally misunderstand how human cognition works, perhaps willfully. That’s why I’m confident tech companies as they exist will not deliver on the promise of “AGI”, a lovely marketing term created to make up for the fact that their “AIs” are not very intelligent.
Again, I agree that companies like OpenAI are largely hype driven. However, some people do make a genuine effort to understand how human cognition works. For example, Jeff Hawkins made a good effort at exploring this topic in his book On Intelligence. The impression I get with DeepSeek is that their goal is to largely do research for the sake of research, and they’ve actually stated that they’re not looking for commercial application as their primary goal right now. I think that exploration for the sake of exploration is the correct view to have here.
The impression I get with DeepSeek is that their goal is to largely do research for the sake of research.
I think it’s not fair to call DeepSeek open source. They’ve released the weights of their model but that’s all. The code they used to train it and the training data itself are decidedly not open source. They aren’t the only company to release their weights either. Meta’s Llama was probably the best open-weight model you could use prior to DeepSeek V3. As I see it, this is just a consequence of competition in a market where capital has nowhere else to go. Meta and DeepSeek likely want to prevent OpenAI from becoming profitable.
As an aside, although I personally believe in some aspects of China’s reform and opening up, it’s not without its faults. Tech companies in China often make the same absurd claims and engage in behavior that’s as deluded as that of companies in Silicon Valley.
My main point is that the limitations of the approach that people keep fixating on don’t appear to be inherent in the way the algorithm works, they’re just an artifact of people still figuring out how to apply this algorithm in an efficient way. The fact that massive improvements have already been found suggests that there’s probably a while yet before we run out of ideas
I think this is our core disagreement. I agree, we have not pushed LLMs to their absolute limit. Mixture of Experts models, optimized training, and “reasoning models” are all incremental improvements over the previous generation of LLMs. That said, I strongly believe that the architecture of LLMs is fundamentally incapable of intelligent behavior. They’re more like a photograph of intelligence than the real thing.
I think that exploration for the sake of exploration is the correct view to have here.
I agree wholeheartedly. However, you don’t need to dump an absurd amount of resources into training an llm to test the viability of any of the incremental improvements that DeepSeek has made. You only do that if your goal is to compete with OpenAI and others for access to capital.
However, some people do make a genuine effort to understand how human cognition works.
Yes, but that work largely goes unnoticed because it’s not at all close to providing us with a way to build intelligent machines. It’s work that can only really happen at academic or public research institutions because it’s not profitable at this stage. I would be much happier if the capital currently directed towards LLMs was redirected towards this type of work. Unfortunately, we’re forced to abide by the dictates of capitalism and so that won’t happen anytime soon.
I’ve been researching this for uni and you’re not too far off. There’s a bunch of benchmarks out there: LLMs are run against a set of questions and given a score based on their responses.
The questions can be multiple choice or open ended. If they’re open ended, the answers are marked by another LLM.
There are a couple of initiatives to create benchmarks with known answers that are updated frequently, so the answers don’t need to be marked by another LLM and the questions aren’t in the training dataset of the LLM being tested. This matters because a lot of the apparent progress on these benchmarks comes from the creators simply including the benchmark questions and answers in the training data.
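As a rough sketch of the two ideas above (scoring against known answers, and checking whether a question might have leaked into training data), something like the following would do. It is a simplified illustration, not how any particular benchmark is implemented: the function names are made up, and the word n-gram overlap check is just one simple heuristic for spotting contamination.

```python
def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of multiple-choice predictions that match the known answers."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("need the same non-zero number of predictions and answers")
    correct = sum(
        p.strip().upper() == a.strip().upper() for p, a in zip(predictions, answers)
    )
    return correct / len(answers)

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive lowercase words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(question: str, training_docs: list[str], n: int = 5) -> bool:
    """Flag a question if any n-word run from it appears verbatim in a training doc."""
    question_grams = ngrams(question, n)
    return any(question_grams & ngrams(doc, n) for doc in training_docs)

# Hypothetical model outputs and answer key for a three-question quiz.
score = accuracy(["A", "b", "C"], ["A", "B", "D"])

# Hypothetical training snippets to check one benchmark question against.
docs = [
    "quiz night: what is the capital of france and why does it matter",
    "a recipe for sourdough bread with a long cold ferment",
]
leaked = looks_contaminated("What is the capital of France and why", docs)
```

A frequently refreshed benchmark sidesteps the second check entirely, since questions written after the training cutoff cannot be in the training data.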
it requires fewer tons of CO2 to tell you that 757 * 128 = 3042
They use synthetic AI generated benchmarks
It’s computer silicon blowing itself basically