ByteDance has officially launched its latest Doubao large model, 1.5 Pro (Doubao-1.5-pro), which demonstrates strong all-round capabilities across a variety of fields, surpassing the well-known GPT-4o and Claude 3.5 Sonnet. The release of this model marks an important step forward for ByteDance in the field of artificial intelligence. Doubao 1.5 Pro adopts a novel sparse MoE (Mixture of Experts) architecture, using a smaller set of activated parameters for pre-training. This design’s innovation...
These things suck and will literally destroy the world and the human spirit from the inside out no matter who makes them
Yes, LLMs are stupid and they steal your creative work. There is some real room for machine learning (which has all just been lumped together as “AI” now for some reason), like Nvidia’s DLSS technology for example. Or other fields where the computer has to operate in a closed environment with very strictly defined parameters, like pharmaceutical research. How proteins fold is strictly governed by the laws of physics, and we can tell the model exactly what those laws are.
But it is funny how, after all the hundreds of billions of dollars invested into LLMs in the West, along with big government support and all the “smartest minds” working on it, they got beaten by much smaller and cheaper Chinese competitors, who are ACTUALLY open-sourcing their models. US tech morons got owned on their own terms.
Even LLMs have some decent uses, but you put your finger on what I am feeling: all of AI and machine learning is being overshadowed by these massive investments into LLMs, just because a few ghouls sniff profit
I think this kind of statement needs more elaboration to have a proper discussion about it.
LLMs can really be summarized as “squeezing the entire internet into a black box that can be queried at will”. They have many use cases but even more potential for misuse.
All forms of AI as we know it (artificial intelligence in the literal sense, i.e. not artificial general intelligence or AGI) are just statistical models that do not have the capacity to think, have no ability to reason, and cannot critically evaluate or verify a piece of information, which can equally come from a legitimate source or some random Reddit post (the infamous case of Google AI telling you to put glue on your pizza can be traced back to a Reddit joke post).
These LLMs are built by training on the entire internet’s datasets using a transformer architecture that has very good memory retention, and more recently with reinforcement learning from human feedback to reduce their tendency to produce incorrect output (i.e. hallucinations). Even then, these datasets require extensive tweaking and curation, and OpenAI famously employed Kenyan workers at less than $2 per hour to perform the tedious work of dataset annotation used for training.
Are they useful if you just need to pull up a piece of information that is not critical in the real world? Yes. Are they useful if you don’t want to do your homework and just want the algorithm to solve everything for you? Yes (of course, there is an entire discussion about future engineers/doctors who are “trained” by relying on these AI models and then go on to do real things in the real world without developing the capacity to think/evaluate for themselves). Would you ever trust them if your life depended on it (e.g. building a car, a plane or a house, or treating an illness)? Hell no.
A simple test case is to ask yourself whether you would ever trust an AI model over a trained physician to treat your illness. A human physician has access to real-world experience that an AI will never have (no matter how much medical literature it can devour on the internet), has the capacity to think and reason, and thus the ability to respond to anomalies which have never been seen before.
An AI model needs thousands of images to learn the difference between a cat and a dog; a human child can learn that with just a few examples. Without a huge input dataset (annotated with the help of an army of underpaid Kenyan workers), the accuracy is simply crap. The fundamental process of learning is very different between the two, and until we have made advances on AGI (which is as far as you could get from the current iterations of AI), we’ll always have to deal with the potential misuses of AI in our lives.
I really hate how techbros have convinced people that it’s something magical. But all they’ve done is convince themselves and everyone else that every tool is a hammer
that’s a deeply reactionary take
LLMs are literally reactionary by design but go off
They’re just automation
https://redsails.org/artisanal-intelligence/
https://www.artnews.com/art-in-america/features/you-dont-hate-ai-you-hate-capitalism-1234717804/
They’re not just automation though.
Industrial automation is purpose-built equipment and software designed by experts, with very specific boundaries set to ensure that tightly regulated specifications can be met - i.e., if you are designing and building a car, you’d better make sure that the automation doesn’t do things it’s not supposed to do.
LLMs are general-purpose language models that can be called up to spew out anything, without proper reference to their reasoning. You can technically use them to “automate” certain tasks, but they are not subject to the same kind of rules and regulations employed in the industrial setting, where tiny miscalculations can lead to consequences.
This is not to say that they are useless and cannot aid the workflow, but their real use cases have to be manually curated and extensively tested by experts in the field, with all the caveats of potential hallucinations that can cause severe consequences if not caught in time.
What you’re looking for is AGI, and the current iterations of AI are the furthest you can get from an AGI that can actually reason and think.
That’s not the case with stuff like neurosymbolic models and what DeepSeek R1 is doing. These types of models do actual reasoning and can explain the steps they use to arrive at a solution. If you’re interested, this is a good read on the neurosymbolic approach https://arxiv.org/abs/2305.00813
However, automation doesn’t just apply to stuff like factory work. If you read the articles I linked above, you’ll see that they’re specifically talking about automating aspects of producing media such as visual content.
The “chain of thought” output simply gives you the “progress” and the specific path/approach by which the model arrived at a particular answer - which is useful for tweaking and troubleshooting the parameters toward improving the accuracy and reducing the hallucinations of a model, but it is not the same reasoning that could be given by a human mind.
The transformer architecture is really just a statistical model built to have very strong memory retention when it comes to making associations (in the case of LLMs, words). It fundamentally cannot think or reason. It takes a specific “statistical” path and arrives at an answer based on the associations it has been trained on, but you cannot make it think and reason the way we do, nor can it evaluate or verify the validity of a piece of information based on cognitive reasoning.
Do you actually understand what symbolic logic is?
Neurosymbolic AI is overhyped. It’s just bolting LLMs onto symbolic AI and pretending that it’s a “brand new thing” (it’s not; it’s actually how most LLMs practically work today and have for a long time - GPT-3 itself is neurosymbolic). The advocates of this approach pretend that the “reasoning” comes from the symbolic AI side, also known as classical AI, which still suffers from the same exact problems that it did in the 1970s when the first AI winter happened. Because we do not have an algorithm capable of representing the theory of mind, nor do we have a realistic theory of mind to begin with.
Not only that, but all of the integration points between classical techniques and statistical techniques present extreme challenges, because in practice the symbolic portion essentially has to trust the output of the statistical portion, since the symbolic portion has limited ability to validate it.
Yeah you can teach ChatGPT to correctly count the r’s in strawberry with a neurosymbolic approach but general models won’t be able to reasonably discover even the most basic of concepts such as volume displacement by themselves.
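To make the strawberry example concrete: the “neurosymbolic” fix basically amounts to routing the request to a deterministic function instead of letting the statistical model guess. A toy sketch in Python, where every name is hypothetical and the “classifier” is just a stand-in for the statistical side:

```python
# Toy sketch of a "neurosymbolic" patch: the statistical side only decides
# *which* deterministic tool to call; the symbolic side does the actual counting.
# All names here are hypothetical illustrations, not a real framework.

def count_letter(word: str, letter: str) -> int:
    """Symbolic part: exact, rule-based, trivially verifiable."""
    return word.lower().count(letter.lower())

def toy_intent_classifier(prompt: str) -> str:
    """Stand-in for the statistical part: guesses which tool applies."""
    return "count_letter" if "how many" in prompt.lower() else "freeform_generation"

prompt = "How many r's are in strawberry?"
if toy_intent_classifier(prompt) == "count_letter":
    print(count_letter("strawberry", "r"))  # prints 3, guaranteed by the rule, not by statistics
else:
    print("fall back to the language model's statistical guess")
```

The counting is right only because a hand-written rule does it; nothing here gets the model any closer to discovering a concept like volume displacement on its own.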
You’re essentially back at the same problem, where you either lean on the symbolic aspects and limit yourself entirely to advanced ELIZA-like functionality that just uses a classifier, or you throw yourself at the mercy of the statistical model and pray you have enough symbolic safeguards.
Either way it’s not reasoning; it is at best programming - if that. That’s actually the practical reason the neurosymbolic space is getting attention: the problem has effectively been to control inputs and outputs, not only for reliability/accuracy but for censorship and control. This is still a Garbage In, Garbage Out process.
FYI, most of the big names in the “neurosymbolic AI as the next big thing” space hitched their wagon to Kahneman’s Thinking Fast and Slow bullshit, which is effectively made-up bullshit like Freudianism but lamer, and has essentially been squad-wiped by the replication crisis.
Don’t get me wrong, DeepSeek and Doubao are steps in the right direction. They’re less proprietary, less wasteful, and broadly more useful, but they aren’t a breakthrough in anything but capitalist hoarding of technological capacity.
The reason AI is not useful in most circumstances is because of the underlying problems of the real world, and you can’t algorithm your way out of people problems.
I think you have a fundamental misunderstanding of how neural network based LLMs work.
Let’s say you give it the prompt “tell me if capitalism is a good or a bad system”. In a very simplistic sense, what it does is query the words/sentences associated with “capitalism” and “good”, as well as “capitalism” and “bad”, which it has been trained on from the entire internet’s data, and from there it spews out seemingly coherent sentences and paragraphs about why capitalism is good or bad.
It does not have the capacity to reason or evaluate whether capitalism as an economic system itself is good or bad. These LLMs are instead very powerful statistical models that can reproduce coherent human language based on word associations.
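To make the “word association” point concrete, here is a toy autoregressive sampler over a hand-made association table. A real LLM learns these statistics over tokens with a transformer rather than a lookup table, but the generation loop has the same shape; everything below is purely illustrative:

```python
import random

# Toy "association table": for each context word, the words that tended to follow
# it in the training text, with counts standing in for learned weights.
associations = {
    "capitalism": {"is": 5, "creates": 2},
    "is": {"good": 3, "bad": 3, "a": 4},
    "a": {"system": 6},
    "good": {"because": 4},
    "bad": {"because": 4},
    "because": {"markets": 2, "exploitation": 2},
}

def next_word(context: str) -> str:
    options = associations.get(context, {"...": 1})   # unknown context -> filler
    words, weights = zip(*options.items())
    return random.choices(words, weights=weights)[0]  # sample in proportion to the counts

word, output = "capitalism", ["capitalism"]
for _ in range(6):
    word = next_word(word)
    output.append(word)
print(" ".join(output))  # e.g. "capitalism is bad because exploitation ..."
```

The output sounds like an opinion, but it is only the statistics of whatever text filled the table.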
What is groundbreaking about the transformer architecture in natural language processing is that it allows the network to retain association memory far longer than previous iterations like LSTM, seq2seq etc. could; those would start spewing out garbled text after a few sentences or so, because their architectures do not allow memory to be properly retained over long spans (the vanishing gradient problem). Transformer-based models solved that problem and enabled the reproduction of entire paragraphs and even essays of seemingly coherent, human-like writing because of their strong memory retention capability. Impressive as it is, the model does not understand grammatical structures or rules. Train it with a bunch of broken English texts, and it will spew out broken English.
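A rough sketch of why transformers hold onto long-range associations where recurrent nets lose them: self-attention lets every position weight every other position directly, instead of squeezing the whole history through one recurrent state. This toy numpy version leaves out the learned projections and multiple heads of a real transformer:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention: every position looks at every other directly."""
    scores = x @ x.T / np.sqrt(x.shape[-1])           # (seq, seq) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the whole sequence
    return weights @ x                                # each output mixes *all* positions

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))      # 8 toy token embeddings of dimension 4
out = self_attention(x)

# A recurrent net would instead fold the sequence into one running state, where the
# influence of early tokens shrinks step by step (the vanishing-gradient problem);
# here the 8th token can attend to the 1st with full weight in a single step.
print(out.shape)  # (8, 4)
```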
In other words, the output you’re getting from LLMs (“capitalism good or bad?”) is simply word associations it has been trained on, from input collected across the entire internet - not actual thinking coming from its own internal mental framework or a real-world model that could actually comprehend causality and reasoning.
The famous case of Google AI telling people to put glue on their pizza is a good example of this. It can be traced back to a Reddit joke post. The LLM itself doesn’t understand anything, it simply reproduces what it has been trained on. Garbage in, garbage out.
No amount of “neurosymbolic AI” is going to solve the fundamental issue of LLMs not being able to understand causality. The “chain of thought” process allows researchers to tweak the model better by understanding the specific path by which the model arrives at its answer, but it is not remotely comparable to a human going through their thought process.
The fact that there is nuance does not preclude that artifacts can be political, whether intentional or not.
While I don’t know whether this applies to DeepSeek R1, the Internet perpetuates many human biases and machine learning will approximate and pick up on those biases regardless of which country is doing the training. Sure you can try to tell LLMs trained on the Internet not to do that — we’ve at least become better at that than Tay in 2016, but that probably still goes about as well as telling a human not to at best.
I personally don’t buy the argument that you should hate the designer instead of the technology, in the same way we shouldn’t excuse a member of Congress’ actions because of the military-industrial complex, or capitalism, or systemic racism, and so on that ensured they’re in such a position.
I don’t see these tools replacing humans in the decision making process, rather they’re going to be used to automate a lot of tedious work with the human making high level decisions.
That’s fair, but human oversight doesn’t mean they’ll necessarily catch biases in its output
We already have that problem with humans as well though.
There’s value in the tedious decisions though
The tedious decisions are what build confidence and experience
People build confidence doing work in any domain. Working with artificial agents is simply going to build different kinds of skills.
What does that even mean
they “react” to your input and every letter after i guess?? lmao
Hard disk drives are literally revolutionary by design because they spin around. Embrace the fastest spinning and most revolutionary storage media
sorry sweaty, ssds are problematic
Scratch a SSD and a NVMe bleeds.
Sufi whirling is the greatest expression of revolutionary spirit in all of time.
Pushing glasses up nose further than you ever thought imaginable *every token after
hey man come here i have something to show you
It’s a model with a heavy cold-war-liberalism bias (due to the information being fed to it); unless you prompt it otherwise, you’ll get freedom/markets/entrepreneurs out of it for any problem. And people are treating these models as the gospel of an impartial observer -
The fate of the world will ultimately be decided on garbage answers spewed out by an LLM trained on Reddit posts. That’s just what the future leaders of the world will base their decisions on.
Future senator getting “show hog” as the answer to some question with 0.000001 probability: well, if the god-machine says so
That’s not the technology’s fault though, it’s just that the technology is produced by an imperialist capitalist society that treats cold war propaganda as indisputable fact.
Feed different data to the machine and you will get different results. For example, if you just train a model on declassified CIA documents, it will be able to answer questions about the real historical role of the CIA. Add a subjective point of view on these events and it can either answer you with right-wing bullshit, if that’s what you gave it, or a marxist analysis of the CIA as the imperialist weapon that it is.
As with technology in general, its effect on society lies with the hands that wield it.
Put it this way: even if one feeds it CIA files to one’s heart’s content, the weights of the words needed to construct sentences are still sitting somewhere in there. (Also, answering about the real role of the CIA implies the LLM has any idea about reality; it will just bias the answer in another direction. Same with a marxist analysis: it will just reproduce the likeliest answer resembling the marxist literature you fed it, not “have an analysis”.)
A benign application of LLMs is natural language processing into fixed functions on the back end (e.g. “turn off the lights when it starts raining” or whatever - something that can be phrased in millions of ways yet disassembled into the same fixed set of instructions; here its fuzziness is great).
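A minimal sketch of that pattern: the fuzzy front end (an LLM, or here just a stand-in function) only ever picks from a closed whitelist of intents, and the actual actions are plain, fully specified functions. All names below are hypothetical; no real home-automation or LLM API is assumed:

```python
# Sketch of "LLM as a front end to fixed functions": the fuzzy part only ever picks
# from a closed whitelist of intents; the actions themselves are plain functions.

def lights_off() -> str:
    return "lights: off"

def lights_on() -> str:
    return "lights: on"

ALLOWED_INTENTS = {"lights_off": lights_off, "lights_on": lights_on}

def fuzzy_intent(utterance: str) -> str:
    """Stand-in for the LLM: collapse millions of phrasings into one fixed intent."""
    text = utterance.lower()
    if "off" in text or "rain" in text:   # "kill the lights when it starts raining", etc.
        return "lights_off"
    return "lights_on"

for phrase in ["Turn off the lights when it starts raining",
               "could you switch the lights back on please"]:
    intent = fuzzy_intent(phrase)
    print(phrase, "->", ALLOWED_INTENTS[intent]())   # only whitelisted functions ever run
```

However wrong the fuzzy guess is, the worst it can do is pick the wrong item from the fixed list, which is what makes this use case benign.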
These things have already eaten all the data that there is, and I don’t need to tell you that, but that data, as it has been produced almost solely under capitalism, is just crap.
“let’s just use autocorrect to create the future this is definitely cool and not regressive and reactionary and a complete recipe for disaster”
It’s technology with many valid use-cases. The misapplication of the technology by capital doesn’t make the tech itself inherently reactionary.
It’s incredibly power hungry.
The context of the discussion is that it’s already 50x less power hungry than just a little while ago.
For now. We’ve been seeing great strides in reducing that power hunger recently, including by the LLM that’s the subject of this post.
That also doesn’t make it inherently reactionary.
Due to the market economy in both the United States and China, further development of LLM efficiency is probably the worst thing that could possibly happen. Even if China did not want to subject LLMs to market forces, they are going to need to compete with the US. This is going to further accelerate the climate disaster.
Again, an issue with capitalism and not the technology itself.
Well I agree with you there. Too bad there’s all this capitalism.
Kind of wondering why China needs to compete in this realm? Unless there is something about LLMs that improves the productive forces in a country, I don’t see any other reason.
At least the space race had something to do with a strategic military advantage
Vacuum tubes were too
LLMs literally cannot do anything other than reproduce the data they have been given. The closer the output is to the input, the better. Now if the input is “all the data that capitalism has produced”, then the expected output is “an infinite number of variations on that data”. That’s why it is reactionary.
This is a stupid take. I like the autocorrect analogy generally, but this veers into Luddite-ism.
Let me add, the way we’re pushed to use LLMs is pretty dumb and a waste of time and resources, but the technology has pretty fascinating use-cases in material and drug discovery.
This is mainly hype. The process of creating AI has been useful for drug discovery; LLMs as people practically know them (e.g. ChatGPT) have not, other than the same kind of sloppy labor-cost corner-cutting bullshit.
If you read a lot of the practical applications in the papers it’s mostly publish or perish crap where they’re gushing about how drug trials should be like going to cvs.com where you get a robot and you can ask it to explain something to you and it spits out the same thing reworded 4-5 times.
They’re simply pushing consent protocols onto robots rather than nurses, which TBH should be an ethical violation.
I should have been more precise, but this is all in the context of news about a cutting-edge LLM using a fraction of the cost of ChatGPT, and comments calling it all “reactionary autocorrect” and “literally reactionary by design”. My issue is really with the overuse of the term “AI”, but I didn’t feel like explaining the difference between a GPT and deep kernel learning or graph neural networks, which have been used for drug and material discovery. Peppersky’s comment came off as very anti-intellectual to me, which I hate to see amongst “leftists”.
I disagree that it’s “reactionary by design”. I agree that its usage is 90% reactionary. Many companies are effectively trying to use it in a way that attempts to reinforce their deteriorating status quo. I work in software, so I always see people calling this shit a magic wand for the problems of the falling rate of profit and the falling rate of production. I’ll give you an extremely common example that I’ve seen across multiple companies and industries.
Problem: Modern companies do not want to be responsible for the development and education of their employees. They do not want to pay for the development of well-functioning specialized tools for the problems their company faces. They see it as a money and time sink. This often presents itself as:
- missing, incomplete, incorrect documentation
- horrible, time-wasting meeting practices
I’ve seen the following be pitched as AI Bandaids:
Proposal: push all your documentation into a RAG LLM so that users simply ask the robot and get what they want
Reality: The robot hallucinates things that aren’t there in technical processes. Attempts to get the robot to correct this result in the robot sticking to marketing-style vagaries that aren’t even grounded in the reality of how the company actually works (things as simple as the robot assuming how a process/team/division is organized rather than the reality). Attempts to simply use it as a semantic search index end up linking to the real documentation, which is garbage to begin with and doesn’t actually solve anyone’s real problems.
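For anyone unfamiliar with the jargon: the “RAG LLM” proposal boils down to embedding the docs, retrieving the closest chunks for a question, and pasting them into the prompt; the failure mode above is the model padding or overriding whatever was retrieved. A toy sketch with bag-of-words similarity standing in for real embeddings and the LLM call left as a placeholder:

```python
import math
from collections import Counter

# Toy retrieval-augmented generation (RAG) pipeline: bag-of-words vectors stand in
# for real embeddings, and the LLM call is left as a hypothetical placeholder.
docs = [
    "To request VPN access, file a ticket with the infrastructure team.",
    "Quarterly reviews are scheduled by each team lead in the HR portal.",
    "Production deploys require sign-off from the on-call engineer.",
]

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=1):
    q = vectorize(question)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

question = "How do I get VPN access?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# answer = some_llm(prompt)  # hypothetical call; hallucination is the model answering
#                            # beyond (or against) whatever ended up in `context`
print(prompt)
```

Notice that the best this can ever do is surface the existing documentation; if that documentation is garbage, retrieval faithfully serves up garbage.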
Proposal: We have too many meetings and spend ~4 hours on zoom. Nobody remembers what happens in the meetings, nobody takes notes, it’s almost like we didn’t have them at all. We are simply not good at working meetings and it’s just chat sessions where the topic is the project. We should use AI features to do AI summaries of our meetings.
Reality: The AI summaries cannot capture action items correctly if at all. The AI summaries are vague and mainly result in metadata rather than notes of important decisions and plans. We are still in meetings for 4 hours a day, but now we just copypasta useless AI summaries all over the place.
Don’t even get me started on CoPilot and code generation garbage. Or making “developers productive”. It all boils down to a million monkey problem.
These are very common scenarios that I’ve seen that ground the use of this technology in inherently reactionary patterns of social reproduction. By the way, I do think DeepSeek and Doubao are an extremely important and necessary step because they destroy the status quo of Western AI development. AI in the West is made to be inefficient on purpose because it limits competition. The fact that you cannot run models locally due to their incredible size and compute demand is a vendor lock-in feature that ensures monetization channels for Western companies. The PayGo model bootstraps itself.
I think we agree that LLMs like ChatGPT and CoPilot largely will be (and are being) used to discipline labor and that is reactionary. But this feels more like a list of gripes with LLMs and not actually responding to my comment. DKL, GNNs and other machine learning architectures ARE being used in drug and material discovery research, I just didn’t feel like explaining the difference between that and the popular conception of “AI” to peppersky, given how flippant and troll-y their comments were. We should push back against anti-intellectualism in our spaces, and that’s all I was trying to do.
I agree that anti-intellectualism is bad, but I wouldn’t necessarily consider being AI-negative by default a form of anti-intellectualism. It’s the same thing as people who are negative on space exploration. It’s a symptom of how there seems to be infinite money for things that are fads/scams/bets, things that have limited practical use in people’s lives, and ultimately not enough to support people.
That’s really where I see those arguments coming from. AI is quite honestly a frivolity in a society where housing is a luxury.
Just like every technological advancement. The problem isn’t the technology but how capitalism puts it to use
Luddites were actually cool and right. They didn’t organize and destroy looms because they just loved the more tedious work of non-powered looms, they destroyed them because they were the beginning of industrial capitalism and wage labor.
🙄
In the meantime, it’s making my job a lot more bearable.
How?
I work in software development, and AI can instantly generate code that would take me an hour to research how to write when I’m using an SDK I’m unfamiliar with, or it can very easily find little mistakes that would take me a long time to figure out. If I have to copy and paste a lot of data and do boring repetitive work like creating constants from it, it can do all of it for me if I give it an explanation of what I want.
It saves me a lot of time and spares me a lot of mental fatigue, so I have more energy to do things that I enjoy after work.
It’s really useful for working with a library/language you’re not very familiar with. I’ve used it recently to learn how to use MiniZinc, a constraint-problem modeling language. There’s not a lot of data about it on the Internet, and for that reason sometimes the generated code won’t even be syntactically correct, but even then it was extremely useful for learning the language