- cross-posted to:
- [email protected]
- [email protected]
I feel so conflicted about this. On the one hand, huge reductions in the resource consumption of these things are good for everyone; the Western ones are so wasteful for how quickly and widely they're being pushed. On the other hand, these things still feel like a technology in search of a problem.
People often underestimate the potential of new technology, but there are plenty of legitimate use cases already. For example, I practice speaking Mandarin using an LLM; it's great at holding a conversation and correcting me when I say something grammatically wrong. They're also good for narrating audio books, generating subtitles, adding voice to games, and so on.

I also find they can be helpful when coding: it's often faster to get a model to point you in the right direction than to search for something on the internet. For example, I find they're great at crafting SQL queries, where I often know what I want but might not know the specific syntax (a rough sketch of that workflow is below). I'm sure we'll keep finding other use cases going forward, especially as reasoning models mature to the point where they can actually explain the steps they used to arrive at a solution and be corrected.
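To make that concrete, here's a minimal sketch of the SQL-crafting workflow against a locally hosted model. The ollama Python client and its chat call are real; the model tag, prompt, and table schema are just made up for illustration:

```python
# Ask a locally hosted model (via the ollama Python client) to draft a query.
# Assumptions: `pip install ollama`, the ollama server is running, and the
# model tag below has been pulled; the schema in the prompt is hypothetical.
import ollama

question = (
    "Write a PostgreSQL query that returns the top 5 customers by total "
    "order value in 2024, given tables customers(id, name) and "
    "orders(id, customer_id, total, created_at)."
)

response = ollama.chat(
    model="deepseek-r1:14b",  # illustrative tag; any pulled model works
    messages=[{"role": "user", "content": question}],
)
print(response["message"]["content"])  # the drafted SQL, ready to sanity-check
```

You still have to read and test whatever comes back, but it gets you past the blank-page syntax lookup.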
The power usage was basically the main legitimate argument against this tech, but now we’re seeing that problem is already being addressed. I’m sure we’ll continue to see even more improvements down the road.
Let me add a couple more use cases, as someone working in education:
Lesson Planning: You get tasked with planning and holding a lesson about world cultures. The lesson's conducted in English, and the students are ESL students, so you have to make sure all vocabulary is within their ability level. Also, it's Chinese New Year soon, so one of the Vice Principals has asked you to make it about that. Also, all the classes in the year group will have the lesson at the same time, so this has to be a lesson plan that even teachers who don't teach English can still teach, in English. Also, the government has mandated that its new 'values education' criteria be integrated into all subjects, so you have to include some content about one of the 12 listed virtue categories. Also, every source material you use that isn't pre-approved has to be reviewed and countersigned by 4 other staff members as quality control and protection against misinformation (and, given that it's in English, Western propaganda), and there isn't any pre-approved teaching material for this task. Also, you've got a budget, and a deadline.
You get lesson planning tasks of this complexity at least every 2 weeks, and it's a fucking timesuck. DeepSeek cuts hours out of the process and takes it to a level that would require exponentially more research time from me. I refine the result (30-45m), take it to my colleagues for the safety checks, and then we just have to prepare materials.
Music: the founder of Suno is a turd, judging by recent quotes attributed to him. But with Suno my very young students have action dance songs that mention every one of them by name. Older students are a lot more interested in creative writing when the words they write can be turned into K-Pop in the same lesson they wrote them in.
And let me add that I work in a city where public education is well-funded. I can't imagine how much of a godsend this kind of tool would be for underfunded schools. I often see people dismiss Suno and image generator AIs as 'just a toy' and therefore a waste of resources. But they're tools that are sold as toys, because they wouldn't be profitable for their owners if they weren't also used for frivolous entertainment.
Remove the profit motive, fix the wastefulness, provide patronage for the source material artists and writers. That’s the way forward.
These are great examples. I'd never considered stuff like lesson planning, but it makes perfect sense once you described it. Completely agree that once the profit motive is removed we can start finding genuinely good uses for this tech. I'm really hoping the open source nature of DeepSeek is going to play a positive role in that regard.
Can you elaborate on how you use an LLM to practice Mandarin? This is very interesting to me; I'd love to do the same to build some base knowledge before, y'know, actually communicating with people.
I use this app https://www.superchinese.com/ and it has a chat bot that walks you through different conversation scenarios.
Do you mind if I ask a few questions? I’ve been wary of AI in language learning because I don’t want to learn hallucinated words or grammar.
Does the app use AI to generate conversations or to guide you through human-written conversations? Or maybe both?
If you're interacting with the AI in Chinese, how good is it? I imagine you wouldn't be using it if it were poor, but I'm wondering whether you put up with any obvious problems because some other features are worth it, for example.
Are you on the free, Chao, or Plus plan and was it an easy decision once you saw the features?
Tbh I can't wait to use AI for this purpose, I just want to be sure the software is ready. I've seen some terrible translations in languages that I'm familiar with, so I'm wary.
It uses a mix. It doesn't hallucinate words, and it's actually pretty good at catching stuff like grammar mistakes. I find the big value is that it forces you to do free-form conversation where you have to think on your feet, which I find more valuable than just reading and memorizing the way other apps have you do. I ended up getting the Plus plan and definitely feel it's been worth it. The app itself also has a lot of regular lessons; the AI isn't the only part of it.
Oh I see! I'm only really familiar with the normie usage: making soulless remixes of their profile picture, or those lawyers who keep getting held in contempt for submitting documents full of hallucinations to the court.
I think power usage is still a legitimate concern, but on top of that the unreliability is a huge factor. LLMs hallucinate all the time, by design, so if you use them for anything where correctness matters you're bound for failure. There will always be some hallucination unless we overfit the model, and a model that's overfit enough to never hallucinate just reproduces its training data, which gives it no more functionality than a search engine but with vastly higher energy requirements. These things have applications, but really only for approximating stuff where no other approach could do it well. IMO any other use case is a mistake.
People are actively working on different approaches to address reliability. One I like in particular is the neurosymbolic kind of model, where deep neural networks are used to classify data and find patterns, and a symbolic logic engine is used to actually reason over them. This basically gives you the best of both worlds (toy sketch of the split below). https://arxiv.org/abs/2305.00813
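To be clear, this is not the architecture from the paper, just a toy illustration of the general split it describes: a "neural" stage turns raw input into symbols with confidences, and a symbolic stage forward-chains explicit rules over those symbols.

```python
# Toy neurosymbolic split: neural perception -> symbols, symbolic rules -> reasoning.
# The "neural" part is a hand-rolled stand-in (fixed logits + softmax), not a real
# network; the symbolic part is a minimal forward-chaining rule engine.
import math

def neural_classify(_raw_input):
    """Stand-in for a trained classifier: map raw input to (symbol, confidence)."""
    logits = {"cat": 2.0, "dog": 0.5, "fox": -1.0}  # pretend network outputs
    total = sum(math.exp(v) for v in logits.values())
    probs = {k: math.exp(v) / total for k, v in logits.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]

# Horn-style rules over symbols: "cat implies mammal and pet", and so on.
RULES = {
    "cat": ["mammal", "pet"],
    "dog": ["mammal", "pet"],
    "mammal": ["animal"],
}

def symbolic_infer(fact):
    """Forward-chain over RULES to collect everything entailed by a starting fact."""
    derived, frontier = {fact}, [fact]
    while frontier:
        current = frontier.pop()
        for consequence in RULES.get(current, []):
            if consequence not in derived:
                derived.add(consequence)
                frontier.append(consequence)
    return derived

symbol, confidence = neural_classify(None)  # raw input unused by the stand-in
print(f"perceived {symbol!r} with p={confidence:.2f}")
print("entailed:", sorted(symbolic_infer(symbol)))
```

The nice property is that the reasoning half is inspectable: you can read the exact rule chain that produced a conclusion, which is what a pure LLM can't give you.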
how do you have your AI set up, yogthos?
I got DeepSeek running on ollama https://dev.to/shayy/run-deepseek-locally-on-your-laptop-37hl
but for language practice I’m just using this app https://www.superchinese.com/
awesome thanks
this will hasten the fall of the western tech world and so it’s a good thing overall.
I read it's 200x cheaper than GPT-4o, which makes the metering too cheap to even be worth keeping track of.
The whole AI subscription business model is basically dead in the water now, and Nvidia might start tanking too. 🤣
What do I need to run this? I saw people on Xiaohongshu build an 8-MacBook cluster, presumably networked over Thunderbolt, and I'm thinking that might actually be the most economical way to do it right now.
It depends on the model size, here’s how you can get DeepSeek running locally https://dev.to/shayy/run-deepseek-locally-on-your-laptop-37hl
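Once ollama itself is installed, a quick smoke test from Python looks something like this. The client library and its pull/chat calls are real; the model tag is just one of the distill options, so pick whatever fits your hardware:

```python
# Pull a DeepSeek R1 distill and verify it responds. Assumes `pip install ollama`
# and a running ollama server; the 14B tag is roughly a 9GB quantized download.
import ollama

ollama.pull("deepseek-r1:14b")  # choose a smaller tag on low-memory machines
reply = ollama.chat(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(reply["message"]["content"])
```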
According to this page, to run the full model you need about 1.4TB of memory, or about 16 A100 GPUs, which is still prohibitively expensive for an individual enthusiast. But yes, you can run a smaller distilled model locally with ollama; even that probably needs a GPU with a lot of memory.
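That 1.4TB figure lines up with back-of-the-envelope math, assuming the commonly cited 671B parameter count for the full model and FP16 weights:

```python
# Rough memory math for the full model; 671B parameters is the commonly
# cited size for DeepSeek R1, and FP16 means 2 bytes per parameter.
params = 671e9
weight_bytes = params * 2              # ~1.34e12 bytes, i.e. ~1.34 TB of weights alone
a100_count = weight_bytes / 80e9       # A100s come in 80GB variants
print(f"~{weight_bytes / 1e12:.2f} TB, ~{a100_count:.0f} x A100-80GB")  # ~17 cards
```

And that ignores KV cache and activation overhead, so the real requirement is a bit higher than the weights alone.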
I got deepseek-r1:14b-qwen-distill-fp16 running locally with 32GB RAM and a GPU (14B parameters at FP16 is about 28GB of weights, so 32GB is just barely enough), but yeah, you do need a fairly beefy machine to run even medium-sized models.