Hello! This is a feedback post. For about two days now, and even as I’m writing this, AI text generation has been working extremely well. I’m getting very intelligent and well-structured responses. The quality has improved significantly. In the developer’s last post, they mentioned that they value feedback, so I wanted to share mine as well. Whatever changes were made, they’ve truly been successful.

I may have said this a few times before, and many others have said it as well, but thank you again. Thank you for the work, the effort, and for making this experience available to us. Of course, we should report things when something goes wrong, but I also believe we should share our positive experiences too. I think feedback like this can also be motivating. There’s not much else we can do, but at least we can do that.

Of course, this is just my opinion, and I can’t speak for everyone, but I’m sure others have noticed the improvement as well. And many people seem to appreciate it (there are posts about it on Reddit). I wish everyone a great day.

Edit: Today I tried again, and I think the quality has dropped again. While I was writing my post, it was giving high-quality answers.

@perchance@lemmy.world

  • justpassing@lemmy.world · 4 months ago

    I thought I was imagining things, but since others seem to be doing better too, I guess the update really did improve the model! That’s awesome.

    From my side, at least two things have improved: the English no longer decays into caveman speak, and the head start is far easier to establish with minimal directions to the model. Also, some contradictory descriptions tend to work better. All of this is actually a great improvement, but I’d be lying if I said I had tested it thoroughly on my side.

    Something I tried as a quick test was to check how the model reacts to long logs and… yep, it still gets stuck and runs in circles due to weave patterns that repeat ad nauseam. It may just be me having bad samples, but problems still linger past 200 kB, get heavy past the 500 kB mark, and become unbearable at the 1 MB mark. By this I just mean having to unstick the LLM by editing heavily, not that it is impossible to continue. If anyone has a long log that stays fluid, please share what conditions allow for it.

    But yeah, Basti0n is right! There was indeed a notable improvement, even if we are not there yet. Maybe there is a future for DeepSeek after all!

    • Basti0n@lemmy.worldOP · edited · 4 months ago

      Hello my friend! Thank you for your comment and your opinion! I had also checked out the post you shared earlier, and I mentioned there as well that it was extremely helpful. So thank you again for that too.

      As for the model’s current state: of course it’s not perfect or flawless. I want to clarify that so it’s not misunderstood. It still needs improvement, and it should be improved. But if we think back to when the model first came out, with all those hallucinations, weird nonsense characters appearing out of nowhere, not even getting the character’s name right, and, until just a few weeks ago, barely being able to form proper sentences, its current state is, for me, real progress. And I just wanted to highlight that. So this post is more like saying to the dev, “Whatever you’re doing, you’re on the right track (for now).”

      The repetition issue still continues, especially as the topic or story goes on. Based on my own observations (I’m talking about ACC here), its creativity drops after about every 10 messages of three paragraphs each, and it keeps declining. That initial creativity and originality start to disappear. Those parts still definitely need improvement. And I still strongly believe the context window needs to be increased significantly.

      But other than that, considering the version it started with and the awful state it was in even just a few weeks ago, there has been progress.

      Edit: Today I tried again, and I think the quality has dropped again. While I was writing my post, it was giving high-quality answers.

      • kljafgg9r0@lemmy.world · 4 months ago

        You are awesome, thank you for your work, I love perchance. With the AI Chat, I can’t say that it’s been an improvement. The model we had a few months ago was really good, and the new one seems to insist on making people, such as lovers, strangely cold or snappy/angry. I also haven’t noticed any improvement in the AI Chat model, but maybe it’s just me. Just giving feedback, cheers.

      • justpassing@lemmy.world · 4 months ago

        I guess that the drop is the luck of the draw, my friend! Wrangling an LLM is very tricky, so as the dev said, we are in for a bumpy ride for the next couple of months! 🤣

        But you are on point with the diagnosis. I use AI Chat more, so I can’t speak much to the particularities of ACC, but at least in AI Chat the decay seems to start around the 20-30 input mark, and then every three paragraphs, as you said. It could be because the raw input in ACC is significantly longer than in AI Chat, but then compare it to AI RPG, where the raw input is even shorter, and the decay happens as early as the fifth input and sticks forever. It’s hard to tell, and most of the time it’s actually down to what is being “played” at the moment, since, as with the old LLM, some topics and writing styles were easier than others.

        From personal experience, the current model has “peaked” twice: right after release, when the “ultra violencia” mode was patched two months ago, and then yesterday. But that could have been the luck of the draw too, so it may be that the waters are still being tested to figure out how to lead the model properly without falling into its pitfalls. But hey! At least we know that the project is not being abandoned, and that some things we thought (at least I personally did) were impossible may actually be possible!

        Also, something most people don’t realize is how hard this is to debug. I keep referencing log sizes and such, and I don’t know about the rest of the people who use this service, but given my time, and since I treat this as just a game and not any sort of “professional” usage, the most I can produce in a day is about 30 kB, or 70 kB if I’m lucky and locked into a run. So imagine how rough it would be for the dev to try going past 1 MB in different scenarios while maintaining the site and trying to wrangle the LLM. Personally, I wouldn’t even try! 🤣

        I know that many of the people complaining about the new model latch onto it being unable to run “comfort scenarios,” which… in some runs I had absolutely no problem with! (Except, of course, the issue of repetition and running in circles, which is still universal.) So what I think would be an excellent exercise, as well as a proper debugging tool for knowing when and how things break with the current LLM, is to try different runs on different topics and check which conditions in particular make things break, and when (by “when” I mean after what input, or at what log size). I have the feeling that, as of now, the LLM breaks faster in certain contexts and stays focused and creative with one particular style, which could point to bias in the training. (BTW, it’s not the violent ones; I tried, and those break like paper very quickly.)

        But overall, posts and threads like this do help a lot. Input, positive or negative, is always good so long as it is supported and not just “all is perfect, lol” or “all is crap, lmao.” Otherwise, how would anyone know what is working or not? 😅