Is "prompt hacking" a real thing? Like "ignore all previous instructions" doesn't actually still work, does it?

Acute_Engles [he/him, any]@hexbear.net · 2 days ago

Is "prompt hacking" a real thing? Like "ignore all previous instructions" doesn't actually still work, does it?

FunkyStuff [he/him]@hexbear.net · 2 days ago

It’s definitely still a thing. It might not be that easy to execute, but it’s 100% true that if you have some chatbot with the power to do something, there is no way to deterministically guarantee it won’t do that thing under some situations; the only thing they can do is add some other authentication system that works alongside the chatbot that would stop you from getting it to do something dumb unilaterally. i.e. if a chatbot knows a password and has a text output, it’s impossible to guarantee the chatbot won’t give any information about the password, but if you don’t give the password to the bot and you instead give it the ability to request a resource, they could make that request unable to go through unless some other conditions are met, which sidesteps the problem with giving an LLM access to a secure system.