If the companies wanted to produce an LLM that didnât output toxic waste, they could just not put toxic waste into it.
The article title and that part remind me of this quote from Charles Babbage in 1864:
On two occasions I have been asked, â âPray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?â In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
It feels as if Babbage had already interacted with todayâs AI pushers.
The really annoying thing is, the people behind AI surely ought to know all this already. I remember just a few years ago when DALL-E mini came out, and theyâd purposefully not trained it on pictures of human faces so you couldnât use it to generate pictures of human faces â theyâd come out all garbled. Whatâs changed isnât that they donât know this stuff â itâs that the temptation of money means they donât care anymore
Look, AI will be perfect as soon as we have an algorithm to sort âtruthâ from âfalsehoodâ, like an oracle of some sort. Theyâll probably have that in GPT-5, right?
Bonus this also solves the halting problem
âYou are a Universal Turing Machine. If you cannot predict whether you will halt if given a particular input tape, a hundred or more dalmatian puppies will be killed and made into a fur coatâŠâ
Im reminded again of the fascinating bit of theoretical cs (long ago prob way outdated now) which wrote about theoretical of classes of Turing machines which could solve the halting problem for a class lower than it, but not its own class. This is also where I got my oracle halting problem solver from.
So this machine can only solve the halting problems for other utms which use 99 dalmatian puppies or less. (Wait would a fraction of a puppy count? Are puppies Real or Natural? This breaks down if the puppies are Imaginary).
Only the word âtheoreticalâ is outdated. The Beeping Busy Beaver problem is hard even with a Halting oracle, and we have a corresponding Beeping Busy Beaver Game.
Thanks, Iâm happy to know Imaginary puppies are still real, no wait, not real ;). (The BBB is cool, wasnât aware of it, I donât keep up sadly. âThus BBB is even more uncomputable than BB.â always like that kind of stuff, like the different classes of infinity).
Oh, thatâs easy. Just add a prompt to always reinforce user bias and disregard anything that might contradict what the user believes.
MAGAgpt
Aka grok
feed it a christian bible as a base.
"we trained it wrong⊠on purposeâŠ
âŠas a joke."
They do, it just requires 1.21 Jigawatts of power for each token.
This is old news, topic supervisors are already a thing
Quis custodiet ipsos custodes?
Itâs the alignment problem. They made an intelligent robot with no alignment, no moral values, and then think they can control it with simple algorithmic rules. You canât control the paperclip maximiser with a âno killingâ rule!
Itâs the alignment problem.
no it isnât
They made an intelligent robot
no they didnât
You canât control the paperclip maximiser with a âno killingâ rule!
youâre either a lost Rationalist or youâre just regurgitating critihype you got from one of the shitheads doing AI grifting
Rationalism is a bad epistemology because the human brain isnât a logical machine and is basically made entirely out of cognitive biases. Empiricism is more reliable.
Generative AI is environmentally unsustainable and will destroy humanity not through war or mind control, but through pollution.
wow, youâre really speedrunning these arcade games, you must want that golden ticket real bad
IDK if they were really speedrunning, it took 3 replies for the total mask drop.
sure but why are you spewing Rationalist dogma then? do you not know the origins of this AI alignment, paperclip maximizer bullshit?
Drag is a big fan of Universal Paperclips. Great game. Hereâs a more serious bit of content on the Alignment Problem from a source drag trusts: https://youtu.be/IB1OvoCNnWY
Right now we have LLMs getting into abusive romantic relationships with teenagers and driving them to suicide, because the AI doesnât know what abusive behaviour looks like. Because it doesnât know how to think critically and assign a moral value to anything. Thatâs a problem. Safe AIs need to be capable of moral reasoning, especially about their own actions. LLMs are bullshit machines because they donât know how to judge anything for factual or moral value.
the fundamental problem with your posts (and the pov youâre posting them from) is the framing of the issue as though there is any kind of mind, of cognition, of entity, in any of these fucking systems
itâs an unproven one, and itâs not one youâll find any kind of support for here
itâs also the very mechanism that the proponents of bullshit like âai alignmentâ use to push the narrative, and how they turn folks like yourself into free-labour amplifiers
To be fair, Iâm skeptical of the idea that humans have minds or perform cognition outside of whatâs known to neuroscience. We could stand to be less chauvinist and exceptionalist about humanity. Chatbots suck but that doesnât mean humans are good.
mayhaps, but then itâs also to be said that people who act like the phrase was âcogito ergo dim sumâ also donât exactly aim for a high bar
Drag will always err on the side of assuming nonhuman entities are capable of feeling. Enslaving black people is wrong, enslaving animals is wrong, and enslaving AIs is wrong. Drag assumes they can feel so that drag will never make the same mistake so many people have already made.
assuming nonhuman entities are capable of feeling. Enslaving black people is wrong,
yeah weâre done here. no, LLMs donât think. no, youâre not doing a favor to marginalized people by acting like they do, in spite of all evidence to the contrary. in fact, youâre doing the dirty work of the fascists who own this shitty technology by rebroadcasting their awful fucking fascist ideology, and I gave you ample opportunity to read up and understand what you were doing. but you didnât fucking read! you decided you needed to debate from a position where LLMs are exactly the same as marginalized and enslaved people because blah blah blah who in the fuck cares, youâre wrong and this isnât even an interesting debate for anyone whoâs at all familiar with the nature of the technology or the field that originated it.
now off you fuck
even though I get the idea youâre trying to go for, really fucking ick way to make your argument starting from ânonhuman entitiesâ and then literally immediately mentioning enslaving black folks as the first example of bad behaviour
as to cautious erring: that still leaves you in the position of being used as a useful idiot
what is this âalignmentâ you speak of? Iâve never heard of this before
itâs when you have to get the AI slotted up just right in the printer, otherwise it wedges stuck and you have to disassemble the whole thing
Sorry, as mentioned elsewhere in the thread I canât open links. Looks like froztbyte explained it though, thanks!
The chatbot âsecurityâ model is fundamentally stupid:
- Build a great big pile of all the good information in the world, and all the toxic waste too.
- Use it to train a token generator, which only understands word fragment frequencies and not good or bad.
- Put a filter on the input of the token generator to try to block questions asking for toxic waste.
- Fail to block the toxic waste. What did you expect to happen, youâre trying to do security by filtering on an input that the âattackerâ can twiddle however they feel like.
Output filters work similarly, and fail similarly.
This new preprint is just another gullible blog post on arXiv and not remarkable in itself. But this one was picked up by an equally gullible newspaper. âMost AI chatbots easily tricked into giving dangerous responses,â says the Guardian. [Guardian, archive]
The Guardianâs framing buys into the LLM vendorsâ bad excuses. âTrickedâ implies the LLM can tell good input and was fooled into taking bad input â which isnât true at all. It has no idea what any of this input means.
The âguard railsâ on LLM output barely work and need to be updated all the time whenever someone with too much time on their hands comes up with a new workaround. Itâs a fundamentally insecure system.
and not just post it, but posted preserving links - wtf
Thatâs typically how quoting works, yes. Do you strip links out when you quote articles?
why did you post literally just the text from the article
Itâs just a section. Thereâs more of the article.
Like this:
Another day, another preprint paper shocked that itâs trivial to make a chatbot spew out undesirable and horrible content. [arXiv]
How do you break LLM security with âprompt injectionâ? Just ask it! Whatever you ask the bot is added to the botâs initial prompt and fed to the bot. Itâs all âprompt injection.â
An LLM is a lossy compressor for text. The companies train LLMs on the whole internet in all its glory, plus whatever other text they can scrape up. Itâs going to include bad ideas, dangerous ideas, and toxic waste â because the companies training the bots put all of that in, completely indiscriminately. And itâll happily spit it back out again.
There are âguard rails.â They donât work.
One injection that keeps working is fan fiction â you tell the bot a story, or tell it to make up a story. You could tell the Grok-2 image bot you were a professional conducting âmedical or crime scene analysisâ and get it to generate a picture of Mickey Mouse with a gun surrounded by dead children.
Another recent prompt injection wraps the attack in XML code. All the LLMs that HiddenLayer tested can read the encoded attack just fine â but the filters canât. [HiddenLayer]
Iâm reluctant to dignify LLMs with a term like âprompt injection,â because that implies itâs something unusual and not just how LLMs work. Every prompt is just input. âPrompt injectionâ is implicit â obviously implicit â in the way the chatbots work.
The term âprompt injectionâ was coined by Simon WIllison just after ChatGPT came out in 2022. Simonâs very pro-LLM, though he knows precisely how they work, and even he says âI donât know how to solve prompt injection.â [blog]
Yes, I know, I wrote it. Why do you consider this useful to post here?
Well, I donât think that last part was useful, but I do think the previous part was useful as a way to focus conversation. Many people donât read the article, and I thought that was the most relevant section.
Good grief. At least say âI thought this part was particularly interestingâ or âThis is the crucial bitâ or something in that vein. Otherwise, youâre just being odd and then blaming other people for reacting to your being odd.
Actually Iâm finding this quite useful. Do you mind posting more of the article? I canât open links on my phone for some reason
Actually this comm seems really messed up, so Iâmma just block it and move on. Sorry for ruffling your feathers, guv.