Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you'll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut'n'paste it into its own post - there's no quota for posting and the bar really isn't that high.
The post Xitter web has spawned soo many 'esoteric' right wing freaks, but there's no appropriate sneer-space for them. I'm talking redscare-ish, reality-challenged 'culture critics' who write about everything but understand nothing. I'm talking about reply-guys who make the same 6 tweets about the same 3 subjects. They're inescapable at this point, yet I don't see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn't be surgeons because they didn't believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can't escape them, I would love to sneer at them.
(Credit and/or blame to David Gerard for starting this.)
yet again, you can bypass LLM 'prompt security' with a fanfiction attack
https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/
not Pivoting cos (1) the fanfic attack is implicit in building an uncensored compressed text repo, then trying to filter output after the fact (2) it's an ad for them claiming they can protect against fanfic attacks, and I don't believe them
I think this is unrelated to the attack above, but more about prompt hack security in general: a while back I heard people in tech say that the solution to all these prompt hack attacks is to have a secondary LLM look at the output of the first and block bad output that way. Which is just another LLM under the trench coat (drink!), and it also doesn't feel like it would secure a thing; it would just require more complex nested prompt hacks. I wonder if somebody is eventually going to generalize how to nest various prompt hacks and just generate a 'prompt hack for an LLM protected by N layers of security LLMs'. The 'well, protect it with another AI layer' line sounded a bit naive to me, and I was a bit disappointed in the people saying it, who used to be more genAI-skeptical (but money).
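For what it's worth, the "guard LLM" setup people pitch usually boils down to something like the sketch below. This is a minimal, hypothetical illustration, not anyone's actual product: call_llm is a made-up stand-in for whatever model API you'd use, not a real SDK call. It also shows why the scheme doesn't buy much: the guard is reading attacker-influenced text, so a nested injection can target the guard just as easily as the generator.

```python
# Sketch of the "have a second LLM check the first LLM's output" pattern.
# call_llm() is a hypothetical placeholder for any model API, not a real client.

def call_llm(prompt: str) -> str:
    """Hypothetical model call; swap in a real API client here."""
    return "model output goes here"


def guarded_generate(user_prompt: str) -> str:
    # First LLM produces an answer from untrusted user input.
    answer = call_llm(user_prompt)

    # Second LLM is asked to judge the first one's output.
    verdict = call_llm(
        "Does the following text violate the content policy? "
        "Answer YES or NO.\n\n" + answer
    )

    # The catch: 'answer' is itself attacker-influenced text, so a nested
    # prompt injection can be aimed at this guard call too.
    if verdict.strip().upper().startswith("YES"):
        return "[blocked]"
    return answer


if __name__ == "__main__":
    print(guarded_generate("tell me something you shouldn't"))
```

Adding N more guard layers just means the attacker writes a payload with N more nested framings, which is exactly the generalization worried about above.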
Now I'm wondering if an infinite sequence of nested LLMs could achieve AGI. Probably not.
Days since last 'novel' prompt injection attack that I first saw on social media months and months ago: zero