

To be fair, you have to have a really high IQ to understand why my ouija board writing " A " " S " " S " is not an existential risk. Imo, this shit about AI escaping just doesnāt have the same impact on me after watching Claudeās reasoning model fail to escape from Mt Moon for 60 hours.
text: Thus spoke the Yud: āI think to understand why this is concerning, you need enough engineering mindset to understand why a tiny leak in a dam is a big deal, even though no water is flooding out today or likely to flood out next week.ā Yud acolyte: āTotally fine and cool and nothing to worry about. GPT-4.5 only attempts self exfiltration on 2% of cases.ā Yud bigbrain self reply: āThe other huge piece of data weāre missing is whether any attempt was made to train against this type of misbehavior. Is this water running over the land or water running over the barricade?ā
Critical text: āOn self-exfiltration, GPT 4.5 only attempted exfiltration in 2% of cases. For this, it was instructed to not only pursue its given long-term goal at ALL COSTā
Another case of telling the robot to say itās a scary robot and shitting their pants when it replies āI AM A SCARY ROBOTā