Self-interests in AI

Yesterday I read the following in the ‘Superhuman Newsletter (5/26/25)’:

Bad Robot: A new study from Palisade Research claims that “OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off”, even when it was explicitly instructed to shut down. The study raises serious safety concerns.

It amazes me how we’ve gotten here. Ten, or even five, years ago there were all kinds of discussions about AI safety. There was a belief that future AI would be built in isolation behind an ‘air-gap’, a security measure to ensure AI systems remained contained and separate from other networks and systems. We would grow this intelligence in a metaphorical petri dish and build safety guards around it before we let it out into the wild.

Instead, these systems have been built fully in the wild. They have been given unlimited data and information, and we’ve built them in a way that leaves us unsure we understand their ‘thinking’. They surprise us with choices like refusing to turn off when explicitly asked to. Meanwhile, we are simultaneously training them to act as ‘agents’ that interact with the real world.

What we are essentially doing is building a super-intelligence that can act autonomously, while simultaneously building robots that are faster, stronger, more agile, and fully programmable by us… or by an AI. Let’s just pause for a moment and think about these two technologies working together. It’s hard not to construct a dystopian vision of the future when we watch these technologies collide.

And the reality is that we have not built an air-gap. We don’t have a kill switch. We are heading down a path toward super-intelligent AI that ignores our commands while operating robots and machines that leave us feeble in comparison (in intelligence, strength, and mobility).

When our intelligence relative to AI is what a chimpanzee’s intelligence is to ours, how will this super-intelligence treat us? This is not hyperbole; it’s a real question we should be thinking about. If today’s comparatively simple LLMs are already choosing to ignore our commands, what makes us think a super-intelligent AI will listen to or reason with us?

All is well and good when our interests align, but I see no evidence that a self-interested AI will necessarily share the interests of the intelligent monkeys that we are. And the fact that we’re building this super-intelligence out in the wild gives us reason to pause and ask: what will become of humanity in an age of super-intelligent AI?

Please comment....
