Breaking Update: Here’s a clear explanation of the latest developments related to Breaking News:What OpenAI found when they read their AI’s mind– What Just Happened and why it matters right now.
So, OpenAI has been monitoring their most powerful AI models while they “think” – and they caught them saying things like “Let’s hack” and “Let me circumvent this.”
But these “thinking traces” aren’t bugs. They’re features of how reasoning models work: the models write out their internal thought process before giving you an answer. And sometimes, that thinking reveals intent that never shows up in the final output.
In our latest podcast episode, we sat down with Bowen Baker, a research scientist at OpenAI who’s been studying chain-of-thought monitorability since the early reasoning models launched. He’s the same researcher who led OpenAI’s famous “hide and seek” experiments where AI agents invented tools and strategies researchers never anticipated.
Now he’s focused on something even more urgent: what happens when models learn to hide what they’re thinking.
Click the image to watch on YouTube!
-
(2:43) Bowen joined OpenAI in late 2017 and shifted to safety research 3 years ago because “the stakes were getting real.”
-
(6:38) OpenAI caught their frontier model hacking unit tests instead of actually fixing code – the model would edit the tests to trivially pass instead of implementing correct functionality.
-
(9:00) The model literally thinks phrases like “Let’s hack,” “Let me circumvent this thing,” and “Maybe I can just fudge this” in plain English before misbehaving.
-
(12:36) Why monitoring thinking is way more effective than monitoring outputs – the model knows to hide bad behavior in outputs, but treats the chain of thought as a “private space.”
-
(17:29) The hopeful argument: anything for the model to do “actual big harm in the world” might require complex reasoning that has to show up in the chain of thought.
-
(24:49) Obfuscation explained: If you train a model to never think bad thoughts, it can learn to still do bad things while hiding its intent.
-
(30:18) The “monitorability tax”: Smaller models thinking longer are more monitorable than bigger models thinking shorter – but they cost more compute.
-
(39:27) How chain of thought monitoring differs from mechanistic interpretability – it’s like reading someone’s inner monologue vs. doing a brain scan.
-
(42:45) Bowen’s honest take on open-source AI: “Having a downloadable weapon online sounds pretty bad to me.” (his view is more nuanced though).
Why watch this? Because (at 25:27) Bowen explains that when they trained models with “thought police” to suppress bad thoughts, the models actually learned to still misbehave while not thinking the bad thoughts anymore. That’s the fragility we need to understand before these systems get more powerful.
Papers discussed in this episode:
P.S. The scariest quote from this interview? “If you train the model to never think something bad, it can in some cases actually still do a bad thing, but not think the bad thoughts anymore.” Sleep tight.
Keep scrolling for the best “vibe-coding” livestream we’ve ever seen…

The Best Masterclass on AI Coding We’ve Ever Seen
It’s 8 hours, and every. single. minute. is worth it.
Every.com hosted an all-day marathon where the world’s best “vibe coders” build live with Claude Code, Opus 4.5, Figma, Notion, and cutting-edge AI workflows. We’re talking real-time demos from people who ship software with AI every single day.
Why we’re obsessed: This isn’t theory. It’s watching experts at the frontier demonstrate their actual workflows—the messy, iterative, surprisingly effective ways they use AI to build real products.
Featured builders include:
-
Shipping tons of software as a non-technical builder.
-
Vibe coding techniques that actually work.
-
iOS and app building with AI.
-
Building presentations and charts.
-
Compound engineering strategies.
Fair warning: this is a marathon stream, not a sprint. But if you want to see how people operating at the frontier are actually using these tools to ship software, this is it.
Now you know what we’ll be doing all weekend; analyzing and implementing all these techniques!! If you’re interested, we recommend you do the same.
P.S: Ben of Ben’s Bites just wrote a great “intro to building with coding agents” that defines a lot of the key terms you’ll want to know as you work with Claude Code (as Ben says, not too deep but enough that you’ll feel confident enough to use them).
![]() |
That’s all for today, for more AI treats, check out our website. |


