More than 40 top AI researchers propose monitoring the internal “chain of thought” of chatbots to catch harmful intentions before the model acts on them.
Privacy experts warn that monitoring these AI reasoning traces could expose sensitive user data and create new risks of surveillance or abuse.
Both researchers and critics agree that strict safeguards and transparency are needed to keep this safety tool from becoming a privacy threat.
Forty of the world’s leading AI researchers have just published a paper arguing that companies should start reading the thoughts of their AI systems. Not their outputs, but the actual step-by-step reasoning process, the internal monologue that takes place before ChatGPT or Claude gives you an answer.
The proposal, called chain-of-thought (CoT) monitoring, is intended to catch misbehavior before the model even produces an answer, and could help companies factor safety scores into “training and deployment decisions,” the researchers claim.
But there is a catch that should make anyone who has ever typed a private question into ChatGPT nervous: if companies can read an AI’s thoughts during deployment, when the AI is interacting with users, they can also read them for anything else.
When safety becomes surveillance
“The concern is justified,” Nic Addams, CEO of the commercial hacking startup 0RCUS, told Decrypt. “A raw CoT often contains literal user secrets, because the model ‘thinks’ in the same tokens it ingests.”
Everything you type into an AI passes through its chain of thought. Health issues, financial troubles, confessions: all of it can be logged and analyzed if CoT monitoring is not properly controlled.
“History sides with the skeptics,” Addams warned. “Telecom metadata after 9/11 and ISP traffic logs after the 1996 Telecom Act were both introduced ‘for security’ and later repurposed for commercial analytics and subpoenas. The same gravity will pull on CoT archives unless retention is cryptographic and access is legally limited.”
Career Nomad CEO Patrice Williams-Lindo is also wary of the risks of this approach.
“We have seen this playbook before. Remember how social media started with ‘connect your friends’ and turned into a surveillance economy? The same potential is here,” she told Decrypt.
She predicts a “consent theater” future in which “companies pretend to honor privacy, but bury CoT surveillance in 40-page terms of service.”
“Without global guardrails, CoT logs will be used for everything from ad targeting to ‘employee risk profiling’ in enterprise tools. Watch HR tech and productivity AI in particular.”
The technical reality makes this especially pressing: LLMs are only capable of advanced, multi-step reasoning when they use CoT. As AI becomes more powerful, monitoring becomes both more necessary and more invasive.
Moreover, today’s CoT monitorability may be extremely fragile.
Higher-compute RL, alternative model architectures, certain forms of process supervision, and similar techniques can all lead to models that obscure their thinking.
Tej Kalianda, a designer at Google, is not against the proposal, but she emphasizes the importance of transparency, so that users can feel at ease with what the AI is doing.
“Users do not need full model internals, but they do need to hear from the AI chatbot: ‘This is why you are seeing this,’ or ‘Here is what I can no longer say,’” she told Decrypt. “Good design can make the black box feel more like a window.”
She added: “In traditional search engines, such as Google Search, users can see the source of each result. They can verify the site’s credibility and make their own decision. That transparency gives users a sense of agency and trust. With AI chatbots, that context often disappears.”
Is there a safe way forward?
In the name of safety, users can opt out of having their data used for training, but those terms do not necessarily cover the model’s chain of thought (that is an AI output, not something the user controls), and AI models typically reproduce the information users give them within that reasoning.
So is there a way to increase safety without endangering privacy?
Addams suggested safeguards: “Mitigations: in-memory traces with zero-day retention, deterministic hashing of PII before storage, user-side redaction, and differential-privacy noise on aggregated analytics.”
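As a rough illustration of what two of those mitigations could look like in practice, here is a minimal Python sketch of deterministic PII hashing before storage and Laplace noise on an aggregated count. The secret key, the toy string-replace “detector,” and the epsilon values are assumptions made for the example, not anything specified by Addams or the researchers.

```python
import hashlib
import hmac
import random

# Hypothetical per-deployment secret; in practice it would live in a KMS and be rotated.
SECRET_KEY = b"example-secret-rotate-me"

def hash_pii(token: str) -> str:
    """Deterministically hash a detected PII token with a keyed HMAC, so the raw
    value never reaches the CoT archive while repeat occurrences still map to the
    same pseudonym."""
    return hmac.new(SECRET_KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def dp_noisy_count(true_count: int, epsilon: float = 1.0) -> float:
    """Add Laplace(0, 1/epsilon) noise to a count with sensitivity 1, e.g. flagged
    CoT traces per day. The difference of two independent Exp(epsilon) draws is
    Laplace-distributed with scale 1/epsilon."""
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Toy example: scrub one trace before it is logged, then report a noisy aggregate.
trace = "User said their SSN is 123-45-6789 and asked about refinancing."
scrubbed = trace.replace("123-45-6789", hash_pii("123-45-6789"))
print(scrubbed)
print(dp_noisy_count(true_count=42, epsilon=0.5))
```

In a real deployment the hashing would sit behind a proper PII detector and the privacy budget would be tracked across queries; the point here is only that scrubbing and aggregation can happen before any trace is persisted.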
But Williams-Lindo remains skeptical. “We need AI that is accountable, not performative, and that means transparency by design, not surveillance by default.”
This is not a problem for users right now, but it could become one if it is not implemented correctly. The same technology that could prevent AI disasters could also turn every chatbot conversation into a logged, analyzed, and possibly monetized data point.
As Addams warned, watch for “a breach that exposes unprocessed CoTs, a public benchmark showing >90% evasion despite monitoring, or new EU or California statutes that classify CoT as protected personal data.”
The researchers call for safeguards such as data minimization, transparency about logging, and rapid deletion of non-flagged data. But implementing those safeguards would rely on the same companies that control the monitoring.