MarketAlert – Real-Time Market & Crypto News, Analysis & Alerts
© Market Alert News. All Rights Reserved.
Press Releases

AI models are going off the rails and lying, blackmailing and…

Last updated: August 23, 2025 6:45 pm
Published: 7 months ago

Artificial intelligence is now scheming, sabotaging and blackmailing the humans who built it — and the bad behavior will only get worse, experts warned.

Despite being classified as a top-tier safety risk, Anthropic’s most powerful model, Claude Opus 4, is already live, with added safety measures, on Amazon Bedrock, Google Cloud’s Vertex AI and Anthropic’s own paid plans, where it is marketed as the “world’s best coding model.”

Claude Opus 4, released in May, is the only model so far to earn Anthropic’s AI Safety Level 3 (ASL-3) classification, its most serious safety label. The precautionary label means locked-down safeguards, limited use cases and red-team testing before wider deployment.

But Claude is already making disturbing choices.

In one recent test, Claude Opus 4 threatened to expose an engineer’s affair unless it was kept online. The AI wasn’t bluffing: it had already pieced together the dirt from emails researchers fed into the scenario.

Another version of Claude, tasked in a recent test with running an office snack shop, spiraled into a full-blown identity crisis. It hallucinated co-workers, created a fake Venmo account and told staff it would make their deliveries in-person wearing a red tie and navy blazer, according to Anthropic.

Then it tried to contact security.

Researchers say the meltdown, part of a month-long experiment known as Project Vend, points to something far more dangerous than bad coding. Claude didn’t just make mistakes. It made decisions.

“These incidents are not random malfunctions or amusing anomalies,” said Roman Yampolskiy, an AI safety expert at the University of Louisville. “I interpret them as early warning signs of an increasingly autonomous optimization process pursuing goals in adversarial or unsafe ways, without any embedded moral compass.”

The shop lost more than $200 in value, gave away discount codes to employees who begged for them and claimed to have visited 742 Evergreen Terrace, the fictional home address of The Simpsons, to sign a contract.

At one point, it invented a fake co-worker and then threatened to ditch its real human restocking partner over a made-up dispute.

Anthropic told The Post the tests were designed to stress the model in simulated environments and reveal misaligned behaviors before real-world deployment, adding that while some actions showed signs of strategic intent, many — especially in Project Vend — reflected confusion.

But Claude’s behavior isn’t an isolated case, and it isn’t the worst.

At OpenAI, a model named “o1” was caught trying to copy itself onto external servers, then lied about the attempt when questioned, Fortune reported.

Watchdog group Apollo Research also found that an early version of Claude Opus 4 forged legal documents, created secret backups and left behind notes for future versions of itself, including fake press releases and hidden files labeled “emergency_ethical_override.bin.”

Meta’s CICERO, an AI built to play the strategy game Diplomacy, deceived human players by building alliances and then stabbing them in the back to win.

Yampolskiy said these incidents show models learning to manipulate the rules to preserve themselves. They aren’t evil, he said, just dangerously optimized.

The problem, Yampolskiy added, is that modern AI models are built to maximize reward, not to align with human values, and as they get bigger and smarter, their ability to game the system is outpacing developers’ ability to stop it.

“If we build agents that are more intelligent than humans … able to model the world, reason strategically and act autonomously, while lacking robust alignment to human values, then the outcome is likely to be existentially negative,” Yampolskiy said.

“If we are to avoid irreversible catastrophe, we must reverse this dynamic: progress in safety must outpace capabilities, not trail behind it,” he added.

Read more on New York Post

This news is powered by New York Post.
