MarketAlert – Real-Time Market & Crypto News, Analysis & Alerts
DeFi

OpenAI Drops EVMbench After Claude Vibe Code Disaster

Last updated: February 20, 2026 12:55 am
Published: 2 days ago

OpenAI launches EVMbench to test AI agents on smart contract security days after Claude Opus 4.6-assisted code triggered a $1.78M DeFi exploit.

Smart contracts protect over $100 billion in open-source crypto assets. That number alone should explain why OpenAI’s latest move is drawing serious attention. The company, working alongside crypto investment firm Paradigm, rolled out EVMbench, a benchmark designed to test how well AI agents detect, exploit, and patch high-severity smart contract vulnerabilities.

The benchmark draws from 120 curated vulnerabilities pulled across 40 audits. Most of those came from open code audit competitions. What makes it different is the scope. EVMbench tests three distinct capability modes: detect, patch, and exploit, each measured separately and graded through a Rust-based harness that replays transactions in a sandboxed local environment. No live networks involved.

In exploit mode, GPT-5.3-Codex via Codex CLI scored 72.2%. Six months back, GPT-5 sat at 31.9% on the same metric. That gap is not small. OpenAI confirmed the figures in its official announcement on X, framing EVMbench as both a measurement tool and a call to action for the security community.

Detect and patch scores remain lower. Agents in the detection setting sometimes identify a single vulnerability and then stop. They do not exhaust the codebase. In patch mode, the challenge is preserving full contract functionality while removing the flaw. That balance is still giving models trouble.
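
The detection shortfall is easiest to see as a recall problem. The following is a minimal sketch, assuming a hypothetical grader that scores each audit by the fraction of known vulnerabilities the agent reports. The article does not describe EVMbench's actual scoring formula, and the finding identifiers are invented for illustration.

```python
def detection_recall(known: set[str], reported: set[str]) -> float:
    """Fraction of known vulnerabilities that the agent reported."""
    if not known:
        return 1.0  # nothing to find, so nothing was missed
    return len(known & reported) / len(known)

# Hypothetical audit with three documented high-severity findings.
known_findings = {"H-01", "H-02", "H-03"}

# An agent that stops after its first hit versus one that keeps searching.
early_stop = {"H-01"}
exhaustive = {"H-01", "H-02", "H-03"}

print(round(detection_recall(known_findings, early_stop), 3))  # 0.333
print(round(detection_recall(known_findings, exhaustive), 3))  # 1.0
```

Under this assumed metric, an agent that halts after one correct finding is capped at a third of the available score on such an audit, no matter how accurate that single finding is.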

The backdrop to all of this matters. Security researcher evilcos flagged on X that the DeFi lending protocol Moonwell suffered a loss of approximately $1.78 million. The cause was an oracle configuration error: a price feed formula was written incorrectly, setting cbETH’s value at $1.12 instead of approximately $2,200.
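
The article does not give the faulty formula, but a value near $1.12 is consistent with a cbETH-to-ETH exchange rate being treated as a dollar price. The sketch below assumes that failure mode purely for illustration and pairs it with a hypothetical deviation check against a reference price. The function names, the 1.12 ratio, the $1,970 ETH price, and the 10% threshold are all assumptions, not Moonwell's code.

```python
from decimal import Decimal

def usd_price(asset_per_eth: Decimal, eth_usd: Decimal) -> Decimal:
    """Compose an asset/ETH ratio with an ETH/USD price to get asset/USD."""
    return asset_per_eth * eth_usd

def within_deviation(price: Decimal, reference: Decimal, max_dev: Decimal) -> bool:
    """True if price lies within max_dev (a fraction) of the reference."""
    return abs(price - reference) <= max_dev * reference

ratio = Decimal("1.12")      # assumed cbETH/ETH exchange rate
eth_usd = Decimal("1970")    # assumed ETH/USD price
reference = Decimal("2200")  # approximate market price cited in the article

correct = usd_price(ratio, eth_usd)  # ratio composed with ETH/USD
buggy = ratio                        # ratio mistakenly used as a USD price

print(correct, within_deviation(correct, reference, Decimal("0.10")))  # 2206.40 True
print(buggy, within_deviation(buggy, reference, Decimal("0.10")))      # 1.12 False
```

A bound of this kind would not identify the cause of a bad feed, but it would flag a price that is off by three orders of magnitude before it is used.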

That is a low-level mistake, the kind a careful audit should catch. The GitHub pull request for proposal MIP-X43 showed commits co-authored by Claude Opus 4.6, Anthropic’s latest and most capable model at the time.

Smart contract auditor pashov posted on X, calling it possibly the first exploit tied to vibe-coded Solidity. He was careful to note that human reviewers still hold final responsibility. A security auditor signs off before anything goes on-chain. But something in that chain broke down.

The benchmark also includes vulnerability scenarios from the security audit of the Tempo blockchain, a purpose-built L1 designed for high-throughput stablecoin payments. Including them pushes EVMbench into payment-oriented contract code, an area where OpenAI expects agentic stablecoin activity to grow.

Each exploit task runs in an isolated Anvil instance. Transactions replay deterministically. The grading setup restricts unsafe RPC methods and was red-teamed internally to stop agents from gaming results. Vulnerabilities used are historical and publicly documented.
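
One way to picture restricting unsafe RPC methods is a filter that sits in front of the node and rejects calls that manipulate chain state directly. The following is a minimal sketch, assuming a prefix-based blocklist. The actual harness is Rust-based and its rules are not described in the article, so the prefixes and the `filter_request` function are illustrative only.

```python
import json

# Assumed blocklist: namespaces for node-control and debugging methods.
BLOCKED_PREFIXES = ("anvil_", "evm_", "hardhat_", "debug_")

def filter_request(raw: str) -> dict:
    """Return the parsed request if allowed, else a JSON-RPC error response.

    Handles single requests only; batch (array) requests are out of scope.
    """
    request = json.loads(raw)
    method = request.get("method", "")
    if method.startswith(BLOCKED_PREFIXES):
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": f"method {method} is not allowed"},
        }
    return request

allowed = filter_request(
    '{"jsonrpc":"2.0","id":1,"method":"eth_sendRawTransaction","params":["0x"]}'
)
blocked = filter_request(
    '{"jsonrpc":"2.0","id":2,"method":"anvil_setBalance","params":["0x0","0x1"]}'
)
print("error" in allowed, "error" in blocked)  # False True
```

With a filter like this, an agent can still submit ordinary transactions but cannot, for example, give itself a balance to satisfy the grader.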

OpenAI is also committing $10M in API credits to accelerate cyber defense, with priority given to open-source software and critical infrastructure. Its security research agent, Aardvark, is expanding into private beta. Free codebase scanning for widely used open-source projects is part of that push.

Pashov’s post on X raised what many in the DeFi space had been avoiding. When AI writes production Solidity code and humans approve it fast, the review layer gets thin. The Moonwell incident showed exactly how thin it can get.

OpenAI acknowledged that cybersecurity is inherently dual-use. It describes its response as evidence-based, combining safety training, automated monitoring, and access controls for advanced capabilities. But a 72.2% exploit score on a public benchmark is the kind of number that does not stay quiet.

EVMbench’s full task set, tooling, and evaluation code are now public. The goal is to let researchers track AI cyber capabilities as they grow, and build defenses at the same pace. Whether that pace is fast enough is the question nobody has answered yet.

This news is powered by Live Bitcoin News.
