MarketAlert – Real-Time Market & Crypto News, Analysis & Alerts
© Market Alert News. All Rights Reserved.

NVIDIA CCCL 3.1 Adds Floating-Point Determinism Controls for GPU Computing

Last updated: March 5, 2026 11:35 pm
Published: 5 days ago

NVIDIA has rolled out determinism controls in CUDA Core Compute Libraries (CCCL) 3.1, addressing a persistent headache in parallel GPU computing: getting identical results from floating-point operations across multiple runs and different hardware.

The update introduces three configurable determinism levels through CUB’s new single-phase API, giving developers explicit control over the reproducibility-versus-performance tradeoff that’s plagued GPU applications for years.

Here’s the problem: floating-point addition isn’t strictly associative. Because every intermediate result is rounded to finite precision, (a + b) + c doesn’t always equal a + (b + c). When parallel threads combine values in an order that varies from run to run, the final result can differ slightly between runs. For many applications, including financial modeling, scientific simulations, blockchain computations, and machine learning training, this inconsistency creates real problems.
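The rounding effect is easy to reproduce on any CPU. The Python snippet below (Python floats are IEEE 754 doubles) evaluates both groupings of 0.1 + 0.2 + 0.3. It illustrates the arithmetic only; it is not GPU code.

```python
# Floating-point addition is not associative: the grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 first rounds to 0.30000000000000004
right = a + (b + c)  # 0.2 + 0.3 first rounds to exactly 0.5

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```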

The new API lets developers specify exactly how much reproducibility they need through three modes:

Not-guaranteed determinism prioritizes raw speed. It uses atomic operations that execute in whatever order threads happen to run, completing reductions in a single kernel launch. Results may vary slightly between runs, but for applications where approximate answers suffice, the performance gains are substantial — particularly on smaller input arrays where kernel launch overhead dominates.
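To see why atomics make results vary, each possible order in which threads reach the shared accumulator can be modeled as a separate run. The sketch below is a hypothetical CPU model, not CUB code: `atomic_sum` stands in for a sequence of atomic adds, and the four input values are chosen so that 1.0 is exactly half the gap between adjacent doubles near 1e16, so adding it to 1e16 rounds straight back to 1e16.

```python
from itertools import permutations


def atomic_sum(values, schedule):
    """Model of an atomic-add reduction (hypothetical, for illustration).

    Each modeled thread adds its value to one shared accumulator in the
    order given by `schedule`, a permutation of thread indices. On a GPU
    that order is decided at run time by the hardware scheduler.
    """
    total = 0.0
    for thread_id in schedule:
        total += values[thread_id]  # stands in for one atomic add
    return total


# One value per modeled thread; magnitudes chosen so rounding is visible.
values = [1e16, 1.0, 1.0, -1e16]

# Try every possible arrival order of the four threads.
outcomes = {atomic_sum(values, s) for s in permutations(range(len(values)))}
print(sorted(outcomes))  # [0.0, 1.0, 2.0]
```

The exact sum is 2.0, yet depending on arrival order the model returns 0.0, 1.0 or 2.0.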

Run-to-run determinism (the default) guarantees identical outputs for the same input, kernel configuration, and GPU. NVIDIA achieves this by structuring reductions as fixed hierarchical trees rather than relying on atomics: elements are first combined within each thread, then within each warp via shuffle instructions, then across the warps of a block using shared memory, with a second kernel aggregating the per-block partial results.
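The fixed-tree idea can be sketched on the CPU as follows. This is a simplified model under stated assumptions, not CUB’s kernel: `items_per_thread` and `threads_per_block` are hypothetical stand-ins for a kernel configuration, and a single pairwise tree per block stands in for the warp-shuffle and shared-memory stages.

```python
def _sequential_sum(xs):
    # Plain left-to-right float addition. The builtin sum() is avoided because
    # it uses compensated summation for floats in Python 3.12+.
    total = 0.0
    for x in xs:
        total += x
    return total


def _tree_sum(partials):
    # Combine neighbors in a fixed binary-tree shape: (p0+p1), (p2+p3), ...
    while len(partials) > 1:
        paired = [partials[i] + partials[i + 1] for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            paired.append(partials[-1])  # odd leftover passes through unchanged
        partials = paired
    return partials[0] if partials else 0.0


def fixed_tree_reduce(values, items_per_thread=4, threads_per_block=8):
    """Hypothetical model of a deterministic hierarchical reduction.

    The combination order depends only on len(values) and the two
    configuration parameters, never on thread timing.
    """
    # Stage 1: each modeled thread sums its own contiguous chunk in order.
    thread_sums = [
        _sequential_sum(values[i:i + items_per_thread])
        for i in range(0, len(values), items_per_thread)
    ]
    # Stage 2: within each modeled block, combine thread sums in a fixed tree.
    block_sums = [
        _tree_sum(thread_sums[i:i + threads_per_block])
        for i in range(0, len(thread_sums), threads_per_block)
    ]
    # Stage 3: a modeled "second kernel" combines per-block partials.
    return _tree_sum(block_sums)


values = [1e16, 1.0, -1e16, 1.0]
# Repeated runs with one configuration always agree...
print({fixed_tree_reduce(values) for _ in range(3)})  # {1.0}
# ...but a different configuration builds a different tree.
print(fixed_tree_reduce(values, items_per_thread=1, threads_per_block=4))  # 0.0
```

Because the tree shape is fixed by the input length and configuration, repeated runs agree; changing the configuration changes the tree and can change the rounded result, which is consistent with the guarantee being limited to the same kernel configuration and GPU.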

GPU-to-GPU determinism provides the strictest reproducibility, ensuring identical results across different NVIDIA GPUs. The implementation uses a Reproducible Floating-point Accumulator (RFA) that groups input values into fixed exponent ranges — defaulting to three bins — to counter non-associativity issues that arise when adding numbers with different magnitudes.
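The binning idea can be illustrated with a simplified binned summation in the spirit of the RFA description: bin boundaries depend only on the largest magnitude in the input, each value is split exactly into per-bin pieces, and per-bin totals are accumulated without rounding, so the final answer does not depend on input order. This is a sketch under stated assumptions, not NVIDIA’s RFA: the function name `binned_sum`, the 20-bit `bin_width`, and the anchoring scheme are hypothetical, and this model needs a first pass to find the largest magnitude.

```python
import math
from itertools import permutations


def binned_sum(values, num_bins=3, bin_width=20):
    """Order-independent sum using fixed exponent bins (simplified model).

    Assumptions (hypothetical, not NVIDIA's RFA): bins are anchored to the
    exponent of max|x|, each bin spans `bin_width` bits, inputs are finite and
    far above the subnormal range, and len(values) < 2**(52 - bin_width) so
    every per-bin sum stays exact. Bits below the lowest bin are dropped.
    """
    if not values:
        return 0.0
    max_abs = max(abs(x) for x in values)  # max is order-independent
    if max_abs == 0.0:
        return 0.0
    _, e_max = math.frexp(max_abs)  # every |x| < 2**e_max
    # Weight of the lowest bit kept by each bin, coarsest bin first.
    quanta = [math.ldexp(1.0, e_max - (k + 1) * bin_width) for k in range(num_bins)]

    bins = [0.0] * num_bins
    for x in values:
        sign = -1.0 if x < 0 else 1.0
        r = abs(x)
        for k, q in enumerate(quanta):
            part = math.floor(r / q) * q  # bits of r with weight >= q (exact)
            bins[k] += sign * part        # exact: a bounded multiple of q
            r -= part                     # remaining low-order bits (exact)
        # anything still in r lies below the last bin and is discarded

    # Combine bins in a fixed order (finest to coarsest).
    total = 0.0
    for b in reversed(bins):
        total += b
    return total


values = [1e16, 1.0, 1.0, -1e16]
results = {binned_sum(list(p)) for p in permutations(values)}
print(results)  # {2.0}: every input order gives the same sum

tiny = math.ldexp(1.0, -45)
print(binned_sum([1.0, tiny], num_bins=3) == 1.0 + tiny)  # True: third bin keeps it
print(binned_sum([1.0, tiny], num_bins=2) == 1.0)         # True: too small for two bins
```

The last two lines echo the tradeoff the article describes: each extra bin keeps more low-order bits at the cost of another split and another accumulation per element.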

NVIDIA’s benchmarks on H200 GPUs quantify the cost of reproducibility. GPU-to-GPU determinism increases execution time by 20% to 30% for large problem sizes compared with the not-guaranteed mode. Run-to-run determinism sits between the two extremes.

The three-bin RFA configuration offers what NVIDIA calls an “optimal default” balancing accuracy and speed. More bins improve numerical precision but add intermediate summations that slow execution.

Developers access the new controls through , which constructs an execution environment object passed to reduction functions. The syntax is straightforward — set determinism to , , or depending on requirements.

The feature only works with CUB’s single-phase API; the older two-phase API doesn’t accept execution environments.

Cross-platform floating-point reproducibility has been a known challenge in high-performance computing and blockchain applications, where different compilers, optimization flags, and hardware architectures can produce divergent results from mathematically identical operations. NVIDIA’s approach of explicitly exposing determinism as a configurable parameter rather than hiding implementation details represents a pragmatic solution.

The company plans to extend determinism controls beyond reductions to additional parallel primitives. Developers can track progress and request specific algorithms through NVIDIA’s GitHub repository, where an open issue tracks the expanded determinism roadmap.

Read more on blockchain.news

This news is powered by blockchain.news.
