MarketAlert – Real-Time Market & Crypto News, Analysis & Alerts
© Market Alert News. All Rights Reserved.

China’s $9 AI Video Tool Kling 2.1 Adds Audio — Can It Beat Google’s $250 Veo 3? – Decrypt

Last updated: June 17, 2025 6:35 am
Published: 10 months ago

We tested both tools head-to-head: Kling shines on pricing and flexibility, but Veo still leads in dialogue and sound design quality.

Chinese short video platform Kuaishou has added an audio generation feature to Kling 2.1, its AI-powered video creation tool, enabling users to produce clips with synchronized sound effects such as footsteps, rainfall, and ambient noise.

The feature, which launched quietly last week, is available in Kling’s image-to-video mode, where users upload a still image and the platform animates it with both motion and audio generated by artificial intelligence.

The timing pits Kling against Google’s Veo 3, which launched with integrated audio capabilities from day one.

Early users on X praised Kling’s seamless audio-visual synchronization, with creator Roberto Nickson calling it “one of the most useful models on the market” for producing generative video content.

The feature is free during initial rollout, accessible through Kling’s website and mobile app.

Kling 2.1 generates 5- to 10-second clips at up to 1080p resolution, utilizing what the company describes as “3D spatiotemporal attention mechanisms” to synchronize sounds with visuals.

The audio tool currently generates sound effects only, with no dialogue or music. When a prompt calls for spoken text, it produces something that resembles a tonal Southeast Asian language and is completely unintelligible. But that by itself isn’t enough to crown Google as the undisputed King of generative video.

We tested Kling 2.1’s new audio features against Google’s Veo 3 to see how the upstart stacks up.

The price gap between the two platforms turns out to be massive.

Kling 2.1’s audio feature is only compatible with the standard version, not the higher-end Master edition. However, at current rates, users can generate more than 20 videos on Kling for every single Veo 3 creation.

For example, using Freepik’s credit system, one generation with Google Veo 3 is currently on sale for 4,000 credits (with the normal price being 8,000 credits per video), whereas Kling 2.1 costs 300 credits per video.

Google’s model runs exclusively through its $250-per-month Ultra subscription. Kling is available on its official site, offering some free generations, with subscriptions starting at around $9 per month.

Even with Google’s current promotional pricing, Veo 3 remains more than 13 times as expensive as Kling per generation (4,000 credits versus 300).
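The credit figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the Freepik prices cited in this article; the constant and function names are illustrative, and the prices themselves may change over time.

```python
# Credit prices per generation, as quoted in the article (Freepik's credit system).
VEO3_SALE_CREDITS = 4_000    # promotional price for one Veo 3 generation
VEO3_NORMAL_CREDITS = 8_000  # regular price for one Veo 3 generation
KLING21_CREDITS = 300        # price for one Kling 2.1 generation


def cost_ratio(expensive: int, cheap: int) -> float:
    """Return how many cheap generations cost the same as one expensive one."""
    return expensive / cheap


sale_ratio = cost_ratio(VEO3_SALE_CREDITS, KLING21_CREDITS)
normal_ratio = cost_ratio(VEO3_NORMAL_CREDITS, KLING21_CREDITS)

print(f"Sale price:   {sale_ratio:.1f} Kling clips per Veo 3 clip")
print(f"Normal price: {normal_ratio:.1f} Kling clips per Veo 3 clip")
```

At the sale price the ratio comes to about 13 to 1, and at the regular price to about 27 to 1, which is where the "more than 20 videos" figure applies.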

For creators who know video generation involves plenty of trial and error, with failure rates that frustrate even patient users, Kling’s economics make experimentation feasible.

The Premium plan on Kling unlocks 1080p resolution, improving overall video quality while still maintaining the cost advantage.

But you get what you pay for. Veo 3 offers sophisticated sound generation, accurately synthesizing speech and matching complex audio elements to visual scenes.

Its understanding of spatial audio and contextual sounds surpassed Kling’s offerings by a wide margin.

In fairness, Kling 2.1 can’t compete here because it was aiming at something different: ambient sounds and background effects, with no dialogue and no music. So forget about those viral AI street interviews for now. Attempts to generate speech produce gibberish.

Yet for scenes or videos requiring atmospheric audio, its results were serviceable.

The platform’s new ability to add effects to existing silent videos gives it an edge that Veo 3 couldn’t match.

Users can upload finished videos and retrofit them with appropriate soundscapes, a workflow that Google’s model doesn’t support. Weirdly, Veo can create videos, but it can’t edit them.

Besides the ability to create sounds for silent videos, Kling also offers a lip-syncing feature.

Users can upload a photo and a speech or dialogue separately, and the model will make a video in which the subjects interact naturally, as if they were speaking to each other according to the uploaded audio.

The twenty-to-one generation ratio means creators can experiment with different audio approaches on Kling, while Veo 3 users have to nail their sound design in fewer attempts.

For hobbyists and those learning generative video, Kling’s approach offers more room for trial and error.

But professional creators needing precise audio-visual synchronization and dialogue will find Veo 3’s sophisticated sound engine worth the premium.

Video quality testing produced unexpected results. In a test scene featuring a woman fleeing from a giant spider, Kling 2.1’s standard version outperformed both Veo 3 and its own Master edition.

The standard model accurately represented the scene dynamics, exhibiting fluid motion and proper directional movement. Veo 3 inexplicably generated the woman running toward the spider instead of away from it.

The Master edition typically produces sharper, crisper visuals, but the standard version demonstrated superior scene comprehension and more fluid movement.

This is odd, since the higher-end model would be expected to deliver better results, but the problem may have come down to prompt technique or simply bad luck in the generation.

That said, Kling 2.1 standard with 1080p generations is a great model that holds its own against Google Veo 3 here.

Platform limitations shape each tool’s workflow differently. Kling 2.1’s audio feature works only with image-to-video generation, not text-to-video, which remains exclusive to the Master edition without audio support — yes, this is odd, but it is what it is.

The best workaround is using Kolors, Kuaishou’s image generator, to create starting frames before converting them to video with synchronized audio. Kolors produces highly realistic images that serve as excellent starting points for video generation.

However, you might find that models including Reve, MidJourney, Recraft, Flux, and even ChatGPT are easier to prompt.

Veo 3 took the opposite approach, offering only text-to-video generation without any image-to-video option.

This forces users to rely entirely on prompt engineering, with no way to control the starting visual.

Google’s decision also seems particularly odd given that the previous Veo 2 does actually support image-to-video through its separate Flow platform.

The lack of visual control means users have to generate videos blindly, hoping their text prompts will produce the desired starting frames.

Content moderation revealed contrasting philosophies. Veo 3 employs aggressive keyword filtering and post-generation checks, blocking content that violates Google’s policies.

The system flags potentially problematic prompts before generation and analyzes completed videos for policy violations.

Kling applies more liberal restrictions, allowing content that Veo will block outright.

However, the model’s training data naturally excluded explicit content — the model generates figures without anatomical details and violence without gore.

So, users can generate certain types of content that bypass keyword filters while still maintaining safety boundaries.

Both platforms refund credits when post-generation censorship blocks a video, but Kling’s lighter touch allows more creative freedom within boundaries.

Veo 3 might still be the king, but Kling 2.1 is definitely close to a populist on a mission to overthrow the monarchy.

Its audio feature is pretty revolutionary when you consider it’s a $9 tool competing against a $250 subscription.

The atmospheric sounds work, the rain sounds like rain, footsteps match the movement most of the time, and you can generate twenty attempts while Veo users carefully craft their single shot.

That retrofit feature, where you add sound to finished videos, is something Google doesn’t offer, and it’s genuinely useful for salvaging silent clips.

Things will look completely different if your primary goal is speech. Kling’s gibberish won’t fool anyone.

For this kind of specific requirement, Google Veo 3 is the obvious and only choice. The king is (almost) dead. Long live the Kling!

Read more on Decrypt

This news is powered by Decrypt
