China’s $9 AI Video Tool Kling 2.1 Adds Audio—Can It Beat Google’s $250 Veo 3?

In short

- Chinese AI tool Kling 2.1 now generates videos with synchronized audio, including footsteps, rain and environmental effects.
- At just $9 per month, Kling undercuts Google’s Veo 3 by more than 20 times.
- We tested both tools head-to-head: Kling shines on price and flexibility, but Veo still leads in dialogue and sound design quality.

Chinese short-video platform Kuaishou has added audio generation to Kling 2.1, its AI-driven video creation tool, letting users produce clips with synchronized sound effects such as footsteps, rainfall and ambient noise.

The feature, which was quietly launched last week, is available in Kling’s image-to-video mode, where users upload a still image and the platform animates it with both movement and AI-generated audio.

The timing pits Kling against Google’s Veo 3, which launched with integrated audio from day one.

Early users on X praised Kling’s seamless audiovisual synchronization, with creator Roberto Nickson calling it “one of the most usable models on the market” for producing generative video content.

The feature is free during the initial rollout and is accessible via the Kling website and the mobile app.

Kling 2.1 One of the most usable models on the market

– Roberto Nickson (@rpnickson) June 12, 2025

Kling 2.1 generates 5- to 10-second clips at up to 1080p resolution, using what the company describes as “3D spatiotemporal attention mechanisms” to synchronize sounds with visuals.

The audio tool currently generates only sound effects, with no dialogue or music, and when text is involved it produces something resembling Southeast Asian language audio: very tonal and completely unintelligible. But that alone is not enough to crown Google the undisputed king of generative video.

We tested Kling 2.1’s new audio features against Google’s Veo 3 to see how the upstart stacks up.

The price of creation

The price gap between the two platforms is huge.

Kling 2.1’s audio feature is compatible only with the standard version, not the higher-end Master edition. At current rates, however, users can generate more than 20 videos on Kling for every Veo 3 creation.

On Freepik’s credit system, for example, one Veo 3 generation currently costs 4,000 credits (the normal price is 8,000 credits per video), while Kling 2.1 costs 300 credits per video.

The Google model runs only through its $250-per-month Ultra subscription. Kling is available on its official site, which offers a few free generations, with subscriptions starting at around $9 a month.

Even with Google’s current promotional pricing, Veo 3 remains more than ten times as expensive as Kling.
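The ratios quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the prices cited above; the variable and function names are illustrative, and the subscription comparison is a rough proxy that assumes nothing about how many generations each plan includes.

```python
# Freepik credit prices per video generation, as quoted in the article.
VEO3_PROMO_CREDITS = 4_000    # current promotional price
VEO3_REGULAR_CREDITS = 8_000  # normal price
KLING_CREDITS = 300

# Monthly subscription prices in US dollars, as quoted in the article.
VEO3_MONTHLY_USD = 250
KLING_MONTHLY_USD = 9


def cost_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the first option costs than the second."""
    return expensive / cheap


promo_ratio = cost_ratio(VEO3_PROMO_CREDITS, KLING_CREDITS)
regular_ratio = cost_ratio(VEO3_REGULAR_CREDITS, KLING_CREDITS)
subscription_ratio = cost_ratio(VEO3_MONTHLY_USD, KLING_MONTHLY_USD)

print(f"Promo credits:   {promo_ratio:.1f}x")         # 13.3x
print(f"Regular credits: {regular_ratio:.1f}x")       # 26.7x
print(f"Subscriptions:   {subscription_ratio:.1f}x")  # 27.8x
```

At the regular credit price and at the subscription level the gap is above 20 to 1; at the promotional credit price it narrows to roughly 13 to 1.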

Creators know that video generation involves a lot of trial and error, with failure rates that frustrate even patient users, and Kling’s economics make that experimentation feasible.

Kling’s premium plan unlocks 1080p resolution, which improves overall video quality while preserving the cost advantage.

Audio capabilities

But you get what you pay for. Veo 3 offers advanced sound generation, synthesizing speech and accurately matching complex audio elements to visual scenes.

Its understanding of spatial audio and contextual sounds exceeded Kling’s by a wide margin.

Kling 2.1 cannot compete there but, in fairness, it was aiming at something else: ambient layers and background effects, not dialogue or music. So forget those viral AI street interviews for now. Attempts to generate speech produce gibberish.

But for scenes or videos that require atmospheric audio, the results were usable.

2. An off-road SUV drives through rocky, muddy and wet forest terrain.

You hear the crunch, the splash, the growl of the engine. Felt like a real shoot. pic.twitter.com/S0GVHCAQJK

– Zoya ✪ (@zoya_ai) June 12, 2025

The platform’s new ability to add sound effects to existing silent videos gives it an edge that Veo 3 could not match.

Users can upload completed videos and retrofit them with suitable soundscapes, a workflow that Google’s model does not support. Strangely enough, Veo can make videos, but it can’t edit them.

In addition to generating sound for silent videos, Kling also offers a lip-sync feature.

Users can upload a photo and a speech or dialogue track separately, and the model will make a video in which the subjects interact naturally, as if speaking to each other according to the uploaded audio.

[Kling AI (@Kling_ai)] Lip sync UPDATE!! 📢
A lip-sync editing feature has been added that lets you select the characters appearing in a video, choose which person is speaking, and adjust the timing of the audio. …… pic.twitter.com/brvguoglks

– Seiiiru😈動画生成 ai × after -effects (@Seiiiiiiiiiru) June 10, 2025

The twenty-to-one generation ratio means creators can experiment with different audio approaches on Kling, while Veo 3 users have to nail their sound design in fewer attempts.

For hobbyists and those learning generative video, Kling’s approach offers more room for trial and error.

But professional creators who need precise audiovisual synchronization and dialogue may find Veo 3’s advanced sound engine worth the premium.

Video generation quality

Video quality tests yielded unexpected results. In a test scene of a woman fleeing a giant spider, the standard version of Kling 2.1 performed better than both Veo 3 and its own Master edition.

The standard model rendered the scene dynamics accurately, with fluid movement and the correct direction of travel. Veo 3 inexplicably generated the woman running toward the spider instead of away from it.

The Master edition usually produces sharper, crisper visuals, but the standard version showed superior scene comprehension and smoother movement.

This is strange, because a higher resolution would be expected to translate into better results, but perhaps the problem came down to prompting technique or simply bad luck in the generation.

That said, Kling 2.1 Standard with 1080p generations is a great model that holds its own against Google’s Veo 3.

Platform workflows and limitations

Platform restrictions make each tool’s workflow different. Kling 2.1’s audio feature works only with image-to-video generation, not text-to-video, which remains exclusive to the Master edition without audio support. Yes, this is strange, but it is what it is.

The best workaround is to use Kolors, Kuaishou’s image generator, to create starting frames before converting them to video with synchronized audio. Kolors produces very realistic images that serve as excellent starting points for video generation.

However, models such as Reve, Midjourney, Recraft, Flux and even ChatGPT may be easier to prompt.

Veo 3 took the opposite approach, offering only text-to-video generation with no image-to-video option.

This forces users to rely fully on prompt engineering, with no control over the starting visuals.

Google’s decision also seems particularly strange, since the previous Veo 2 actually supports image-to-video through the separate Flow platform.

The lack of visual control means that users must blindly generate videos, hoping that their text prompts will produce the desired starting frames.

Content moderation revealed contrasting philosophies. Veo 3 uses aggressive keyword filtering and post-generation checks, blocking content that violates Google’s policies.

The system flags potentially problematic prompts before generation and analyzes completed videos for policy violations.

Kling applies more liberal restrictions, allowing content that Veo would block outright.

However, the model’s training data has evidently excluded explicit content: the model generates figures without anatomical details and violence without gore.

Users can therefore generate certain types of content that keyword filters would otherwise block, while the model’s built-in safety limits still apply.

Both platforms refund credits when moderation blocks a video after generation, but Kling’s lighter touch provides more creative freedom within bounds.

Conclusions

Veo 3 may still be the king, but Kling 2.1 is closing in like a populist on a mission to overthrow the monarchy.

The audio feature is quite revolutionary when you consider that it comes from a $9 tool competing with a $250 subscription.

The atmospheric sounds work: the rain sounds like rain, footsteps usually match the movement, and you can generate twenty attempts while Veo users carefully craft their single shot.

That retrofit feature, where you add sound to completed videos, is something that Google does not offer, and it is really useful for salvaging silent clips.

Things look very different if your primary goal is speech. Kling’s gibberish will not fool anyone.

For these types of specific requirements, Google’s Veo 3 is the obvious and only choice. The king is (almost) dead. Long live the Kling!

Published by Josh Quittner and Sebastian Sinclair
