Veo 3.1 Lite vs Fast: what you actually lose at a third of the price

Written by Joseph Nordqvist

  • Lite costs 67% less than Fast at 720p ($0.40 vs $1.20 per 8s clip)
  • The quality gap is small on controlled scenes (product, dialogue, abstract) but large on complex motion
  • Both models include audio on every generation; audio is not a Lite tradeoff
  • Lite takes about 35% less time to generate than Fast on average (42s vs 65s)
  • Fast's price drops April 7, narrowing Lite's advantage from 67% to 50% at 720p
  • All 23 test videos are playable in-article for direct comparison
Side-by-side frame comparison of Veo 3.1 Lite versus Fast video generation showing a barista pouring latte art
Google DeepMind / AI News Home benchmark

Google launched Veo 3.1 Lite today, its most cost-effective video generation model. At $0.05 per second for 720p output, an 8-second clip costs $0.40. The same clip on Veo 3.1 Fast costs $1.20. That is a 67% price reduction with what Google claims is "the same speed."

I ran the same 7 prompts through both models via the Gemini API to find out what the price difference actually buys. Every generation was 720p, 8 seconds, with identical prompt text. The results were more uneven than I expected.

The short version

Lite is not a uniform downgrade. On controlled, low-motion scenes (a rotating product, a woman at a window, ink in water) the quality gap is small. On complex motion, spatial depth, and physics (a barista pouring latte art, a falcon diving) the gap is dramatic.

Where Lite holds up

Dialogue scenes. A close-up of a woman turning at a rainy window, speaking a line of dialogue. Both models produce coherent lip movement, appropriate rain texture, and matching ambient audio. Lite's skin detail is slightly softer, but you would not pick it as the cheaper model in a blind test.

Product shots. A matte-black headphone rotating on a white surface. Lite handles the clean studio setup well because the scene is physically constrained: one object, one motion, controlled lighting. The reflection detail is slightly less precise on Lite, but the output looks commercially usable to me.

Abstract and stylized. Ink drops swirling in water with fractal patterns. Both models generate compelling macro footage. The colour palette and temporal pacing are close. This makes sense: abstract content has no "correct" physics to violate, so the model's reduced capacity does not have an obvious failure mode.

Where Lite breaks down

Human motion and physics. I prompted a barista pouring steamed milk into a latte. Lite started the pour correctly: two hands (one holding the cup, one pouring), a pitcher, and steamed milk flowing. However, in the final frames the cup began floating above the counter and the barista had a third hand pouring from a second pitcher. Fast produced a completely different interpretation: a cinematic beauty shot of the finished latte with bokeh, skipping the pour mechanics entirely. Note: neither model correctly formed the rosetta pattern as the milk was being poured (the pattern was already there before the pour began).

Fast motion. This was the largest gap. I prompted a peregrine falcon diving through cloudy sky in slow motion with a camera track. Fast generated an actual diving trajectory with the camera following the descent, feathers rippling convincingly. Lite produced a bird hovering in place with what looked like a radial motion blur applied as a visual shorthand for speed. The dive never happened. The model understood "fast bird + sky + dramatic" but struggled with the temporal arc of a dive sequence.

The image-to-video test

Both Lite and Fast support image-to-video generation. I tested this with a photo of my Bengal cat Alfie, standing on a bed looking directly at the camera. The prompt asked the cat to walk forward, paws stepping carefully, tail swaying.

Audio: not a tradeoff

Every single Lite output included an audio stream. All 13 Lite generations and all 10 Fast generations had AAC audio tracks. Audio quality seemed slightly better on Fast (richer ambient sound) in my tests, but Lite's audio is fully functional, not degraded placeholder audio.
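The methodology notes that audio presence was confirmed with ffprobe stream analysis. The exact command is not given, so the following is only a minimal sketch of one way such a check could look: it asks ffprobe for each stream's codec type and name as JSON, then keeps the audio entries. The function names and the file path in the usage note are hypothetical.

```python
import json
import subprocess


def audio_codecs_from_ffprobe_json(ffprobe_output: str) -> list[str]:
    """Return the codec names of all audio streams in ffprobe's JSON output."""
    data = json.loads(ffprobe_output)
    return [
        stream.get("codec_name", "unknown")
        for stream in data.get("streams", [])
        if stream.get("codec_type") == "audio"
    ]


def probe_audio_codecs(path: str) -> list[str]:
    """Run ffprobe on a video file and list its audio codecs.

    Assumes ffprobe is installed and on PATH. This is an illustrative
    check, not necessarily the command used for the benchmark.
    """
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "stream=codec_type,codec_name",
            "-of", "json",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return audio_codecs_from_ffprobe_json(result.stdout)
```

Calling `probe_audio_codecs("lite_barista.mp4")` (a hypothetical file name) on a clip with an AAC track would be expected to return a list containing `"aac"`. An empty list would mean the clip has no audio stream.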

Speed: Lite is actually faster

Google says Lite offers "the same speed" as Fast. In practice, Lite was consistently faster to generate in my testing. Average generation time across all prompts:

  • Lite: 42 seconds average (range: 32 to 52 seconds)

  • Fast: 65 seconds average (range: 48 to 104 seconds)

Lite took about 35% less time on average. The outlier was Fast's falcon prompt, which took 104 seconds, nearly double any other generation. This speed advantage is notable because Google's pricing page already positions Lite as the high-volume option. Faster generation means higher throughput for batch workflows.
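To make the throughput point concrete, here is a rough sketch that turns the two averages into a time reduction and a clips-per-hour figure. It assumes clips are generated strictly one after another with no parallel requests and no rate limits, which is a simplification and not something the benchmark measured.

```python
def time_reduction_percent(faster_s: float, slower_s: float) -> float:
    """Percentage of generation time saved by the faster model."""
    return (slower_s - faster_s) / slower_s * 100


def clips_per_hour(seconds_per_clip: float) -> float:
    """Sequential throughput, assuming one generation at a time."""
    return 3600 / seconds_per_clip


# Average generation times from the benchmark, in seconds.
LITE_AVG_S = 42
FAST_AVG_S = 65

print(f"Time saved by Lite: {time_reduction_percent(LITE_AVG_S, FAST_AVG_S):.1f}%")
print(f"Lite clips per hour: {clips_per_hour(LITE_AVG_S):.1f}")
print(f"Fast clips per hour: {clips_per_hour(FAST_AVG_S):.1f}")
```

Under that sequential assumption, the same averages that give a roughly 35% time saving work out to about 85 Lite clips per hour against about 55 for Fast.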

The pricing picture

The confirmed per-second rates, directly from Google's pricing page:

Resolution   Lite      Fast (today)   Fast (Apr 7)   Standard
720p         $0.05/s   $0.15/s        $0.10/s        $0.40/s
1080p        $0.08/s   $0.15/s        $0.12/s        $0.40/s
4K           N/A       $0.35/s        $0.30/s        $0.60/s

The timing matters. On April 7, Fast's 720p price drops from $0.15 to $0.10 per second. That cuts Lite's current 67% cost advantage to 50%. At 1080p, the gap narrows from 47% to 33%. If you are evaluating Lite for a production pipeline, factor in the April 7 price drop.
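The percentages above follow directly from the per-second rates. As a quick check, this sketch recomputes the per-clip costs and the Lite savings before and after April 7. The dictionary layout and function names are illustrative; only the rates come from the pricing figures quoted in this article.

```python
from decimal import Decimal

# Per-second rates in USD, as quoted in this article.
RATES = {
    "720p": {
        "lite": Decimal("0.05"),
        "fast_today": Decimal("0.15"),
        "fast_apr7": Decimal("0.10"),
    },
    "1080p": {
        "lite": Decimal("0.08"),
        "fast_today": Decimal("0.15"),
        "fast_apr7": Decimal("0.12"),
    },
}


def clip_cost(rate_per_second: Decimal, seconds: int = 8) -> Decimal:
    """Cost of a single clip at a given per-second rate."""
    return rate_per_second * seconds


def savings_percent(cheaper: Decimal, pricier: Decimal) -> int:
    """Whole-number percentage saved by choosing the cheaper rate."""
    return round((1 - cheaper / pricier) * 100)


for resolution, rates in RATES.items():
    today = savings_percent(rates["lite"], rates["fast_today"])
    later = savings_percent(rates["lite"], rates["fast_apr7"])
    print(f"{resolution}: Lite saves {today}% today, {later}% after April 7")
```

The same `clip_cost` helper gives $0.40 for an 8-second Lite clip and $1.20 for Fast at today's 720p rates, which is where the headline 67% figure comes from.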

Confirmed Lite limitations beyond pricing: no 4K output, no video Extension (clip chaining), and a single video per API call.

When to use Lite

Use Lite for product shots, abstract backgrounds, dialogue scenes, and any application where the camera is relatively static and the scene is physically simple. At $0.40 per 8-second clip, it is viable for high-volume content generation where Fast's $1.20 per clip adds up.

Methodology

23 total generations via the Gemini API using the google-genai Python SDK. All text-to-video tests: 720p, 8 seconds, single output per call. Models: veo-3.1-lite-generate-preview and veo-3.1-fast-generate-preview. Image-to-video tested with a single source photograph at 9:16 aspect ratio. Audio presence confirmed via ffprobe stream analysis. Generation times measured programmatically (wall clock, including API polling). Costs calculated from published per-second rates. Total benchmark cost: $17.38.
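For readers who want to reproduce the timing approach, the following is a minimal sketch of wall-clock measurement that includes polling. It is not the benchmark's actual script. The `submit`, `poll`, and `is_done` callables are hypothetical placeholders that you would wire to whatever SDK calls start a generation and refresh its status, and the 10-second poll interval is an assumption.

```python
import time


def timed_generation(submit, poll, is_done, poll_interval=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Measure wall-clock time for one generation, polling included.

    submit()      -> starts the job and returns a job handle (placeholder)
    poll(job)     -> returns a refreshed job handle (placeholder)
    is_done(job)  -> True once the job has finished (placeholder)
    """
    start = clock()
    job = submit()
    while not is_done(job):
        sleep(poll_interval)  # wait before asking for status again
        job = poll(job)
    return job, clock() - start
```

Note that the measured time is rounded up to the polling granularity: a clip that finishes one second after a poll is only noticed at the next one, so a shorter `poll_interval` gives tighter numbers at the cost of more status requests.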

Joseph Nordqvist

Founder & Editor-in-Chief at AI News Home

Editorial Transparency

This article was produced with the assistance of AI tools as part of our editorial workflow. All analysis, conclusions, and editorial decisions were made by human editors.

References

  1. Alisa Fortin, Guillaume Vernade. "Build with Veo 3.1 Lite, our most cost-effective video generation model." Google DeepMind, March 31, 2026.
