Veo 3.1 Lite vs Fast: what you actually lose at a third of the price
Written by Joseph Nordqvist
5 min read
1. Lite costs 67% less than Fast at 720p ($0.40 vs $1.20 per 8s clip)
2. The quality gap is small on controlled scenes (product, dialogue, abstract) but large on complex motion
3. Both models include audio on every generation; audio is not a Lite tradeoff
4. Lite took 35% less time to generate than Fast on average (42s vs 65s)
5. Fast's price drops April 7, narrowing Lite's advantage from 67% to 50% at 720p
6. All 23 test videos are playable in-article for direct comparison

Google launched Veo 3.1 Lite today, its most cost-effective video generation model. At $0.05 per second for 720p output, an 8-second clip costs $0.40; the same clip on Veo 3.1 Fast costs $1.20. That is a 67% price reduction at what Google claims is "the same speed."
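The per-clip figures follow directly from the per-second rates. A minimal Python sketch of that arithmetic, using only the 720p rates quoted above; the helper names and the 1,000-clip batch size are illustrative, not part of any Google SDK:

```python
# 720p per-second rates quoted in this article (USD).
LITE_720P = 0.05
FAST_720P = 0.15
CLIP_SECONDS = 8


def clip_cost(rate_per_second: float, seconds: int) -> float:
    """Cost of one clip: per-second rate times clip length, rounded to cents."""
    return round(rate_per_second * seconds, 2)


def batch_cost(rate_per_second: float, seconds: int, clips: int) -> float:
    """Cost of a batch of identical clips (hypothetical volume scenario)."""
    return round(clip_cost(rate_per_second, seconds) * clips, 2)


for name, rate in (("Lite", LITE_720P), ("Fast", FAST_720P)):
    per_clip = clip_cost(rate, CLIP_SECONDS)
    per_1000 = batch_cost(rate, CLIP_SECONDS, 1000)
    print(f"{name}: ${per_clip:.2f}/clip, ${per_1000:,.2f} per 1,000 clips")
# → Lite: $0.40/clip, $400.00 per 1,000 clips
# → Fast: $1.20/clip, $1,200.00 per 1,000 clips
```

The batch line is only there to show how the per-clip gap compounds at volume; it ignores any discounts or quotas.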
I ran the same 7 prompts through both models via the Gemini API to find out what the price difference actually buys. Every generation was 720p, 8 seconds, with identical prompt text. The results were more uneven than I expected.
The short version
Lite is not a uniform downgrade. On controlled, low-motion scenes (a rotating product, a woman at a window, ink in water) the quality gap is small. On complex motion, spatial depth, and physics (a barista pouring latte art, a falcon diving) the gap is dramatic.
Where Lite holds up
Dialogue scenes. A close-up of a woman turning at a rainy window, speaking a line of dialogue. Both models produce coherent lip movement, appropriate rain texture, and matching ambient audio. Lite's skin detail is slightly softer, but you would not pick it as the cheaper model in a blind test.
Product shots. A matte-black headphone rotating on a white surface. Lite handles the clean studio setup well because the scene is physically constrained: one object, one motion, controlled lighting. The reflection detail is slightly less precise on Lite, but in my opinion the output is commercially usable.
Abstract and stylized. Ink drops swirling in water with fractal patterns. Both models generate compelling macro footage. The colour palette and temporal pacing are close. This makes sense: abstract content has no "correct" physics to violate, so the model's reduced capacity does not have an obvious failure mode.
Where Lite breaks down
Human motion and physics. I prompted a barista pouring steamed milk into a latte. Lite started the pour correctly: two hands (one holding the cup, the other holding a pitcher) and steamed milk flowing. In the final frames, however, the cup began floating above the counter and the barista had a third hand pouring from a second pitcher. Fast produced a completely different interpretation: a cinematic beauty shot of the finished latte with bokeh, skipping the pour mechanics entirely. Note: neither model correctly rendered the rosetta pattern forming as the milk was poured (the pattern was already there before pouring began).
Fast motion. This was the largest gap. I prompted a peregrine falcon diving through cloudy sky in slow motion with a camera track. Fast generated an actual diving trajectory with the camera following the descent, feathers rippling convincingly. Lite produced a bird hovering in place with what looked like a radial motion blur applied as a visual shorthand for speed. The dive never happened. The model understood "fast bird + sky + dramatic" but struggled with the temporal arc of a dive sequence.
The image-to-video test
Both Lite and Fast support image-to-video generation. I tested this with a photo of my Bengal cat Alfie, standing on a bed looking directly at the camera. The prompt asked the cat to walk forward, paws stepping carefully, tail swaying.
Audio: not a tradeoff
Every Lite output included an audio stream: all 13 Lite generations and all 10 Fast generations had AAC audio tracks. Fast's audio seemed slightly better in my tests (richer ambient sound), but Lite's audio is fully functional, not degraded placeholder audio.
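The methodology notes that audio presence was confirmed with ffprobe, but the exact command is not given. The sketch below is therefore an assumption about how such a check could look: it asks ffprobe for a JSON listing of all streams and looks for an audio stream whose codec is AAC. It requires ffprobe on the PATH, and the file name in the usage line is hypothetical.

```python
import json
import subprocess


def ffprobe_streams(path: str) -> dict:
    """Run ffprobe and return its JSON stream listing for one file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,  # raise if ffprobe fails (missing file, bad container, ...)
    )
    return json.loads(result.stdout)


def audio_codecs(probe: dict) -> list[str]:
    """Codec names of every audio stream in an ffprobe JSON result."""
    return [
        stream.get("codec_name", "unknown")
        for stream in probe.get("streams", [])
        if stream.get("codec_type") == "audio"
    ]


def has_aac_audio(probe: dict) -> bool:
    """True if at least one audio stream uses the AAC codec."""
    return "aac" in audio_codecs(probe)
```

Usage: `has_aac_audio(ffprobe_streams("lite_clip.mp4"))`. Splitting the subprocess call from the parsing keeps the parsing logic testable without a video file.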
Speed: Lite is actually faster
Google says Lite offers "the same speed" as Fast. In my testing, Lite was consistently faster to generate. Average generation time across all prompts:
- Lite: 42 seconds average (range: 32 to 52 seconds)
- Fast: 65 seconds average (range: 48 to 104 seconds)
Lite took 35% less time per generation on average. The outlier was Fast's falcon prompt, which took 104 seconds, nearly double any other generation. This speed advantage is notable because Google's pricing page already positions Lite as the high-volume option, and faster generation means higher throughput for batch workflows.
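"Faster" can mean less time per clip or more clips per hour, and the two give different numbers. A quick sketch of both from the averages above; the clips-per-hour figures assume one generation at a time with no concurrency, which is a simplification, since real batch throughput also depends on how many requests run in parallel and on API rate limits.

```python
LITE_AVG_S = 42  # average generation time in seconds, from the test above
FAST_AVG_S = 65


def time_saved_pct(faster_s: float, slower_s: float) -> float:
    """Percentage less wall-clock time per clip."""
    return round((1 - faster_s / slower_s) * 100, 1)


def clips_per_hour(avg_s: float) -> float:
    """Clips per hour if generations run strictly one after another."""
    return round(3600 / avg_s, 1)


print(time_saved_pct(LITE_AVG_S, FAST_AVG_S))                  # → 35.4
print(clips_per_hour(LITE_AVG_S), clips_per_hour(FAST_AVG_S))  # → 85.7 55.4
print(round(FAST_AVG_S / LITE_AVG_S, 2))                       # → 1.55
```

So the same averages read as "35% less time per clip" or "about 1.55x the sequential throughput"; the article's 35% figure is the former.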
The pricing picture
The confirmed per-second rates, directly from Google's pricing page:
| Resolution | Lite | Fast (today) | Fast (Apr 7) | Standard |
|---|---|---|---|---|
| 720p | $0.05/s | $0.15/s | $0.10/s | $0.40/s |
| 1080p | $0.08/s | $0.15/s | $0.12/s | $0.40/s |
| 4K | N/A | $0.35/s | $0.30/s | $0.60/s |
The timing matters. On April 7, Fast's 720p price drops from $0.15 to $0.10 per second. That cuts Lite's current 67% cost advantage to 50%. At 1080p, the gap narrows from 47% to 33%. If you are evaluating Lite for a production pipeline, factor in the April 7 price drop.
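The before/after percentages can be reproduced from the table. A small sketch with the rates copied from it; Standard is left out because it does not enter the Lite vs Fast comparison, and the helper name is illustrative:

```python
# Per-second rates from the pricing table above (USD).
RATES = {
    "720p": {"lite": 0.05, "fast_today": 0.15, "fast_apr7": 0.10},
    "1080p": {"lite": 0.08, "fast_today": 0.15, "fast_apr7": 0.12},
}


def lite_advantage(lite_rate: float, fast_rate: float) -> int:
    """How much cheaper Lite is than Fast, as a whole-number percentage."""
    return round((1 - lite_rate / fast_rate) * 100)


for res, r in RATES.items():
    before = lite_advantage(r["lite"], r["fast_today"])
    after = lite_advantage(r["lite"], r["fast_apr7"])
    print(f"{res}: {before}% -> {after}%")
# → 720p: 67% -> 50%
# → 1080p: 47% -> 33%
```

Because clip length cancels out of the ratio, these percentages hold for any duration, not just 8-second clips.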
Confirmed Lite limitations beyond pricing: no 4K output, no video Extension (clip chaining), and a single video per API call.
When to use Lite
Use Lite for product shots, abstract backgrounds, dialogue scenes, and any application where the camera is relatively static and the scene is physically simple. At $0.40 per 8-second clip, it is viable for high-volume content generation where Fast's $1.20 per clip adds up.
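One way to turn this guidance into a pipeline rule is a small router that defaults to Lite and escalates to Fast only when a request hits one of the Lite limitations listed above or involves complex motion. This is a hypothetical sketch: the `ClipRequest` fields and the `complex_motion` flag are illustrative, and it assumes Fast covers 4K, Extension and multiple videos per call, which should be checked against Google's documentation. The model IDs are the ones listed in the methodology.

```python
from dataclasses import dataclass

# Model IDs as listed in the article's methodology section.
LITE_MODEL = "veo-3.1-lite-generate-preview"
FAST_MODEL = "veo-3.1-fast-generate-preview"


@dataclass
class ClipRequest:
    """Hypothetical description of one generation job."""
    resolution: str = "720p"       # "720p", "1080p" or "4k"
    needs_extension: bool = False  # clip chaining via Extension
    number_of_videos: int = 1      # videos requested in a single API call
    complex_motion: bool = False   # e.g. pours, dives, multi-step physics


def pick_model(req: ClipRequest) -> str:
    """Default to Lite; escalate to Fast when the request needs it."""
    # Hard limits the article lists for Lite.
    if req.resolution.lower() == "4k":
        return FAST_MODEL
    if req.needs_extension or req.number_of_videos > 1:
        return FAST_MODEL
    # Quality judgment from the tests: Lite struggled with complex motion.
    if req.complex_motion:
        return FAST_MODEL
    return LITE_MODEL


print(pick_model(ClipRequest()))                     # static product shot
print(pick_model(ClipRequest(complex_motion=True)))  # falcon-style dive
```

The `complex_motion` flag is a human judgment call per prompt; the hard limits could be enforced automatically, while the motion call needs someone to tag the request.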
Methodology
23 total generations via the Gemini API using the google-genai Python SDK. All text-to-video tests: 720p, 8 seconds, single output per call. Models: veo-3.1-lite-generate-preview and veo-3.1-fast-generate-preview. Image-to-video was tested with a single source photograph at 9:16 aspect ratio. Audio presence was confirmed via ffprobe stream analysis. Generation times were measured programmatically (wall clock, including API polling). Costs were calculated from published per-second rates. Total benchmark cost: $17.38.
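For readers who want to reproduce the timing setup, here is a minimal sketch of a wall-clock measurement that includes API polling, assuming the google-genai SDK's `generate_videos` plus `operations.get` polling pattern. It is not the script used for this benchmark: the function names, the 10-second poll interval, the `aspect_ratio` default and the JPEG MIME type are assumptions. `timed_generation` is kept SDK-free so it can be tested with stub objects.

```python
import time
from typing import Any, Callable, Optional


def timed_generation(
    submit: Callable[[], Any],
    refresh: Callable[[Any], Any],
    poll_seconds: float = 10.0,
) -> tuple[Any, float]:
    """Submit a long-running job, poll until `.done`, return (operation, seconds).

    Timing starts before submission and stops after the final poll, so the
    measured wall-clock time includes polling overhead.
    """
    start = time.monotonic()
    operation = submit()
    while not operation.done:
        time.sleep(poll_seconds)
        operation = refresh(operation)
    return operation, time.monotonic() - start


def generate_clip(
    model: str,
    prompt: str,
    out_path: str,
    image_path: Optional[str] = None,
    aspect_ratio: str = "16:9",
) -> float:
    """Generate one 720p, 8-second clip, save it, and return elapsed seconds.

    Assumes `pip install google-genai` and an API key in the environment.
    """
    from google import genai  # lazy import: timed_generation works without the SDK
    from google.genai import types

    client = genai.Client()
    config = types.GenerateVideosConfig(
        resolution="720p",
        duration_seconds=8,
        number_of_videos=1,
        aspect_ratio=aspect_ratio,
    )
    extra = {}
    if image_path is not None:  # image-to-video: pass the photo as the start image
        with open(image_path, "rb") as f:
            extra["image"] = types.Image(image_bytes=f.read(), mime_type="image/jpeg")

    operation, elapsed = timed_generation(
        lambda: client.models.generate_videos(
            model=model, prompt=prompt, config=config, **extra
        ),
        lambda op: client.operations.get(op),
    )
    video = operation.response.generated_videos[0].video
    client.files.download(file=video)
    video.save(out_path)
    return elapsed
```

Usage, with an illustrative prompt and output path: `generate_clip("veo-3.1-lite-generate-preview", "A matte-black headphone rotating on a white surface", "lite_headphones.mp4")`. Because the poll interval is part of the measured time, a coarse interval can add up to one interval of overhead per clip.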
Editorial Transparency
This article was produced with the assistance of AI tools as part of our editorial workflow. All analysis, conclusions, and editorial decisions were made by human editors.
References
1. Build with Veo 3.1 Lite, our most cost-effective video generation model. Alisa Fortin, Guillaume Vernade, Google DeepMind, March 31, 2026 (primary source)