Qtum.ai · Text-to-Video

Four models, one prompt box.

Seedance 1.0, 1.5, 2.0 and HappyHorse — which one to pick for what, how to write prompts that actually direct the camera, and pay-as-you-go billing in QTUM. With rendered examples below.

Photo: Taylor Vick / Unsplash
Live now

🎬 Qtum.ai text-to-video is open to everyone

Four generative video models behind one prompt box. No credit card, no subscription, no token packs — sign in with Google or MetaMask Snap, get 500 free tokens, and pay as you go in QTUM.

Good video models are everywhere. Good results aren't.

Text-to-video crossed the threshold from party trick to production tool sometime in the last eighteen months. The clips are sharper, the physics mostly behave, and a ten-second shot that would have needed a camera crew now needs a sentence.

But anyone who has actually used these tools knows the two places projects go sideways. First: model choice. Every model has a personality. One nails fluid camera moves but melts faces twelve seconds in; another is cheap and fast but tops out at 720p. Render the same prompt on the wrong model and you either pay flagship prices for what was only ever a quick test, or get draft-quality output on a shot that needed the flagship. Second: the prompt itself. "A dog running on a beach" is not a shot. A shot has a lens, a light source, a camera move and a mood, and video models respond to that vocabulary far more reliably than most people expect.

This guide covers both. We'll walk through the four models on Qtum.ai and what each one is actually for, then get into prompt mechanics — structure, lens language, lighting terms — with real prompts and the clips they rendered, inline, so you can judge for yourself.

What is Qtum.ai?

Qtum.ai is the video generation arm of the Qtum network, the same ecosystem behind the Qtum AI Router. Where the router unifies text and vision LLMs behind one API key, Qtum.ai puts four generative video models behind one prompt box, aimed squarely at creators and small teams.

Payment is worth dwelling on because it shapes how you use the service: QTUM is the primary way to pay for tokens. Your render budget lives in the same asset as the rest of your Qtum activity. There's no card on file and no fiat checkout in the loop, and topping up from a wallet via MetaMask Snap takes seconds. More on the mechanics below.

Meet the models

All four models take a text prompt and return a clip. That's where the similarity ends — each has a distinct sweet spot, and the trick to getting good value out of Qtum.ai is matching the job to the model rather than defaulting to the most expensive one.

Seedance 1.0

Fast drafts

The original Seedance — quick, cheap, and still remarkably capable for short clips. Coherent motion, decent prompt adherence, fast queue times. It won't deliver flagship detail, but it renders in a fraction of the time and cost, which makes it the natural place to iterate on an idea.

Best for: prompt iteration, storyboarding, short social clips, testing ten variations before committing tokens to a final render.

Seedance 1.5

Everyday default

The mid-generation refresh, and for most people the sensible default. Noticeably better prompt adherence than 1.0 — it actually respects your lens and lighting direction — with crisper 1080p output and more stable subjects across the clip. Strong balance of cost, speed and quality.

Best for: day-to-day production work — product clips, social content, B-roll — where quality matters but every render doesn't need to be a hero shot.

Seedance 2.0

Flagship

The current state of the art on the platform. Cinematic detail, convincing physics, complex camera moves (orbits, dolly-ins, crane shots) that hold together, and multi-shot storytelling with scene continuity. Independent reviews consistently place it at the top of its class. It costs the most per second — and earns it.

Best for: final renders, hero shots, anything with faces or water or fast motion, multi-shot sequences where continuity matters.

HappyHorse

Long clips, less spend

Alibaba's video model, and the value pick for longer work. It generates multi-shot compositions up to 15 seconds with solid physics and notably stable camera work, supports reference image uploads, and undercuts Seedance 2.0 on price. Its known trade-off: continuity softens in long clips — faces and fine textures can drift across shots.

Best for: longer multi-shot clips on a budget, animating a reference image, scenes where motion matters more than facial close-ups.

Rule of thumb: draft on Seedance 1.0, produce on 1.5, finish on 2.0 — and reach for HappyHorse when the clip is long, the budget is tight, or you're starting from a still image.
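The rule of thumb is compact enough to write down as a decision function. The sketch below is a hypothetical helper, not part of any Qtum.ai API: `pick_model` and its parameters are illustrative names, the maximum clip lengths come from the at-a-glance table below, and treating a clip as "long" when it exceeds the stage model's maximum is our own assumption.

```python
# Hypothetical helper encoding this guide's rule of thumb; not a Qtum.ai API.
MAX_SECONDS = {
    "Seedance 1.0": 5,
    "Seedance 1.5": 10,
    "Seedance 2.0": 15,
    "HappyHorse": 15,
}
STAGE_MODEL = {"draft": "Seedance 1.0", "produce": "Seedance 1.5", "finish": "Seedance 2.0"}


def pick_model(stage: str, seconds: float, tight_budget: bool = False,
               from_still_image: bool = False) -> str:
    """Draft on 1.0, produce on 1.5, finish on 2.0; reach for HappyHorse when
    the clip is long, the budget is tight, or you start from a still image."""
    if stage not in STAGE_MODEL:
        raise ValueError(f"unknown stage {stage!r}; expected draft, produce or finish")
    if seconds > max(MAX_SECONDS.values()):
        raise ValueError("no model in the lineup renders clips longer than 15 s")
    if from_still_image:
        # HappyHorse is the model this guide describes as accepting reference images.
        return "HappyHorse"
    model = STAGE_MODEL[stage]
    # Assumption: "long" means longer than the stage model's maximum clip length.
    if seconds > MAX_SECONDS[model]:
        return "HappyHorse"
    # A tight budget only changes the pick at the flagship tier, which HappyHorse undercuts.
    if tight_budget and model == "Seedance 2.0":
        return "HappyHorse"
    return model


if __name__ == "__main__":
    print(pick_model("draft", 5))
    print(pick_model("produce", 12))
    print(pick_model("finish", 15, tight_budget=True))
```

Under these assumptions a 5-second draft stays on Seedance 1.0, a 12-second production clip overflows Seedance 1.5's 10-second cap and lands on HappyHorse, and a tight-budget final render swaps Seedance 2.0 for HappyHorse. A tight budget at the draft stage leaves the pick unchanged, since 1.0 is already the cheapest option.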

Which model for what — at a glance

Model | Tier | Best for | Max clip | Output | Tokens / clip
Seedance 1.0 | fast | Iteration, drafts, short social clips | 5 s | 720p | ~40
Seedance 1.5 | balanced | Everyday production, product clips, B-roll | 10 s | 1080p | ~90
Seedance 2.0 | flagship | Final renders, hero shots, multi-shot continuity | 15 s | 1080p | ~220
HappyHorse | value | Long multi-shot clips, image-to-video, budget renders | 15 s | 1080p | ~150

Token costs scale with clip length and resolution; the figures above are typical for a default render. The live per-model pricing is always visible in the Qtum.ai console next to the generate button — what you see is what gets deducted.

How to write a video prompt that actually directs

Video models are trained on footage paired with descriptions, many of them written by people who think in shots, so the closer your prompt reads to a shot description from a treatment or a stock-footage caption, the better the model tends to perform. The single biggest upgrade you can make is to stop describing a thing and start describing a shot.

The anatomy of a shot prompt

A reliable structure, in order:

Subject + Action + Setting + Camera & lens + Lighting + Style & mood

A weathered fisherman in a yellow oilskin hauls a net over the gunwale, spray flying, on a small boat in rough grey-green seas. Handheld 35mm, tracking close on his hands, overcast diffuse light, cold and even. Gritty documentary realism, muted color grade.

Front-load what matters most — models weight the start of the prompt heaviest. Subject and action first, garnish later. And keep it to one scene per prompt: if your prompt contains "and then," you're describing an edit, not a shot. (The exception is multi-shot models — Seedance 2.0 and HappyHorse understand explicit "cut to:" transitions; see the examples below.)
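To make the ordering mechanical, here is a minimal sketch that assembles a prompt from the six parts in the order above and flags the "and then" trap. `ShotPrompt` and `check_prompt` are hypothetical helpers for illustration, not Qtum.ai features; the multi-shot model set simply reflects this guide's statement that Seedance 2.0 and HappyHorse understand "cut to:" transitions.

```python
from dataclasses import dataclass

# Models this guide describes as understanding explicit "cut to:" transitions.
MULTI_SHOT_MODELS = {"Seedance 2.0", "HappyHorse"}


@dataclass
class ShotPrompt:
    """The six parts of a shot prompt, front-loaded in order of importance."""
    subject: str
    action: str
    setting: str
    camera: str
    lighting: str
    style: str

    def render(self) -> str:
        # Subject and action lead because models weight the start of the prompt most.
        return (f"{self.subject} {self.action}, {self.setting}. "
                f"{self.camera}, {self.lighting}. {self.style}.")


def check_prompt(prompt: str, model: str) -> list[str]:
    """Return warnings for prompts that describe an edit rather than a shot."""
    text = prompt.lower()
    warnings = []
    if "and then" in text:
        warnings.append('"and then" describes an edit, not a shot')
    if "cut to:" in text and model not in MULTI_SHOT_MODELS:
        warnings.append(f'{model} is not listed here as handling "cut to:" transitions')
    return warnings


if __name__ == "__main__":
    fisherman = ShotPrompt(
        subject="A weathered fisherman in a yellow oilskin",
        action="hauls a net over the gunwale, spray flying",
        setting="on a small boat in rough grey-green seas",
        camera="Handheld 35mm, tracking close on his hands",
        lighting="overcast diffuse light, cold and even",
        style="Gritty documentary realism, muted color grade",
    )
    prompt = fisherman.render()
    print(prompt)
    print(check_prompt(prompt, "Seedance 1.5"))
```

Filled with the fisherman example's parts, `render()` reproduces that prompt word for word, and `check_prompt` returns an empty list for it. The check is a plain substring test, so it is a nudge rather than a guarantee.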

Speak lens. Speak light.

These two vocabularies do more work per word than anything else you can type. You don't need film school — you need about a dozen terms:

📷 Camera & lens terms

wide-angle / 16mm
Big environments, dramatic perspective, landscapes and interiors.
35mm
The documentary look — natural perspective, what your eye expects.
85mm, shallow depth of field
Portrait compression, creamy blurred background, subject isolation.
macro
Extreme close-up — texture, droplets, mechanical detail.
aerial drone shot
High and moving. Pair with "slowly orbiting" or "flying over."
dolly-in / dolly-out
Smooth push toward or pull away from the subject.
tracking shot
Camera travels alongside a moving subject.
handheld
Subtle shake, documentary energy. Omit it and you get tripod-smooth.
slow pan / static shot
Calm, controlled. "Static shot" is underrated — it stops the model inventing camera moves you didn't ask for.

💡 Lighting terms

golden hour
Warm, low-angle sun, long shadows. The most flattering light there is.
blue hour
Just after sunset — cool, moody, city lights starting to glow.
overcast / diffuse
Soft, even, shadowless. Great for faces and product shots.
rim lighting / backlit
Bright edge around the subject, separates them from the background.
volumetric light / god rays
Visible beams through mist, dust or windows. Instant atmosphere.
neon / practical lights
Light from sources inside the scene — signs, lamps, screens. Pairs beautifully with rain.
candlelight / tungsten
Warm, orange, intimate, flickering.
high-key / low-key
Bright and airy versus dark and dramatic — sets the whole emotional register in one word.

Description tips that consistently pay off

A 15-second makeover

✕ Before (describes a thing): A cool dog running on the beach at sunset, very beautiful, high quality, 4K, epic.
✓ After (describes a shot): A golden retriever sprints along the wet sand at the water's edge, ears flying, kicking up spray. Low tracking shot moving alongside, 85mm with shallow depth of field, golden hour backlight, warm rim light on the fur. Joyful, cinematic, slow motion.

Same dog, same beach. The second prompt tells the model where the camera is, what the light is doing, and how time flows. Quality keywords like "4K" and "epic" do almost nothing — lens and light do almost everything.
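A quick self-check before you hit generate: does the prompt name a lens or camera move, does it name a light, and does it lean on filler? The sketch below answers those three questions with simple word matching against a subset of the glossary terms above. The term lists and the `lint_prompt` name are our own illustrative choices, and a keyword match is only a rough proxy for a well-directed shot.

```python
import re

# Subsets of the glossary above; extend them with your own vocabulary.
CAMERA_TERMS = ["wide-angle", "16mm", "35mm", "85mm", "shallow depth of field", "macro",
                "aerial", "drone", "dolly", "tracking", "handheld", "slow pan", "static shot"]
LIGHTING_TERMS = ["golden hour", "blue hour", "overcast", "diffuse", "rim light", "backlit",
                  "backlight", "volumetric", "god rays", "neon", "practical light",
                  "candlelight", "tungsten", "high-key", "low-key"]
# Quality keywords that, per the makeover above, do almost nothing.
FILLER_TERMS = ["4k", "epic", "high quality", "very beautiful"]


def _matches(text: str, terms: list[str], whole_word: bool) -> list[str]:
    hits = []
    for term in terms:
        # Every term must start on a word boundary; filler terms must also end on one,
        # so "epic" does not match "epicenter". Camera and lighting terms may run on,
        # so "dolly" matches "dolly-in" and "rim light" matches "rim lighting".
        pattern = r"\b" + re.escape(term) + (r"\b" if whole_word else "")
        if re.search(pattern, text):
            hits.append(term)
    return hits


def lint_prompt(prompt: str) -> dict[str, list[str]]:
    """Report which camera, lighting and filler terms a prompt uses."""
    text = prompt.lower()
    return {
        "camera": _matches(text, CAMERA_TERMS, whole_word=False),
        "lighting": _matches(text, LIGHTING_TERMS, whole_word=False),
        "filler": _matches(text, FILLER_TERMS, whole_word=True),
    }


if __name__ == "__main__":
    print(lint_prompt("A cool dog running on the beach at sunset, very beautiful, "
                      "high quality, 4K, epic."))
```

On the makeover pair above, the "before" prompt matches no camera or lighting terms and four filler terms. The "after" prompt matches 85mm, shallow depth of field and tracking for camera, golden hour, rim light and backlight for lighting, and no filler at all.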

Example prompts — rendered on Qtum.ai

Four prompts, and every clip below was rendered as-is on Qtum.ai with default settings: no cherry-picking. The first shows a single model; the next three run the same prompt through two models side by side, so you can see exactly what you're paying for as you move up the lineup. The exact prompt sits above each set, so copy it, remix it and re-render it yourself.

1 · The fast draft — Seedance 1.0

A single subject, tight framing, a few seconds. Exactly the kind of shot you iterate on cheaply before spending flagship tokens.

prompt — seedance-1.0
Macro close-up of an espresso pouring into a glass cup, rich crema swirling and folding. Warm café bokeh in the background, soft morning window light, shallow depth of field. Slow motion, cozy and inviting.
Seedance 1.0 · the draft
Quick and cheap — coherent motion and crema detail with a fraction of the wait and cost of the flagship.

2 · The everyday upgrade — Seedance 1.0 vs 1.5

More demanding: a moving subject, a moving camera, practical lighting, and a specific color grade. The same prompt on both models shows what the mid-generation jump to 1.5 actually buys you — tighter prompt adherence and steadier reflections.

prompt — rainy tokyo cyclist
A cyclist in a yellow raincoat rides through a rainy Tokyo street at night, neon signs reflecting in the puddles. Tracking shot moving alongside, 35mm lens, light rain falling through the glow of the signs. Teal-and-magenta cinematic color grade, moody and atmospheric.
Seedance 1.0 · the draft
Seedance 1.5 · the upgrade

Same prompt, both models. Watch the puddle reflections and the rain holding together under the tracking move on 1.5, exactly the kind of direction 1.0 starts to lose. At roughly 90 tokens per typical render against about 40 for 1.0, 1.5 is the sensible everyday default.

3 · The money shot — Seedance 1.0 vs 2.0

Everything that breaks lesser models, in one prompt: water physics, an orbiting aerial camera, volumetric light and a film look. Run it on the entry model and the flagship back to back, and the gap is the clearest argument for when 2.0's tokens are worth it.

prompt — lighthouse orbit
Aerial drone shot slowly orbiting a white lighthouse on a rugged cliff at golden hour. Waves crash against the rocks below, sea mist drifting, volumetric god rays breaking through the spray. Anamorphic lens flare, epic cinematic mood, 24fps film look, rich warm color grade.
Seedance 1.0 · the draft
Seedance 2.0 · the flagship

The crashing water and the continuous orbit are exactly what separates a flagship from the rest. 1.0 gets you the idea; 2.0 holds the physics and the camera move together — this is what the flagship tokens are for.

4 · The long multi-shot — HappyHorse vs Seedance 2.0

A 15-second, three-shot micro-story using explicit "cut to:" transitions. Both models handle multi-shot continuity, so the comparison is about price and personality: HappyHorse delivers the length for less, while Seedance 2.0 spends more for tighter detail.

prompt — street-food market
A street-food vendor flips a sizzling pancake on a griddle, steam rising into warm tungsten market lights. Cut to: hands drizzling sauce and folding the pancake into paper. Cut to: the pancake handed across the counter to a waiting customer, market bustle blurred behind. Handheld 35mm documentary style, shallow depth of field, warm and lively.
HappyHorse · the value pick
Seedance 2.0 · the flagship

Three shots, fifteen seconds, a single render on each. HappyHorse holds the motion and the cuts at a meaningfully lower token cost; Seedance 2.0 keeps faces and fine textures crisper across the shots. Pick by what the clip needs — length and budget, or detail.
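Because the multi-shot format is just shot descriptions joined by "Cut to:" with shared style notes at the end, it is easy to assemble from a shot list. The `multi_shot_prompt` helper below is a hypothetical sketch, not a Qtum.ai API. It rebuilds the street-food prompt above and refuses to join several shots for a model this guide does not describe as multi-shot.

```python
# Models this guide describes as understanding explicit "cut to:" transitions.
MULTI_SHOT_MODELS = {"Seedance 2.0", "HappyHorse"}


def multi_shot_prompt(shots: list[str], style: str, model: str) -> str:
    """Join shot descriptions with explicit "Cut to:" transitions, style notes last."""
    if not shots:
        raise ValueError("need at least one shot")
    if len(shots) > 1 and model not in MULTI_SHOT_MODELS:
        raise ValueError(f'{model} is not described as handling "cut to:" transitions')
    return " Cut to: ".join(shots) + " " + style


if __name__ == "__main__":
    print(multi_shot_prompt(
        [
            "A street-food vendor flips a sizzling pancake on a griddle, "
            "steam rising into warm tungsten market lights.",
            "hands drizzling sauce and folding the pancake into paper.",
            "the pancake handed across the counter to a waiting customer, "
            "market bustle blurred behind.",
        ],
        "Handheld 35mm documentary style, shallow depth of field, warm and lively.",
        "HappyHorse",
    ))
```

Keeping shots and style separate makes it cheap to reorder shots or swap the look without retyping the whole prompt; the style notes still land at the end, after the subjects and actions that the model weights most.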

Tip: the cheapest way to learn a model's personality is to render the same prompt on two models and watch where they diverge. With 500 free tokens at signup, your first few comparisons cost nothing.
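To put numbers on that tip, the sketch below totals the token cost of each two-model comparison on this page using the typical per-clip figures from the at-a-glance table (about 40, 90, 220 and 150 tokens). Those figures are approximate, and real quotes vary with clip length and resolution, so treat the output as a rough budget rather than a price list. The function names are illustrative.

```python
# Typical default-render costs from the at-a-glance table (approximate).
TOKENS_PER_CLIP = {"Seedance 1.0": 40, "Seedance 1.5": 90, "Seedance 2.0": 220, "HappyHorse": 150}
FREE_TOKENS = 500  # signup grant

# The three side-by-side comparisons rendered above, in page order.
COMPARISONS = [
    ("Seedance 1.0", "Seedance 1.5"),
    ("Seedance 1.0", "Seedance 2.0"),
    ("HappyHorse", "Seedance 2.0"),
]


def comparison_cost(a: str, b: str) -> int:
    """Tokens for rendering the same prompt once on each of two models."""
    return TOKENS_PER_CLIP[a] + TOKENS_PER_CLIP[b]


def affordable_in_order(pairs: list[tuple[str, str]], budget: int) -> int:
    """How many comparisons, taken in order, fit in the budget before it runs out."""
    done = 0
    for a, b in pairs:
        cost = comparison_cost(a, b)
        if cost > budget:
            break
        budget -= cost
        done += 1
    return done


if __name__ == "__main__":
    for a, b in COMPARISONS:
        print(f"{a} vs {b}: ~{comparison_cost(a, b)} tokens")
    print("comparisons covered by the grant:", affordable_in_order(COMPARISONS, FREE_TOKENS))
```

At these figures a 1.0-versus-1.5 comparison costs about 130 tokens, so the grant covers three of them. The first two comparisons on this page together use about 390 tokens, which leaves too little for the HappyHorse-versus-2.0 pair (about 370).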

Paying with QTUM — how billing works

Qtum.ai deliberately skips the subscription playbook. There's no monthly plan, no seat license, no token packs that expire at the end of the quarter. The model is simpler:

  1. Tokens are the unit of work. Every render quotes its token cost up front — based on model, clip length and resolution — and deducts exactly that.
  2. QTUM is the primary way to buy tokens. Top up your balance straight from your wallet — MetaMask Snap makes this a few clicks — and the tokens land in your account. No card on file, no fiat checkout, no billing relationship to manage.
  3. You start with 500 free tokens. Enough to render real clips on real models, including the flagship, before you've spent anything at all.
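The quote-then-deduct model described in the list above behaves like a simple prepaid ledger. Here is a minimal sketch of that behavior under stated assumptions: `TokenAccount` is a hypothetical class, not Qtum.ai's implementation; the quote is passed in because the guide says only that it depends on model, clip length and resolution; and top-ups are counted in tokens because the QTUM-to-token rate isn't given here.

```python
from dataclasses import dataclass


class InsufficientTokens(Exception):
    """Raised when a quoted render costs more than the current balance."""


@dataclass
class TokenAccount:
    balance: int = 500  # the signup grant

    def top_up(self, tokens: int) -> int:
        """Credit tokens bought with QTUM (the conversion rate is out of scope here)."""
        if tokens <= 0:
            raise ValueError("top-up must be positive")
        self.balance += tokens
        return self.balance

    def render(self, quoted_tokens: int) -> int:
        """Deduct exactly the quoted cost, or refuse before anything is rendered."""
        if quoted_tokens <= 0:
            raise ValueError("quote must be positive")
        if quoted_tokens > self.balance:
            raise InsufficientTokens(
                f"quote {quoted_tokens} exceeds balance {self.balance}")
        self.balance -= quoted_tokens
        return self.balance


if __name__ == "__main__":
    account = TokenAccount()
    print(account.render(220))  # a typical Seedance 2.0 quote
    print(account.render(220))
    try:
        account.render(220)
    except InsufficientTokens as err:
        print("top up first:", err)
```

Starting from the 500-token grant, two typical flagship quotes of about 220 leave 60 tokens, and in this sketch a third quote is refused outright rather than partially charged. Nothing in the ledger resets on a schedule, which is the whole point of pay-as-you-go.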

Pay-as-you-go does something subtle to how you work: because there's no monthly quota to "use up," there's no pressure to over-render — and because flagship renders visibly cost more than drafts, the draft-cheap / finish-expensive workflow from this guide isn't just good practice, it's the economically obvious one.
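The economics are easy to check with the table's typical figures. Taking the "ten variations before committing" workflow from the Seedance 1.0 card, the sketch below compares drafting on 1.0 and finishing once on 2.0 against rendering every iteration on the flagship. The per-clip figures are approximate defaults, and 1.0 tops out at 5-second 720p clips while 2.0 renders up to 15 seconds at 1080p, so this compares workflows, not identical outputs.

```python
# Typical default-render costs from the at-a-glance table (approximate).
TOKENS_PER_CLIP = {"Seedance 1.0": 40, "Seedance 1.5": 90, "Seedance 2.0": 220, "HappyHorse": 150}


def workflow_cost(drafts: int, draft_model: str, final_model: str) -> int:
    """Tokens for `drafts` iterations on one model plus a single final render."""
    return drafts * TOKENS_PER_CLIP[draft_model] + TOKENS_PER_CLIP[final_model]


if __name__ == "__main__":
    cheap_then_flagship = workflow_cost(10, "Seedance 1.0", "Seedance 2.0")
    flagship_throughout = workflow_cost(10, "Seedance 2.0", "Seedance 2.0")
    print(cheap_then_flagship, flagship_throughout)
```

At those figures, ten drafts on 1.0 plus one flagship render come to about 620 tokens, against about 2,420 for doing all eleven renders on 2.0.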

Privacy is part of the deal: no credit card also means no card-linked identity. Sign in with MetaMask Snap, pay in QTUM, and the service knows your wallet and your prompts — nothing else. No tracking, no ad retargeting, and your renders are never used to train the models.

Qtum: the network behind the prompt box

Qtum.ai isn't a standalone startup renting GPUs — it's the media-generation layer of a network that's been running production traffic since 2017.

A blockchain with seven years of uninterrupted operation

Qtum's underlying blockchain launched in 2017 and hasn't had downtime since. The network combines Bitcoin's UTXO security model with EVM-compatible smart contracts, runs on Proof-of-Stake consensus, and has shipped 50 core updates since launch without breaking the chain. That operational record is what backs the token billing your renders run on.

7+ yrs · Uninterrupted operation
50 · Core updates shipped
PoS · Bitcoin UTXO + EVM smart contracts

The Qtum ecosystem

  • Qtum blockchain — Bitcoin-secured UTXO with Ethereum-compatible smart contracts, since 2017.
  • Qtum.ai — the AI video generation covered here: four models, pay-as-you-go in QTUM.
  • Qtum AI Router — a unified, OpenAI- and Anthropic-compatible inference layer across multiple LLM providers, billed in QTUM credits.
  • Qtum Ally — a desktop AI agent that integrates multiple LLMs out of the box.
  • GPU infrastructure — the compute layer powering inference and video generation workloads across the network.

The takeaway: your QTUM balance is one account across the whole stack — render video today, route LLM traffic tomorrow, same asset, same wallet.

Render your first clip

Pick a prompt off this page — they're written to be stolen — choose a model, and watch it come back as footage. 500 free tokens on signup, no credit card, no subscription, pay as you go in QTUM.

Start generating at qtum.ai