Veo, Nova, Ray, Runway, Pika, Sora, Kling. Is this a new phonetic alphabet? No, these are all recently released AI Video Generation Models. And this does not even include the ecosystem of open source options!
Video models have come a long way in the past few months. We are rapidly approaching the point we have already reached with image generation: the line between the real and the generated is blurring.
Is this a real or fake image?
Answer: Fake (note the missing shoulder strap for the white undershirt)
But flaws persist in even the largest, most heavily trained state of the art models:
Consistency is still lacking. We are knee deep in slot machine territory.
Prompt adherence has improved, but we are nowhere near the best image models such as Flux.
As with prior-generation image generators, text generally turns into a garbled mess after a few seconds.
These flaws can be greatly reduced using a number of techniques, which is a major focus of my work these days. But the real question that AI skeptics (a group that should include everyone to some extent) always like to ask: what do you actually do with these models?
Writing bespoke text prompts to generate somewhat random shots certainly has novelty, but is there more? AI films are now a thing, but as this AI film festival review notes, we are not yet at the point where most people consider the output to be inherently entertaining.
The real value is not in the models themselves but in anchoring the models to external assets and building applications that offer finer control. As the models continue to improve, these applications will only increase in effectiveness. Some examples include:
Image-to-video with intelligent prompting is a good start, while advanced frame interpolation is even better.
Use a ControlNet to map the movement or layout of an existing video to a new video.
Map an audio file to an image or video to generate lip-synced video.
Strictly under the novelty category, here is me speaking in Mandarin well beyond my ability thanks to AI:
Now that deepfakes are a near-solved problem, what’s in store next for the disinformation age? More hoverboards is the only certainty.