AI Video: Friend or Foe?
Imagine a realm where your boldest cinematic visions spring to life with mere words. In the past, only accomplished sorcerers like Merlin could perform such feats, but now anyone can summon a moving image with a single keystroke. This is the alluring promise of AI video generation technology, rapidly becoming a reality that defies conventions and raises its middle index to filmmakers across the globe.
Gen-3 Alpha is able to generate highly realistic and expressive human characters with a wide range of actions and emotions.
— Runway (@runwayml) June 19, 2024
Learn more at https://t.co/YQNE3eqoWf pic.twitter.com/LzfblSmKL5
Leading this disruptive wave are models like OpenAI’s Sora, Runway, Kling AI, and Pika – trailblazers in a field poised to redefine (and perhaps democratize) filmmaking. Their ability to swiftly produce high-quality visual content has sparked a renaissance in the world of visual storytelling, enabling creators to explore new creative possibilities and push the boundaries of traditional filmmaking. This AI-driven revolution may not only streamline the production process but also inspire a new era of storytelling that blurs the lines between human creativity and artificial intelligence.
The implications are mind-bending. For filmmakers, AI video generation offers an uncharted canvas for creative expression, bounded only by the depths of their imagination. The era of meticulously crafting every frame, every movement, every visual element is fading. With AI, the pre-visualization and concept development phases are quickly transforming, allowing directors and writers to rapidly iterate and experiment with their ideas. In the short time since these models hit the internet, we have seen some clever and creative outputs. The resulting material has sparked not only applause but also outrage from the creative craftspeople who see this as a major threat to their livelihood. We are inclined to agree, although the true impact remains to be seen.
But the true power of AI video generation extends far beyond mere convenience (and human replacement). It has the potential to unlock entirely new realms of storytelling, where the lines between reality and fantasy blur.
How high can AI SORA?
Take, for instance, OpenAI’s Sora, a model that can generate 60-second video clips of remarkable quality and realism. With Sora, filmmakers can summon intricate scenes, complete with lifelike human movements, detailed environments, and complex objects – all from a textual description. OpenAI is all in on its technology and is spending some of its billions in valuation wooing Hollywood’s top decision makers to its mission of world dominance. The company is trying, unsuccessfully, to walk the line of “we got your back, creative community” while it uses the creative community’s work to train its models. Not everyone is happy about it. ScarJo?
Sora’s possibilities are staggering and are being tested in partnership with some filmmakers and production enthusiasts. What we have seen to date is truly impressive. However, it is also misleading, as the creatives behind many of the demos and short films have revealed that an immense amount of post-production was necessary to achieve professional results.
Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” pic.twitter.com/0JzpwPUGPB
— OpenAI (@OpenAI) February 15, 2024
You can’t soar without a Runway
When talking text-to-video, we have to talk about Runway, a pioneering platform that just unveiled a groundbreaking advancement that could “redefine the boundaries of what’s possible.” Gen-3 Alpha is Runway’s latest offering, and it marks a significant leap forward in the quest for AI systems that can construct comprehensive simulations of entire environments.
Runway’s long-term vision aligns with the pursuit of artificial general intelligence (AGI) – the holy grail of AI development. Much like OpenAI with its audacious goal, Runway aspires to create what it terms “general world models” – AI systems capable of building intricate internal representations of environments and simulating events within those virtual realms.
Gen-3 Alpha represents Runway’s most significant stride yet towards this ambitious target. According to the company, this cutting-edge model will power the entirety of Runway’s image- and text-to-video tools, as well as its innovative Motion Brush feature and other groundbreaking capabilities, including text-to-image generation.
The implications of such a powerful AI video generation system are vast and far-reaching. Imagine filmmakers and content creators with the ability to conjure fully-realized worlds from mere textual prompts or conceptual sketches. Entire narratives, complete with richly detailed settings, characters, and events, could be brought to life with unprecedented ease and fidelity.
Moreover, Gen-3 Alpha’s potential extends well beyond the entertainment industry. Fields ranging from architecture and urban planning to scientific visualization and simulation could benefit profoundly from the ability to create highly accurate and immersive virtual environments on demand.
Runway Gen-3 Alpha Prompt: An extreme close-up shot of an ant emerging from its nest. The camera pulls back revealing a neighborhood beyond the hill.
Kling (Kong), the rise of the Beast you never saw coming.
And then there’s Kling AI, a newcomer that uses a 3D space-time attention system to model motion and physical interactions accurately. Like OpenAI’s Sora, it uses diffusion transformers to generate coherent narratives. Kling AI has upped the game in that it can generate content up to two minutes long at higher resolutions (1080p) than Sora. Imagine being able to craft entire short films, complete with intricate plotlines and character arcs, all from the comfort of your keyboard. It isn’t something we are truly excited about here. Kling AI is currently available as a public demo in China.
But perhaps the most intriguing aspect of AI video generation is its potential to democratize filmmaking. By automating tasks and reducing production costs, this technology could open doors for independent creators and smaller studios to bring their visions to life without massive budgets or resources.
Take a PIKA at the future
Pika, founded in 2023 by two Stanford Ph.D. students who felt making videos was too hard, is another early entry into the AI video generation game. Pika focuses on creating short video clips (up to 4 seconds) from text prompts. (Listen, Pika co-founders Demi Guo and Chenlin Meng, as filmmakers we know video is hard, but that doesn’t mean we don’t love it. We are here for you.)
Pika’s models are known for their ability to generate realistic human faces and expressions, which could be useful for character development and dialogue scenes in filmmaking. Recently they integrated a Lip Sync function and sound effect generation for your creations. Their recent upgrade also allows users to input cinematic camera motion instructions. While our tests have had mixed results, the upgrades are hard to ignore.
Are we Dreaming?
Dream Machine is another AI model that makes high-quality, realistic videos quickly from text and images. Its makers, Luma Labs, boast not only of its quality but also of its speed, claiming it can create 120 frames of video in 120 seconds. That is not surprising, since one of their partners is NVIDIA.
Dream Machine is a highly scalable and efficient transformer model trained directly on videos, making it capable of generating physically accurate, consistent and eventful shots.
The Negative Impact on Filmmaking
Despite its many promises, the rise of AI video generation isn’t without drawbacks, criticism and worry. One significant concern is the potential erosion of traditional filmmaking skills. Crafting a film involves a deep understanding of cinematography, direction, acting, and editing. These skills, honed over years of practice, risk being overshadowed by the ease and speed of AI-generated content.
Moreover, the use of AI in video creation raises questions about originality and artistic integrity. If a machine can generate a video based on existing data and patterns, what happens to the human touch that infuses art with emotion and authenticity? The danger is that AI-generated films might prioritize technical perfection over the nuanced, imperfect beauty that characterizes human creativity.
There’s also the issue of homogenization. AI models, trained on vast datasets, might produce outputs that reflect prevailing trends and biases, leading to a lack of diversity in visual styles and narratives. This could result in a cinematic landscape dominated by algorithmic predictability, stifling the unique voices and unconventional stories that push the medium forward. While companies like Adobe have been pushing for better representation in their machine-generated outputs, the effort hasn’t been without missteps. Google’s AI is notoriously bad at this.
Ethical and legal concerns cannot be overlooked. The use of AI-generated likenesses of people, living or deceased, without consent raises significant privacy issues. Furthermore, the potential for deepfakes and misleading content poses a threat to the integrity of visual media, complicating the already fraught landscape of digital ethics and copyright law. As we march toward a pivotal election here in the US, I fear the damage that deepfakes and AI-generated falsehoods could cause.
Lastly, and a big concern among many of us here, the reliance on AI tools has led to a devaluation of human labor in the film industry. Jobs in areas like visual effects, animation, and even scriptwriting might be at risk, leading to economic ramifications for those whose livelihoods depend on these skills. It is one thing to use a tool to augment your creativity; it is another to use it to replace that creativity.
(And, in fairness, as we have argued numerous times, the advancement of technology will always cause a shift and a change in how art is made. AI is to cinema techniques today what photography was to fine-art painters and illustrators in the late 1800s.)
Navigating the Future
As with any transformative technology, the key lies in finding a balance. AI video generation can be a powerful tool for filmmakers, but it should complement, not replace, human creativity. Embracing AI’s potential while remaining vigilant about its limitations and impacts will be crucial.
The promise of AI video generation is undeniable. It represents a seismic shift in visual storytelling, potentially ushering in a new era of cinematic creativity and innovation. (And it isn’t just video; it is also music, voice, and sound effects.) As we stand overlooking the cliff of this technological revolution, the future of filmmaking has never been more exciting. Yet it’s essential to navigate this future with a clear-eyed understanding of both the opportunities and challenges that lie ahead. The boundaries of what is possible are being rewritten, and the only limit is the depth of our imagination, balanced by our commitment to preserving the soul of cinema.