Meta released Meta Movie Gen, generative AI research for media that spans multiple media types: image, video, and audio. Meta shows that you can generate customized videos and sound from simple text inputs, edit existing videos, and turn a photo into a video. In human evaluations, Movie Gen outperforms comparable models on these tasks.
The first wave of this generative AI work was the Make-A-Scene series of models, where we created mechanisms for generating images, audio, video, and 3D animation. Diffusion models brought a second wave with the Llama Image foundation models, which enabled higher-quality generation of images and videos as well as image editing.
Our third wave is Movie Gen, which combines all of these modalities and enables finer-grained control for the people who use the models, at a scale that was previously, frankly, unimaginable.
Video generation
Meta Movie Gen lets you take advantage of a single model that handles both text-to-image and text-to-video, generating good-quality, clear, high-definition images and videos. This 30B-parameter transformer model can generate videos of up to 16 seconds at 16 frames per second.
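Those two figures fix the model's frame budget per clip. A quick sanity check of the arithmetic (the helper below is purely illustrative and not part of any released Movie Gen API):

```python
# Illustrative helper: frame budget for a clip, using the quoted
# figures (up to 16 seconds at 16 frames per second).
# Not part of any released Movie Gen API.

def frame_count(duration_s: float, fps: int = 16) -> int:
    """Number of frames the model must generate for a clip."""
    return int(duration_s * fps)

print(frame_count(16))  # maximum-length clip -> 256 frames
```

So a maximum-length clip corresponds to 256 generated frames.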
Personalized video generation
Meta also extended the above foundation model to support personalized video generation. Given a person's image and a text prompt, the model produces a video that features the reference person while rendering the rich visual details described in the text. The model can generate personalized videos that preserve the person's face and motion characteristics, and Meta believes this method sets a new benchmark in the area.
Accurate editing
The editing variant of the same foundation model takes both a video and a text prompt, performing edits with precision to produce the desired result. Meta Movie Gen combines video generation with advanced image editing: local operations, where objects can be completely removed, added, or replaced, and global operations, such as changing a video's backdrop or style.
Audio generation model
Meta developed a 13B-parameter audio generation model that turns a video and optional text prompts into realistic, high-fidelity audio of up to 45 seconds, including ambient sounds, sound effects (Foley), and instrumental background music, all synchronized with the content of the video.
Meta Movie Gen also introduces an audio extension technique that can create coherent audio for videos of essentially any length, achieving a new state of the art in audio quality, video-to-audio alignment, and text-to-audio alignment.
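The text above doesn't spell out how the extension works, but one common way to cover an arbitrary-length video with a fixed-window audio model is to slide overlapping windows across it and condition each chunk on the overlap with the previous one. A minimal sketch of that windowing, where the 45-second window matches the model's maximum clip length but the 5-second overlap and all function names are assumptions, not Meta's published method:

```python
# Sketch of sliding-window coverage for long videos.
# window_s = 45 matches the quoted maximum audio length;
# overlap_s = 5 is an assumed value for cross-window conditioning.

def audio_windows(video_len_s: float, window_s: float = 45.0,
                  overlap_s: float = 5.0) -> list[tuple[float, float]]:
    """Return (start, end) spans that cover the whole video."""
    if video_len_s <= window_s:
        return [(0.0, video_len_s)]
    spans = []
    step = window_s - overlap_s
    start = 0.0
    while start + window_s < video_len_s:
        spans.append((start, start + window_s))
        start += step
    # Final window is flushed to the end so nothing is left uncovered.
    spans.append((video_len_s - window_s, video_len_s))
    return spans

print(audio_windows(120))  # three overlapping 45 s windows
```

Each window after the first would be generated conditioned on the audio already produced for the overlapping region, which is what keeps ambient sound and music coherent across chunk boundaries.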
Meta Movie Gen Vision
Meta says it wants to take a collaborative approach, making sure it builds tools that allow people to express their creativity in ways they might never have thought possible. When creators own their perspective and have this kind of control, the entire world is at their fingertips.