Still image from an AI-generated video of a teddy bear painting a portrait.

Make-A-Video is an artificial intelligence video generator that can create novel video content from a text or image prompt, similar to existing image synthesis tools. It is not yet available for public use.

On Make-A-Video's announcement page, Meta shows example videos generated from text prompts, including a young couple walking in heavy rain and a teddy bear painting a portrait. The page also showcases Make-A-Video's ability to take a static source image and animate it: a still photo of a sea turtle, once processed, appears to be swimming.

The key technology behind Make-A-Video is that it builds on existing work in text-to-image synthesis used by image generators, such as Meta's own previously announced Make-A-Scene model.

Instead of training the Make-A-Video model on labeled video data, Meta combined image synthesis data with unlabeled video training data. From a given image, the model can then predict what happens next and show the scene in motion for a short time.
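One way to picture that split between captioned stills and unlabeled clips is a toy training sketch. The PyTorch code below is purely illustrative; the model, shapes, and losses are invented here and do not reflect Make-A-Video's actual architecture. A supervised text-to-image objective runs on captioned images, while a self-supervised next-frame objective runs on unlabeled video, and the two losses are simply summed.

    import torch
    import torch.nn as nn

    class ToyVideoModel(nn.Module):
        # A deliberately tiny stand-in model: every name and shape here is
        # invented for illustration, not taken from Meta's paper.
        def __init__(self, embed_dim: int = 32, pixels: int = 3 * 8 * 8):
            super().__init__()
            self.encode_text = nn.Linear(16, embed_dim)       # stand-in text encoder
            self.decode_frame = nn.Linear(embed_dim, pixels)  # stand-in image decoder
            self.predict_next = nn.Linear(pixels, pixels)     # stand-in temporal predictor

        def image_loss(self, caption_vec, still):
            # Supervised text-to-image objective on labeled (captioned) still images.
            recon = self.decode_frame(self.encode_text(caption_vec))
            return nn.functional.mse_loss(recon, still.flatten(1))

        def video_loss(self, clip):
            # Self-supervised objective on unlabeled video: predict each next frame.
            prev, nxt = clip[:, :-1], clip[:, 1:]
            return nn.functional.mse_loss(self.predict_next(prev.flatten(2)), nxt.flatten(2))

    model = ToyVideoModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    captions = torch.randn(4, 16)        # dummy caption embeddings
    stills = torch.randn(4, 3, 8, 8)     # dummy captioned 8x8 RGB images
    clips = torch.randn(4, 5, 3, 8, 8)   # dummy 5-frame clips with no captions

    loss = model.image_loss(captions, stills) + model.video_loss(clips)
    loss.backward()
    optimizer.step()

The point of the sketch is only the data split: the image branch needs captions, while the video branch learns motion without any labels.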

  • A video of a teddy bear painting a portrait, created with Meta's Make-A-Video AI model (converted to GIF for display here).
  • A video of "a young couple walking in a heavy rain" created with Make-A-Video.
  • Video of a sea turtle, animated from a still image with Make-A-Video.

In a white paper, Meta writes that it used function-preserving transformations to extend the model's spatial layers to include temporal information, and that new attention modules learn temporal world dynamics from a collection of videos.
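A rough way to picture a "function-preserving" extension, under the assumption (ours, not a detail from Meta's paper) that the new temporal module is zero-initialized: the PyTorch sketch below applies a pretrained 2D convolution to each frame and adds temporal self-attention whose output projection starts at zero, so the extended block initially reproduces the original image model exactly.

    import torch
    import torch.nn as nn

    class TemporalAttention(nn.Module):
        # Self-attention over the time axis, zero-initialized so it is a no-op at first.
        def __init__(self, channels: int, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
            self.proj = nn.Linear(channels, channels)
            nn.init.zeros_(self.proj.weight)   # function-preserving: residual starts at zero
            nn.init.zeros_(self.proj.bias)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, channels, time, height, width)
            b, c, t, h, w = x.shape
            seq = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, t, c)  # attend across frames
            out, _ = self.attn(seq, seq, seq)
            out = self.proj(out).reshape(b, h, w, t, c).permute(0, 4, 3, 1, 2)
            return x + out  # residual connection: identity while proj is zero

    class SpatioTemporalBlock(nn.Module):
        # A pretrained 2D conv applied per frame, followed by new temporal attention.
        def __init__(self, spatial_conv: nn.Conv2d):
            super().__init__()
            self.spatial = spatial_conv                          # reused image-model weights
            self.temporal = TemporalAttention(spatial_conv.out_channels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, t, h, w = x.shape
            frames = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
            frames = self.spatial(frames)
            _, c2, h2, w2 = frames.shape
            video = frames.reshape(b, t, c2, h2, w2).permute(0, 2, 1, 3, 4)
            return self.temporal(video)

    # Example: extend a "pretrained" 2D layer to an 8-frame clip.
    block = SpatioTemporalBlock(nn.Conv2d(3, 16, 3, padding=1))
    clip = torch.randn(1, 3, 8, 64, 64)   # (batch, channels, frames, height, width)
    print(block(clip).shape)               # torch.Size([1, 16, 8, 64, 64])

Because the residual branch contributes nothing at initialization, each frame passes through unchanged by the new module; training the attention weights then lets the block learn motion across frames.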

Meta has not announced how or when Make-A-Video might become available to the public, or who would have access to it. People who want to try it in the future can fill out a sign-up form.

Meta acknowledges that the ability to make videos on demand presents some social dangers. At the bottom of the announcement page, Meta says Make-A-Video output carries a watermark to help ensure viewers know the video was generated with AI and is not captured footage.

Meta's watermark safeguard may prove irrelevant, however, if competitive open source text-to-video models emerge, as history with image synthesis suggests they might.