Researchers are moving on to the next frontier of artificial intelligence: text-to-video generators.
A group of machine learning engineers from Meta has created a new system called Make-A-Video. The model lets users type in a rough description of a scene and generates a short video to match. The videos are clearly artificial, with blurred subjects and distorted animation, but they still represent a significant advance in the field.
The model's output is impressive even though it is clearly artificial.
Meta stated in a post that generative artificial intelligence is pushing creative expression forward by giving people tools to quickly and easily create new content. With just a few words or lines of text, the company said, Make-A-Video can bring imagination to life and create one-of-a-kind videos.
It's much harder to generate video than photos because the system also has to predict how the images will change over time.
The clips are five seconds in length and have no audio. Watching the model's output is the best way to evaluate it. The videos below were created by Make-A-Video and are captioned with the prompts used to generate them. It is worth noting that no one outside Meta currently has access to the model, so the clips could have been selected to show the system in its best light.
While it is clear that these videos are computer-generated, the output of such models will improve quickly. In just a few years, artificial intelligence image generators have gone from making pictures that were borderline incomprehensible to making pictures that were realistic. The prize of seamless video generation will motivate many institutions and companies to pour great resources into the project.
There is a chance for harmful applications.
According to Meta, video generation tools could be useful for creators and artists, but there are worrying prospects as well. The output of these tools could be used for propaganda, misinformation, and pornography that can be used to harass and intimidate women.
Meta says it wants to be thoughtful about how it builds new generative artificial intelligence systems, and it is publishing a paper on the Make-A-Video model. The company says it will release a demo of the system, but it doesn't say when or how access to the model will be limited.
Meta is not the only institution working on text-to-video generators. A group of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence released their own text-to-video model, named CogVideo. Like Meta's work, the sample output from CogVideo is limited in many ways.
Meta's researchers note in a paper that Make-A-Video was trained on pairs of images and captions, along with unlabeled video footage. The training content was drawn from two datasets that together contain millions of videos, including stock video footage.
There are many technical limitations to the model, according to the researchers. It can't learn information that a human might infer from watching a video, such as whether a waving hand is moving left to right or right to left. It also struggles with videos that contain multiple scenes and events. Currently, Make-A-Video outputs 16 frames of video at a resolution of 64 by 64 pixels, which are then boosted in size using a separate artificial intelligence model.
The Make-A-Video model has learned and likely exaggerated social biases, including harmful ones, according to the Meta team. In text-to-image models, such biases often reinforce social prejudices. Ask a model to create an image of a terrorist, for example, and it will probably depict someone wearing a turban. Without open access to Meta's model, however, it is not possible to say what biases it has learned.
Meta says it will continue to use its responsible artificial intelligence framework to refine and evolve its approach to this emerging technology.