Just days after Meta shared a new model that turns text into short, soundless videos, Google has announced two text-to-video systems of its own: one focuses on image quality, while the other focuses on creating longer clips.
Imagen Video is the high-quality model, and every clip it produces is generated entirely by artificial intelligence.
The end results are both amazing and unnerving. The most convincing samples are those that mimic animation, like the green sprout that forms the word "Imagen" or the wooden figurine that surfs in space. These are genres where we don't expect footage to follow strict rules, so the model's weaknesses come across as stylistic looseness rather than error.
The least convincing clips are those that show real people and animals in motion, like the cat jumping onto a couch. Here, where we have a clear idea of how bodies and limbs should move, the footage looks far more distorted. Still, all of the videos are impressive, with each clip generated from nothing more than the text prompt in its caption.
See for yourself.
The Imagen Video model initially outputs 16 frames of 3fps footage at 24 x 48 pixel resolution. This low-res content is then run through a cascade of super-resolution models, which boost the output to 128 frames of 24fps footage at 1280 x 768 resolution. That is higher-resolution output than Meta's Make-A-Video model.
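For those who think in code, the cascade is easy to picture. Below is a minimal sketch of that pipeline; to be clear, the stage functions are crude placeholders of our own (noise, frame repetition, nearest-neighbour upscaling), not Google's models, and only the frame counts and resolutions are taken from the paper.

```python
import numpy as np

def base_model(prompt: str) -> np.ndarray:
    """Stand-in for the base text-to-video model: 16 frames of
    24 x 48 pixel footage at 3fps. (Noise, seeded by the prompt.)"""
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return rng.integers(0, 256, size=(16, 24, 48, 3), dtype=np.uint8)

def temporal_sr(video: np.ndarray, factor: int) -> np.ndarray:
    """Stand-in temporal super-resolution: repeat each frame."""
    return np.repeat(video, factor, axis=0)

def spatial_sr(video: np.ndarray, factor: int) -> np.ndarray:
    """Stand-in spatial super-resolution: nearest-neighbour upscale."""
    return np.repeat(np.repeat(video, factor, axis=1), factor, axis=2)

def generate_video(prompt: str) -> np.ndarray:
    video = base_model(prompt)             # (16, 24, 48, 3) at 3fps
    video = temporal_sr(video, factor=8)   # (128, 24, 48, 3) at 24fps
    video = spatial_sr(video, factor=4)    # (128, 96, 192, 3)
    # The real cascade interleaves several more spatial and temporal
    # stages, ending at 1280 x 768.
    return video

clip = generate_video("a wooden figurine surfing in space")
print(clip.shape)  # (128, 96, 192, 3)
```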
As we discussed when Meta's system debuted, the coming advent of text-to-video AI brings with it all sorts of challenges, from the racial and gender biases embedded in these systems to their potential for misuse.
Google's researchers don't gloss over these issues in their research paper. They write that video generative models can be used to amplify and enhance human creativity, but also that they may be misused to create fake, harmful, or explicit content, and that despite the team's experiments with filters, several important safety and ethical challenges remain. Quite.
Unsurprisingly, then, Google is mitigating these risks for now by not releasing the model to the public; Meta's Make-A-Video is similarly restricted. But as happened with text-to-image systems, these models will likely be replicated and imitated by third-party researchers before being disseminated as open-source models, and the ethical and safety challenges will arrive with them.
In addition to Imagen Video, Google also published details of another text-to-video model, Phenaki, built by a separate team of researchers. Phenaki focuses on creating longer videos that follow the instructions of a detailed prompt.
Take a prompt like this:
Lots of traffic in futuristic city. An alien spaceship arrives to the futuristic city. The camera gets inside the alien spaceship. The camera moves forward until showing an astronaut in the blue room. The astronaut is typing in the keyboard. The camera moves away from the astronaut. The astronaut leaves the keyboard and walks to the left. The astronaut leaves the keyboard and walks away. The camera moves beyond the astronaut and looks at the screen. The screen behind the astronaut displays fish swimming in the sea. Crash zoom into the blue fish. We follow the blue fish as it swims in the dark ocean. The camera points up to the sky through the water. The ocean and the coastline of a futuristic city. Crash zoom towards a futuristic skyscraper. The camera zooms into one of the many windows. We are in an office room with empty desks. A lion runs on top of the office desks. The camera zooms into the lion’s face, inside the office. Zoom out to the lion wearing a dark suit in an office room. The lion wearing looks at the camera and smiles. The camera zooms out slowly to the skyscraper exterior. Timelapse of sunset in the modern city.
And Phenaki generates a video like this from it.
The video quality is not as good as Imagen Video's, but the way it strings together a long series of scenes and settings is impressive. There are more examples on the project's homepage.
In a paper describing the model, the researchers say their method can generate videos of arbitrary length, and that future versions will provide new and exciting ways to express creativity. While the videos Phenaki generates are not yet indistinguishable from real footage, they note, reaching that bar for a specific set of samples is within the realm of possibility even today, and using Phenaki to generate videos of someone without their consent could be very harmful.
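The arbitrary-length claim rests on an autoregressive recipe the paper describes: generate one short segment per prompt and condition each new segment on the final frames of the previous one, so the scenes flow into each other. Here is a minimal sketch of that loop; the segment generator is a dummy of our own invention (the real system works on compressed video tokens from a learned encoder), so only the chaining structure reflects the paper.

```python
import numpy as np

def generate_segment(prompt: str, context: np.ndarray | None,
                     frames: int = 48) -> np.ndarray:
    """Stand-in segment generator (noise, seeded by the prompt).
    The real model would condition on `context`; this dummy ignores it."""
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    return rng.integers(0, 256, size=(frames, 128, 128, 3), dtype=np.uint8)

def generate_long_video(prompts: list[str], context_frames: int = 8) -> np.ndarray:
    segments, context = [], None
    for prompt in prompts:
        seg = generate_segment(prompt, context)
        segments.append(seg)
        # Carry the tail of this segment forward as conditioning so the
        # next scene continues from where this one ended.
        context = seg[-context_frames:]
    return np.concatenate(segments, axis=0)

story = [
    "Lots of traffic in futuristic city.",
    "An alien spaceship arrives to the futuristic city.",
    "The camera gets inside the alien spaceship.",
]
print(generate_long_video(story).shape)  # (144, 128, 128, 3)
```

Because each step only ever looks back a few frames, the loop can keep consuming prompts indefinitely, which is what lets a two-minute storyboard like the one above become a single continuous video.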