The way these images are generated marks a big breakthrough. The first version of DALL-E used an extension of the technology behind OpenAI's language model GPT-3, predicting the next pixel in an image as if it were the next word in a sentence. This approach worked, but not well, and the results were not great.
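To make that concrete, here is a minimal sketch of autoregressive image generation, the idea behind the first DALL-E. It is illustrative only, not OpenAI's implementation; `next_token_model` is a hypothetical stand-in for a trained network.

```python
import numpy as np

# Treat an image as a flat sequence of pixel tokens and generate it one
# token at a time, the way a language model predicts the next word.

def next_token_model(sequence):
    # Hypothetical stand-in: a real model would return a learned probability
    # distribution over pixel values conditioned on the sequence so far.
    return np.ones(256) / 256  # uniform over 8-bit pixel values

def generate_image(height=32, width=32, seed=0):
    rng = np.random.default_rng(seed)
    pixels = []
    for _ in range(height * width):
        probs = next_token_model(pixels)
        pixels.append(rng.choice(256, p=probs))  # sample the next "pixel"
    return np.array(pixels, dtype=np.uint8).reshape(height, width)

image = generate_image()
```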

DALL-E 2 uses something called a diffusion model. Diffusion models are neural networks trained to clean images up by removing noise. The training process involves taking images and changing a few pixels in them at a time, over many steps, until the original images are erased and you are left with nothing but random pixels. "If you do this a thousand times, eventually the image looks like you have plucked the antenna cable from your TV set--it's just snow."
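A minimal sketch of that forward noising process follows. The linear noise schedule here is an assumption for illustration; real models tune their schedules carefully.

```python
import numpy as np

# Forward diffusion sketch: repeatedly blend an image toward Gaussian noise.

def noise_schedule(num_steps=1000):
    return np.linspace(1e-4, 0.02, num_steps)  # per-step noise variance (assumed)

def forward_diffusion(image, num_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = image.astype(np.float64)
    for beta in noise_schedule(num_steps):
        noise = rng.standard_normal(x.shape)
        # Shrink the signal slightly and mix in a little noise each step.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x  # after ~1,000 steps this is indistinguishable from TV snow
```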

The neural network is then trained to reverse that process, predicting what a slightly less noisy version of a given image would look like. Give the model a noisy image and it will produce one that is a little cleaner; plug that image back in and the model will clean it up further. Do this enough times and it can take you all the way from TV snow to a high-resolution picture.
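In code, that reverse process is just a loop. In the sketch below, `denoiser` is a hypothetical stand-in for the trained network; a real one would predict and remove a little of the noise at each step.

```python
import numpy as np

# Reverse diffusion sketch: start from pure noise and repeatedly ask a
# trained network for a slightly cleaner image.

def denoiser(noisy_image, step):
    # A real model predicts a less noisy image at this step; returning the
    # input unchanged just keeps this sketch runnable.
    return noisy_image

def sample(shape=(64, 64), num_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # begin with "TV snow"
    for step in reversed(range(num_steps)):
        x = denoiser(x, step)  # each pass removes a little more noise
    return x  # after many passes, a clean image emerges

image = sample()
```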

AI art generators never work exactly how I want them to. They often produce hideous results that, at best, resemble distorted stock art. In my experience, the only way to really make the work look good is to add a descriptor at the end with a style that looks aesthetically pleasing.

--Erik Carter
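As a concrete illustration of Carter's tip, appending a style descriptor looks like this; the wording below is hypothetical, not his actual prompt.

```python
# Illustrative prompts only; the exact wording is hypothetical.
base_prompt = "a robot reading a newspaper in a diner"
styled_prompt = base_prompt + ", editorial illustration, flat colors, film grain"
```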

The trick with text-to-image models is that the denoising process is guided by a language model that tries to match a prompt to the images the diffusion model is producing. The language model steers the process toward images that it considers a good match.
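In DALL-E 2's case, that matching is done by OpenAI's CLIP model. The sketch below mimics the scoring idea only; both embedding functions are hypothetical stand-ins for trained networks.

```python
import numpy as np

# CLIP-style matching sketch: embed the prompt and the candidate image into
# the same vector space and score how well they line up.

def embed_text(prompt):
    # Stand-in for a trained text encoder: a deterministic pseudo-embedding.
    rng = np.random.default_rng(sum(ord(c) for c in prompt))
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

def embed_image(image):
    # Stand-in for a trained image encoder: a crude fixed-size vector.
    v = np.resize(image.astype(np.float64), 512)
    v = v - v.mean()
    return v / (np.linalg.norm(v) + 1e-8)

def match_score(prompt, image):
    # Cosine similarity: higher means the language model considers the image
    # a better match for the prompt. Guidance nudges denoising to raise it.
    return float(embed_text(prompt) @ embed_image(image))
```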

These models are not pulling text and images out of thin air. They are trained on enormous collections of data, such as LAION, a data set containing billions of pairings of text and images scraped from the internet. The images you get from a text-to-image model are therefore a snapshot of the world as it is represented online.

There is a small but important difference between the two most popular models. DALL-E 2 runs its diffusion process on full-size images. Stable Diffusion, by contrast, uses a technique called latent diffusion, invented by Ommer and his colleagues: it works on compressed versions of images encoded inside the neural network in what is known as a latent space, where only the essential features of an image are retained.
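Here is a toy sketch of that compression step, assuming a simple 8x downsampling in place of Stable Diffusion's learned autoencoder. The point is only that the diffusion steps run on a much smaller array.

```python
import numpy as np

# Latent diffusion sketch: compress the image, run the (omitted) denoising
# steps on the small latent, then decompress. This toy average-pooling
# autoencoder stands in for Stable Diffusion's learned encoder and decoder.

def encode(image, factor=8):
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def decode(latent, factor=8):
    # Upsample by repetition; a learned decoder would restore fine detail.
    return np.repeat(np.repeat(latent, factor, axis=0), factor, axis=1)

image = np.zeros((512, 512))
latent = encode(image)   # denoising would run here, on 64x64 instead of 512x512
restored = decode(latent)
```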

Working in latent space takes far less computing muscle, which is why Stable Diffusion can run on a (good) personal computer. The fact that it is open source and lightweight enough for people to run at home is one big reason new apps built on it have appeared so quickly.
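For example, here is a minimal way to run Stable Diffusion locally with Hugging Face's open-source diffusers library, assuming the library is installed and a consumer GPU with enough memory is available. The checkpoint named below is one public Stable Diffusion release; others work the same way.

```python
# Requires: pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

# Load a public Stable Diffusion checkpoint; half precision keeps memory
# use low enough for many consumer GPUs.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("an astronaut riding a horse, oil painting").images[0]
image.save("astronaut.png")
```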

Redefining creativity

Artificial general intelligence, or AGI, is a term for a hypothetical future artificial intelligence with general-purpose, even humanlike, abilities. Achieving AGI is OpenAI's stated goal, and some of the tools DALL-E 2 now competes with are free. But the company's position is that it is here to make AGI, not image generators: DALL-E 2 fits into a larger road map and is just one small part of the AGI plan.