AI-generated art is beginning to change culture. In the last few years, the ability of machine learning systems to generate imagery from text prompts has improved markedly. These tools are moving out of research labs and into the hands of everyday users, where they are likely to create new types of trouble.
Right now, there are only a few dozen top-flight image-generating AI systems. Building one takes access to millions of images to train the system and a lot of computational grunt; a million-dollar price tag isn't out of the question.
For now, the output of these systems is usually treated as a novelty, splashed on a magazine cover or used to create a meme. Before long, artists and designers will be using this software in their work. That raises questions about who owns the resulting images, and potential dangers, such as biased output or AI-generated misinformation, will have to be dealt with quickly.
As the technology goes mainstream, one company will be able to take some credit for its ascendancy: a 10-person research lab named Midjourney, which makes an eponymous AI image generator accessed through a Discord chat server. You have probably seen the output from Midjourney's system already. You can generate your own by joining Midjourney's Discord and typing a prompt, and the system will make an image for you.
Midjourney exists on Discord because “people want to make things together”
"A lot of people ask why we don't just make an app that makes you a picture," says David Holz, the founder of Midjourney. "But people want to make things together, and if you do that in an app, you have to build your own social network. That is quite difficult. If you want your own social experience, Discord is really great."
If you sign up for a free account, you get 25 credits, and every image you generate is visible to the rest of the community. Beyond that, you will have to pay either $10 or $30 a month, depending on the number of images you want to make.
Midjourney will also allow anyone to create their own Discord server with their own image generator. The company is going from a Midjourney universe to a Midjourney multiverse. Holz thinks the results will be amazing, with an onslaught of AI-augmented creativity.
We called him to find out more about his ambitions for Midjourney, as well as why he thinks AI is more like water than a tiger. We also got Midjourney to illustrate our discussion.
The interview has been edited to make it clearer.
Let's start with a bit about yourself and Midjourney. What is your background? How did you end up in this scene? And what is Midjourney? How would you describe it?
My name is David Holz, and I guess I'm a serial entrepreneur. I had a design business when I was a student. I graduated from college with a degree in mathematics. I was working on a PhD while at NASA. I got overwhelmed and put everything aside, then started Leap Motion in San Francisco. We sold hardware devices that could do motion capture, and we invented a lot of the gestural interface space.
After founding Leap Motion and running it for 12 years, I left to start Midjourney. We are a small group, we have no investors, and we are not financially motivated. We aren't under pressure to be a public company. It's about having a home for the next 10 years where I can work on cool projects that matter and have fun.
“we see this technology as an engine for the imagination”
We are working on a lot of different projects. It is going to be a diverse lab. The themes are reflection, imagination, and coordination. Right now, we are becoming well known for this image creation stuff. We don't think it's really about art or making deepfakes, but about how we expand the imaginative powers of the human species. What does it mean when computers are better at imagining than humans? It doesn't mean we will stop imagining. Cars are faster than humans, but that doesn't mean we stopped walking. When we move a lot of stuff over a long distance, we need engines. We see this technology as an engine for the imagination. It is a positive and humane thing.
A lot of labs and companies are working on similar technologies. There are a number of smaller projects like Craiyon that are available. Where did this tech come from and where do you think it will go in the future?
“in 10 years, you’ll be able to buy an Xbox with a giant AI processor, and all the games are dreams.”
There have been two big breakthroughs in the field of AI: the ability to understand language and the ability to create images. When you combine those things, you can create images from language. These technologies will be better at making images than people, and they will be very fast. Within the next two years, you will be able to make content at 30 frames a second. It will cost a lot, but it will be doable. In 10 years, you will be able to buy an Xbox with a giant AI processor, and all the games will be dreams.
Those are just facts, and there is no way around them. But what does that mean for humans? We are going to have augmented reality headsets, but what does that mean? That is the part that is hard to grasp. And the software required to turn that into something we can wield is off the map.
We began testing the raw technology in September of last year and learned a lot. Most people don't know what they want. You ask, "What do you want from this machine?" and they say, "Dog." You go, "Really?" and they go, "Pink dog." You give them a picture of a dog, and they go off and do something else.
But if you put them in a group, one person will go "dog," someone else will go "space dog," someone else will go "Aztec space dog," and all of a sudden people understand the possibilities. We made Midjourney social because we found that people like to imagine together. Ours is now one of the largest Discords, with over a million people co-imagining things.
Do you see this human collective as parallel to the machine collective? As a sort of counterbalance to these systems?
There isn't really a collective of machines. When you ask the AI to make a picture, it doesn't remember anything else it has made. It doesn't have a will, it doesn't have goals, and it doesn't have a story to tell. All the egos and stories, that is us. It is just like an engine. An engine has nowhere to go, but people have places to go. It is a hive mind of people powered by technology.
“They’re new, interesting, human aesthetics that I think will spill out into the world”
A million people are making images in the community, and by default everybody can see everybody else's images. If you pay for privacy, it means you are a commercial user. So everyone is riffing off each other. It is similar to aesthetic accelerationism. All these new aesthetics are bubbling up and swirling around. They are new, interesting, human aesthetics that I think will spill out into the world.
Is openness helpful in keeping things safe? There is a lot of discussion about how artificial intelligence image generators can be used to generate potentially harmful imagery. What can you do to stop that?
It is incredible. When you put someone's name on all the pictures they make, they are much more careful about how they use it. That makes a big difference.
Unfortunately, given the way social media works everywhere else, you can make a living by causing outrage, so there is an incentive for some people to come into the community, pay for privacy, and then spend a month trying to make shocking images. We have to say, "That's not what we're about; that's not the kind of community we want."
We stomp it out whenever we see it. We ban words if we have to. We collected words for things like photorealistic ultragore and have banned everything within a mile of that.
What about realistic faces, which are another way to create misinformation? Does the model generate realistic faces?
It will generate celebrity faces. But we have a default style and look, and it's hard to push the model away from that, meaning you can't really force it to make a deepfake. Maybe if you spend 100 hours trying you can get there, but you have to work hard to make it look like a photo. I don't think the world needs more deepfakes, but it does need more beautiful things, so we're focused on making those.
Where did you get the training data from?
Our training data comes from the internet. Every big AI model scrapes all the data it can, all the text it can, and all the pictures it can. We are at an early point in the space, where everyone grabs everything they can and dumps it in a huge pile, and no one knows what data in the pile actually matters.
“The entire space has maybe only trained two dozen models like this. So it’s experimental science.”
Our most recent update made everything look better, and you might think we did that by putting in a lot of paintings. But we didn't; we just used data from our users to improve the model. There was no human art put into it. Scientifically speaking, we are very early. The entire space has maybe only trained two dozen models like this. So it's experimental science.
How much did you spend on training?
I can't speak about specific costs, but I can say general things. Training costs around $50,000 every time you do it, and you never get it right in a single attempt, so you have to make a lot of tries. It adds up. But it isn't so expensive that you need a billion dollars or a supercomputer.
I think the costs will come down for both training and running. Right now, it costs quite a bit to run. Every image costs money. We have to rent $20,000 servers by the minute. I think there has never been a service for consumers where they use thousands of trillions of operations in a short period of time. It is more compute than your average consumer has ever touched. It is sort of crazy.
One of the more controversial aspects of training data is the issue of ownership. It is unclear whether people can claim copyright over images used in training data. If the work of artists and designers can be copied by AI, what will happen? Have you had a lot of discussion about this?
We have a lot of artists in the community who are positive about the tool and think it will improve their lives a lot. We talk to them constantly and ask, "Are you okay with this? Are you happy about this?" We do these office hours where I answer questions for four hours.
A lot of famous artists use the platform, and they all say the same thing: "When you invoke my name to create an image, it's like asking an art student to make something inspired by my art. I want people to be inspired by my work."
But the artists who are active in the Midjourney Discord are likely to be the ones who are excited by it. Others say they don't want their art to be eaten up by machines. Is it possible for these people to remove themselves from your system?
We don't have a process for that yet, but we're open to it. So far, the data set doesn't have a lot of artists in it. It isn't that deep of a data set. The ones who have made it in have been telling us that they don't feel intimidated. It makes sense to be dynamic right now because it is so new. We talk to a lot of people. Artists actually want it to be better at stealing their styles so they can use it as part of their art flow. I have been surprised by that.
It may be different for other image generators, because they try to make something look exactly like the original. We have more of a default style, so it looks like an art student being inspired by something else. If you say "dog," we could give you a photo of a dog, but that's boring. Why would you want that? You can just go to an image search engine. We want things to look artistic.
I am very interested in the idea that each image generator is its own example of culture, with its own preferences and expressions. What do you think about Midjourney's particular style and how did you develop it?
I think it is a little ad hoc. Every time we try a new thing, we make thousands of images. There is not really an intent to it. It should look pretty. It should respond to specific things. It should not just look like photos. At some point, we might make a realistic version, but we wouldn't want it to be the default, though I can see why you would want something more realistic.
I would say the style is a bit abstract and weird, and it blends things in ways that are surprising and beautiful. It usually uses a lot of oranges. It has some favorite colors and some favorite faces. If you give it a vague instruction, it has to go to its favorites. There is one particular face it likes to draw. We don't know why it happens or where it comes from, but people just call it "Miss Journey." There is also a man with a square face, but he doesn't have a name. It is similar to an artist who has their own faces and colors.
One of the biggest challenges in the image-generation space is dealing with bias. When you use an AI model to draw a CEO, it will always be a white man, and when you use it to make a nurse, it will always be a woman. What have you done to deal with that challenge? Is it a big problem for Midjourney or a bigger concern for corporations who want to make money from these systems?
Miss Journey is more of a problem than a feature, and we are working on something that will give you more variety. But there are drawbacks to that, too. We had a version that completely destroyed Miss Journey, but if you really wanted a specific face, it would destroy that request as well. It's difficult to get that to work without ruining whole genres of expression. It is easy to have a switch that increases diversity, but it is hard to have it only turn on when it is needed.
What I will say is that it has never been easier to make an image with diversity: you just use the word. One word is all you need to tell the model what you want. You are always one word away from creating what you want.
You talk a lot about how you don't see the work you're doing at Midjourney as only practical. It is very hands-on, but your motivation is more abstract: how we can relate to AI in a more human way. Some people in the space talk about the technology in terms of gods and sentient life. What do you think about that?
I have been trying to figure out what the machine is. It is like an engine for imagination, but there is more to it. It's tempting to look at it through an artistic lens. Is this similar to the invention of photography? When photographs were invented, paintings got weirder because anyone could take a photo of a face.
“people totally misunderstand what AI is”
Is it similar to that? It isn't. It is definitely weird. Right now, it feels like the invention of an engine: you're making a bunch of images every minute, and you're traveling down a road of imagination. But take one more step into the future, where instead of making four images at a time you are making 1,000 or 10,000, and it is different. One day I made 40,000 pictures in a few minutes, and it took me four hours to get through them all. I felt like a small child looking into the deep end of the pool and realizing I couldn't swim. All of a sudden, it felt like a torrent of water. I thought about it for a few weeks and realized that it actually is water.
Right now, people totally misunderstand what AI is. They see it as a tiger. A tiger is dangerous. It might eat me. It's an adversary. But the danger of a tiger is very different to the danger of a river of water. You can swim in the water, you can make boats, you can dam it, and you can make electricity from it. Humans who know how to live with and work with water are better off than people who don't. It's an opportunity. Yes, you can drown in it, but that doesn't mean we should ban it. It is a good thing when you find a new source of water.
And this is a new source of water?
That is a little frightening when you say that.
I think we have found a new source of water, and what Midjourney is trying to figure out is how to use it for people. How do we teach people to swim? How do we build vessels? How do we dam it up? How do we get from people who are afraid of water to people who like to surf? We're not making water. There's something profound about that.