Inspired by NASA’s recently released images of the universe, the first prompt I fed into the Artificial Intelligence (AI) tool of research lab Midjourney was “a spaceship surrounded by galaxies”. The result, as pictured below, was an image of a vessel suspended in space that seems to reflect the cosmos around it – pretty much true to the prompt.
For Midjourney’s founder David Holz, a powerful aspect of generative AI is its “ability to unify with language”, where we can “use language as a tool to create things”. In simple terms, generative AI takes text commands from the user and creates novel images based on what it has learned from the data it was trained on.
The rise of text-to-image generation has also raised philosophical questions about the definition of an ‘artist’.
British mathematician Marcus du Sautoy argues in his 2019 book, The Creativity Code: Art and Innovation in the Age of AI, “Art is ultimately an expression of human free will and until computers have their own version of this, art created by a computer will always be traceable back to a human desire to create.” He states that if we were to create a “mind” in a machine, its art would perhaps offer a glimpse into its thoughts. “But we are still a long way from creating conscious code,” du Sautoy concludes.
Similarly, Holz notes, “It’s important that we don’t think of this as an AI ‘artist’. We think of it more like using AI to augment our imagination. It’s not necessarily about art but about imagining. We are asking, ‘what if’. The AI sort of increases the power of our imagination.”
Midjourney allows its users to feed in their prompts on its Discord server and then generates four images based on the text. The user can choose to explore more variations and upscale the best fit to a higher-quality image. The bot entered open beta last month, giving users a certain number of free trials to bring their imaginations to life. The images generated can also be minted into NFTs, for which, until recently, Midjourney charged royalties.
“It’s a giant community of almost a million people who are all making images together, dreaming and riffing off each other. All of the prompts are public and everybody can see each other’s images… that’s pretty unique,” Holz tells indianexpress.com.
Holz co-founded Leap Motion, a hand-tracking motion capture user-interface company, in 2010, and was featured in the Forbes 30 under 30 list of 2014. He now runs a small self-funded research and design lab, Midjourney, which is exploring a bunch of diverse projects, including the AI visualization tool, with 10 other colleagues.
Elaborating on the response received by the AI bot, Holz says, “A lot of people are very happy and find using the product a deeply emotional experience. People use it for everything from a project to art therapy. There are people who have always had things in their mind but were unable to express it before. Some people have conditions like aphantasia, where the mind can’t visualize things, and they are now using the bot to visualize for the first time in their life. There’s a lot of beautiful stuff happening.”
Midjourney also takes care to prevent misuse of the platform to generate offensive images. The community guidelines urge users to refrain from using prompts that are “inherently disrespectful, aggressive, or otherwise abusive” as well as from generating “adult content or gore”. Midjourney also relies on moderators who watch out for people violating the policies and either warn or ban them. It also has automated content moderation, under which certain words are banned on the server. The AI, too, learns from user data, Holz explains. “If people don’t like something, it generates less of that.”
I chanced upon the Midjourney bot during a cursory glance through my Twitter feed, where I saw user psychedelhic’s renditions of a somewhat post-apocalyptic Delhi.
Having previously dabbled with AI bots like Disco Diffusion and Craiyon, I found it interesting to see how different AIs would respond to the same text. The pictures below show the results generated with the same prompt, ‘city during monsoon rains’, by Midjourney; Disco Diffusion, a free-to-use AI tool hosted on Google Colab; and Craiyon, formerly known as DALL-E mini.
While Craiyon throws up relatively realistic images, Disco Diffusion shows surreal, impressionistic results, and Midjourney sits somewhat in the middle of the two.
According to Holz, Midjourney can be understood as a “playful, imaginative sandbox”. “The goal is to give everybody access to that sandbox, so that everyone can understand what’s possible and where we are as a civilization. What can we do? What does this mean for the future?”
Holz dismisses fears that AI is here to “replace” humans or their jobs. “When computer graphics was invented, there were similar questions — will this replace artists? And it hasn’t. If anything, computer graphics makes artists more powerful,” he says.
Holz adds, “Whenever we see something new, there’s a temptation to try and figure out if it’s dangerous and we treat it like a tiger. AI isn’t a tiger. It’s actually more like a big river of water. A tiger is dangerous in a very different way than water. Water is something that you can build a boat for, you can learn to swim, or you can create dams that make electricity. It’s not trying to eat us, it’s not angry at us. It doesn’t have any emotions or feelings or thoughts. It’s just like a powerful force. It is an opportunity.”