The greatest artistic tool ever built, or a harbinger of doom for entire creative industries? OpenAI’s second-generation DALL-E 2 system is slowly opening up to the public, and its text-based image generation and editing abilities are awe-inspiring.
The pace of progress in the field of AI-powered text-to-image generation is positively frightening. The generative adversarial network, or GAN, first emerged in 2014, putting forth the idea of two AIs in competition with one another. A “generator” AI creates images from scratch, while a “discriminator” AI – “trained” by being shown a huge number of real images – tries to guess whether each image it sees is a real photo or an AI creation.
At first, they’re evenly matched, both being absolutely terrible at their jobs. But they learn: the generator is rewarded if it fools the discriminator, and the discriminator is rewarded if it correctly picks the origin of an image. Over millions of iterations – each taking a fraction of a second – they improve to the point where humans start struggling to tell the difference.
They learn in their own way, completely undirected by their programmers; each AI develops its own understanding of what a horse is, completely untethered from the reality we understand. All it knows or cares about is its job – either fooling the other AI or not getting fooled – based on its own individual and completely mysterious methods of analyzing and creating image data.
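The adversarial loop described above can be sketched in a few dozen lines. The toy below pits a one-parameter-pair “generator” against a logistic-regression “discriminator” over a simple 1-D number distribution, with hand-written gradients. It’s a deliberately minimal illustration of the GAN training idea – not a real image model, and not how DALL-E 2 itself works (DALL-E 2 is diffusion-based rather than a GAN). All names and numbers here are illustrative.

```python
# Minimal 1-D GAN sketch: the generator learns to mimic samples from a
# "real" distribution by trying to fool the discriminator, which is
# simultaneously trained to tell real samples from generated ones.
import math
import random

random.seed(0)

REAL_MEAN, REAL_STD = 4.0, 0.5   # the "real images": samples from N(4, 0.5)

def sigmoid(x):
    # Clip the input to avoid math.exp overflow on extreme values.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))

# Generator: g(z) = a*z + b, initially producing samples far from the real data.
a, b = 1.0, 0.0
# Discriminator: D(x) = sigmoid(w*x + c), its output is "probability x is real".
w, c = 0.0, 0.0

LR, STEPS, BATCH = 0.05, 3000, 32

for step in range(STEPS):
    reals = [random.gauss(REAL_MEAN, REAL_STD) for _ in range(BATCH)]
    fakes = [a * random.gauss(0, 1) + b for _ in range(BATCH)]

    # --- Discriminator update: rewarded for picking the origin correctly ---
    gw = gc = 0.0
    for x in reals:                      # wants D(x) -> 1 on real samples
        d = sigmoid(w * x + c)
        gw += (d - 1.0) * x
        gc += (d - 1.0)
    for x in fakes:                      # wants D(x) -> 0 on fakes
        d = sigmoid(w * x + c)
        gw += d * x
        gc += d
    w -= LR * gw / (2 * BATCH)
    c -= LR * gc / (2 * BATCH)

    # --- Generator update: rewarded for fooling the discriminator ---
    ga = gb = 0.0
    for _ in range(BATCH):
        z = random.gauss(0, 1)
        x = a * z + b
        d = sigmoid(w * x + c)
        dx = (d - 1.0) * w               # gradient pushing D(fake) toward 1
        ga += dx * z
        gb += dx
    a -= LR * ga / BATCH
    b -= LR * gb / BATCH

fake_mean = sum(a * random.gauss(0, 1) + b for _ in range(1000)) / 1000
print(f"generator mean after training: {fake_mean:.2f} (real mean {REAL_MEAN})")
```

After training, the generator’s samples cluster near the real mean of 4 even though it never sees a real sample directly – all of its learning signal arrives through the discriminator, which is the dynamic the paragraph above describes.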
This leads to the famously weird disconnects from reality that have been the hallmark of such systems to date. Think DeepDream’s strange obsession with dogs and eyes, or the rampant and beautiful surrealism of systems like Botto, the AI/human NFT art collaboration.
Thus far, these algorithms have been fascinating amusements. DALL-E 2, on the other hand, makes it crystal clear just how disruptive this technology will be – not five or 10 years in the future, but the minute its doors are flung open to the public. Just look at the video below, and imagine how much time and money you’d need to budget to make this using non-artificial intelligence.
DALL-E 2 represents a step change in AI image generation technology. It understands natural-language prompts much better than anything that’s come before, allowing an unprecedented level of control over subjects, styles, techniques, angles, backgrounds, locations, actions, attributes and concepts – and it generates images of extraordinary quality. If you tell it you want photo-realism, for example, it’ll happily let you direct its lens and aperture choices.
Given a high-quality prompt, it will generate dozens of options for you in seconds, each at a level of quality that would take a human photographer, painter, digital artist or illustrator hours to produce. It’s an art director’s dream: a smorgasbord of visual ideas in an instant, without having to pay creatives, models or location fees.
You can also generate different versions – either versions of something DALL-E has generated for you, or of something you’ve uploaded. It’ll create its own understanding of the subject, the composition, the style, the color palette and the conceptual meaning of the image, and generate a series of original pieces that echo the look, the feel and the content of the original, each adding its own twist.
And DALL-E 2 can now also edit images, in a way that makes Adobe’s insanely powerful but notoriously unapproachable Photoshop software feel like a relic of the past. No specialist training is required. You can paint out a splodge in a chair and say “put a cat there.” You can tell DALL-E to “make it sunset,” “put her in a neon-lit cyberpunk atrium,” or “remove the bicycle.” It understands things like reflections, and will update those accordingly.
You can stick an image in, and ask the AI to expand it outward to a wider frame of view. Each time, it’ll give you a few different options, and if you don’t like them, you can just run the same instruction again or get more specific in your prompting. Effectively, you can continue zooming out on an image indefinitely, and people are already using this to extraordinary creative effect.
These capabilities – which just scratch the surface of what it can do – make DALL-E 2 an absolutely revolutionary image editor. It feels like this technology can do just about anything.
Well, within limits. OpenAI has designed DALL-E 2 to refuse to create images of celebrities or public figures. It also won’t accept image uploads “containing realistic faces,” and it does its best not to generate images of real people, instead tweaking things in an interesting way that tends to look somewhat like the actual person, but also very clearly not. Mind you, given the sophistication of deepfake and image editing software, we don’t imagine it’ll take a ton of effort to take a DALL-E image and stick the head of your choice on it.
The system won’t generate porn, or gore, or political content – and indeed, the data used to train it excludes these types of images. And, unless you specify racial or demographic information in your prompts, the system “generates images of people that more accurately reflect the diversity of the world’s population,” in the hopes of pre-empting some of the racial bias AI systems frequently suffer from due to skewed training data.
DALL-E 2 is currently in beta, with a waitlist for interested parties. Over the coming weeks, a million accounts will be welcomed in, each with 50 free credits to use the system and a further 15 credits each month. Additional credits will cost $15 per 115 credits – and each credit gets you four images for a prompt or instruction. It’s at once an incredible democratization of visual creativity, and a knife to the heart of anyone who’s spent years or decades refining their artistic techniques in the hope of making a living from them.
OpenAI explicitly says users “get full rights to commercialize the images they create with DALL-E, including the right to reprint, sell, and merchandise.” But there are still some fascinating legal gray areas yet to be fully explored here, given that everything these systems know about art, they’ve learned by analyzing the works of other, human creators.
If this latest piece of software looks amazing, it’s worth remembering that it’s still a very early version of this kind of technology. DALL-E 2, its contemporaries and its descendants will continue their evolution at a breakneck pace that’s only likely to accelerate.
Where to from here? Well, why not video? As processing power and storage continue to expand, it’s easy to imagine systems like this will eventually be capable of generating moving images, too. Adobe has already embedded AI-enhanced video editing capabilities into its pro-level After Effects software, but we’re yet to see any DALL-E-style creativity in video.
How long will it be before we see an entire short film written, directed, soundtracked and made entirely by AI systems? And then, after that point, how long until they start being worth watching?
What about other forms of graphic design? Can DALL-E do logos? Website templates? Business cards? Will it evolve to self-generate catalogs, posters, brochures, book covers and everything else a designer currently makes a living from? Probably. Indeed, if you’re young and interested in art or design, you’d probably best become an expert at getting the best out of these emerging tools, because in a few short years, whether you like it or not, this might be what the gig looks like.
Presumably, alternative AI image generators will soon begin to spring up without the ethical and moral boundaries that OpenAI has drawn around DALL-E. Cans of worms will be opened. Noses will be put out of joint. DALL-E shows a glimpse of a future that’s fundamentally different, and this kind of upheaval is never painless.
Check out the short video below.
DALL-E 2 Explained