Everything about DALL-E: the A.I. text-to-image generator

Have you seen odd images with strange words written next to them on Twitter in recent weeks? If so, you may be interested to know that these images are created by A.I. software: DALL-E 2, which reportedly translates your words into an internal language of its own and then generates realistic pictures from it. Quite fascinating, right? Here's all you need to know about it.

What is DALL-E?

DALL-E is new A.I. software that turns your words into images and works of art. Users simply enter a text description in English, and the neural network generates an image from it. The tool is developed by OpenAI, a startup backed by Microsoft; Google's Imagen is a comparable text-to-image technology. The OpenAI website describes DALL-E 2 as "a new AI system that can create realistic images and art from a description in natural language", which is exactly what it does.

It can create even the most improbable pictures, such as "Teddy bears shopping for groceries in ancient Egypt". OpenAI introduced the first version of DALL-E in January 2021. The current system, DALL-E 2, is its newer and smarter successor.

Open AI DALL-E 2
© OpenAI

Its developers say that "DALL-E 2 will empower people to express themselves creatively. DALL-E 2 also helps us understand how advanced AI systems see and understand our world, which is critical to our mission of creating AI that benefits humanity".

How does DALL-E work?

DALL-E is a neural network trained by OpenAI to generate images from text. It can combine unrelated concepts and create anthropomorphized versions of animals and objects. DALL-E works together with CLIP, a computer vision system that OpenAI introduced last year. The neural network appears to translate human language into a language of its own and only then turn it into images.

Giannis Daras, a Ph.D. candidate in computer science, posted examples of what he called "AI's own language" on Twitter:

"DALLE-2 has a secret language," Daras wrote on Twitter. "'Apoploe vesrreaitais' means birds. 'Contarra ccetnxniams luryca tanniounons' means bugs or pests. The prompt: 'Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons' gives images of birds eating bugs."

DALL-E's interface is simple: there is a text box where the user enters a description, a button to start the conversion, and the generated images appear just below. Its developers explain how DALL-E 2 works: "It uses a process called 'diffusion,' which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image".
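To make the "random dots" idea more concrete, here is a deliberately simplified Python sketch. It is not how DALL-E 2 is implemented: a real diffusion model uses a trained neural network to decide what to change at each step, while this toy version simply assumes the final picture is already known. The names toy_denoise and mean_abs_error, the eight-value "image" and the step count are all made up for illustration.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy illustration only: start from random dots and nudge them toward `target`.

    Assumption: the "denoiser" here already knows the target picture.
    A real diffusion model instead uses a trained neural network to
    predict what to change at each step.
    """
    rng = random.Random(seed)
    # Step 0: a pattern of random dots (pure noise), one value per pixel.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    history = [list(image)]
    for step in range(steps):
        # Cover an equal share of the remaining distance at each step,
        # so the last step lands on the target.
        blend = 1.0 / (steps - step)
        image = [pixel + blend * (goal - pixel) for pixel, goal in zip(image, target)]
        history.append(list(image))
    return history


def mean_abs_error(image, target):
    """Average per-pixel distance between the current pattern and the target."""
    return sum(abs(p - g) for p, g in zip(image, target)) / len(target)


# A made-up 8-pixel "picture" standing in for a real image.
target = [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0, 0.5]
history = toy_denoise(target)

# The pattern gets closer to the picture as the steps go on.
for step in (0, 10, 25, 50):
    print(step, round(mean_abs_error(history[step], target), 4))
```

The printed distance shrinks from step to step, which is the "gradually alters that pattern towards an image" part of the quote in miniature.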

For now, the new version of DALL-E is available only to a small group of people, to prevent it from being used to produce hateful, nude, or otherwise inappropriate content that may harm users. However, you can join the waiting list by explaining why you'd like to test the program, and the developers may grant you early access.

OpenAI plans to add the software to its API soon so that third-party developers can use it. In the meantime, you can try the "lite" version of DALL-E: DALL-E Mini, which also creates images from text and is open-source. However, it sometimes gets stuck due to the large number of requests.