With DALL-E, OpenAI’s image generation tool, you can generate images according to your wishes.
Simply describe what you have in mind, and the tool will generate four suggestions for you. You can then have variations of these suggestions created.
There are currently two versions of the DALL-E tool: DALL-E 2 and DALL-E 3. To use DALL-E 2, you must register (https://labs.openai.com) and buy credits. For $15, you get 115 credits; one generation costs 1 credit. DALL-E 3 can be used directly in the GPT-4 prompt in ChatGPT, which requires a ChatGPT Plus account. For current prices, check the OpenAI website.
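To put the DALL-E 2 credit pricing in perspective, the per-image cost can be worked out with a little arithmetic. The following Python sketch uses only the figures quoted here ($15 for 115 credits, 1 credit per generation, four suggestions per generation). The function name is purely illustrative, and current prices may differ.

```python
def cost_per_generation(price_usd: float, credits: int,
                        credits_per_generation: int = 1) -> float:
    """Return the price in USD of a single generation."""
    return price_usd / credits * credits_per_generation


# Figures quoted in the text: $15 buys 115 credits, one generation costs 1 credit.
per_generation = cost_per_generation(15.0, 115)

# Each generation yields four suggestions, so divide by four for the per-image cost.
per_image = per_generation / 4

print(f"Per generation: ${per_generation:.3f}")  # Per generation: $0.130
print(f"Per image: ${per_image:.3f}")            # Per image: $0.033
```

In other words, at the quoted price, one generation costs about 13 cents, or roughly 3 cents per suggested image.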
DALL-E 2
DALL-E 2 has its own user interface, which differs from ChatGPT but is still quite intuitive to use. Even if you have a ChatGPT Plus account, DALL-E 2 is worth a closer look. The quality of the images you can generate with DALL-E 3 is much better, but if you want to edit images of people for a flyer, for example, DALL-E 2 is more suitable. Editing your own photorealistic images by erasing certain areas and filling them with new content isn’t yet possible with DALL-E 3.
Prompt Engineering
When formulating the prompts, you should pay attention to precision, adjectives/adverbs, and comparisons with similar things. Describe the camera perspective, time of day, and lighting conditions. The prompt can also be supplemented by movie styles, character styles, and eras.
Here are some examples:
- Post-apocalyptic wide-angle shot of a gas station, gloomy, creepy
- Picture of Mona Lisa, in the style of van Gogh, cheerful, sunny
- A small, white dog with sunglasses looks into the camera, lots of light, close-up
- Photo of a robot on the moon, earth in the background, pixel art
- A blues guitarist in a bar, dark, soft lighting from behind, melancholic, gloomy, wide angle
- Roman sculpture of a man with a sword, very detailed, realistic, light and shadow
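The example prompts above share a pattern: a subject followed by optional style, mood, lighting, and camera perspective. As a minimal sketch of that pattern, the following Python snippet assembles a prompt from those parts. The class `ImagePrompt` and its field names are hypothetical helpers for illustration only. They are not part of DALL-E or any OpenAI library, and the resulting string is simply what you would paste into the prompt field.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ImagePrompt:
    """Hypothetical helper that assembles a comma-separated image prompt."""
    subject: str                  # what the picture shows
    style: str = ""               # e.g. "in the style of van Gogh", "pixel art"
    mood: Tuple[str, ...] = ()    # adjectives such as "gloomy", "cheerful"
    lighting: str = ""            # e.g. "lots of light"
    perspective: str = ""         # e.g. "close-up", "wide angle"

    def render(self) -> str:
        # Keep the order subject, style, mood, lighting, perspective
        # and skip any part that was left empty.
        parts = [self.subject, self.style, *self.mood,
                 self.lighting, self.perspective]
        return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = ImagePrompt(
    subject="A small, white dog with sunglasses looks into the camera",
    lighting="lots of light",
    perspective="close-up",
)
print(prompt.render())
# A small, white dog with sunglasses looks into the camera, lots of light, close-up
```

Keeping the parts separate makes it easy to change one aspect at a time, for example only the lighting, and compare the results.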
You can access previously generated images via the History tab. You can also create image folders via Collections to save the generated images in a structured way. It’s also possible to share folders with others. Shared collections are marked as public, all others as private.
Many DALL-E users share their prompts and the resulting pictures on the internet. The examples from Guy Parsons (https://dallery.gallery/the-dalle-2-promptbook) are very interesting and varied.
Editing Generated Images
If a prompt is too long and too complex, the result may not look as desired, because it becomes more difficult for the model to recognize what is actually required. There is also a greater risk that the prompt contains superfluous information that tends to confuse the model. That is why ChatGPT itself recommends formulating prompts in a clear and focused manner without omitting important information. For complex topics, it can be useful to break a problem down into smaller parts. ChatGPT makes it relatively easy to enter multiple prompts in succession. With DALL-E, you can subsequently create variations of the generated images or edit these images to suit your needs.
For example, say you’re not happy with the example of the blues guitarist. If you select the image and click on Variations, four alternative images will be suggested:
But even among the variations, no picture is convincing. And if we extend the prompt so that the guitarist wears black sunglasses to match the style, the overall result no longer fits. Let’s therefore edit the image online. To do this, select the image again and click on Edit. Erase the man’s face (see next figure) and enter the following in the prompt field at the top: Dark picture of a 50-year-old man with black sunglasses, light in the background.
After editing, we now have a picture of a blues guitarist as we had imagined and originally wanted (see next figure). If you like an edited image, you should save it in a collection or download it. Reentering the same prompt doesn’t produce the same image.
You can also expand images. In edit mode, click on Add generation frame, and place the frame to the left of the image.
Now you can enter a new prompt for this frame. Choose the following: Table lamp with dim lighting.
In the next example, we uploaded an image of a dog (https://labs.openai.com/editor) and expanded it on the right. The prompt for this was as follows: Green meadow in the background, blurred, with flowers. You can use the arrows in the black bar at the bottom of the screen to click through the suggestions and select one. Just take a look at the next figure. The content of the blue frame is completely generated by DALL-E, and the result looks very realistic.
You’ve now seen some examples that were generated or edited using DALL-E. Now try to create your own pictures according to your own ideas. Use the editing mode to adjust details. Search the internet for prompts and the corresponding images from other users; this will broaden your experience.
DALL-E 3
ChatGPT Plus customers can use DALL-E 3 directly in the GPT-4 prompt. The following principle applies here: the more details you enter, the more likely it is that the result will meet your expectations. However, you can also deliberately leave out some details, because the results are often pleasantly surprising. You can then enter change requests if necessary. You can also upload images and request changes or ask questions about the image. However, this doesn’t yet work well with photorealistic images. Here are a few examples of image generation:
- Create a photorealistic image of a hamster on a bicycle.
- This picture is already quite good. Change it in such a way that the hamster is wearing a little hat on its head.
- Generate a picture of a robot in the style of van Gogh.
- The robot should resemble a human, but still be clearly recognizable as a robot.
- Create a photorealistic image of a blues guitarist in a dark bar, dim light in the background. The guitarist should be wearing a black hat and black sunglasses.
- Create a profile picture of an old man with a full beard in a small boat, the sea in the background. Only the man’s head should be visible in the picture.
You can see the results of the prompts in this figure:
In the next example, we uploaded a photo and had ChatGPT describe the image. It wrote: The picture shows a teddy bear. It sits upright and appears soft and fluffy, with fur in a light brown color. The soles of the bear’s feet and its snout are designed in a contrasting white. The bear has black eyes and a black nose, which stand out and give it a friendly appearance. In the background, the bear appears to be sitting on a patterned textile in shades of red and beige, possibly part of a sofa or cushion.
We then entered the following prompt: The teddy bear should sleep; the rest of the picture remains the same. The original image and the result are shown:
The result looks really good but hardly resembles the original image. It would probably be impressive and frightening at the same time if you could manipulate photorealistic images just like that with a prompt, especially if they are photos of people.
Editor’s note: This post has been adapted from a section of the book Developing AI Applications: An Introduction by Metin Karatas.