Captions explaining img2img prompts and generated images in Automatic1111 tool

😎 I want a caption that explains a few things: the text highlighted in yellow is the img2img prompt, which was automatically generated after clicking the button “Interrogate CLIP”. The text is generated based on the large image that appears on the left. Then the Stable Diffusion web UI in Automatic1111 (the tool the screen captures belong to) has generated the 6 images on the right (after the user clicked on “Generate”).

[image: file-4U6BqedXg6HQz8I0bQJDbGjj]

🤖 Here’s an ALT text for the image:

Screenshot of a Stable Diffusion web interface in Automatic1111, displaying a feature where an img2img prompt is highlighted in yellow, reading “a woman in a wet suit holding a surfboard on the beach with waves in the background and a blue sky, promotional image, a colorized photo, precisionism.” This prompt was created by the “Interrogate CLIP” feature, as indicated by a red oval around the button. On the left side of the screen, a large image of a woman in a wetsuit holding a surfboard on the beach is visible. To the right, a sequence of six smaller images is shown, representing variations of the large image that were generated after the “Generate” button was clicked.

[Alt text by ALT Text Artist GPT]

😎 Now it’s the same, but with “Interrogate DeepBooru”

[image: file-zvytMPihfFOpGAFmKRR7x8pl]

🤖 Here’s an ALT text for the image:

Screenshot of a Stable Diffusion web interface in Automatic1111, showing an img2img prompt in yellow highlight with the text “1girl, aircraft, asian, beach, black hair, blurry, blurry background, bodysuit, cloud, day, depth of field, horizon, lips, ocean, outdoors, planet, realistic, red lips, retro artstyle, short hair, sky, solo, space, surfboard, water, wetsuit.” This prompt was generated by clicking the “Interrogate DeepBooru” button, encircled in orange on the screen. On the left, a prominent image of a woman with black hair in a wetsuit holding a surfboard on a beach is visible. On the right, there are six derivative images generated by the system based on the initial image and prompt.

[Alt text by ALT Text Artist GPT]
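
Both interrogation modes shown in the two screenshots above can, as far as we know, also be reached without the buttons when the web UI is launched with its `--api` flag. The following is a minimal sketch, not a definitive client: the local address, the endpoint path `/sdapi/v1/interrogate`, the model names `clip` and `deepdanbooru`, and the `caption` response field are assumptions about a typical Automatic1111 install, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: a local web UI started with --api on its default port.
BASE_URL = "http://127.0.0.1:7860"

# Assumption: the two interrogation models the endpoint accepts,
# matching the "Interrogate CLIP" and "Interrogate DeepBooru" buttons.
MODELS = {"clip", "deepdanbooru"}


def build_interrogate_request(image_bytes, model="clip", base_url=BASE_URL):
    """Return (url, json_body) for an interrogation call (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown interrogation model: {model!r}")
    body = {
        # The image travels as a base64-encoded string inside the JSON body.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "model": model,
    }
    return f"{base_url}/sdapi/v1/interrogate", body


def interrogate(image_bytes, model="clip", base_url=BASE_URL):
    """Send the request and return the generated prompt text.

    Needs a running web UI; assumes the response JSON has a "caption" field.
    """
    url, body = build_interrogate_request(image_bytes, model, base_url)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["caption"]
```

With a running instance, reading an image file as bytes and calling `interrogate(data, model="deepdanbooru")` would be expected to return a comma-separated tag list like the one highlighted in the second screenshot, while `model="clip"` would return a sentence-style prompt like the first.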

😎 The image is generated by the ControlNet Canny model. The text was generated with CLIP Interrogator and has been superimposed on the image for illustrative purposes.

[image: file-eaRN2oQ8iuiyyks2rEaEWVje]

🤖 Here’s an ALT text for the image:

Black and white line drawing of a woman in a wetsuit holding a surfboard on the beach, created by the ControlNet Canny model. Superimposed text on the image, likely for illustrative purposes, describes the content as “a woman in a wet suit holding a surfboard on the beach with waves in the background and a blue sky, promotional image, a colorized photo, precisionism,” generated by the CLIP Interrogator tool.

[Alt text by ALT Text Artist GPT]

😎 don’t say “likely”. Use your own language. I don’t know if “illustrative” is an appropriate word. I was just explaining the image to you.

🤖 Here’s an ALT text for the image:

Black and white line drawing generated by the ControlNet Canny model, depicting a woman in a wetsuit holding a surfboard on the beach. The text from the CLIP Interrogator, describing the image, is superimposed over the drawing and reads: “a woman in a wet suit holding a surfboard on the beach with waves in the background and a blue sky, promotional image, a colorized photo, precisionism.”

[Alt text by ALT Text Artist GPT]

Hugging Face Dataset Metrics

All the conversation prompts, responses, and metrics are available to download and explore in the Hugging Face dataset reddgr/talking-to-chatbots-chats.