dmm | AI image generation and analysis update

GPT-4 in ChatGPT+ can now read and analyze images: one can upload an image and discuss it with GPT-4.

GPT-4 in ChatGPT+ can also generate text prompts and render images via DALL-E 3.

What is missing is the ability to load an image and modify it with DALL-E 3 (or to load the image and use it as an inspiration for image generation by DALL-E 3).

1) So, on one hand, I want to ask a question:

Do people here have experience with loading an image and modifying it using an AI system (or with loading an image and using it as an inspiration for an AI system)? Any recommendations?

2) On the other hand, there are two possible workarounds.

2a) One can use ASCII art inside GPT-4 to inspire images rendered by DALL-E 3 (it works, but so far I have not been able to obtain the results I would like).

2b) One can analyze an image and obtain an image description in one mode of GPT-4 in ChatGPT+, and then open another session and use that description in the prompt to guide DALL-E 3.

Some of the results obtained via method 2b) are quite interesting. I am going to post related images and dialogs as a series of comments.
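
The two-step workaround in 2b) can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the names `description_to_prompt`, `describe_then_generate`, `describe`, and `generate` are hypothetical and not part of any real API. `describe` stands in for whatever returns GPT-4's description of an uploaded image, and `generate` for whatever passes a prompt on to DALL-E 3. The lead-in phrase being stripped and the request prefix are taken from the dialog recorded in the comments.

```python
from typing import Callable, List

# Phrases taken from the dialog in the comments; any other wording is an assumption.
LEAD_IN = "The image appears to be "
REQUEST_PREFIX = "Please create an image showing "


def description_to_prompt(description: str) -> str:
    """Turn an image description into an image-generation request."""
    # Collapse line breaks and repeated spaces into single spaces.
    text = " ".join(description.split())
    # Drop the descriptive lead-in so the text reads naturally after the prefix.
    if text.startswith(LEAD_IN):
        text = text[len(LEAD_IN):]
    return REQUEST_PREFIX + text


def describe_then_generate(
    image_path: str,
    describe: Callable[[str], str],
    generate: Callable[[str], List[str]],
) -> List[str]:
    """Session 1: describe the image. Session 2: generate from that description."""
    description = describe(image_path)
    prompt = description_to_prompt(description)
    return generate(prompt)


if __name__ == "__main__":
    # Stub callables (hypothetical) so the sketch runs without any network access.
    fake_describe = lambda path: "The image appears to be a schematic or diagram."
    fake_generate = lambda prompt: [prompt]
    print(describe_then_generate("avatar.png", fake_describe, fake_generate))
```

Passing the two steps in as callables keeps the sketch independent of any particular service; in practice each one would wrap a separate session, and the description would usually be trimmed by hand first (as was done here by dropping the hedging sentences at the end).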

Update: In November, the ability to read and create images within one session was added. More generally, all functionality that used to be split across different types of sessions is now available from a single session. You should be able to ask the system for anything, be it "search the web", or "generate and execute code", or "upload this data, and then generate code to analyze this data, and run it"; these used to require starting different types of sessions, but now all of this is available together.

From: dmm

1) I am not the first one to come up with this idea, although I "invented this independently".

E.g. here is someone else's experiment in a similar spirit:

https://twitter.com/ComputingByArts/status/1714792882425725263

2) DALL-E 3 outputs *.webp files: https://en.wikipedia.org/wiki/WebP

I don't think Dreamwidth image hosting supports this format.

Instead, I am using a GitHub Pages-generated website to host those images.
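
For sorting downloaded files before deciding where to host them, a WebP file can be recognized by its header rather than by its extension. Below is a minimal Python sketch; the function names `is_webp` and `file_is_webp` are hypothetical, and the only fact relied upon is that a WebP file is a RIFF container whose first four bytes are `RIFF` and whose bytes 8 to 11 are `WEBP`.

```python
def is_webp(header: bytes) -> bool:
    """Return True if the given leading bytes look like a WebP (RIFF/WEBP) header."""
    # Bytes 0-3: "RIFF", bytes 4-7: chunk size (ignored here), bytes 8-11: "WEBP".
    return len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WEBP"


def file_is_webp(path: str) -> bool:
    """Read only the first 12 bytes of a file and check them for a WebP header."""
    with open(path, "rb") as f:
        return is_webp(f.read(12))
```

This only identifies the format; converting to a format the hosting service accepts would need an image library, which is outside the scope of this sketch.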

Unfortunately, ChatGPT does not yet allow sharing chats that contain images.

So I am just going to write this down.

In the original session, in the default mode, I uploaded the avatar I am using here and asked:

***

what does this image show?

***

GPT-4 answered (in a rather abbreviated and so-so fashion):

The image appears to be a schematic or diagram showing three vertical sections with triangular symbols in each section. The triangles might represent some sort of signal or waveform. The large curved line enclosing two of the sections suggests a relationship or process involving those two sections. Without additional context, it's difficult to provide a specific interpretation. It could be a representation from various fields such as

***

Then I opened another session, with DALL-E 3, and asked:

***

Please create an image showing a schematic or diagram showing three vertical sections with triangular symbols in each section. The triangles might represent some sort of signal or waveform. The large curved line enclosing two of the sections suggests a relationship or process involving those two sections.

***

I've obtained three images:

GPT-4 generated the following prompt for DALL-E 3 to create this image:

Photo of a clean and crisp schematic divided into three vertical sections. In each section, there are triangles representing signals or waveforms. A prominent curved line encloses the first two sections, emphasizing a relationship or process between them.

GPT-4 generated the following prompt for DALL-E 3 to create this image:

Illustration of a technical diagram with three distinct vertical sections. Triangular symbols are distributed within each section, symbolizing waveforms. A bold curved line surrounds the first two sections, indicating a special connection or interaction.

GPT-4 generated the following prompt for DALL-E 3 to create this image:

Vector image of a modern schematic with three columns. Each column has triangles to denote signals or waveforms. An overarching curved line is drawn around the first and second sections, suggesting a unique relationship or process.

Normally, one gets 4 images, so I said the following:

***

ME:

Nice 3 images, thanks! I particularly like the first one.

GPT-4:

You're welcome! I'm glad you liked the first one. If you have any other requests or if there's anything else you'd like to explore, just let me know. I'm here to help!

ME:

I don't see the image number 4, that subwindow displays "Error loading image" instead... Is it because you generated 3 images, but the interface tried to display four images?

GPT-4:

I apologize for the inconvenience. It seems there might have been an issue with the fourth image generation. I did intend to generate four images as per your request. Let me try to generate the fourth image again for you.

***

And I obtained the following image, which became image number 4:

GPT-4 generated the following prompt for DALL-E 3 to create this image:

Drawing of an organized diagram split into three vertical areas. Within each area, there are triangular shapes illustrating some form of signal or waveform. The first two areas are enveloped by a large curved line, pointing out a significant relationship or operation.

This is also quite interesting (all of them are good, but 1 and 4 are particularly interesting).

***

ME:

Awesome, thanks! It's great!

GPT-4:

You're welcome! I'm glad you found it satisfactory. If you have any more questions or requests in the future, don't hesitate to reach out. Have a wonderful day!

Here are DALL-E 3 links:

paper: https://cdn.openai.com/papers/dall-e-3.pdf

Oct 19 blog post: https://openai.com/blog/dall-e-3-is-now-available-in-chatgpt-plus-and-enterprise

DALL-E 3 page: https://openai.com/dall-e-3

juan_gandhi

Some people on Facebook show examples of how they draw something and then DALL-E converts it into a pretty neat image.

Thanks!

Yes, I don't know about DALL-E 3, but people say that stand-alone DALL-E 2 can do that.

It's a separate account (or a separate service), although they do give some free usage allowance, and the price is per usage and seems to be quite low (like 2 cents per image).

Follow-up 1: ChatGPT+ is rolling out an upgrade that allows image-to-image generation directly (but I don't have it yet).

Follow-up 2: https://dediffusion.github.io/

"De-Diffusion Makes Text a Strong Cross-Modal Interface", https://arxiv.org/abs/2311.00618

"We demonstrate text as a strong cross-modal interface. Rather than relying on deep embeddings to connect image and language as the interface representation, our approach represents an image as text, from which we enjoy the interpretability and flexibility inherent to natural language [...]"

Dataflow matrix machines (by Anhinga anhinga)
