AI image generation and analysis update
Oct. 22nd, 2023 07:29 pm
GPT-4 in ChatGPT+ can now read and analyze images: one can upload an image and discuss it with GPT-4.
GPT-4 in ChatGPT+ can also generate text prompts and render images via DALL-E 3.
What is missing is the ability to load an image and modify it with DALL-E 3 (or to load the image and use it as an inspiration for image generation by DALL-E 3).
1) So, on one hand, I want to ask a question:
Do people here have experience with loading an image and modifying it using an AI system (or with loading an image and using it as an inspiration for an AI system)? Any recommendations?
2) On the other hand, there are two possible workarounds.
2a) One can use ASCII art inside GPT-4 to inspire images rendered by DALL-E 3 (it works, but so far I have not been able to obtain the results I would like).
2b) One can analyze an image and obtain an image description in one mode of GPT-4 in ChatGPT+, and then open another session and use that description in the prompt to guide DALL-E 3.
Some of the results obtained via method 2b) are quite interesting. I am going to post related images and dialogs as a series of comments.
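For those who would rather script workaround 2b) than do it by hand, here is a minimal sketch of the two-step flow in Python. It is an illustration under assumptions, not a description of how ChatGPT+ works internally: `describe` and `generate` are hypothetical callables standing in for whatever image-analysis and image-generation services one has access to, and `build_prompt` is a hypothetical helper that simply swaps the "The image appears to be" opener of a description for a "Please create an image showing" request.

```python
from typing import Callable, List, Tuple


def build_prompt(
    description: str,
    opener: str = "The image appears to be ",
    request: str = "Please create an image showing ",
) -> str:
    """Turn an image description into an image-generation prompt.

    Hypothetical helper: normalizes whitespace, drops the descriptive
    opener if the description starts with it, and prepends a request.
    """
    text = " ".join(description.split())
    if text.startswith(opener):
        text = text[len(opener):]
    return request + text


def describe_then_generate(
    image_path: str,
    describe: Callable[[str], str],
    generate: Callable[[str], List[str]],
) -> Tuple[str, List[str]]:
    """Run workaround 2b): describe an image, then generate from the description.

    `describe` and `generate` are assumptions: plug in any function that
    maps an image path to a text description, and any function that maps
    a prompt to a list of generated images (paths, URLs, and so on).
    """
    description = describe(image_path)
    prompt = build_prompt(description)
    return prompt, generate(prompt)
```

Keeping the two services behind plain callables means the prompt-editing step can be tried out and adjusted without committing to any particular API.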
Update: In November, the ability to read and create images within one session was added. More generally, all functionality that had been split across different types of sessions is now available from a single session: you should be able to ask the system for anything, be it "search the web", or "generate and execute code", or "upload this data, and then generate code to analyze this data, and run it". These used to require starting different types of sessions, but now all of this is available together.
no subject
Date: 2023-10-22 11:56 pm (UTC)
So I am just going to write this down.
In the original session, in the default mode, I uploaded the avatar I am using here and asked:
***
what does this image show?
***
GPT-4 answered (in a rather abbreviated and so-so fashion):
The image appears to be a schematic or diagram showing three vertical sections with triangular symbols in each section. The triangles might represent some sort of signal or waveform. The large curved line enclosing two of the sections suggests a relationship or process involving those two sections. Without additional context, it's difficult to provide a specific interpretation. It could be a representation from various fields such as
***
Then I opened another session, with DALL-E 3, and asked:
***
Please create an image showing a schematic or diagram showing three vertical sections with triangular symbols in each section. The triangles might represent some sort of signal or waveform. The large curved line enclosing two of the sections suggests a relationship or process involving those two sections.
***
I obtained three images: