AI image generation and analysis update
Oct. 22nd, 2023 07:29 pm
GPT-4 in ChatGPT+ can now read and analyze images: one can upload an image and discuss it with GPT-4.
GPT-4 in ChatGPT+ can also generate text prompts and render images via DALL-E 3.
What is missing is the ability to load an image and modify it with DALL-E 3 (or to load the image and use it as an inspiration for image generation by DALL-E 3).
1) So, on one hand, I want to ask a question:
Do people here have experience with loading an image and modifying it using an AI system (or with loading an image and using it as an inspiration for an AI system)? Any recommendations?
2) On the other hand, there are two possible workarounds.
2a) One can use ASCII art inside GPT-4 to inspire images rendered by DALL-E 3 (it works, but so far I have not been able to obtain the results I would like).
2b) One can analyze an image and obtain an image description in one mode of GPT-4 in ChatGPT+, and then open another session and use that description in the prompt to guide DALL-E 3.
Some of the results obtained via method 2b) are quite interesting. I am going to post related images and dialogs as a series of comments.
Update: In November, the ability to read and create images within one session was added. More generally, all functionality that had been split across different types of sessions is now available from a single session: you should be able to ask the system for anything, be it "search the web", or "generate and execute code", or "upload this data, and then generate code to analyze this data, and run it". These used to require starting different types of sessions, but now all of this is available together.
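For readers who want to script workaround 2b), here is a minimal Python sketch of the two-step flow: obtain a description, trim it, and turn it into a generation request. It is an illustration under assumptions, not what ChatGPT does internally. The function names, the `HEDGE_MARKERS` list, and the `describe` and `generate` callables are all hypothetical; the callables stand in for whatever image-analysis and image-generation services you have access to.

```python
import re
from typing import Callable, Tuple

# Phrases marking where a description stops describing and starts hedging.
# Hypothetical list chosen for illustration; it is not exhaustive.
HEDGE_MARKERS = ("without additional context", "it could be")


def description_to_prompt(description: str) -> str:
    """Turn an image description into an image-generation request."""
    # Split into sentences at whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    kept = []
    for sentence in sentences:
        if any(marker in sentence.lower() for marker in HEDGE_MARKERS):
            break  # drop this sentence and everything after it
        kept.append(sentence)
    body = " ".join(kept)
    # Remove a leading "The image appears to be / shows / depicts".
    body = re.sub(
        r"^the image (appears to be|shows|depicts)\s+",
        "",
        body,
        flags=re.IGNORECASE,
    )
    return "Please create an image showing " + body


def describe_then_generate(
    image_path: str,
    describe: Callable[[str], str],
    generate: Callable[[str], object],
) -> Tuple[str, object]:
    """Run the two steps of workaround 2b) with caller-supplied services.

    `describe` maps an image path to a text description (first session);
    `generate` maps a text prompt to an image (second session).
    Both are placeholders for real services.
    """
    description = describe(image_path)
    prompt = description_to_prompt(description)
    return prompt, generate(prompt)


if __name__ == "__main__":
    # Stand-ins so the sketch runs without any external service.
    fake_describe = lambda path: (
        "The image shows a red kite over a beach. "
        "Without additional context, it is hard to say more."
    )
    fake_generate = lambda prompt: f"<image for: {prompt}>"
    prompt, image = describe_then_generate("avatar.png", fake_describe, fake_generate)
    print(prompt)  # Please create an image showing a red kite over a beach.
```

Passing the two services in as plain callables keeps the text-processing step testable on its own; the trimming rule simply cuts the description at the first hedging sentence, which mirrors trimming the description by hand before reusing it as a prompt.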
no subject
Date: 2023-10-22 11:49 pm (UTC)
E.g., here is someone else's experiment in a similar spirit:
https://twitter.com/ComputingByArts/status/1714792882425725263
no subject
Date: 2023-10-22 11:52 pm (UTC)
I don't think Dreamwidth image hosting supports this format.
Instead, I am using a GitHub Pages-generated website to store those images.
no subject
Date: 2023-10-22 11:56 pm (UTC)
So I am just going to write this down.
In the original session, in the default mode, I uploaded the avatar I am using here and asked:
***
what does this image show?
***
GPT-4 answered (in a rather abbreviated and so-so fashion):
The image appears to be a schematic or diagram showing three vertical sections with triangular symbols in each section. The triangles might represent some sort of signal or waveform. The large curved line enclosing two of the sections suggests a relationship or process involving those two sections. Without additional context, it's difficult to provide a specific interpretation. It could be a representation from various fields such as
***
Then I opened another session, with DALL-E 3, and asked:
***
Please create an image showing a schematic or diagram showing three vertical sections with triangular symbols in each section. The triangles might represent some sort of signal or waveform. The large curved line enclosing two of the sections suggests a relationship or process involving those two sections.
***
I've obtained three images:
no subject
Date: 2023-10-22 11:57 pm (UTC)
no subject
Date: 2023-12-06 04:29 am (UTC)
Photo of a clean and crisp schematic divided into three vertical sections. In each section, there are triangles representing signals or waveforms. A prominent curved line encloses the first two sections, emphasizing a relationship or process between them.
no subject
Date: 2023-10-22 11:59 pm (UTC)
no subject
Date: 2023-12-06 04:30 am (UTC)
Illustration of a technical diagram with three distinct vertical sections. Triangular symbols are distributed within each section, symbolizing waveforms. A bold curved line surrounds the first two sections, indicating a special connection or interaction.
no subject
Date: 2023-10-23 12:01 am (UTC)
no subject
Date: 2023-12-06 04:31 am (UTC)
Vector image of a modern schematic with three columns. Each column has triangles to denote signals or waveforms. An overarching curved line is drawn around the first and second sections, suggesting a unique relationship or process.
no subject
Date: 2023-10-23 12:03 am (UTC)
***
ME:
Nice 3 images, thanks! I particularly like the first one.
GPT-4:
You're welcome! I'm glad you liked the first one. If you have any other requests or if there's anything else you'd like to explore, just let me know. I'm here to help!
ME:
I don't see the image number 4, that subwindow displays "Error loading image" instead... Is it because you generated 3 images, but the interface tried to display four images?
GPT-4:
I apologize for the inconvenience. It seems there might have been an issue with the fourth image generation. I did intend to generate four images as per your request. Let me try to generate the fourth image again for you.
***
And I obtained the following image, which became image number 4:
no subject
Date: 2023-10-23 12:04 am (UTC)
no subject
Date: 2023-12-06 04:32 am (UTC)
Drawing of an organized diagram split into three vertical areas. Within each area, there are triangular shapes illustrating some form of signal or waveform. The first two areas are enveloped by a large curved line, pointing out a significant relationship or operation.
no subject
Date: 2023-10-23 12:05 am (UTC)
***
ME:
Awesome, thanks! It's great!
GPT-4:
You're welcome! I'm glad you found it satisfactory. If you have any more questions or requests in the future, don't hesitate to reach out. Have a wonderful day!
no subject
Date: 2023-10-23 12:11 am (UTC)
Paper: https://cdn.openai.com/papers/dall-e-3.pdf
Oct 19 blog post: https://openai.com/blog/dall-e-3-is-now-available-in-chatgpt-plus-and-enterprise
DALL-E 3 page: https://openai.com/dall-e-3
no subject
Date: 2023-10-23 06:17 am (UTC)
Some people on Facebook show examples of how they draw something and then DALL-E converts it into a pretty neat image.
no subject
Date: 2023-10-23 06:58 am (UTC)
Yes, I don't know about DALL-E 3, but people say that stand-alone DALL-E 2 can do that.
It's a separate account (or a separate service), although they do give some free usage allowance; the price is per use and seems to be quite low (around 2 cents per image).
no subject
Date: 2023-11-02 03:35 pm (UTC)
no subject
Date: 2023-11-02 03:36 pm (UTC)
"De-Diffusion Makes Text a Strong Cross-Modal Interface", https://arxiv.org/abs/2311.00618
"We demonstrate text as a strong cross-modal interface. Rather than relying on deep embeddings to connect image and language as the interface representation, our approach represents an image as text, from which we enjoy the interpretability and flexibility inherent to natural language [...]"