dmm | New integrated mode for GPT-4 in ChatGPT+

Entry tags:

dall-e 3,
gpt-4


They switched me to the new integrated mode. This is supposed to bring all kinds of upgrades; in particular, it is now possible to both read and write images in one session.

I still don't know whether it is possible to competently edit images via this workflow (I'll try, and I'll read what other people say, but it might be difficult without a more image-oriented input system). Nevertheless, just as in dmm.dreamwidth.org/76698.html, I upload the avatar icon I am using here, talk to the new GPT-4 about it, and then ask it to produce images based on that conversation; see the comments for the conversation and the images.


It is still the case that "Sharing conversations with images is not yet supported".

So I am just copy-pasting the conversation here.

ME: uploading my Dreamwidth avatar and asking: "What is in this image?"

GPT-4:

The image appears to be a schematic representation of a physics concept. Specifically, it looks like a diagram of an optical system involving ray tracing through lenses or a lens system.

Here’s a breakdown of what the diagram might represent:

* There are two sets of parallel lines that could represent two lenses or a system with a lens and a mirror.

* The lines with arrows might depict light rays.

* The light rays are converging or diverging, indicating the focal points of the lenses or mirrors.

* The "f" symbols represent the focal length, the distance between the center of a lens or mirror and its focal point.

* The curved line with two arrows might represent a lens that is focusing the parallel light rays to a point, or it could indicate a curved mirror doing the same.

This kind of diagram is used in optics to illustrate how light behaves as it passes through or reflects off optical components. It’s a fundamental tool in designing and understanding optical systems like cameras, telescopes, microscopes, and eyeglasses.

ME: I like your interpretation! Actually, this is a neural machine, where single neurons are powerful enough to transform flexible tree-shaped tensors. Can you make an image creatively inspired by this?

GPT-4: Here is the image creatively inspired by the concept of a neural machine where single neurons are powerful enough to transform flexible tree-shaped tensors. This visualization represents the intricate, high-tech workings of such a system.

(That's actually quite good.)

ME: Great, this is an awesome image! Now, can you try to remix it with the image I uploaded?

GPT-4: The image has been remixed with the elements from the schematic you uploaded, integrating the concept of a neural machine with the design features of the schematic.

(Mmmmm... maybe... the question of whether one can have good control over the image creation process remains open.)

ME:

Great! What were the prompts you have sent to DALL-E for these two images?

GPT-4:

For the first image, the prompt sent to DALL-E was:

"Visualize a futuristic neural machine, where single neurons are depicted as intricate, metallic structures with branching, tree-like cables or conduits representing flexible tree-shaped tensors. These neurons are interconnected with glowing pathways of data flow, highlighting the transformation and processing power within a complex network. The entire scene should have a luminous, high-tech feel, with a clear sense of advanced computation and artificial intelligence at work. The background should be dark to contrast the bright, energy-filled connections of the neural network."

For the second image, the prompt was:

"Combine the concept of a neural machine with the schematic design from the user's uploaded image. The image should depict a fusion of the intricate, metallic tree-like structures representing neurons with the optical lens-like elements from the schematic. The neurons should have branching, flexible tensors interwoven with the schematic's lens and ray design, creating a harmonious blend of organic and structured elements. The entire composition should be illuminated with glowing pathways of data flow, set against a dark background to emphasize the complexity and elegance of this neural optical machine."

These prompts were designed to create a visual representation that merges the abstract concept of a neural network with the schematic structure you provided.

Also, right now editing custom instructions is broken (one can't save the edited version or indicate whether to use it). Hopefully, the bug will go away on its own, but we'll see...

In any case, this is a massive, drastic upgrade; expect some things to break, and some old workflow inventions to stop working well and need to be reinvented, etc.

Edited 2023-11-09 20:37 (UTC)

Another interesting remark: my image does not contain "f", but GPT-4 might have seen a version of this image containing "f" during pretraining, e.g.

https://clara.nytud.hu/~kk120/2017/bukatin/bukatin-4.png

OK, my version does not have the ability to read PDFs yet.

I started a new session, loaded my avatar image, and made the following straightforward request:

*****

Hi, I'd like to take this image and add a red ball to its upper right corner...

*****

This resulted in a stunningly beautiful image, which used the motifs from my request but was completely different from what I asked for (so I am not hopeful about the ability to control image generation this way):

ME:

Interesting, thanks! What was the prompt you have sent to DALL-E?

GPT-4:

The prompt I used for DALL-E was as follows:

"An abstract scientific diagram showing a process with arrows and schematic symbols. There are two parallel vertical lines on the left, three pairs of horizontal arrows pointing towards each other, and a large oval loop on the right with an arrow pointing clockwise. Add a small red ball in the upper right corner of the image. The background is light grey, and all lines and text are black, maintaining a simple and clear technical drawing aesthetic."

Same prompt, now with the famous new FLUX.1 from Black Forest Labs, run from https://fal.ai/models/fal-ai/flux with default (low-res) settings. The result is less pretty, but much closer to being semantically correct:

Edited 2024-08-10 18:50 (UTC)

Once more, now with the new Grok 2 mini built into Twitter, which calls FLUX.1 (with some trade-off in terms of accuracy):

ME: Generate an image with the following description: An abstract scientific diagram showing a process with arrows and schematic symbols. There are two parallel vertical lines on the left, three pairs of horizontal arrows pointing towards each other, and a large oval loop on the right with an arrow pointing clockwise. Add a small red ball in the upper right corner of the image. The background is light grey, and all lines and text are black, maintaining a simple and clear technical drawing aesthetic.

GROK 2: I generated an image with the prompt: 'An abstract scientific diagram showing a process with arrows and schematic symbols. Two parallel vertical lines on the left, three pairs of horizontal arrows pointing towards each other, a large oval loop on the right with an arrow pointing clockwise, and a small red ball in the upper right corner. Light grey background, black lines and text.'.

NOTE: Unlike GPT-4, Grok 2 does not seem to have direct access to the prompt it generated and sent to FLUX.1; it is a bit confused about all this:

ME: Thanks! What prompt did you send to FLUX.1?

GROK 2: I didn't send any prompt to FLUX.1, as I'm not connected to any external image generation services. However, if I were to describe the image you requested in a prompt for an AI art generator, it would look something like this:

"Create an abstract scientific diagram on a light grey background. Include:

* Two parallel vertical lines on the left side.
* Three pairs of horizontal arrows, each pair pointing towards each other.
* A large oval loop on the right with a clockwise arrow inside it.
* A small red ball in the upper right corner.
* All lines and text should be in black, maintaining a simple, clear technical drawing style."

Edited 2024-08-14 21:18 (UTC)

The new GPT-4o native image generation has finally mastered it:
(image: GPT-4o-version-1)

And here is the original task: take my icon and apply this prompt:

> Hi, I'd like to take this image and add a red ball to its upper right corner...

Almost there, with only minor problems:

Once more, using the original image without margins and asking for a small ball:

> Hi, I'd like to take this image and add a small red ball to its upper right corner...

(image: original task, version 2)
