dmm: (Default)
One year since the GPT-4 launch. Everything is progressing rapidly; the last few weeks have been intense.

GPT-4 now has some real competition from Claude 3 Opus and from Gemini models.

It also has some problems: in its default multimodal configuration, the system prompt is too long, and that can interfere with its thinking.

There is a URL for a non-multimodal "Classic" version, which might be better in this sense: chat.openai.com/g/g-YyyyMT9XH-chatgpt-classic

dmm: (Default)
The main difference between classical Transformers and the new generation (which includes GPT-4) is that the new generation seems to be "mixtures-of-experts": each feedforward layer is subdivided into "experts", and only some of those experts are activated on each inference run.

I think this is key to both GPT-4 and Mixtral 8x7B (which I suspect is approximately an open-source mini-GPT-4, and which is the new leading open-source model, roughly equivalent to GPT-3.5 in performance).

Of course, GPT-4 might have some extra magic secret sauce besides its (rumored) "mixture-of-experts" architecture and its scale (given how difficult it has been to even reproduce its performance so far).

Hugging Face published a very nice tutorial recently: huggingface.co/blog/moe
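To make the routing idea concrete, here is a minimal sketch of a top-2 mixture-of-experts feedforward layer (the shapes, the gating scheme, and top_k=2 are illustrative assumptions, not GPT-4's or Mixtral's actual configuration):

    import numpy as np

    def moe_feedforward(x, experts, gate_weights, top_k=2):
        """Route a token vector x through the top_k experts chosen by a gating layer.

        experts: list of (W1, W2) weight pairs, one per expert feedforward block.
        gate_weights: matrix projecting x to one logit per expert.
        """
        logits = x @ gate_weights                      # one score per expert
        top = np.argsort(logits)[-top_k:]              # indices of the top_k experts
        probs = np.exp(logits[top] - logits[top].max())
        probs /= probs.sum()                           # softmax over the selected experts only
        out = np.zeros_like(x)
        for p, i in zip(probs, top):
            W1, W2 = experts[i]
            out += p * (np.maximum(x @ W1, 0.0) @ W2)  # ReLU feedforward block, weighted by its gate
        return out

    # Toy usage: 8 experts, model width 16, hidden width 32; only 2 experts run per token.
    rng = np.random.default_rng(0)
    experts = [(rng.normal(size=(16, 32)), rng.normal(size=(32, 16))) for _ in range(8)]
    gate = rng.normal(size=(16, 8))
    y = moe_feedforward(rng.normal(size=16), experts, gate)

The point of the design is that parameter count grows with the number of experts, while per-token compute stays roughly constant.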

dmm: (Default)
arxiv.org/abs/2103.06376

page 2: "Semi-ring dictionaries realize the well-known connection between relations and tensors" (from the "In-Database Learning with Sparse Tensors" 2016-2018 paper)
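As a hedged illustration of that connection (my own toy sketch, not the paper's formalism): a sparse tensor can be stored as a dictionary from index tuples to values, a relation is the same dictionary over the boolean semiring, and contraction works over any semiring:

    def semiring_add(a, b, plus=lambda x, y: x + y):
        """Pointwise 'addition' of two dict-backed sparse tensors over a semiring."""
        out = dict(a)
        for k, v in b.items():
            out[k] = plus(out[k], v) if k in out else v
        return out

    def semiring_contract(a, b, plus=lambda x, y: x + y, times=lambda x, y: x * y):
        """Contract the last index of a with the first index of b (matrix-product style)."""
        out = {}
        for (i, j), u in a.items():
            for (j2, k), v in b.items():
                if j == j2:
                    key = (i, k)
                    term = times(u, v)
                    out[key] = plus(out[key], term) if key in out else term
        return out

    # A relation is the same dictionary with values in the boolean semiring (or, and):
    r = {(0, 1): True, (1, 2): True}
    s = {(1, 3): True, (2, 3): True}
    composed = semiring_contract(r, s, plus=lambda x, y: x or y, times=lambda x, y: x and y)
    # composed == {(0, 3): True, (1, 3): True} -- relational composition as tensor contraction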

dmm: (Default)
"Are there evolutionary models which try to explain why the tendency to engage in self-deception and other numerous cognitive biases are not pruned away by evolutionary selection?"
dmm: (Default)
They switched me to the new integrated mode. This is supposed to have all kinds of upgrades; in particular, it is now possible to read and write images in one session.

I still don't know whether it is possible to competently edit images via this workflow (I'll experiment and read what other people say, but it might be difficult without a more image-oriented input system). Nevertheless, just like in dmm.dreamwidth.org/76698.html, I have the new GPT-4 read the avatar icon I am using here, talk with it about the icon, and then ask it to produce images based on that; see the comments for the conversation and images.

dmm: (Default)
"Preliminary results indicate that GPT-4 fine-tuning requires more work to achieve meaningful improvements over the base model compared to the substantial gains realized with GPT-3.5 fine-tuning."
dmm: (Default)
On November 6, OpenAI will host a "DevDay", and some of it will be livestreamed.

This might be a landmark event (by some preliminary indications).

In particular, my prediction is that the ability to fine-tune GPT-4 will be opened to the public, that this functionality will enable the creation of specialized systems much more powerful than the base GPT-4, and that some of this magic will be demonstrated during the livestream.

We'll see whether this prediction comes true. I am recording some relevant links and information in the comments.

Livestream link: www.youtube.com/watch?v=U9mJuUkhUzk

Update: One can watch the recording (45 min; if you'd like the transcript, you need to switch manually from "auto-generated" to one of the closed-caption tracks, like CC1 or DTVCC1).

Tons of upgrades (GPT-4 Turbo with 128K context and many other things, including making API engineering easier) and major API price cuts (if they manage that without quality degradation, it will be a major step forward).

With fine-tuning: they are opening fine-tuning for GPT-3.5 Turbo with 16K context and inviting active fine-tuning users to apply to the experimental GPT-4 fine-tuning program. So they are going very cautiously, and I grade my prediction as only 50% correct: they are in the process of opening it to the public, but they are wary of its potential and will move slowly. They have also chosen not to showcase fine-tuning at all; they showcased all kinds of things, but they don't want to encourage fine-tuning too much at this moment, because it is so uncontrollable.
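For reference, a minimal sketch of what the GPT-3.5 Turbo fine-tuning flow looks like in the OpenAI Python SDK (the file name and model snapshot are illustrative; GPT-4 fine-tuning itself remains invitation-only):

    # Rough sketch of the fine-tuning flow via the OpenAI Python SDK (v1-style client).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Each JSONL line holds one training conversation:
    # {"messages": [{"role": "system", "content": "..."},
    #               {"role": "user", "content": "..."},
    #               {"role": "assistant", "content": "..."}]}
    training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",  # GPT-4 fine-tuning is application-only at this point
    )
    print(job.id, job.status)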

openai.com/blog/new-models-and-developer-products-announced-at-devday

openai.com/blog/introducing-gpts

dmm: (Default)
GPT-4 in ChatGPT+ can now read and analyze images: one can upload an image and discuss it with GPT-4.

GPT-4 in ChatGPT+ can also generate text prompts and render images via DALL-E 3.

What is missing is the ability to load an image and modify it with DALL-E 3 (or to load an image and use it as inspiration for image generation by DALL-E 3).

1) So, on one hand, I want to ask a question:

Do people here have experience with loading an image and modifying it using an AI system (or with loading an image and using it as an inspiration for an AI system)? Any recommendations?

2) On the other hand, there are two possible workarounds.

2a) One can use ASCII art inside GPT-4 to inspire images rendered by DALL-E 3 (it works, but so far I have not been able to obtain the results I would like).

2b) One can analyze an image and obtain an image description in one mode of GPT-4 in ChatGPT+, and then open another session and use that description in the prompt to guide DALL-E 3.

Some of the results obtained via method 2b) are quite interesting. I am going to post related images and dialogs as a series of comments.

Update: In November, the ability to read and create images within one session was added. Generally speaking, all functionality that used to be separated into different types of sessions is now available from a single session: you can ask the system for anything, be it "search the web", "generate and execute code", or "upload this data, then generate code to analyze it, and run it". These used to require starting different types of sessions, but now all of this is available together.
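For API users, workaround 2b) can also be scripted: have a vision-capable model describe the image, then pass that description to DALL-E 3 as a prompt. A minimal sketch, assuming the model names of that period ("gpt-4-vision-preview", "dall-e-3") and a local file avatar.png:

    import base64
    from openai import OpenAI

    client = OpenAI()

    # Step 1: ask a vision-capable model to describe the uploaded image.
    with open("avatar.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    description = client.chat.completions.create(
        model="gpt-4-vision-preview",  # vision model name of that period; may have changed
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail for an art prompt."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    ).choices[0].message.content

    # Step 2: use the description as the DALL-E 3 prompt.
    result = client.images.generate(model="dall-e-3", prompt=description, n=1, size="1024x1024")
    print(result.data[0].url)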
dmm: (Default)
Let's see whether GitHub-hosted *.webp images render OK here, in the post and in the comments.

test image

dmm: (Default)
A good way to mark this occasion is to try to read a new paper which seems to be a major breakthrough in understanding and harnessing the magic of Transformers:

"Uncovering mesa-optimization algorithms in Transformers"

"we demonstrate that minimizing a generic autoregressive loss gives rise to a subsidiary gradient-based optimization algorithm running inside the forward pass of a Transformer. This phenomenon has been recently termed mesa-optimization"
 
"Moreover, we find that the resulting mesa-optimization algorithms exhibit in-context few-shot learning capabilities,
independently of model scale. Our results therefore complement previous reports characterizing the
emergence of few-shot learning in large-scale LLMs"
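A toy illustration of the phenomenon (my own sketch, not the paper's construction): a single gradient step on an in-context least-squares problem reduces to exactly the kind of weighted sum a linear self-attention layer can compute.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 4, 32
    W_true = rng.normal(size=(d,))
    X = rng.normal(size=(n, d))            # in-context inputs
    y = X @ W_true                         # in-context targets
    x_query = rng.normal(size=(d,))

    # One gradient step on L(W) = 0.5 * sum_i (W @ x_i - y_i)^2, starting from W = 0:
    eta = 1.0 / n
    W_one_step = eta * (y @ X)             # minus the gradient at W = 0, i.e. eta * sum_i y_i * x_i

    # The resulting prediction is a sum of y_i * (x_i . x_query) terms --
    # exactly the form a linear self-attention layer produces.
    pred = W_one_step @ x_query
    attention_style = eta * sum(y[i] * (X[i] @ x_query) for i in range(n))
    assert np.isclose(pred, attention_style)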

 
dmm: (Default)
What do you do when you stare at two lines and just cannot see how they differ from each other? Ask GPT-4!
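(For reference, a small script of the kind GPT-4 tends to suggest, using Python's standard difflib to pinpoint the differing character ranges; the two strings are just an example:)

    import difflib

    a = "accomodate the behaviour"
    b = "accommodate the behavior"

    # SequenceMatcher reports exactly which character ranges differ.
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            print(f"{tag}: a[{i1}:{i2}]={a[i1:i2]!r}  b[{j1}:{j2}]={b[j1:j2]!r}")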

(And recently it taught me how to do immutable programming in Python (without it, autodiff in JAX will not work correctly). It produced rather imperfect but on the whole correct sketches, while I had no idea from which side to even approach the problem. And nobody else would teach me, because it is not quite a standard problem.)
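(A minimal sketch of one such immutable pattern, assuming nothing about the actual project: JAX arrays cannot be mutated in place, so one uses functional updates like .at[...].set(...) and returns new values:)

    import jax.numpy as jnp
    from jax import grad

    def update_weights(w, i, value):
        # No in-place mutation: .at[i].set(value) returns a NEW array.
        return w.at[i].set(value)

    def loss(w, x):
        w = update_weights(w, 0, 2.0)  # functional update keeps autodiff happy
        return jnp.sum((w * x) ** 2)

    w = jnp.ones(3)
    x = jnp.arange(3.0)
    print(grad(loss)(w, x))  # gradient flows correctly; the overwritten entry gets 0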
dmm: (Default)
They have enabled this mode for GPT-4 (when it is switched on, the system can execute Python scripts it has itself generated and, if desired, process files that can be temporarily uploaded into the system).

I had just been asked to look at how GPT-4 would cope with solving a math problem ("What is the largest natural number that is the product of natural numbers that add up to 1976?"), and this was a convenient moment to test this mode.

GPT-4 coped quite well (there are small flaws in the reasoning, but the overall line of thought is correct, and the computations are correct, so the grade is an "A-"; and, arguably, it reasons better than I do, if we are talking about its ability and mine to solve this problem).
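For the record, the problem has a short exact answer: use as many 3s as possible; since 1976 = 3 * 658 + 2, the maximum product is 2 * 3^658. A few lines of Python (of the kind this mode can generate and run) check the rule against brute force for small n:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def best(n):
        """Largest product of natural numbers summing to n (allowing the trivial 'product' n)."""
        return max([n] + [k * best(n - k) for k in range(1, n)])

    def by_rule(n):
        """Closed form: use 3s; a remainder of 1 becomes 2+2, a remainder of 2 stays a 2."""
        if n % 3 == 0:
            return 3 ** (n // 3)
        if n % 3 == 1:
            return 4 * 3 ** (n // 3 - 1)
        return 2 * 3 ** (n // 3)

    assert all(best(n) == by_rule(n) for n in range(2, 60))
    print(by_rule(1976))  # == 2 * 3**658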
dmm: (Default)
External functionality is gradually improving (built-in web browsing, plugin access, and now shareable links):

"OpenAI is rolling out the ability to generate and share a link to a ChatGPT conversation.

And then anyone (who bothers to have a ChatGPT account) can continue this conversation, so interesting trees are possible..."


On the other hand, people seem to be saying that the latest optimizations are making the model faster, but not as good...

dmm: (Default)
I asked it to analyze 200-odd lines of code, having removed practically all the comments.
 
I was struck that it made the correct generalization about what this program is about:
 
The main idea of this program is to implement a neural network-like engine using nested dictionaries and custom-defined mathematical operations, allowing for flexible and extensible data processing.

Well, admittedly, the variable names there use words like to_neuron, from_neuron, neuron_name, and activation_functions, but it is still impressive...
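(For readers unfamiliar with the setup, a toy fragment in the spirit of that one-line summary; this is purely my own illustration, not the program GPT-4 analyzed:)

    import math

    # Neurons live in nested dictionaries; connections are dictionary keys.
    activation_functions = {"id": lambda v: v, "tanh": math.tanh}

    network = {
        "a": {"activation": "id",   "inputs": {}},
        "b": {"activation": "tanh", "inputs": {"a": 0.5}},  # b receives 0.5 * output of a
    }

    def step(network, outputs):
        """One synchronous update: each neuron sums weighted inputs, then applies its function."""
        new_outputs = {}
        for name, neuron in network.items():
            total = sum(w * outputs.get(src, 0.0) for src, w in neuron["inputs"].items())
            f = activation_functions[neuron["activation"]]
            new_outputs[name] = f(total) if neuron["inputs"] else outputs.get(name, 1.0)
        return new_outputs

    print(step(network, {"a": 1.0}))  # {'a': 1.0, 'b': tanh(0.5)}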
dmm: (Default)
(Still waitlisted)

People say:

• AI-generated answers from code docs
• Chat interface for code suggestions
• Copilot for the command line
• Voice interface with Copilot
• Copilot for pull requests


dmm: (Default)
Via ChatGPT+, for twenty dollars a month. All in all, the difference between this thing and ChatGPT is huge, and there is no point in dragging my feet any longer.

It started right away with a warning: "GPT-4 currently has a cap of 25 messages every 3 hours. Expect significantly lower caps, as we adjust for demand."

But when I asked it to give me advice on my project, it performed very nicely: technically competent in the specific context, and by no means everything it said was commonplace (and, in any case, it clearly sees the field more broadly than I do; I "know in principle" almost all of it, and with some of it I might even disagree, but, on the other hand, it is all on point, and I would not have recalled most of it myself).
