dmm | Entries tagged with large language models

I think humans tend to have these idée fixes:

www.anthropic.com/news/golden-gate-claude

A pretty interesting example of conversation by a very experienced interlocutor:

www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform#hyXAiafwbwfEPiX5Q

The main difference between classical Transformers and the new generation (which includes GPT-4) is that the new generation seems to be "mixtures-of-experts", with each of their feedforward layers subdivided into "experts" and only some of its "experts" activated on each inference run.

I think this is key to both GPT-4 and Mixtral 8x7B (which I suspect is approximately an open-source mini-GPT-4 and which is the new leading open-source model roughly equivalent to GPT-3.5 in performance).

Of course, GPT-4 might have some extra magic secret sauce besides being (according to rumors) a "mixture-of-experts" and scale (given how difficult it has been to even reproduce its performance so far).

Hugging Face published a very nice tutorial recently: huggingface.co/blog/moe

For the 4K context version. OpenAI says that it is often better than GPT-4 for a specific application.

Price to fine-tune a model is very reasonable (but depends on the size of your training set).

The cost to use the resulting model is much higher than using the un-finetuned GPT-3.5, though it's still cheaper than using GPT-4.

Obviously, you only can use a fine-tuned model via API, not via standard Web interface.

OpenAI says the ability to fine-tune 16K context and the ability to fine-tune GPT-4 are coming.

This is a good starting point:

"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html

simons.berkeley.edu/workshops/large-language-models-transformers

Youtube livestream (and, presumably, post-conference youtube recording) are available.

Some of the talks look really interesting.

"This is a one-file Rust implementation of Llama2 that works pretty well. It's Rust port of Karpathy's llama2.c"

"I don't actually know Rust, but it seems good. I had gpt translate the code, and then I fixed the compiler errors until it ran fast. Would love a code-review if anyone has time."

"Generally you get a lot of memory safety things for free in the conversion. One unsafe part is memory mapping to load in the model, my code is a bit sloppy (but llama2.c is way grosser)."

"One part that is cool is that Rust data parallel (rayon) forces you to protect against double writes with the borrow checker. This fixed one of my bugs in parallel multiheaded attention. (We're not all karpathy-level)"

Через ChatGPT+, за двадцатку в месяц. В общем, разница между этой штукой и ChatGPT огромная, и нет смысла ещё тянуть время.

Сразу она начала с предупреждения: "GPT-4 currently has a cap of 25 messages every 3 hours. Expect significantly lower caps, as we adjust for demand."

Но когда я попросил её дать мне советы по моему проекту, она очень мило выступила, технически грамотно в конкретном контексте, и совсем не всё, что она сказала, было общим местом (и, в любом случае, она явно шире видит, чем я; я даже почти всё это "в принципе знаю", и с чем-то могу и не согласиться, но, с другой стороны, оно всё по делу, большую часть этого дела я бы и не вспомнил).

I am not sure what (I missed the latest part of the story). But here is a beautiful petition on change.org which says this:

*****
Waluigi has been scorned by Nintendo yet again, being left out of the roster of Super Smash Bros Ultimate. However, there is still a chance for Waluigi to get his rightly deserved place in the spotlight. Waluigi should appear in the next edition of Higher Algebra.

Indeed, Waluigi fits naturally into the framework of stable ∞-categories, and would probably have been incorporated long ago were Nintendo not so notoriously protective of their copyright. For example, the discussion of the Waldhausen construction in §1.2.2 generalizes without much additional effort to the WAHldhausen construction. It is also worth noting that a careful treatment of the WAHll finiteness obstruction from the ∞-categorical perspective is sorely lacking from the literature.
*****

(I've read the original Waluigi effect paper. I am going to write more about all this in the comments.)

Эти системы - это не то, что традиционно было принято понимать под AI.

Это системы нового типа - симуляторы. Система "создаёт виртуальную реальность с персонажами", и мы общаемся не с самой системой, а с этими персонажами.

Эти персонажи могут иметь очень разный стиль и очень разные способности: они могут быть умными, глупыми, правдивыми, лживыми, быть похожими на хороших или плохих профессионалов той или иной профессии, и так далее.

Искусство "prompt engineering" как раз и состоит в том, чтобы создать интересных персонажей-собеседников, с теми качествами, которые нам бы хотелось, чтобы они имели. "Prompt engineering" - это очень нетривиальная быстро развиваюшаяся область. Со временем, будут разные завертки, где будет "невидимая пользователю part of the promt", настраивающая систему тем, или иным образом. Но сейчас надо творчески экспериментировать, чтобы получалось по-настоящему интересно.

То, что создано, это не AI в традиционном смысле, а скорее очень хорошо обученная "ткань искусственного мозга, не заполненная изначально никакими личностями и никакими фиксированными свойствами характера"; и искусство состоит в том, чтобы в ней возникали интересные/желанные нам персонажи и динамики.

~~P.S. I am having a local ongoing intermittent internet outage which sucks (replies might be slow).~~

A huge field already: github.com/dair-ai/Prompt-Engineering-Guide

S	M	T	W	T	F	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Dataflow matrix machines (by Anhinga anhinga)

Entries tagged with large language models

Golden Gate Claude

"Mixture-of-Experts" and Transformers

Fine-tuning for GPT-3.5 Turbo is now available

Let's understand Large Language Models better

Large Language Models and Transformers (a workshop this week)

Sasha Rush invites Rust specialists to code-review Llama2.rs

Начал экспериментировать с GPT-4

Something is happening around Waluigi effect

Напоминание (если вы экспериментируете с ChatGPT или похожими моделями)

Prompt engineering guide

Profile

May 2025

Syndicate

Most Popular Tags

Page Summary

Active Entries

Style Credit

Expand Cut Tags