One thing I did learn: the quality boost on text-only tasks from using multi-modal models is very small.
(Of course, we don't know if that's always the case, and, in particular, whether it holds for GPT-4. That said, the very long evaluation paper "Sparks of Artificial General Intelligence: Early experiments with GPT-4", https://arxiv.org/abs/2303.12712, was done with a non-multi-modal version of GPT-4, so it might be the case for GPT-4 as well.)
https://phildeeplearning.github.io/
https://phildeeplearning.github.io/streaming