Neural nets: very small and very large
May. 29th, 2020 04:49 pm
arxiv.org/abs/2005.13092 "Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search" (Uber AI Labs)
"In the Synthetic Petri Dish, architectural motifs are instantiated in very small networks and evaluated using very few learned synthetic data samples (to effectively approximate performance in the full problem). The relative performance of motifs in the Synthetic Petri Dish can substitute for their ground-truth performance, thus accelerating the most expensive step of NAS (Neural Architecture Search)."
arxiv.org/abs/2005.14165 "Language Models are Few-Shot Learners" (OpenAI)
"Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic."
(The triumphant march of Transformer-architecture models continues: en.wikipedia.org/wiki/Transformer_(machine_learning_model) )
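A minimal illustration of what "tasks and few-shot demonstrations specified purely via text interaction" means for the 3-digit arithmetic case, sketched under the assumption that some completion endpoint serves the model (the `complete()` call below is a hypothetical placeholder, not a real API): the entire task specification is a prompt string containing a few worked examples, and no gradient updates happen.

```python
# Few-shot "training" via text only: the task is described to the model as a
# prompt with a few demonstrations, and the model is asked to continue the text.
# `complete()` is a hypothetical stand-in for whatever completion endpoint
# serves the model; no fine-tuning or gradient updates are involved.
def build_few_shot_prompt(demos, query):
    """Turn (input, output) demonstrations plus a new query into a single prompt."""
    lines = ["Add the two numbers."]
    for q, a in demos:
        lines.append(f"Q: {q}\nA: {a}")
    lines.append(f"Q: {query}\nA:")    # the model is expected to fill in the answer
    return "\n".join(lines)

demos = [("123 + 456", "579"), ("808 + 101", "909"), ("250 + 250", "500")]
prompt = build_few_shot_prompt(demos, "314 + 159")
print(prompt)
# The prompt string is all the "training" the model sees at inference time:
# answer = complete(model="gpt-3", prompt=prompt, max_tokens=5)   # hypothetical call
```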
"In the Synthetic Petri Dish, architectural motifs are instantiated in very small networks and evaluated using very few learned synthetic data samples (to effectively approximate performance in the full problem). The relative performance of motifs in the Synthetic Petri Dish can substitute for their ground-truth performance, thus accelerating the most expensive step of NAS (Neural Architecture Search)."
arxiv.org/abs/2005.14165 "Language Models are Few-Shot Learners" (OpenAI)
"Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic."
(Продолжается победное шествие моделей с архитектурой Transformer: en.wikipedia.org/wiki/Transformer_(machine_learning_model) )
Date: 2020-05-29 10:40 pm (UTC)
Via https://twitter.com/AdityaRawaI/status/1266023861398630401
Date: 2020-05-29 10:48 pm (UTC)
https://twitter.com/nottombrown/status/1266188687219384320
Date: 2020-06-02 05:32 am (UTC)
https://www.zdnet.com/google-amp/article/openais-gigantic-gpt-3-hints-at-the-limits-of-language-models-for-ai/
In particular, this deliberately adversarial test set remains difficult: https://arxiv.org/abs/1910.14599 "Adversarial NLI: A New Benchmark for Natural Language Understanding"
Date: 2020-05-30 02:45 pm (UTC)
https://dmm-dream-atom.livejournal.com/
and
https://dmm-dream-rss.livejournal.com/
So, it's not a one-time export; changes might be pushed forward, depending on the situation.