Neural nets: very small and very large
May. 29th, 2020 04:49 pm
arxiv.org/abs/2005.13092 "Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search" (Uber AI Labs)
"In the Synthetic Petri Dish, architectural motifs are instantiated in very small networks and evaluated using very few learned synthetic data samples (to effectively approximate performance in the full problem). The relative performance of motifs in the Synthetic Petri Dish can substitute for their ground-truth performance, thus accelerating the most expensive step of NAS (Neural Architecture Search)."
arxiv.org/abs/2005.14165 "Language Models are Few-Shot Learners" (OpenAI)
"Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic."
(The triumphant march of Transformer-architecture models continues: en.wikipedia.org/wiki/Transformer_(machine_learning_model) )
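A minimal illustration of what "tasks and few-shot demonstrations specified purely via text interaction" means for the 3-digit arithmetic case, sketched under the assumption that some completion endpoint serves the model (the `complete()` call below is a hypothetical placeholder, not a real API): the entire task specification is a prompt string containing a few worked examples, and no gradient updates happen.

```python
# Few-shot "training" via text only: the task is described to the model as a
# prompt with a few demonstrations, and the model is asked to continue the text.
# `complete()` is a hypothetical stand-in for whatever completion endpoint
# serves the model; no fine-tuning or gradient updates are involved.
def build_few_shot_prompt(demos, query):
    """Turn (input, output) demonstrations plus a new query into a single prompt."""
    lines = ["Add the two numbers."]
    for q, a in demos:
        lines.append(f"Q: {q}\nA: {a}")
    lines.append(f"Q: {query}\nA:")    # the model is expected to fill in the answer
    return "\n".join(lines)

demos = [("123 + 456", "579"), ("808 + 101", "909"), ("250 + 250", "500")]
prompt = build_few_shot_prompt(demos, "314 + 159")
print(prompt)
# The prompt string is all the "training" the model sees at inference time:
# answer = complete(model="gpt-3", prompt=prompt, max_tokens=5)   # hypothetical call
```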
"In the Synthetic Petri Dish, architectural motifs are instantiated in very small networks and evaluated using very few learned synthetic data samples (to effectively approximate performance in the full problem). The relative performance of motifs in the Synthetic Petri Dish can substitute for their ground-truth performance, thus accelerating the most expensive step of NAS (Neural Architecture Search)."
arxiv.org/abs/2005.14165 "Language Models are Few-Shot Learners" (OpenAI)
"Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic."
(Продолжается победное шествие моделей с архитектурой Transformer: en.wikipedia.org/wiki/Transformer_(machine_learning_model) )
Date: 2020-05-29 10:40 pm (UTC)
Via https://twitter.com/AdityaRawaI/status/1266023861398630401
Date: 2020-05-29 10:48 pm (UTC)
https://twitter.com/nottombrown/status/1266188687219384320
Date: 2020-06-02 05:32 am (UTC)
https://www.zdnet.com/google-amp/article/openais-gigantic-gpt-3-hints-at-the-limits-of-language-models-for-ai/
In particular, this deliberately adversarial test set remains difficult: https://arxiv.org/abs/1910.14599 "Adversarial NLI: A New Benchmark for Natural Language Understanding"
Date: 2020-05-30 02:45 pm (UTC)
https://dmm-dream-atom.livejournal.com/
and
https://dmm-dream-rss.livejournal.com/
So, it's not a one-time export; changes might be pushed forward, depending on the situation.