May. 29th, 2020

arxiv.org/abs/2005.13092 "Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search" (Uber AI Labs)

"In the Synthetic Petri Dish, architectural motifs are instantiated in very small networks and evaluated using very few learned synthetic data samples (to effectively approximate performance in the full problem). The relative performance of motifs in the Synthetic Petri Dish can substitute for their ground-truth performance, thus accelerating the most expensive step of NAS (Neural Architecture Search)."

arxiv.org/abs/2005.14165 "Language Models are Few-Shot Learners" (OpenAI)

"Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic."

(The triumphant march of models with the Transformer architecture continues: en.wikipedia.org/wiki/Transformer_(machine_learning_model) )
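
(The "tasks specified purely via text interaction" setting is easy to sketch. A minimal illustration: `generate` is a hypothetical stand-in for any autoregressive LM's text-completion call, not the actual GPT-3 interface; word unscrambling is one of the tasks mentioned in the abstract.)

def generate(prompt: str) -> str:
    # Hypothetical stand-in for a language model's completion call.
    raise NotImplementedError("plug in an autoregressive LM here")

# Few-shot demonstrations given purely as text, with no gradient updates.
demonstrations = [
    ("olleh", "hello"),
    ("dlrow", "world"),
]
query = "nohtyp"

prompt = "Unscramble the word.\n"
for scrambled, answer in demonstrations:
    prompt += f"Input: {scrambled}\nOutput: {answer}\n"
prompt += f"Input: {query}\nOutput:"

# The model is expected to continue the pattern, e.g. " python".
# print(generate(prompt))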
