Main contributions
GPT-3 is an autoregressive language model with 175B parameters, 10x larger than any previous non-sparse language model. It was evaluated purely through in-context learning, with no fine-tuning, and showed competitive performance across many NLP benchmarks.

💡 Zero-shot learning: the model performs a task given only a natural-language instruction, with no demonstrations and no gradient updates.
Key concepts:
Example (zero-shot prompt):
Prompt: “Summarize the following article: [Insert article text here].”
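A minimal sketch of how such prompts are built, contrasting zero-shot with few-shot (in-context) prompting; the function names and prompt templates below are hypothetical illustrations, not from the paper:

```python
# Sketch: zero-shot vs. few-shot (in-context) prompt construction.
# Function names and prompt templates are hypothetical.

def zero_shot_prompt(article: str) -> str:
    # Zero-shot: a natural-language instruction only, no demonstrations.
    return f"Summarize the following article: {article}"

def few_shot_prompt(demos: list[tuple[str, str]], article: str) -> str:
    # Few-shot: (article, summary) demonstrations are placed in the
    # context window; the model infers the task with no gradient updates.
    shots = "\n\n".join(f"Article: {a}\nSummary: {s}" for a, s in demos)
    return f"{shots}\n\nArticle: {article}\nSummary:"

print(zero_shot_prompt("NASA launched a new space telescope today..."))
```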
Task-agnostic: the model's ability to perform a wide variety of tasks without being specifically trained or fine-tuned on those tasks.
Initially, word vectors (from word2vec, GloVe) were used to build single-layer representations, which were then fed to task-specific architectures.
Later, RNNs with multiple layers and contextual states enhanced these representations (though they were still fed to task-specific architectures).
More recently, pre-trained recurrent or transformer models, such as BERT, have been fine-tuned directly for downstream tasks, eliminating the need for task-specific architectures (see the sketch below).
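A minimal sketch of this fine-tuning paradigm, assuming the Hugging Face transformers and datasets libraries; the model (bert-base-uncased) and dataset (IMDB sentiment) are illustrative stand-ins, not choices from the text:

```python
# Sketch: the fine-tuning paradigm, i.e. one task-specific dataset and
# one training run per task. Model/dataset names are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # a task-specific labeled dataset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # pre-trained body + small new head

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

# Every new task repeats this step with its own dataset and training run.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=tokenized["train"],
)
trainer.train()
```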
This fine-tuning paradigm has led to remarkable progress in NLP (question answering, textual entailment, reading comprehension, etc.).
However, a major limitation of this approach is that while the architecture is task-agnostic, it still requires task-specific datasets and task-specific fine-tuning.
This process must be repeated for every new task.
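By contrast, with in-context learning a single frozen model covers new tasks through the prompt alone. A rough sketch, using GPT-2 from transformers as a publicly available stand-in for GPT-3 (the prompts are illustrative):

```python
# Sketch: one frozen model, many tasks, specified entirely in the prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = {
    "summarization": "Summarize: The market fell sharply today...\nSummary:",
    "translation": "English: cheese\nFrench:",
    "qa": "Q: What is the capital of France?\nA:",
}

# No per-task dataset and no per-task training run.
for task, prompt in prompts.items():
    out = generator(prompt, max_new_tokens=20)[0]["generated_text"]
    print(task, "->", out)
```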