- A Large Language Model (LLM) is a machine learning model for language embedding and text generation, characterized by a large number of parameters (on the order of hundreds of billions) and by being trained on very large datasets (trillions of tokens).
- It is debatable whether LLMs actually learn to reason or whether they simply perform sophisticated pattern recognition.
- When pre-training an LLM, we typically fill the full context window with text. To separate different text snippets (i.e., separate documents), we insert EOS tokens between them.
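This packing scheme can be sketched as follows; the token IDs and the value of the EOS token are hypothetical, and a real pipeline would operate on tokenizer output rather than hand-written lists:

```python
# Sketch: packing pre-training documents into fixed-size context windows,
# separated by an EOS token. EOS_ID and the token IDs are hypothetical.
EOS_ID = 0
CONTEXT_LEN = 8  # tiny window for illustration

def pack_documents(docs, context_len=CONTEXT_LEN, eos_id=EOS_ID):
    """Concatenate tokenized docs with EOS separators, then chunk."""
    stream = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)  # mark the document boundary
    # Split the flat token stream into full context windows.
    return [stream[i:i + context_len]
            for i in range(0, len(stream) - context_len + 1, context_len)]

docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
print(pack_documents(docs))  # [[5, 6, 7, 0, 8, 9, 0, 10]]
```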
- For the training corpus, it is desirable to filter out:
- Sources with personally identifiable information
- Boilerplate text
- Duplicate text or documents.
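A minimal sketch of such a filter, using exact deduplication via hashing and a crude regex-based PII check; the regexes here are illustrative only, and production pipelines use far more sophisticated detectors and near-duplicate matching:

```python
import hashlib
import re

# Illustrative, not production-grade, PII patterns.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def filter_corpus(docs):
    seen = set()
    kept = []
    for doc in docs:
        if EMAIL_RE.search(doc) or PHONE_RE.search(doc):
            continue  # drop documents containing PII
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest in seen:
            continue  # drop exact duplicates
        seen.add(digest)
        kept.append(doc)
    return kept

corpus = ["hello world", "hello world", "contact me at a@b.com"]
print(filter_corpus(corpus))  # ['hello world']
```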
Key Facts
- LLMs can be fine-tuned using only a small amount of data. This makes them more economical to use (ignoring hardware costs).
- LLMs can be preconditioned to take on or speak in certain roles, or tuned for specific tasks.
- LLMs scale not only with the number of parameters but also with the amount of training data used.
- At scale, LLMs exhibit useful emergent behaviors such as:
- In-context learning (Language Models are Few-Shot Learners by Brown et al. (Jul. 22, 2020)|Brown et al. (2020)) and zero-shot generalization to unseen tasks.
- Amenability to instruction tuning (Finetuned Language Models are Zero-Shot Learners by Wei et al. (Feb 8, 2022)|Wei et al. (2022))
- Chinchilla scaling (i.e., increased performance with more training tokens) (Training Compute-Optimal Large Language Models by Hoffmann et al. (Mar 29, 2022)|Hoffmann et al. (2022))
- Chain-of-thought prompting (Chain-of-Thought Prompting Elicits Reasoning in Large Language Models by Wei et al. (Jan 10, 2023)|Wei et al. (2023))
- LLMs acquire the biases present in their training data.
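The in-context learning behavior above can be sketched as simple prompt assembly: a few input/output exemplars are prepended to the query, and the model is expected to continue the pattern. The template and exemplars here are hypothetical, and no model call is made:

```python
# Sketch: building a few-shot (in-context learning) prompt.
def few_shot_prompt(examples, query):
    """Format (input, output) exemplars followed by the new query."""
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\nInput: {query}\nOutput:"

examples = [("2 + 2", "4"), ("3 + 5", "8")]
print(few_shot_prompt(examples, "7 + 1"))
```

With zero exemplars this degenerates into a zero-shot prompt, which is why the two behaviors are often discussed together.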
Workflows
- 1 suggests that when building complex workflows we can get good results by exploiting the increasing context length of models (i.e., using many-shot learning or carefully tuned prompts).
This leads to a pipeline:
- Start with quick and simple prompts.
- Iteratively flesh out the prompt based on where the output falls short. This may lead to mega-prompts.
- Consider few-shot or many-shot learning, or fine-tuning.
- Break down the task into subtasks and use an agentic workflow.
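The final, agentic step of the pipeline can be sketched as chained model calls, one per subtask. The `llm` function here is a hypothetical placeholder for a real model call, and the summarization task and prompts are illustrative:

```python
# Sketch: an agentic workflow that breaks a task into subtasks,
# each handled by its own focused prompt.
def llm(prompt):
    """Hypothetical placeholder for a real LLM API call."""
    return f"<model answer to: {prompt!r}>"

def agentic_summarize(document):
    # Subtask 1: decompose the input.
    outline = llm(f"List the key sections of this text:\n{document}")
    # Subtask 2: handle each piece with a focused prompt.
    draft = llm(f"Summarize each section:\n{outline}")
    # Subtask 3: combine and polish.
    return llm(f"Merge into one concise summary:\n{draft}")

result = agentic_summarize("some long document")
```

The point of the decomposition is that each prompt stays small and checkable, rather than one mega-prompt doing everything at once.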
Variants
- According to 2
- Left-to-right LMs are the most commonly used. They scan text autoregressively, in the manner of a decoder.
- Masked LMs model bidirectional contexts, in the manner of an encoder.
- Prefix Language Models are left-to-right LMs that decode an output conditioned on an input, which is encoded by the same model parameters but with a fully visible attention mask and possibly some corruption of the input.
- Encoder-Decoder architectures mimic the full transformer.
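The attention patterns that distinguish these variants can be sketched with toy masks, where entry `[i][j] = 1` means position `i` may attend to position `j`. The sequence length and prefix size are arbitrary choices for illustration:

```python
# Toy attention masks for the LM variants above.
n, prefix = 4, 2  # sequence length 4, with a 2-token prefix/input

# Left-to-right (decoder-style): each position sees only itself and the past.
causal = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

# Masked LM (encoder-style): every position sees the whole sequence.
bidirectional = [[1] * n for _ in range(n)]

# Prefix LM: full attention within the prefix, causal over the rest.
prefix_lm = [[1 if j < prefix or j <= i else 0 for j in range(n)]
             for i in range(n)]

for row in prefix_lm:
    print(row)
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

An encoder-decoder model combines two of these: a bidirectional mask over the input and a causal mask (plus cross-attention) over the output.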
Topics
- Language Model Sampling
- LLM Fine Tuning
- Prompt Engineering - an increasingly important technique for using LLMs that involves tuning the input prompts.
- Instruction Tuning - a technique for getting an NLP model to understand and follow instructions.
Foundational Models
Links
- Transformer Model - discusses one of the most common mechanisms used in LLMs.
- Language Model - a discussion of some of the earlier and smaller language models.
- How To Generate - Hugging Face article on decoding strategies.
- Flowise - a tool for working with LLMs.