- https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/
- NetEvolve [1] shows the use of MARL for analyzing and predicting the evolution of social networks. Each node represents a rational agent in an RL setting.
- The goal is to design explainable reward and policy functions. Each agent's policy is to add or remove edges or to change its node's attributes.
- The NetEvolve system consists of three phases (a toy sketch follows this list):
  - Learn the reward function for each node.
    - The reward function is a linear combination of interpretable features and represents how desirable the current network is to that node.
    - The feature weights are learned by assuming that the observed time-series evolution of the network is already optimal behavior for each agent.
  - Learn the policy for each node. The policy expresses the node's tendency to change its attributes and edges.
  - Predict future networks by running the multi-agent simulation with the learned policies.
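To make the three phases concrete, here is a toy Python sketch. All names are illustrative, and the specific choices (the degree/clustering/homophily feature set, the perceptron-style weight fit, and the softmax edit policy) are assumptions made for this sketch, not NetEvolve's actual formulation.

```python
# Illustrative sketch of the three NetEvolve-style phases; all names and
# modeling choices here are assumptions, not the paper's formulation.
import random

import networkx as nx
import numpy as np


def features(g: nx.Graph, node) -> np.ndarray:
    """Interpretable features describing how desirable the network is to `node`."""
    nbrs = list(g.neighbors(node))
    same = sum(g.nodes[n]["attr"] == g.nodes[node]["attr"] for n in nbrs)
    homophily = same / len(nbrs) if nbrs else 0.0
    return np.array([g.degree(node), nx.clustering(g, node), homophily])


def reward(g: nx.Graph, node, w: np.ndarray) -> float:
    """Reward is a linear combination of the interpretable features."""
    return float(w @ features(g, node))


def candidate_actions(g: nx.Graph, node):
    """An agent may toggle an incident edge or flip its own attribute."""
    acts = [("noop", None), ("flip_attr", None)]
    acts += [("toggle_edge", other) for other in g.nodes if other != node]
    return acts


def apply_action(g: nx.Graph, node, action) -> nx.Graph:
    g = g.copy()
    kind, other = action
    if kind == "toggle_edge":
        if g.has_edge(node, other):
            g.remove_edge(node, other)
        else:
            g.add_edge(node, other)
    elif kind == "flip_attr":
        g.nodes[node]["attr"] = 1 - g.nodes[node]["attr"]
    return g


def fit_weights(history, node, lr=0.1, epochs=20) -> np.ndarray:
    """Phase 1: learn feature weights, assuming the observed evolution is
    (near-)optimal -- observed next states should outscore random alternatives."""
    w = np.zeros(3)
    for _ in range(epochs):
        for g_t, g_next in zip(history, history[1:]):
            observed = features(g_next, node)
            alt = random.choice(candidate_actions(g_t, node))
            alternative = features(apply_action(g_t, node, alt), node)
            if w @ observed <= w @ alternative:  # perceptron-style update
                w += lr * (observed - alternative)
    return w


def policy(g: nx.Graph, node, w, temp=1.0):
    """Phase 2: softmax policy over edits, scored by next-state reward."""
    acts = candidate_actions(g, node)
    scores = np.array([reward(apply_action(g, node, a), node, w) for a in acts])
    probs = np.exp((scores - scores.max()) / temp)
    probs /= probs.sum()
    return acts[np.random.choice(len(acts), p=probs)]


def simulate(g: nx.Graph, weights, steps=3) -> nx.Graph:
    """Phase 3: predict future networks by rolling the multi-agent simulation."""
    for _ in range(steps):
        for node in list(g.nodes):
            g = apply_action(g, node, policy(g, node, weights[node]))
    return g


# Toy usage: random attributed graph, learn weights per node, predict forward.
g0 = nx.erdos_renyi_graph(10, 0.3, seed=0)
nx.set_node_attributes(g0, {n: n % 2 for n in g0.nodes}, "attr")
history = [g0, apply_action(g0, 0, ("toggle_edge", 1))]  # stand-in observed series
weights = {n: fit_weights(history, n) for n in g0.nodes}
future = simulate(g0, weights)
```

In a real setting the "observed series" would be the input time series of network snapshots, and the alternatives used during weight fitting would enumerate each agent's feasible edits rather than sampling one at random.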
- [^Weil_2024] introduces a decentralized approach to MARL in which agents pass their states to their neighbors via graph-based message passing.
- Agents form a communication network. Each agent sends its local state to its neighbors and aggregates the incoming messages into a local observation of the entire network.
- This approach can be combined with any RL training algorithm: the message-passing step is performed in each episode, and each agent's observation is augmented with its local graph observation (see the sketch below).
[^Weil_2024]: Weil et al. (2024). Towards Generalizability of Multi-Agent Reinforcement Learning in Graphs with Recurrent Message Passing.
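A minimal sketch of the message-passing observation step, under stated assumptions: the function names are hypothetical, and the synchronous mean aggregation with fixed 0.5/0.5 mixing below is only a placeholder for the recurrent message-passing scheme the paper actually uses.

```python
# Illustrative sketch of graph-based message passing for agent observations;
# all names are hypothetical, and mean aggregation stands in for the
# paper's recurrent update.
import numpy as np


def message_passing_observations(adjacency, local_states, rounds=3):
    """Each agent sends its state to its neighbors and aggregates incoming
    messages; after a few rounds every agent holds a local view of the
    whole communication network.

    adjacency   : agent id -> list of neighbor agent ids
    local_states: agent id -> that agent's own state vector
    """
    h = {a: s.astype(float) for a, s in local_states.items()}
    for _ in range(rounds):
        # Synchronous round: every agent mixes its own embedding with the
        # mean of its neighbors' embeddings (a recurrent cell would go here).
        h = {
            a: 0.5 * h[a] + 0.5 * (np.mean([h[n] for n in nbrs], axis=0)
                                   if nbrs else np.zeros_like(h[a]))
            for a, nbrs in adjacency.items()
        }
    return h


def augment_observation(own_obs, graph_obs):
    """Concatenate an agent's raw observation with its aggregated graph view
    before handing it to the underlying RL algorithm."""
    return np.concatenate([own_obs, graph_obs])


# Usage: three agents on a line topology, 4-dimensional local states.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
states = {a: np.random.randn(4) for a in adjacency}
graph_obs = message_passing_observations(adjacency, states)
obs_0 = augment_observation(states[0], graph_obs[0])
```

Because the aggregation only touches observations, the same step can be bolted onto any MARL training loop without changing the learning algorithm itself.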
- ODEs
- PDEs