- [^qu_2019] presents BlockBERT, an efficient BERT model for modeling long-distance dependencies using sparse block structures.
- Rationale: Memory is a bottleneck in training BERT. In particular, it grows quadratically with respect to sequence length.
- It promises to be simpler than the approach by [^child_2019].
- We design the masking matrix used in the formulation of (masked) attention to be a sparse block matrix.
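The block-sparse masking idea can be sketched in NumPy. This is a minimal illustration, assuming the mask is built from a permutation over block indices so that each block of queries attends to exactly one block of keys; `block_sparse_mask` and its parameters are illustrative names, not the paper's code.

```python
import numpy as np

def block_sparse_mask(seq_len, block_size, perm):
    """Sparse block masking matrix: token i may attend to token j only if
    j's block equals perm[i's block]. `perm` permutes block indices."""
    n_blocks = seq_len // block_size
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for bi in range(n_blocks):
        bj = perm[bi]
        mask[bi * block_size:(bi + 1) * block_size,
             bj * block_size:(bj + 1) * block_size] = True
    return mask

# 6 tokens, 3 blocks; the identity permutation gives block-diagonal attention
m = block_sparse_mask(6, 2, perm=[0, 1, 2])
print(m.sum())  # 12 of 36 entries allowed
```

Only `n_blocks * block_size**2` entries are nonzero, which is where the memory saving over a dense quadratic mask comes from.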
LLMs
- Language Models are Few-Shot Learners by Brown et al. (Jul 22, 2020)
- LLaMA: Open and Efficient Foundation Language Models by Touvron et al. (Feb 27, 2023)
- OpenAGI: When LLM Meets Domain Experts by Ge et al. (Apr 12, 2023)
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing by Liu et al. (Jul 28, 2021) - A survey of different prompting techniques.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models by Wei et al. (Jan 10, 2023)
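As a quick illustration of the chain-of-thought technique from Wei et al.: the few-shot exemplar includes intermediate reasoning steps before the final answer. The exemplar below is the tennis-ball example from the paper; the second question is made up for illustration.

```python
# Chain-of-thought prompt: exemplar shows reasoning steps, not just the answer.
exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 "
    "tennis balls. 5 + 6 = 11. The answer is 11.\n"
)
# Hypothetical new question appended after the exemplar:
question = "Q: A baker has 3 trays of 12 rolls. How many rolls in total?\nA:"
prompt = exemplar + "\n" + question
print(prompt)
```

The model is then expected to imitate the exemplar's step-by-step style when completing the final answer.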
Knowledge
- A Moore graph is a graph with diameter k and girth 2k + 1.
- A generalized polygon is a bipartite graph with diameter n and girth 2n.
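These definitions can be checked on the Petersen graph, the classic Moore graph with diameter 2 and girth 5 (so k = 2 gives girth 2k + 1 = 5). A pure-Python sketch; `bfs_dist`, `diameter`, and `girth` are helper names introduced here:

```python
from collections import deque

def petersen():
    """Adjacency sets of the Petersen graph: outer 5-cycle, inner pentagram, spokes."""
    adj = {v: set() for v in range(10)}
    def add(u, v): adj[u].add(v); adj[v].add(u)
    for i in range(5):
        add(i, (i + 1) % 5)              # outer 5-cycle
        add(5 + i, 5 + (i + 2) % 5)      # inner pentagram
        add(i, 5 + i)                    # spokes
    return adj

def bfs_dist(adj, s, skip_edge=None):
    """Shortest-path distances from s, optionally ignoring one edge."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if skip_edge and {u, v} == skip_edge:
                continue
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def diameter(adj):
    return max(max(bfs_dist(adj, s).values()) for s in adj)

def girth(adj):
    """Shortest cycle: for each edge, shortest alternate path + 1."""
    best = float("inf")
    for u in adj:
        for v in adj[u]:
            if u < v:
                d = bfs_dist(adj, u, skip_edge={u, v}).get(v)
                if d is not None:
                    best = min(best, d + 1)
    return best

g = petersen()
print(diameter(g), girth(g))  # 2 5
```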
Exercises
Top
- ODEs
- PDEs
Hold
- [^Miyake_2024] shows the use of MARL for analyzing and predicting the evolution of social networks. Each node represents a rational agent in an RL setting.
- The goal is to design explainable reward and policy functions. Each agent’s policy is to add or remove edges or change their attributes.
- The NetEvolve system consists of three phases:
- Learn the reward function for each node.
- The reward function consists of a linear combination of interpretable features and represents the desirability of the network to each node.
- The weights used in the reward function are learnt by assuming that the observed time-series evolution of the network is already optimal with respect to each node's reward.
- Learn the policy for each node. The policy expresses the tendency to change attributes and edges.
- Predict future networks based on the multi-agent simulation using learned policies.
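The interpretable linear reward from phase 1 can be sketched as a weighted sum of named features. This is a minimal sketch; the feature names and weight values below are hypothetical, not the paper's actual feature set.

```python
import numpy as np

def linear_reward(features, weights):
    """Interpretable reward: a linear combination of named network features."""
    return float(np.dot(features, weights))

# Hypothetical per-node features: [degree, #triangles, #new_neighbors]
features = np.array([4.0, 2.0, 1.0])
# Weights would be learnt from the observed network evolution
weights = np.array([0.5, 0.3, -0.2])
print(linear_reward(features, weights))
```

Because each weight attaches to a named feature, the learnt weights directly explain what each node finds desirable in the network.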
- [^Weil_2024] introduces a decentralized approach to MARL using a graph-based message passing algorithm to pass agent states to their neighbors.
- In this approach, agents form a communication network. Agents pass their local states to their neighbors in the network, and aggregate incoming messages to form a local observation of the entire network.
- This approach can be used with any RL training algorithm by performing the message passing step in each episode and augmenting agent observations with the local graph observation.
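One message-passing round of this kind can be sketched as follows. This is a simplification: mean aggregation stands in for the paper's recurrent aggregation, and all names are illustrative.

```python
import numpy as np

def message_passing_round(states, neighbors):
    """Each agent aggregates (here: mean) its neighbors' local states and
    concatenates the result to its own state, forming its augmented observation."""
    out = {}
    for a, nbrs in neighbors.items():
        if nbrs:
            msg = np.mean([states[n] for n in nbrs], axis=0)
        else:
            msg = np.zeros_like(states[a])
        out[a] = np.concatenate([states[a], msg])
    return out

# Three agents on a line graph: 0 - 1 - 2
states = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0]), 2: np.array([1.0, 1.0])}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
obs = message_passing_round(states, neighbors)
print(obs[1])  # agent 1's state with its neighbors' mean appended
```

Running this step once per episode and feeding `obs` to any standard RL trainer is what makes the scheme algorithm-agnostic.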
[^Weil_2024] Weil et al. (2024) Towards Generalizability of Multi-Agent Reinforcement Learning in Graphs with Recurrent Message Passing
Footnotes
[^qu_2019] Qu et al. (2019) Blockwise Self-Attention for Long Document Understanding
[^child_2019] Child et al. (2019) Generating Long Sequences with Sparse Transformers
[^Miyake_2024] Miyake et al. (2024) NetEvolve: Social Network Forecasting using Multi-Agent Reinforcement Learning with Interpretable Features