- We can use a simple dense layer to extract outputs from the final node representations $h_v^{(K)}$. This gives us

  $$\hat{z}_v = \mathrm{softmax}\left(W h_v^{(K)}\right)$$

  Where $W$ is a learnable weight matrix and $\hat{z}_v$ is the predicted distribution over labels. This can then be optimized using a Loss Function on each labelled vertex $v$, assuming ground truth $y_v$:

  $$\mathcal{L} = \frac{1}{N_L} \sum_{v=1}^{N_L} \ell\left(y_v, \hat{z}_v\right)$$

  Where $N_L$ is the number of labelled vertices. The loss function is then optimized with backpropagation.
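As a concrete illustration, here is a minimal NumPy sketch of this readout and masked loss; the dimensions, the labelled-vertex indices, and the random data are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, hidden_dim, n_classes = 6, 8, 3      # hypothetical sizes
H = rng.normal(size=(n_nodes, hidden_dim))    # final node representations h_v^(K)
W = rng.normal(size=(hidden_dim, n_classes))  # dense output layer weights

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

Z_hat = softmax(H @ W)                        # predicted class distribution per node

# Cross-entropy averaged over the N_L labelled vertices only.
labelled = np.array([0, 1, 2])                # indices of labelled vertices (assumed)
y = np.array([2, 0, 1])                       # their ground-truth classes (assumed)
loss = -np.log(Z_hat[labelled, y]).mean()
print(loss)                                   # value to minimize with backpropagation
```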
Graph Convolutional Network
- A Graph Convolutional Network (GCN) is a GNN architecture that makes use of convolutions. In particular, the update rule is defined as

  $$H^{(k)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(k-1)} W^{(k)}\right)$$

  Where $\tilde{A} = A + I$ is the adjacency matrix with an additional self-loop of weight $1$ added so that a node’s old representation is also considered. And $\tilde{D}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, is a diagonal matrix defined similarly to the degree matrix. And $\sigma$ is an activation function.
- We can decompose the update rule into Aggregation and Combination as in the General Framework. The decomposition is stated as follows:
  - Aggregation is done as the weighted average of neighbor node representations. The weighting factor $\frac{\tilde{A}_{vu}}{\sqrt{\tilde{D}_{vv} \tilde{D}_{uu}}}$ is the weight of the edge between $v$ and $u$ normalized by their respective degrees.
  - Combination is defined as the summation of aggregated messages and the node representation itself (which enters through the self-loop).
- The “Convolution” comes from Spectral Convolutions.
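A minimal sketch of one GCN propagation step on a toy graph; the adjacency matrix, the dimensions, and the ReLU activation are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph with 4 nodes (assumed adjacency).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

A_tilde = A + np.eye(4)                       # add self-loops of weight 1
D_tilde = A_tilde.sum(axis=1)                 # diagonal of the "degree" matrix of A_tilde
D_inv_sqrt = np.diag(1.0 / np.sqrt(D_tilde))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization

H = rng.normal(size=(4, 5))                   # previous-layer representations H^(k-1)
W = rng.normal(size=(5, 7))                   # layer weights W^(k)

H_next = np.maximum(0, A_hat @ H @ W)         # H^(k) = ReLU(D^-1/2 A D^-1/2 H^(k-1) W^(k))
print(H_next.shape)                           # (4, 7)
```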
Graph Attention Network
- Takes inspiration from The Attention Mechanism. The Graph Attention Network (GAT) makes use of the following layers.
- The graph attention layer defines how new node representations attend to old node representations. We use the shared attention mechanism $a$ to derive

  $$e_{vu} = a\left(W h_v, W h_u\right)$$

  To get the attention coefficients, we simply apply softmax over the neighborhood

  $$\alpha_{vu} = \frac{\exp\left(e_{vu}\right)}{\sum_{w \in N(v)} \exp\left(e_{vw}\right)}$$

- An alternative to the above definition is to use a Leaky ReLU as the activation function so that

  $$\alpha_{vu} = \mathrm{softmax}_u\left(\mathrm{LeakyReLU}\left(a^\top \left[W h_v \,\|\, W h_u\right]\right)\right)$$

  Where $\|$ denotes concatenation of two tensors.
- The new representation is then obtained using

  $$h_v^{(k)} = \sigma\left(\sum_{u \in N(v)} \alpha_{vu} W h_u^{(k-1)}\right)$$
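Here is a small NumPy sketch of a single attention layer using the Leaky ReLU variant above; the toy graph, the dimensions, the Leaky ReLU slope, and the ReLU output activation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, in_dim, out_dim = 4, 5, 6
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)     # assumed adjacency
H = rng.normal(size=(n_nodes, in_dim))        # old node representations
W = rng.normal(size=(in_dim, out_dim))        # shared linear transform
a = rng.normal(size=(2 * out_dim,))           # attention vector for a^T [Wh_v || Wh_u]

WH = H @ W

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

H_new = np.zeros_like(WH)
for v in range(n_nodes):
    neigh = np.nonzero(A[v])[0]               # neighbors of v
    # raw scores e_vu = LeakyReLU(a^T [Wh_v || Wh_u])
    e = np.array([leaky_relu(a @ np.concatenate([WH[v], WH[u]])) for u in neigh])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                      # softmax over the neighborhood
    # h_v' = sigma(sum_u alpha_vu W h_u), with sigma = ReLU here (assumed)
    H_new[v] = np.maximum(0, (alpha[:, None] * WH[neigh]).sum(axis=0))

print(H_new.shape)                            # (4, 6)
```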
Neural Message Passing Networks
- The underlying idea behind Neural Message Passing Networks (NMPN) is to simulate message passing between nodes. We define two functions:
  - Message $M_k\left(h_v^{(k-1)}, h_u^{(k-1)}, e_{vu}\right)$, which encodes the message between node $v$ and $u$ in the $k$-th layer based on information in edge $e_{vu}$.
  - Update $U_k$, which combines the aggregated messages from the neighbors and the node representation. This functions as Combine in a similar manner to the general framework.
- More formally, the message at the $k$-th layer for node $v$ is

  $$m_v^{(k)} = \sum_{u \in N(v)} M_k\left(h_v^{(k-1)}, h_u^{(k-1)}, e_{vu}\right)$$

  The update is done as follows

  $$h_v^{(k)} = U_k\left(h_v^{(k-1)}, m_v^{(k)}\right)$$
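A hedged sketch of one message-passing layer; the particular choices of $M_k$ (a ReLU linear map over the concatenated endpoints and edge features) and $U_k$ (a tanh linear map), as well as the toy graph and dimensions, are placeholders, since the framework leaves these functions as design choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, dim, edge_dim = 4, 5, 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)          # assumed adjacency
H = rng.normal(size=(n_nodes, dim))                # h_v^(k-1)
E = rng.normal(size=(n_nodes, n_nodes, edge_dim))  # edge features e_vu (assumed)

# One illustrative choice: a single linear layer for both M_k and U_k.
W_msg = rng.normal(size=(2 * dim + edge_dim, dim))
W_upd = rng.normal(size=(2 * dim, dim))

def message(h_v, h_u, e_vu):
    # M_k: message from u to v, conditioned on the edge features
    return np.maximum(0, np.concatenate([h_v, h_u, e_vu]) @ W_msg)

def update(h_v, m_v):
    # U_k: combine the node state with the aggregated message
    return np.tanh(np.concatenate([h_v, m_v]) @ W_upd)

H_next = np.zeros_like(H)
for v in range(n_nodes):
    neigh = np.nonzero(A[v])[0]
    m_v = sum(message(H[v], H[u], E[v, u]) for u in neigh)  # sum-aggregation
    H_next[v] = update(H[v], m_v)

print(H_next.shape)                                # (4, 5)
```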
Continuous Graph Neural Networks
- Inspired by Diffusion models and Network epidemics.
- Define

  $$\hat{A} = \frac{\alpha}{2}\left(I + D^{-1/2} A D^{-1/2}\right)$$

  as the regularized characteristic matrix with hyperparameter $\alpha \in (0, 1)$ and degree matrix $D$.
- The eigenvalues of $\hat{A}$ lie in the interval $[0, \alpha]$, which means that $\hat{A}^k$ converges as $k \to \infty$; $\alpha$ controls the diffusion rate.
- If we assume independent feature channels, we have the following propagation equation

  $$H^{(k+1)} = \hat{A} H^{(k)} + H^{(0)}$$

  Where $H^{(0)} = X$, the initial node features. We can unroll this to be equal to

  $$H^{(k)} = \left(\sum_{i=0}^{k} \hat{A}^i\right) H^{(0)}$$

  The precise dynamics are then encapsulated using the following differential equation

  $$\frac{dH(t)}{dt} = \hat{A} H(t) - H(t) + H^{(0)}$$

  - In words: The representation of a node is dependent on the influence of neighbors ($\hat{A} H(t)$), how much the node resists external influence ($-H(t)$), and the inherent characteristics of the node ($H^{(0)}$).
- If we relax the assumption so that feature channels are allowed to interact with each other, we have the following model. Let $W$ be the weight matrix that models interactions between feature channels. We have

  $$H^{(k+1)} = \hat{A} H^{(k)} W + H^{(0)}$$

  The dynamics are given by the following differential equation

  $$\frac{dH(t)}{dt} = \hat{A} H(t) W - H(t) + H^{(0)}$$

  Where, as before, $H^{(0)} = X$. Both variants are sketched numerically after this list.
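For intuition, here is a self-contained NumPy sketch covering both variants: it builds $\hat{A}$ for a toy graph, checks that its eigenvalues fall in $[0, \alpha]$, runs the discrete propagation and its unrolled form, and integrates both differential equations with forward Euler. The adjacency matrix, $\alpha$, the dimensions, the step size, and the small scaling of $W$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph (assumed) and diffusion hyperparameter alpha in (0, 1).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
alpha, dim = 0.8, 5

D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = (alpha / 2) * (np.eye(4) + D_inv_sqrt @ A @ D_inv_sqrt)
print(np.linalg.eigvalsh(A_hat))              # eigenvalues lie in [0, alpha]

X = rng.normal(size=(4, dim))                 # initial representations H^(0)

# Independent channels: H^(k+1) = A_hat H^(k) + H^(0)
H = X.copy()
for _ in range(20):
    H = A_hat @ H + X
S = sum(np.linalg.matrix_power(A_hat, i) for i in range(21))
print(np.allclose(H, S @ X))                  # matches the unrolled sum form

# Forward-Euler integration of dH/dt = A_hat H - H + H^(0)
H_t, dt = X.copy(), 0.05
for _ in range(400):
    H_t = H_t + dt * (A_hat @ H_t - H_t + X)

# Channel interactions: dH/dt = A_hat H W - H + H^(0), with W kept small for stability.
W = 0.1 * rng.normal(size=(dim, dim))
H_w = X.copy()
for _ in range(400):
    H_w = H_w + dt * (A_hat @ H_w @ W - H_w + X)
print(H_t.shape, H_w.shape)                   # (4, 5) (4, 5)
```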
Lanczos Network
- Motivation: The main issue with GCN is that it only propagates information over one step per layer. That is, from a node to its nearest neighbors.
- Naively stacking layers does not work. First, it may make the network too deep, which means Vanishing Gradients.
- Exponentiating the Laplacian is too costly. It is also unclear how to add learnable parameters to the Laplacian.
- The Lanczos Network makes use of the Lanczos algorithm to obtain an orthonormal matrix $V$ and a diagonal matrix $R$ of Ritz values, and to compute a low rank approximation of the Laplacian

  $$L \approx V R V^\top$$

  The column vectors of $V$ are denoted $v_1, \dots, v_r$. Implicitly, because it is low rank, the size of the approximation is significantly less than the Laplacian.
- The update rule is then defined as follows. Let $f_i$ be a learnable spectral filter parameterized by $\theta_i$. Also, let

  $$\hat{L}_i = V f_i(R) V^\top$$

  We have

  $$Y = \left[\hat{L}_1 H^{(k)}, \dots, \hat{L}_I H^{(k)}\right]$$

  And

  $$H^{(k+1)} = \sigma\left(Y W^{(k)}\right)$$

  Where $H^{(k)}$ means the representation at the $k$-th layer. A sketch of the whole pipeline follows this list.
- The main limitation of this approach is that it uses a low rank approximation, which can discard information carried by the part of the Laplacian's spectrum that is dropped.
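A rough, self-contained sketch of the pipeline described above: the Lanczos iteration produces $(V, R)$, from which the low rank approximation of the Laplacian and a filtered update can be formed. The random graph, the rank $r$, the full re-orthogonalization, and the tiny tanh map standing in for the learnable filters $f_i$ are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized Laplacian of an assumed random graph.
n = 30
A = np.triu((rng.random((n, n)) < 0.2).astype(float), 1)
A = A + A.T
d = np.maximum(A.sum(axis=1), 1e-12)
L = np.eye(n) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)

def lanczos(M, r):
    """r steps of the Lanczos iteration: M ~= Q T Q^T with T tridiagonal."""
    Q = np.zeros((M.shape[0], r))
    alpha, beta = np.zeros(r), np.zeros(r - 1)
    q = rng.normal(size=M.shape[0])
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(r):
        w = M @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)  # orthogonalize against previous vectors
        if j < r - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    return Q, np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

r = 10
Q, T = lanczos(L, r)
ritz_vals, B = np.linalg.eigh(T)          # eigendecompose the small tridiagonal T
V, R = Q @ B, ritz_vals                   # Ritz vectors / values: L ~= V diag(R) V^T
print(np.linalg.norm(L - V @ np.diag(R) @ V.T) / np.linalg.norm(L))

# Filtered update: L_hat_i = V f_i(R) V^T, Y = [L_hat_1 H, ..., L_hat_I H], H' = sigma(Y W)
dim, out_dim, n_filters = 5, 6, 2
H = rng.normal(size=(n, dim))             # current representations H^(k)
W_out = rng.normal(size=(n_filters * dim, out_dim))
thetas = [rng.normal(size=(r, r)) for _ in range(n_filters)]

def spectral_filter(theta, ritz):
    # stand-in for a learnable filter f_i applied to the Ritz values
    return np.diag(np.tanh(theta @ ritz))

Y = np.concatenate([V @ spectral_filter(th, R) @ V.T @ H for th in thetas], axis=1)
H_next = np.maximum(0, Y @ W_out)         # assumed ReLU activation
print(H_next.shape)                       # (30, 6)
```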