• We can use a simple dense layer to extract outputs from the final node representations $h_v^{(L)}$. This gives us

    $\hat{y}_v = \mathrm{softmax}\!\left(W^{o} h_v^{(L)}\right)$

    Where $W^{o} \in \mathbb{R}^{C \times d}$, and $\hat{y}_v \in \mathbb{R}^{C}$ for $C$ labels.

    This can then be optimized using a Loss Function on each labelled vertex $v$, assuming ground truth $y_v$

    $\mathcal{L} = -\frac{1}{n_L} \sum_{v \in V_L} \sum_{c=1}^{C} y_{v,c} \log \hat{y}_{v,c}$

    Where $n_L$ is the number of labelled vertices and $V_L$ is the set of labelled vertices.

    The loss function is then optimized with backpropagation.
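
• A minimal NumPy sketch of this readout and loss (the names softmax, W_out and labelled_idx, and the toy sizes, are illustrative assumptions, not from the notes):

        import numpy as np

        def softmax(z):
            z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)

        rng = np.random.default_rng(0)
        n_nodes, d, C = 6, 4, 3                       # toy sizes
        H = rng.normal(size=(n_nodes, d))             # final node representations h_v^(L)
        W_out = rng.normal(size=(d, C))               # dense readout layer

        y_hat = softmax(H @ W_out)                    # predicted label distribution per node

        labelled_idx = np.array([0, 2, 5])            # the labelled vertices V_L
        y_true = np.eye(C)[[0, 1, 2]]                 # one-hot ground truth for those vertices

        # cross-entropy averaged over the n_L labelled vertices
        loss = -np.mean(np.sum(y_true * np.log(y_hat[labelled_idx] + 1e-12), axis=1))
        print(loss)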

Graph Convolutional Network

  • A Graph Convolutional Network (GCN) is a GNN architecture that makes use of convolutions. In particular, the update rule is defined as

    $H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$

    Where $\tilde{A} = A + I$ is the adjacency matrix with an additional self-loop of weight $1$ added so that a node’s old representation is also considered.

    And $\tilde{D}$ is a diagonal matrix defined similarly to the degree matrix, i.e. $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$.

    And $\sigma$ is an activation function (a NumPy sketch of this update is given at the end of this section).

  • We can decompose the update rule into Aggregation and Combination as in the General Framework.

    • The decomposition is stated as follows:

      $h_v^{(l+1)} = \sigma\!\left(W^{(l)}\left(\sum_{u \in \mathcal{N}(v)} \frac{\tilde{A}_{vu}}{\sqrt{\tilde{D}_{vv} \tilde{D}_{uu}}} h_u^{(l)} + \frac{1}{\tilde{D}_{vv}} h_v^{(l)}\right)\right)$

    • Aggregation is done as the weighted average of neighbor node representations.
    • The weighting factor $\frac{\tilde{A}_{vu}}{\sqrt{\tilde{D}_{vv} \tilde{D}_{uu}}}$ is the weight of the edge between $v$ and $u$, normalized by their respective degrees.
    • Combination is defined as the summation of the aggregated messages and the node representation itself.
  • The “Convolution” comes from Spectral Convolutions.
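
  • A minimal dense NumPy sketch of the GCN update rule above (the function name gcn_layer, the choice of tanh as the activation and the toy graph are illustrative assumptions):

        import numpy as np

        def gcn_layer(A, H, W, act=np.tanh):
            """One GCN layer: act(D~^-1/2 (A + I) D~^-1/2 H W)."""
            A_tilde = A + np.eye(A.shape[0])          # add self-loops of weight 1
            d = A_tilde.sum(axis=1)                   # degrees of A_tilde
            D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
            return act(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

        rng = np.random.default_rng(0)
        A = np.array([[0, 1, 0, 1],                   # toy 4-node graph
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
        H = rng.normal(size=(4, 3))                   # current node representations
        W = rng.normal(size=(3, 2))                   # learnable weights
        print(gcn_layer(A, H, W))                     # new node representations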

Graph Attention Network

  • Takes inspiration from The Attention Mechanism. The Graph Attention Network (GAT) makes use of the following layers.

  • The graph attention layer defines how new node representations attend to old node representations.

    We use the shared attention mechanism $a$ to derive the raw attention scores

    $e_{vu} = a\!\left(W h_v, W h_u\right)$

    To get the attention coefficients, we simply apply softmax

    $\alpha_{vu} = \mathrm{softmax}_{u}\!\left(e_{vu}\right) = \frac{\exp\!\left(e_{vu}\right)}{\sum_{k \in \mathcal{N}(v)} \exp\!\left(e_{vk}\right)}$

  • An alternative to the above definition is to use a Leaky ReLU as the activation function so that

    $\alpha_{vu} = \frac{\exp\!\left(\mathrm{LeakyReLU}\!\left(a^{\top}\left[W h_v \,\|\, W h_u\right]\right)\right)}{\sum_{k \in \mathcal{N}(v)} \exp\!\left(\mathrm{LeakyReLU}\!\left(a^{\top}\left[W h_v \,\|\, W h_k\right]\right)\right)}$

    Where $\|$ denotes concatenation of two tensors.

  • The new representation is then obtained using

    $h_v^{(l+1)} = \sigma\!\left(\sum_{u \in \mathcal{N}(v)} \alpha_{vu} W h_u^{(l)}\right)$
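
  • A minimal single-head NumPy sketch of the attention layer above, using the Leaky ReLU variant (the dense masking, the inclusion of self-loops in the attention and the toy graph are illustrative assumptions):

        import numpy as np

        def leaky_relu(x, slope=0.2):
            return np.where(x > 0, x, slope * x)

        def gat_layer(A, H, W, a, act=np.tanh):
            Z = H @ W                                 # transformed representations W h_v
            d_out = Z.shape[1]
            a_src, a_dst = a[:d_out], a[d_out:]       # split a for the concatenation trick
            # e[v, u] = LeakyReLU(a^T [W h_v || W h_u]) = LeakyReLU(a_src . W h_v + a_dst . W h_u)
            e = leaky_relu((Z @ a_src)[:, None] + (Z @ a_dst)[None, :])
            mask = (A + np.eye(A.shape[0])) > 0       # neighbours plus a self-loop (assumption)
            e = np.where(mask, e, -np.inf)
            alpha = np.exp(e - e.max(axis=1, keepdims=True))
            alpha = alpha / alpha.sum(axis=1, keepdims=True)   # row-wise softmax -> alpha_vu
            return act(alpha @ Z)                     # h_v' = act(sum_u alpha_vu W h_u)

        rng = np.random.default_rng(1)
        A = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 1],
                      [1, 0, 0, 1],
                      [0, 1, 1, 0]], dtype=float)
        H = rng.normal(size=(4, 3))
        W = rng.normal(size=(3, 2))
        a = rng.normal(size=4)                        # attention vector for [W h_v || W h_u]
        print(gat_layer(A, H, W, a))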

Neural Message Passing Networks

  • The underlying idea behind Neural Message Passing Networks (NMPN) is to simulate message passing between nodes. We define two functions:

    • Message $M_l$, which encodes the message between node $v$ and $u$ in the $l$-th layer based on information in edge $e_{vu}$.
    • Update $U_l$, which combines the aggregated messages from the neighbors and the node representation. This functions as Combine in a similar manner to the general framework.
  • More formally, the message at the $(l+1)$-th layer for node $v$ is

    $m_v^{(l+1)} = \sum_{u \in \mathcal{N}(v)} M_l\!\left(h_v^{(l)}, h_u^{(l)}, e_{vu}\right)$

    The update is done as follows

    $h_v^{(l+1)} = U_l\!\left(h_v^{(l)}, m_v^{(l+1)}\right)$
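
  • A minimal NumPy sketch of one round of message passing (the linear choices for Message and Update, and the toy edge features, are illustrative assumptions):

        import numpy as np

        def mpnn_layer(A, H, E, message_fn, update_fn):
            """One message passing round. A: adjacency, H: node states (n, d), E: edge features (n, n, d_e)."""
            H_new = np.zeros_like(H)
            for v in range(A.shape[0]):
                neighbours = np.nonzero(A[v])[0]
                # m_v = sum over neighbours u of M(h_v, h_u, e_vu)
                m_v = sum(message_fn(H[v], H[u], E[v, u]) for u in neighbours)
                # h_v' = U(h_v, m_v)
                H_new[v] = update_fn(H[v], m_v)
            return H_new

        rng = np.random.default_rng(2)
        d, d_e = 3, 2
        W_msg = rng.normal(size=(2 * d + d_e, d))     # illustrative linear Message
        W_upd = rng.normal(size=(2 * d, d))           # illustrative linear Update
        message_fn = lambda hv, hu, evu: np.concatenate([hv, hu, evu]) @ W_msg
        update_fn = lambda hv, mv: np.tanh(np.concatenate([hv, mv]) @ W_upd)

        A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
        H = rng.normal(size=(3, d))
        E = rng.normal(size=(3, 3, d_e))              # e_vu for every node pair
        print(mpnn_layer(A, H, E, message_fn, update_fn))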

Continuous Graph Neural Networks

  • Inspired by Diffusion models and Network epidemics.

  • Define $A := \frac{\alpha}{2}\left(I + D^{-\frac{1}{2}} \hat{A} D^{-\frac{1}{2}}\right)$ as the regularized characteristic matrix of a graph with adjacency matrix $\hat{A}$, hyperparameter $\alpha \in (0, 1)$ and degree matrix $D$.

  • The eigenvalues of $A$ lie in the interval $[0, \alpha]$, which means that the power series $\sum_{k=0}^{\infty} A^k$ converges (since $\alpha < 1$).

    • $\alpha$ controls the diffusion rate.
  • If we assume independent feature channels, we have the following propagation equation

    $H_{n+1} = A H_n + H_0$

    Where $H_0$ is the initial node representation (a small sketch of this propagation is given at the end of this section).

    We can unroll this to be equal to

    $H_n = \left(\sum_{i=0}^{n} A^i\right) H_0$

    The precise dynamics are then encapsulated using the following differential equation

    $\frac{dH(t)}{dt} = A H(t) - H(t) + H_0, \qquad H(0) = H_0$

    • In words: The representation of a node is dependent on the influence of neighbors ($A H(t)$), how much the node resists external influence ($-H(t)$) and the inherent characteristics of the node ($H_0$).
  • If we relax the assumption such that we allow feature channels to interact with each other, we have the following model. Let $W$ be the weight matrix that models interactions between feature channels. We have

    $H_{n+1} = A H_n W + H_0$

    The dynamics are given by the following differential equation

    $\frac{dH(t)}{dt} = A H(t) + H(t) W - H(t) + H_0$

    Where, as before, $H(0) = H_0$.
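
  • A small NumPy sketch of the independent-channel propagation above, checking that the unrolled series converges (the helper name regularized_adjacency, the value alpha = 0.9 and the 200 iterations are illustrative assumptions):

        import numpy as np

        def regularized_adjacency(A_raw, alpha=0.9):
            """A := (alpha / 2) * (I + D^-1/2 A_raw D^-1/2); its eigenvalues lie in [0, alpha]."""
            d = A_raw.sum(axis=1)
            D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
            return 0.5 * alpha * (np.eye(A_raw.shape[0]) + D_inv_sqrt @ A_raw @ D_inv_sqrt)

        rng = np.random.default_rng(3)
        A_raw = np.array([[0, 1, 1, 0],
                          [1, 0, 1, 0],
                          [1, 1, 0, 1],
                          [0, 0, 1, 0]], dtype=float)
        A = regularized_adjacency(A_raw, alpha=0.9)
        H0 = rng.normal(size=(4, 2))                  # initial node representations

        # discrete propagation H_{n+1} = A H_n + H_0, i.e. the partial sums (sum_i A^i) H_0
        H = H0.copy()
        for _ in range(200):
            H = A @ H + H0

        # the series converges to (I - A)^{-1} H_0 because the eigenvalues of A stay below 1
        H_limit = np.linalg.solve(np.eye(4) - A, H0)
        print(np.allclose(H, H_limit))                # True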

Lanczos Network

  • Motivation: The main issue with GCN is that it only propagates information over one step, that is, from a node to its immediate neighbors.

    • Naively stacking layers does not work. First, it may make the network too deep, which leads to Vanishing Gradients.
    • Exponentiating the Laplacian is too costly. It is also unclear how to add learnable parameters to the Laplacian.
  • The Lanczos Network makes use of the Lanczos algorithm to obtain an orthogonal matrix $Q \in \mathbb{R}^{N \times K}$ and a tridiagonal matrix $T \in \mathbb{R}^{K \times K}$, and compute a low rank approximation of the Laplacian

    $L \approx Q T Q^{\top}$

    The column vectors of $Q$ are denoted $q_1, \dots, q_K$. Implicitly, because it is low rank ($K \ll N$), the size of the approximation is significantly less than that of the Laplacian (a small sketch is given at the end of this section).

  • The update rule is then defined as follows. Let $f_\theta$ be a learnable spectral filter parameterized by $\theta$. Also, let $T = B R B^{\top}$ be the eigendecomposition of $T$, with Ritz values $R = \mathrm{diag}(r_1, \dots, r_K)$ and $V = Q B$, so that $L \approx V R V^{\top}$.

    We have the filtered diffusion operator at scale $k$

    $\hat{L}^{(k)} = V \, \mathrm{diag}\!\left(f_\theta\!\left(r_1^{k}, \dots, r_K^{k}\right)\right) V^{\top}$

    And

    $H^{(l+1)} = \sigma\!\left(\left[\hat{L}^{(k_1)} H^{(l)}, \dots, \hat{L}^{(k_m)} H^{(l)}\right] W^{(l)}\right)$

    Where $H^{(l)}$ means the representation at the $l$-th layer and $k_1, \dots, k_m$ are the diffusion scales considered.

  • The main limitation of this approach is that it uses a low rank approximation of the Laplacian.
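
  • A small NumPy sketch of the Lanczos step used above: it builds $Q$ and the tridiagonal $T$ for the normalized Laplacian of a random toy graph and measures the low rank approximation error (the full re-orthogonalization and the choice $K = 4$ are illustrative assumptions):

        import numpy as np

        def lanczos(S, k, seed=0):
            """Lanczos iteration: returns Q (n x k, orthonormal columns) and tridiagonal T (k x k)
            such that S is approximately Q T Q^T, for a symmetric matrix S."""
            n = S.shape[0]
            rng = np.random.default_rng(seed)
            q = rng.normal(size=n)
            Q = np.zeros((n, k))
            Q[:, 0] = q / np.linalg.norm(q)
            alpha, beta = np.zeros(k), np.zeros(k - 1)
            for j in range(k):
                w = S @ Q[:, j]
                alpha[j] = Q[:, j] @ w
                w = w - alpha[j] * Q[:, j]
                if j > 0:
                    w = w - beta[j - 1] * Q[:, j - 1]
                w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)   # full re-orthogonalization
                if j < k - 1:
                    beta[j] = np.linalg.norm(w)
                    Q[:, j + 1] = w / beta[j]
            T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
            return Q, T

        # symmetric normalized Laplacian of a small random graph
        rng = np.random.default_rng(4)
        A = np.triu((rng.random((8, 8)) < 0.4).astype(float), 1)
        A = A + A.T
        A = A + np.diag((A.sum(axis=1) == 0).astype(float))    # self-loop for isolated nodes
        d = A.sum(axis=1)
        L = np.eye(8) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)

        Q, T = lanczos(L, k=4)
        L_approx = Q @ T @ Q.T                                  # low rank approximation of L
        print(np.linalg.norm(L - L_approx))                     # error shrinks as k grows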

Links