• We can use a simple dense layer to extract outputs from the final node representations $h_v^{(L)}$. This gives us

    $\hat{y}_v = \mathrm{softmax}\!\left(W^{o} h_v^{(L)}\right)$

    Where $W^{o} \in \mathbb{R}^{C \times d}$, and $\hat{y}_v \in \mathbb{R}^{C}$ for $C$ labels.

    This can then be optimized using a Loss Function on each labelled vertex $v$, assuming ground truth $y_v$

    $\mathcal{L} = -\frac{1}{n_L} \sum_{v \in V_L} \sum_{c=1}^{C} y_{v,c} \log \hat{y}_{v,c}$

    Where $n_L$ is the number of labelled vertices and $V_L$ is the set of labelled vertices.

    The loss function is then optimized with backpropagation.
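
• A minimal NumPy sketch of this readout and loss (the names softmax, W_out and labelled_idx, and the toy sizes, are illustrative assumptions, not from the notes):

        import numpy as np

        def softmax(z):
            z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)

        rng = np.random.default_rng(0)
        n_nodes, d, C = 6, 4, 3                       # toy sizes
        H = rng.normal(size=(n_nodes, d))             # final node representations h_v^(L)
        W_out = rng.normal(size=(d, C))               # dense readout layer

        y_hat = softmax(H @ W_out)                    # predicted label distribution per node

        labelled_idx = np.array([0, 2, 5])            # the labelled vertices V_L
        y_true = np.eye(C)[[0, 1, 2]]                 # one-hot ground truth for those vertices

        # cross-entropy averaged over the n_L labelled vertices
        loss = -np.mean(np.sum(y_true * np.log(y_hat[labelled_idx] + 1e-12), axis=1))
        print(loss)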

Graph Convolutional Network

  • A Graph Convolutional Network (GCN) is a GNN architecture that makes use of convolutions. In particular, the update rule is defined as

    $H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$

    Where $\tilde{A} = A + I$ is the adjacency matrix with an additional self-loop of weight $1$ added so that a node’s old representation is also considered.

    And $\tilde{D}$ is a diagonal matrix defined similarly to the degree matrix, i.e. $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$.

    And $\sigma$ is an activation function (a NumPy sketch of this update is given at the end of this section).

  • We can decompose the update rule into Aggregation and Combination as in the General Framework.

    • The decomposition is stated as follows:

      $h_v^{(l+1)} = \sigma\!\left(W^{(l)}\left(\sum_{u \in \mathcal{N}(v)} \frac{\tilde{A}_{vu}}{\sqrt{\tilde{D}_{vv} \tilde{D}_{uu}}} h_u^{(l)} + \frac{1}{\tilde{D}_{vv}} h_v^{(l)}\right)\right)$

    • Aggregation is done as the weighted average of neighbor node representations.
    • The weighting factor $\frac{\tilde{A}_{vu}}{\sqrt{\tilde{D}_{vv} \tilde{D}_{uu}}}$ is the weight of the edge between $v$ and $u$, normalized by their respective degrees.
    • Combination is defined as the summation of the aggregated messages and the node representation itself.
  • The “Convolution” comes from Spectral Convolutions.
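
  • A minimal dense NumPy sketch of the GCN update rule above (the function name gcn_layer, the choice of tanh as the activation and the toy graph are illustrative assumptions):

        import numpy as np

        def gcn_layer(A, H, W, act=np.tanh):
            """One GCN layer: act(D~^-1/2 (A + I) D~^-1/2 H W)."""
            A_tilde = A + np.eye(A.shape[0])          # add self-loops of weight 1
            d = A_tilde.sum(axis=1)                   # degrees of A_tilde
            D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
            return act(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

        rng = np.random.default_rng(0)
        A = np.array([[0, 1, 0, 1],                   # toy 4-node graph
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
        H = rng.normal(size=(4, 3))                   # current node representations
        W = rng.normal(size=(3, 2))                   # learnable weights
        print(gcn_layer(A, H, W))                     # new node representations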

Graph Attention Network

  • Takes inspiration from The Attention Mechanism. The Graph Attention Network (GAT) makes use of the following layers.

  • The graph attention layer defines how new node representations attend to old node representations.

    We use the shared attention mechanism $a$ to derive the raw attention scores

    $e_{vu} = a\!\left(W h_v, W h_u\right)$

    To get the attention coefficients, we simply apply softmax

    $\alpha_{vu} = \mathrm{softmax}_{u}\!\left(e_{vu}\right) = \frac{\exp\!\left(e_{vu}\right)}{\sum_{k \in \mathcal{N}(v)} \exp\!\left(e_{vk}\right)}$

  • An alternative to the above definition is to use a Leaky ReLU as the activation function so that

    $\alpha_{vu} = \frac{\exp\!\left(\mathrm{LeakyReLU}\!\left(a^{\top}\left[W h_v \,\|\, W h_u\right]\right)\right)}{\sum_{k \in \mathcal{N}(v)} \exp\!\left(\mathrm{LeakyReLU}\!\left(a^{\top}\left[W h_v \,\|\, W h_k\right]\right)\right)}$

    Where $\|$ denotes concatenation of two tensors.

  • The new representation is then obtained using

    $h_v^{(l+1)} = \sigma\!\left(\sum_{u \in \mathcal{N}(v)} \alpha_{vu} W h_u^{(l)}\right)$
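
  • A minimal single-head NumPy sketch of the attention layer above, using the Leaky ReLU variant (the dense masking, the inclusion of self-loops in the attention and the toy graph are illustrative assumptions):

        import numpy as np

        def leaky_relu(x, slope=0.2):
            return np.where(x > 0, x, slope * x)

        def gat_layer(A, H, W, a, act=np.tanh):
            Z = H @ W                                 # transformed representations W h_v
            d_out = Z.shape[1]
            a_src, a_dst = a[:d_out], a[d_out:]       # split a for the concatenation trick
            # e[v, u] = LeakyReLU(a^T [W h_v || W h_u]) = LeakyReLU(a_src . W h_v + a_dst . W h_u)
            e = leaky_relu((Z @ a_src)[:, None] + (Z @ a_dst)[None, :])
            mask = (A + np.eye(A.shape[0])) > 0       # neighbours plus a self-loop (assumption)
            e = np.where(mask, e, -np.inf)
            alpha = np.exp(e - e.max(axis=1, keepdims=True))
            alpha = alpha / alpha.sum(axis=1, keepdims=True)   # row-wise softmax -> alpha_vu
            return act(alpha @ Z)                     # h_v' = act(sum_u alpha_vu W h_u)

        rng = np.random.default_rng(1)
        A = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 1],
                      [1, 0, 0, 1],
                      [0, 1, 1, 0]], dtype=float)
        H = rng.normal(size=(4, 3))
        W = rng.normal(size=(3, 2))
        a = rng.normal(size=4)                        # attention vector for [W h_v || W h_u]
        print(gat_layer(A, H, W, a))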

Neural Message Passing Networks

  • The underlying idea behind Neural Message Passing Networks (NMPN) is to simulate message passing between nodes. We define two functions:

    • Message $M_l$, which encodes the message between node $v$ and $u$ in the $l$-th layer based on information in edge $e_{vu}$.
    • Update $U_l$, which combines the aggregated messages from the neighbors and the node representation. This functions as Combine in a similar manner to the general framework.
  • More formally, the message at the $(l+1)$-th layer for node $v$ is

    $m_v^{(l+1)} = \sum_{u \in \mathcal{N}(v)} M_l\!\left(h_v^{(l)}, h_u^{(l)}, e_{vu}\right)$

    The update is done as follows

    $h_v^{(l+1)} = U_l\!\left(h_v^{(l)}, m_v^{(l+1)}\right)$
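
  • A minimal NumPy sketch of one round of message passing (the linear choices for Message and Update, and the toy edge features, are illustrative assumptions):

        import numpy as np

        def mpnn_layer(A, H, E, message_fn, update_fn):
            """One message passing round. A: adjacency, H: node states (n, d), E: edge features (n, n, d_e)."""
            H_new = np.zeros_like(H)
            for v in range(A.shape[0]):
                neighbours = np.nonzero(A[v])[0]
                # m_v = sum over neighbours u of M(h_v, h_u, e_vu)
                m_v = sum(message_fn(H[v], H[u], E[v, u]) for u in neighbours)
                # h_v' = U(h_v, m_v)
                H_new[v] = update_fn(H[v], m_v)
            return H_new

        rng = np.random.default_rng(2)
        d, d_e = 3, 2
        W_msg = rng.normal(size=(2 * d + d_e, d))     # illustrative linear Message
        W_upd = rng.normal(size=(2 * d, d))           # illustrative linear Update
        message_fn = lambda hv, hu, evu: np.concatenate([hv, hu, evu]) @ W_msg
        update_fn = lambda hv, mv: np.tanh(np.concatenate([hv, mv]) @ W_upd)

        A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
        H = rng.normal(size=(3, d))
        E = rng.normal(size=(3, 3, d_e))              # e_vu for every node pair
        print(mpnn_layer(A, H, E, message_fn, update_fn))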

Continuous Graph Neural Networks

  • Inspired by Diffusion models and Network epidemics.

  • Define $A := \frac{\alpha}{2}\left(I + D^{-\frac{1}{2}} \hat{A} D^{-\frac{1}{2}}\right)$ as the regularized characteristic matrix of a graph with adjacency matrix $\hat{A}$, hyperparameter $\alpha \in (0, 1)$ and degree matrix $D$.

  • The eigenvalues of $A$ lie in the interval $[0, \alpha]$, which means that the power series $\sum_{k=0}^{\infty} A^k$ converges (since $\alpha < 1$).

    • $\alpha$ controls the diffusion rate.
  • If we assume independent feature channels, we have the following propagation equation

    $H_{n+1} = A H_n + H_0$

    Where $H_0$ is the initial node representation (a small sketch of this propagation is given at the end of this section).

    We can unroll this to be equal to

    $H_n = \left(\sum_{i=0}^{n} A^i\right) H_0$

    The precise dynamics are then encapsulated using the following differential equation

    $\frac{dH(t)}{dt} = A H(t) - H(t) + H_0, \qquad H(0) = H_0$

    • In words: The representation of a node is dependent on the influence of neighbors ($A H(t)$), how much the node resists external influence ($-H(t)$) and the inherent characteristics of the node ($H_0$).
  • If we relax the assumption such that we allow feature channels to interact with each other, we have the following model. Let $W$ be the weight matrix that models interactions between feature channels. We have

    $H_{n+1} = A H_n W + H_0$

    The dynamics are given by the following differential equation

    $\frac{dH(t)}{dt} = A H(t) + H(t) W - H(t) + H_0$

    Where, as before, $H(0) = H_0$.
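
  • A small NumPy sketch of the independent-channel propagation above, checking that the unrolled series converges (the helper name regularized_adjacency, the value alpha = 0.9 and the 200 iterations are illustrative assumptions):

        import numpy as np

        def regularized_adjacency(A_raw, alpha=0.9):
            """A := (alpha / 2) * (I + D^-1/2 A_raw D^-1/2); its eigenvalues lie in [0, alpha]."""
            d = A_raw.sum(axis=1)
            D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
            return 0.5 * alpha * (np.eye(A_raw.shape[0]) + D_inv_sqrt @ A_raw @ D_inv_sqrt)

        rng = np.random.default_rng(3)
        A_raw = np.array([[0, 1, 1, 0],
                          [1, 0, 1, 0],
                          [1, 1, 0, 1],
                          [0, 0, 1, 0]], dtype=float)
        A = regularized_adjacency(A_raw, alpha=0.9)
        H0 = rng.normal(size=(4, 2))                  # initial node representations

        # discrete propagation H_{n+1} = A H_n + H_0, i.e. the partial sums (sum_i A^i) H_0
        H = H0.copy()
        for _ in range(200):
            H = A @ H + H0

        # the series converges to (I - A)^{-1} H_0 because the eigenvalues of A stay below 1
        H_limit = np.linalg.solve(np.eye(4) - A, H0)
        print(np.allclose(H, H_limit))                # True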

Lanczos Network

  • Motivation: The main issue with GCN is that it only propagates information over one step, that is, from a node to its immediate neighbors.

    • Naively stacking layers does not work. First, it may make the network too deep, which leads to Vanishing Gradients.
    • Exponentiating the Laplacian is too costly. It is also unclear how to add learnable parameters to the Laplacian.
  • The Lanczos Network makes use of the Lanczos algorithm to obtain an orthogonal matrix $Q \in \mathbb{R}^{N \times K}$ and a tridiagonal matrix $T \in \mathbb{R}^{K \times K}$, and compute a low rank approximation of the Laplacian

    $L \approx Q T Q^{\top}$

    The column vectors of $Q$ are denoted $q_1, \dots, q_K$. Implicitly, because it is low rank ($K \ll N$), the size of the approximation is significantly less than that of the Laplacian (a small sketch is given at the end of this section).

  • The update rule is then defined as follows. Let $f_\theta$ be a learnable spectral filter parameterized by $\theta$. Also, let $T = B R B^{\top}$ be the eigendecomposition of $T$, with Ritz values $R = \mathrm{diag}(r_1, \dots, r_K)$ and $V = Q B$, so that $L \approx V R V^{\top}$.

    We have the filtered diffusion operator at scale $k$

    $\hat{L}^{(k)} = V \, \mathrm{diag}\!\left(f_\theta\!\left(r_1^{k}, \dots, r_K^{k}\right)\right) V^{\top}$

    And

    $H^{(l+1)} = \sigma\!\left(\left[\hat{L}^{(k_1)} H^{(l)}, \dots, \hat{L}^{(k_m)} H^{(l)}\right] W^{(l)}\right)$

    Where $H^{(l)}$ means the representation at the $l$-th layer and $k_1, \dots, k_m$ are the diffusion scales considered.

  • The main limitation of this approach is that it uses a low rank approximation of the Laplacian.
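
  • A small NumPy sketch of the Lanczos step used above: it builds $Q$ and the tridiagonal $T$ for the normalized Laplacian of a random toy graph and measures the low rank approximation error (the full re-orthogonalization and the choice $K = 4$ are illustrative assumptions):

        import numpy as np

        def lanczos(S, k, seed=0):
            """Lanczos iteration: returns Q (n x k, orthonormal columns) and tridiagonal T (k x k)
            such that S is approximately Q T Q^T, for a symmetric matrix S."""
            n = S.shape[0]
            rng = np.random.default_rng(seed)
            q = rng.normal(size=n)
            Q = np.zeros((n, k))
            Q[:, 0] = q / np.linalg.norm(q)
            alpha, beta = np.zeros(k), np.zeros(k - 1)
            for j in range(k):
                w = S @ Q[:, j]
                alpha[j] = Q[:, j] @ w
                w = w - alpha[j] * Q[:, j]
                if j > 0:
                    w = w - beta[j - 1] * Q[:, j - 1]
                w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)   # full re-orthogonalization
                if j < k - 1:
                    beta[j] = np.linalg.norm(w)
                    Q[:, j + 1] = w / beta[j]
            T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
            return Q, T

        # symmetric normalized Laplacian of a small random graph
        rng = np.random.default_rng(4)
        A = np.triu((rng.random((8, 8)) < 0.4).astype(float), 1)
        A = A + A.T
        A = A + np.diag((A.sum(axis=1) == 0).astype(float))    # self-loop for isolated nodes
        d = A.sum(axis=1)
        L = np.eye(8) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)

        Q, T = lanczos(L, k=4)
        L_approx = Q @ T @ Q.T                                  # low rank approximation of L
        print(np.linalg.norm(L - L_approx))                     # error shrinks as k grows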

Links