\documentclass[reprint,twocolumn,amsmath,amssymb,aps]{revtex4-1}
\pdfoutput=1
\usepackage{graphicx}% Include figure files
\usepackage{dcolumn}% Align table columns on decimal point
\usepackage{bm}% bold math
\usepackage{hyperref}% add hypertext capabilities
\DeclareMathOperator{\Tr}{Tr}
\newcommand{\NN}{\mathcal{NN}}
\newcommand{\XX}{\mathcal{X}}
\newcommand{\YY}{\mathcal{Y}}
\newcommand{\X}{\mathbf{X}}
\newcommand{\Y}{\mathbf{Y}}
\newcommand{\bb}{\mathbf{b}}
\newcommand{\A}{\mathbf{A}}
\newcommand{\PP}{\mathbf{P}}
\newcommand{\cc}{\mathbf{c}}
\newcommand{\xstar}{\mathbf{x}^\ast}
\begin{document}
\title{Networks \\--\\ Complex and Graph Neural}% Force line breaks with \\
\author{Frank Hellmann}
\email{[email protected]}
\affiliation{Potsdam Institute for Climate Impact Research}
\date{\today}
\begin{abstract}
Seeding thoughts for the Knowledge Repo on the intersection of complex networks and graph neural networks.
\end{abstract}
\maketitle
\section{Initial observations}
We have a language clash: The graph in Graph Neural Networks (GNNs) is the network in Complex Networks. We will try not to use the word network in this Knowledge Repo. All Complex Networks and graphs we will consider are defined over an index set $1 \dots n$.
\section{A brief taxonomy of Functions on Sets and Graphs}
Neural Networks (NN) are function approximators. Set based Neural Networks approximate functions of a set of inputs, meaning that no particular order of the inputs is specified; we call such functions set based functions. Graph Neural Networks approximate functions of data associated to graphs; again there is no natural ordering of these data, and we refer to such functions as graph based functions. Network Measures in Complex Network theory are graph based functions for graphs with no data associated to them. They might map to a single number, to the set of nodes, or to the set of edges.
It's instructive to briefly consider set based functions on sets whose elements carry no data, or equivalently, functions that don't depend on the data, in analogy to Network Measures. If such a function takes values in, e.g.\ the reals $\mathbb{R}$, it should tell us something about the set aspects of the set of inputs we have. The only property of a set independent of its elements is its size, so any set based function that does not depend on the data associated to the set elements is a function of the size of the input set.
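Concretely, any such function can be written as
\begin{align}
f(x_1, \dots, x_n) = \phi(n)
\end{align}
for some function $\phi$ of the set size alone.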
In contrast, for graph based functions much more complex functions of this type are possible and necessary.
\subsection{Classification through permutations}
For all these functions, the order in which the inputs are presented does not matter. When we write a set based function taking values in the reals in the form $f(x_1, x_2, \dots)$, it is permutation invariant,
\begin{align}
f(x_1, x_2, \dots) = f(x_{\sigma(1)}, x_{\sigma(2)}, \dots)
\end{align}
where $\sigma$ denotes a permutation.
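For instance, the mean and the maximum of the inputs,
\begin{align}
f(x_1, x_2, \dots) = \frac{1}{n} \sum_i x_i \qquad \text{or} \qquad f(x_1, x_2, \dots) = \max_i x_i \;,
\end{align}
are unchanged under any relabelling $\sigma$ of the inputs.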
For a graph based function, we need to permute the edges and the vertices at the same time.
\begin{align}
f(x_1, x_2, \dots, e_{12}, \dots) = f(x_{\sigma(1)}, x_{\sigma(2)}, \dots, e_{\sigma(1)\sigma(2)}, \dots)
\end{align}
If the graph based function returns data associated to the nodes or edges, then the function has to be covariant under permutations:
\begin{align}
f_i(x_1, x_2, \dots) = f_{\sigma(i)}(x_{\sigma(1)}, x_{\sigma(2)}, \dots)
\end{align}
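A standard example of a covariant graph based function is the degree sequence: for scalar edge data $e_{ij}$,
\begin{align}
d_i = \sum_j e_{ij} \;,
\end{align}
and relabelling the nodes by $\sigma$ relabels the degrees $d_i$ in exactly the same way.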
Generally speaking the domain and image of the function, written as a function on ordinary spaces, carry a representation of some permutation group.
%
%\subsection{Previously written...}
%
% That is, specifying a neural network requires specifying an input and output space first: $f_\NN: \XX \rightarrow \YY$. This space can carry extra structure. Set based neural networks deal with spaces that are product spaces $\XX = \X^n$ and approximate functions that are permutation invariant or covariant. We will always use indices $i, j$ running from $1 \dots n$ to denote the copies. Writing $S_n$ for the symmetric group on the indices parametrizing the copies of $\X$ and $\Y$, and $P \in S_n$ for the permutations we have the covariant $f^C_\NN: \X^n \rightarrow \Y^n$:
%
%\begin{align}
%f^C_\NN(P x) = P f^C_\NN(x)
%\end{align}
%
%and the invariant $f^I_\NN: \X^n \rightarrow \YY$:
%
%\begin{align}
%f^I_\NN(P x) = f^I_\NN(x)
%\end{align}
%
%The name set based indicates that the functions learned only depend on the set of components $x_i$ in the vector $x = (x_1, \dots, x_n)$, not on their order.
%
%We can now consider more complex representations of the permutation group. For example, consider the space $\XX = \X^{n \times n}$. Elements of this space have components $x_{ij}$. This space transforms naturally under $S_n$ by conjugation. A covariant network $f^C_\NN: \X^{n \times n} \rightarrow \Y^n$ would thus satisfy:
%
%\begin{align}
%f^C_\NN(P x P) = P f^C_\NN(x)
%\end{align}
%
%Finally, for the space $\XX = \X_2^{n \times n} \oplus \X_1^{n}$ we would obtain a covariant function
%
%\begin{align}
%f^C_\NN(P x_2 P, P x_1) = P f^C_\NN(x_2, x_1)
%\end{align}
%
%This is the form of the symmetry of a Graph Neural Network. The coefficients $(x_2)_{ij}$ contain the information on the connection from node $i$ to $j$, in the simples case simply whether the nodes are connected or not.
%
\subsection{Tensor towers}
In the above the node features live in $\X^n$ and the edge features in $\X_e^{n \times n} \cong \X_e^{n^2}$. The permutation group $S_n$ acts naturally on these spaces by permuting the indices of the tensor copies\footnote{This can be said more abstractly of course, but it should be relatively clear I hope}. More generally the permutation group also acts naturally on the space of $k$-index tensors $\X_k^{n^k}$, where $\X_k$ is some fixed base space.
We can take the direct sum of such spaces to obtain an entire tensor tower,
\begin{align}
\XX = \bigoplus_{k=0}^{\infty} \X_k^{n^k} \;.
\end{align}
This space again transforms naturally under $S_n$, and it has an interpretation as the space carrying data on hypergraphs. Hypergraph Neural Networks have in fact already been defined in \cite{feng2019hypergraph}.
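Spelled out for the lowest levels, the tower contains a single graph level feature at $k = 0$, node features at $k = 1$, edge features at $k = 2$, and data on triples of nodes, i.e.\ $3$-hyperedges, at $k = 3$:
\begin{align}
\XX = \X_0 \oplus \X_1^{n} \oplus \X_2^{n^2} \oplus \X_3^{n^3} \oplus \dots
\end{align}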
\section{General structure results for permutation invariant functions}
Theorems exist that guarantee that a permutation invariant function can be written as
\begin{align} \label{eq::1 to 0 structure}
f(x_1, x_2, \dots) = g \left(\sum_i h(x_i)\right)
\end{align}
See the corresponding structure theorem in \citet{zaheer2017deep}. However, this representation might not be useful or meaningful; see e.g.\ the discussion in \citet{wagstaff2019limitations}.
What is remarkable about this is that the representation theorem also allows us to compare functions on sets of different sizes, though the results of \citet{wagstaff2019limitations} show that generalization to larger sets is not guaranteed.
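As a simple illustration of \eqref{eq::1 to 0 structure} (a worked example for a fixed set size $n$, not one taken from the cited papers), the variance of the inputs can be written in this form with a two component inner function,
\begin{align}
h(x) &= (x, x^2) \;, & g(s_1, s_2) &= \frac{s_2}{n} - \frac{s_1^2}{n^2} \;,
\end{align}
giving $f(x_1, \dots, x_n) = \frac{1}{n}\sum_i x_i^2 - \left(\frac{1}{n}\sum_i x_i\right)^2$. Note that $g$ depends on $n$ here, which is exactly the point where generalization to other set sizes can fail.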
\textbf{Question: } Are there such theorems for graph based functions already known?
\section{Combining functions}
Let's stick to these representations of $S_n$ for now. We can denote the representations by listing the $k$ that occur in the expansion: $(1)$ acts on a space $\X^n$, $(1,2)$ on $\X_1^n \oplus \X_2^{n^2}$, and so on. We can then denote for a function which representations its domain and image carry. For example, the function that computes the total edge weight, $\sum_{i,j} e_{ij}$, is of the form $(2) \rightarrow (0)$.
A set based function $f$ is of the form $(1) \rightarrow (0)$. A node wise network measure on a graph in the sense of Complex Networks is $(2) \rightarrow (1)$.
Functions can be combined in the obvious way: given a network measure $C$ and a set based neural network $\NN$, we can obtain a function on graphs by first applying the network measure and then the neural network.
\paragraph*{Notation:} We write a set of elements indexed by $i,j,\dots \in 1 \dots n$ as $\{z_{ijk}\}$, meaning that the set runs over all possible indices. If we want a set that runs only over part of the indices, we write $\{z_{ijk}\}_k$ to denote the set over all $k$, and so on. Thus $\{z_{ijk}\} = \{z_{ijk}\}_{ijk}$. \textit{This is not very clean; what we really want to say is $z_{ij}$ up to permutation. This is the same as a set for one index, but not for several indices. It also needs to be tuned to work with covariant functions...}
More interestingly, we can also use set based functions in the other direction, to lift. Given set based functions $f: (1) \rightarrow (0)$ and $g: (1) \rightarrow (0)$ on $S_{n}$, we can create a set consisting of $g_i = g(\{e_{ij}\}_{j})$ that transforms naturally under $S_n$. We can then apply $f$ to the set $\{g_i\}$ to obtain a function
\begin{equation} \label{eq::2 to 0 structure}
f(\{g(\{e_{ij}\}_j)\}_i) : (2) \rightarrow (0)
\end{equation}
As an example, let the edge weights be given by the adjacency matrix $A_{ij}$, and take $g$ to be the sum and $f$ the mean,
\begin{align}
g(M) &= \sum_{m \in M} m \;,\\
f(M) &= \frac{1}{|M|} \sum_{m \in M} m \;,
\end{align}
then
\begin{equation}
f(\{g(\{A_{ij}\}_j)\}_i) = \frac{1}{n} \sum_{i, j} A_{ij}
\end{equation}
is the average degree of the network described by $A$.
\subsection{A conjecture}
In \eqref{eq::2 to 0 structure}, $f$ and $g$ are set based functions, so we know that they can be expressed in the form \eqref{eq::1 to 0 structure}. Conjecture: every function $(2) \rightarrow (0)$ can be written in the form \eqref{eq::2 to 0 structure}.
Explicitly:
\begin{equation}
f(\{e_{ij}\}) = g_2\left(\sum_j h_2 \left(g_1 \left( \sum_i h_1(e_{ij})\right)\right)\right)
\end{equation}
By defining $h_{21} = h_2 \circ g_1$ this simplifies to
\begin{equation} \label{eq::2 to 0 structure conjecture}
f(\{e_{ij}\}) = g_2\left(\sum_j h_{21} \left( \sum_i h_1(e_{ij})\right)\right)
\end{equation}
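The average degree from the example above is a concrete instance of \eqref{eq::2 to 0 structure conjecture}: with $h_1 = \mathrm{id}$, $h_{21} = \mathrm{id}$ and $g_2(s) = s / n$ we get
\begin{align}
f(\{e_{ij}\}) = \frac{1}{n} \sum_j \sum_i e_{ij} \;,
\end{align}
the average degree when the $e_{ij}$ are the entries of the adjacency matrix.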
\textbf{What do we know about covariant functions $(1) \rightarrow (1)$?} A form like $y_i = g(x_i, h(\{x_j\}))$ seems natural...
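As a small sanity check on that guess, the centering map
\begin{align}
y_i = x_i - \frac{1}{n} \sum_j x_j
\end{align}
is covariant and fits the proposed form with $h$ the mean of the set and $g(a, b) = a - b$.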
\section{The dynamics on graphs perspective}
\subsection{An initial observation}
Take a discrete time linear diffusive dynamical system on a graph. The equations are given by
\begin{align}
x(t + 1) = \left( A \otimes 1 + B \otimes L \right) x(t)
\end{align}
with $L$ the effective network coupling, e.g.\ the graph Laplacian.
Writing $x$ in components $x_{ai}$, with $i$ denoting the node index and $a$ the internal indices, and with summation over repeated indices implied, this becomes
\begin{align}
x_{ai}(t + 1) &= A_{ab} x_{bi}(t) + B_{ab} L_{ij} x_{bj}(t) \\
x(t+1) &= A \cdot x(t) + B \cdot x(t) \cdot L^T
\end{align}
Assume for a moment that there is no local mixing, $A = 0$; then we have
\begin{align}
x(t+n) &= B^n \cdot x(t) \cdot (L^T)^n
\end{align}
Terms of this type are often a central ingredient of graph convolutional networks. With $A \neq 0$ we get something like recurrence.
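For comparison, here is a rough sketch of a typical graph convolutional layer in the notation used here; the exact normalization of $\hat A$ and the placement of the learned weight matrix $W$ vary between architectures, so this is only illustrative:
\begin{align}
x(t+1) = \sigma\left( W \cdot x(t) \cdot \hat A^T \right)
\end{align}
with $\hat A$ a fixed (typically normalized) adjacency matrix playing the role of $L$ and $\sigma$ a pointwise nonlinearity. Up to the nonlinearity this is exactly the $B \cdot x(t) \cdot L^T$ term above.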
This raises the possibility that more general dynamics on graphs could be of interest in the construction of GNNs, possibly including ODE or stochastic dynamics instead of discrete time maps.
It also raises the question whether there are interesting structure results for functions of type $(1, 2) \rightarrow (1)$, which serve as the right-hand side of a graph dynamical system.
One final note: the ARMA Filters for GNNs paper constructs a dynamical system that is supposed to converge to an interesting quantity. This is done by inserting a source into the dynamics above. Generally speaking, the idea that we can solve equations by using dynamics could be readily generalized further.
Finally, a different notion of graph dynamics is given by functions $(2) \rightarrow (2)$ that change the graph itself, for example by stochastic rewiring. It would be interesting to see whether these types of dynamics have any use in the construction of GNNs.
\subsection{Some thoughts on general structure theorems - linear case}
A dynamical system on graphs is specified by a function $(1, 2) \rightarrow (1)$, that is, a covariant function $f_i(\{e_{jk}\}, \{x_j\})$.
The dynamics are then either of
\begin{align}
\dot x_i(t) &= f_i(\{e_{jk}\}, \{x_j(t)\})\\
x_i(t + 1) &= f_i(\{e_{jk}\}, \{x_j(t)\})
\end{align}
The most typical covariant dynamics linear in $e$ and $x$ (separately) is given by
\begin{align}
x_i(t + 1) &= A \cdot x_i(t) + \sum_j B_1 e_{ij} B_2 x_j(t)
\end{align}
To obtain the genuinely most general form we would need to allow pure $e_{ij}$ terms and also consider $e_{ji} x_{j}$ terms.
Generically this can be rewritten by going to an eigenbasis of $B_1 \otimes B_2^T$. Calling the left and right eigenmatrices $B_k$ and $B_k^r$, we obtain
\begin{align}
x_i(t + 1) &= A \cdot x_i(t) + \sum_k \sum_j \Tr (e_{ij} B_k^r) B_k x_j(t)
\end{align}
Writing $(L_k)_{ij} = \Tr (e_{ij} B_k^r)$, this becomes a form of multilayer dynamics on the effective networks given by the $L_k$:
\begin{align}
x(t + 1) = \left( A \otimes 1 + \sum_k B_k \otimes L_k \right) x(t)
\end{align}
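As a consistency check, consider the simplest case of scalar edge data, $e_{ij} \in \mathbb{R}$. Then $B_1 e_{ij} B_2 = e_{ij} \, B_1 B_2$, the sum over $k$ collapses to a single term with $B = B_1 B_2$ and $L_{ij} = e_{ij}$, and we recover
\begin{align}
x(t + 1) = \left( A \otimes 1 + B \otimes L \right) x(t) \;,
\end{align}
the diffusive form of the initial observation, with $L$ now simply the matrix of edge weights.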
Graphs encode a notion of locality. Functions that act covariantly can very easily be defined through local dynamics.
Linear dynamics on the graph seem to cover most cases of Graph Neural Networks considered so far.
\section{Lifting to probabilities}
If we don't just want to look at functions from graphs to nodes, but also at functions of probability densities over graphs and nodes, an interesting potential connection to QFT arises. QFT studies linear state spaces built on tensor towers like the one considered above; specifically, bosonic and fermionic statistics concern the question of how the spaces considered transform under permutations.
However, this is a different permutation symmetry than the one we have above. In the space of $k$-index tensors $\X_k^{n^k}$, QFT considers permutations of the $k$ tensor factors rather than of the $n$ indices. So a direct application of, e.g., Fock spaces is not possible.
%
%However, the interest in QFT is not for an action of $S_n$ on the whole tower
%
%Let's consider set based functions first. We want functions to go from some (set of) $X$ to (a set of) $Y$.Let's consider the space of functions on $X$, $\mathbb{R}^X$ (subject to sensible conditions). Then a function $X$ to $Y$ induces a linear function $\mathbb{R}^X$ to $\mathbb{R}^Y$. The cartesian product of spaces $X$ and $Y$ on which the permutation acts directly becomes a linear representation of the permutaton group on the tensor products of $\mathbb{R}^X$ and $\mathbb{R}^Y$ respectively. These spaces contain the probability densities on the direct product space. They might be suitable to Bayesian extensions of learning set or graph based functions.
%
%This latter is pretty much the set up for a bosonic QFT. This suggests that we can adopt the machinery of QFT with creation and annihilation operators, actions and interaction terms, diagrams, etc... to the setting of set based functions. It might be interesting to see if (stochastic) gradient descent can be connected to renormalization.
%
%The big question is what the equivalent of these structures would be if we consider graph based functions.
\bibliography{biblio}
\end{document}