So, if I understand you correctly, it would be something like this. Inputs:
1. {x_0, x_1, ...} - nodes in the graph, say users
2. A bunch of edges like {x_i, x_j}, say social connections
3. Some raw or processed features on nodes, say the text of posts, the age of the account, or some NLP-based scores for posts
4. Some raw or processed features on the edges (in particular, maybe some coloring)
Before: people would train various classifiers/regressors directly on the nodes and/or edges, then maybe use the graph structure to propagate the scores.
After: instead, you train whatever objective you have end-to-end from the raw features on the nodes and edges, with some extra message passing between them. For example, train (some of) that NLP-based classifier jointly with the graph part. The benefit is that the NLP part can learn to extract signals that are useful for determining the properties of neighbors, even if they are not particularly useful for the current node/edge itself.
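To make that concrete, here is a minimal PyTorch sketch of what I have in mind (all the names here — MessagePassingLayer, JointModel, the EmbeddingBag stand-in for the NLP model — are placeholders I made up, not anything from an existing system). The point is just that the text encoder produces the node features, so the node-level loss backpropagates through the message passing into the NLP part:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of message passing: each node aggregates messages from its
    neighbors (conditioned on the edge features) and mixes them with its own state."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.msg = nn.Linear(node_dim + edge_dim, node_dim)  # message from (neighbor, edge)
        self.upd = nn.Linear(2 * node_dim, node_dim)         # combine self-state + aggregated messages

    def forward(self, h, edge_index, edge_feat):
        # h: [num_nodes, node_dim]; edge_index: [2, num_edges] (src, dst); edge_feat: [num_edges, edge_dim]
        src, dst = edge_index
        m = torch.relu(self.msg(torch.cat([h[src], edge_feat], dim=-1)))
        agg = torch.zeros_like(h).index_add_(0, dst, m)      # sum incoming messages per destination node
        return torch.relu(self.upd(torch.cat([h, agg], dim=-1)))

class JointModel(nn.Module):
    """Toy NLP encoder for post text trained jointly with the graph part, so
    gradients from the node objective flow back into the text features."""
    def __init__(self, vocab_size, node_dim, edge_dim, num_layers=2):
        super().__init__()
        self.text_encoder = nn.EmbeddingBag(vocab_size, node_dim)  # stand-in for a real NLP model
        self.layers = nn.ModuleList(
            MessagePassingLayer(node_dim, edge_dim) for _ in range(num_layers))
        self.head = nn.Linear(node_dim, 1)                          # e.g. a per-node score

    def forward(self, token_ids, offsets, edge_index, edge_feat):
        h = self.text_encoder(token_ids, offsets)   # raw post text -> node features
        for layer in self.layers:
            h = layer(h, edge_index, edge_feat)
        return self.head(h).squeeze(-1)
```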
Question - what's the maximum range of such message passing? Sounds a bit like an RNN, where the unroll depth can be an issue. Though in practice most graphs have a low average path length, so maybe this is not a particularly big problem.
Although if you start unrolling over the graph, each extra hop pulls in more neighbors and you very quickly end up loading ~everything, so I guess training must be completely reworked (flush intermediate data to distributed storage frequently, then shuffle for the next step), or you cannot unroll further than maybe a few steps.
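Rough illustration of why the unrolling blows up (pure-Python sketch, the function name is mine): a k-layer model needs the full k-hop neighborhood of each training node, and with average degree d that neighborhood grows roughly like d^k until it covers most of the graph.

```python
from collections import defaultdict

def k_hop_neighborhood_sizes(edges, seed, max_hops):
    """Count how many nodes a k-layer message-passing model would need to load
    for a single seed node: the full k-hop neighborhood, hop by hop."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    visited, frontier, sizes = {seed}, {seed}, []
    for _ in range(max_hops):
        frontier = {w for u in frontier for w in adj[u]} - visited
        visited |= frontier
        sizes.append(len(visited))  # total nodes pulled in after this hop
    return sizes

# With average degree d, sizes grow roughly like d**k until the whole graph
# is covered, which is why a few hops is often the practical limit.
```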
Before: People might precompute graph scores (PageRank, ...) and use them as features for tabular NNs. Or use simpler but slow GNNs like GraphSAGE because the domain fit was great (e.g. Pinterest social recs).
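For concreteness, the "Before" pattern looks roughly like this (a toy sketch; networkx's karate-club graph stands in for a real social graph): the graph scores are computed offline and the NN only ever sees them as fixed feature columns, so no gradient reaches the graph structure.

```python
import networkx as nx
import torch
import torch.nn as nn

# "Before": precompute graph-level scores offline and treat them as just
# another column in a tabular model.
G = nx.karate_club_graph()          # placeholder graph
pr = nx.pagerank(G)                 # {node: PageRank score}
deg = dict(G.degree())

X = torch.tensor([[pr[n], float(deg[n])] for n in G.nodes()], dtype=torch.float32)
mlp = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
scores = mlp(X)                     # plain tabular NN on frozen graph features
```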
After: handling heterogeneity and scale for graphs that fit in CPU RAM (~1 TB) with decent GPUs.
Re: unrolling, yeah, there's a bunch of papers there :) sampling, artificial jump edges, and adversarial techniques have been helping with aspects of generalization (far-away data, unbalanced data, ...)
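A sketch of the sampling idea (GraphSAGE-style fixed fan-out; the function and argument names are mine): cap how many neighbors you keep at each hop, so the per-example subgraph is bounded by len(seeds) * prod(fanouts) regardless of the full graph size.

```python
import random

def sample_subgraph(adj, seeds, fanouts):
    """Neighbor sampling with a fixed fan-out per hop.
    adj: dict mapping node -> list of neighbors; seeds: training nodes for this
    minibatch; fanouts: max neighbors kept per node at each hop."""
    nodes, frontier = set(seeds), list(seeds)
    sampled_edges = []
    for fanout in fanouts:
        next_frontier = []
        for u in frontier:
            neigh = adj.get(u, [])
            for v in random.sample(neigh, min(fanout, len(neigh))):
                sampled_edges.append((u, v))
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, sampled_edges
```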