- Training
  - Baseline: `main.py` with `baseline` set to true (alternatively `run_main.sh`)
  - Subgraph: `main.py` (alternatively `run_main.sh`)
- Inference
  - Baseline: `inference_baseline.py` (alternatively `run_inference_baseline.sh`)
  - Subgraph: `inference.py` (alternatively `run_inference.sh`)
- Saved Models: models are stored in the `./save/` directory.
Refer to the following `.sh` files for examples:
- Training (both subgraph and baseline): `run_main.sh`
- Inference for baselines: `run_inference_baseline.sh`
- Inference for subgraphs: `run_inference.sh`
For dataset details, refer to the CSV file `dataset_info.csv`.
Install the dependencies with `pip install -r requirements.txt`.

- `dataset`: Dataset name
  - Node Classification: cora, citeseer, pubmed, dblp, Physics
  - Node Regression: chameleon, squirrel, crocodile
  - Graph Classification: ENZYMES, AIDS, PROTEINS
  - Graph Regression: QM9, ZINC (subset)
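The dataset list above fixes which `task` each dataset belongs to. As a minimal sketch, assuming dataset names are passed exactly as spelled in this README (and `ZINC` for the ZINC subset), the mapping can be written as a lookup table. `DATASETS_BY_TASK` and `task_for_dataset` are hypothetical names, not part of the repository.

```python
# Hypothetical lookup table built from the README's dataset list; the keys are
# the documented `task` values. Not part of the repository's code.
DATASETS_BY_TASK = {
    "node_cls": {"cora", "citeseer", "pubmed", "dblp", "Physics"},
    "node_reg": {"chameleon", "squirrel", "crocodile"},
    "graph_cls": {"ENZYMES", "AIDS", "PROTEINS"},
    "graph_reg": {"QM9", "ZINC"},  # assumption: the ZINC subset is named "ZINC"
}


def task_for_dataset(dataset: str) -> str:
    """Return the `task` value a dataset is listed under, or raise ValueError."""
    for task, names in DATASETS_BY_TASK.items():
        if dataset in names:
            return task
    raise ValueError(f"unknown dataset: {dataset!r}")


if __name__ == "__main__":
    print(task_for_dataset("pubmed"))  # node_cls
    print(task_for_dataset("QM9"))     # graph_reg
```

A check like this can catch mismatched `dataset`/`task` pairs before a long training run starts.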
- `experiment`: {fixed, random, few}
  - Specific to Node Classification; controls how nodes are split into train, validation and test sets.
  - fixed: cora, citeseer, pubmed
  - few: cora, citeseer, pubmed, dblp, Physics
  - random: dblp, Physics
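Not every split mode is listed for every dataset. The sketch below encodes the combinations documented above so they can be checked up front; `SPLITS_BY_EXPERIMENT` and `supported_experiments` are hypothetical helpers, and the sketch does not describe how the repository actually builds each split.

```python
from typing import List

# Documented `experiment` modes and the Node Classification datasets listed for
# each one. Hypothetical helper, not the repository's code.
SPLITS_BY_EXPERIMENT = {
    "fixed": {"cora", "citeseer", "pubmed"},
    "few": {"cora", "citeseer", "pubmed", "dblp", "Physics"},
    "random": {"dblp", "Physics"},
}


def supported_experiments(dataset: str) -> List[str]:
    """Return the `experiment` values documented for `dataset`, in a fixed order."""
    return [mode for mode in ("fixed", "random", "few")
            if dataset in SPLITS_BY_EXPERIMENT[mode]]


if __name__ == "__main__":
    print(supported_experiments("cora"))  # ['fixed', 'few']
    print(supported_experiments("dblp"))  # ['random', 'few']
```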
- `runs`: default = 20. Number of times to run the node-level task.
- `baseline`: default = True. Train the baseline model.
- `train_fitgnn`: default = False. Train the FIT_GNN model.
  - Note: if both `baseline` and `train_fitgnn` are set to true, `train_fitgnn` takes precedence.
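The precedence rule between `baseline` and `train_fitgnn` can be stated as a small function. This is a sketch using the documented defaults; `resolve_training_mode` is a hypothetical name, and raising an error when both flags are False is a choice made here because the README does not cover that case.

```python
def resolve_training_mode(baseline: bool = True, train_fitgnn: bool = False) -> str:
    """Pick which model to train, following the documented precedence:
    `train_fitgnn` wins whenever it is True, even if `baseline` is also True."""
    if train_fitgnn:
        return "fitgnn"
    if baseline:
        return "baseline"
    # Assumption: the README does not say what happens when both are False.
    raise ValueError("set at least one of baseline / train_fitgnn to True")


if __name__ == "__main__":
    print(resolve_training_mode())                  # baseline (documented defaults)
    print(resolve_training_mode(True, True))        # fitgnn
```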
- `exp_setup`: {Gc_train_2_Gs_infer, Gs_train_2_Gs_infer, Gc_train_2_Gs_train}. Type of experiment setup to run.
  - Gc_train_2_Gs_infer: train and validate on Gc, then test on Gs
  - Gs_train_2_Gs_infer: train, validate and test on Gs
  - Gc_train_2_Gs_train: train and validate on Gc, transfer the learned weights, then train, validate and test on Gs
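Each `exp_setup` value is an ordered pipeline over Gc and Gs. The sketch below writes the three documented setups as lists of (graph, phase) stages, reusing the README's `>>` notation when printing. `STAGES` and `describe` are hypothetical; the sketch only restates the order of steps and does not reproduce the repository's training loop.

```python
from typing import Dict, List, Tuple

# The three documented setups as ordered (graph, phase) stages.
# Hypothetical data structure for illustration only.
STAGES: Dict[str, List[Tuple[str, str]]] = {
    "Gc_train_2_Gs_infer": [("Gc", "train"), ("Gc", "val"), ("Gs", "test")],
    "Gs_train_2_Gs_infer": [("Gs", "train"), ("Gs", "val"), ("Gs", "test")],
    "Gc_train_2_Gs_train": [
        ("Gc", "train"), ("Gc", "val"),
        ("Gc->Gs", "transfer_weights"),  # learned weights carried over to Gs
        ("Gs", "train"), ("Gs", "val"), ("Gs", "test"),
    ],
}


def describe(exp_setup: str) -> str:
    """Render a setup as 'graph:phase >> graph:phase >> ...'."""
    if exp_setup not in STAGES:
        raise ValueError(f"unknown exp_setup: {exp_setup!r}")
    return " >> ".join(f"{graph}:{phase}" for graph, phase in STAGES[exp_setup])


if __name__ == "__main__":
    print(describe("Gc_train_2_Gs_infer"))  # Gc:train >> Gc:val >> Gs:test
```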
- `extra_node`: {True, False}. Boolean parameter to train the model with extra nodes incorporated.
- `cluster_node`: {True, False}. Boolean parameter to train the model with cluster nodes incorporated.
- `coarsening_ratio`: [0, 1]. Extent of coarsening: values near 0 create fewer subgraphs with more nodes in each, while values near 1 create a large number of subgraphs with fewer nodes in each.
- `coarsening_method`: {variation_neighborhoods, algebraic_JC, affinity_GS, kron}. Method used to coarsen graphs into subgraphs.
- `output_dir`: Directory in which to save the best model.
- `task`: {node_cls, node_reg, graph_cls, graph_reg}. Type of node-level or graph-level task to perform.
- `multi_prop`: {True, False}. Boolean parameter specific to the QM9 dataset (Graph Regression task). Set it to True when running experiments on QM9, otherwise False.
- `property`: {0, 1, ... , 18}. Specific to the QM9 dataset (Graph Regression task). Index of the target to predict, one of QM9's 19 targets.
- `hidden`: default = 512. Number of hidden units (hidden dimension) in each GNN layer.
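The `property` value selects one of QM9's 19 regression targets. A minimal sketch, assuming the targets are available as a molecules × 19 table (one row per molecule): the helper `select_target` is hypothetical, and the argument is named `prop` here to avoid shadowing Python's built-in `property`.

```python
from typing import List, Sequence

NUM_QM9_TARGETS = 19  # the README lists property values 0..18


def select_target(targets: Sequence[Sequence[float]], prop: int) -> List[float]:
    """Return column `prop` of a (molecules x 19) target table.

    Assumption: each row holds the 19 QM9 targets of one molecule."""
    if not 0 <= prop < NUM_QM9_TARGETS:
        raise ValueError(f"property must be in 0..{NUM_QM9_TARGETS - 1}, got {prop}")
    return [row[prop] for row in targets]


if __name__ == "__main__":
    # Two fake molecules with made-up target values, for illustration only.
    table = [list(range(19)), [10 * i for i in range(19)]]
    print(select_target(table, 3))  # [3, 30]
```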
- `epochs1`: default = 100. Specific to the Gc_train_2_Gs_infer `exp_setup`. Number of epochs to train on Gc.
- `epochs2`: default = 300. Specific to the Gs_train_2_Gs_infer `exp_setup`. Number of epochs to train on Gs.
- `num_layers1`: default = 2. Specific to the Gc_train_2_Gs_infer `exp_setup`. Number of layers in the model trained on Gc.
- `num_layers2`: default = 2. Specific to the Gs_train_2_Gs_infer `exp_setup`. Number of layers in the model trained on Gs.
- `train_ratio`: [0, 1], default = 0.3. Specific to graph-level tasks. Fraction of the dataset's graphs reserved for training.
- `val_ratio`: [0, 1], default = 0.2. Specific to graph-level tasks. Fraction of the dataset's graphs reserved for validation.
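With the default ratios, 30% of the graphs go to training and 20% to validation. The sketch below assumes that the remaining graphs (50% by default) form the test set and that graphs are shuffled with a seeded random permutation before splitting; both are assumptions, and `split_graphs` is a hypothetical helper rather than the repository's splitting code.

```python
import random
from typing import List, Tuple


def split_graphs(num_graphs: int, train_ratio: float = 0.3, val_ratio: float = 0.2,
                 seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle graph indices and cut them into train / val / test lists.

    Assumption: whatever is left after train and val becomes the test set."""
    if not (0 <= train_ratio <= 1 and 0 <= val_ratio <= 1
            and train_ratio + val_ratio <= 1):
        raise ValueError("ratios must lie in [0, 1] and sum to at most 1")
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_train = int(train_ratio * num_graphs)  # floor of the requested count
    n_val = int(val_ratio * num_graphs)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    tr, va, te = split_graphs(100)
    print(len(tr), len(va), len(te))  # 30 20 50
```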
- `use_community_detection`: default = False. If True, the Leiden algorithm is used to detect the top k communities, from which a proxy graph of a large graph is constructed.
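Once community labels exist, keeping the "top k communities" amounts to ranking communities by size and keeping their nodes. The sketch assumes a membership list (community id per node) such as a Leiden run would produce; with python-igraph, `Graph.community_leiden(...)` returns a clustering whose `.membership` has this form, but the clustering call is not shown and is not claimed to be what the repository uses. Treating the proxy graph as the subgraph induced by the returned nodes is likewise an assumption, and `top_k_community_nodes` is a hypothetical name.

```python
from collections import Counter
from typing import Sequence, Set


def top_k_community_nodes(membership: Sequence[int], k: int) -> Set[int]:
    """Return the nodes that belong to the k largest communities.

    `membership[v]` is the community id of node v (assumed to come from a
    community-detection step such as Leiden, not shown here)."""
    if k <= 0:
        raise ValueError("k must be positive")
    sizes = Counter(membership)
    top = {cid for cid, _ in sizes.most_common(k)}  # ids of the k biggest communities
    return {v for v, cid in enumerate(membership) if cid in top}


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 2, 0, 1, 3]  # community sizes: 0 -> 4, 1 -> 3, 2 -> 1, 3 -> 1
    print(sorted(top_k_community_nodes(labels, 2)))  # [0, 1, 2, 3, 4, 6, 7]
```

The returned node set could then be used to take an induced subgraph of the large graph as its proxy.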