Real-World Graph Structures

polygraph.datasets.DobsonDoigGraphDataset

Bases: SplitGraphDataset

Dataset of protein graphs originally introduced by Dobson and Doig [1].

This dataset was later adopted for graph generation by You et al. [2]. Unlike the splits in [2], the splits we provide are disjoint; we use the splitting strategy proposed in [3].
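A minimal sketch of how a disjoint three-way split can be produced. It assumes the SPECTRE [3] proportions are 20% test, with the remaining graphs divided 80/20 into train and validation; that reading, the function name `spectre_style_split` and the seed are hypothetical and not part of polygraph. Because the three splits are slices of one shuffled index list, no graph can land in two of them.

```python
import random


def spectre_style_split(n_graphs, seed=1234):
    """Partition graph indices into disjoint train/val/test index lists.

    Assumption: 20% of the graphs go to test and the remainder is split
    80/20 into train/val. This is a hypothetical helper, not polygraph code.
    """
    indices = list(range(n_graphs))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_test = round(0.2 * n_graphs)
    n_train = round(0.8 * (n_graphs - n_test))

    # Consecutive slices of a single permutation are disjoint by construction.
    test = indices[:n_test]
    train = indices[n_test:n_test + n_train]
    val = indices[n_test + n_train:]
    return train, val, test


train, val, test = spectre_style_split(918)
print(len(train), len(val), len(test))  # 587 147 184
```

With 918 graphs these proportions give 587/147/184, matching the table below; with 41 graphs they give 26/7/8, the counts listed for PointCloudGraphDataset.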

First 3 graphs

Dataset statistics:

Metric            Train    Val      Test
# of Graphs       587      147      184
Min # of Nodes    100      107      101
Max # of Nodes    500      498      490
Avg # of Nodes    261.54   252.14   250.73
Min # of Edges    213      215      186
Max # of Edges    1575     1392     1397
Avg # of Edges    658.29   629.75   621.89
Edge/Node Ratio   2.52     2.50     2.48
Is Undirected     True     True     True
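The Edge/Node Ratio row is consistent with the average number of edges divided by the average number of nodes (for the training split, 658.29 / 261.54 ≈ 2.52). The sketch below computes the table's statistics for one split under that assumption; the `(num_nodes, edge_list)` input format and the name `split_statistics` are hypothetical, not polygraph's data structures.

```python
def split_statistics(graphs):
    """Compute table-style statistics for one split.

    Each graph is assumed to be a (num_nodes, edges) pair, where `edges` is a
    list of (u, v) tuples with every undirected edge stored exactly once.
    """
    node_counts = [n for n, _ in graphs]
    edge_counts = [len(edges) for _, edges in graphs]
    avg_nodes = sum(node_counts) / len(graphs)
    avg_edges = sum(edge_counts) / len(graphs)
    return {
        "num_graphs": len(graphs),
        "min_nodes": min(node_counts),
        "max_nodes": max(node_counts),
        "avg_nodes": avg_nodes,
        "min_edges": min(edge_counts),
        "max_edges": max(edge_counts),
        "avg_edges": avg_edges,
        # Ratio of the two averages, not the mean of per-graph ratios.
        "edge_node_ratio": avg_edges / avg_nodes,
    }


# A triangle (3 nodes, 3 edges) and a path on 5 nodes (4 edges).
graphs = [(3, [(0, 1), (1, 2), (0, 2)]), (5, [(0, 1), (1, 2), (2, 3), (3, 4)])]
stats = split_statistics(graphs)
print(stats["edge_node_ratio"])  # 0.875
```

The ratio of averages (0.875 here) differs from the mean of the per-graph ratios (0.9), so the choice matters when comparing against the table.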
Graph attributes
  • residues: Node-level attribute indicating the amino acid type of each residue.
  • is_enyzme: Graph-level attribute indicating whether the protein is an enzyme (encoded as 1 or 2).
References

[1] Dobson, P., & Doig, A. (2003). Distinguishing enzyme structures from non-enzymes without alignments. Journal of Molecular Biology, 330(4):771–783.

[2] You, J., Ying, R., Ren, X., Hamilton, W., & Leskovec, J. (2018). GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In International Conference on Machine Learning (ICML).

[3] Martinkus, K., Loukas, A., Perraudin, N., & Wattenhofer, R. (2022). SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators. In Proceedings of the 39th International Conference on Machine Learning (ICML).

polygraph.datasets.EgoGraphDataset

Bases: SplitGraphDataset

Dataset of ego networks extracted from Citeseer [1], introduced by You et al. [2].

The graphs are 3-hop ego networks with 50 to 399 nodes.
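A 3-hop ego network is the subgraph induced on all nodes within three hops of a center node. The sketch below finds those nodes with breadth-first search over a plain adjacency dict; networkx offers the same operation as `networkx.ego_graph(G, n, radius=3)`. How EgoGraphDataset chooses its center nodes and enforces the 50 to 399 node range is not described here, so the name `ego_network` is hypothetical and any size filtering is left to the caller.

```python
from collections import deque


def ego_network(adjacency, center, radius=3):
    """Return the node set and edge set of the ego network around `center`.

    `adjacency` maps each node to an iterable of its neighbors (undirected
    graph, integer node labels). Nodes within `radius` hops are found by
    breadth-first search; the ego network is the subgraph they induce.
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the hop limit
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # Induced subgraph: keep every edge whose endpoints both lie in the ball,
    # storing each undirected edge once as (smaller, larger).
    edges = {
        (min(u, v), max(u, v))
        for u in nodes
        for v in adjacency[u]
        if v in nodes
    }
    return nodes, edges


# Path graph 0-1-2-3-4-5: the 3-hop ego network of node 0 stops at node 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(sorted(ego_network(path, 0)[0]))  # [0, 1, 2, 3]
```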

First 3 graphs

Dataset statistics:

Metric            Train    Val      Test
# of Graphs       454      151      152
Min # of Nodes    50       50       50
Max # of Nodes    399      333      364
Avg # of Nodes    141.72   139.29   158.08
Min # of Edges    64       56       63
Max # of Edges    1066     898      1004
Avg # of Edges    325.16   321.87   369.30
Edge/Node Ratio   2.29     2.31     2.34
Is Undirected     True     True     True
References

[1] Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., & Eliassi-Rad, T. (2008). Collective classification in network data. AI Magazine, 29(3):93.

[2] You, J., Ying, R., Ren, X., Hamilton, W., & Leskovec, J. (2018). GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In International Conference on Machine Learning (ICML).

polygraph.datasets.SmallEgoGraphDataset

Bases: SplitGraphDataset

Dataset of smaller ego networks extracted from Citeseer.

The graphs of this dataset have at most 18 nodes.

First 3 graphs

Dataset statistics:

Metric            Train    Val      Test
# of Graphs       120      40       40
Min # of Nodes    4        4        4
Max # of Nodes    17       17       16
Avg # of Nodes    5.92     6.32     7.05
Min # of Edges    3        3        3
Max # of Edges    30       22       55
Avg # of Edges    7.38     7.61     10.73
Edge/Node Ratio   1.25     1.20     1.52
Is Undirected     True     True     True

polygraph.datasets.PointCloudGraphDataset

Bases: SplitGraphDataset

Dataset of kNN graphs of point clouds, proposed by Neumann et al. [1].

First 3 graphs

Dataset statistics:

Metric            Train    Val      Test
# of Graphs       26       7        8
Min # of Nodes    134      188      286
Max # of Nodes    5037     3848     2796
Avg # of Nodes    1491.38  1294.71  1078.62
Min # of Edges    320      424      656
Max # of Edges    10886    8730     5991
Avg # of Edges    3320.69  2911.71  2413.00
Edge/Node Ratio   2.23     2.25     2.24
Is Undirected     True     True     True
Graph attributes
  • coords: Node-level feature giving the 3D coordinates of each point in the point cloud.
  • object_class: Graph-level attribute giving the category of the object represented by the point cloud.
References

[1] Neumann, M., Moreno, P., Antanas, L., Garnett, R., & Kersting, K. (2013). Graph kernels for object category prediction in task-dependent robot grasping. In International Workshop on Mining and Learning with Graphs at KDD.

polygraph.datasets.ModelNet10GraphDataset

Bases: SplitGraphDataset

Dataset of kNN graphs constructed from objects in ModelNet10, the 3D shape benchmark introduced by Wu et al. [1].

The graphs are constructed by sampling a random number of points on the object's surface and computing a 4-NN graph.
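A brute-force sketch of the 4-NN construction for a list of 3D points. The symmetrization rule (keep an edge if either endpoint selects the other), the name `knn_graph` and the input format are assumptions; sampling points from the object's surface is not shown.

```python
import math


def knn_graph(points, k=4):
    """Build an undirected k-nearest-neighbor graph over 3D points.

    Each point is linked to its k nearest other points (Euclidean distance).
    The directed kNN relation is then symmetrized: an undirected edge is kept
    if either endpoint lists the other among its k nearest neighbors.
    Brute force, O(n^2 log n): fine for a sketch, too slow for large clouds.
    """
    edges = set()
    for i, p in enumerate(points):
        # Sort all other points by distance to p (ties broken by index).
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
          (0.0, 0.0, 3.0), (4.0, 4.0, 0.0), (5.0, 0.0, 5.0)]
edges = knn_graph(points, k=4)
print(len(edges) / len(points))  # edge/node ratio of this toy graph
```

Under this rule every node has degree at least k, and each undirected edge is added by one or both of its endpoints, so nk/2 ≤ |E| ≤ nk. For k = 4 the edge/node ratio therefore lies between 2 and 4, consistent with the 2.46 reported in the table.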

First 3 graphs

Dataset statistics:

Metric            Train    Val      Test
# of Graphs       3592     399      908
Min # of Nodes    60       66       60
Max # of Nodes    4996     4931     4988
Avg # of Nodes    2222.65  2262.82  2096.08
Min # of Edges    149      165      148
Max # of Edges    12404    12286    12265
Avg # of Edges    5462.61  5568.30  5153.20
Edge/Node Ratio   2.46     2.46     2.46
Is Undirected     True     True     True
Graph attributes
  • coords: Node-level feature giving the 3D coordinates of each point in the point cloud.
  • object_class: Graph-level attribute giving the category of the object represented by the point cloud.
References

[1] Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3D ShapeNets: A Deep Representation for Volumetric Shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1912-1920).