Real-World Graph Structures
polygraph.datasets.DobsonDoigGraphDataset
Bases: SplitGraphDataset
Dataset of protein graphs originally introduced by Dobson and Doig [1].
This dataset was later adopted by You et al. [2] in the area of graph generation. The splits we provide are disjoint, unlike in [2]. We use the splitting strategy proposed in [3].

Dataset statistics:
| Metric | Train | Val | Test |
|---|---|---|---|
| # of Graphs | 587 | 147 | 184 |
| Min # of Nodes | 100 | 107 | 101 |
| Max # of Nodes | 500 | 498 | 490 |
| Avg # of Nodes | 261.54 | 252.14 | 250.73 |
| Min # of Edges | 213 | 215 | 186 |
| Max # of Edges | 1575 | 1392 | 1397 |
| Avg # of Edges | 658.29 | 629.75 | 621.89 |
| Edge/Node Ratio | 2.52 | 2.50 | 2.48 |
| Is Undirected | True | True | True |
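
The Edge/Node Ratio row appears to be the average number of edges divided by the average number of nodes. This is an inference from the numbers in the table, not something the documentation states. A minimal check, with the averages copied from the table:

```python
# Averages copied from the DobsonDoigGraphDataset statistics table.
avg_nodes = {"train": 261.54, "val": 252.14, "test": 250.73}
avg_edges = {"train": 658.29, "val": 629.75, "test": 621.89}

# Assumption: ratio = (average # of edges) / (average # of nodes),
# rounded to two decimals as in the table.
ratios = {
    split: round(avg_edges[split] / avg_nodes[split], 2)
    for split in avg_nodes
}
print(ratios)  # {'train': 2.52, 'val': 2.5, 'test': 2.48}
```

The same relation holds for the other datasets in this section when their averages are substituted.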
Graph attributes
- residues: Node-level attribute indicating the amino acid types.
- is_enyzme: Graph-level attribute indicating whether the protein is an enzyme (1 or 2).
References
[1] Dobson, P., & Doig, A. (2003). Distinguishing enzyme structures from non-enzymes without alignments. Journal of Molecular Biology, 330(4):771–783.
[2] You, J., Ying, R., Ren, X., Hamilton, W., & Leskovec, J. (2018). GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In International Conference on Machine Learning (ICML).
[3] Martinkus, K., Loukas, A., Perraudin, N., & Wattenhofer, R. (2022). SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators. In Proceedings of the 39th International Conference on Machine Learning (ICML).
polygraph.datasets.EgoGraphDataset
Bases: SplitGraphDataset
Dataset of ego networks extracted from Citeseer [1], introduced by You et al. [2].
The graphs are 3-hop ego networks with 50 to 399 nodes.
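
A k-hop ego network is the subgraph induced by all nodes within k hops of a chosen center node. The following is a dependency-free sketch of that idea using breadth-first search. It is not the extraction code used to build this dataset; the helper names `ego_nodes` and `ego_graph` and the toy path graph are hypothetical.

```python
from collections import deque


def ego_nodes(adjacency, center, radius=3):
    """Return the set of nodes within `radius` hops of `center` (BFS)."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


def ego_graph(adjacency, center, radius=3):
    """Induced subgraph on the radius-hop neighborhood of `center`."""
    keep = ego_nodes(adjacency, center, radius)
    return {u: sorted(v for v in adjacency[u] if v in keep) for u in sorted(keep)}


# Toy path graph 0-1-2-3-4-5 (hypothetical data, not taken from Citeseer).
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
ego = ego_graph(path, center=0, radius=3)
print(sorted(ego))  # [0, 1, 2, 3]
```

With NetworkX, `nx.ego_graph(G, n, radius=3)` computes the same induced neighborhood. Restricting the result to graphs with 50 to 399 nodes would be a separate filtering step.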

Dataset statistics:
| Metric | Train | Val | Test |
|---|---|---|---|
| # of Graphs | 454 | 151 | 152 |
| Min # of Nodes | 50 | 50 | 50 |
| Max # of Nodes | 399 | 333 | 364 |
| Avg # of Nodes | 141.72 | 139.29 | 158.08 |
| Min # of Edges | 64 | 56 | 63 |
| Max # of Edges | 1066 | 898 | 1004 |
| Avg # of Edges | 325.16 | 321.87 | 369.30 |
| Edge/Node Ratio | 2.29 | 2.31 | 2.34 |
| Is Undirected | True | True | True |
References
[1] Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., & Eliassi-Rad, T. (2008). Collective classification in network data. AI Magazine, 29(3):93.
[2] You, J., Ying, R., Ren, X., Hamilton, W., & Leskovec, J. (2018). GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In International Conference on Machine Learning (ICML).
polygraph.datasets.SmallEgoGraphDataset
Bases: SplitGraphDataset
Dataset of smaller ego networks extracted from Citeseer.
The graphs of this dataset have at most 18 nodes.

Dataset statistics:
| Metric | Train | Val | Test |
|---|---|---|---|
| # of Graphs | 120 | 40 | 40 |
| Min # of Nodes | 4 | 4 | 4 |
| Max # of Nodes | 17 | 17 | 16 |
| Avg # of Nodes | 5.92 | 6.32 | 7.05 |
| Min # of Edges | 3 | 3 | 3 |
| Max # of Edges | 30 | 22 | 55 |
| Avg # of Edges | 7.38 | 7.61 | 10.73 |
| Edge/Node Ratio | 1.25 | 1.20 | 1.52 |
| Is Undirected | True | True | True |
polygraph.datasets.PointCloudGraphDataset
Bases: SplitGraphDataset
Dataset of kNN-graphs of point clouds, proposed by Neumann et al. [1].

Dataset statistics:
| Metric | Train | Val | Test |
|---|---|---|---|
| # of Graphs | 26 | 7 | 8 |
| Min # of Nodes | 134 | 188 | 286 |
| Max # of Nodes | 5037 | 3848 | 2796 |
| Avg # of Nodes | 1491.38 | 1294.71 | 1078.62 |
| Min # of Edges | 320 | 424 | 656 |
| Max # of Edges | 10886 | 8730 | 5991 |
| Avg # of Edges | 3320.69 | 2911.71 | 2413.00 |
| Edge/Node Ratio | 2.23 | 2.25 | 2.24 |
| Is Undirected | True | True | True |
Graph attributes
- coords: node-level feature describing the 3D coordinates of the point cloud.
- object_class: graph-level attribute describing the object represented by the point cloud.
References
[1] Neumann, M., Moreno, P., Antanas, L., Garnett, R., & Kersting, K. (2013). Graph kernels for object category prediction in task-dependent robot grasping. In International Workshop on Mining and Learning with Graphs at KDD.
polygraph.datasets.ModelNet10GraphDataset
Bases: SplitGraphDataset
Dataset of kNN-graphs sampled from objects in ModelNet10 by Wu et al. [1].
The graphs are constructed by sampling a random number of points on the object's surface and computing a 4-NN graph.
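
The construction can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the dataset: random points in the unit cube stand in for surface samples, the helper `knn_graph` is hypothetical, and an undirected edge is assumed to exist whenever either endpoint has the other among its 4 nearest neighbors.

```python
import math
import random


def knn_graph(points, k=4):
    """Undirected kNN graph: connect each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges


rng = random.Random(0)
# Hypothetical stand-in for surface samples: random points in the unit cube.
points = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
edges = knn_graph(points, k=4)
print(len(edges) / len(points))  # lies between 2 and 4 for k=4
```

Under this symmetrization rule, each node contributes 4 neighbor links and mutual links are counted once, so the edge/node ratio falls between 2 and 4. The ratio of about 2.46 reported in this dataset's statistics is within that range. The brute-force neighbor search is quadratic in the number of points; a spatial index would be preferable for graphs with thousands of nodes.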

Dataset statistics:
| Metric | Train | Val | Test |
|---|---|---|---|
| # of Graphs | 3592 | 399 | 908 |
| Min # of Nodes | 60 | 66 | 60 |
| Max # of Nodes | 4996 | 4931 | 4988 |
| Avg # of Nodes | 2222.65 | 2262.82 | 2096.08 |
| Min # of Edges | 149 | 165 | 148 |
| Max # of Edges | 12404 | 12286 | 12265 |
| Avg # of Edges | 5462.61 | 5568.30 | 5153.20 |
| Edge/Node Ratio | 2.46 | 2.46 | 2.46 |
| Is Undirected | True | True | True |
Graph attributes
- coords: node-level feature describing the 3D coordinates of the point cloud.
- object_class: graph-level attribute describing the object represented by the point cloud.
References
[1] Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3D ShapeNets: A Deep Representation for Volumetric Shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1912-1920).