PolyGraph
PolyGraph is a Python library for evaluating graph generative
models.
With polygraph, evaluating a generative model becomes as easy as this:
import networkx as nx
from polygraph.datasets import PlanarGraphDataset
from polygraph.metrics import GaussianTVMMD2Benchmark
reference = PlanarGraphDataset("test").to_nx()
benchmark = GaussianTVMMD2Benchmark(reference)
generated = [nx.erdos_renyi_graph(64, 0.1) for _ in range(40)]
print(benchmark.compute(generated))
# {'orbit': 1.3305546735190608, 'clustering': 0.2799915534527712,
#  'degree': 0.07563928348299709, 'spectral': 0.07841922146118052}
Installation
You may install this package via:
pip install polygraph-benchmark
No manual compilation of ORCA is required.
For details on the interaction with the graph_tool package, see the more detailed installation instructions.
Usage
We provide a few basic tutorials:
- Basic Usage - How to load datasets and compute metrics
- Metrics Overview - An overview of which metrics are implemented in polygraph (MMD, PGD, VUN, Frechet Distance)
- Custom Datasets - How to build custom datasets and share them
PGD vs MMD overview
PolyGraph Discrepancy (PGD) is our proposed metric for graph generative model evaluation. Compared to maximum mean discrepancy (MMD), PGD provides a bounded range, an intrinsic scale, and a principled way to compare and aggregate across descriptors.
| Property | MMD | PGD |
|---|---|---|
| Range | [0, ∞) | [0, 1] |
| Intrinsic Scale | ❌ | ✅ |
| Descriptor Comparison | ❌ | ✅ |
| Multi-Descriptor Aggregation | ❌ | ✅ |
| Single Ranking | ❌ | ✅ |
PGD and its motivation are described in more detail in the paper and API docs.
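To make the MMD side of this comparison concrete, here is a minimal, self-contained sketch of a biased MMD² estimate between degree histograms, using a Gaussian kernel on the total variation distance. It is an illustration in plain Python, not PolyGraph's implementation: the helper names (`degree_histogram`, `gaussian_tv_kernel`, `mmd2_biased`) and the bandwidth `sigma=1.0` are assumptions chosen for this example.

```python
import math
from collections import Counter


def degree_histogram(edges, num_nodes):
    """Normalized degree histogram of an undirected graph given as an edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # counts[d] = number of nodes with degree d (isolated nodes have degree 0)
    counts = Counter(degree[n] for n in range(num_nodes))
    return [counts[d] / num_nodes for d in range(max(counts) + 1)]


def gaussian_tv_kernel(p, q, sigma=1.0):
    """Gaussian kernel on the total variation distance between two histograms."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))  # zero-pad to a common length
    q = q + [0.0] * (size - len(q))
    tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return math.exp(-(tv * tv) / (2 * sigma * sigma))


def mmd2_biased(xs, ys, kernel):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        return sum(kernel(x, y) for x in a for y in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


# Toy data (hypothetical): path graphs as "reference", star graphs as "generated".
path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
reference = [degree_histogram(path, 4) for _ in range(3)]
generated = [degree_histogram(star, 4) for _ in range(3)]

same = mmd2_biased(reference, reference, gaussian_tv_kernel)
different = mmd2_biased(reference, generated, gaussian_tv_kernel)
print(same, different)
```

The value obtained this way depends on the chosen kernel and its bandwidth, which is one way to see why raw MMD values lack an intrinsic scale.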
Benchmarking snapshot
The table below shows an example benchmark generated with this library across multiple datasets and models. Values illustrate typical outputs of the newly proposed PolyGraph Discrepancy (PGD). They are scaled by 100 for legibility, and subsampling is used to obtain standard deviations (using StandardPGDInterval and MoleculePGDInterval). For completeness, this library implements, and our paper reports, various MMD estimates on the datasets below. More details are provided in our paper.
| Method | Planar-L | Lobster-L | SBM-L | Proteins | Guacamol | Moses |
|---|---|---|---|---|---|---|
| AutoGraph | 34.0 ± 1.8 | 18.0 ± 1.6 | 5.6 ± 1.5 | 67.7 ± 7.4 | 22.9 ± 0.5 | 29.6 ± 0.4 |
| AutoGraph* | — | — | — | — | 10.4 ± 1.2 | — |
| DiGress | 45.2 ± 1.8 | 3.2 ± 2.6 | 17.4 ± 2.3 | 88.1 ± 3.1 | 32.7 ± 0.5 | 33.4 ± 0.5 |
| GRAN | 99.7 ± 0.2 | 85.4 ± 0.5 | 69.1 ± 1.4 | 89.7 ± 2.7 | — | — |
| ESGG | 45.0 ± 1.4 | 69.9 ± 0.6 | 99.4 ± 0.2 | 79.2 ± 4.3 | — | — |
AutoGraph* denotes a variant that leverages additional training heuristics as described in the paper.
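The "± standard deviation" entries in the table come from subsampling. As a rough sketch of that idea only (this is not the code behind `StandardPGDInterval` or `MoleculePGDInterval`, whose internals are not shown here), one can evaluate a metric on repeated random subsets and report the mean and standard deviation, scaled by 100. The function `subsampled_interval`, its parameters, and `toy_metric` are hypothetical names introduced for this example, and plain numbers stand in for graphs.

```python
import random
import statistics


def subsampled_interval(metric, reference, generated, subsample_size,
                        num_samples=10, seed=0):
    """Mean and standard deviation of `metric` over random subsamples.

    `metric` is any callable taking (reference_subset, generated_subset)
    and returning a float.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    values = []
    for _ in range(num_samples):
        ref_subset = rng.sample(reference, subsample_size)
        gen_subset = rng.sample(generated, subsample_size)
        values.append(metric(ref_subset, gen_subset))
    return statistics.mean(values), statistics.stdev(values)


def toy_metric(ref, gen):
    """Hypothetical stand-in metric: absolute difference of sample means."""
    return abs(statistics.mean(ref) - statistics.mean(gen))


# Toy data (hypothetical): numbers in [0, 1] play the role of graphs.
reference = [i / 100 for i in range(100)]
generated = [0.5 * x for x in reference]

mean, std = subsampled_interval(toy_metric, reference, generated, subsample_size=40)
print(f"{100 * mean:.1f} ± {100 * std:.1f}")  # same scaling as the table above
```

Fixing the seed keeps the reported interval reproducible; larger `subsample_size` or `num_samples` values trade compute time for a more stable estimate.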