PolyGraph

PolyGraph is a Python library for evaluating graph generative models. With polygraph, evaluating a generative model becomes as easy as this:

import networkx as nx
from polygraph.datasets import PlanarGraphDataset
from polygraph.metrics import GaussianTVMMD2Benchmark

reference = PlanarGraphDataset("test").to_nx()
benchmark = GaussianTVMMD2Benchmark(reference)

generated = [nx.erdos_renyi_graph(64, 0.1) for _ in range(40)]
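print(benchmark.compute(generated))
# Example output (values vary between runs because the Erdős-Rényi graphs above are generated without a seed):
# {'orbit': 1.3305546735190608, 'clustering': 0.2799915534527712, 'degree': 0.07563928348299709, 'spectral': 0.07841922146118052}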

Installation

You may install this package via:

pip install polygraph-benchmark

No manual compilation of ORCA is required. For details on the interaction with the graph_tool package, see the more detailed installation instructions.

Usage

We provide a few basic tutorials:

Basic Usage - How to load datasets and compute metrics
Metrics Overview - An overview of which metrics are implemented in polygraph (MMD, PGD, VUN, Frechet Distance)
Custom Datasets - How to build custom datasets and share them

PGD vs MMD overview

PolyGraph Discrepancy (PGD) is our proposed metric for graph generative model evaluation. Compared to maximum mean discrepancy (MMD), PGD provides a bounded range, an intrinsic scale, and a principled way to compare and aggregate across descriptors.

Property	MMD	PGD
Range	[0, ∞)	[0, 1]
Intrinsic Scale	❌	✅
Descriptor Comparison	❌	✅
Multi-Descriptor Aggregation	❌	✅
Single Ranking	❌	✅

PGD and its motivation are described in more detail in the paper and API docs.
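
To make the "no intrinsic scale" row concrete, here is a minimal sketch of a biased MMD² estimate with a Gaussian kernel, written in plain Python. It is a generic textbook estimator, not the implementation used in polygraph; the names `gaussian_kernel`, `mean_kernel`, and `mmd2_biased` are hypothetical, and the random vectors only stand in for graph descriptors. The loop shows that the same two samples yield different MMD² values depending on the kernel bandwidth, which is why raw MMD values are hard to interpret or compare across descriptors.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth):
    # Gaussian (RBF) kernel between two descriptor vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    # Average kernel value over all pairs (x, y).
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD between two samples.
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


rng = random.Random(0)
# Stand-ins for descriptor vectors of reference and generated graphs (assumption).
ref = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(40)]
gen = [[rng.gauss(0.5, 1.0) for _ in range(4)] for _ in range(40)]

for bw in (0.5, 1.0, 5.0):
    print(bw, mmd2_biased(ref, gen, bandwidth=bw))
```

The printed values change with `bw` even though `ref` and `gen` are fixed, so an MMD² number means little without knowing the kernel and bandwidth that produced it.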

Benchmarking snapshot

The table below shows an example benchmark generated with this library across multiple datasets and models. Values illustrate typical outputs from the newly proposed PolyGraph Discrepancy. For completeness, the library also implements, and our paper reports, various MMD estimates on the datasets below. Values are scaled by 100 for legibility, and standard deviations are obtained by subsampling (using StandardPGDInterval and MoleculePGDInterval). More details are provided in our paper.

Method	Planar-L	Lobster-L	SBM-L	Proteins	Guacamol	Moses
AutoGraph	34.0 ± 1.8	18.0 ± 1.6	5.6 ± 1.5	67.7 ± 7.4	22.9 ± 0.5	29.6 ± 0.4
AutoGraph*	—	—	—	—	10.4 ± 1.2	—
DiGress	45.2 ± 1.8	3.2 ± 2.6	17.4 ± 2.3	88.1 ± 3.1	32.7 ± 0.5	33.4 ± 0.5
GRAN	99.7 ± 0.2	85.4 ± 0.5	69.1 ± 1.4	89.7 ± 2.7	—	—
ESGG	45.0 ± 1.4	69.9 ± 0.6	99.4 ± 0.2	79.2 ± 4.3	—	—

AutoGraph* denotes a variant that leverages additional training heuristics, as described in the paper.
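
The "mean ± standard deviation" entries in the table come from subsampling. The general idea can be sketched as follows, using only the standard library. This is an illustration under stated assumptions, not the logic of StandardPGDInterval or MoleculePGDInterval: the helper `subsample_interval`, the `toy_metric`, the subsample size, and the choice of sampling without replacement are all hypothetical, and plain numbers stand in for graphs.

```python
import random
import statistics


def subsample_interval(metric, reference, generated, subsample_size,
                       num_samples=10, seed=0):
    """Return (mean, std) of `metric` over random subsamples of both sets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(num_samples):
        # Draw subsamples without replacement (an assumption of this sketch).
        ref_sub = rng.sample(reference, subsample_size)
        gen_sub = rng.sample(generated, subsample_size)
        scores.append(metric(ref_sub, gen_sub))
    return statistics.mean(scores), statistics.stdev(scores)


def toy_metric(ref, gen):
    # Toy stand-in for a real metric: absolute difference of sample means.
    return abs(statistics.mean(ref) - statistics.mean(gen))


data_rng = random.Random(1)
reference = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
generated = [data_rng.gauss(0.3, 1.0) for _ in range(200)]

mean, std = subsample_interval(toy_metric, reference, generated, subsample_size=50)
# Scale by 100 for display, mirroring the table's convention.
print(f"{100 * mean:.1f} ± {100 * std:.1f}")
```

Any callable that maps two collections to a score could be passed as `metric`; the standard deviation then reflects how much the score fluctuates across subsamples.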