Base MMD Metrics

Squared Maximum Mean Discrepancy (MMD) metrics for comparing graph distributions.

This module provides classes for computing MMD-based distances between sets of graphs. It offers both point estimates of MMD² and uncertainty estimates obtained through subsampling.
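For orientation, the quantity being estimated can be written as below. This is the standard population definition of squared MMD for a kernel k and two distributions P (reference) and Q (generated). The notation is an assumption made for illustration and is not taken from the module itself, where the kernel acts on graph descriptors.

```latex
\mathrm{MMD}^2(P, Q)
  = \mathbb{E}_{x, x' \sim P}\left[ k(x, x') \right]
  + \mathbb{E}_{y, y' \sim Q}\left[ k(y, y') \right]
  - 2\, \mathbb{E}_{x \sim P,\, y \sim Q}\left[ k(x, y) \right]
```

The classes documented here estimate this quantity from finite samples of reference and generated graphs.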

Available metrics

MMD metrics are initialized with a kernel function (see DescriptorKernel) and a collection of reference graphs.

Example
from polygraph.metrics.base import DescriptorMMD2, MaxDescriptorMMD2, DescriptorMMD2Interval
from polygraph.utils.descriptors import SparseDegreeHistogram
from polygraph.utils.kernels import AdaptiveRBFKernel
import networkx as nx
import numpy as np

reference_graphs = [nx.erdos_renyi_graph(10, 0.5) for _ in range(10)]
generated_graphs = [nx.erdos_renyi_graph(10, 0.5) for _ in range(10)]

kernel = AdaptiveRBFKernel(descriptor_fn=SparseDegreeHistogram(), bw=0.1)
mmd = DescriptorMMD2(reference_graphs=reference_graphs, kernel=kernel)
mmd_value = mmd.compute(generated_graphs)
print(mmd_value)    # A single float value

mmd_w_uncertainty = DescriptorMMD2Interval(reference_graphs=reference_graphs, kernel=kernel, subsample_size=5, num_samples=100, coverage=0.95)
mmd_interval = mmd_w_uncertainty.compute(generated_graphs)
print(mmd_interval)    # Named tuple with mean, standard deviation, and confidence interval bounds

multi_kernel = AdaptiveRBFKernel(descriptor_fn=SparseDegreeHistogram(), bw=np.array([0.1, 0.2]))
mmd = MaxDescriptorMMD2(reference_graphs=reference_graphs, kernel=multi_kernel)
mmd_value = mmd.compute(generated_graphs)
print(mmd_value)    # A single float value

Point Estimates

polygraph.metrics.base.DescriptorMMD2

Bases: GenerationMetric[GraphType], Generic[GraphType]

Computes squared MMD between reference and generated graphs using a kernel.

Parameters:
  • reference_graphs (Collection[GraphType]) –

    Collection of graphs to compare against

  • kernel (DescriptorKernel[GraphType]) –

    Kernel function for comparing graphs

  • variant (Literal['biased', 'umve', 'ustat'], default: 'biased' ) –

    Which MMD estimator to use ('biased', 'umve', or 'ustat')

compute(generated_graphs)

Computes MMD² between reference and generated graphs.

Parameters:
  • generated_graphs (Collection[GraphType]) –

    Collection of graphs to evaluate

Returns:
  • Union[float, ndarray]

    MMD² value(s). Returns an array if the kernel has multiple parameters, otherwise a single float.
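To make the difference between estimator variants concrete, here is a minimal NumPy sketch. It is not the library's implementation: the helper names `rbf_gram` and `mmd2` are hypothetical, the descriptors are random stand-ins, and the 'umve' branch assumes the conventional unbiased estimator that drops the diagonal of the within-set kernel blocks. The 'ustat' variant is omitted.

```python
import numpy as np


def rbf_gram(x, y, bw):
    """Gaussian RBF kernel matrix between the rows of x and the rows of y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bw**2))


def mmd2(ref, gen, bw, variant="biased"):
    """Squared MMD estimate between two descriptor matrices (one row per graph)."""
    k_rr = rbf_gram(ref, ref, bw)
    k_gg = rbf_gram(gen, gen, bw)
    k_rg = rbf_gram(ref, gen, bw)
    n, m = len(ref), len(gen)
    if variant == "biased":
        # V-statistic: the diagonal (self-similarity) terms are included.
        return k_rr.mean() + k_gg.mean() - 2.0 * k_rg.mean()
    if variant == "umve":
        # Assumed convention: drop the diagonals of the within-set blocks.
        rr = (k_rr.sum() - np.trace(k_rr)) / (n * (n - 1))
        gg = (k_gg.sum() - np.trace(k_gg)) / (m * (m - 1))
        return rr + gg - 2.0 * k_rg.mean()
    raise ValueError(f"unknown variant: {variant}")


rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 4))           # stand-in for reference descriptors
gen = rng.normal(loc=0.5, size=(20, 4))  # stand-in for generated descriptors

biased = mmd2(ref, gen, bw=1.0, variant="biased")
unbiased = mmd2(ref, gen, bw=1.0, variant="umve")
same = mmd2(ref, ref, bw=1.0, variant="biased")  # identical sets
```

In this sketch the biased estimate is never negative and is zero for identical sets, while the unbiased estimate is smaller and can dip below zero when the two sets are close.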

polygraph.metrics.base.MaxDescriptorMMD2

Bases: DescriptorMMD2[GraphType], Generic[GraphType]

Computes maximum MMD² across multiple kernel parameters.

Similar to DescriptorMMD2 but takes the maximum across different kernel parameters (e.g., bandwidths). The kernel must support multiple parameters.

Parameters:
  • reference_graphs (Collection[GraphType]) –

    Collection of graphs to compare against

  • kernel (DescriptorKernel[GraphType]) –

    Kernel function with multiple parameters

  • variant (Literal['biased', 'umve', 'ustat'], default: 'biased' ) –

    Which MMD estimator to use ('biased', 'umve', or 'ustat')

Raises:
  • ValueError

    If kernel does not have multiple parameters

compute(generated_graphs)

Computes maximum MMD² between reference and generated graphs.

Parameters:
  • generated_graphs (Collection[GraphType]) –

    Collection of graphs to evaluate

Returns:
  • float

    Maximum MMD² value across kernel parameters
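The reduction from per-parameter values to a single maximum can be sketched in plain NumPy. This is an illustration under assumptions, not the library's code: `biased_mmd2` is a hypothetical helper, the kernel is a Gaussian RBF, and the descriptors are random stand-ins.

```python
import numpy as np


def biased_mmd2(ref, gen, bandwidths):
    """Biased MMD² for each RBF bandwidth; returns one value per bandwidth."""
    z = np.concatenate([ref, gen])
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    n = len(ref)
    values = []
    for bw in bandwidths:
        k = np.exp(-sq_dists / (2.0 * bw**2))
        # Within-reference block, within-generated block, cross block.
        values.append(k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean())
    return np.array(values)


rng = np.random.default_rng(0)
ref = rng.normal(size=(30, 3))
gen = rng.normal(loc=1.0, size=(30, 3))

bandwidths = np.array([0.1, 1.0, 10.0])
per_bandwidth = biased_mmd2(ref, gen, bandwidths)  # array, one entry per bandwidth
max_mmd2 = float(per_bandwidth.max())              # single float
```

Taking the maximum makes the result less dependent on a single bandwidth choice, since the bandwidth that separates the two sets best determines the value.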

Uncertainty Quantification

polygraph.metrics.base.DescriptorMMD2Interval

Bases: GenerationMetric[GraphType], _MMD2SamplingMixin[GraphType], Generic[GraphType]

Computes MMD² confidence intervals using subsampling.

Estimates uncertainty in MMD² by repeatedly computing it on random subsamples of the reference and generated graphs.

Parameters:
  • reference_graphs (Collection[GraphType]) –

    Collection of graphs to compare against

  • kernel (DescriptorKernel[GraphType]) –

    Kernel function for comparing graphs

  • subsample_size (int) –

    Number of graphs to use in each MMD² sample. This should be consistent with the sample size used for point estimates.

  • num_samples (int, default: 500 ) –

    Number of MMD² samples to generate

  • coverage (Optional[float], default: 0.95 ) –

    Confidence level to compute upper and lower bounds. If None, only the mean and standard deviation are returned.

  • variant (Literal['biased', 'umve', 'ustat'], default: 'biased' ) –

    Which MMD estimator to use ('biased', 'umve', or 'ustat')

compute(generated_graphs)

Computes MMD² confidence intervals through subsampling.

Parameters:
  • generated_graphs (Collection[GraphType]) –

    Collection of graphs to evaluate

Returns:
  • MetricInterval

    Named tuple with mean, standard deviation, and confidence interval bounds
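The subsampling procedure can be sketched as follows. Several details are assumptions made for illustration: drawing subsamples without replacement, deriving the bounds from empirical quantiles, and the `Interval` tuple with its field names, which is a hypothetical stand-in for MetricInterval. The helper names are likewise hypothetical.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for MetricInterval; the real field names may differ.
Interval = namedtuple("Interval", ["mean", "std", "low", "high"])


def biased_mmd2(ref, gen, bw=1.0):
    """Biased MMD² with a Gaussian RBF kernel on descriptor rows."""
    z = np.concatenate([ref, gen])
    k = np.exp(-((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1) / (2.0 * bw**2))
    n = len(ref)
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()


def mmd2_interval(ref, gen, subsample_size, num_samples=500, coverage=0.95, seed=0):
    """Mean, standard deviation, and quantile bounds of MMD² over random subsamples."""
    rng = np.random.default_rng(seed)
    samples = np.empty(num_samples)
    for i in range(num_samples):
        # Assumption: subsample both sets independently, without replacement.
        ref_idx = rng.choice(len(ref), size=subsample_size, replace=False)
        gen_idx = rng.choice(len(gen), size=subsample_size, replace=False)
        samples[i] = biased_mmd2(ref[ref_idx], gen[gen_idx])
    if coverage is None:
        return Interval(samples.mean(), samples.std(), None, None)
    tail = (1.0 - coverage) / 2.0
    low, high = np.quantile(samples, [tail, 1.0 - tail])
    return Interval(samples.mean(), samples.std(), low, high)


rng = np.random.default_rng(1)
ref = rng.normal(size=(40, 3))
gen = rng.normal(loc=0.5, size=(40, 3))

interval = mmd2_interval(ref, gen, subsample_size=15, num_samples=200)
no_bounds = mmd2_interval(ref, gen, subsample_size=15, num_samples=200, coverage=None)
```

Because the biased estimator depends on the sample size, each subsample uses the same number of graphs as the point estimate it is meant to accompany.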

polygraph.metrics.base.MaxDescriptorMMD2Interval

Bases: GenerationMetric[GraphType], _MMD2SamplingMixin[GraphType], Generic[GraphType]

Computes confidence intervals for maximum MMD² across kernel parameters.

Similar to DescriptorMMD2Interval, but takes the maximum across different kernel parameters within each subsample. In other words, it quantifies the uncertainty of the point estimates produced by MaxDescriptorMMD2.

Parameters:
  • reference_graphs (Collection[GraphType]) –

    Collection of graphs to compare against

  • kernel (DescriptorKernel[GraphType]) –

    Kernel function with multiple parameters

  • subsample_size (int) –

    Number of graphs to use in each MMD² sample. This should be consistent with the sample size used for point estimates.

  • num_samples (int, default: 500 ) –

    Number of MMD² samples to generate

  • coverage (Optional[float], default: 0.95 ) –

    Confidence level to compute upper and lower bounds. If None, only the mean and standard deviation are returned.

  • variant (Literal['biased', 'umve', 'ustat'], default: 'biased' ) –

    Which MMD estimator to use ('biased', 'umve', or 'ustat')

Raises:
  • ValueError

    If kernel does not have multiple parameters

compute(generated_graphs)

Computes confidence intervals for maximum MMD² through subsampling.

Parameters:
  • generated_graphs (Collection[GraphType]) –

    Collection of graphs to evaluate

Returns:
  • MetricInterval

    Named tuple with mean, standard deviation, and confidence interval bounds for the maximum MMD² across kernel parameters
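The order of operations matters here: the maximum is taken inside each subsample, and only then are the samples summarized. The NumPy sketch below illustrates this under assumptions (Gaussian RBF kernel, subsampling without replacement, quantile-based bounds, random stand-in descriptors). `biased_mmd2_per_bw` is a hypothetical helper, not part of the library.

```python
import numpy as np


def biased_mmd2_per_bw(ref, gen, bandwidths):
    """Biased MMD² for every bandwidth at once; returns shape (num_bandwidths,)."""
    z = np.concatenate([ref, gen])
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    # Shape (num_bandwidths, N, N): one Gram matrix per bandwidth.
    k = np.exp(-sq[None] / (2.0 * bandwidths[:, None, None] ** 2))
    n = len(ref)
    return (
        k[:, :n, :n].mean(axis=(1, 2))
        + k[:, n:, n:].mean(axis=(1, 2))
        - 2.0 * k[:, :n, n:].mean(axis=(1, 2))
    )


rng = np.random.default_rng(0)
ref = rng.normal(size=(40, 3))
gen = rng.normal(loc=0.5, size=(40, 3))
bandwidths = np.array([0.5, 1.0, 2.0])

num_samples, subsample_size = 100, 20
per_sample = np.empty((num_samples, len(bandwidths)))
for i in range(num_samples):
    r = ref[rng.choice(len(ref), size=subsample_size, replace=False)]
    g = gen[rng.choice(len(gen), size=subsample_size, replace=False)]
    per_sample[i] = biased_mmd2_per_bw(r, g, bandwidths)

max_per_sample = per_sample.max(axis=1)  # maximum taken within each subsample
low, high = np.quantile(max_per_sample, [0.025, 0.975])
mean_of_max = max_per_sample.mean()
max_of_means = per_sample.mean(axis=0).max()  # for comparison only
```

The mean of the per-subsample maxima is never smaller than the maximum of the per-bandwidth means, so summarizing each bandwidth separately and then taking the maximum would generally report a different, lower value.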