Graph Kernels

Kernel functions for comparing graphs via descriptor functions.

This module implements various kernel functions that measure similarities between graphs. Each kernel is initialized with a descriptor function that converts graphs into vector representations. The kernel then computes similarities between these feature vectors.

Descriptor functions accept iterables of graphs (networkx graphs or rdkit molecules) and return descriptions as dense or sparse arrays. Specifically, they should either return a dense numpy.ndarray or sparse scipy.sparse.csr_array of shape (n_graphs, n_features).

Available kernels
  • RBFKernel: Standard Gaussian/RBF kernel using \(\ell^2\) distance
  • LaplaceKernel: Exponential kernel using \(\ell^1\) distance
  • GaussianTV: Non-positive definite Gaussian kernel using \(\ell^1\) distance
  • AdaptiveRBFKernel: RBF kernel with data-dependent bandwidth adaptation
  • LinearKernel: Simple dot product kernel
Example
import numpy as np
from polygraph.utils.kernels import RBFKernel
import networkx as nx
from typing import Iterable

def my_descriptor(graphs: Iterable[nx.Graph]) -> np.ndarray:
    hists = [nx.degree_histogram(graph) for graph in graphs]
    hists = [
        np.concatenate([hist, np.zeros(128 - len(hist))], axis=0)
        for hist in hists
    ]
    hists = np.stack(hists, axis=0)
    return hists / hists.sum(axis=1, keepdims=True) # shape: (n_graphs, n_features)


ref_graphs = [nx.erdos_renyi_graph(10, 0.3) for _ in range(100)]
gen_graphs = [nx.erdos_renyi_graph(10, 0.3) for _ in range(100)]

# Create kernel with multiple bandwidths
kernel = RBFKernel(my_descriptor, bw=np.array([0.1, 1.0, 10.0]))

ref_desc = kernel.featurize(ref_graphs) # shape: (n_ref_graphs, n_features)
gen_desc = kernel.featurize(gen_graphs) # shape: (n_gen_graphs, n_features)

# Compare reference and generated graph descriptors
ref_vs_ref, ref_vs_gen, gen_vs_gen = kernel(ref_desc, gen_desc)

polygraph.utils.kernels.DescriptorKernel

Bases: ABC, Generic[GraphType]

Abstract base class for kernel functions that operate on graph descriptors.

This class defines the interface for kernels that compute similarity between graphs based on their descriptors. It handles both the featurization of graphs and the computation of kernel matrices.

Parameters:
  • descriptor_fn (GraphDescriptor[GraphType]) –

    Function that transforms graphs into descriptor vectors/matrices

__call__(ref, gen)

Computes the full kernel comparison between reference and generated graphs.

Parameters:
  • ref (Any) –

    Descriptors of reference graphs

  • gen (Any) –

    Descriptors of generated graphs

Returns:
  • GramBlocks

    Named tuple containing all kernel matrices after adaptation

adapt(blocks)

May adapt kernel parameters based on the computed kernel matrices.

This method may, e.g., scale the bandwidth of a Gaussian kernel based on the typical distance between reference and generated graphs.

Parameters:
  • blocks (GramBlocks) –

    Pre-computed kernel matrices

Returns:
  • GramBlocks

    Adapted kernel matrices

featurize(graphs)

Converts graphs into descriptor representations.

Parameters:
  • graphs (Iterable[GraphType]) –

    Collection of networkx graphs or rdkit molecules to featurize

Returns:
  • Any

    Descriptor representation of the graphs

pre_gram(ref, gen)

Computes all kernel matrices between reference and generated samples.

Parameters:
  • ref (Any) –

    Descriptors of reference graphs

  • gen (Any) –

    Descriptors of generated graphs

Returns:
  • GramBlocks

    Named tuple containing kernel matrices for ref-ref, ref-gen, and gen-gen comparisons