Graph Descriptors

Generic Graph Descriptors

polygraph.utils.descriptors.DegreeHistogram

Bases: GraphDescriptor[Graph]

Computes normalized degree distributions of graphs.

For each graph, computes a histogram of node degrees and normalizes it to sum to 1. All histograms are padded to the same length, determined by max_degree, so they can be compared directly.

Parameters:
  • max_degree (int) –

    Maximum degree to consider. Larger degrees are ignored.
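As a rough illustration of the computation described above, here is a dependency-free sketch. It is not the library's implementation: graphs are assumed to be plain adjacency dicts (node to set of neighbours) rather than networkx graphs, the histogram is assumed to have max_degree + 1 bins (degrees 0 to max_degree), and normalization is assumed to happen after out-of-range degrees are dropped.

```python
def degree_histogram(adj, max_degree):
    """Hypothetical sketch: normalized degree histogram padded to max_degree + 1 bins.

    adj maps each node to the set of its neighbours (an assumed stand-in for a Graph).
    """
    counts = [0] * (max_degree + 1)
    for nbrs in adj.values():
        d = len(nbrs)
        if d <= max_degree:  # degrees above max_degree are ignored
            counts[d] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * (max_degree + 1)
    # Normalize so the histogram sums to 1.
    return [c / total for c in counts]
```

For a path on four nodes (degrees 1, 2, 2, 1) and max_degree=3, this returns [0.0, 0.5, 0.5, 0.0]; every graph yields a vector of the same length regardless of its size.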

polygraph.utils.descriptors.SparseDegreeHistogram

Bases: GraphDescriptor[Graph]

Memory-efficient version of degree distribution computation.

Similar to DegreeHistogram but returns a sparse matrix, making it suitable for graphs with high maximum degree where most degree bins are empty.
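To see why a sparse layout helps, the sketch below builds the three arrays of a CSR (compressed sparse row) matrix by hand, one row per graph and one column per degree. This is an illustrative assumption, not the library's code: the column-per-degree layout and normalization to 1 (mirroring DegreeHistogram) are guesses, and graphs are again plain adjacency dicts.

```python
from collections import Counter


def sparse_degree_histograms(graphs):
    """Hypothetical sketch: return (indptr, indices, data) of a CSR matrix.

    Row i holds the normalized degree distribution of graphs[i]; only
    non-empty degree bins are stored, so storage does not grow with max degree.
    """
    indptr, indices, data = [0], [], []
    for adj in graphs:
        counts = Counter(len(nbrs) for nbrs in adj.values())
        n = len(adj)
        for deg in sorted(counts):
            indices.append(deg)          # column index = degree
            data.append(counts[deg] / n)  # fraction of nodes with that degree
        indptr.append(len(indices))      # end of this graph's row
    return indptr, indices, data
```

A star with 1000 leaves has maximum degree 1000, yet its row stores only two entries (degrees 1 and 1000), whereas a dense histogram would need 1001 bins.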

polygraph.utils.descriptors.ClusteringHistogram

Bases: GraphDescriptor[Graph]

Computes histograms of local clustering coefficients.

For each graph, computes the distribution of local clustering coefficients across its nodes. A node's local clustering coefficient is the fraction of pairs of its neighbors that are themselves connected, i.e. the fraction of possible triangles through that node that actually exist.

Parameters:
  • bins (int) –

    Number of histogram bins covering [0,1]

  • sparse (bool, default: False ) –

    If True, return a sparse csr_array; otherwise return a dense np.ndarray. The sparse version may be faster when comparing many graphs.
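The sketch below shows the two steps involved: computing each node's coefficient 2T(v) / (k(k-1)), where k is the degree and T(v) the number of edges among v's neighbours, and binning the values into `bins` equal-width bins over [0, 1]. It is a dependency-free approximation under stated assumptions: graphs are adjacency dicts without self-loops, nodes of degree below 2 get coefficient 0 (the networkx convention), and the histogram is normalized to sum to 1, which the parameter description above does not specify.

```python
def local_clustering(adj):
    """Local clustering coefficient of every node in an adjacency-dict graph."""
    coeffs = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[v] = 0.0  # no neighbour pairs, so no possible triangles
            continue
        # Ordered pairs (u, w) of neighbours that are adjacent; each edge is counted twice.
        ordered_links = sum(1 for u in nbrs for w in adj[u] if w in nbrs)
        coeffs[v] = ordered_links / (k * (k - 1))
    return coeffs


def clustering_histogram(adj, bins):
    """Hypothetical sketch: normalized histogram of clustering coefficients over [0, 1]."""
    counts = [0] * bins
    for c in local_clustering(adj).values():
        counts[min(int(c * bins), bins - 1)] += 1  # c == 1.0 falls in the last bin
    n = len(adj)
    return [x / n for x in counts] if n else [0.0] * bins
```

For a triangle with one pendant node attached, the triangle-only nodes have coefficient 1, the attachment node 1/3, and the pendant 0.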

polygraph.utils.descriptors.OrbitCounts

Bases: GraphDescriptor[Graph]

Computes graph orbit statistics.

Warning

Self-loops are automatically removed from input graphs.
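The entry above does not say which graphlet sizes OrbitCounts covers or which counting tool it uses, so the following is only an illustrative sketch of orbit counting for graphlets with up to three nodes, using the standard orbit numbering: orbit 0 is an edge endpoint (the degree), orbit 1 the end of an induced 2-path, orbit 2 its middle, and orbit 3 a triangle node. Self-loops are dropped first, mirroring the warning. The edge-list input and the function name are assumptions.

```python
def orbit_counts(edges):
    """Hypothetical sketch: per-node counts of orbits 0-3 from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:  # self-loops are discarded before counting
            adj[u].add(v)
            adj[v].add(u)
    counts = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        # Each edge between two neighbours of v closes one triangle through v.
        tri = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
        # v in the middle of an induced path: neighbour pairs that are NOT adjacent.
        mid = deg * (deg - 1) // 2 - tri
        # v at the end of a path v-u-w; paths closing into triangles are subtracted
        # (each triangle is reached twice, once via each of the other two nodes).
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
        counts[v] = (deg, end, mid, tri)
    return counts
```

On a triangle 0-1-2 with a pendant node 3 attached to node 2 (plus a self-loop on 3, which is ignored), node 2 gets (3, 0, 2, 1): degree 3, no induced path ends, two induced path middles (0-2-3 and 1-2-3), and one triangle.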

polygraph.utils.descriptors.EigenvalueHistogram

Bases: GraphDescriptor[Graph]

Computes eigenvalue histograms of the normalized graph Laplacian.

For each graph, computes the eigenvalue spectrum of its normalized Laplacian matrix and returns a histogram of the eigenvalues.

Parameters:
  • n_bins (int, default: 200 ) –

    Number of histogram bins

  • sparse (bool, default: False ) –

    If True, return a sparse csr_array; otherwise return a dense np.ndarray. The sparse version may be faster when comparing many graphs.
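For reference, the normalized Laplacian and a histogram over its spectrum can be written as below, with A the adjacency matrix, D the diagonal degree matrix (taking the entry of D^{-1/2} to be 0 for isolated nodes) and B = n_bins. The eigenvalues of the normalized Laplacian always lie in [0, 2], which makes that interval a natural histogram range; whether the library uses exactly this range and bin placement is an assumption here. The last bin is taken to be closed at 2.

```latex
\mathcal{L} = D^{-1/2}\,(D - A)\,D^{-1/2},
\qquad
\operatorname{spec}(\mathcal{L}) = \{\lambda_1, \dots, \lambda_n\} \subset [0, 2]

h_k = \Bigl|\Bigl\{\, i : \lambda_i \in \Bigl[\tfrac{2(k-1)}{B},\, \tfrac{2k}{B}\Bigr) \Bigr\}\Bigr|,
\qquad k = 1, \dots, B
```

Because the range is fixed rather than data-dependent, histograms of graphs with different sizes share the same bins and can be compared entry by entry.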

polygraph.utils.descriptors.RandomGIN

Bases: GraphDescriptor[Graph]

Random Graph Isomorphism Network for graph embeddings.

Initializes a randomly weighted Graph Isomorphism Network (GIN) and uses it to compute graph embeddings. The network parameters are fixed after random initialization. Node features default to node degrees if not specified.

Parameters:
  • num_layers (int, default: 3 ) –

    Number of GIN layers

  • hidden_dim (int, default: 35 ) –

    Hidden dimension in each layer

  • neighbor_pooling_type (str, default: 'sum' ) –

    How to aggregate neighbor features ('sum', 'mean', or 'max')

  • graph_pooling_type (str, default: 'sum' ) –

    How to aggregate node features into graph features ('sum', 'mean', or 'max')

  • input_dim (int, default: 1 ) –

    Dimension of input node features

  • edge_feat_dim (int, default: 0 ) –

    Dimension of edge features (0 for no edge features)

  • dont_concat (bool, default: False ) –

    If True, only use final layer features instead of concatenating all layers

  • num_mlp_layers (int, default: 2 ) –

    Number of MLP layers in each GIN layer

  • output_dim (int, default: 1 ) –

    Dimension of final graph embedding

  • device (str, default: 'cpu' ) –

    Device to run the model on (e.g., 'cpu' or 'cuda')

  • node_feat_loc (Optional[List[str]], default: None ) –

    List of node attributes to use as features. If None, node degrees are used as features.

  • edge_feat_loc (Optional[List[str]], default: None ) –

    List of edge attributes to use as features. If None, no edge features are used.

  • seed (Optional[int], default: None ) –

    Random seed for weight initialization
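To make the idea of a frozen, randomly initialized GIN concrete, here is a small pure-Python sketch. It is a simplification under stated assumptions, not the RandomGIN implementation: it fixes epsilon to 0 in the GIN update h_v' = MLP(h_v + sum of neighbour h_u), uses sum aggregation and sum readout, uses degrees as 1-dimensional input features, concatenates the readouts of all layers (the dont_concat=False behaviour), and omits edge features and the final output_dim projection. The function and helper names are hypothetical.

```python
import random


def _relu(xs):
    return [max(0.0, x) for x in xs]


def _matvec(w, x):
    # w is a list of rows; returns w @ x.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_gin_embedding(adj, num_layers=3, hidden_dim=35, num_mlp_layers=2, seed=None):
    """Hypothetical sketch of a frozen random GIN on an adjacency-dict graph."""
    rng = random.Random(seed)
    # Draw every weight matrix once; they are never trained.
    layers, in_dim = [], 1
    for _ in range(num_layers):
        mlp, d = [], in_dim
        for _ in range(num_mlp_layers):
            mlp.append([[rng.gauss(0.0, d ** -0.5) for _ in range(d)] for _ in range(hidden_dim)])
            d = hidden_dim
        layers.append(mlp)
        in_dim = hidden_dim

    h = {v: [float(len(nbrs))] for v, nbrs in adj.items()}  # degree features
    readouts = []
    for mlp in layers:
        new_h = {}
        for v, nbrs in adj.items():
            agg = list(h[v])  # epsilon = 0: own features plus neighbour sum
            for u in nbrs:
                agg = [a + b for a, b in zip(agg, h[u])]
            for w in mlp:
                agg = _relu(_matvec(w, agg))
            new_h[v] = agg
        h = new_h
        # Sum readout over nodes, appended layer by layer.
        readouts.extend(sum(h[v][i] for v in h) for i in range(hidden_dim))
    return readouts
```

With a fixed seed the embedding is deterministic, and because both aggregation and readout are sums, relabeling the nodes of a graph leaves the embedding unchanged up to floating-point rounding.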

polygraph.utils.descriptors.WeisfeilerLehmanDescriptor

Bases: GraphDescriptor[Graph]

Weisfeiler-Lehman subtree features for graphs.

Computes graph features by iteratively hashing node neighborhoods using the WL algorithm. Returns sparse feature vectors where each dimension corresponds to a subtree pattern.

Warning

Hash collisions may occur, as at most \(2^{31}\) unique hashes are used.

Parameters:
  • iterations (int, default: 3 ) –

    Number of WL iterations

  • use_node_labels (bool, default: False ) –

    Whether to use existing node labels instead of degrees

  • node_label_key (Optional[str], default: None ) –

    Node attribute key for labels if use_node_labels is True

  • digest_size (int, default: 4 ) –

    Number of bytes for hashing in intermediate WL iterations (1-4)

  • n_jobs (int, default: 1 ) –

    Number of workers for parallel computation

  • n_graphs_per_job (int, default: 100 ) –

    Number of graphs per worker

  • show_progress (bool, default: False ) –

    Whether to show a progress bar
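The iterative hashing described above can be sketched with the standard library alone. In this illustrative version (not the library's code), initial labels are node degrees as with use_node_labels=False, each iteration replaces a node's label with a blake2b hash of its own label plus the sorted labels of its neighbours, and the feature vector is a Counter over (iteration, label) pairs, i.e. a sparse vector with one dimension per subtree pattern. The digest_size argument is passed to hashlib.blake2b; smaller digests make collisions more likely. Graphs are assumed to be adjacency dicts.

```python
import hashlib
from collections import Counter


def wl_features(adj, iterations=3, digest_size=4):
    """Hypothetical sketch of WL subtree features as a sparse Counter."""
    labels = {v: str(len(nbrs)) for v, nbrs in adj.items()}  # start from degrees
    features = Counter((0, lab) for lab in labels.values())
    for it in range(1, iterations + 1):
        new_labels = {}
        for v, nbrs in adj.items():
            # Signature: own label followed by the sorted multiset of neighbour labels.
            sig = labels[v] + "|" + ",".join(sorted(labels[u] for u in nbrs))
            new_labels[v] = hashlib.blake2b(sig.encode(), digest_size=digest_size).hexdigest()
        labels = new_labels
        features.update((it, lab) for lab in labels.values())
    return features
```

Isomorphic graphs always produce identical feature Counters, while graphs the WL test can tell apart (such as a 3-node path and a triangle) produce different ones.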

polygraph.utils.descriptors.NormalizedDescriptor

Bases: GraphDescriptor[GraphType], Generic[GraphType]

Standardizes graph descriptors using reference graph statistics.

Wraps a graph descriptor to standardize its output features (zero mean, unit variance) based on statistics computed from a set of reference graphs. This is useful when different features have very different scales.

The wrapped graph descriptor must return a dense numpy array.

Parameters:
  • descriptor_fn (Callable[[Iterable[GraphType]], ndarray]) –

    Base descriptor function to normalize

  • ref_graphs (Iterable[GraphType]) –

    Reference graphs used to compute normalization statistics
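A minimal sketch of the standardization step, assuming the wrapped descriptor returns one feature row per graph. This is not the library's class: it uses the population standard deviation and leaves constant features unscaled (dividing by 1) to avoid division by zero, both of which are assumptions, and the class name is hypothetical.

```python
from statistics import fmean, pstdev


class NormalizedDescriptorSketch:
    """Hypothetical sketch: z-score features using reference-graph statistics."""

    def __init__(self, descriptor_fn, ref_graphs):
        self.descriptor_fn = descriptor_fn
        ref = descriptor_fn(ref_graphs)  # list of feature rows
        cols = list(zip(*ref))
        self.mean = [fmean(c) for c in cols]
        # Constant features would give std 0; fall back to 1 (an assumption).
        self.std = [pstdev(c) or 1.0 for c in cols]

    def __call__(self, graphs):
        return [
            [(x - m) / s for x, m, s in zip(row, self.mean, self.std)]
            for row in self.descriptor_fn(graphs)
        ]
```

The statistics are fixed at construction time, so every later set of graphs is expressed on the scale of the reference set; applied to the reference graphs themselves, each feature column has mean 0.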

Molecule Descriptors

polygraph.utils.descriptors.molecule_descriptors.TopoChemicalDescriptor

Bases: GraphDescriptor[Mol]

Computes topological properties.

polygraph.utils.descriptors.molecule_descriptors.FingerprintDescriptor

Bases: GraphDescriptor[Mol]

Computes molecular fingerprints.

Parameters:
  • dim (int, default: 128 ) –

    Dimension of the fingerprint

  • algorithm (Literal['rdkit', 'morgan'], default: 'morgan' ) –

    Algorithm to use for fingerprint generation. Either "rdkit" or "morgan".
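The dim parameter controls how many bits the fingerprint is folded into. The following sketch only illustrates the folding idea and does not call RDKit: it assumes the fingerprint algorithm has already produced integer feature identifiers (for Morgan fingerprints, hashed atom environments) and that folding is done by taking each identifier modulo dim. The function name is hypothetical.

```python
def fold_fingerprint(feature_ids, dim=128):
    """Hypothetical sketch: fold integer feature identifiers into a dim-bit vector."""
    bits = [0] * dim
    for fid in feature_ids:
        bits[fid % dim] = 1  # distinct identifiers may collide on the same bit
    return bits
```

Smaller values of dim give more compact fingerprints but more collisions: with dim=128, identifiers 1 and 129 set the same bit.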

polygraph.utils.descriptors.molecule_descriptors.LipinskiDescriptor

Bases: GraphDescriptor[Mol]

Computes physico-chemical properties of molecules.

polygraph.utils.descriptors.molecule_descriptors.ChemNetDescriptor

Bases: GraphDescriptor[Mol]

Random projection of ChemNet embeddings.

Parameters:
  • dim (int, default: 128 ) –

    Dimension of the projected embedding
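Both ChemNetDescriptor and MolCLRDescriptor reduce pretrained embeddings to dim dimensions with a random projection. A minimal sketch of a Gaussian random projection follows, assuming entries drawn from N(0, 1/dim) so that squared norms are preserved in expectation (the Johnson-Lindenstrauss construction); the exact distribution and seeding used by the library are not stated here, and the function name and input dimension are hypothetical.

```python
import math
import random


def make_random_projection(in_dim, out_dim, seed=0):
    """Hypothetical sketch: a fixed Gaussian projection from in_dim to out_dim."""
    rng = random.Random(seed)
    # Entries ~ N(0, 1/out_dim): E[||R x||^2] = ||x||^2.
    mat = [[rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(in_dim)]
           for _ in range(out_dim)]

    def project(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in mat]

    return project
```

Drawing the matrix once from a fixed seed keeps the projection identical across calls, so embeddings of generated and reference molecules land in the same projected space.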

polygraph.utils.descriptors.molecule_descriptors.MolCLRDescriptor

Bases: GraphDescriptor[Mol]

Random projection of MolCLR embeddings.

Parameters:
  • dim (int, default: 128 ) –

    Dimension of the projected embedding

  • batch_size (int, default: 128 ) –

    Batch size for the model used during inference