API

graphpca.draw_graph(nx_graph)

Draws the input graph on two axes with lines between the nodes

Positions of the nodes are determined with reduce_graph, of course.

Parameters

nx_graph (nx.Graph or nx.DiGraph) – The graph to be plotted

graphpca.reduce_graph(nx_graph, output_dim)

Run PCA on the ETCD of the input NetworkX graph

The best algorithm and parameters for doing so are selected dynamically, based on the size of the graph. A graph G with number of nodes n < 50 will use the naive algorithm, reduce_graph_naively, which has more stable behaviour at low node counts. Above that will use reduce_graph_efficiently. For such graphs the connectivity is checked, and if the graph has 20 or more connected components we use the add_supernode trick.

Parameters
  • nx_graph (nx.Graph or nx.DiGraph) – The graph to be reduced

  • output_dim (int) – The number of dimensions to reduce to

graphpca.reduce_graph_efficiently(nx_graph, output_dim, add_supernode=False, eigendecomp_strategy='smart')

Run PCA on the ETCD of the input NetworkX graph

We skip calculating the actual ETCD for efficiency. The ETCD is given by the Moore-Penrose pseudoinverse of the Laplacian of the input graph. The input graph is G, the Laplacian is L, and its pseudoinverse is pinv(L). We actually only care about the eigenvectors associated with the top output_dim eigenvalues. Therefore we use the fact that:

eigvals(pinv(A)) == [1/e for e in eigvals(A) if e != 0 else e]

and the corresponding eigenvectors are the same. Further, we only care about the top output_dim eigenpairs of pinv(L), which correspond to the smallest nonzero eigenvalues of L. We use scipy.sparse.linalg.eigs with which=SM to calculate eigenpairs, which includes zero eigenpairs. Therefore in order to calculate the smallest nonzero eigenpairs we need to calculate the smallest output_dim + nullity eigenpairs. We compute the nullity using the convenient fact that the nullity of L is equal to the number of connected components in G.

Parameters
  • nx_graph (nx.Graph or nx.DiGraph) – The graph to be reduced

  • output_dim (int) – The number of dimensions to reduce to

  • add_supernode (bool) – If True, adds a node to the graph that is connected to every other node in the graph. This reduces the nullspace of the Laplacian to 1, making there many fewer eigenpairs that need to be computed. The cost is minor information loss.

  • eigendecomp_strategy ('exact' | 'sparse' | 'smart') –

    Chooses the eigendecomp strategy. ‘exact’ uses numpy.linalg.eigh on a dense matrix. Calculates all

    eigenpairs and then strips to just the necessary ones.

    ’sparse’ uses numpy.sparse.linalg.eigsh on a sparse matrix.

    Calculates just the necessary eigenpairs. Is an iterative- approximative algorithm, and so sometimes yields things that are not amazing, especially for edge cases.

    ’smart’ uses ‘exact’ if n < 1000, ‘sparse’ otherwise.

Returns

The reduced data in output_dim dimensions

Return type

numpy.ndarray

graphpca.reduce_graph_naively(nx_graph, output_dim, eigendecomp_strategy='exact')

Run PCA on the ETCD of a NetworkX graph using a slow but precise method

This is the method that calculates the actual ETCD. It calculates the Moore-Penrose pseudoinverse of the Laplacian of the input graph. We return the first output_dim dimensions of the ETCD, ordered by decreasing eigenvalue.

This method starts to take a very, very long time as graph size reaches into the thousands due to the matrix inversion.

Parameters
  • nx_graph (nx.Graph or nx.DiGraph) – The graph to be reduced

  • output_dim (int) – The number of dimensions to reduce to

  • eigendecomp_strategy ('exact' | 'sparse' | 'smart') –

    Chooses the eigendecomp strategy. ‘exact’ uses numpy.linalg.eigh on a dense matrix. Calculates all

    eigenpairs and then strips to just the necessary ones.

    ’sparse’ uses numpy.sparse.linalg.eigsh on a sparse matrix.

    Calculates just the necessary eigenpairs. Is an iterative- approximative algorithm, and so sometimes yields things that are not amazing, especially for edge cases.

    ’smart’ uses ‘exact’ if n < 1000, ‘sparse’ otherwise.

Returns

The reduced data in output_dim dimensions

Return type

numpy.ndarray