API¶
-
graphpca.
draw_graph
(nx_graph)¶ Draws the input graph on two axes with lines between the nodes
Positions of the nodes are determined with reduce_graph, of course.
- Parameters
nx_graph (
nx.Graph
ornx.DiGraph
) – The graph to be plotted
-
graphpca.
reduce_graph
(nx_graph, output_dim)¶ Run PCA on the ETCD of the input NetworkX graph
The best algorithm and parameters for doing so are selected dynamically, based on the size of the graph. A graph G with number of nodes n < 50 will use the naive algorithm, reduce_graph_naively, which has more stable behaviour at low node counts. Above that will use reduce_graph_efficiently. For such graphs the connectivity is checked, and if the graph has 20 or more connected components we use the add_supernode trick.
- Parameters
nx_graph (
nx.Graph
ornx.DiGraph
) – The graph to be reducedoutput_dim (int) – The number of dimensions to reduce to
-
graphpca.
reduce_graph_efficiently
(nx_graph, output_dim, add_supernode=False, eigendecomp_strategy='smart')¶ Run PCA on the ETCD of the input NetworkX graph
We skip calculating the actual ETCD for efficiency. The ETCD is given by the Moore-Penrose pseudoinverse of the Laplacian of the input graph. The input graph is G, the Laplacian is L, and its pseudoinverse is pinv(L). We actually only care about the eigenvectors associated with the top output_dim eigenvalues. Therefore we use the fact that:
eigvals(pinv(A)) == [1/e for e in eigvals(A) if e != 0 else e]
and the corresponding eigenvectors are the same. Further, we only care about the top output_dim eigenpairs of pinv(L), which correspond to the smallest nonzero eigenvalues of L. We use scipy.sparse.linalg.eigs with which=SM to calculate eigenpairs, which includes zero eigenpairs. Therefore in order to calculate the smallest nonzero eigenpairs we need to calculate the smallest
output_dim + nullity
eigenpairs. We compute the nullity using the convenient fact that the nullity of L is equal to the number of connected components in G.- Parameters
nx_graph (
nx.Graph
ornx.DiGraph
) – The graph to be reducedoutput_dim (int) – The number of dimensions to reduce to
add_supernode (bool) – If True, adds a node to the graph that is connected to every other node in the graph. This reduces the nullspace of the Laplacian to 1, making there many fewer eigenpairs that need to be computed. The cost is minor information loss.
eigendecomp_strategy ('exact' | 'sparse' | 'smart') –
Chooses the eigendecomp strategy. ‘exact’ uses numpy.linalg.eigh on a dense matrix. Calculates all
eigenpairs and then strips to just the necessary ones.
- ’sparse’ uses numpy.sparse.linalg.eigsh on a sparse matrix.
Calculates just the necessary eigenpairs. Is an iterative- approximative algorithm, and so sometimes yields things that are not amazing, especially for edge cases.
’smart’ uses ‘exact’ if n < 1000, ‘sparse’ otherwise.
- Returns
The reduced data in output_dim dimensions
- Return type
numpy.ndarray
-
graphpca.
reduce_graph_naively
(nx_graph, output_dim, eigendecomp_strategy='exact')¶ Run PCA on the ETCD of a NetworkX graph using a slow but precise method
This is the method that calculates the actual ETCD. It calculates the Moore-Penrose pseudoinverse of the Laplacian of the input graph. We return the first output_dim dimensions of the ETCD, ordered by decreasing eigenvalue.
This method starts to take a very, very long time as graph size reaches into the thousands due to the matrix inversion.
- Parameters
nx_graph (
nx.Graph
ornx.DiGraph
) – The graph to be reducedoutput_dim (int) – The number of dimensions to reduce to
eigendecomp_strategy ('exact' | 'sparse' | 'smart') –
Chooses the eigendecomp strategy. ‘exact’ uses numpy.linalg.eigh on a dense matrix. Calculates all
eigenpairs and then strips to just the necessary ones.
- ’sparse’ uses numpy.sparse.linalg.eigsh on a sparse matrix.
Calculates just the necessary eigenpairs. Is an iterative- approximative algorithm, and so sometimes yields things that are not amazing, especially for edge cases.
’smart’ uses ‘exact’ if n < 1000, ‘sparse’ otherwise.
- Returns
The reduced data in output_dim dimensions
- Return type
numpy.ndarray