Additional utilities

This module defines miscellaneous utility functions that are public to users.

prody.utilities.catchall.calcGromacsClusters(rmsd_matrix, c, labels=None)

Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).

Returns a list of lists with labels divided into clusters.

prody.utilities.catchall.calcGromosClusters(rmsd_matrix, c, labels=None)

Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).

Returns a list of lists with labels divided into clusters.

prody.utilities.catchall.calcRMSDclusters(rmsd_matrix, c, labels=None)[source]

Divide rmsd_matrix into clusters using the gromos method with a cutoff c, as implemented in GROMACS (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html).

Returns a list of lists with labels divided into clusters.
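The gromos procedure referenced by the three functions above can be sketched in plain Python. This is an illustration of the general idea only (repeatedly pick the structure with the most neighbours within the cutoff, make it and its neighbours a cluster, and remove them from the pool), not ProDy's or GROMACS's actual code. The function name gromos_clusters and the use of "<=" for the cutoff comparison are assumptions made for this sketch.

```python
def gromos_clusters(rmsd_matrix, cutoff, labels=None):
    """Greedy gromos-style clustering of a square RMSD matrix (illustrative sketch)."""
    n = len(rmsd_matrix)
    if labels is None:
        labels = list(range(n))
    remaining = set(range(n))
    clusters = []
    while remaining:
        pool = sorted(remaining)
        best_members = []
        for i in pool:
            # A structure always counts as its own neighbour, so every pass
            # removes at least one structure from the pool.
            members = [j for j in pool if j == i or rmsd_matrix[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_members = members
        clusters.append([labels[j] for j in best_members])
        remaining -= set(best_members)
    return clusters


# Toy RMSD matrix with two obvious groups
rmsd = [
    [0.0, 0.5, 0.6, 3.0, 3.2],
    [0.5, 0.0, 0.4, 3.1, 3.3],
    [0.6, 0.4, 0.0, 2.9, 3.0],
    [3.0, 3.1, 2.9, 0.0, 0.7],
    [3.2, 3.3, 3.0, 0.7, 0.0],
]
clusters = gromos_clusters(rmsd, 1.0, labels=["a", "b", "c", "d", "e"])
print(clusters)
```

The return value mirrors the documented one: a list of lists with the labels divided into clusters, largest neighbourhood first.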

prody.utilities.catchall.calcTree(names, distance_matrix, method='upgma', linkage=False)[source]

Given a distance matrix, create and return a tree structure.

Parameters:
  • names (list, ndarray) – a list of names

  • distance_matrix (ndarray) – a square matrix whose length matches the size of the ensemble. An error is raised if its size does not match the number of names

  • method (str) – method used for constructing the tree. Acceptable options are "upgma", "nj", or methods supported by linkage() such as "single", "average", "ward", etc. Default is "upgma"

  • linkage (bool) – whether the linkage matrix is returned. Note that NJ trees do not support linkage

prody.utilities.catchall.clusterMatrix(distance_matrix=None, similarity_matrix=None, labels=None, return_linkage=None, **kwargs)[source]

Cluster a distance matrix using scipy.cluster.hierarchy and return the sorted matrix, indices used for sorting, sorted labels (if labels are passed), and linkage matrix (if return_linkage is True).

Parameters:
  • distance_matrix (ndarray) – an N-by-N matrix containing some measure of distance such as 1. - seqid_matrix (Hamming distance), rmsds, or distances in PCA space

  • similarity_matrix (ndarray) – an N-by-N matrix containing some measure of similarity such as sequence identity, mode-mode overlap, or spectral overlap. Each element will be subtracted from 1. to get distance, so make sure this is reasonable.

  • labels (list) – labels for each matrix row that can be returned sorted

  • no_plot (bool) – if True, don’t plot the dendrogram. Default is True.

  • reversed (bool) – if set to True, then the sorting indices will be reversed.

Other arguments for linkage() and dendrogram() can also be provided and will be taken as kwargs.
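The steps described above (convert similarity to distance, build a linkage, sort by dendrogram leaf order) can be sketched directly with SciPy. This is a minimal illustration under stated assumptions, not the body of clusterMatrix: the toy matrix, the choice of method="average", and the variable names are all made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Toy similarity matrix (for example, sequence identity) with labels
similarity = np.array([
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.2],
    [0.1, 0.8, 0.2, 1.0],
])
labels = ["A", "B", "C", "D"]

# Each similarity is subtracted from 1 to obtain a distance
distance = 1.0 - similarity

# linkage() expects a condensed (upper-triangle) distance vector
condensed = squareform(distance, checks=False)
Z = linkage(condensed, method="average")

# no_plot=True computes the leaf order without drawing anything
order = dendrogram(Z, no_plot=True)["leaves"]

# Apply the same ordering to rows and columns, and to the labels
sorted_matrix = distance[np.ix_(order, order)]
sorted_labels = [labels[i] for i in order]
print(sorted_labels)
```

Here order plays the role of the "indices used for sorting" and Z the role of the linkage matrix mentioned in the description.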

prody.utilities.catchall.clusterSubfamilies(similarities, n_clusters=0, linkage='all', method='tsne', cutoff=0.0, **kwargs)[source]

Perform clustering based on members of the ensemble projected into a reduced-dimensional space.

Parameters:
  • similarities (ndarray) – a matrix of similarities for each structure in the ensemble, such as an RMSD matrix, dynamics-based spectral overlap, or sequence similarity

  • n_clusters (int) – the number of clusters to generate. If 0, will scan a range of number of clusters and return the best one based on highest silhouette score. Default is 0.

  • linkage (str, list, tuple, ndarray) – if all, will test all linkage types (ward, average, complete, single). Otherwise will use only the one(s) given as input. Default is all.

  • method (str) – if set to spectral, will generate a Kirchhoff matrix based on the cutoff value given and use that as input for clustering instead of the values themselves. Default is tsne.

  • cutoff (float) – only used if method is set to spectral. This value is used for generating the Kirchhoff matrix used to generate clusters when doing spectral clustering. Default is 0.0.
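To make the cutoff parameter more concrete, here is a rough sketch of how a Kirchhoff (graph Laplacian) matrix can be derived from a pairwise matrix and a cutoff. It is not taken from ProDy: the helper name kirchhoff_from_cutoff is hypothetical, and treating two members as connected when their value is at or above the cutoff is an assumption made for this example.

```python
import numpy as np


def kirchhoff_from_cutoff(values, cutoff):
    """Build a Kirchhoff matrix from pairwise values (illustrative sketch)."""
    values = np.asarray(values, dtype=float)
    # Assumption: a pair is connected when its value reaches the cutoff
    connected = values >= cutoff
    np.fill_diagonal(connected, False)  # no self-connections
    # Off-diagonal entries are -1 for connected pairs and 0 otherwise
    kirchhoff = np.where(connected, -1.0, 0.0)
    # Each diagonal entry is the number of connections of that member
    np.fill_diagonal(kirchhoff, connected.sum(axis=1))
    return kirchhoff


similarities = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
K = kirchhoff_from_cutoff(similarities, 0.5)
print(K)
```

Every row of the resulting matrix sums to zero, which is the defining property of a Kirchhoff matrix.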

prody.utilities.catchall.findSubgroups(tree, c, method='naive', **kwargs)[source]

Divide tree into subgroups using a criterion method and a cutoff c. Returns a list of lists with labels divided into subgroups.

prody.utilities.catchall.getCoords(data)[source]

Get coordinates from data if possible and handle errors well.

Parameters:
  • data (numpy.ndarray, Atomic, Ensemble, Trajectory) – a coordinate set or an object with getCoords method

prody.utilities.catchall.getLinkage(names, tree)[source]

Obtain the linkage() matrix encoding tree.

Parameters:
  • names (list, ndarray) – a list of names, the order determines the values in the linkage matrix

  • tree (Tree) – tree to be converted

prody.utilities.catchall.getTreeFromLinkage(names, linkage)[source]

Obtain the tree encoded by linkage.

Parameters:
  • names (list, ndarray) – a list of names, the order should correspond to the values in linkage

  • linkage (ndarray) – linkage matrix
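For readers unfamiliar with how a linkage matrix encodes a tree, the sketch below turns a SciPy-style linkage matrix into a Newick string. It is an illustration only and does not reproduce getTreeFromLinkage: the function name linkage_to_newick is hypothetical, and placing each internal node at half the merge distance (an ultrametric tree) is an assumption of this example.

```python
def linkage_to_newick(names, linkage):
    """Convert a SciPy-style linkage matrix to a Newick string (sketch)."""
    n = len(names)
    # Leaves are numbered 0..n-1; the cluster formed by row k is numbered n+k
    text = {i: str(name) for i, name in enumerate(names)}
    height = {i: 0.0 for i in range(n)}
    for k, row in enumerate(linkage):
        left, right, dist = int(row[0]), int(row[1]), float(row[2])
        h = dist / 2.0  # assumption: node height is half the merge distance
        text[n + k] = "(%s:%g,%s:%g)" % (
            text[left], h - height[left],
            text[right], h - height[right],
        )
        height[n + k] = h
    # The last row of the linkage matrix is the root of the tree
    return text[n + len(linkage) - 1] + ";"


names = ["A", "B", "C"]
# Row format: [cluster_i, cluster_j, distance, number_of_leaves]
Z = [
    [0, 1, 1.0, 2],  # A and B merge into cluster 3
    [2, 3, 3.0, 3],  # C merges with cluster 3
]
newick = linkage_to_newick(names, Z)
print(newick)
```

The order of names matters because the first two columns of each row refer to positions in that list, which is why both functions above take names alongside the tree or the linkage matrix.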

prody.utilities.catchall.printAtomicMatrix(matrix, atoms=None, step=10, fmt='%8d', sep='\t')[source]

Prints a new table for a matrix with atom labels along the top and at the beginning of each line.

Parameters:
  • matrix (tuple, list, ndarray) – any square 2D data with a size matching atoms

  • atoms (Atomic) – any Atomic object to label the data
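The layout described above, with labels along the top and at the start of each line, can be sketched in plain Python. This only illustrates how the fmt and sep arguments shape the table; it is not ProDy's implementation, the step argument is not covered, and the helper name format_labeled_matrix and the example labels are hypothetical.

```python
def format_labeled_matrix(matrix, labels, fmt="%8d", sep="\t"):
    """Return a text table with labels on the top row and first column (sketch)."""
    width = len(fmt % 0)  # width of one formatted cell
    header = sep.join([" " * width] + [str(label).rjust(width) for label in labels])
    lines = [header]
    for label, row in zip(labels, matrix):
        cells = [fmt % value for value in row]
        lines.append(sep.join([str(label).rjust(width)] + cells))
    return "\n".join(lines)


table = format_labeled_matrix([[1, 2], [3, 4]], ["CA_1", "CA_2"])
print(table)
```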

prody.utilities.catchall.reorderMatrix(names, matrix, tree, axis=None)[source]

Reorder a matrix based on a tree and return the reordered matrix and indices for reordering other things.

Parameters:
  • names (list) – a list of names associated with the rows of the matrix. These names must match the ones used to generate the tree

  • matrix (ndarray) – any square matrix

  • tree (Tree) – any tree from calcTree()

  • axis (int) – the axis along which the matrix should be reordered. Default is None, which reorders along all axes
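The effect of the axis argument can be illustrated with plain NumPy indexing. In this sketch the leaf order is simply written out by hand (leaf_order is a hypothetical stand-in for the order read from a tree), so it shows only the reordering step, not how ProDy extracts the order.

```python
import numpy as np

names = ["A", "B", "C"]
matrix = np.array([
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
])

# Hypothetical leaf order, standing in for the order of terminals in a tree
leaf_order = ["C", "A", "B"]

# Indices that map the original name order onto the leaf order
indices = [names.index(name) for name in leaf_order]

# Reorder along both axes (comparable to axis=None)
both = matrix[np.ix_(indices, indices)]

# Reorder along a single axis only
rows_only = matrix[indices, :]
cols_only = matrix[:, indices]
print(both)
```

The indices list is the part that can be reused for "reordering other things", such as labels or per-structure data.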

prody.utilities.catchall.showBars(ydata, xdata=None, *args, **kwargs)[source]

Show 1-D data using bar().

Parameters:
  • x (ndarray) – (optional) x coordinates. x can be a 1-D array or a 2-D matrix of column vectors. If not provided, a range with the length of the y data will be used.

  • y (ndarray) – data array. y can be a 1-D array or a 2-D matrix of column vectors.

  • ticklabels (list) – user-defined tick labels for x-axis.

prody.utilities.catchall.showLines(*args, **kwargs)[source]

Show 1-D data using plot().

Parameters:
  • x (ndarray) – (optional) x coordinates. x can be a 1-D array or a 2-D matrix of column vectors.

  • y (ndarray) – data array. y can be a 1-D array or a 2-D matrix of column vectors.

  • dy (ndarray) – an array of variances of y which will be plotted as a band along y. It should have the same shape as y.

  • lower (ndarray) – an array of lower bounds which will be plotted as a band along y. It should have the same shape as y and should be paired with upper.

  • upper (ndarray) – an array of upper bounds which will be plotted as a band along y. It should have the same shape as y and should be paired with lower.

  • alpha (float) – the transparency of the band(s) for plotting dy.

  • beta (float) – the transparency of the band(s) for plotting lower and upper.

  • ticklabels (list) – user-defined tick labels for x-axis.

prody.utilities.catchall.showMatrix(matrix, x_array=None, y_array=None, **kwargs)[source]

Show a matrix using imshow() or scatter() if markersize is provided.

Curves on x- and y-axis can be added.

Parameters:
  • matrix (ndarray) – matrix to be displayed

  • x_array (ndarray) – data to be plotted above the matrix

  • y_array (ndarray) – data to be plotted on the left side of the matrix

  • percentile (float) – a percentile threshold p to remove outliers, i.e. only showing data between the p-th and (100-p)-th percentiles

  • vmin (float) – a minimum value threshold to remove outliers, i.e. only showing data greater than vmin. This overrides percentile.

  • vmax (float) – a maximum value threshold to remove outliers, i.e. only showing data less than vmax. This overrides percentile.

  • interactive (bool) – turn on or off the interactive options

  • xtickrotation (float) – how much to rotate the xticklabels, in degrees. Default is 0.

  • markersize (float) – size of square markers for using scatter() to help show matrices with small data regions compared to zeros. Note that only non-zeros are plotted, so the colorbar range may change if not using norm. Default is None, which results in using imshow()
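The interplay between percentile, vmin and vmax described above can be sketched with NumPy. This is a hedged illustration of the documented behaviour (explicit limits override the percentile-based ones), not the code used by showMatrix; the helper name percentile_limits is hypothetical.

```python
import numpy as np


def percentile_limits(matrix, percentile, vmin=None, vmax=None):
    """Return colour limits from a percentile threshold (illustrative sketch)."""
    data = np.asarray(matrix, dtype=float)
    # Keep only data between the p-th and (100-p)-th percentiles
    low, high = np.percentile(data, [percentile, 100.0 - percentile])
    # Explicit vmin or vmax take precedence over the percentile-based limits
    if vmin is None:
        vmin = low
    if vmax is None:
        vmax = high
    return float(vmin), float(vmax)


data = np.arange(101.0)
limits = percentile_limits(data, 5)
overridden = percentile_limits(data, 5, vmin=20.0)
print(limits, overridden)
```

Limits obtained this way could then be passed as vmin and vmax to a plotting call such as imshow().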