Additional utilities
This module defines miscellaneous utility functions that are public to users.
- prody.utilities.catchall.calcGromacsClusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the gromos method with a cutoff c as implemented in gromacs (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html)
Returns a list of lists with labels divided into clusters.
- prody.utilities.catchall.calcGromosClusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the gromos method with a cutoff c as implemented in gromacs (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html)
Returns a list of lists with labels divided into clusters.
- prody.utilities.catchall.calcRMSDclusters(rmsd_matrix, c, labels=None)
Divide rmsd_matrix into clusters using the gromos method with a cutoff c as implemented in gromacs (see https://manual.gromacs.org/documentation/current/onlinehelp/gmx-cluster.html)
Returns a list of lists with labels divided into clusters.
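As a rough illustration of the gromos procedure referenced above, the sketch below repeatedly takes the structure with the most neighbours within the cutoff c, makes it and its neighbours one cluster, and removes them from the pool. This is a minimal pure-Python sketch, not ProDy's code. The function name gromos_clusters is hypothetical, neighbours are assumed to be pairs with RMSD strictly below c, and ties are assumed to go to the lowest index.

```python
def gromos_clusters(rmsd_matrix, c, labels=None):
    """Hypothetical sketch of gromos-style clustering of an RMSD matrix.

    Assumptions: neighbours have RMSD strictly below c, and ties between
    equally connected structures go to the lowest index.
    """
    n = len(rmsd_matrix)
    if labels is None:
        labels = list(range(n))
    pool = list(range(n))  # structures not yet assigned, kept in index order
    clusters = []
    while pool:
        # neighbours of i: other pooled structures within the cutoff
        neighbours = {
            i: [j for j in pool if j != i and rmsd_matrix[i][j] < c]
            for i in pool
        }
        # the structure with the most neighbours becomes the cluster centre;
        # max() keeps the first maximum, i.e. the lowest index on ties
        centre = max(pool, key=lambda i: len(neighbours[i]))
        members = [centre] + neighbours[centre]
        clusters.append([labels[k] for k in members])
        # eliminate the new cluster from the pool and repeat
        pool = [k for k in pool if k not in members]
    return clusters


rmsd = [
    [0.0, 0.5, 0.6, 3.0],
    [0.5, 0.0, 0.4, 3.1],
    [0.6, 0.4, 0.0, 2.9],
    [3.0, 3.1, 2.9, 0.0],
]
print(gromos_clusters(rmsd, 1.0, labels=["a", "b", "c", "d"]))
```

With a cutoff of 1.0, structures a, b, and c are mutual neighbours and form the first cluster, leaving d as a singleton.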
- prody.utilities.catchall.calcTree(names, distance_matrix, method='upgma', linkage=False)
Given a distance matrix, create and return a tree structure.
- Parameters:
names (list, ndarray) – a list of names
distance_matrix (ndarray) – a square matrix with one row and one column per name (the length of the ensemble). If its size does not match the number of names, an error is raised.
method (str) – method used for constructing the tree. Acceptable options are "upgma", "nj", or methods supported by linkage() such as "single", "average", "ward", etc. Default is "upgma".
linkage (bool) – whether the linkage matrix is returned. Note that NJ trees do not support linkage.
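The default "upgma" method can be sketched as follows: start with one cluster per name, repeatedly merge the two closest clusters, and set the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The upgma function below is a hypothetical illustration only. It returns the topology as nested tuples rather than the tree object that calcTree builds, and it does not record branch lengths.

```python
def upgma(names, distance_matrix):
    """Hypothetical UPGMA sketch returning the tree topology as nested tuples."""
    n = len(names)
    if len(distance_matrix) != n or any(len(row) != n for row in distance_matrix):
        raise ValueError("distance_matrix must be square with one row per name")
    # active clusters: id -> (subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(n)}
    # distances between active clusters, keyed by (smaller id, larger id)
    dist = {(i, j): distance_matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (ti, si), (tj, sj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            # UPGMA: size-weighted average of the distances to the two parts
            dist[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        del dist[(i, j)]
        clusters[next_id] = ((ti, tj), si + sj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


D = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(["a", "b", "c", "d"], D))
```

Here a and b merge first (distance 2), then c joins them (average distance 6), and d joins last, giving `('d', ('c', ('a', 'b')))`.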
- prody.utilities.catchall.clusterMatrix(distance_matrix=None, similarity_matrix=None, labels=None, return_linkage=None, **kwargs)
Cluster a distance matrix using scipy.cluster.hierarchy and return the sorted matrix, the indices used for sorting, the sorted labels (if labels are passed), and the linkage matrix (if return_linkage is True).
- Parameters:
distance_matrix (ndarray) – an N-by-N matrix containing some measure of distance, such as 1. - seqid_matrix (Hamming distance), RMSDs, or distances in PCA space
similarity_matrix (ndarray) – an N-by-N matrix containing some measure of similarity, such as sequence identity, mode-mode overlap, or spectral overlap. Each element is subtracted from 1. to obtain a distance, so make sure this is reasonable.
labels (list) – labels for each matrix row that can be returned sorted
no_plot (bool) – if True, don’t plot the dendrogram. Default is True.
reversed (bool) – if set to True, the sorting indices are reversed.
Other arguments for linkage() and dendrogram() can also be provided and will be passed on as kwargs.
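Two steps described above can be illustrated without SciPy: turning a similarity matrix into distances by subtracting each element from 1., and permuting rows, columns, and labels into a given order. In this sketch the order is passed in by hand as a stand-in for the dendrogram leaf order that clusterMatrix obtains from scipy.cluster.hierarchy. The helper names are hypothetical.

```python
def similarity_to_distance(similarity_matrix):
    """Subtract each similarity from 1. to obtain a distance."""
    return [[1.0 - s for s in row] for row in similarity_matrix]


def sort_matrix(matrix, order, labels=None):
    """Hypothetical helper: permute rows, columns, and labels into `order`.

    `order` stands in for the leaf order of a dendrogram.
    """
    # rows and columns are permuted together so the matrix stays symmetric
    sorted_matrix = [[matrix[i][j] for j in order] for i in order]
    sorted_labels = None if labels is None else [labels[i] for i in order]
    return sorted_matrix, list(order), sorted_labels


sim = [[1.0, 0.25, 0.75], [0.25, 1.0, 0.5], [0.75, 0.5, 1.0]]
dist = similarity_to_distance(sim)
matrix, indices, labels = sort_matrix(dist, [0, 2, 1], labels=["x", "y", "z"])
```

The order [0, 2, 1] places the most similar pair (rows 0 and 2, similarity 0.75) next to each other, which is the effect a dendrogram-based ordering aims for.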
- prody.utilities.catchall.clusterSubfamilies(similarities, n_clusters=0, linkage='all', method='tsne', cutoff=0.0, **kwargs)
Perform clustering based on members of the ensemble projected into a lower-dimensional space.
- Parameters:
similarities (ndarray) – a matrix of similarities for each structure in the ensemble, such as an RMSD matrix, dynamics-based spectral overlap, or sequence similarity
n_clusters (int) – the number of clusters to generate. If 0, a range of cluster numbers is scanned and the clustering with the highest silhouette score is returned. Default is 0.
linkage (str, list, tuple, ndarray) – if 'all', all linkage types (ward, average, complete, single) are tested. Otherwise only the one(s) given as input are used. Default is 'all'.
method (str) – if set to 'spectral', a Kirchhoff matrix is generated from the given cutoff value and used as the input for clustering instead of the values themselves. Default is 'tsne'.
cutoff (float) – only used if method is set to 'spectral'. This value is used to generate the Kirchhoff matrix for spectral clustering. Default is 0.0.
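When n_clusters is 0, candidate clusterings are compared by silhouette score. The sketch below computes the standard silhouette score from a distance matrix and picks the best of several candidate labelings. It assumes distances rather than similarities and uses hypothetical helper names. How ProDy itself produces the candidates (t-SNE or spectral embedding, linkage types) and which library computes the score are not modelled here.

```python
def silhouette_score(dist, labels):
    """Mean silhouette coefficient of a labeling, given a distance matrix."""
    n = len(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # a singleton cluster scores 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within the own cluster
        # mean distance to the nearest other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == other) / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def pick_best(dist, candidates):
    """Hypothetical helper: return the candidate labeling with the highest score."""
    return max(candidates, key=lambda labels: silhouette_score(dist, labels))


d = [[0, 1, 9, 9], [1, 0, 9, 9], [9, 9, 0, 1], [9, 9, 1, 0]]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
best = pick_best(d, [bad, good])
```

Grouping the two tight pairs together scores 8/9, while splitting them scores -4/9, so the first labeling is selected.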
- prody.utilities.catchall.findSubgroups(tree, c, method='naive', **kwargs)
Divide tree into subgroups using a criterion method and a cutoff c. Returns a list of lists with labels divided into subgroups.
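The text does not define the "naive" criterion. One simple reading, stated here purely as an assumption, is to walk the tree's leaves in order and start a new subgroup wherever the distance between two consecutive leaves exceeds c. The hypothetical split_at_gaps function below implements that reading with a caller-supplied distance function instead of a tree object.

```python
def split_at_gaps(leaves, distance, c):
    """Hypothetical sketch: group consecutive leaves, starting a new group
    whenever the distance between neighbouring leaves exceeds c."""
    if not leaves:
        return []
    groups = [[leaves[0]]]
    for prev, curr in zip(leaves, leaves[1:]):
        if distance(prev, curr) > c:
            groups.append([])  # gap larger than the cutoff: new subgroup
        groups[-1].append(curr)
    return groups


gaps = {("a", "b"): 1.0, ("b", "c"): 5.0, ("c", "d"): 0.5}
groups = split_at_gaps(["a", "b", "c", "d"], lambda x, y: gaps[(x, y)], 2.0)
```

Only the b–c gap (5.0) exceeds the cutoff of 2.0, so the leaves split into [a, b] and [c, d].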
- prody.utilities.catchall.getCoords(data)
Get coordinates from data if possible, handling errors gracefully.
- Parameters:
data (numpy.ndarray, Atomic, Ensemble, Trajectory) – a coordinate set or an object with a getCoords method
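The duck-typing idea behind getCoords can be sketched as: call getCoords() if the object has it, otherwise try to read data as a coordinate set. In this hypothetical get_coords, a coordinate set is assumed to be a sequence of (x, y, z) rows. Real ProDy coordinate arrays are NumPy arrays and may also carry a leading frame axis, which this sketch does not handle.

```python
def get_coords(data):
    """Hypothetical sketch: return coordinates from an object or a raw coordinate set."""
    # objects such as atom groups or ensembles expose a getCoords() method
    method = getattr(data, "getCoords", None)
    if callable(method):
        return method()
    # otherwise accept a plain coordinate set: a sequence of (x, y, z) rows
    try:
        rows = [tuple(float(v) for v in row) for row in data]
    except (TypeError, ValueError):
        raise TypeError(
            "data must be a coordinate set or an object with a getCoords method"
        ) from None
    if any(len(row) != 3 for row in rows):
        raise TypeError("each coordinate must have exactly 3 values")
    return rows


class FakeAtoms:
    """Stand-in for an object that provides getCoords()."""

    def getCoords(self):
        return [(0.0, 0.0, 0.0)]
```

Usage: `get_coords(FakeAtoms())` defers to the object's own method, while `get_coords([[1, 2, 3]])` validates and converts a raw list.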
- prody.utilities.catchall.getLinkage(names, tree)
Obtain the linkage() matrix encoding tree.
- Parameters:
names (list, ndarray) – a list of names; the order determines the values in the linkage matrix
tree (Tree) – tree to be converted
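The layout of a SciPy linkage() matrix is what getLinkage has to produce: row i merges clusters Z[i, 0] and Z[i, 1] into a new cluster numbered n + i at distance Z[i, 2], and Z[i, 3] counts the original leaves in the new cluster. The sketch below converts a hypothetical nested-tuple tree, where each internal node is (left, right, height), into such rows, using the order of names for the leaf indices. Rows come out in post-order, which is not necessarily sorted by height.

```python
def to_linkage(names, tree):
    """Hypothetical sketch: nested (left, right, height) tuples -> linkage rows."""
    index = {name: i for i, name in enumerate(names)}  # leaf i is names[i]
    rows = []

    def visit(node):
        # returns (cluster index, number of leaves under this node)
        if not isinstance(node, tuple):
            return index[node], 1
        left, right, height = node
        li, ln = visit(left)
        ri, rn = visit(right)
        rows.append([float(li), float(ri), float(height), float(ln + rn)])
        # the cluster formed by row k is numbered n + k
        return len(names) + len(rows) - 1, ln + rn

    visit(tree)
    return rows


Z = to_linkage(["a", "b", "c"], (("a", "b", 1.0), "c", 3.0))
```

Leaves a and b merge into cluster 3 at height 1.0, and cluster 3 then merges with c at height 3.0.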
- prody.utilities.catchall.getTreeFromLinkage(names, linkage)
Obtain the tree encoded by linkage.
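Going the other way, each linkage row can be replayed in order to rebuild the tree, since row i always creates cluster n + i from clusters that already exist. The hypothetical tree_from_linkage below returns nested tuples of the form (left, right, height) instead of a Tree object; this representation is an assumption made for illustration.

```python
def tree_from_linkage(names, linkage):
    """Hypothetical sketch: linkage rows -> nested (left, right, height) tuples."""
    nodes = list(names)  # clusters 0 .. n-1 are the leaves
    for left, right, height, _count in linkage:
        # row i creates cluster n + i from two existing clusters
        nodes.append((nodes[int(left)], nodes[int(right)], height))
    return nodes[-1]  # the last cluster formed is the root


rows = [[0.0, 1.0, 1.0, 2.0], [3.0, 2.0, 3.0, 3.0]]
tree = tree_from_linkage(["a", "b", "c"], rows)
```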
- prody.utilities.catchall.printAtomicMatrix(matrix, atoms=None, step=10, fmt='%8d', sep='\t')
Print a table for a matrix, with atom labels along the top and at the beginning of each line.
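A minimal sketch of the table layout described above, using the fmt and sep conventions from the signature: a header row of labels, then one row per label. The helper name is hypothetical, it returns a string instead of printing, and it does not model the step parameter, whose behaviour is not described here.

```python
def format_labelled_matrix(matrix, labels, fmt="%8d", sep="\t"):
    """Hypothetical sketch: lay out a matrix with labels on the top and left."""
    # header row: an empty corner cell followed by one label per column
    lines = [sep.join([""] + list(labels))]
    for label, row in zip(labels, matrix):
        # each row starts with its label, then the values formatted with fmt
        lines.append(sep.join([label] + [fmt % value for value in row]))
    return "\n".join(lines)


print(format_labelled_matrix([[1, 2], [3, 4]], ["CA", "CB"], fmt="%d"))
```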
- prody.utilities.catchall.reorderMatrix(names, matrix, tree, axis=None)
Reorder a matrix based on a tree and return the reordered matrix and the indices for reordering other things.
- Parameters:
names (list) – a list of names associated with the rows of the matrix. These names must match the ones used to generate the tree.
matrix (ndarray) – any square matrix
tree (Tree) – any tree from calcTree()
axis (int) – the axis along which the matrix should be reordered. Default is None, which reorders along all axes.
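The reordering can be sketched by reading the leaf order off the tree, mapping each leaf name to its row index via names, and permuting rows, columns, or both according to axis. Trees are represented here as nested tuples of names, an assumption standing in for the tree objects returned by calcTree(); the function names are hypothetical.

```python
def leaves(tree):
    """Leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, tuple):
        return [leaf for child in tree for leaf in leaves(child)]
    return [tree]


def reorder_matrix(names, matrix, tree, axis=None):
    """Hypothetical sketch: reorder a square matrix by the tree's leaf order."""
    position = {name: i for i, name in enumerate(names)}
    indices = [position[leaf] for leaf in leaves(tree)]
    if axis in (None, 0):  # reorder rows
        matrix = [matrix[i] for i in indices]
    if axis in (None, 1):  # reorder columns
        matrix = [[row[j] for j in indices] for row in matrix]
    return matrix, indices


M = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
reordered, idx = reorder_matrix(["a", "b", "c"], M, ("c", ("a", "b")))
```

The returned indices ([2, 0, 1] here) can be reused to reorder labels or any other per-row data in the same way.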
- prody.utilities.catchall.showBars(ydata, xdata=None, *args, **kwargs)
Show 1-D data using bar().
- Parameters:
x (ndarray) – (optional) x coordinates. x can be a 1-D array or a 2-D matrix of column vectors. If not provided, a range with the length of the y data will be used.
y (ndarray) – data array. y can be a 1-D array or a 2-D matrix of column vectors.
ticklabels (list) – user-defined tick labels for the x-axis.
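The input conventions above (a 2-D y holds one series per column, and a missing x defaults to a range as long as the y data) can be sketched in plain Python. This hypothetical bar_series helper only normalizes the inputs and draws nothing. It assumes the default range starts at 0 and handles only a 1-D x.

```python
def bar_series(ydata, xdata=None):
    """Hypothetical sketch: split y into (x, column) series for plotting."""
    # a 2-D y (a list of rows) is read as column vectors: one series per column
    if ydata and isinstance(ydata[0], (list, tuple)):
        columns = [list(col) for col in zip(*ydata)]
    else:
        columns = [list(ydata)]
    # without x, use a range as long as the y data (assumed to start at 0)
    if xdata is None:
        xdata = range(len(ydata))
    return [(list(xdata), column) for column in columns]


series = bar_series([[1, 10], [2, 20], [3, 30]])
```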
- prody.utilities.catchall.showLines(*args, **kwargs)
Show 1-D data using plot().
- Parameters:
x (ndarray) – (optional) x coordinates. x can be a 1-D array or a 2-D matrix of column vectors.
y (ndarray) – data array. y can be a 1-D array or a 2-D matrix of column vectors.
dy (ndarray) – an array of variances of y, which will be plotted as a band along y. It should have the same shape as y.
lower (ndarray) – an array of lower bounds, which will be plotted as a band along y. It should have the same shape as y and should be paired with upper.
upper (ndarray) – an array of upper bounds, which will be plotted as a band along y. It should have the same shape as y and should be paired with lower.
alpha (float) – the transparency of the band(s) for plotting dy.
beta (float) – the transparency of the band(s) for plotting miny and maxy.
ticklabels (list) – user-defined tick labels for the x-axis.
- prody.utilities.catchall.showMatrix(matrix, x_array=None, y_array=None, **kwargs)
Show a matrix using imshow(), or scatter() if markersize is provided. Curves can be added along the x- and y-axes.
- Parameters:
matrix (ndarray) – matrix to be displayed
x_array (ndarray) – data to be plotted above the matrix
y_array (ndarray) – data to be plotted on the left side of the matrix
percentile (float) – a percentile threshold to remove outliers, i.e. only showing data within the p-th to (100-p)-th percentile
vmin (float) – a minimum value threshold to remove outliers, i.e. only showing data greater than vmin. This overrides percentile.
vmax (float) – a maximum value threshold to remove outliers, i.e. only showing data less than vmax. This overrides percentile.
interactive (bool) – turn the interactive options on or off
xtickrotation (float) – how much to rotate the xticklabels, in degrees. Default is 0.
markersize (float) – size of square markers when using scatter() to help show matrices with small data regions compared to zeros. Note that only non-zeros are plotted, so the colorbar range may change if norm is not used. Default is None, which results in using imshow().
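How percentile, vmin, and vmax interact can be sketched as follows: percentile p gives the p-th and (100-p)-th percentiles of the values as the display limits, and an explicit vmin or vmax overrides the corresponding limit. The helpers are hypothetical, operate on a flat list of values, and use linear interpolation between ranks (NumPy's default percentile method). Whether showMatrix computes its limits exactly this way is an assumption.

```python
import math


def percentile(values, q):
    """q-th percentile with linear interpolation between neighbouring ranks."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1) / 100.0
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def color_limits(values, percentile_p=None, vmin=None, vmax=None):
    """Hypothetical sketch: display limits from a percentile, overridden by vmin/vmax."""
    lo = hi = None
    if percentile_p is not None:
        lo = percentile(values, percentile_p)
        hi = percentile(values, 100 - percentile_p)
    # explicit thresholds override the percentile-based limits
    if vmin is not None:
        lo = vmin
    if vmax is not None:
        hi = vmax
    return lo, hi


values = list(range(101))  # a flattened matrix, for illustration
limits = color_limits(values, percentile_p=5)
```

With the values 0 to 100 and p = 5, the limits are 5.0 and 95.0; passing vmin=0 as well keeps the upper limit but replaces the lower one.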