Clustering: msmbuilder.clustering

MSMBuilder uses clustering on MD trajectories to discretize phase space. A number of clustering algorithms are provided, and each can be combined with a variety of distance metrics, yielding a large set of possible discretizations.

Currently, the following clustering algorithms are available:

KCenters, HybridKMedoids, Clarans, SubsampledClarans, Hierarchical

Abstract Classes

class msmbuilder.clustering.BaseFlatClusterer(metric, trajectories=None, prep_trajectories=None)[source]

(Abstract) base class / mixin that Clusterers can extend. Provides convenience functions for the user.

To implement a clusterer using this base class, subclass it and define its __init__ method to do the clustering you want, then set self._generator_indices, self._assignments, and self._distances with the result.

For convenience (and to enable some of its functionality), let BaseFlatClusterer prepare the trajectories for you: call BaseFlatClusterer’s __init__ method, then use the prepared, concatenated trajectory self.ptraj for your clustering.
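The subclassing pattern described above can be sketched roughly as follows. To keep the example runnable without msmbuilder installed, BaseSketch is a hypothetical stand-in for BaseFlatClusterer, not the library class; only the attribute names self.ptraj, self._generator_indices, self._assignments, and self._distances come from the text. StrideClusterer and the abs_diff metric are likewise invented for illustration, and storing assignments as positions in the generator list is an assumption.

```python
from itertools import chain


class BaseSketch:
    """Hypothetical stand-in for BaseFlatClusterer (NOT the msmbuilder class)."""

    def __init__(self, metric, trajectories):
        self._metric = metric
        self._traj_lengths = [len(t) for t in trajectories]
        # Prepared, concatenated trajectory, analogous to self.ptraj in the text.
        self.ptraj = list(chain.from_iterable(trajectories))

    def _split(self, flat):
        # Cut a flat per-frame list back into one list per input trajectory.
        out, start = [], 0
        for n in self._traj_lengths:
            out.append(flat[start:start + n])
            start += n
        return out

    def get_assignments(self):
        return self._split(self._assignments)

    def get_distances(self):
        return self._split(self._distances)


class StrideClusterer(BaseSketch):
    """Toy clusterer: every `stride`-th frame becomes a generator."""

    def __init__(self, metric, trajectories, stride):
        # Let the base class prepare and concatenate the trajectories first.
        super().__init__(metric, trajectories)
        self._generator_indices = list(range(0, len(self.ptraj), stride))
        self._assignments, self._distances = [], []
        for frame in self.ptraj:
            # Distance from this frame to every generator; keep the closest.
            dists = [metric(frame, self.ptraj[g]) for g in self._generator_indices]
            best = min(range(len(dists)), key=dists.__getitem__)
            # Assumption: assignments store the position in the generator list.
            self._assignments.append(best)
            self._distances.append(dists[best])


def abs_diff(a, b):
    # Hypothetical 1-D "metric" used only for this illustration.
    return abs(a - b)


clusterer = StrideClusterer(abs_diff, [[0.0, 0.1, 0.9], [1.0, 1.1]], stride=3)
per_traj_assignments = clusterer.get_assignments()
```

Because the base class remembers the original trajectory lengths, the convenience getters can hand results back per trajectory even though the clustering itself ran on the concatenated frames.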

BaseFlatClusterer.get_distances() Extract the distance from each frame to its assigned cluster center
BaseFlatClusterer.get_assignments() Assign the trajectories you passed into the constructor based on
BaseFlatClusterer.get_generators_as_traj() Get a trajectory containing the generators

Flat Clustering Classes

class msmbuilder.clustering.KCenters(metric, trajectories=None, prep_trajectories=None, k=None, distance_cutoff=None, seed=0)[source]

Bases: msmbuilder.clustering.BaseFlatClusterer

KCenters.__init__(metric[, trajectories, ...]) Run kcenters clustering algorithm.
KCenters.get_distances() Extract the distance from each frame to its assigned cluster center
KCenters.get_assignments() Assign the trajectories you passed into the constructor based on
KCenters.get_generators_as_traj() Get a trajectory containing the generators

class msmbuilder.clustering.HybridKMedoids(metric, trajectories=None, prep_trajectories=None, k=None, distance_cutoff=None, local_num_iters=10, global_num_iters=0, norm_exponent=2.0, too_close_cutoff=0.0001, ignore_max_objective=False)[source]

Bases: msmbuilder.clustering.BaseFlatClusterer

HybridKMedoids.__init__(metric[, ...]) Run the hybrid kmedoids clustering algorithm on a set of trajectories
HybridKMedoids.get_distances() Extract the distance from each frame to its assigned cluster center
HybridKMedoids.get_assignments() Assign the trajectories you passed into the constructor based on
HybridKMedoids.get_generators_as_traj() Get a trajectory containing the generators

class msmbuilder.clustering.Clarans(metric, trajectories=None, prep_trajectories=None, k=None, num_local_minima=10, max_neighbors=20, local_swap=False)[source]

Bases: msmbuilder.clustering.BaseFlatClusterer

Clarans.__init__(metric[, trajectories, ...]) Run the CLARANS clustering algorithm on the frames in a trajectory
Clarans.get_distances() Extract the distance from each frame to its assigned cluster center
Clarans.get_assignments() Assign the trajectories you passed into the constructor based on
Clarans.get_generators_as_traj() Get a trajectory containing the generators

class msmbuilder.clustering.SubsampledClarans(metric, trajectories=None, prep_trajectories=None, k=None, num_samples=None, shrink_multiple=None, num_local_minima=10, max_neighbors=20, local_swap=False, parallel=None)[source]

Bases: msmbuilder.clustering.BaseFlatClusterer

SubsampledClarans.__init__(metric[, ...]) Run the CLARANS algorithm (see the Clarans class for more description) on
SubsampledClarans.get_distances() Extract the distance from each frame to its assigned cluster center
SubsampledClarans.get_assignments() Assign the trajectories you passed into the constructor based on
SubsampledClarans.get_generators_as_traj() Get a trajectory containing the generators

Hierarchical Clustering

class msmbuilder.clustering.Hierarchical(metric, trajectories, method='single', precomputed_values=None)[source]

Hierarchical.get_assignments([k, ...]) Assign the frames into clusters.
Hierarchical.load_from_disk(filename) Load up a clusterer from disk
Hierarchical.save_to_disk(filename) Save this clusterer to disk.

Clustering Functions

_kcenters(metric, ptraj[, k, ...]) Run kcenters clustering algorithm.
_hybrid_kmedoids(metric, ptraj[, k, ...]) Run the hybrid kmedoids clustering algorithm to cluster a trajectory
_clarans(metric, ptraj, k, num_local_minima, ...) Run the CLARANS clustering algorithm on the frames in a trajectory
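For orientation, the k-centers idea can be sketched as the usual greedy farthest-point procedure: start from a seed frame, then repeatedly promote the frame that lies farthest from its current center. This is a minimal pure-Python illustration, not the msmbuilder implementation; kcenters_sketch, abs_diff, and the toy frames are hypothetical, and the way k, distance_cutoff, and seed interact here is an assumption modeled on the parameter names listed above.

```python
def kcenters_sketch(points, metric, k=None, distance_cutoff=None, seed=0):
    """Greedy farthest-point k-centers over a list of frames.

    Returns (generator_indices, assignments, distances).
    """
    if k is None and distance_cutoff is None:
        raise ValueError("give k, distance_cutoff, or both")
    n = len(points)
    k = n if k is None else min(k, n)
    cutoff = 0.0 if distance_cutoff is None else distance_cutoff

    generator_indices = [seed]
    # Every frame starts out assigned to the seed frame.
    distances = [metric(points[seed], p) for p in points]
    assignments = [0] * n

    while len(generator_indices) < k:
        # The next generator is the frame farthest from its current center.
        new_gen = max(range(n), key=distances.__getitem__)
        if distances[new_gen] <= cutoff:
            break  # every frame is already within the cutoff
        generator_indices.append(new_gen)
        label = len(generator_indices) - 1
        for i, p in enumerate(points):
            d = metric(points[new_gen], p)
            if d < distances[i]:
                # Frame i is closer to the new generator; reassign it.
                distances[i] = d
                assignments[i] = label
    return generator_indices, assignments, distances


def abs_diff(a, b):
    # Hypothetical 1-D "metric" used only for this illustration.
    return abs(a - b)


frames = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
gens, assign, dists = kcenters_sketch(frames, abs_diff, k=3)
```

In this sketch, stopping on k fixes the number of states, while stopping on distance_cutoff bounds how far any frame may sit from its generator; supplying both stops at whichever condition is met first.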

Utility Functions

_assign(metric, ptraj, generator_indices) Assign the frames in ptraj to the centers with indices generator_indices
concatenate_trajectories(trajectories) Concatenate a list of trajectories into a single long trajectory
unconcatenate_trajectory(trajectory, lengths) Take a single trajectory that was created by concatenating separate trajectories and unconcatenate it, returning the original trajectories.
split(longlist, lengths) Split a long list into segments
stochastic_subsample(trajectories, ...) Randomly subsample from a trajectory
deterministic_subsample(trajectories, stride) Given a list of trajectories, return a single trajectory
empty_trajectory_like(traj) Get a trajectory with the right metadata, but no xyz coordinates
p_norm(data[, p]) p_norm of an ndarray with XYZ coordinates
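The concatenate, split, and unconcatenate helpers listed above amount to a round trip between a list of trajectories and one long sequence plus a list of lengths. The sketch below shows that round trip on plain Python lists; concatenate_sketch and split_sketch are hypothetical names, and the real msmbuilder functions operate on trajectory objects whose exact behavior is not shown here.

```python
from itertools import chain


def concatenate_sketch(sequences):
    """Join several per-trajectory lists into one long list, keeping the lengths."""
    lengths = [len(s) for s in sequences]
    return list(chain.from_iterable(sequences)), lengths


def split_sketch(longlist, lengths):
    """Cut a long list back into consecutive segments of the given lengths."""
    if sum(lengths) != len(longlist):
        raise ValueError("lengths must add up to len(longlist)")
    segments, start = [], 0
    for n in lengths:
        segments.append(longlist[start:start + n])
        start += n
    return segments


# Toy "trajectories": frame labels stand in for coordinate data.
trajs = [["a0", "a1", "a2"], ["b0"], ["c0", "c1"]]
flat, lengths = concatenate_sketch(trajs)
restored = split_sketch(flat, lengths)
```

Keeping the lengths alongside the concatenated data is what allows per-frame results such as assignments to be mapped back onto the original trajectories.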