Distance Metrics: msmbuilder.metrics

Distance metrics in MSMBuilder are objects which compute the distance between two frames from an MD simulation. A variety of distance metrics are supported, all of which implement the same interface defined in msmbuilder.metrics.AbstractDistanceMetric.

The basic architecture of the distance metrics is that all of their configurables are set in an __init__() method which is specific to each metric. A method called prepare_trajectory(self, trajectory) takes as input a msmbuilder.Trajectory.Trajectory object and returns a Python container object (at this point either a custom container supporting slice operations for RMSD or a numpy array for every other metric). This method does any required preprocessing, such as extracting the dihedral angles for the msmbuilder.metrics.Dihedral metric. Distances can then be computed by invoking a variety of methods on the metric object with the prepared trajectories, e.g. one_to_all(self, ptraj1, ptraj2, index1) to compute the distance from one frame to every frame in a trajectory, one_to_many(self, ptraj1, ptraj2, index1, indices2) to compute the distance from one frame to a collection of frames in a trajectory, or other similar methods.
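The configure, prepare, then query pattern described above can be sketched with a toy metric. This is only an illustration of the call sequence: ToyCentroidMetric is a hypothetical class that is not part of MSMBuilder, and it accepts a raw coordinate array where the real prepare_trajectory expects a Trajectory object.

```python
import numpy as np


class ToyCentroidMetric:
    """Hypothetical stand-in for a metric object; not part of MSMBuilder."""

    def __init__(self, atom_indices=None):
        # All configurables are fixed at construction time.
        self.atom_indices = atom_indices

    def prepare_trajectory(self, xyz):
        # Preprocessing step: reduce each frame (n_atoms x 3) to its centroid.
        xyz = np.asarray(xyz, dtype=float)
        if self.atom_indices is not None:
            xyz = xyz[:, self.atom_indices, :]
        return xyz.mean(axis=1)

    def one_to_all(self, ptraj1, ptraj2, index1):
        # Distance from frame index1 of ptraj1 to every frame of ptraj2.
        return np.linalg.norm(ptraj2 - ptraj1[index1], axis=1)

    def one_to_many(self, ptraj1, ptraj2, index1, indices2):
        # Distance from frame index1 of ptraj1 to selected frames of ptraj2.
        return self.one_to_all(ptraj1, ptraj2[indices2], index1)


rng = np.random.default_rng(0)
xyz = rng.normal(size=(5, 4, 3))  # 5 frames, 4 atoms, 3 coordinates

metric = ToyCentroidMetric()
ptraj = metric.prepare_trajectory(xyz)      # prepare once
d_all = metric.one_to_all(ptraj, ptraj, 0)  # then query as often as needed
d_some = metric.one_to_many(ptraj, ptraj, 0, [1, 3])
```

Preparing once and querying many times keeps the per-frame preprocessing cost out of the distance loops.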

Currently, the following distance metrics are implemented:

msmbuilder.metrics.RMSD Measures the Cartesian root mean square deviation between corresponding atoms in two frames after rotating and translating the two frames to bring them into maximum coincidence.

msmbuilder.metrics.Dihedral Measures the difference in the dihedral angles between two frames. Because of the periodic symmetry of dihedral angles, this is implemented by comparing the differences in the sine and cosine of each of the dihedral angles, which is equivalent to the difference of the angles in the complex plane.

msmbuilder.metrics.ContinuousContact Each frame is represented by the pairwise distance between residues, and then frames are compared based on the difference between these sets of distances. The set of distances that are monitored is configurable. The sense in which the distance between residues is computed (C-alpha, closest heavy atom, etc) is configurable.

msmbuilder.metrics.BooleanContact Each frame is represented by a set of booleans representing whether each of a set of residue-residue contacts is present or absent, and frames are compared based on the difference between these sets of booleans.

msmbuilder.metrics.AtomPairs While ContinuousContact monitors the distance between residues, AtomPairs monitors the distance between specific atoms. Each frame is represented by the pairwise distances between a set of atoms, and the distance between frames is computed based on the distance between these vectors of pairwise distances.

msmbuilder.metrics.hybrid.Hybrid This class can be used to compose metrics additively. For example, you can have a metric that is 0.5 * rmsd + 0.5 * dihedral. Note that the different metrics are probably in different units, so you’ll have to be careful what weights you want to give them when you add them.

msmbuilder.metrics.hybrid.HybridPNorm This metric is used to compose other metrics by adding them in quadrature (p=2) or with some other power mean. It is more general than msmbuilder.metrics.hybrid.Hybrid, which is the special case p=1.

Performance Notes

Most of the legwork for these distance metrics is done in C with shared memory parallelism (OpenMP) to take advantage of multicore architectures. The RMSD code has been highly optimized using SSE intrinsics, attention to cache performance, a faster matrix multiply for the Nx3 case than BLAS, etc. The other codes have been less highly optimized, but are algorithmically simpler than RMSD.

If you’d like to reduce the number of cores used by the distance metrics, you can set the OMP_NUM_THREADS environment variable (e.g. export OMP_NUM_THREADS=1 in bash to use only 1 core/thread).
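The same limit can be applied from inside a Python script. The sketch below assumes, as is typical for OpenMP runtimes, that the variable is read once when the OpenMP-backed extension is first loaded, so it has to be set before the import.

```python
import os

# Set the thread count before importing any OpenMP-backed extension module;
# OpenMP runtimes typically read this variable once, at start-up.
os.environ["OMP_NUM_THREADS"] = "1"

# import msmbuilder.metrics  # would go here, after the variable is set
```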

Kinetic Distance Metrics

New distance metrics for kinetic clustering are currently an active area of research in the Pande Lab.

Concrete Metrics

RMSD

class msmbuilder.metrics.RMSD(atomindices=None, omp_parallel=True)

Bases: msmbuilder.metrics.baseclasses.AbstractDistanceMetric

Compute the distance between frames using the Root Mean Square Deviation over a specifiable set of atoms, using the Theobald QCP algorithm.

References

[R10] Theobald, D. L. Acta Crystallogr., Sect. A 2005, 61, 478-480.

RMSD.__init__([atomindices, omp_parallel]) Initialize an RMSD calculator
RMSD.prepare_trajectory(trajectory) Prepare the trajectory for RMSD calculation.
RMSD.one_to_many(prepared_traj1, ...) Calculate a vector of distances from one frame of the first trajectory
RMSD.one_to_all(prepared_traj1, ...) Calculate a vector of distances from one frame of the first trajectory
RMSD.all_pairwise(prepared_traj) Calculate condensed distance matrix of all pairwise distances
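To make the quantity concrete, the RMSD after optimal superposition can be sketched in a few lines of numpy. This sketch uses the SVD-based Kabsch procedure rather than the Theobald QCP algorithm used by the class above, and the function name is hypothetical; it is meant only to show what is being measured.

```python
import numpy as np


def rmsd_after_superposition(a, b):
    """RMSD between two (n_atoms, 3) frames after optimal rotation and translation.

    Uses the SVD-based Kabsch procedure, not the Theobald QCP algorithm.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove translation by centering both frames on their centroids.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # The singular values of the 3x3 covariance matrix determine the best overlap.
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best orthogonal map is a reflection.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    # Minimal mean squared deviation, computed without building the rotation matrix.
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return np.sqrt(max(msd, 0.0))


rng = np.random.default_rng(1)
frame = rng.normal(size=(10, 3))
theta = 0.7
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = frame @ rot_z.T + np.array([1.0, -2.0, 0.5])  # same shape, rotated and shifted
same = rmsd_after_superposition(frame, moved)          # close to zero
```

A rigidly rotated and translated copy of a frame has an RMSD of essentially zero, which is the property that makes RMSD suitable for comparing conformations regardless of their position and orientation.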

Vectorized Metrics

class msmbuilder.metrics.Dihedral(metric='euclidean', p=2, angles='phi/psi', userfilename='DihedralIndices.dat', V=None, VI=None, indices=None)

Bases: msmbuilder.metrics.baseclasses.Vectorized, msmbuilder.metrics.baseclasses.AbstractDistanceMetric

Distance metric for calculating distances between frames based on their projection in dihedral space.
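The sine/cosine comparison mentioned in the overview can be illustrated with plain numpy. The helper name embed_angles is hypothetical, and the sketch assumes angles in radians; it shows why the embedding handles the periodic seam at +/-180 degrees, not how MSMBuilder lays out its arrays.

```python
import numpy as np


def embed_angles(angles):
    """Map angles (radians) of shape (n_frames, n_angles) to [cos, sin] features."""
    angles = np.asarray(angles, dtype=float)
    return np.hstack([np.cos(angles), np.sin(angles)])


# Two frames whose single dihedral sits on either side of the +/-180 degree seam.
a = np.deg2rad([[179.0]])
b = np.deg2rad([[-179.0]])

# Subtracting raw angles suggests a 358 degree gap (about 6.25 rad).
naive = np.abs(a - b).item()

# In the embedded space the gap is the chord length for a 2 degree separation.
embedded = np.linalg.norm(embed_angles(a) - embed_angles(b))
```

The embedded distance equals 2*sin(1 degree), so two nearly identical conformations stay close even though their raw angle values differ by almost a full turn.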

Dihedral.__init__([metric, p, angles, ...]) Create a distance metric to act on torsion angles
Dihedral.prepare_trajectory(trajectory) Prepare the dihedral angle representation of a trajectory, suitable for distance calculations.
Dihedral.one_to_all(prepared_traj1, ...) Measure the distance from one frame to every frame in a trajectory
Dihedral.one_to_many(prepared_traj1, ...) Calculate a vector of distances from one frame of the first trajectory
Dihedral.many_to_many(prepared_traj1, ...) Get a matrix of distances from each frame in a set to each other frame in a second set.
Dihedral.all_pairwise(prepared_traj) Calculate a condensed distance matrix of all the pairwise distances
Dihedral.all_to_all(prepared_traj1, ...) Get a matrix of distances from all frames in one traj to all frames in

class msmbuilder.metrics.ContinuousContact(metric='euclidean', p=2, contacts='all', scheme='closest-heavy', V=None, VI=None)

Bases: msmbuilder.metrics.baseclasses.Vectorized, msmbuilder.metrics.baseclasses.AbstractDistanceMetric

Distance metric for calculating distances between frames based on the pairwise distances between residues.

Here each frame is represented as a vector of the distances between pairs of residues.
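As a rough sketch of this representation, the snippet below builds the residue-distance vector for two small frames and compares them with a Euclidean distance. The residue layout, the helper name, and the use of the closest atom pair (without filtering for heavy atoms) are assumptions made for illustration.

```python
import numpy as np


def residue_distance_vector(frame, residues, contacts):
    """Closest-atom distance for each residue pair in one frame.

    frame:    (n_atoms, 3) coordinates
    residues: list of atom-index lists, one per residue (hypothetical layout)
    contacts: iterable of (i, j) residue-index pairs to monitor
    """
    out = []
    for i, j in contacts:
        xi = frame[residues[i]]  # atoms of residue i
        xj = frame[residues[j]]  # atoms of residue j
        diff = xi[:, None, :] - xj[None, :, :]
        out.append(np.sqrt((diff ** 2).sum(axis=-1)).min())
    return np.array(out)


residues = [[0, 1], [2, 3], [4]]
contacts = [(0, 1), (0, 2), (1, 2)]
frame_a = np.array(
    [[0, 0, 0], [1, 0, 0], [3, 0, 0], [4, 0, 0], [0, 5, 0]], dtype=float
)
frame_b = frame_a.copy()
frame_b[4] = [0.0, 2.0, 0.0]  # move the last residue closer

va = residue_distance_vector(frame_a, residues, contacts)
vb = residue_distance_vector(frame_b, residues, contacts)
frame_distance = np.linalg.norm(va - vb)  # Euclidean distance between frames
```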

ContinuousContact.__init__([metric, p, ...]) Create a distance calculator based on the distances between pairs of atoms in a structure, like the contact map except without casting to boolean.
ContinuousContact.prepare_trajectory(trajectory) Prepare a trajectory for distance calculations based on the contact map.
ContinuousContact.one_to_all(prepared_traj1, ...) Measure the distance from one frame to every frame in a trajectory
ContinuousContact.one_to_many(...) Calculate a vector of distances from one frame of the first trajectory
ContinuousContact.many_to_many(...) Get a matrix of distances from each frame in a set to each other frame in a second set.
ContinuousContact.all_pairwise(prepared_traj) Calculate a condensed distance matrix of all the pairwise distances
ContinuousContact.all_to_all(prepared_traj1, ...) Get a matrix of distances from all frames in one traj to all frames in

class msmbuilder.metrics.BooleanContact(metric='matching', contacts='all', cutoff=0.5, scheme='closest-heavy')

Bases: msmbuilder.metrics.baseclasses.Vectorized, msmbuilder.metrics.baseclasses.AbstractDistanceMetric

Distance metric for calculating distances between frames based on their contact maps.

Here each frame is represented as a vector of booleans representing whether the distance between pairs of residues is less than a cutoff.
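A minimal sketch of the boolean representation follows. The helper names are hypothetical, the input distances are made up, and comparing frames by the fraction of mismatched contacts is an assumption about what a boolean vector metric computes rather than a statement of the library's exact formula.

```python
import numpy as np


def contact_vector(pair_distances, cutoff=0.5):
    """True where a residue pair is closer than the cutoff (same units as the distances)."""
    return np.asarray(pair_distances, dtype=float) < cutoff


def mismatch_fraction(u, v):
    """Fraction of contacts whose present/absent state differs between two frames."""
    return float(np.mean(u != v))


frame_a = contact_vector([0.3, 0.7, 0.4, 1.2])   # contacts: T, F, T, F
frame_b = contact_vector([0.6, 0.8, 0.45, 0.2])  # contacts: F, F, T, T
d = mismatch_fraction(frame_a, frame_b)          # two of four contacts differ
```

Because the thresholding discards how far apart the residues are, two frames with very different distances can still be identical under this representation if the same contacts are formed.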

BooleanContact.__init__([metric, contacts, ...]) Create a distance metric that will measure the distance between frames
BooleanContact.prepare_trajectory(trajectory) Prepare a trajectory for distance calculations based on the contact map.
BooleanContact.one_to_all(prepared_traj1, ...) Measure the distance from one frame to every frame in a trajectory
BooleanContact.one_to_many(prepared_traj1, ...) Calculate a vector of distances from one frame of the first trajectory
BooleanContact.many_to_many(prepared_traj1, ...) Get a matrix of distances from each frame in a set to each other frame in a second set.
BooleanContact.all_pairwise(prepared_traj) Calculate a condensed distance matrix of all the pairwise distances
BooleanContact.all_to_all(prepared_traj1, ...) Get a matrix of distances from all frames in one traj to all frames in

class msmbuilder.metrics.AtomPairs(metric='cityblock', p=1, atom_pairs=None, V=None, VI=None)

Bases: msmbuilder.metrics.baseclasses.Vectorized, msmbuilder.metrics.baseclasses.AbstractDistanceMetric

Concrete distance metric that monitors the distance between certain pairs of atoms (as opposed to certain pairs of residues as ContinuousContact does)
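The feature vector for this kind of metric can be sketched with numpy indexing. The function name and array layout below are assumptions for illustration; only the N x 2 shape of atom_pairs and the cityblock default come from the class signature.

```python
import numpy as np


def atom_pair_distances(xyz, atom_pairs):
    """Distances for each atom pair in each frame.

    xyz:        (n_frames, n_atoms, 3) coordinates
    atom_pairs: (N, 2) integer array of atom indices
    returns:    (n_frames, N) array, one row per frame
    """
    xyz = np.asarray(xyz, dtype=float)
    atom_pairs = np.asarray(atom_pairs, dtype=int)
    delta = xyz[:, atom_pairs[:, 0], :] - xyz[:, atom_pairs[:, 1], :]
    return np.linalg.norm(delta, axis=2)


xyz = np.array([
    [[0, 0, 0], [3, 4, 0], [0, 0, 1]],  # frame 0
    [[0, 0, 0], [6, 8, 0], [0, 0, 3]],  # frame 1
], dtype=float)
atom_pairs = np.array([[0, 1], [0, 2]])

features = atom_pair_distances(xyz, atom_pairs)      # [[5, 1], [10, 3]]
cityblock = np.abs(features[0] - features[1]).sum()  # |5 - 10| + |1 - 3| = 7
```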

AtomPairs.__init__([metric, p, atom_pairs, ...]) Atom pairs should be a N x 2 array of the N pairs of atoms
AtomPairs.prepare_trajectory(trajectory)
AtomPairs.one_to_all(prepared_traj1, ...) Measure the distance from one frame to every frame in a trajectory
AtomPairs.one_to_many(prepared_traj1, ...) Calculate a vector of distances from one frame of the first trajectory
AtomPairs.many_to_many(prepared_traj1, ...) Get a matrix of distances from each frame in a set to each other frame in a second set.
AtomPairs.all_pairwise(prepared_traj) Calculate a condensed distance matrix of all the pairwise distances
AtomPairs.all_to_all(prepared_traj1, ...) Get a matrix of distances from all frames in one traj to all frames in

Combination Metrics

class msmbuilder.metrics.hybrid.Hybrid(base_metrics, weights)

A linear combination of other distance metrics

class msmbuilder.metrics.hybrid.HybridPNorm(base_metrics, weights, p=2)

A p-norm combination of other distance metrics. With p=2, for instance, this gives you the root mean square combination of the base metrics.
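The two ways of combining metrics can be compared on a pair of made-up base distances. The function names are hypothetical, and applying each weight to its distance before raising to the power p is an assumed convention; the library may place the weights differently.

```python
import numpy as np


def hybrid(distances, weights):
    """Weighted sum of per-metric distances."""
    return float(np.dot(weights, distances))


def hybrid_p_norm(distances, weights, p=2):
    """p-norm of the weighted per-metric distances (assumed weighting convention)."""
    weighted = np.asarray(weights, dtype=float) * np.asarray(distances, dtype=float)
    return float(np.sum(weighted ** p) ** (1.0 / p))


# Hypothetical distances between the same two frames under two base metrics.
base = [3.0, 4.0]
w = [1.0, 1.0]

linear = hybrid(base, w)                 # 3 + 4 = 7
quadrature = hybrid_p_norm(base, w, 2)   # sqrt(3**2 + 4**2) = 5
```

With p=1 the p-norm form reduces to the weighted sum, which is the sense in which the linear combination is a special case.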

Abstract Classes

class msmbuilder.metrics.baseclasses.AbstractDistanceMetric

Abstract base class for distance metrics. All distance metrics should inherit from this abstract class.

Provides a naive implementation of all_pairwise and one_to_many in terms of the abstract method one_to_all, which may be overridden by subclasses.
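The idea of deriving the other methods from one_to_all can be sketched as follows. NaiveMetric and AbsDiff are hypothetical classes, not the library's code, and the condensed ordering used here (upper triangle, row by row, as in SciPy's pdist) is an assumption about the output layout.

```python
from abc import ABC, abstractmethod

import numpy as np


class NaiveMetric(ABC):
    """Hypothetical base class: subclasses supply one_to_all only."""

    @abstractmethod
    def one_to_all(self, ptraj1, ptraj2, index1):
        """Distances from frame index1 of ptraj1 to all frames of ptraj2."""

    def one_to_many(self, ptraj1, ptraj2, index1, indices2):
        # Naive: compute every distance, then keep the requested ones.
        return self.one_to_all(ptraj1, ptraj2, index1)[indices2]

    def all_pairwise(self, ptraj):
        # Naive: one row at a time, keeping only entries above the diagonal,
        # so the result has n * (n - 1) / 2 entries in row-major order.
        n = len(ptraj)
        rows = [self.one_to_all(ptraj, ptraj, i)[i + 1:] for i in range(n - 1)]
        return np.concatenate(rows)


class AbsDiff(NaiveMetric):
    """Toy metric on scalar 'frames'."""

    def one_to_all(self, ptraj1, ptraj2, index1):
        return np.abs(ptraj2 - ptraj1[index1])


ptraj = np.array([0.0, 1.0, 3.0, 6.0])
metric = AbsDiff()
condensed = metric.all_pairwise(ptraj)  # d01, d02, d03, d12, d13, d23
subset = metric.one_to_many(ptraj, ptraj, 0, [1, 3])
```

The naive versions do redundant work (one_to_many computes distances it then discards), which is why a subclass with a faster route may prefer to override them.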

AbstractDistanceMetric.prepare_trajectory(...) Prepare the trajectory in a format that is more convenient for computing distances.
AbstractDistanceMetric.all_pairwise(...) Calculate condensed distance matrix of all pairwise distances
AbstractDistanceMetric.one_to_all(...) Calculate the vector of distances from the index1th frame of prepared_traj1 to all of the frames in prepared_traj2.
AbstractDistanceMetric.one_to_many(...) Calculate a vector of distances from the index1th frame of prepared_traj1 to all of the indices2 frames of prepared_traj2.

class msmbuilder.metrics.baseclasses.Vectorized(metric='euclidean', p=2, V=None, VI=None)

Represent MSM frames as vectors in some arbitrary vector space, and then use standard vector space metrics.

Some examples of this might be extracting the contact map or dihedral angles.

In order to be a full-featured DistanceMetric, a subclass of Vectorized implements its own prepare_trajectory() method; Vectorized provides the remainder.

allowable_scipy_metrics gives the list of metrics which your client can use. If the vector space that you’re projecting your trajectory onto is just a space of boolean vectors, then you probably don’t want to allow Euclidean distance, for instance.

default_scipy_metric is the metric that will be used by default if the user leaves the ‘metric’ field blank/unspecified.

default_scipy_p is the default value of ‘p’ that will be used if left unspecified. The value ‘p’ is ONLY used for the minkowski (pnorm) metric, so otherwise the scipy.spatial.distance code ignores it anyway.

See http://docs.scipy.org/doc/scipy/reference/spatial.distance.html for a description of all the distance metrics and how they work.

Vectorized.__init__([metric, p, V, VI]) Create a Vectorized metric
Vectorized.prepare_trajectory(trajectory) Prepare the trajectory in a format that is more convenient for computing distances.
Vectorized.all_pairwise(prepared_traj) Calculate a condensed distance matrix of all the pairwise distances
Vectorized.all_to_all(prepared_traj1, ...) Get a matrix of distances from all frames in one traj to all frames in
Vectorized.many_to_many(prepared_traj1, ...) Get a matrix of distances from each frame in a set to each other frame in a second set.
Vectorized.one_to_all(prepared_traj1, ...) Measure the distance from one frame to every frame in a trajectory
Vectorized.one_to_many(prepared_traj1, ...) Calculate a vector of distances from one frame of the first trajectory

Utility Methods

fast_cdist(XA, XB[, metric, p, V, VI]) Computes distance between each pair of the two collections of inputs.
fast_pdist(X[, metric, p, V, VI]) Computes the pairwise distances between m original observations in n-dimensional space.
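The difference between the two kinds of output can be seen with the corresponding functions in scipy.spatial.distance. This sketch calls SciPy's own cdist and pdist as stand-ins; it assumes, but does not verify, that fast_cdist and fast_pdist return results of the same shape.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

XA = np.array([[0.0, 0.0], [1.0, 0.0]])
XB = np.array([[0.0, 2.0], [3.0, 4.0], [1.0, 1.0]])

# cdist: every row of XA against every row of XB -> shape (2, 3)
cross = cdist(XA, XB, metric="euclidean")

# pdist: all unordered pairs within one collection -> condensed vector of length 3
# (p is only meaningful for the minkowski metric; p=1 gives the cityblock distance)
within = pdist(XB, metric="minkowski", p=1)

# squareform expands the condensed vector into a symmetric (3, 3) matrix
square = squareform(within)
```

The condensed form stores each pair once, which halves the memory needed for an all-pairs calculation compared with the full square matrix.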