msmbuilder.assigning.assign_with_checkpoint

msmbuilder.assigning.assign_with_checkpoint(metric, project, generators, assignments_path, distances_path, chunk_size=10000, atom_indices_to_load=None)

Assign every frame to its closest generator.

The results are checkpointed to disk along the way, trajectory by trajectory. If the process is killed, a restarted run should be able to pick up roughly where it left off.
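The checkpointing behaviour described above can be illustrated with a minimal, standalone sketch. It is not msmbuilder's implementation: the names closest and assign_with_resume are hypothetical, frames are plain numbers, absolute difference stands in for the distance metric, and a JSON file stands in for the hdf5 files.

```python
import json
import os


def closest(frame, generators):
    """Return (index, distance) of the generator nearest to a scalar frame.

    Absolute difference is a stand-in for a real distance metric.
    """
    distances = [abs(frame - g) for g in generators]
    index = min(range(len(generators)), key=distances.__getitem__)
    return index, distances[index]


def assign_with_resume(trajectories, generators, checkpoint_path):
    """Assign each trajectory in turn, saving results after every one.

    trajectories is a dict mapping a trajectory name to a list of frames.
    (Hypothetical helper; the JSON checkpoint is an assumption of this sketch.)
    """
    # Reload whatever an earlier, interrupted run already finished.
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)

    for name, frames in trajectories.items():
        if name in done:
            continue  # already assigned before the interruption
        pairs = [closest(frame, generators) for frame in frames]
        done[name] = {
            "assignments": [index for index, _ in pairs],
            "distances": [distance for _, distance in pairs],
        }
        # Write to a temporary file and swap it in, so that a kill during
        # the write cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(done, handle)
        os.replace(tmp_path, checkpoint_path)
    return done
```

Because the checkpoint is only updated after a whole trajectory is finished, a restart repeats at most the trajectory that was in progress when the process died, which is why the resume point is only approximate.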

Parameters:

metric : msmbuilder.metrics.AbstractDistanceMetric

A distance metric used to define “closest”

project : msmbuilder.Project

Used to load the trajectories

generators : msmbuilder.Trajectory

A trajectory containing the structures of all of the cluster centers

assignments_path : str

Path to a file that contains/will contain the assignments, as a 2D array of integers in hdf5 format

distances_path : str

Path to a file that contains/will contain the distance from each frame to its assigned generator, as a 2D array of floats in hdf5 format

chunk_size : int

The number of frames to load and process per step. The best value depends on your system memory: it should be roughly the number of frames you can hold in memory at one time. This only matters if your trajectories are long, because the effective chunk size is min(traj_length, chunk_size).
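To make the effective chunk size concrete, here is a small sketch of how a trajectory could be split into frame ranges. The helper iter_chunks is hypothetical and only illustrates the min(traj_length, chunk_size) arithmetic; it is not part of msmbuilder.

```python
def iter_chunks(traj_length, chunk_size=10000):
    """Yield (start, stop) frame ranges that cover a trajectory.

    Hypothetical helper: the last (or only) chunk is cut short at the
    end of the trajectory, so no chunk exceeds min(traj_length, chunk_size).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, traj_length, chunk_size):
        yield start, min(start + chunk_size, traj_length)


# A 25000-frame trajectory is processed in three steps ...
long_traj = list(iter_chunks(25000))
# ... while a 300-frame trajectory fits in a single, shorter chunk.
short_traj = list(iter_chunks(300))
```

With the default chunk_size, the short trajectory yields one range of 300 frames, so raising chunk_size further would change nothing for it.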

atom_indices_to_load : {None, list}

The indices of the atoms to load for each trajectory chunk. This function loads atoms from the project but does NOT load the generators, which are passed in as a trajectory object (see generators above). If the generators are already restricted to a subset of atom indices but the trajectories on disk are NOT, pass that set of indices here so that the two match.
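The mismatch this parameter resolves can be shown with plain Python lists. In this sketch a frame is a list of (x, y, z) tuples, and select_atoms is a hypothetical stand-in for loading only certain atoms from disk; the coordinates are made up for illustration.

```python
def select_atoms(frame, atom_indices=None):
    """Return the atoms of one frame, optionally restricted to atom_indices.

    Hypothetical helper: None means "keep every atom".
    """
    if atom_indices is None:
        return list(frame)
    return [frame[i] for i in atom_indices]


# Generator already subsampled to two atoms (illustrative coordinates).
generator = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# The frame on disk still holds all four atoms.
frame_on_disk = [
    (0.1, 0.0, 0.0),
    (5.0, 5.0, 5.0),
    (1.1, 0.0, 0.0),
    (9.0, 9.0, 9.0),
]

# Without indices the atom counts differ, so the two cannot be compared.
unrestricted = select_atoms(frame_on_disk)

# Passing the same indices used to build the generators fixes that.
restricted = select_atoms(frame_on_disk, atom_indices=[0, 2])
```

After restriction, the frame and the generator contain the same number of atoms in the same order, which is what a structure-based distance metric needs.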

See also

assign_in_memory