msmbuilder.assigning.assign_with_checkpoint
msmbuilder.assigning.assign_with_checkpoint(metric, project, generators, assignments_path, distances_path, chunk_size=10000, atom_indices_to_load=None)
Assign every frame to its closest generator.
The results are checkpointed along the way, trajectory by trajectory. If the process is killed, a later call should be able to pick up roughly where the previous one left off.
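The checkpoint-and-resume behaviour described above can be illustrated with a minimal sketch. This is not msmbuilder's implementation: the function name `assign_with_resume`, the `completed` flag array, the use of a NumPy `.npz` file instead of HDF5, and plain Euclidean distance in place of a metric object are all assumptions made to keep the example self-contained.

```python
import os
import tempfile

import numpy as np


def assign_with_resume(trajs, generators, checkpoint_path):
    """Hypothetical sketch: assign frames to the nearest generator and
    save progress after every trajectory.

    trajs: list of arrays, each of shape (n_frames, n_features)
    generators: array of shape (n_generators, n_features)
    checkpoint_path: path ending in ".npz" (assumed storage format)
    """
    n_trajs = len(trajs)
    max_len = max(len(t) for t in trajs)

    if os.path.exists(checkpoint_path):
        # Resume: reload whatever a previous run managed to finish.
        with np.load(checkpoint_path) as saved:
            assignments = saved["assignments"]
            distances = saved["distances"]
            completed = saved["completed"]
    else:
        # Fresh start: -1 marks frames that are unassigned or padding.
        assignments = np.full((n_trajs, max_len), -1, dtype=int)
        distances = np.full((n_trajs, max_len), -1.0)
        completed = np.zeros(n_trajs, dtype=bool)

    n_processed = 0
    for i, traj in enumerate(trajs):
        if completed[i]:
            continue  # finished in an earlier run, skip it
        # Euclidean distance from every frame to every generator.
        d = np.linalg.norm(traj[:, None, :] - generators[None, :, :], axis=2)
        assignments[i, :len(traj)] = d.argmin(axis=1)
        distances[i, :len(traj)] = d.min(axis=1)
        completed[i] = True
        # Checkpoint once per trajectory, so at most one trajectory of
        # work is lost if the process is killed.
        np.savez(checkpoint_path, assignments=assignments,
                 distances=distances, completed=completed)
        n_processed += 1
    return assignments, distances, n_processed


# Toy data: two generators and two short 1-D "trajectories".
generators = np.array([[0.0], [10.0]])
trajs = [np.array([[1.0], [9.0], [2.0]]), np.array([[8.0], [0.5]])]
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.npz")

first = assign_with_resume(trajs, generators, checkpoint)
second = assign_with_resume(trajs, generators, checkpoint)  # nothing left to do
```

In this sketch the second call finds every trajectory already marked as completed and returns the stored arrays without recomputing anything.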
Parameters: metric : msmbuilder.metrics.AbstractDistanceMetric
A distance metric used to define “closest”
project : msmbuilder.Project
Used to load the trajectories
generators : msmbuilder.Trajectory
A trajectory containing the structures of all of the cluster centers
assignments_path : str
Path to a file that contains/will contain the assignments, as a 2D array of integers in hdf5 format
distances_path : str
Path to a file that contains/will contain the distance from each frame to its assigned generator, as a 2D array of floats in hdf5 format
chunk_size : int
The number of frames to load and process per step. The optimal value depends on your system memory: it should be roughly the number of frames you can fit in memory at any one time. This only matters if your trajectories are long, since the effective chunk size is min(traj_length, chunk_size).
atom_indices_to_load : {None, list}
The indices of the atoms to load for each trajectory chunk. Note that this function is responsible for loading atoms from the project, but does NOT load the generators. Those are passed in as a trajectory object (above). So if the generators are already subsampled to a restricted set of atom indices, but the trajectories on disk are NOT, you’ll need to pass in that set of indices here to resolve the difference.
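To make the roles of chunk_size and atom_indices_to_load concrete, here is a small sketch in plain NumPy. It is an illustration under stated assumptions, not the library's code: `iter_chunks` and `assign_in_chunks` are hypothetical names, coordinates are assumed to be arrays of shape (n_frames, n_atoms, 3), and an unaligned Euclidean distance stands in for whatever metric is passed to the real function.

```python
import numpy as np


def iter_chunks(n_frames, chunk_size):
    """Yield (start, stop) frame ranges; the last chunk may be shorter."""
    for start in range(0, n_frames, chunk_size):
        yield start, min(start + chunk_size, n_frames)


def assign_in_chunks(traj_xyz, generators_xyz, chunk_size=10000,
                     atom_indices=None):
    """Hypothetical sketch of chunked assignment.

    traj_xyz: (n_frames, n_atoms, 3), all atoms as stored on disk
    generators_xyz: (n_generators, n_selected_atoms, 3), already subsampled
    atom_indices: atoms to keep from traj_xyz so it matches the generators
    """
    n_frames = traj_xyz.shape[0]
    assignments = np.empty(n_frames, dtype=int)
    distances = np.empty(n_frames, dtype=float)
    for start, stop in iter_chunks(n_frames, chunk_size):
        chunk = traj_xyz[start:stop]
        if atom_indices is not None:
            # Restrict the chunk to the atoms the generators contain.
            chunk = chunk[:, atom_indices, :]
        # Distance from each frame in the chunk to each generator.
        diff = chunk[:, None, :, :] - generators_xyz[None, :, :, :]
        d = np.sqrt((diff ** 2).sum(axis=(2, 3)))
        assignments[start:stop] = d.argmin(axis=1)
        distances[start:stop] = d.min(axis=1)
    return assignments, distances


# Toy data: 5 frames of 4 atoms; the generators only contain atoms 0 and 2.
values = np.array([0.1, 0.9, 0.2, 0.8, 0.4])
traj_xyz = values[:, None, None] * np.ones((5, 4, 3))
generators_xyz = np.stack([np.zeros((2, 3)), np.ones((2, 3))])

small = assign_in_chunks(traj_xyz, generators_xyz, chunk_size=2,
                         atom_indices=[0, 2])
large = assign_in_chunks(traj_xyz, generators_xyz, atom_indices=[0, 2])
```

With the default chunk size the 5-frame trajectory is processed as a single chunk, which is the min(traj_length, chunk_size) behaviour noted for chunk_size. Without atom_indices, the 4-atom frames could not be compared against the 2-atom generators at all.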
See also