Description of Data Formats

To increase interoperability with external packages, we describe here the data formats used as input to and output from the programs. Per-particle orientations are currently stored as Euler angles in the ZYZ convention, with angles ordered as (phi, theta, psi). Note that the package is still under development and the exact data representation may change in the future.

Data from the match template program

The match template program collates statistics from a large number of cross-correlograms computed over an orientation and defocus search space. See the API documentation for the MatchTemplateResult object for further information on how these data are stored; here we provide an overview of which files are written to disk.

Best statistic maps

Each of the "best" statistics is stored on a per-position basis in what we dub "statistics maps" saved as .mrc files. We have the following tracked statistics for each valid (x, y) position:

  • Maximum Intensity Projection (MIP): Maximum cross-correlation value over the entire search space.
  • Scaled MIP (z-score): The MIP value normalized by the mean and variance of the cross-correlation over the entire search space.
  • Correlation Mean: The mean of the cross-correlation values over the entire search space. Used to calculate the scaled MIP.
  • Correlation Variance: The variance of the cross-correlation values over the entire search space. Used to calculate the scaled MIP.
  • Phi: The phi Euler angle (in degrees) which produced the MIP value.
  • Theta: The theta Euler angle (in degrees) which produced the MIP value.
  • Psi: The psi Euler angle (in degrees) which produced the MIP value.
  • Defocus: The relative defocus value (in Angstroms, relative to estimated defocus of micrograph) which produced the MIP value.
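
The relationship between the MIP, the correlation mean, the correlation variance, and the scaled MIP can be sketched as a per-position z-score. This is a minimal sketch that assumes the normalization is (MIP - mean) / sqrt(variance); the function name scaled_mip is hypothetical and not part of the package API, and the package may apply additional steps.

```python
def scaled_mip(mip, correlation_mean, correlation_variance):
    """Return the z-score of a MIP value given the search-space mean and variance.

    Assumption: z = (mip - mean) / sqrt(variance). Only arithmetic operators
    are used, so the same function also accepts NumPy arrays of equal shape.
    """
    return (mip - correlation_mean) / correlation_variance ** 0.5


# Example for a single (x, y) position: MIP 9.0, mean 1.0, variance 4.0.
print(scaled_mip(9.0, 1.0, 4.0))  # 4.0
```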

Each of these statistics maps is saved to disk in the MRC format at the path given in the MatchTemplateResult object.

A note on correlation modes and output shapes

Three general modes for convolution/correlation exist in digital signal processing: "full", "same", and "valid". Digital Signals Theory provides a good overview of these modes.

We use the "valid" mode by default when saving these statistics maps, but they are initially stored in "same" mode. The MatchTemplateResult.apply_valid_cropping method performs the "same" to "valid" cropping. For an image with shape (H, W) and a template with shape (h, w), the modes output statistics maps with the following shapes:

  • same: (H, W)
  • valid: (H - h + 1, W - w + 1)
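
As a quick check of the output shapes, the helper below computes the map shape for each mode using the standard definitions: "same" keeps the image shape (H, W), while "valid" keeps only positions where the template fits entirely inside the image, giving (H - h + 1, W - w + 1). The function statistics_map_shape is a hypothetical illustration and not part of the package.

```python
def statistics_map_shape(image_shape, template_shape, mode="valid"):
    """Shape of a 2D correlation map for an image (H, W) and template (h, w)."""
    H, W = image_shape
    h, w = template_shape
    if mode == "same":
        # Output has the same shape as the image.
        return (H, W)
    if mode == "valid":
        # Only positions where the template lies fully inside the image.
        if h > H or w > W:
            raise ValueError("template must not be larger than the image")
        return (H - h + 1, W - w + 1)
    raise ValueError(f"unsupported mode: {mode!r}")


print(statistics_map_shape((4096, 4096), (512, 512), "same"))   # (4096, 4096)
print(statistics_map_shape((4096, 4096), (512, 512), "valid"))  # (3585, 3585)
```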

Note that "same" mode pads the image with zeros along the edges and does not increase the number of detectable particles; values in the padded edge regions carry no significance for particle detection. In each case, the position (i, j) in the map corresponds to the top-left corner of the template at that position, not the center of the template.
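
Because map positions refer to the top-left corner of the template, locating the particle center in the micrograph requires an offset. The sketch below assumes the center is offset by half the template size using integer division; the helper name template_center_in_image is hypothetical, and the package may round differently.

```python
def template_center_in_image(pos_x, pos_y, template_shape):
    """Convert a top-left map position to the template center in the micrograph.

    template_shape is (height, width) in pixels.
    Assumption: center = top-left corner + template size // 2.
    """
    h, w = template_shape
    return pos_x + w // 2, pos_y + h // 2


# A peak at map position (x=100, y=250) with a 512 x 512 template.
print(template_center_in_image(100, 250, (512, 512)))  # (356, 506)
```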

Match template DataFrame

Since not all positions contain a particle from the match template search, using full statistics maps for downstream analysis can be inefficient in terms of speed, memory requirements, and code overhead. The match template manager class has the method MatchTemplateManager.results_to_dataframe() which automatically picks peaks within the scaled MIP map and stores the peak locations, orientations, and defocus values in a pandas DataFrame. We take a verbose approach to constructing this DataFrame where some columns store similar information about each particle.
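
To illustrate the general idea of picking peaks from a scaled MIP map, here is a minimal greedy sketch in plain Python. It is not the algorithm used by MatchTemplateManager.results_to_dataframe(); the function pick_peaks and its threshold and min_distance parameters are assumptions made for illustration only.

```python
def pick_peaks(scaled_mip, threshold, min_distance):
    """Greedy peak picking on a 2D map given as a list of rows.

    Returns (x, y, value) tuples, strongest first, such that every peak is
    above threshold and no two peaks are closer than min_distance pixels.
    """
    # Collect every position whose value exceeds the threshold.
    candidates = [
        (value, x, y)
        for y, row in enumerate(scaled_mip)
        for x, value in enumerate(row)
        if value > threshold
    ]
    candidates.sort(reverse=True)  # strongest candidates first

    peaks = []
    for value, x, y in candidates:
        # Keep a candidate only if it is far enough from all accepted peaks.
        far_enough = all(
            (x - px) ** 2 + (y - py) ** 2 >= min_distance ** 2
            for px, py, _ in peaks
        )
        if far_enough:
            peaks.append((x, y, value))
    return peaks


toy_map = [
    [0.1, 0.2, 0.1, 0.0, 0.3],
    [0.2, 9.5, 8.0, 0.1, 0.2],
    [0.1, 0.3, 0.2, 0.1, 7.9],
    [0.0, 0.1, 0.2, 0.3, 0.1],
]
print(pick_peaks(toy_map, threshold=7.5, min_distance=2))
# [(1, 1, 9.5), (4, 2, 7.9)]
```

The value 8.0 at (2, 1) is discarded because it lies one pixel from the stronger peak at (1, 1), which is how a single particle producing several high neighboring values ends up as one row in the table.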

Columns beyond locations and orientations are included to make the DataFrame more useful downstream, in particular for constructing ParticleStack objects. The columns and corresponding descriptions are as follows:

| Column Name | Type | Description |
| --- | --- | --- |
| mip | float | Maximum cross-correlation value over all search orientations and relative defocus values. |
| scaled_mip | float | Scaled MIP value (z-score) normalized by cross-correlation mean and variance. |
| correlation_mean | float | Mean of the cross-correlation values over the entire search space. |
| correlation_variance | float | Variance of the cross-correlation values over the entire search space. |
| total_correlations | int | Total number of cross-correlations taken over the search space (defoci times orientations). |
| pos_x | int | Particle x position (in pixels) in the statistics maps. Corresponds to the top-left corner of the template. |
| pos_y | int | Particle y position (in pixels) in the statistics maps. Corresponds to the top-left corner of the template. |
| pos_x_img | int | Center of the particle (x position, in pixels) in the micrograph. |
| pos_y_img | int | Center of the particle (y position, in pixels) in the micrograph. |
| pos_x_img_angstrom | float | Center of the particle (x position, in Angstroms) in the micrograph. |
| pos_y_img_angstrom | float | Center of the particle (y position, in Angstroms) in the micrograph. |
| psi | float | The psi angle (in degrees) which produced the MIP value. |
| theta | float | The theta angle (in degrees) which produced the MIP value. |
| phi | float | The phi angle (in degrees) which produced the MIP value. |
| relative_defocus | float | The relative defocus value (in Angstroms) which produced the MIP value. Relative to defocus_u and defocus_v. |
| defocus_u | float | Defocus value along the major axis for the micrograph (in Angstroms). |
| defocus_v | float | Defocus value along the minor axis for the micrograph (in Angstroms). |
| astigmatism_angle | float | Angle of the astigmatism (in degrees) for defocus. |
| pixel_size | float | Pixel size of the micrograph (in Angstroms). |
| voltage | float | Voltage of the microscope (in kV). |
| spherical_aberration | float | Spherical aberration of the microscope (in mm). |
| amplitude_contrast_ratio | float | Amplitude contrast ratio of the microscope. |
| phase_shift | float | Phase shift of the microscope (in degrees). |
| ctf_B_factor | float | B-factor of the CTF (in Angstroms^2). |
| micrograph_path | str | Path to the original micrograph. |
| template_path | str | Path to the template used for the search. |
| mip_path | str | Path to the saved MIP map. |
| scaled_mip_path | str | Path to the saved scaled MIP map. |
| psi_path | str | Path to the saved psi map. |
| theta_path | str | Path to the saved theta map. |
| phi_path | str | Path to the saved phi map. |
| defocus_path | str | Path to the saved defocus map. |
| correlation_average_path | str | Path to the saved correlation mean map. |
| correlation_variance_path | str | Path to the saved correlation variance map. |
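
Several of the columns above are related to one another. The sketch below shows how the Angstrom positions and an absolute defocus could be derived from a single row, assuming that Angstrom positions are pixel positions multiplied by pixel_size and that the absolute defocus is the micrograph defocus plus relative_defocus. The function derived_values and the absolute_defocus_u / absolute_defocus_v keys are hypothetical and are not columns of the DataFrame.

```python
def derived_values(row):
    """Derive Angstrom positions and absolute defocus values from one row.

    row is any mapping keyed by column name (for example a dict).
    """
    return {
        # Assumption: Angstrom position = pixel position * pixel size.
        "pos_x_img_angstrom": row["pos_x_img"] * row["pixel_size"],
        "pos_y_img_angstrom": row["pos_y_img"] * row["pixel_size"],
        # Assumption: relative defocus is added to both defocus axes.
        "absolute_defocus_u": row["defocus_u"] + row["relative_defocus"],
        "absolute_defocus_v": row["defocus_v"] + row["relative_defocus"],
    }


row = {
    "pos_x_img": 356,
    "pos_y_img": 506,
    "pixel_size": 1.5,
    "defocus_u": 5200.0,
    "defocus_v": 5000.0,
    "relative_defocus": -400.0,
}
print(derived_values(row))
```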

Data from the refine template program

The refine template program takes in the DataFrame from the match template program and refines the orientation and defocus values of each particle. Each refined parameter is stored in a new column prefixed with the refined_ string. Note that refined results can be re-refined, for example with a slightly different template, in which case the already-refined parameters are used as input.
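
The refined_ prefix convention makes it straightforward to prefer a refined value when one exists. The following is an illustrative sketch only; the helper current_parameter is hypothetical, and how the package itself selects between original and refined columns is not described here.

```python
def current_parameter(row, name):
    """Return refined_<name> when present in the row, otherwise <name>."""
    refined_key = f"refined_{name}"
    if refined_key in row and row[refined_key] is not None:
        return row[refined_key]
    return row[name]


# A particle where only psi has a refined value.
particle = {"psi": 120.0, "theta": 45.0, "phi": 10.0, "refined_psi": 121.5}
print(current_parameter(particle, "psi"))    # 121.5
print(current_parameter(particle, "theta"))  # 45.0
```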

Refine template DataFrame

The program outputs another DataFrame with additional columns for the refined orientations, defocus values, and positions. New columns with descriptions are listed below:

| Column Name | Type | Description |
| --- | --- | --- |
| refined_mip | float | New maximum cross-correlation over the refinement search space. |
| refined_scaled_mip | float | New scaled MIP value (z-score) normalized by cross-correlation mean and variance. |
| refined_pos_x | int | The refined x position of the particle (in pixels), top-left corner of the template. |
| refined_pos_y | int | The refined y position of the particle (in pixels), top-left corner of the template. |
| refined_pos_x_img | int | The refined x position of the particle (in pixels), center of the particle in the micrograph. |
| refined_pos_y_img | int | The refined y position of the particle (in pixels), center of the particle in the micrograph. |
| refined_pos_x_img_angstrom | float | The refined x position of the particle (in Angstroms), center of the particle in the micrograph. |
| refined_pos_y_img_angstrom | float | The refined y position of the particle (in Angstroms), center of the particle in the micrograph. |
| refined_psi | float | The refined psi angle (in degrees). |
| refined_theta | float | The refined theta angle (in degrees). |
| refined_phi | float | The refined phi angle (in degrees). |
| refined_relative_defocus | float | The refined relative defocus value (in Angstroms). |