Description of Data Formats
To increase interoperability between external packages, we herein describe the different data formats used as input to and export from programs. Orientations on a per-particle bases are currently stored as Euler angles in ZYZ format. Note that the package is still under development and the exact way the data is represented might change in the future.
Data from the match template program
The match template program collates statistics from a large number of cross-correlograms taken over an orientation and defocus search space.
See the API on the MatchTemplateResult
object for further information on how these data are stored, but here we provide an overview of what files get written to disk.
Best statistic maps
Each of the "best" statistics is stored on a per-position basis in what we dub "statistics maps" saved as .mrc
files.
We have the following tracked statistics for each valid (x, y) position:
- Maximum Intensity Projection (MIP): Maximum cross-correlation value over all search orientations and relative defocus values.
- Scaled MIP (z-score or SNR): The MIP value normalized by the mean and variance of the cross-correlation over the entire search space.
- Correlation Mean: The mean of the cross-correlation values over the entire search space. Used to calculate the scaled MIP.
- Correlation Variance: The variance of the cross-correlation values over the entire search space. Used to calculate the scaled MIP.
- Phi: The angle which (in degrees) produced the MIP value.
- Theta: The angle (in degrees) which produced the MIP value.
- Psi: The angle (in degrees) which produced the MIP value.
- Defocus: The relative defocus value (in Angstroms) which produced the MIP value.
A note on correlation modes and output shapes
Three general modes for convolution/correlation exist in digital signal processing: "full", "same", and "valid". This chapter of Digital Signals Theory provides a good overview of these modes.
We support both the "same" and "valid" modes for cross-correlation. For an image with shape and template , the modes will output statistics maps with the following shapes:
- same:
- valid:
Note that same mode pads the image with zeros along the edges and does not increase the number of particles detectable; values along the padded portions of the edge do not hold significance in the context of the particle detection. In each case, the position in the map at corresponds to the top-left corner of the template at that position, not the center of the template. Below is an example comparing the two correlation modes:
Match template DataFrame
Since not all positions contain a particle from the match template search, using only the statistics maps can be inefficient in terms of size and speed.
The match template manager class has the method MatchTemplateManager.results_to_dataframe()
which automatically picks peaks within the scaled MIP map and stores the peak locations, orientations, and defocus values in a pandas DataFrame.
Additional columns besides locations and orientations are included in the DataFrame to increase the utility of the data, namely the construction of ParticleStack
objects.
The columns and corresponding descriptions are as follows:
Column Name | Type | Description |
---|---|---|
mip |
float | Maximum cross-correlation value over all search orientations and relative defocus values. |
scaled_mip |
float | Scaled MIP value (z-score) normalized by cross-correlation mean and variance. |
correlation_mean |
float | Mean of the cross-correlation values over the entire search space. |
correlation_variance |
float | Variance of the cross-correlation values over the entire search space. |
total_correlations |
int | Total number of cross-correlations taken over the search space (defoci times orientations). |
pos_x |
int | Particle x position (units of pixels) in the statistics maps. Corresponds to the top-left corner of the template. |
pos_y |
int | Particle y position (units of pixels) in the statistics maps. Corresponds to the top-left corner of the template. |
pos_x_img |
int | Center of of the particle (x position, units of pixels) in the micrograph. |
pos_y_img |
int | Center of of the particle (y position, units of pixels) in the micrograph. |
pos_x_img_angstrom |
float | Center of the particle (x position, in Angstroms) in the micrograph. |
pos_y_img_angstrom |
float | Center of the particle (y position, in Angstroms) in the micrograph. |
psi |
float | The angle (in degrees) which produced the MIP value. |
theta |
float | The angle (in degrees) which produced the MIP value. |
phi |
float | The angle which (in degrees) produced the MIP value. |
relative_defocus |
float | The relative defocus value (in Angstroms) which produced the MIP value. Relative to defocus_u and defocus_v . |
defocus_u |
float | Defocus value along the major axis for the micrograph (in Angstroms). |
defocus_v |
float | Defocus value along the minor axis for the micrograph (in Angstroms). |
astigmatism_angle |
float | Angle of the astigmatism (in degrees) for defocus. |
pixel_size |
float | Pixel size of the micrograph (in Angstroms). |
voltage |
float | Voltage of the microscope (in kV). |
spherical_aberration |
float | Spherical aberration of the microscope (in mm). |
amplitude_contrast_ratio |
float | Amplitude contrast ratio of the microscope. |
phase_shift |
float | Phase shift of the microscope (in degrees). |
ctf_B_factor |
float | B-factor of the CTF, in Angstroms^2. |
micrograph_path |
str | Path to the original micrograph. |
template_path |
str | Path to the template used for the search. |
mip_path |
str | Path to the saved MIP map. |
scaled_mip_path |
str | Path to the saved scaled MIP map. |
psi_path |
str | Path to the saved psi map. |
theta_path |
str | Path to the saved theta map. |
phi_path |
str | Path to the saved phi map. |
defocus_path |
str | Path to the saved defocus map. |
correlation_average_path |
str | Path to the saved correlation mean map. |
correlation_variance_path |
str | Path to the saved correlation variance map. |
Data from the refine template program
The refine template program takes in a set of particles and refined their orientations and relative defocus values.
Refine template DataFrame
The program outputs another DataFrame with additional columns for the refined orientations, defocus values, and positions. New columns are listed below:
Column Name | Type | Description |
---|---|---|
refined_mip |
float | New maximum cross-correlation over refinement search space. |
refined_scaled_mip |
float | New scaled MIP value (z-score) normalized by cross-correlation mean and variance. |
refined_pos_x |
int | The refined x position of the particle, top-left corner of the template. |
refined_pos_y |
int | The refined y position of the particle, top-left corner of the template. |
refined_pos_x_img |
int | The refined x position of the particle, center of the particle in the micrograph. |
refined_pos_y_img |
int | The refined y position of the particle, center of the particle in the micrograph. |
refined_pos_x_img_angstrom |
float | The refined x position of the particle, center of the particle in the micrograph (in Angstroms). |
refined_pos_y_img_angstrom |
float | The refined y position of the particle, center of the particle in the micrograph (in Angstroms). |
refined_psi |
float | The refined angle (in degrees). |
refined_theta |
float | The refined angle (in degrees). |
refined_phi |
float | The refined angle (in degrees). |
refined_relative_defocus |
float | The refined relative defocus value (in Angstroms). |