Extracting results from the template matching search
After running template matching, we have a set of location, correlation, and orientation statistics that need to be parsed. For example, peaks need to be picked from the z-score map, and the best orientation and defocus at each of those locations need to be read out.
While this information can be gathered from the configuration file and the array-like result maps, a tabular data format is more flexible to work with.
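The peak-picking step described above can be sketched in plain Python. This is a generic greedy non-maximum suppression written for illustration; the names pick_peaks, peaks_to_records, threshold, and min_distance are hypothetical, and Leopard-EM's own peak finder may define neighborhoods and cutoffs differently. Each picked location is then used to read the other statistic maps, giving one row per peak, which is exactly the kind of record a table holds.

```python
"""Greedy peak picking on a 2D z-score map, then reading other maps at the peaks."""
from typing import Dict, List, Tuple

Grid = List[List[float]]


def pick_peaks(zscore: Grid, threshold: float, min_distance: int) -> List[Tuple[int, int]]:
    """Return (y, x) peak positions with value >= threshold, strongest first.

    A candidate is rejected if a stronger, already accepted peak lies within
    min_distance pixels along both axes (a square exclusion window).
    """
    candidates = [
        (value, y, x)
        for y, row in enumerate(zscore)
        for x, value in enumerate(row)
        if value >= threshold
    ]
    candidates.sort(reverse=True)  # highest z-score first

    accepted: List[Tuple[int, int]] = []
    for _, y, x in candidates:
        if all(abs(y - ay) > min_distance or abs(x - ax) > min_distance for ay, ax in accepted):
            accepted.append((y, x))
    return accepted


def peaks_to_records(peaks: List[Tuple[int, int]], maps: Dict[str, Grid]) -> List[Dict[str, float]]:
    """Build one row per peak by reading every statistic map at that location."""
    records = []
    for y, x in peaks:
        row: Dict[str, float] = {"pos_y": y, "pos_x": x}
        for name, grid in maps.items():
            row[name] = grid[y][x]
        records.append(row)
    return records


# Toy 4 x 5 z-score map and a matching (made-up) psi map
z = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 9.0, 8.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 8.0],
]
psi = [[10.0 * (5 * y + x) for x in range(5)] for y in range(4)]

peaks = pick_peaks(z, threshold=7.0, min_distance=2)
records = peaks_to_records(peaks, {"scaled_mip": z, "psi": psi})
print(peaks)  # [(1, 1), (3, 4)]
print(records)
```

The 8.5 value at (1, 2) is suppressed because it sits next to the stronger 9.0 peak, while the 8.0 value at (3, 4) is far enough away to be kept.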
The default match_template program script already exports a CSV file of picked peaks, but you may want to go back and re-extract the results in a different way.
This example covers the basics of loading a manager object from a YAML configuration and then extracting the peak information from the results into a pandas.DataFrame object.
"""Extracting results to DataFrame."""
from leopard_em.pydantic_models.managers import MatchTemplateManager
Downloading example data
Run the following code cell to download pre-processed results into the notebook's current working directory. If you already have results on your system, you can skip this step and use your own configuration file instead.
import os
import requests


def download_zenodo_file(url: str) -> str:
    """Downloads a zenodo file from the given URL. Returns the output filename."""
    output_filename = url.split("/")[-1]

    # Check if the file already exists
    if os.path.exists(output_filename):
        print(f"File {output_filename} already exists. Skipping download.")
        return output_filename

    response = requests.get(url, stream=True)
    response.raise_for_status()  # Check for request errors
    with open(output_filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

    return output_filename


# NOTE: This may take a few seconds to a few minutes, depending on your internet connection
# fmt: off
file_urls = [
    "https://zenodo.org/records/15426374/files/60S_map_px0.936_bscale0.5.mrc",
    "https://zenodo.org/records/15426374/files/xenon_216_000_0_output_mip.mrc",
    "https://zenodo.org/records/15426374/files/xenon_216_000_0_output_scaled_mip.mrc",
    "https://zenodo.org/records/15426374/files/xenon_216_000_0_output_orientation_phi.mrc",
    "https://zenodo.org/records/15426374/files/xenon_216_000_0_output_orientation_theta.mrc",
    "https://zenodo.org/records/15426374/files/xenon_216_000_0_output_orientation_psi.mrc",
    "https://zenodo.org/records/15426374/files/xenon_216_000_0_output_relative_defocus.mrc",
    "https://zenodo.org/records/15426374/files/xenon_216_000_0_output_correlation_average.mrc",
    "https://zenodo.org/records/15426374/files/xenon_216_000_0_output_correlation_variance.mrc",
]
# fmt: on

for url in file_urls:
    filename = download_zenodo_file(url)
    print(f"Downloaded {filename}")
File 60S_map_px0.936_bscale0.5.mrc already exists. Skipping download.
Downloaded 60S_map_px0.936_bscale0.5.mrc
File xenon_216_000_0_output_mip.mrc already exists. Skipping download.
Downloaded xenon_216_000_0_output_mip.mrc
File xenon_216_000_0_output_scaled_mip.mrc already exists. Skipping download.
Downloaded xenon_216_000_0_output_scaled_mip.mrc
File xenon_216_000_0_output_orientation_phi.mrc already exists. Skipping download.
Downloaded xenon_216_000_0_output_orientation_phi.mrc
File xenon_216_000_0_output_orientation_theta.mrc already exists. Skipping download.
Downloaded xenon_216_000_0_output_orientation_theta.mrc
File xenon_216_000_0_output_orientation_psi.mrc already exists. Skipping download.
Downloaded xenon_216_000_0_output_orientation_psi.mrc
File xenon_216_000_0_output_relative_defocus.mrc already exists. Skipping download.
Downloaded xenon_216_000_0_output_relative_defocus.mrc
File xenon_216_000_0_output_correlation_average.mrc already exists. Skipping download.
Downloaded xenon_216_000_0_output_correlation_average.mrc
File xenon_216_000_0_output_correlation_variance.mrc already exists. Skipping download.
Downloaded xenon_216_000_0_output_correlation_variance.mrc
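One caveat with download_zenodo_file: if a transfer is interrupted, the partially written file still exists, so the next run reports it as already present and skips it. A minimal sketch of one fix, using only the standard library, is to stream into a temporary file in the same directory and rename it into place only after every chunk has been written. The helper name write_atomically is hypothetical; inside download_zenodo_file it would replace the with open(...) loop as write_atomically(response.iter_content(chunk_size=8192), output_filename).

```python
"""Write a streamed download atomically so partial files are never left behind (sketch)."""
import os
import tempfile
from typing import Iterable


def write_atomically(chunks: Iterable[bytes], output_filename: str) -> str:
    """Stream chunks into a temp file, then rename it to output_filename on success.

    If iterating over chunks raises (e.g. a dropped connection), the temporary
    file is deleted and output_filename is never created.
    """
    directory = os.path.dirname(os.path.abspath(output_filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        # Same-directory rename, so source and target share a filesystem
        os.replace(tmp_path, output_filename)
    except BaseException:
        os.remove(tmp_path)
        raise
    return output_filename


# Demonstration in a throwaway directory
demo_dir = tempfile.mkdtemp()
good = write_atomically([b"ab", b"cd"], os.path.join(demo_dir, "good.mrc"))


def _interrupted():
    yield b"partial"
    raise ConnectionError("simulated dropped connection")


try:
    write_atomically(_interrupted(), os.path.join(demo_dir, "bad.mrc"))
except ConnectionError:
    pass

print(sorted(os.listdir(demo_dir)))  # ['good.mrc']
```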
Loading a configuration
Here, we use an example configuration file to demonstrate how to extract the relevant information. If you have already run template matching, this information is already present in your MatchTemplateManager instance; alternatively, you can load your own configuration file.
# Update this path based on which match template config you want to use
yaml_path = "02_extract_peak_info_config.yaml"
# Instantiate the MatchTemplateManager from the config and get the result object
mt_manager = MatchTemplateManager.from_yaml(yaml_path)
mt_result = mt_manager.match_template_result
mt_result.load_tensors_from_paths() # Needed to load results into memory
# Manually set the total number of correlations (used to determine the z-score cutoff);
# after an actual template matching run, this value is calculated automatically
total_corr = int(13 * 1.59e6)
mt_result.total_projections = total_corr
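The comment above notes that total_projections feeds the z-score cutoff. Note that 13 × 1.59e6 = 20,670,000, the value that appears in the total_correlations column of the tables below. One common convention in 2D template matching is to choose the cutoff at which, under pure Gaussian noise, about one false positive is expected across every searched position and every correlation. The function below sketches that idea with the standard library; the function name and parameters are hypothetical, the 4096 × 4096 search area is an illustrative assumption, and this is not necessarily the exact formula Leopard-EM applies.

```python
"""Z-score cutoff giving ~1 expected false positive under Gaussian noise (sketch)."""
from statistics import NormalDist


def zscore_cutoff(num_positions: float, num_correlations: float,
                  expected_false_positives: float = 1.0) -> float:
    """Return z such that N * P(Z > z) == expected_false_positives.

    N = num_positions * num_correlations. Treats every cross-correlation value
    as an independent standard-normal draw when no particle is present.
    """
    n_total = num_positions * num_correlations
    tail_probability = expected_false_positives / n_total
    # inv_cdf gives the lower-tail quantile; negate it to get the upper-tail one
    return -NormalDist().inv_cdf(tail_probability)


total_corr = int(13 * 1.59e6)  # same value as set on mt_result above
print(total_corr)  # 20670000

# HYPOTHETICAL search area: a 4096 x 4096 grid of positions, for illustration only
cutoff = zscore_cutoff(4096 * 4096, total_corr)
print(f"z-score cutoff: {cutoff:.2f}")
```

Because the cutoff grows only slowly with N (roughly like the square root of its logarithm), an approximate total_projections still gives a sensible threshold, but an exact count is preferable.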
Extracting minimal peak information
Minimal information at the peaks (location, orientation, peak height) can be extracted from the MatchTemplateResult instance as either a dict or a pandas.DataFrame. Here, we show how to extract the information as a pandas.DataFrame.
mt_result.locate_peaks()
df_peaks = mt_result.peaks_to_dataframe()
# Print the columns in the DataFrame
print("Columns in the DataFrame:")
for col in df_peaks.columns:
print(f" {col}")
df_peaks
Columns in the DataFrame:
 pos_y
 pos_x
 mip
 scaled_mip
 psi
 theta
 phi
 relative_defocus
 correlation_mean
 correlation_variance
 total_correlations
| | pos_y | pos_x | mip | scaled_mip | psi | theta | phi | relative_defocus | correlation_mean | correlation_variance | total_correlations |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3336 | 3470 | 11.810101 | 12.104484 | 301.5 | 90.0 | 302.500000 | 200.0 | 0.025537 | 0.973570 | 20670000 |
1 | 1945 | 3322 | 11.348068 | 11.758850 | 118.5 | 137.5 | 11.134021 | 600.0 | 0.130799 | 0.953943 | 20670000 |
2 | 3353 | 1842 | 11.823074 | 11.681293 | 19.5 | 22.5 | 111.272728 | 800.0 | 0.062959 | 1.006748 | 20670000 |
3 | 3452 | 133 | 11.246000 | 11.189375 | 166.5 | 65.0 | 296.793884 | 0.0 | -0.028836 | 1.007638 | 20670000 |
4 | 3249 | 1811 | 11.394345 | 11.041210 | 243.0 | 130.0 | 52.363636 | 400.0 | -0.032740 | 1.034949 | 20670000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
193 | 1733 | 2254 | 7.664472 | 7.848475 | 315.0 | 90.0 | 350.000000 | -400.0 | 0.042711 | 0.971114 | 20670000 |
194 | 2300 | 1376 | 7.992585 | 7.818872 | 153.0 | 105.0 | 106.187050 | -200.0 | 0.133709 | 1.005116 | 20670000 |
195 | 1819 | 1945 | 7.570004 | 7.792142 | 45.0 | 107.5 | 328.467163 | 200.0 | 0.016312 | 0.969399 | 20670000 |
196 | 2183 | 2747 | 7.620340 | 7.788315 | 255.0 | 82.5 | 234.125870 | 400.0 | 0.082988 | 0.967777 | 20670000 |
197 | 897 | 2553 | 7.862420 | 7.782097 | 7.5 | 142.5 | 0.000000 | -1000.0 | 0.046521 | 1.004344 | 20670000 |
198 rows × 11 columns
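The printed rows also let us check how scaled_mip relates to the other columns. For all ten rows shown, scaled_mip agrees, to within rounding of the printed values, with (mip - correlation_mean) / correlation_variance, and not with division by the square root of correlation_variance. This suggests the column named correlation_variance acts as a standard-deviation-like normalizer here. That is an inference from the printed numbers only, so confirm it against the Leopard-EM documentation before relying on it.

```python
"""Relating scaled_mip to mip, correlation_mean, and correlation_variance."""
import math

# (mip, correlation_mean, correlation_variance, scaled_mip), copied from the
# ten rows printed in the table above (indices 0-4 and 193-197)
rows = [
    (11.810101, 0.025537, 0.973570, 12.104484),
    (11.348068, 0.130799, 0.953943, 11.758850),
    (11.823074, 0.062959, 1.006748, 11.681293),
    (11.246000, -0.028836, 1.007638, 11.189375),
    (11.394345, -0.032740, 1.034949, 11.041210),
    (7.664472, 0.042711, 0.971114, 7.848475),
    (7.992585, 0.133709, 1.005116, 7.818872),
    (7.570004, 0.016312, 0.969399, 7.792142),
    (7.620340, 0.082988, 0.967777, 7.788315),
    (7.862420, 0.046521, 1.004344, 7.782097),
]

# Hypothesis 1: divide by the correlation_variance column directly
diff_direct = max(abs((m - mu) / v - s) for m, mu, v, s in rows)
# Hypothesis 2: divide by its square root (a textbook z-score if it held a true variance)
diff_sqrt = max(abs((m - mu) / math.sqrt(v) - s) for m, mu, v, s in rows)

print(f"max |error|, dividing by column:       {diff_direct:.2e}")
print(f"max |error|, dividing by sqrt(column): {diff_sqrt:.2e}")
```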
Converting full result information to a DataFrame
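The MatchTemplateManager class also has a results_to_dataframe method, which populates a pandas.DataFrame with the information needed for most downstream processing. Beyond the peak statistics, the additional columns include positions in image coordinates (in pixels and Ångströms), CTF and microscope parameters (defocus, astigmatism, pixel size, voltage, spherical aberration, amplitude contrast, phase shift, and B-factor), and paths to the result statistic maps, the original micrograph, and the reference template used for template matching.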
df_full = mt_manager.results_to_dataframe()
# Print the columns in the DataFrame
print("Columns in the DataFrame:")
for col in df_full.columns:
print(f" {col}")
df_full
Columns in the DataFrame:
 particle_index
 mip
 scaled_mip
 correlation_mean
 correlation_variance
 total_correlations
 pos_x
 pos_y
 pos_x_img
 pos_y_img
 pos_x_img_angstrom
 pos_y_img_angstrom
 phi
 theta
 psi
 relative_defocus
 defocus_u
 defocus_v
 astigmatism_angle
 pixel_size
 refined_pixel_size
 voltage
 spherical_aberration
 amplitude_contrast_ratio
 phase_shift
 ctf_B_factor
 micrograph_path
 template_path
 mip_path
 scaled_mip_path
 psi_path
 theta_path
 phi_path
 defocus_path
 correlation_average_path
 correlation_variance_path
| | particle_index | mip | scaled_mip | correlation_mean | correlation_variance | total_correlations | pos_x | pos_y | pos_x_img | pos_y_img | ... | micrograph_path | template_path | mip_path | scaled_mip_path | psi_path | theta_path | phi_path | defocus_path | correlation_average_path | correlation_variance_path |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 11.810101 | 12.104484 | 0.025537 | 0.973570 | 20670000 | 3470 | 3336 | 3726 | 3592 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
1 | 1 | 11.348068 | 11.758850 | 0.130799 | 0.953943 | 20670000 | 3322 | 1945 | 3578 | 2201 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
2 | 2 | 11.823074 | 11.681293 | 0.062959 | 1.006748 | 20670000 | 1842 | 3353 | 2098 | 3609 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
3 | 3 | 11.246000 | 11.189375 | -0.028836 | 1.007638 | 20670000 | 133 | 3452 | 389 | 3708 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
4 | 4 | 11.394345 | 11.041210 | -0.032740 | 1.034949 | 20670000 | 1811 | 3249 | 2067 | 3505 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
193 | 193 | 7.664472 | 7.848475 | 0.042711 | 0.971114 | 20670000 | 2254 | 1733 | 2510 | 1989 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
194 | 194 | 7.992585 | 7.818872 | 0.133709 | 1.005116 | 20670000 | 1376 | 2300 | 1632 | 2556 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
195 | 195 | 7.570004 | 7.792142 | 0.016312 | 0.969399 | 20670000 | 1945 | 1819 | 2201 | 2075 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
196 | 196 | 7.620340 | 7.788315 | 0.082988 | 0.967777 | 20670000 | 2747 | 2183 | 3003 | 2439 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
197 | 197 | 7.862420 | 7.782097 | 0.046521 | 1.004344 | 20670000 | 2553 | 897 | 2809 | 1153 | ... | dummy_micrograph.mrc | 60S_map_px0.936_bscale0.5.mrc | xenon_216_000_0_output_mip.mrc | xenon_216_000_0_output_scaled_mip.mrc | xenon_216_000_0_output_orientation_psi.mrc | xenon_216_000_0_output_orientation_theta.mrc | xenon_216_000_0_output_orientation_phi.mrc | xenon_216_000_0_output_relative_defocus.mrc | xenon_216_000_0_output_correlation_average.mrc | xenon_216_000_0_output_correlation_variance.mrc |
198 rows × 36 columns
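In every row printed above, pos_x_img and pos_y_img are exactly 256 pixels larger than pos_x and pos_y, so the two coordinate systems appear to differ by a constant shift. The sketch below checks that offset on the printed rows and converts image-pixel positions to Ångströms. The conversion assumes the Ångström columns are the image-pixel positions multiplied by a pixel size of 0.936 Å/pixel, a value read off the template filename; the constant and helper names are hypothetical, so compare against the pixel_size and pos_*_img_angstrom columns in your own DataFrame before relying on it.

```python
"""Checking the map-to-image coordinate offset and converting to Angstroms (sketch)."""
from typing import List, Tuple

# (pos_x, pos_y, pos_x_img, pos_y_img) copied from the ten rows printed above
coords: List[Tuple[int, int, int, int]] = [
    (3470, 3336, 3726, 3592),
    (3322, 1945, 3578, 2201),
    (1842, 3353, 2098, 3609),
    (133, 3452, 389, 3708),
    (1811, 3249, 2067, 3505),
    (2254, 1733, 2510, 1989),
    (1376, 2300, 1632, 2556),
    (1945, 1819, 2201, 2075),
    (2747, 2183, 3003, 2439),
    (2553, 897, 2809, 1153),
]

# Collect every distinct (x, y) difference between image and map coordinates
offsets = {(xi - x, yi - y) for x, y, xi, yi in coords}
print(offsets)  # {(256, 256)}

# ASSUMPTION: Angstrom positions = image-pixel positions * pixel size;
# 0.936 A/pixel comes from the template filename, not from the DataFrame.
PIXEL_SIZE_ANGSTROM = 0.936


def to_angstrom(pos_img: int, pixel_size: float = PIXEL_SIZE_ANGSTROM) -> float:
    """Convert an image-pixel coordinate to Angstroms under the assumption above."""
    return pos_img * pixel_size


print([round(to_angstrom(xi), 3) for _, _, xi, _ in coords[:2]])
```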