Introduction to 2DTM with Leopard-EM

This introductory tutorial goes through the basics of running the match_template and refine_template programs using a 60S large ribosomal subunit (LSU) template with a micrograph of a yeast lamella. Here, we focus on understanding the basics of 2DTM which serves as the foundation for building more complex cryo-EM template matching workflows.

What's covered in this tutorial

In this tutorial we will cover:

Simulate reference volume from PDB files - We download a prepared 60S LSU structure and demonstrate how to simulate realistic cryo-EM reference volumes. These steps are similar for any other PDB structure.
Configure template matching runs - How to define orientation and defocus search parameters as well as running the match_template program.
Optimize 2DTM parameters (pixel size) - By using initial results, we can optimize parameters for later 2DTM searches to increase SNR and identify more particles.
Running full 2DTM with Leopard-EM - Initial Particle locations and orientations are found using the match_template program.
Refining particle parameters post-search - Particle orientation and defocus values (initially identified with match_template) are locally refined using the refine_template program.

Code copy button (top-left)

This tutorial includes a lot of code snippets. There is a copy button in the top-left corner of each code block which will copy the entire code block to your clipboard.

Data and computation pre-requisites

Computational resources

The match_template program requires a GPU for accelerated computation, and Leopard-EM only supports Linux operating systems. This tutorial also assumes you have Leopard-EM installed on your system. Please refer to the installation instructions for more further details, but Leopard-EM should be installable via:

pip install leopard-em

Data

Data used in this tutorial are hosted in this Zenodo record. Download the following files into some project directory on your machine:

60S_aligned.pdb - LSU structure processed from PDB 6Q8Y.
xenon_131_000_0.0_DWS.mrc - Full micrograph (4k by 4k) to search over
xenon_131_000_0.0_DWS_cropped_4.mrc - Cropped portion of micrograph (1k by 1k) to run optimizations on.

This can be accomplished via the following command

zenodo_get https://zenodo.org/records/16423583 --glob 60S_aligned.pdb --glob xenon_131_000_0.0_DWS.mrc --glob xenon_131_000_0.0_DWS_cropped_4.mrc

Your project directory should now look like this:

leopardEM_intro/
├── 60S_aligned.pdb
├── xenon_131_000_0.0_DWS_cropped_4.mrc
└── xenon_131_000_0.0_DWS.mrc

Inspecting the micrograph

The micrograph of yeast cytoplasm contains ribosomes in various orientations and has typical features of cryo-EM data. Notably, the edge of the lamella is visible in the top right corner of the micrograph which we will revisit later in this tutorial to make sure this artifact does not cause spurious detections.

Yeast lamella micrograph

Step 1: Reference template simulation

Our objective is to generate a realistic 3D cyro-EM density map from the provided LSU PDB structure. This volume is needed for the 2D template match process.

Template simulation process

We will use the TeamTomo package ttsim3d to simulate realistic cryo-EM conditions including:

Electron dose weighting - accounts for radiation damage during exposure
Modulation Transfer Function (MTF) - simulates detector response characteristics
B-factor scaling - accounts for structural flexibility and motion blur

This simulation transforms our atomic PDB model into a realistic density map that matches what we expect to see in actual cryo-EM data.

Python script for template simulation

Create a new Python script simulate_template.py in your project directory with the following contents:

from ttsim3d.models import Simulator, SimulatorConfig

# Instantiate the configuration object
sim_conf = SimulatorConfig(
    voltage=300.0,  # in keV
    apply_dose_weighting=True,
    dose_start=0.0,  # in e-/A^2
    dose_end=50.0,  # in e-/A^2
    dose_filter_modify_signal="rel_diff",
    upsampling=-1,  # auto
    mtf_reference="falcon4EC_300kv",
)

# Instantiate the simulator
sim = Simulator(
    pdb_filepath="60S_aligned.pdb",
    pixel_spacing=0.95,  # Angstroms
    volume_shape=(512, 512, 512),
    center_atoms=True,
    remove_hydrogens=True,
    b_factor_scaling=0.5,
    additional_b_factor=0,
    simulator_config=sim_conf,
)

# Run the simulation and save to disk
mrc_filepath = "60S_map_px0.95_bscale0.5_intro.mrc"
sim.export_to_mrc(mrc_filepath)

Then, run the Python script:

python simulate_template.py

A new file should appear in your project directory.

leopardEM_intro/
├── 60S_aligned.pdb
├── 60S_map_px0.95_bscale0.5_intro.mrc  <-- New
├── simulate_template.py
├── xenon_131_000_0.0_DWS_cropped_4.mrc
└── xenon_131_000_0.0_DWS.mrc

Step 2: Initial template matching process

During the initial template matching process, we begin by performing a test run on a smaller region of the image rather than the entire micrograph. This approach helps us validate that our program is functioning as expected and the high-confidence peaks identified in this initial search can be used for further optimization, such as pixel size, before processing many micrographs.

Cropping the micrograph to its central region and reducing its size by a factor of four in each dimension (4,096 --> 1,024 pixels) makes this initial search much faster thus making the optimization and debugging steps much faster.

(Optional) Crop the micrograph to smaller size

The cropped micrograph xenon_131_000_0.0_DWS_cropped_4.mrc should already be downloaded in your project directory, but for completeness, here is a Python script that performs the cropping:

Micrograph cropping script

import mrcfile
import numpy as np

# Crop center of micrograph by division factor
INPUT_FILE = "xenon_131_000_0.0_DWS.mrc"
DIVISION_FACTOR = 4

if __name__ == "__main__":
    with mrcfile.open(INPUT_FILE, permissive=True) as mrc:
        data = mrc.data[0] if len(mrc.data.shape) > 2 else mrc.data

        # Crop square in center of image
        h, w = data.shape
        new_h, new_w = h // DIVISION_FACTOR, w // DIVISION_FACTOR
        start_h, start_w = h // 2 - new_h // 2, w // 2 - new_w // 2

        cropped = data[start_h:start_h + new_h, start_w:start_w + new_w]

    # Save cropped image
    output_file = INPUT_FILE.replace(".mrc", f"_cropped_{DIVISION_FACTOR}.mrc")
    with mrcfile.new(output_file, overwrite=True) as mrc_new:
        mrc_new.set_data(cropped.astype(np.float32))

    print(f"Cropped {data.shape} to {cropped.shape} -> {output_file}")

Configure the template matching run

Now we'll set up template matching run on the cropped image. We use a YAML configuration file to organize all the parameters in a clean, readable format rather than specifying them directly in Python code or passing them as command line arguments. Some benefits of using configuration files over large numbers of command line arguments are:

Reproducibility - exact parameters are saved and can be shared
Clarity - all settings are visible and easy to modify
Organization - complex parameter sets are well-structured

Example YAML configurations for all programs can be found under the programs/ directory on the GitHub page, and detailed documentation for each program is available on the Program Documentation Overview page. The template matching configuration for this tutorial is shown below; copy its contents into a new file match_template_config_crop_intro.yaml.

CTF estimations for match template

Defocus estimations per micrograph are necessary for running 2DTM, and we have previously run CTFFIND5 to obtain the optics_group.astigmatism_angle, optics_group.defocus_u, and optics_group.defocus_v parameters. Using a different micrograph with this tutorial will require you to replace these values (along with any other optics changes) in the configuration file.

micrograph_path: xenon_131_000_0.0_DWS_cropped_4.mrc
template_volume_path: 60S_map_px0.95_bscale0.5_intro.mrc
computational_config:
  gpu_ids: "all"  # Use all available GPUs
  num_cpus: 4     # 4 CPUs per GPU
defocus_search_config:
  enabled: true
  defocus_max: 1200.0
  defocus_min: -1200.0
  defocus_step: 200.0
match_template_result:
  allow_file_overwrite: true
  correlation_average_path:  cropped_correlation_average_intro.mrc
  correlation_variance_path: cropped_correlation_variance_intro.mrc
  mip_path:                  cropped_mip_intro.mrc
  orientation_phi_path:      cropped_orientation_phi_intro.mrc
  orientation_psi_path:      cropped_orientation_psi_intro.mrc
  orientation_theta_path:    cropped_orientation_theta_intro.mrc
  relative_defocus_path:     cropped_relative_defocus_intro.mrc
  scaled_mip_path:           cropped_scaled_mip_intro.mrc
optics_group:
  label: micrograph_1
  amplitude_contrast_ratio: 0.07
  ctf_B_factor: 0.0
  astigmatism_angle: 39.417260
  defocus_u: 5978.758301
  defocus_v: 5617.462402
  phase_shift: 0.0
  pixel_size: 0.95
  spherical_aberration: 2.7
  voltage: 300.0
orientation_search_config:
  base_grid_method: uniform
  psi_step: 1.5    # in degrees
  theta_step: 2.5  # in degrees
preprocessing_filters:
  whitening_filter:
    enabled: true
    do_power_spectrum: true
    max_freq: 1.0

In terms of configuration, that's it! We can move onto running the match_template program.

Running template matching

Now, we move onto executing the template matching program based on the blueprint run_match_template.py script. Create a new file run_match_template.py in your project directory with the following contents:

from leopard_em.pydantic_models.managers import MatchTemplateManager

YAML_CONFIG_PATH = "match_template_config_crop_intro.yaml"
ORIENTATION_BATCH_SIZE = 8  # larger values may run faster


def main():
    """Main function to run the match template program."""
    mt_manager = MatchTemplateManager.from_yaml(YAML_CONFIG_PATH)
    mt_manager.run_match_template(ORIENTATION_BATCH_SIZE)
    df = mt_manager.results_to_dataframe(locate_peaks_kwargs={"false_positives": 1.0})
    df.to_csv("results_match_template_crop_intro.csv")


# NOTE: invoking from `if __name__ == "__main__"` is necessary
# for proper multiprocessing/GPU-distribution behavior
if __name__ == "__main__":
    main()

Now, run the script:

python run_match_template.py

After the program completes, a new CSV file along with several output MRC files should appear in the results/ directory. Looking at the results CSV file, we've found 14 peaks above the statistical threshold. While this is a modest number for the cropped region, these high-confidence detections provide valuable data for optimizing our template parameters.

leopardEM_intro/
├── 60S_aligned.pdb
├── 60S_map_px0.95_bscale0.5_intro.mrc
├── cropped_correlation_average_intro.mrc   <-- New
├── cropped_correlation_variance_intro.mrc  <-- New
├── cropped_mip_intro.mrc                   <-- New
├── cropped_orientation_phi_intro.mrc       <-- New
├── cropped_orientation_psi_intro.mrc       <-- New
├── cropped_orientation_theta_intro.mrc     <-- New
├── cropped_relative_defocus_intro.mrc      <-- New
├── cropped_scaled_mip_intro.mrc            <-- New
├── match_template_config_crop_intro.yaml
├── results_match_template_crop_intro.csv   <-- New
├── run_match_template.py
├── simulate_template.py
├── xenon_131_000_0.0_DWS.mrc
└── xenon_131_000_0.0_DWS_cropped_4.mrc

Step 3: Template optimization

Using the initial template matching results, we optimize critical parameters to improve 2DTM detections and sensitivity. Even small errors in parameters can significantly reduce 2DTM sensitivity. The most critical parameter to optimize is the pixel size used for template simulation, and the optimize_template program iteratively refines the pixel size.

Template optimization good practices

The common sources of pixel size errors are:

PDB models built with incorrect pixel size calibration (±1-3% for cryo-EM maps)
Inaccurate microscope magnification calibration

Even small misalignments in the relative pixel size between the micrograph and model structure used for 2DTM can cause ~10-20% decrease in sensitivity. We highly recommend re-running this pixel size optimization procedure for each new combination of template structure and dataset. Who knows, you may find more peaks with higher confidences!

Optimize template configuration

Let's create the optimization configuration (create optimize_template_config_intro.yaml in your project directory).

particle_stack:
  df_path: results_match_template_crop_intro.csv
  extracted_box_size: [528, 528]
  original_template_size: [512, 512]
pixel_size_coarse_search:
  enabled: true
  pixel_size_min: -0.05
  pixel_size_max: 0.05
  pixel_size_step: 0.01
pixel_size_fine_search:
  enabled: true
  pixel_size_min: -0.008
  pixel_size_max: 0.008
  pixel_size_step: 0.001
preprocessing_filters:
  whitening_filter:
    do_power_spectrum: true
    enabled: true
    max_freq: 1.0
    num_freq_bins: null
computational_config:
  gpu_ids: 0
  num_cpus: 2
simulator:
  simulator_config:
    voltage: 300.0
    apply_dose_weighting: true
    dose_start: 0.0
    dose_end: 50.0
    dose_filter_modify_signal: "rel_diff"
    upsampling: -1  
    mtf_reference: "falcon4EC_300kv"
  pdb_filepath: "60S_aligned.pdb"
  volume_shape: [512, 512, 512]
  b_factor_scaling: 0.5
  additional_b_factor: 0
  pixel_spacing: 0.95
  center_atoms: true
  remove_hydrogens: true

Running the optimize template program

The optimize_template program is executed in much the same way as the match_template program. Create a new Python script run_optimize_template.py with the following contents

from leopard_em.pydantic_models.managers import OptimizeTemplateManager

OPTIMIZE_YAML_PATH = "optimize_template_config_intro.yaml"


def main() -> None:
    """Main function to run the optimize template program."""
    otm = OptimizeTemplateManager.from_yaml(OPTIMIZE_YAML_PATH)
    otm.run_optimize_template(
        output_text_path="optimize_template_results_intro.txt"
    )


if __name__ == "__main__":
    main()

Then again run the script.

python run_optimize_template.py

Optimization results

Inspecting the new results file optimize_template_results_intro.txt we see the pixel size optimization determined that 0.936 Å/pixel provides the best match to our experimental data (results may vary by ±0.002 Å). This represents a ~1.5% correction from our initial estimate of 0.95 Å/pixel - a small but important improvement.

Re-generating Optimized Templates

Now we'll regenerate our 60S template using the optimized pixel size. This updated template should provide better correlation scores and more reliable particle detection in the full template matching run.

Create a new file simulate_optimized_template.py and run it like before.

# Re-simulate 60S map with optimized pixel size
from ttsim3d.models import Simulator, SimulatorConfig

# Instantiate the configuration object
sim_conf = SimulatorConfig(
    voltage=300.0,  # in keV
    apply_dose_weighting=True,
    dose_start=0.0,  # in e-/A^2
    dose_end=50.0,  # in e-/A^2
    dose_filter_modify_signal="rel_diff",
    upsampling=-1,  # auto
    mtf_reference="falcon4EC_300kv",
)

# Instantiate the simulator for 60S
sim = Simulator(
    pdb_filepath="60S_aligned.pdb",
    pixel_spacing=0.936,  # Optimized (Angstroms)
    volume_shape=(512, 512, 512),
    center_atoms=False,
    remove_hydrogens=True,
    b_factor_scaling=0.5,
    additional_b_factor=0,
    simulator_config=sim_conf,
)

# Run the simulation
mrc_filepath = "60S_map_px0.936_bscale0.5_intro.mrc"
sim.export_to_mrc(mrc_filepath)

We will now use the new 60S_map_px0.936_bscale0.5_intro.mrc template in the full template matching process.

Step 4: Full template matching run

With our optimized 60S template in hand, we're ready to move on to running template matching on the entire micrograph to locate and orient 60S ribosomes within our image. The full search is computationally intensive taking ~8 hours on an RTX A6000 ada GPU and parallelizes across multi-GPU systems.

Full match template configuration

The full match template configuration is is nearly identical to the previous cropped configuration, but this time we point it towards the full 4k by 4k micrograph. Create a new file match_template_config_60S_intro.yaml with the following contents.

micrograph_path: xenon_131_000_0.0_DWS.mrc
template_volume_path: 60S_map_px0.936_bscale0.5_intro.mrc
computational_config:
  gpu_ids: "all"  # Use all available GPUs
  num_cpus: 4     # 4 CPUs per GPU
defocus_search_config:
  enabled: true
  defocus_max: 1200.0
  defocus_min: -1200.0
  defocus_step: 200.0
match_template_result:
  allow_file_overwrite: true
  correlation_average_path:  output_correlation_average_intro.mrc
  correlation_variance_path: output_correlation_variance_intro.mrc
  mip_path:                  output_mip_intro.mrc
  orientation_phi_path:      output_orientation_phi_intro.mrc
  orientation_psi_path:      output_orientation_psi_intro.mrc
  orientation_theta_path:    output_orientation_theta_intro.mrc
  relative_defocus_path:     output_relative_defocus_intro.mrc
  scaled_mip_path:           output_scaled_mip_intro.mrc
optics_group:
  label: micrograph_1
  amplitude_contrast_ratio: 0.07
  ctf_B_factor: 0.0
  astigmatism_angle: 39.417260
  defocus_u: 5978.758301
  defocus_v: 5617.462402
  phase_shift: 0.0
  pixel_size: 0.936  # Optimized pixel size
  spherical_aberration: 2.7
  voltage: 300.0
orientation_search_config:
  base_grid_method: uniform
  psi_step: 1.5    # in degrees
  theta_step: 2.5  # in degrees
preprocessing_filters:
  whitening_filter:
    enabled: true
    do_power_spectrum: true
    max_freq: 1.0

Running the search

We will use the same run_match_template.py file, but we need to update which YAML configuration file the script reads and the output CSV file. Change the following lines in the script

YAML_CONFIG_PATH = "match_template_config_60S_intro.yaml"  # line 3
...
df.to_csv("results/results_match_template_60S_intro.csv")  # line 12

and run the Python script again

python run_match_template.py

Encountering CUDA Out of Memory Error

On GPUs with lower amounts of VRAM, you may encounter an out-of-memory error. Decrease the batch size to a lower value (e.g., ORIENTATION_BATCH_SIZE = 4) and try re-running the script.

Results

Excellent! Once the template matching process finishes, there should be the following new files in the project directory.

leopardEM_intro/
60S_aligned.pdb
...
output_correlation_average_intro.mrc    <-- New
output_correlation_variance_intro.mrc   <-- New
output_mip_intro.mrc                    <-- New
output_orientation_phi_intro.mrc        <-- New
output_orientation_psi_intro.mrc        <-- New
output_orientation_theta_intro.mrc      <-- New
output_relative_defocus_intro.mrc       <-- New
output_scaled_mip_intro.mrc             <-- New
results_match_template_60S_intro.csv    <-- New
results_match_template_crop_intro.csv
run_match_template.py
...
xenon_131_000_0.0_DWS.mrc

Their exact contents and visualizing the results are discussed elsewhere in the documentation, and what we're most interested in is the results_match_template_60S_intro.csv file which contains all the information about our 407 identified peaks above the statistical threshold.

Step 5: Data inspection step

Re-referencing the micrograph, we can clearly see the dark lamella edge is visible in the top-right corner. It is good practice to make sure artifacts in your micrographs are not causing spurious detections and apply a filtering strategy if necessary. In this case, we don't see any ribosome detections within the dark region or along its boundary.

Distribution of identified peaks in the micrograph

Since we are confident in our detections, we can move onto the template refinement process.

Python script for generating above plot images

The following is a very simple plotting script using matplotlib to visualize the distribution of particles. Note, you may need to install matplotlib via pip install matplotlib before running the script.

import mrcfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load data
img = mrcfile.read("xenon_131_000_0.0_DWS.mrc")
df = pd.read_csv("results_match_template_60S_intro.csv")

# Plot 1: Only unfiltered data
plt.figure(figsize=(8, 8))
plt.imshow(img, cmap="gray")
plt.scatter(
    df["pos_x_img"], df["pos_y_img"], c="red", s=30, label="All", alpha=0.4, marker="x"
)
plt.axis("off")
plt.legend()
plt.savefig("match_template_intro_all_locations.png", dpi=200, bbox_inches="tight")
plt.close()

print("Plots saved successfully!")

Our objective now is to improve the accuracy of particle location and orientation estimates through local refinement. This is accomplished using the refine_template program which performs a local search around the initial estimates from match_template. Key difference between match_template and refine_template are:

match_template performs a global search over the entire micrograph and orientation space which is computationally expensive.
refine_template performs a local search around known particles with finer angular and defocus sampling. Local searches are much faster.

At its most basic, the refinement configuration tells the program where particles are located in the micrograph, how to extract their particle images, and what orientation and defocus values to sample locally. Our original template volume produces projections of 512 by 512 pixels, so we'll set the extracted box size to 518 by 518 pixels to provide some small margin of movement during refinement. Generally, the extracted box size should be 4-24 pixels larger than the template projection size.

Create a new file refine_template_config_60S_intro.yaml with the following contents.

template_volume_path: 60S_map_px0.936_bscale0.5_intro.mrc
particle_stack:
  df_path: results_match_template_60S_intro.csv
  extracted_box_size: [518, 518]
  original_template_size: [512, 512]
defocus_refinement_config:
  enabled: true

  defocus_max:  100.0  # in Angstroms, relative to "best" particle defocus value
  defocus_min: -100.0  # in Angstroms, relative to "best" particle defocus value
  defocus_step: 20.0   # in Angstroms
orientation_refinement_config:
  enabled: true
  psi_step_coarse:     1.5  # in degrees
  psi_step_fine:       0.1  # in degrees
  theta_step_coarse:   2.5  # in degrees
  theta_step_fine:     0.1  # in degrees
pixel_size_refinement_config:
  enabled: false
preprocessing_filters:
  whitening_filter:
    do_power_spectrum: true
    enabled: true
    max_freq: 1.0
    num_freq_bins: null
computational_config:
  gpu_ids: "all"
  num_cpus: 1  # 1 CPU per GPU

The refinement program is executed in a similar manner to the previous programs. Again, we use the included blueprint script run_refine_template.py from the GitHub repository. Create a new Python script run_refine_template.py with the following contents:

from leopard_em.pydantic_models.managers import RefineTemplateManager

YAML_CONFIG_PATH = "refine_template_config_60S_intro.yaml"
DATAFRAME_OUTPUT_PATH = "results_refine_template_60S_intro.csv"
PARTICLE_BATCH_SIZE = 64  # Tune this values based on GPU memory

def main() -> None:
    rt_manager = RefineTemplateManager.from_yaml(YAML_CONFIG_PATH)
    rt_manager.run_refine_template(DATAFRAME_OUTPUT_PATH, PARTICLE_BATCH_SIZE)

if __name__ == "__main__":
    main()

Then run the script:

python run_refine_template.py

We now have a set of particle locations and orientations with improved orientation and defocus estimates along with improved z-scores. These results are saved in an updated CSV file results_refine_template_60S_intro.csv.

The introductory tutorial is now complete, and you should now have solid understanding of the standard 2DTM workflow using Leopard-EM. Feel free to explore other parts of the documentation to learn more about how you can use these refined results to visualize particles, such as with Visualizing 2DTM Results, or the exact data formats which Leopard-EM uses.

Tutorial summary

Throughout this tutorial, we've covered the essential steps in the "standard 2DTM workflow" by identifying 60S ribosomes using Leopard-EM. You've learned how to

Simulate reference volumes from PDB structures using ttsim3d.
Configure and run 2DTM searches with the match_template program in Leopard-EM.
(once per dataset) Optimize the pixel size using the optimize_template program.
Perform a full 2DTM search on a micrograph to identify particle locations and orientations.
Inspect results on top of the micrograph to ensure they make contextual sense (no spurious detections).
Refine particle parameters using the refine_template program for improved accuracy.