Modular workflow

General info

Normal ShapeMapper execution automatically performs a series of analysis steps internally. This is usually the most convenient option for general users. However, some users may wish to perform individual analyses semi-manually in isolation, allowing sequence alignment with a different aligner, or the introduction of specialized filtering stages, or the inclusion of mutation processing steps within a larger workflow.

Under the hood, ShapeMapper is fairly modular. This page attempts to summarize the main steps power users will be most interested in. Core shapemapper binary executables and scripts are located in internals/bin. Python scripts are designed for python>=3.5. Thirdparty executables are in various locations within internals/thirdparty.

To temporarily put bundled binaries and thirdparty executables into the active shell PATH, run source internals/paths/bin_paths.sh. This will enable, for example, simply typing make_reactivity_profiles.py instead of internals/thirdparty/miniconda/bin/python3 internals/bin/make_reactivity_profiles.py.

Reverse engineering

To aid in understanding the various files, executables, and commandline parameters used in a run, we recommend executing run_example_modular.sh, or running shapemapper on a small dataset with the following additional parameters: --serial --verbose --render-flowchart --output-processed-reads --output-aligned-reads --output-parsed-mutations --output-counted-mutations

Examine the log file to see the exact commands used by shapemapper to run each module, and examine primary intermediate and output files in the shapemapper_out folder. Additional intermediate files will be present within the shapemapper_temp folder. A workflow graphic will be written to shapemapper_out/*flowchart.svg; this can be inspected in software such as Inkscape.

Main steps

1. Alignment

ShapeMapper performs sequence alignment using Bowtie2 by default, or optionally STAR.

Running an alignment typically consists of two separate commands: building an index, then aligning reads using the index. There are many tutorials online on how to perform these steps.

See Alignment to reference sequences for an overview geared toward SHAPE-MaP and see Aligner parameters for commandline details.

Important notes:

  • Usage of a gapped aligner is highly recommended, since sequence deletions are a large component of the MaP signal (at least for backbone adducts read out with the SuperScript II reverse transcriptase under relaxed-fidelity conditions). See Fig. 1C in Busan and Weeks, 2018.
  • Alignments must be in SAM format for processing with ShapeMapper.
  • ShapeMapper requires the MD field to be present in each SAM read
  • Paired reads (if present) must be located on adjacent lines for correct handling.
  • Sorting reads by mapped location is not required.

2. Alignment parsing and mutation processing

This stage parses a single .sam alignment file as input and produces a single .mut file as output containing read mapping information and processed mutations for each read passing quality filters.

To run this module in isolation, use internals/bin/shapemapper_mutation_parser. Run with no arguments for commandline help.

See Parsed mutations for output file format.

Analyses performed by shapemapper_mutation_parser are documented in Analysis steps:

3. Mutation counting

This stage takes a single .mut parsed mutations file as input and outputs a table of counted mutations and read depths.

Use shapemapper_mutation_counter to run this module. Run with no arguments for commandline help.

See Mutation counts for output file format.

4. Reactivity profile calculation

This stage takes mutation counts and read depth tables from one, two, or three samples (modified, untreated, denatured control), computes an overall reactivity profile, normalizes that profile, generates output figures, and performs quality control checks.

Compute profile

To calculate a reactivity profile from mutation counts tables, run make_reactivity_profiles.py. Run with --help for usage. This script produces a single reactivity profile table as output.

See column descriptions. Also see Calculation of mutation rates and Reactivity profile calculation.

Normalize profile

To normalize a reactivity profile, run normalize_profiles.py. For most situations, provide a single reactivity profile table as input with --tonorm (the other arguments can be safely ignored). The file will be overwritten with added columns for normalized reactivity and reactivity stderr.

See Reactivity profile normalization.

Convert file formats

To convert a reactivity profile into various output formats convenient for downstream software, run tab_to_shape.py. See File formats.

Render figures

To perform quality control checks and render summary figures, run render_figures.py.

See Quality control checks.

    

← back to README