Multiple Sequence Alignment Utilities

Functions to extract features from and to manipulate multiple sequence alignments (MSA).

Defines

VRNA_ALN_DEFAULT
#include <ViennaRNA/utils/alignments.h>

Use default alignment settings.

VRNA_ALN_RNA
#include <ViennaRNA/utils/alignments.h>

Convert to RNA alphabet.

VRNA_ALN_DNA
#include <ViennaRNA/utils/alignments.h>

Convert to DNA alphabet.

VRNA_ALN_UPPERCASE
#include <ViennaRNA/utils/alignments.h>

Convert to uppercase nucleotide letters.

VRNA_ALN_LOWERCASE
#include <ViennaRNA/utils/alignments.h>

Convert to lowercase nucleotide letters.

VRNA_MEASURE_SHANNON_ENTROPY
#include <ViennaRNA/utils/alignments.h>

Flag indicating Shannon Entropy measure.

Shannon Entropy is defined as \( H = - \sum_c p_c \cdot \log_2 p_c \)
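As a brief worked example (the columns are made up for illustration, and \(p_c\) is assumed to be the relative frequency of symbol \(c\) within one alignment column), consider an alignment of four sequences:

\[
\begin{aligned}
\text{column A, A, A, A:} \quad & H = -\left(1 \cdot \log_2 1\right) = 0 \\
\text{column A, A, G, G:} \quad & H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1 \\
\text{column A, C, G, U:} \quad & H = -4 \cdot \tfrac{1}{4} \log_2 \tfrac{1}{4} = 2
\end{aligned}
\]

A fully conserved column therefore has entropy 0, and the value grows as the column becomes more variable.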

Typedefs

typedef struct vrna_pinfo_s vrna_pinfo_t
#include <ViennaRNA/utils/alignments.h>

Type name for the data structure vrna_pinfo_s, which represents base pair information.

Functions

int vrna_aln_mpi(const char **alignment)
#include <ViennaRNA/utils/alignments.h>

Get the mean pairwise identity of the aligned sequences.

SWIG Wrapper Notes:

This function is available as function aln_mpi(). See e.g. RNA.aln_mpi() in the Python API.

Parameters
  • alignment – Aligned sequences

Returns

The mean pairwise identity
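To illustrate what a mean pairwise identity measures, here is a minimal, self-contained sketch. It is not the library's implementation: the function name mean_pairwise_identity() is hypothetical, and it assumes that the result is an integer percentage, that all rows have the same length, and that two gap characters in the same column count as identical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of ViennaRNA.
 * Compares every pair of rows column by column and returns the share of
 * identical positions as an integer percentage (assumption). */
int mean_pairwise_identity(const char **aln)
{
  assert(aln != NULL && aln[0] != NULL);

  size_t n     = strlen(aln[0]);
  size_t n_seq = 0;

  while (aln[n_seq] != NULL)
    n_seq++;

  unsigned long identical = 0, compared = 0;

  for (size_t a = 0; a + 1 < n_seq; a++)
    for (size_t b = a + 1; b < n_seq; b++)
      for (size_t i = 0; i < n; i++) {
        if (aln[a][i] == aln[b][i])
          identical++;

        compared++;
      }

  /* a single sequence has no pairs to compare */
  return compared > 0 ? (int)(identical * 100 / compared) : 0;
}
```

For the rows "GGAU", "GGAC" and "GGCC", 8 of the 12 compared positions agree, so the sketch returns 66.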

vrna_pinfo_t *vrna_aln_pinfo(vrna_fold_compound_t *fc, const char *structure, double threshold)
#include <ViennaRNA/utils/alignments.h>

Retrieve an array of vrna_pinfo_t structures from precomputed pair probabilities.

This array of structures contains information about position-wise pair probabilities, base pair entropy, and more.

See also

vrna_pinfo_t, vrna_pf()

Parameters
  • fc – The vrna_fold_compound_t of type VRNA_FC_TYPE_COMPARATIVE with precomputed partition function matrices

  • structure – An optional structure in dot-bracket notation (may be NULL)

  • threshold – Do not include results with pair probabilities below threshold

Returns

The vrna_pinfo_t array

int *vrna_aln_pscore(const char **alignment, vrna_md_t *md)
#include <ViennaRNA/utils/alignments.h>

SWIG Wrapper Notes:

This function is available as overloaded function aln_pscore() where the last parameter may be omitted, indicating md = NULL. See e.g. RNA.aln_pscore() in the Python API.

int vrna_pscore(vrna_fold_compound_t *fc, unsigned int i, unsigned int j)
#include <ViennaRNA/utils/alignments.h>

int vrna_pscore_freq(vrna_fold_compound_t *fc, const unsigned int *frequencies, unsigned int pairs)
#include <ViennaRNA/utils/alignments.h>

char **vrna_aln_slice(const char **alignment, unsigned int i, unsigned int j)
#include <ViennaRNA/utils/alignments.h>

Slice out a subalignment from a larger alignment.

See also

vrna_aln_free()

Note

The caller is responsible for freeing the memory occupied by the returned subalignment, e.g. with vrna_aln_free().

Parameters
  • alignment – The input alignment

  • i – The first column of the subalignment (1-based)

  • j – The last column of the subalignment (1-based)

Returns

The subalignment between column \(i\) and \(j\)
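The 1-based, inclusive column convention and the ownership of the returned memory can be sketched as follows. This is an illustrative stand-alone version, not the library's code: slice_alignment() and free_alignment() are hypothetical names, and the sketch assumes that all rows are at least j characters long.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of ViennaRNA: release a NULL-terminated
 * array of heap-allocated strings. */
void free_alignment(char **aln)
{
  if (aln == NULL)
    return;

  for (size_t s = 0; aln[s] != NULL; s++)
    free(aln[s]);

  free(aln);
}

/* Hypothetical helper, NOT part of ViennaRNA: copy columns i..j
 * (1-based, inclusive) of every row into a new NULL-terminated array.
 * Returns NULL on an invalid range or on allocation failure. */
char **slice_alignment(const char **aln, unsigned int i, unsigned int j)
{
  assert(aln != NULL && aln[0] != NULL);

  size_t n = strlen(aln[0]);

  if (i < 1 || i > j || j > n)
    return NULL;

  size_t n_seq = 0;

  while (aln[n_seq] != NULL)
    n_seq++;

  size_t width = (size_t)(j - i) + 1;
  /* calloc zeroes the array, so it is NULL-terminated at every stage */
  char   **sub = calloc(n_seq + 1, sizeof *sub);

  if (sub == NULL)
    return NULL;

  for (size_t s = 0; s < n_seq; s++) {
    sub[s] = malloc(width + 1);
    if (sub[s] == NULL) {
      free_alignment(sub);
      return NULL;
    }

    memcpy(sub[s], aln[s] + (i - 1), width);
    sub[s][width] = '\0';
  }

  return sub;
}
```

Slicing columns 2 to 4 out of the rows "GGAUCC" and "GGA-CC" yields "GAU" and "GA-"; the caller then releases the result with free_alignment().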

void vrna_aln_free(char **alignment)
#include <ViennaRNA/utils/alignments.h>

Free memory occupied by a set of aligned sequences.

Parameters
  • alignment – The input alignment

char **vrna_aln_uppercase(const char **alignment)
#include <ViennaRNA/utils/alignments.h>

Create a copy of an alignment with only uppercase letters in the sequences.

See also

vrna_aln_copy()

Parameters
  • alignment – The input sequence alignment (the last entry must be NULL)

Returns

A copy of the input alignment where lowercase sequence letters are replaced by uppercase letters

char **vrna_aln_toRNA(const char **alignment)
#include <ViennaRNA/utils/alignments.h>

Create a copy of an alignment where DNA alphabet is replaced by RNA alphabet.

See also

vrna_aln_copy()

Parameters
  • alignment – The input sequence alignment (the last entry must be NULL)

Returns

A copy of the input alignment where DNA alphabet is replaced by RNA alphabet (T -> U)

char **vrna_aln_copy(const char **alignment, unsigned int options)
#include <ViennaRNA/utils/alignments.h>

Make a copy of a multiple sequence alignment.

This function creates a copy of a multiple sequence alignment. The options parameter additionally allows for sequence manipulation, such as conversion between DNA and RNA alphabet and conversion to uppercase or lowercase letters (see VRNA_ALN_RNA, VRNA_ALN_DNA, VRNA_ALN_UPPERCASE, and VRNA_ALN_LOWERCASE).

Parameters
  • alignment – The input sequence alignment (the last entry must be NULL)

  • options – Option flags indicating whether and how the aligned sequences should be converted

Returns

A (manipulated) copy of the input alignment
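How such option flags might combine can be sketched for a single row. This is a stand-alone illustration rather than the library's implementation: copy_row() is a hypothetical helper, the SKETCH_ALN_* values are made up for this example (the real VRNA_ALN_* values come from the header), and the order in which case and alphabet conversion are applied is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values for this sketch only; use the VRNA_ALN_*
 * macros from the header when calling the real API. */
#define SKETCH_ALN_DEFAULT    0U
#define SKETCH_ALN_RNA        1U
#define SKETCH_ALN_DNA        2U
#define SKETCH_ALN_UPPERCASE  4U
#define SKETCH_ALN_LOWERCASE  8U

/* Hypothetical helper, NOT part of ViennaRNA: return a heap-allocated,
 * optionally converted copy of one aligned sequence. */
char *copy_row(const char *seq, unsigned int options)
{
  assert(seq != NULL);

  size_t n     = strlen(seq);
  char   *copy = malloc(n + 1);

  if (copy == NULL)
    return NULL;

  for (size_t k = 0; k < n; k++) {
    char c = seq[k];

    /* case conversion first (assumed order) */
    if (options & SKETCH_ALN_UPPERCASE)
      c = (char)toupper((unsigned char)c);
    else if (options & SKETCH_ALN_LOWERCASE)
      c = (char)tolower((unsigned char)c);

    /* alphabet conversion, preserving the letter case */
    if (options & SKETCH_ALN_RNA) {
      if (c == 'T')
        c = 'U';
      else if (c == 't')
        c = 'u';
    } else if (options & SKETCH_ALN_DNA) {
      if (c == 'U')
        c = 'T';
      else if (c == 'u')
        c = 't';
    }

    copy[k] = c;
  }

  copy[n] = '\0';
  return copy;
}
```

With SKETCH_ALN_RNA | SKETCH_ALN_UPPERCASE, the row "acgt-T" becomes "ACGU-U"; gap characters pass through unchanged.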

float *vrna_aln_conservation_struct(const char **alignment, const char *structure, const vrna_md_t *md)
#include <ViennaRNA/utils/alignments.h>

Compute base pair conservation of a consensus structure.

This function computes the base pair conservation (fraction of canonical base pairs) of a consensus structure given a multiple sequence alignment. The base pair types that are considered canonical may be specified using the vrna_md_t.pair array. Passing NULL as parameter md results in default pairing rules, i.e. canonical Watson-Crick and GU Wobble pairs.

SWIG Wrapper Notes:

This function is available as overloaded function aln_conservation_struct() where the last parameter md may be omitted, indicating md = NULL. See, e.g. RNA.aln_conservation_struct() in the Python API.

Parameters
  • alignment – The input sequence alignment (the last entry must be NULL)

  • structure – The consensus structure in dot-bracket notation

  • md – Model details that specify compatible base pairs (may be NULL)

Returns

A 1-based vector of base pair conservations
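The idea of a per-pair conservation value can be sketched without the library. The helper names below are hypothetical, and several points are assumptions of this sketch rather than documented behaviour: sequences use the uppercase RNA alphabet, only the default pairing rules (Watson-Crick and GU wobble) are applied, the value is stored at both positions of each pair, and unpaired positions stay at 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Default pairing rules only: Watson-Crick and GU wobble pairs. */
static int is_canonical(char a, char b)
{
  return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
         (a == 'G' && b == 'C') || (a == 'C' && b == 'G') ||
         (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
}

/* Hypothetical helper, NOT part of ViennaRNA: for every base pair (i,j)
 * of the dot-bracket structure, compute the fraction of sequences that
 * can form a canonical pair there. Returns a 1-based vector of length
 * n + 1 (index 0 unused), or NULL on allocation failure. */
float *pair_conservation(const char **aln, const char *structure)
{
  assert(aln != NULL && aln[0] != NULL && structure != NULL);

  size_t n     = strlen(structure);
  size_t n_seq = 0;

  while (aln[n_seq] != NULL)
    n_seq++;

  float  *cons  = calloc(n + 1, sizeof *cons);
  size_t *stack = malloc((n + 1) * sizeof *stack);

  if (cons == NULL || stack == NULL) {
    free(cons);
    free(stack);
    return NULL;
  }

  size_t top = 0;

  for (size_t j = 1; j <= n; j++) {
    if (structure[j - 1] == '(') {
      stack[top++] = j;                 /* remember the opening position */
    } else if (structure[j - 1] == ')' && top > 0) {
      size_t i  = stack[--top];         /* matching opening position */
      size_t ok = 0;

      for (size_t s = 0; s < n_seq; s++)
        if (is_canonical(aln[s][i - 1], aln[s][j - 1]))
          ok++;

      cons[i] = cons[j] = (float)ok / (float)n_seq;
    }
  }

  free(stack);
  return cons;
}
```

The caller owns the returned vector and releases it with free().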

float *vrna_aln_conservation_col(const char **alignment, const vrna_md_t *md_p, unsigned int options)
#include <ViennaRNA/utils/alignments.h>

Compute nucleotide conservation in an alignment.

This function computes the conservation of nucleotides in alignment columns. The simplest measure is Shannon Entropy, which can be selected by passing the VRNA_MEASURE_SHANNON_ENTROPY flag in the options parameter.

SWIG Wrapper Notes:

This function is available as overloaded function aln_conservation_col() where the last two parameters may be omitted, indicating md = NULL, and options = VRNA_MEASURE_SHANNON_ENTROPY, respectively. See e.g. RNA.aln_conservation_col() in the Python API.

Note

Currently, only VRNA_MEASURE_SHANNON_ENTROPY is supported as conservation measure.

Parameters
  • alignment – The input sequence alignment (the last entry must be NULL)

  • md_p – Model details that specify known nucleotides (may be NULL)

  • options – A flag indicating which measure of conservation should be applied

Returns

A 1-based vector of column conservations

char *vrna_aln_consensus_sequence(const char **alignment, const vrna_md_t *md_p)
#include <ViennaRNA/utils/alignments.h>

Compute the consensus sequence for a given multiple sequence alignment.

SWIG Wrapper Notes:

This function is available as overloaded function aln_consensus_sequence() where the last parameter may be omitted, indicating md = NULL. See e.g. RNA.aln_consensus_sequence() in the Python API.

Parameters
  • alignment – The input sequence alignment (the last entry must be NULL)

  • md_p – Model details that specify known nucleotides (may be NULL)

Returns

The consensus sequence of the alignment, i.e. the most frequent nucleotide for each alignment column
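The "most frequent nucleotide per column" rule can be sketched as below. This is a simplified stand-alone version, not the library's implementation: consensus_sequence() is a hypothetical name, gap characters are counted like any other symbol, and ties are resolved in favour of the symbol that reaches the highest count first. The real function may treat gaps, unknown nucleotides, and ties differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of ViennaRNA: return a heap-allocated
 * string holding the most frequent character of each alignment column. */
char *consensus_sequence(const char **aln)
{
  assert(aln != NULL && aln[0] != NULL);

  size_t n     = strlen(aln[0]);
  char   *cons = malloc(n + 1);

  if (cons == NULL)
    return NULL;

  for (size_t i = 0; i < n; i++) {
    size_t        counts[256] = { 0 };
    unsigned char best        = (unsigned char)aln[0][i];

    for (size_t s = 0; aln[s] != NULL; s++) {
      unsigned char c = (unsigned char)aln[s][i];

      counts[c]++;
      /* strictly greater: on a tie the earlier leader is kept */
      if (counts[c] > counts[best])
        best = c;
    }

    cons[i] = (char)best;
  }

  cons[n] = '\0';
  return cons;
}
```

For the rows "GGAU", "GCAU" and "GCAC" the sketch produces "GCAU".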

char *vrna_aln_consensus_mis(const char **alignment, const vrna_md_t *md_p)
#include <ViennaRNA/utils/alignments.h>

Compute the Most Informative Sequence (MIS) for a given multiple sequence alignment.

The most informative sequence (MIS) [Freyhult et al., 2005] displays for each alignment column the nucleotides with frequency greater than the background frequency, projected into IUPAC notation. Columns where gaps are over-represented are in lower case.

SWIG Wrapper Notes:

This function is available as overloaded function aln_consensus_mis() where the last parameter may be omitted, indicating md = NULL. See e.g. RNA.aln_consensus_mis() in the Python API.

Parameters
  • alignment – The input sequence alignment (the last entry must be NULL)

  • md_p – Model details that specify known nucleotides (may be NULL)

Returns

The most informative sequence for the alignment

struct vrna_pinfo_s
#include <ViennaRNA/utils/alignments.h>

A base pair info structure.

For each base pair (i,j) with i,j in [0, n-1] the structure lists:

  • its probability ‘p’

  • an entropy-like measure for its well-definedness ‘ent’

  • the frequency of each type of pair in ‘bp[]’

    • ‘bp[0]’ contains the number of non-compatible sequences

    • ‘bp[1]’ contains the number of CG pairs, etc.

Public Members

unsigned i

nucleotide position i

unsigned j

nucleotide position j

float p

Probability.

float ent

Pseudo entropy for \( p(i,j) = S_i + S_j - p_{ij} \cdot \ln(p_{ij}) \).

short bp[8]

Frequencies of pair_types.

char comp

1 if and only if the pair is part of the MFE structure