RNAlib-2.4.14
Multiple Sequence Alignments

Detailed Description

Functions to read/write multiple sequence alignments (MSA) in various file formats.

Files

file  file_formats_msa.h
 Functions dealing with file formats for Multiple Sequence Alignments (MSA)
 

Macros

#define VRNA_FILE_FORMAT_MSA_CLUSTAL   1U
 Option flag indicating ClustalW formatted files.

#define VRNA_FILE_FORMAT_MSA_STOCKHOLM   2U
 Option flag indicating Stockholm 1.0 formatted files.

#define VRNA_FILE_FORMAT_MSA_FASTA   4U
 Option flag indicating FASTA (Pearson) formatted files.

#define VRNA_FILE_FORMAT_MSA_MAF   8U
 Option flag indicating MAF formatted files.

#define VRNA_FILE_FORMAT_MSA_MIS   16U
 Option flag indicating most informative sequence (MIS) output.

#define VRNA_FILE_FORMAT_MSA_DEFAULT
 Option flag indicating the set of default file formats.

#define VRNA_FILE_FORMAT_MSA_NOCHECK   4096U
 Option flag to disable validation of the alignment.

#define VRNA_FILE_FORMAT_MSA_UNKNOWN   8192U
 Return flag of vrna_file_msa_detect_format() to indicate an unknown or malformed alignment.

#define VRNA_FILE_FORMAT_MSA_APPEND   16384U
 Option flag indicating to append data to a multiple sequence alignment file rather than overwriting it.

#define VRNA_FILE_FORMAT_MSA_QUIET   32768U
 Option flag to suppress non-essential messages on stderr.

#define VRNA_FILE_FORMAT_MSA_SILENT   65536U
 Option flag to completely silence any warnings on stderr.
 

Functions

int vrna_file_msa_read (const char *filename, char ***names, char ***aln, char **id, char **structure, unsigned int options)
 Read a multiple sequence alignment from file.

int vrna_file_msa_read_record (FILE *fp, char ***names, char ***aln, char **id, char **structure, unsigned int options)
 Read a multiple sequence alignment from file handle.

unsigned int vrna_file_msa_detect_format (const char *filename, unsigned int options)
 Detect the format of a multiple sequence alignment file.

int vrna_file_msa_write (const char *filename, const char **names, const char **aln, const char *id, const char *structure, const char *source, unsigned int options)
 Write multiple sequence alignment file.
 

Macro Definition Documentation

#define VRNA_FILE_FORMAT_MSA_CLUSTAL   1U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating ClustalW formatted files.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
#define VRNA_FILE_FORMAT_MSA_STOCKHOLM   2U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating Stockholm 1.0 formatted files.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
#define VRNA_FILE_FORMAT_MSA_FASTA   4U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating FASTA (Pearson) formatted files.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
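#define VRNA_FILE_FORMAT_MSA_MAF   8U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating MAF formatted files.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()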
#define VRNA_FILE_FORMAT_MSA_MIS   16U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating most informative sequence (MIS) output.

The default reference sequence written for an alignment is simply a consensus sequence. With this flag, the most informative sequence (MIS) is written instead.

See also
vrna_file_msa_write()
#define VRNA_FILE_FORMAT_MSA_DEFAULT

#include <ViennaRNA/io/file_formats_msa.h>

Value:
( \
VRNA_FILE_FORMAT_MSA_CLUSTAL \
| VRNA_FILE_FORMAT_MSA_STOCKHOLM \
| VRNA_FILE_FORMAT_MSA_FASTA \
| VRNA_FILE_FORMAT_MSA_MAF \
)

Option flag indicating the set of default file formats.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
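
Because each flag occupies its own bit, several flags can be combined with bitwise OR and tested with bitwise AND. The following is a minimal sketch of that arithmetic. The constants are local copies of the values listed on this page so that the snippet compiles without RNAlib; in real code they come from <ViennaRNA/io/file_formats_msa.h>. The helpers msa_options_has() and example_read_options() are hypothetical names introduced for illustration and are not part of the library.

```c
/* Local copies of the documented flag values (assumption: real code
 * includes <ViennaRNA/io/file_formats_msa.h> instead of redefining them). */
#ifndef VRNA_FILE_FORMAT_MSA_CLUSTAL
#define VRNA_FILE_FORMAT_MSA_CLUSTAL    1U
#define VRNA_FILE_FORMAT_MSA_STOCKHOLM  2U
#define VRNA_FILE_FORMAT_MSA_FASTA      4U
#define VRNA_FILE_FORMAT_MSA_MAF        8U
#define VRNA_FILE_FORMAT_MSA_NOCHECK    4096U
#endif

/* Hypothetical helper: returns 1 if every bit of `flag` is set in `options`. */
int
msa_options_has(unsigned int options,
                unsigned int flag)
{
  return (options & flag) == flag;
}


/* Hypothetical example: accept Stockholm or ClustalW input and
 * skip the validation step. */
unsigned int
example_read_options(void)
{
  return VRNA_FILE_FORMAT_MSA_STOCKHOLM |
         VRNA_FILE_FORMAT_MSA_CLUSTAL |
         VRNA_FILE_FORMAT_MSA_NOCHECK;
}
```

A value built this way would be passed as the options argument of the read functions documented below.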
#define VRNA_FILE_FORMAT_MSA_NOCHECK   4096U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag to disable validation of the alignment.

See also
vrna_file_msa_read(), vrna_file_msa_read_record()
#define VRNA_FILE_FORMAT_MSA_UNKNOWN   8192U

#include <ViennaRNA/io/file_formats_msa.h>

Return flag of vrna_file_msa_detect_format() to indicate an unknown or malformed alignment.

See also
vrna_file_msa_detect_format()
#define VRNA_FILE_FORMAT_MSA_APPEND   16384U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating to append data to a multiple sequence alignment file rather than overwriting it.

See also
vrna_file_msa_write()
#define VRNA_FILE_FORMAT_MSA_QUIET   32768U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag to suppress non-essential messages on stderr.

See also
vrna_file_msa_read(), vrna_file_msa_read_record()
#define VRNA_FILE_FORMAT_MSA_SILENT   65536U

#include <ViennaRNA/io/file_formats_msa.h>

Option flag to completely silence any warnings on stderr.

See also
vrna_file_msa_read(), vrna_file_msa_read_record()

Function Documentation

int vrna_file_msa_read ( const char *  filename,
char ***  names,
char ***  aln,
char **  id,
char **  structure,
unsigned int  options 
)

#include <ViennaRNA/io/file_formats_msa.h>

Read a multiple sequence alignment from file.

This function reads the first multiple sequence alignment from an input file. The alignment is split into the sequence ids/names and the actual sequence data, which are stored in memory as two arrays. If the file format carries additional information, such as an ID for the entire alignment or a consensus structure, this data is retrieved as well and made available. The options parameter specifies the set of alignment file formats that should be tried when parsing the data. If 0 is passed as options, the set of formats defaults to VRNA_FILE_FORMAT_MSA_DEFAULT.

Currently, the parsable multiple sequence alignment file formats are ClustalW, Stockholm 1.0, FASTA (Pearson), and MAF.

Note
After successfully reading an alignment, this function validates the data by checking that the sequence identifiers are unique and that all sequences have equal length. This check can be disabled by passing VRNA_FILE_FORMAT_MSA_NOCHECK in the options parameter.
It is the user's responsibility to free any memory occupied by the output arguments names, aln, id, and structure after calling this function. The function automatically sets the latter two arguments to NULL if no corresponding data could be retrieved from the input alignment.
See also
vrna_file_msa_read_record(), VRNA_FILE_FORMAT_MSA_CLUSTAL, VRNA_FILE_FORMAT_MSA_STOCKHOLM, VRNA_FILE_FORMAT_MSA_FASTA, VRNA_FILE_FORMAT_MSA_MAF, VRNA_FILE_FORMAT_MSA_DEFAULT, VRNA_FILE_FORMAT_MSA_NOCHECK
Parameters
filename  The name of the input file that contains the alignment
names  An address to the pointer where sequence identifiers should be written to
aln  An address to the pointer where aligned sequences should be written to
id  An address to the pointer where the alignment ID should be written to (may be NULL)
structure  An address to the pointer where consensus structure information should be written to (may be NULL)
options  Options to manipulate the behavior of this function
Returns
The number of sequences in the alignment, or -1 if no alignment record could be found
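
The note above leaves memory management to the caller. The following is a minimal sketch of one way to release the output arguments. It assumes that all buffers are heap allocations that free() may release and that n_seq is the value returned by the read function. msa_release() is a hypothetical helper introduced here, not an RNAlib function.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of RNAlib): release everything that a
 * successful read stored in the output arguments.
 * Assumptions: all buffers can be released with free(), and n_seq is
 * the value returned by the read function.
 * Returns the number of sequence entries that were released.
 */
int
msa_release(char  **names,
            char  **aln,
            char  *id,
            char  *structure,
            int   n_seq)
{
  int i, released = 0;

  /* a non-positive count means nothing was read, so touch nothing */
  if (n_seq <= 0)
    return 0;

  for (i = 0; i < n_seq; i++) {
    free(names[i]);
    free(aln[i]);
    released++;
  }

  free(names);
  free(aln);
  free(id);         /* free(NULL) is a no-op, so missing data is fine */
  free(structure);

  return released;
}

/*
 * Intended use with RNAlib (kept as a comment so the sketch compiles alone):
 *
 *   char **names, **aln, *id, *structure;
 *   int  n = vrna_file_msa_read("msa.stk", &names, &aln, &id, &structure,
 *                               VRNA_FILE_FORMAT_MSA_STOCKHOLM);
 *   if (n > 0) {
 *     // work with names[0..n-1] and aln[0..n-1]
 *     msa_release(names, aln, id, structure, n);
 *   }
 */
```

Using the returned sequence count rather than scanning for a terminating entry keeps the helper independent of how the arrays are terminated.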
SWIG Wrapper Notes:

In the target scripting language, only the first and last arguments, filename and options, are passed to the corresponding function. The other arguments, which serve as output in the C library, are available as additional return values. Hence, a function call in Python may look like this:

num_seq, names, aln, id, structure = RNA.file_msa_read("msa.stk", RNA.FILE_FORMAT_MSA_STOCKHOLM)

After successfully reading the first record, the variable num_seq contains the number of sequences in the alignment (the actual return value of the C function). The variables names and aln are lists of the sequence names and aligned sequences, while id and structure are strings holding the alignment ID and the structure as stated in the SS_cons line, respectively. Note that the last two return values may be empty strings if the alignment does not provide the corresponding data.

This function exists as an overloaded version where the options parameter may be omitted. In that case, the options parameter defaults to VRNA_FILE_FORMAT_MSA_STOCKHOLM.

int vrna_file_msa_read_record ( FILE *  fp,
char ***  names,
char ***  aln,
char **  id,
char **  structure,
unsigned int  options 
)

#include <ViennaRNA/io/file_formats_msa.h>

Read a multiple sequence alignment from file handle.

Similar to vrna_file_msa_read(), this function reads a multiple sequence alignment from an input file handle. Because it operates on a file handle, it is not limited to the first alignment record but allows for looping over all alignments within the input.

The alignment is split into the sequence ids/names and the actual sequence data, which are stored in memory as two arrays. If the file format carries additional information, such as an ID for the entire alignment or a consensus structure, this data is retrieved as well and made available. The options parameter specifies the alignment file format used to parse the data. A single format must be specified here; see vrna_file_msa_detect_format() for help with determining the correct MSA file format.

Currently, the parsable multiple sequence alignment file formats are ClustalW, Stockholm 1.0, FASTA (Pearson), and MAF.

Note
After successfully reading an alignment, this function validates the data by checking that the sequence identifiers are unique and that all sequences have equal length. This check can be disabled by passing VRNA_FILE_FORMAT_MSA_NOCHECK in the options parameter.
It is the user's responsibility to free any memory occupied by the output arguments names, aln, id, and structure after calling this function. The function automatically sets the latter two arguments to NULL if no corresponding data could be retrieved from the input alignment.
See also
vrna_file_msa_read(), vrna_file_msa_detect_format(), VRNA_FILE_FORMAT_MSA_CLUSTAL, VRNA_FILE_FORMAT_MSA_STOCKHOLM, VRNA_FILE_FORMAT_MSA_FASTA, VRNA_FILE_FORMAT_MSA_MAF, VRNA_FILE_FORMAT_MSA_DEFAULT, VRNA_FILE_FORMAT_MSA_NOCHECK
Parameters
fp  The file pointer the data will be retrieved from
names  An address to the pointer where sequence identifiers should be written to
aln  An address to the pointer where aligned sequences should be written to
id  An address to the pointer where the alignment ID should be written to (may be NULL)
structure  An address to the pointer where consensus structure information should be written to (may be NULL)
options  Options to manipulate the behavior of this function
Returns
The number of sequences in the alignment, or -1 if no alignment record could be found
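
The record-by-record loop described above can be sketched as follows. To keep the snippet runnable without RNAlib, the reader is passed in as a function pointer with the same signature as vrna_file_msa_read_record(), and a stand-in reader is supplied. The names msa_record_reader, msa_count_records() and stub_record_reader() are hypothetical. The sketch assumes that returned buffers can be released with free(), and it stops at the first non-positive return value, which is a simplification.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same signature as vrna_file_msa_read_record(), so that function could be
 * passed in directly when linking against RNAlib. */
typedef int (*msa_record_reader)(FILE *fp, char ***names, char ***aln,
                                 char **id, char **structure,
                                 unsigned int options);

/*
 * Hypothetical driver: read records until the reader reports that no
 * further record was found, release each one, and return the record count.
 * Assumption: buffers handed out by the reader can be released with free().
 */
int
msa_count_records(FILE              *fp,
                  msa_record_reader reader,
                  unsigned int      format)
{
  char  **names, **aln, *id, *structure;
  int   n_seq, i, records = 0;

  while ((n_seq = reader(fp, &names, &aln, &id, &structure, format)) > 0) {
    records++;

    for (i = 0; i < n_seq; i++) {
      free(names[i]);
      free(aln[i]);
    }

    free(names);
    free(aln);
    free(id);         /* NULL when the record carries no alignment ID */
    free(structure);  /* NULL when the record carries no structure */
  }

  return records;
}


/* Stand-in reader so the sketch runs without RNAlib: it ignores fp and
 * pretends the input holds three records of two empty sequences each.
 * Allocation failures are not handled, to keep the sketch short. */
int
stub_record_reader(FILE         *fp,
                   char         ***names,
                   char         ***aln,
                   char         **id,
                   char         **structure,
                   unsigned int options)
{
  static int  remaining = 3;
  int         i;

  (void)fp;
  (void)options;

  if (remaining == 0)
    return -1;  /* no further record */

  remaining--;
  *names  = malloc(2 * sizeof **names);
  *aln    = malloc(2 * sizeof **aln);

  for (i = 0; i < 2; i++) {
    (*names)[i] = calloc(1, 1); /* empty string */
    (*aln)[i]   = calloc(1, 1);
  }

  *id         = NULL;
  *structure  = NULL;

  return 2;
}
```

With RNAlib available, the stand-in would be replaced by passing vrna_file_msa_read_record itself together with an open FILE pointer and a single format flag.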
SWIG Wrapper Notes:

In the target scripting language, only the first and last arguments, fp and options, are passed to the corresponding function. The other arguments, which serve as output in the C library, are available as additional return values. Hence, a function call in Python may look like this:

f = open('msa.stk', 'r')
num_seq, names, aln, id, structure = RNA.file_msa_read_record(f, RNA.FILE_FORMAT_MSA_STOCKHOLM)
f.close()

After successfully reading the first record, the variable num_seq contains the number of sequences in the alignment (the actual return value of the C function). The variables names and aln are lists of the sequence names and aligned sequences, while id and structure are strings holding the alignment ID and the structure as stated in the SS_cons line, respectively. Note that the last two return values may be empty strings if the alignment does not provide the corresponding data.

This function exists as an overloaded version where the options parameter may be omitted. In that case, the options parameter defaults to VRNA_FILE_FORMAT_MSA_STOCKHOLM.

unsigned int vrna_file_msa_detect_format ( const char *  filename,
unsigned int  options 
)

#include <ViennaRNA/io/file_formats_msa.h>

Detect the format of a multiple sequence alignment file.

This function attempts to determine the format of a file that supposedly contains a multiple sequence alignment (MSA). This is useful when an MSA file contains more than a single record, in which case vrna_file_msa_read() cannot be applied because it only retrieves the first record. Instead, one can try to guess the correct file format using this function and then loop over the file, record by record, using one of the low-level record retrieval functions for the corresponding MSA file format.

Note
This function parses the entire first record within the specified file. As a result, it returns VRNA_FILE_FORMAT_MSA_UNKNOWN not only if it can't detect the file's format, but also in cases where the file doesn't contain sequences!
See also
vrna_file_msa_read(), vrna_file_stockholm_read_record(), vrna_file_clustal_read_record(), vrna_file_fasta_read_record()
Parameters
filename  The name of the input file that contains the alignment
options  Options to manipulate the behavior of this function
Returns
The MSA file format, or VRNA_FILE_FORMAT_MSA_UNKNOWN
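
A caller typically branches on the returned flag. The following is a minimal sketch of such a dispatch that maps the result to a printable name. It assumes the function returns exactly one of the format flags or VRNA_FILE_FORMAT_MSA_UNKNOWN. The constants are local copies of the values listed on this page so that the snippet compiles without RNAlib, and msa_format_name() is a hypothetical helper, not a library function.

```c
/* Local copies of the documented flag values (assumption: real code
 * includes <ViennaRNA/io/file_formats_msa.h> instead of redefining them). */
#ifndef VRNA_FILE_FORMAT_MSA_CLUSTAL
#define VRNA_FILE_FORMAT_MSA_CLUSTAL    1U
#define VRNA_FILE_FORMAT_MSA_STOCKHOLM  2U
#define VRNA_FILE_FORMAT_MSA_FASTA      4U
#define VRNA_FILE_FORMAT_MSA_MAF        8U
#define VRNA_FILE_FORMAT_MSA_UNKNOWN    8192U
#endif

/* Hypothetical helper: map a single detected format flag to a name.
 * Anything that is not one of the four format flags, including
 * VRNA_FILE_FORMAT_MSA_UNKNOWN, is reported as "unknown". */
const char *
msa_format_name(unsigned int format)
{
  switch (format) {
    case VRNA_FILE_FORMAT_MSA_CLUSTAL:
      return "ClustalW";
    case VRNA_FILE_FORMAT_MSA_STOCKHOLM:
      return "Stockholm 1.0";
    case VRNA_FILE_FORMAT_MSA_FASTA:
      return "FASTA (Pearson)";
    case VRNA_FILE_FORMAT_MSA_MAF:
      return "MAF";
    default:
      return "unknown";
  }
}
```

The same switch structure could select the format flag that is subsequently passed to vrna_file_msa_read_record().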
SWIG Wrapper Notes:
This function exists as an overloaded version where the options parameter may be omitted. In that case, the options parameter defaults to VRNA_FILE_FORMAT_MSA_DEFAULT.
int vrna_file_msa_write ( const char *  filename,
const char **  names,
const char **  aln,
const char *  id,
const char *  structure,
const char *  source,
unsigned int  options 
)

#include <ViennaRNA/io/file_formats_msa.h>

Write multiple sequence alignment file.

Note
Currently, only Stockholm 1.0 format output is supported.
See also
VRNA_FILE_FORMAT_MSA_STOCKHOLM, VRNA_FILE_FORMAT_MSA_APPEND, VRNA_FILE_FORMAT_MSA_MIS
Parameters
filename  The output filename
names  The array of sequence names / identifiers
aln  The array of aligned sequences
id  An optional ID for the alignment
structure  An optional consensus structure
source  A string describing the source of the alignment
options  Options to manipulate the behavior of this function
Returns
Non-zero upon successfully writing the alignment to file
SWIG Wrapper Notes:
In the target scripting language, this function exists as a set of overloaded versions, where the last four parameters may be omitted. If the options parameter is missing, the options default to (VRNA_FILE_FORMAT_MSA_STOCKHOLM | VRNA_FILE_FORMAT_MSA_APPEND).