bpp-seq  2.2.0
bpp::SiteContainerTools Class Reference

Some utility methods to deal with site containers.

#include <Bpp/Seq/Container/SiteContainerTools.h>

Public Member Functions

 SiteContainerTools ()
 
virtual ~SiteContainerTools ()
 

Static Public Member Functions

static SiteContainer * getSitesWithoutGaps (const SiteContainer &sites)
 Retrieves sites without gaps from SiteContainer.
 
static SiteContainer * getCompleteSites (const SiteContainer &sites)
 Retrieves complete sites from SiteContainer.
 
static SiteContainer * removeGapOnlySites (const SiteContainer &sites)
 Get a site set without gap-only sites.
 
static void removeGapOnlySites (SiteContainer &sites)
 Remove gap-only sites from a site set.
 
static SiteContainer * removeGapOrUnresolvedOnlySites (const SiteContainer &sites)
 Get a site set without gap/unresolved-only sites.
 
static void removeGapOrUnresolvedOnlySites (SiteContainer &sites)
 Remove gap/unresolved-only sites from a site set.
 
static SiteContainer * removeGapSites (const SiteContainer &sites, double maxFreqGaps)
 Get a site set keeping only sites with less than a given frequency of gaps.
 
static void removeGapSites (SiteContainer &sites, double maxFreqGaps)
 Remove sites with more than a given frequency of gaps.
 
static SiteContainer * removeStopCodonSites (const SiteContainer &sites, const GeneticCode &gCode) throw (AlphabetException)
 Get a site set without stop codons if the alphabet is a CodonAlphabet; otherwise an exception is thrown.
 
static SiteContainer * getSelectedSites (const SiteContainer &sequences, const SiteSelection &selection)
 Create a new container with a specified set of sites.
 
static SiteContainer * getSelectedPositions (const SiteContainer &sequences, const SiteSelection &selection)
 Create a new container with a specified set of positions.
 
static Sequence * getConsensus (const SiteContainer &sc, const std::string &name="consensus", bool ignoreGap=true, bool resolveUnknown=false)
 Create the consensus sequence of the alignment.
 
static void changeGapsToUnknownCharacters (SiteContainer &sites)
 Change all gaps to unknown state in a container, according to its alphabet.
 
static void changeUnresolvedCharactersToGaps (SiteContainer &sites)
 Change all unresolved characters to gaps in a container, according to its alphabet.
 
static SiteContainer * resolveDottedAlignment (const SiteContainer &dottedAln, const Alphabet *resolvedAlphabet) throw (AlphabetException, Exception)
 Resolve a container with "." notations.
 
static std::map< size_t, size_t > translateAlignment (const Sequence &seq1, const Sequence &seq2) throw (AlphabetMismatchException, Exception)
 Translate alignment positions from an aligned sequence to the same sequence in a different alignment.
 
static std::map< size_t, size_t > translateSequence (const SiteContainer &sequences, size_t i1, size_t i2)
 Translate sequence positions from a sequence to another in the same alignment.
 
static AlignedSequenceContainer * alignNW (const Sequence &seq1, const Sequence &seq2, const AlphabetIndex2 &s, double gap) throw (AlphabetMismatchException)
 Align two sequences using the Needleman-Wunsch dynamic algorithm.
 
static AlignedSequenceContainer * alignNW (const Sequence &seq1, const Sequence &seq2, const AlphabetIndex2 &s, double opening, double extending) throw (AlphabetMismatchException)
 Align two sequences using the Needleman-Wunsch dynamic algorithm.
 
static VectorSiteContainer * sampleSites (const SiteContainer &sites, size_t nbSites, std::vector< size_t > *index=0)
 Sample sites in an alignment.
 
static VectorSiteContainer * bootstrapSites (const SiteContainer &sites)
 Bootstrap sites in an alignment.
 
static double computeSimilarity (const Sequence &seq1, const Sequence &seq2, bool dist=false, const std::string &gapOption=SIMILARITY_NODOUBLEGAP, bool unresolvedAsGap=true) throw (SequenceNotAlignedException, AlphabetMismatchException, Exception)
 Compute the similarity/distance score between two aligned sequences.
 
static DistanceMatrix * computeSimilarityMatrix (const SiteContainer &sites, bool dist=false, const std::string &gapOption=SIMILARITY_NOFULLGAP, bool unresolvedAsGap=true)
 Compute the similarity matrix of an alignment.
 
static void merge (SiteContainer &seqCont1, const SiteContainer &seqCont2, bool leavePositionAsIs=false) throw (AlphabetMismatchException, Exception)
 Add the content of a site container to an existing one.
 
static std::vector< int > getColumnScores (const Matrix< size_t > &positions1, const Matrix< size_t > &positions2, int na=0)
 Compare an alignment to a reference alignment, and compute the column scores.
 
static std::vector< double > getSumOfPairsScores (const Matrix< size_t > &positions1, const Matrix< size_t > &positions2, double na=0)
 Compare an alignment to a reference alignment, and compute the sum-of-pairs scores.
 
Sequence coordinates.
See also
SequenceWalker for an alternative approach.
static std::map< size_t, size_t > getSequencePositions (const Sequence &seq)
 Get the index of each sequence position in an aligned sequence.
 
static std::map< size_t, size_t > getAlignmentPositions (const Sequence &seq)
 Get the index of each alignment position in an aligned sequence.
 
static void getSequencePositions (const SiteContainer &sites, Matrix< size_t > &positions)
 Fill a numeric matrix of the same size as the alignment with the position of each residue in its sequence.
 

Static Public Attributes

static const std::string SIMILARITY_ALL = "all sites"
 
static const std::string SIMILARITY_NOFULLGAP = "no full gap"
 
static const std::string SIMILARITY_NODOUBLEGAP = "no double gap"
 
static const std::string SIMILARITY_NOGAP = "no gap"
 

Detailed Description

Some utility methods to deal with site containers.

Definition at line 63 of file SiteContainerTools.h.

Constructor & Destructor Documentation

◆ SiteContainerTools()

bpp::SiteContainerTools::SiteContainerTools ( )
inline

Definition at line 66 of file SiteContainerTools.h.

◆ ~SiteContainerTools()

virtual bpp::SiteContainerTools::~SiteContainerTools ( )
inline virtual

Definition at line 67 of file SiteContainerTools.h.

Member Function Documentation

◆ alignNW() [1/2]

AlignedSequenceContainer * SiteContainerTools::alignNW ( const Sequence &  seq1,
const Sequence &  seq2,
const AlphabetIndex2 &  s,
double  gap
)
throw (AlphabetMismatchException
)
static

Align two sequences using the Needleman-Wunsch dynamic algorithm.

If the input sequences contain gaps, they will be ignored.

See also
BLOSUM50, DefaultNucleotideScore for score matrices.
Parameters
seq1  The first sequence.
seq2  The second sequence.
s  The score matrix to use.
gap  Gap penalty.
Returns
A new SiteContainer instance.
Exceptions
AlphabetMismatchException  If the sequences and the score matrix do not share the same alphabet.

Definition at line 558 of file SiteContainerTools.cpp.

References bpp::AlignedSequenceContainer::addSequence(), and bpp::SequenceTools::removeGaps().
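
The linear-gap variant can be illustrated outside the library. The following is a minimal standalone sketch, not the Bio++ implementation: it works on plain std::string inputs, and the names alignNWSketch and simpleScore (a +1/-1 match/mismatch score standing in for an AlphabetIndex2 matrix) are hypothetical. The gap penalty is a single per-position cost, assumed here to be expressed as a non-positive score.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a score matrix: +1 for a match, -1 for a mismatch.
inline double simpleScore(char a, char b) { return a == b ? 1.0 : -1.0; }

// Globally align two gap-free strings with a linear gap penalty.
// Returns the two aligned rows, padded with '-'.
std::pair<std::string, std::string>
alignNWSketch(const std::string& s1, const std::string& s2, double gap)
{
  assert(gap <= 0.0); // assumption: the penalty is given as a non-positive score
  const std::size_t n = s1.size(), m = s2.size();

  // F[i][j] = best score for aligning s1[0..i) with s2[0..j).
  std::vector<std::vector<double>> F(n + 1, std::vector<double>(m + 1, 0.0));
  for (std::size_t i = 1; i <= n; ++i) F[i][0] = F[i - 1][0] + gap;
  for (std::size_t j = 1; j <= m; ++j) F[0][j] = F[0][j - 1] + gap;
  for (std::size_t i = 1; i <= n; ++i)
    for (std::size_t j = 1; j <= m; ++j)
      F[i][j] = std::max({F[i - 1][j - 1] + simpleScore(s1[i - 1], s2[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap});

  // Trace back from the bottom-right corner, preferring diagonal moves.
  std::string a1, a2;
  std::size_t i = n, j = m;
  while (i > 0 || j > 0)
  {
    if (i > 0 && j > 0 &&
        F[i][j] == F[i - 1][j - 1] + simpleScore(s1[i - 1], s2[j - 1]))
    {
      a1 += s1[--i];
      a2 += s2[--j];
    }
    else if (i > 0 && (j == 0 || F[i][j] == F[i - 1][j] + gap))
    {
      a1 += s1[--i];
      a2 += '-';
    }
    else
    {
      a1 += '-';
      a2 += s2[--j];
    }
  }
  std::reverse(a1.begin(), a1.end());
  std::reverse(a2.begin(), a2.end());
  return {a1, a2};
}
```

Calling alignNWSketch("ACGT", "AGT", -2.0) yields the rows "ACGT" and "A-GT". The table costs O(n*m) time and memory, which is acceptable for a pair of sequences.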

◆ alignNW() [2/2]

AlignedSequenceContainer * SiteContainerTools::alignNW ( const Sequence &  seq1,
const Sequence &  seq2,
const AlphabetIndex2 &  s,
double  opening,
double  extending
)
throw (AlphabetMismatchException
)
static

Align two sequences using the Needleman-Wunsch dynamic algorithm.

If the input sequences contain gaps, they will be ignored.

See also
BLOSUM50, DefaultNucleotideScore for score matrices.
Parameters
seq1  The first sequence.
seq2  The second sequence.
s  The score matrix to use.
opening  Gap opening penalty.
extending  Gap extending penalty.
Returns
A new SiteContainer instance.
Exceptions
AlphabetMismatchException  If the sequences and the score matrix do not share the same alphabet.

Definition at line 658 of file SiteContainerTools.cpp.

References bpp::AlignedSequenceContainer::addSequence(), and bpp::SequenceTools::removeGaps().

◆ bootstrapSites()

VectorSiteContainer * SiteContainerTools::bootstrapSites ( const SiteContainer &  sites)
static

Bootstrap sites in an alignment.

Original site positions will be kept. The resulting container will hence probably have duplicated positions. You may wish to call the reindexSites() method on the returned container.

Note: This method will be optimal with a container with vertical storage like VectorSiteContainer.

Parameters
sites  An input alignment to sample.
Returns
A sampled alignment with the same number of sites as the input one.

Definition at line 811 of file SiteContainerTools.cpp.

References bpp::SiteContainer::getNumberOfSites().

◆ changeGapsToUnknownCharacters()

void SiteContainerTools::changeGapsToUnknownCharacters ( SiteContainer &  sites)
static

Change all gaps to unknown state in a container, according to its alphabet.

For DNA alphabets, this changes all '-' to 'N'.

Parameters
sites  The container to be modified.

Definition at line 184 of file SiteContainerTools.cpp.

References bpp::SequenceContainer::getAlphabet(), bpp::OrderedSequenceContainer::getNumberOfSequences(), bpp::SiteContainer::getNumberOfSites(), bpp::Alphabet::getUnknownCharacterCode(), and bpp::Alphabet::isGap().

Referenced by bpp::SequenceApplicationTools::getSitesToAnalyse().

◆ changeUnresolvedCharactersToGaps()

void SiteContainerTools::changeUnresolvedCharactersToGaps ( SiteContainer &  sites)
static

Change all unresolved characters to gaps in a container, according to its alphabet.

For DNA alphabets, this changes all 'N', 'M', 'R', etc. to '-'.

Parameters
sites  The container to be modified.

Definition at line 201 of file SiteContainerTools.cpp.

References bpp::SequenceContainer::getAlphabet(), bpp::Alphabet::getGapCharacterCode(), bpp::OrderedSequenceContainer::getNumberOfSequences(), bpp::SiteContainer::getNumberOfSites(), and bpp::Alphabet::isUnresolved().

◆ computeSimilarity()

double SiteContainerTools::computeSimilarity ( const Sequence &  seq1,
const Sequence &  seq2,
bool  dist = false,
const std::string &  gapOption = SIMILARITY_NODOUBLEGAP,
bool  unresolvedAsGap = true
)
throw (SequenceNotAlignedException,
AlphabetMismatchException,
Exception
)
static

Compute the similarity/distance score between two aligned sequences.

The similarity is computed as the proportion of identical matches. The distance between the two sequences is defined as 1 - similarity. This function can be used with any type of alphabet.

Parameters
seq1  The first sequence.
seq2  The second sequence.
dist  Whether to return a distance instead of a similarity.
gapOption  How to deal with gaps:
  • SIMILARITY_ALL: all positions are used.
  • SIMILARITY_NODOUBLEGAP: ignore all positions with a gap in both sequences.
  • SIMILARITY_NOGAP: ignore all positions with a gap in at least one of the two sequences.
unresolvedAsGap  Tell whether unresolved characters must be considered as gaps when counting. If set to true, the gap option will also apply to unresolved characters.
Returns
The proportion of matches between the two sequences.
Exceptions
SequenceNotAlignedException  If the two sequences do not have the same length.
AlphabetMismatchException  If the two sequences do not share the same alphabet type.
Exception  If an invalid gapOption is passed.

Definition at line 825 of file SiteContainerTools.cpp.

References bpp::Alphabet::getGapCharacterCode(), bpp::Alphabet::isGap(), and bpp::Alphabet::isUnresolved().
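
To make the three gap options concrete, here is a minimal standalone sketch on plain strings. It is not the library code: the enum GapOption and the function similaritySketch are hypothetical names, '-' is assumed to be the only gap character, unresolved characters are not handled, and two labelled assumptions are made (under the "all" option a gap facing a gap counts as a match, and a pair with no usable position gets similarity 0).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical counterpart of the SIMILARITY_* string constants.
enum class GapOption { All, NoDoubleGap, NoGap };

// Proportion of identical positions between two aligned strings.
double similaritySketch(const std::string& s1, const std::string& s2,
                        bool dist = false,
                        GapOption opt = GapOption::NoDoubleGap)
{
  if (s1.size() != s2.size())
    throw std::invalid_argument("sequences must have the same length");

  std::size_t used = 0, same = 0;
  for (std::size_t i = 0; i < s1.size(); ++i)
  {
    const bool g1 = (s1[i] == '-'), g2 = (s2[i] == '-');
    if (opt == GapOption::NoDoubleGap && g1 && g2) continue; // gap in both
    if (opt == GapOption::NoGap && (g1 || g2)) continue;     // gap in either
    ++used;
    if (s1[i] == s2[i]) ++same; // assumption: under All, '-' vs '-' matches
  }
  assert(same <= used);
  // Assumption: similarity is 0 when no position is usable.
  const double sim =
      used == 0 ? 0.0 : static_cast<double>(same) / static_cast<double>(used);
  return dist ? 1.0 - sim : sim;
}
```

For the pair "AC-T-" and "AG-TA", the sketch counts 3 matches over 5 positions with All, 2 over 4 with NoDoubleGap, and 2 over 3 with NoGap.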

◆ computeSimilarityMatrix()

DistanceMatrix * SiteContainerTools::computeSimilarityMatrix ( const SiteContainer &  sites,
bool  dist = false,
const std::string &  gapOption = SIMILARITY_NOFULLGAP,
bool  unresolvedAsGap = true
)
static

Compute the similarity matrix of an alignment.

The similarity is computed as the proportion of identical matches. The distance between two sequences is defined as 1 - similarity. This function can be used with any type of alphabet. Several options concerning gaps and unresolved characters are available:

  • SIMILARITY_ALL: all positions are used.
  • SIMILARITY_NOFULLGAP: ignore positions with a gap in all the sequences in the alignment.
  • SIMILARITY_NODOUBLEGAP: ignore all positions with a gap in both sequences of each pair.
  • SIMILARITY_NOGAP: ignore all positions with a gap in at least one of the two sequences of each pair.
See also
computeSimilarity
Parameters
sites  The input alignment.
dist  Whether to return distances instead of similarities.
gapOption  How to deal with gaps.
unresolvedAsGap  Tell whether unresolved characters must be considered as gaps when counting. If set to true, the gap option will also apply to unresolved characters.
Returns
All pairwise similarity measures.

Definition at line 880 of file SiteContainerTools.cpp.

References bpp::OrderedSequenceContainer::getNumberOfSequences(), bpp::OrderedSequenceContainer::getSequence(), and bpp::OrderedSequenceContainer::getSequencesNames().

◆ getAlignmentPositions()

std::map< size_t, size_t > SiteContainerTools::getAlignmentPositions ( const Sequence &  seq)
static

Get the index of each alignment position in an aligned sequence.

If the sequence contains no gap, the translated and the original positions are the same. Position numbers start at 1.

Parameters
seq  The sequence to translate.
Returns
A map with original alignment positions as keys, and translated positions as values.

Definition at line 444 of file SiteContainerTools.cpp.

References bpp::SymbolList::size().

◆ getColumnScores()

vector< int > SiteContainerTools::getColumnScores ( const Matrix< size_t > &  positions1,
const Matrix< size_t > &  positions2,
int  na = 0 
)
static

Compare an alignment to a reference alignment, and compute the column scores.

Calculations are made according to the formula for the "CS" score in Thompson et al. (1999), Nucleic Acids Research 27(13):2682–2690.

Parameters
positions1  Alignment index for the test alignment.
positions2  Alignment index for the reference alignment.
na  The score to use if the tested column contains only gaps.
Returns
A vector of scores, each 0 or 1 (or the na value).
See also
getSequencePositions for creating the alignment indexes.
Warning
The indexes for the two alignments must have the sequences in the exact same order!
Author
Julien Dutheil

Definition at line 989 of file SiteContainerTools.cpp.
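
The column score can be sketched on plain position matrices. The code below is a standalone illustration under stated assumptions, not the library implementation: PosMatrix and columnScoresSketch are hypothetical names, each matrix has one row per sequence and one entry per alignment column, residue positions start at 1 and gaps are stored as 0. A tested column scores 1 when the reference column holding the same anchor residue is identical to it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for Matrix<size_t>: rows are sequences.
using PosMatrix = std::vector<std::vector<std::size_t>>;

std::vector<int> columnScoresSketch(const PosMatrix& test,
                                    const PosMatrix& ref, int na = 0)
{
  if (test.size() != ref.size())
    throw std::invalid_argument("alignments must have the same sequences");
  if (test.empty()) return {};

  const std::size_t nCols = test[0].size();
  for (const auto& row : test) assert(row.size() == nCols);

  std::vector<int> scores(nCols, na);
  for (std::size_t i = 0; i < nCols; ++i)
  {
    // Anchor: the first sequence that has a residue in this column.
    std::size_t seq = 0, res = 0;
    for (std::size_t j = 0; j < test.size(); ++j)
      if (test[j][i] > 0) { seq = j; res = test[j][i]; break; }
    if (res == 0) continue; // gap-only column keeps the 'na' score

    // Locate the anchor residue in the reference alignment.
    const auto& refRow = ref[seq];
    const auto it = std::find(refRow.begin(), refRow.end(), res);
    if (it == refRow.end())
      throw std::invalid_argument("residue missing from reference");
    const std::size_t i2 = static_cast<std::size_t>(it - refRow.begin());

    // The column scores 1 only if every sequence agrees.
    bool same = true;
    for (std::size_t j = 0; same && j < test.size(); ++j)
      same = (test[j][i] == ref[j][i2]);
    scores[i] = same ? 1 : 0;
  }
  return scores;
}
```

With the test alignment A-CG / AT-G and the reference ACG / ATG, the matrices are {1,0,2,3} / {1,2,0,3} and {1,2,3} / {1,2,3}, and the sketch returns the scores 1, 0, 0, 1.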

◆ getCompleteSites()

SiteContainer * SiteContainerTools::getCompleteSites ( const SiteContainer &  sites)
static

Retrieves complete sites from SiteContainer.

This function builds a new SiteContainer instance with only complete sites, i.e. sites with fully resolved states (no gaps, no unknown characters). The container passed as input is not modified; all sites are copied.

Parameters
sites  The container to analyse.
Returns
A pointer toward a new SiteContainer with only complete sites.

Definition at line 77 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::CompleteSiteContainerIterator::hasMoreSites(), bpp::CompleteSiteContainerIterator::nextSite(), and bpp::VectorSiteContainer::setSequencesNames().

Referenced by bpp::SequenceApplicationTools::getSitesToAnalyse().

◆ getConsensus()

Sequence * SiteContainerTools::getConsensus ( const SiteContainer &  sc,
const std::string &  name = "consensus",
bool  ignoreGap = true,
bool  resolveUnknown = false
)
static

Create the consensus sequence of the alignment.

In case of ambiguity (for instance an AATT site), one state will be chosen arbitrarily.

Parameters
sc  A site container.
name  The name of the sequence object that will be created.
ignoreGap  Tell whether gaps must be ignored when counting. If true, only fully gapped sites will result in a gap in the consensus sequence.
resolveUnknown  Tell whether unknown characters must be resolved. In a DNA sequence for instance, N will be counted as A=1/4, T=1/4, G=1/4 and C=1/4. Otherwise it will be counted as N=1. If this option is set to true, the consensus sequence will never contain an unknown character.
Returns
A new Sequence object with the consensus sequence.

Definition at line 142 of file SiteContainerTools.cpp.

References bpp::SequenceContainer::getAlphabet(), bpp::SymbolListTools::getFrequencies(), bpp::SimpleSiteContainerIterator::hasMoreSites(), and bpp::SimpleSiteContainerIterator::nextSite().
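
The effect of ignoreGap can be seen in a small standalone sketch of a majority-vote consensus. This is an illustration on plain strings, not the library code: consensusSketch is a hypothetical name, '-' is assumed to be the gap character, and resolveUnknown is not modelled. Ties are broken by taking the smallest character, which is one arbitrary choice among others.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Majority-vote consensus of an alignment given as equal-length rows.
std::string consensusSketch(const std::vector<std::string>& aln,
                            bool ignoreGap = true)
{
  if (aln.empty()) return "";
  const std::size_t nbSites = aln[0].size();
  for (const auto& row : aln) assert(row.size() == nbSites);

  std::string cons;
  for (std::size_t i = 0; i < nbSites; ++i)
  {
    // Count each state observed in column i.
    std::map<char, std::size_t> counts;
    for (const auto& row : aln) ++counts[row[i]];

    // Pick the most frequent state; '-' is the fallback for gap-only columns.
    char best = '-';
    std::size_t bestCount = 0;
    for (const auto& kv : counts)
    {
      if (ignoreGap && kv.first == '-') continue;
      if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }
    }
    cons += best;
  }
  return cons;
}
```

For the rows "A-", "A-" and "AC", the sketch returns "AC" when gaps are ignored and "A-" when they are counted.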

◆ getSelectedPositions()

SiteContainer * SiteContainerTools::getSelectedPositions ( const SiteContainer &  sequences,
const SiteSelection &  selection
)
static

Create a new container with a specified set of positions.

A new VectorSiteContainer is created with the specified positions. The destruction of the container is up to the user.

Positions are specified by their index, beginning at 0, and are converted to site positions given the length of the words of the alphabet.

No position verification is performed, based on the assumption that the container passed as an argument is a correct one. Redundant selection is not checked, so be careful with what you're doing!

Parameters
sequences  The container from which sequences are to be taken.
selection  The positions to retrieve.
Returns
A new container with all selected sites.

Definition at line 110 of file SiteContainerTools.cpp.

References bpp::SequenceContainer::getAlphabet(), and bpp::Alphabet::getStateCodingSize().

◆ getSelectedSites()

SiteContainer * SiteContainerTools::getSelectedSites ( const SiteContainer &  sequences,
const SiteSelection &  selection
)
static

Create a new container with a specified set of sites.

A new VectorSiteContainer is created with the specified sites. The destruction of the container is up to the user. Sites are specified by their index, beginning at 0. No position verification is performed, based on the assumption that the container passed as an argument is a correct one. Redundant selection is not checked, so be careful with what you're doing!

Parameters
sequences  The container from which sequences are to be taken.
selection  The positions of all sites to retrieve.
Returns
A new container with all selected sites.

Definition at line 92 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SequenceContainer::getGeneralComments(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::SiteContainer::getSite(), bpp::AbstractSequenceContainer::setGeneralComments(), and bpp::VectorSiteContainer::setSequencesNames().

Referenced by bpp::SequenceApplicationTools::getSiteContainer().

◆ getSequencePositions() [1/2]

std::map< size_t, size_t > SiteContainerTools::getSequencePositions ( const Sequence &  seq)
static

Get the index of each sequence position in an aligned sequence.

If the sequence contains no gap, the translated and the original positions are the same. Position numbers start at 1.

Parameters
seq  The sequence to translate.
Returns
A map with original sequence positions as keys, and translated positions as values.

Definition at line 425 of file SiteContainerTools.cpp.

References bpp::SymbolList::size().

◆ getSequencePositions() [2/2]

void SiteContainerTools::getSequencePositions ( const SiteContainer &  sites,
Matrix< size_t > &  positions
)
static

Fill a numeric matrix of the same size as the alignment with the position of each residue in its sequence.

Positions start at 1, gaps have "position" 0.

Parameters
sites  The input alignment.
positions  A matrix object which is going to be resized and filled with the corresponding positions.
Author
Julien Dutheil

Definition at line 969 of file SiteContainerTools.cpp.

References bpp::SequenceContainer::getAlphabet(), bpp::Alphabet::getGapCharacterCode(), bpp::OrderedSequenceContainer::getNumberOfSequences(), bpp::SiteContainer::getNumberOfSites(), and bpp::OrderedSequenceContainer::getSequence().
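
The resulting matrix layout can be shown with a short standalone sketch. It is an illustration on plain strings rather than the library code: positionMatrixSketch is a hypothetical name, '-' is assumed to be the gap character, and a vector of vectors stands in for Matrix<size_t> (one row per sequence).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// For each sequence (row) and alignment column, store the 1-based index of
// the residue within its ungapped sequence, or 0 when the cell is a gap.
std::vector<std::vector<std::size_t>>
positionMatrixSketch(const std::vector<std::string>& aln)
{
  std::vector<std::vector<std::size_t>> pos(aln.size());
  for (std::size_t r = 0; r < aln.size(); ++r)
  {
    assert(aln[r].size() == aln[0].size()); // rows must be aligned
    std::size_t count = 0;
    pos[r].reserve(aln[r].size());
    for (char c : aln[r])
      pos[r].push_back(c == '-' ? 0 : ++count);
  }
  return pos;
}
```

The alignment A-CG / AT-G gives the rows {1, 0, 2, 3} and {1, 2, 0, 3}.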

◆ getSitesWithoutGaps()

SiteContainer * SiteContainerTools::getSitesWithoutGaps ( const SiteContainer &  sites)
static

Retrieves sites without gaps from SiteContainer.

This function builds a new SiteContainer instance with only sites without gaps. The container passed as input is not modified; all sites are copied.

Parameters
sites  The container to analyse.
Returns
A pointer toward a new SiteContainer with only sites with no gaps.

Definition at line 62 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::NoGapSiteContainerIterator::hasMoreSites(), bpp::NoGapSiteContainerIterator::nextSite(), and bpp::VectorSiteContainer::setSequencesNames().

Referenced by bpp::SequenceApplicationTools::getSitesToAnalyse().

◆ getSumOfPairsScores()

vector< double > SiteContainerTools::getSumOfPairsScores ( const Matrix< size_t > &  positions1,
const Matrix< size_t > &  positions2,
double  na = 0 
)
static

Compare an alignment to a reference alignment, and compute the sum-of-pairs scores.

Calculations are made according to the formula for the "SPS" score in Thompson et al. (1999), Nucleic Acids Research 27(13):2682–2690.

Parameters
positions1  Alignment index for the test alignment.
positions2  Alignment index for the reference alignment.
na  The score to use if the tested column is not testable, that is, if it does not contain at least two residues.
Returns
A vector of scores, between 0 and 1 (or the na value).
See also
getSequencePositions for creating the alignment indexes.
Warning
The indexes for the two alignments must have the sequences in the exact same order!
Author
Julien Dutheil

Definition at line 1034 of file SiteContainerTools.cpp.

◆ merge()

void SiteContainerTools::merge ( SiteContainer &  seqCont1,
const SiteContainer &  seqCont2,
bool  leavePositionAsIs = false
)
throw (AlphabetMismatchException,
Exception
)
static

Add the content of a site container to an existing one.

The input containers are supposed to have unique sequence names. If this is not the case, several things can happen:

  • If the two containers have exactly the same names in the same order, then the content of the second one will be added as is to the first one.
  • If the second container does not have exactly the same sequence names, or has them in a different order, then a reordered selection of the second container is created first; in that case, only the first sequence with a given name will be used and duplicated. In any case, note that the second container should always contain all the sequence names from the first one, otherwise an exception will be thrown.
Author
Julien Dutheil
Parameters
seqCont1  First container.
seqCont2  Second container. This container must contain sequences with the same names as in seqCont1. Additional sequences will be ignored.
leavePositionAsIs  Tell whether site positions should be left unchanged. Otherwise (the default), the size of container 1 is added to the positions in container 2.
Exceptions
AlphabetMismatchException  If the alphabets of the two containers do not match.
Exception  If sequence names do not match.

Definition at line 923 of file SiteContainerTools.cpp.

References bpp::SiteContainer::addSite(), bpp::SiteContainer::getNumberOfSites(), bpp::Site::getPosition(), bpp::SequenceContainerTools::getSelectedSequences(), and bpp::SiteContainer::getSite().

◆ removeGapOnlySites() [1/2]

SiteContainer * SiteContainerTools::removeGapOnlySites ( const SiteContainer &  sites)
static

Get a site set without gap-only sites.

This function builds a new SiteContainer instance without sites with only gaps. The container passed as input is not modified; all sites are copied.

See also
removeGapOnlySites(SiteContainer& sites)
Parameters
sites  The container to analyse.
Returns
A pointer toward a new SiteContainer.

Definition at line 218 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SiteContainer::getNumberOfSites(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::SiteContainer::getSite(), bpp::SiteTools::isGapOnly(), and bpp::VectorSiteContainer::setSequencesNames().

◆ removeGapOnlySites() [2/2]

void SiteContainerTools::removeGapOnlySites ( SiteContainer &  sites)
static

Remove gap-only sites from a site set.

Parameters
sites  The container where the sites have to be removed.

Definition at line 234 of file SiteContainerTools.cpp.

References bpp::SiteContainer::deleteSite(), bpp::SiteContainer::deleteSites(), bpp::SiteContainer::getNumberOfSites(), bpp::SiteContainer::getSite(), and bpp::SiteTools::isGapOnly().

◆ removeGapOrUnresolvedOnlySites() [1/2]

SiteContainer * SiteContainerTools::removeGapOrUnresolvedOnlySites ( const SiteContainer &  sites)
static

Get a site set without gap/unresolved-only sites.

This function builds a new SiteContainer instance without sites with only gaps or unresolved characters. The container passed as input is not modified; all sites are copied.

Parameters
sites  The container to analyse.
Returns
A pointer toward a new SiteContainer.

Definition at line 265 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SiteContainer::getNumberOfSites(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::SiteContainer::getSite(), bpp::SiteTools::isGapOrUnresolvedOnly(), and bpp::VectorSiteContainer::setSequencesNames().

◆ removeGapOrUnresolvedOnlySites() [2/2]

void SiteContainerTools::removeGapOrUnresolvedOnlySites ( SiteContainer &  sites)
static

Remove gap/unresolved-only sites from a site set.

Parameters
sites  The container where the sites have to be removed.

Definition at line 281 of file SiteContainerTools.cpp.

References bpp::SiteContainer::deleteSite(), bpp::SiteContainer::deleteSites(), bpp::SiteContainer::getNumberOfSites(), bpp::SiteContainer::getSite(), bpp::SiteTools::isGapOnly(), and bpp::SiteTools::isGapOrUnresolvedOnly().

◆ removeGapSites() [1/2]

SiteContainer * SiteContainerTools::removeGapSites ( const SiteContainer &  sites,
double  maxFreqGaps
)
static

Get a site set keeping only sites with less than a given frequency of gaps.

Parameters
sites  The container from which the sites have to be removed.
maxFreqGaps  The maximum frequency of gaps in each site.
Returns
A pointer toward a new SiteContainer.

Definition at line 312 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SymbolListTools::getFrequencies(), bpp::SiteContainer::getNumberOfSites(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::SiteContainer::getSite(), and bpp::VectorSiteContainer::setSequencesNames().
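
The gap-frequency filter can be sketched on plain strings as follows. This is a standalone illustration, not the library code: removeGapColumnsSketch is a hypothetical name and '-' is assumed to be the gap character. How a site whose gap frequency is exactly maxFreqGaps is treated is not stated above; the sketch keeps it, which is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copy the alignment, keeping only columns whose gap frequency does not
// exceed maxFreqGaps. The input is left untouched.
std::vector<std::string>
removeGapColumnsSketch(const std::vector<std::string>& aln, double maxFreqGaps)
{
  std::vector<std::string> out(aln.size());
  if (aln.empty()) return out;
  const std::size_t nbSites = aln[0].size();
  for (const auto& row : aln) assert(row.size() == nbSites);

  for (std::size_t i = 0; i < nbSites; ++i)
  {
    std::size_t gaps = 0;
    for (const auto& row : aln)
      if (row[i] == '-') ++gaps;
    const double freq =
        static_cast<double>(gaps) / static_cast<double>(aln.size());
    if (freq <= maxFreqGaps) // assumption: the threshold itself is accepted
      for (std::size_t r = 0; r < aln.size(); ++r) out[r] += aln[r][i];
  }
  return out;
}
```

With the rows "A--T", "A-CT" and "AGCT" and a threshold of 0.5, the second column (gap frequency 2/3) is dropped and the result is "A-T", "ACT", "ACT".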

◆ removeGapSites() [2/2]

void SiteContainerTools::removeGapSites ( SiteContainer &  sites,
double  maxFreqGaps
)
static

Remove sites with more than a given frequency of gaps.

Parameters
sites  The container from which the sites have to be removed.
maxFreqGaps  The maximum frequency of gaps in each site.

Definition at line 328 of file SiteContainerTools.cpp.

References bpp::SiteContainer::deleteSite(), bpp::SymbolListTools::getFrequencies(), bpp::SiteContainer::getNumberOfSites(), and bpp::SiteContainer::getSite().

◆ removeStopCodonSites()

SiteContainer * SiteContainerTools::removeStopCodonSites ( const SiteContainer &  sites,
const GeneticCode &  gCode
)
throw (AlphabetException
)
static

Get a site set without stop codons if the alphabet is a CodonAlphabet; otherwise an exception is thrown.

This function builds a new SiteContainer instance without sites that have at least one stop codon. The container passed as input is not modified; all sites are copied.

Parameters
sites  The container to analyse.
gCode  The genetic code to use to determine stop codons.
Returns
A pointer toward a new SiteContainer.

Definition at line 340 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::CodonSiteTools::hasStop(), and bpp::VectorSiteContainer::setSequencesNames().

Referenced by bpp::SequenceApplicationTools::getSitesToAnalyse().

◆ resolveDottedAlignment()

SiteContainer * SiteContainerTools::resolveDottedAlignment ( const SiteContainer &  dottedAln,
const Alphabet *  resolvedAlphabet
)
throw (AlphabetException,
Exception
)
static

Resolve a container with "." notations.

ATGCCGTTGG
.C...A..C.
..A....C..

will result in

ATGCCGTTGG
ACGCCATTCG
ATACCGTCGG

for instance. The first sequence is here called the "reference" sequence. It need not be the first in the container. The alphabet of the input alignment must be an instance of the DefaultAlphabet class, the only one which supports dot characters. A new alignment is created and returned, with the specified alphabet.

If several sequences that may be considered as reference are found, the first one is used.

Parameters
dottedAln  The input alignment.
resolvedAlphabet  The alphabet of the output alignment.
Returns
A pointer toward a dynamically created SiteContainer with the specified alphabet (can be a DefaultAlphabet).
Exceptions
AlphabetException  If the alphabet of the input alignment is not of class DefaultAlphabet, or if one character does not match with the output alphabet.
Exception  If no reference sequence was found, or if the input alignment contains no sequence.

Definition at line 359 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::SymbolList::getChar(), bpp::BasicSymbolList::getChar(), bpp::Site::getPosition(), bpp::AlphabetTools::isDefaultAlphabet(), bpp::VectorSiteContainer::setSequencesNames(), and bpp::SymbolList::size().

Referenced by bpp::NexusIOSequence::appendAlignmentFromStream().
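
The dot-resolution rule can be reproduced with a few lines of standalone code. The sketch below works on plain strings and is not the library implementation: resolveDottedSketch is a hypothetical name, no alphabet checking is done, and the reference is taken to be the first row that contains no '.' character.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Replace every '.' by the character of the reference row at that column.
std::vector<std::string>
resolveDottedSketch(const std::vector<std::string>& dotted)
{
  if (dotted.empty())
    throw std::invalid_argument("the input alignment contains no sequence");

  // The reference is the first row without any dot.
  const std::string* ref = nullptr;
  for (const auto& row : dotted)
    if (row.find('.') == std::string::npos) { ref = &row; break; }
  if (ref == nullptr)
    throw std::invalid_argument("no reference sequence was found");

  std::vector<std::string> out = dotted; // the input is left untouched
  for (auto& row : out)
  {
    assert(row.size() == ref->size());
    for (std::size_t i = 0; i < row.size(); ++i)
      if (row[i] == '.') row[i] = (*ref)[i];
  }
  return out;
}
```

Applied to the three rows of the example above, the sketch returns ATGCCGTTGG, ACGCCATTCG and ATACCGTCGG.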

◆ sampleSites()

VectorSiteContainer * SiteContainerTools::sampleSites ( const SiteContainer &  sites,
size_t  nbSites,
std::vector< size_t > *  index = 0 
)
static

Sample sites in an alignment.

Original site positions will be kept. The resulting container will hence probably have duplicated positions. You may wish to call the reindexSites() method on the returned container.

Note: This method will be optimal with a container with vertical storage like VectorSiteContainer.

Parameters
sites  An input alignment to sample.
nbSites  The size of the resulting container.
index  [out] If non-null, the original site indices will be appended to the underlying vector.
Returns
A sampled alignment with nbSites sites taken from the input one.

Definition at line 796 of file SiteContainerTools.cpp.

References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SiteContainer::getNumberOfSites(), bpp::OrderedSequenceContainer::getSequencesNames(), and bpp::SiteContainer::getSite().
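
Sampling sites with replacement, and the role of the optional index vector, can be illustrated with a standalone sketch on plain strings. It is not the library code: sampleColumnsSketch is a hypothetical name, and the random generator is passed explicitly (an assumption made here for reproducibility). Bootstrapping corresponds to calling it with nbSites equal to the alignment length.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Draw nbSites columns uniformly with replacement from an alignment given as
// equal-length rows. If 'index' is non-null, the drawn column indices are
// appended to it.
std::vector<std::string>
sampleColumnsSketch(const std::vector<std::string>& aln, std::size_t nbSites,
                    std::mt19937& rng,
                    std::vector<std::size_t>* index = nullptr)
{
  std::vector<std::string> out(aln.size());
  if (aln.empty() || aln[0].empty()) return out;
  const std::size_t length = aln[0].size();
  for (const auto& row : aln) assert(row.size() == length);

  std::uniform_int_distribution<std::size_t> pick(0, length - 1);
  for (std::size_t k = 0; k < nbSites; ++k)
  {
    const std::size_t col = pick(rng);
    if (index != nullptr) index->push_back(col);
    for (std::size_t r = 0; r < aln.size(); ++r) out[r] += aln[r][col];
  }
  return out;
}
```

Because whole columns are copied, the sampled rows stay aligned with each other, and the index vector records which original site each sampled site came from.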

◆ translateAlignment()

std::map< size_t, size_t > SiteContainerTools::translateAlignment ( const Sequence &  seq1,
const Sequence &  seq2
)
throw (AlphabetMismatchException,
Exception
)
static

Translate alignment positions from an aligned sequence to the same sequence in a different alignment.

Takes each position (starting at 1) in sequence 1, and looks for the corresponding position in sequence 2. The two sequences must be the same, except for the gaps. If neither sequence contains gaps, or if the gaps are at the same place in both sequences, the translated positions will be the same as the original positions.

Parameters
seq1  The sequence to translate.
seq2  The reference sequence.
Returns
A map with original alignment positions as keys, and translated positions as values.
Exceptions
AlphabetMismatchException  If the sequences do not share the same alphabet.
Exception  If the sequences do not match.

Definition at line 463 of file SiteContainerTools.cpp.

◆ translateSequence()

std::map< size_t, size_t > SiteContainerTools::translateSequence ( const SiteContainer &  sequences,
size_t  i1,
size_t  i2
)
static

Translate sequence positions from a sequence to another in the same alignment.

Takes each position (starting at 1) in sequence 1, and looks for the corresponding position in sequence 2 at the same site. If no corresponding position is available (i.e. if there is a gap in sequence 2 at the corresponding position), 0 is returned.

Parameters
sequences  The alignment to use.
i1  The index of the sequence to translate.
i2  The index of the reference sequence.
Returns
A map with original sequence positions as keys, and translated positions as values.

Definition at line 531 of file SiteContainerTools.cpp.

References bpp::SiteContainer::getNumberOfSites(), and bpp::OrderedSequenceContainer::getSequence().
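
The translation rule described above can be sketched on plain strings. This is a standalone illustration, not the library code: translateSequenceSketch is a hypothetical name and '-' is assumed to be the gap character.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Map each 1-based residue position of row i1 to the 1-based residue position
// of row i2 found at the same site, or to 0 when row i2 has a gap there.
std::map<std::size_t, std::size_t>
translateSequenceSketch(const std::vector<std::string>& aln,
                        std::size_t i1, std::size_t i2)
{
  const std::string& s1 = aln.at(i1); // at() throws on an invalid index
  const std::string& s2 = aln.at(i2);
  assert(s1.size() == s2.size());

  std::map<std::size_t, std::size_t> tln;
  std::size_t p1 = 0, p2 = 0; // residues seen so far in each row
  for (std::size_t i = 0; i < s1.size(); ++i)
  {
    const bool r1 = (s1[i] != '-'), r2 = (s2[i] != '-');
    if (r2) ++p2;
    if (r1)
    {
      ++p1;
      tln[p1] = r2 ? p2 : 0;
    }
  }
  return tln;
}
```

For the alignment A-CG / AT-G, translating row 0 against row 1 gives 1 → 1, 2 → 0 (C faces a gap) and 3 → 3.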

Member Data Documentation

◆ SIMILARITY_ALL

const string SiteContainerTools::SIMILARITY_ALL = "all sites"
static

Definition at line 436 of file SiteContainerTools.h.

◆ SIMILARITY_NODOUBLEGAP

const string SiteContainerTools::SIMILARITY_NODOUBLEGAP = "no double gap"
static

Definition at line 438 of file SiteContainerTools.h.

◆ SIMILARITY_NOFULLGAP

const string SiteContainerTools::SIMILARITY_NOFULLGAP = "no full gap"
static

Definition at line 437 of file SiteContainerTools.h.

◆ SIMILARITY_NOGAP

const string SiteContainerTools::SIMILARITY_NOGAP = "no gap"
static

Definition at line 439 of file SiteContainerTools.h.


The documentation for this class was generated from the following files: