bpp-seq
2.2.0
bpp::SiteContainerTools Class Reference
Some utility methods to deal with site containers.
#include <Bpp/Seq/Container/SiteContainerTools.h>
Public Member Functions | |
SiteContainerTools () | |
virtual | ~SiteContainerTools () |
Static Public Member Functions | |
static SiteContainer * | getSitesWithoutGaps (const SiteContainer &sites) |
Retrieves sites without gaps from SiteContainer. | |
static SiteContainer * | getCompleteSites (const SiteContainer &sites) |
Retrieves complete sites from SiteContainer. | |
static SiteContainer * | removeGapOnlySites (const SiteContainer &sites) |
Get a site set without gap-only sites. | |
static void | removeGapOnlySites (SiteContainer &sites) |
Remove gap-only sites from a site set. | |
static SiteContainer * | removeGapOrUnresolvedOnlySites (const SiteContainer &sites) |
Get a site set without gap/unresolved-only sites. | |
static void | removeGapOrUnresolvedOnlySites (SiteContainer &sites) |
Remove gap/unresolved-only sites from a site set. | |
static SiteContainer * | removeGapSites (const SiteContainer &sites, double maxFreqGaps) |
Get a site set with sites with less than a given amount of gaps. | |
static void | removeGapSites (SiteContainer &sites, double maxFreqGaps) |
Remove sites with a given amount of gaps. | |
static SiteContainer * | removeStopCodonSites (const SiteContainer &sites, const GeneticCode &gCode) throw (AlphabetException) |
Get a site set without stop codons, if the alphabet is a CodonAlphabet, otherwise throws an Exception. | |
static SiteContainer * | getSelectedSites (const SiteContainer &sequences, const SiteSelection &selection) |
Create a new container with a specified set of sites. | |
static SiteContainer * | getSelectedPositions (const SiteContainer &sequences, const SiteSelection &selection) |
Create a new container with a specified set of positions. | |
static Sequence * | getConsensus (const SiteContainer &sc, const std::string &name="consensus", bool ignoreGap=true, bool resolveUnknown=false) |
Create the consensus sequence of the alignment. | |
static void | changeGapsToUnknownCharacters (SiteContainer &sites) |
Change all gaps to unknown state in a container, according to its alphabet. | |
static void | changeUnresolvedCharactersToGaps (SiteContainer &sites) |
Change all unresolved characters to gaps in a container, according to its alphabet. | |
static SiteContainer * | resolveDottedAlignment (const SiteContainer &dottedAln, const Alphabet *resolvedAlphabet) throw (AlphabetException, Exception) |
Resolve a container with "." notations. | |
static std::map< size_t, size_t > | translateAlignment (const Sequence &seq1, const Sequence &seq2) throw (AlphabetMismatchException, Exception) |
Translate alignment positions from an aligned sequence to the same sequence in a different alignment. | |
static std::map< size_t, size_t > | translateSequence (const SiteContainer &sequences, size_t i1, size_t i2) |
Translate sequence positions from a sequence to another in the same alignment. | |
static AlignedSequenceContainer * | alignNW (const Sequence &seq1, const Sequence &seq2, const AlphabetIndex2 &s, double gap) throw (AlphabetMismatchException) |
Align two sequences using the Needleman-Wunsch dynamic algorithm. | |
static AlignedSequenceContainer * | alignNW (const Sequence &seq1, const Sequence &seq2, const AlphabetIndex2 &s, double opening, double extending) throw (AlphabetMismatchException) |
Align two sequences using the Needleman-Wunsch dynamic algorithm. | |
static VectorSiteContainer * | sampleSites (const SiteContainer &sites, size_t nbSites, std::vector< size_t > *index=0) |
Sample sites in an alignment. | |
static VectorSiteContainer * | bootstrapSites (const SiteContainer &sites) |
Bootstrap sites in an alignment. | |
static double | computeSimilarity (const Sequence &seq1, const Sequence &seq2, bool dist=false, const std::string &gapOption=SIMILARITY_NODOUBLEGAP, bool unresolvedAsGap=true) throw (SequenceNotAlignedException, AlphabetMismatchException, Exception) |
Compute the similarity/distance score between two aligned sequences. | |
static DistanceMatrix * | computeSimilarityMatrix (const SiteContainer &sites, bool dist=false, const std::string &gapOption=SIMILARITY_NOFULLGAP, bool unresolvedAsGap=true) |
Compute the similarity matrix of an alignment. | |
static void | merge (SiteContainer &seqCont1, const SiteContainer &seqCont2, bool leavePositionAsIs=false) throw (AlphabetMismatchException, Exception) |
Add the content of a site container to an existing one. | |
static std::vector< int > | getColumnScores (const Matrix< size_t > &positions1, const Matrix< size_t > &positions2, int na=0) |
Compare an alignment to a reference alignment, and compute the column scores. | |
static std::vector< double > | getSumOfPairsScores (const Matrix< size_t > &positions1, const Matrix< size_t > &positions2, double na=0) |
Compare an alignment to a reference alignment, and compute the sum-of-pairs scores. | |
Sequence coordinates | |
static std::map< size_t, size_t > | getSequencePositions (const Sequence &seq) |
Get the index of each sequence position in an aligned sequence. | |
static std::map< size_t, size_t > | getAlignmentPositions (const Sequence &seq) |
Get the index of each alignment position in an aligned sequence. | |
static void | getSequencePositions (const SiteContainer &sites, Matrix< size_t > &positions) |
Fill a numeric matrix with the size of the alignment, containing each sequence position. | |
Static Public Attributes | |
static const std::string | SIMILARITY_ALL = "all sites" |
static const std::string | SIMILARITY_NOFULLGAP = "no full gap" |
static const std::string | SIMILARITY_NODOUBLEGAP = "no double gap" |
static const std::string | SIMILARITY_NOGAP = "no gap" |
Some utility methods to deal with site containers.
Definition at line 63 of file SiteContainerTools.h.
bpp::SiteContainerTools::SiteContainerTools () |
inline |
Definition at line 66 of file SiteContainerTools.h.
virtual bpp::SiteContainerTools::~SiteContainerTools () |
inline virtual |
Definition at line 67 of file SiteContainerTools.h.
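AlignedSequenceContainer * bpp::SiteContainerTools::alignNW (const Sequence &seq1, const Sequence &seq2, const AlphabetIndex2 &s, double gap) throw (AlphabetMismatchException) |
static |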
Align two sequences using the Needleman-Wunsch dynamic algorithm.
If the input sequences contain gaps, they will be ignored.
seq1 | The first sequence. |
seq2 | The second sequence. |
s | The score matrix to use. |
gap | Gap penalty. |
AlphabetMismatchException | If the sequences and the score matrix do not share the same alphabet. |
Definition at line 558 of file SiteContainerTools.cpp.
References bpp::AlignedSequenceContainer::addSequence(), and bpp::SequenceTools::removeGaps().
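The dynamic programming behind this overload can be sketched outside Bio++. The code below is a minimal illustration, not the library implementation: `alignNWSketch` and `nwScore` are hypothetical names, sequences are plain `std::string` objects assumed to contain no gaps, the score matrix is replaced by a fixed +1/-1 match/mismatch score, and the gap penalty is assumed to be added to the score (so it should be negative).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical substitution score: +1 for a match, -1 for a mismatch.
// The documented method reads these scores from an AlphabetIndex2 instead.
inline int nwScore(char a, char b) { return a == b ? 1 : -1; }

// Global alignment of two ungapped strings with a linear gap penalty.
// `gap` is added to the score for every gap column, so it should be negative.
std::pair<std::string, std::string> alignNWSketch(
    const std::string& a, const std::string& b, int gap)
{
  const std::size_t n = a.size(), m = b.size();
  // f[i][j] = best score for aligning a[0..i) with b[0..j).
  std::vector<std::vector<int>> f(n + 1, std::vector<int>(m + 1, 0));
  for (std::size_t i = 1; i <= n; ++i) f[i][0] = f[i - 1][0] + gap;
  for (std::size_t j = 1; j <= m; ++j) f[0][j] = f[0][j - 1] + gap;
  for (std::size_t i = 1; i <= n; ++i)
    for (std::size_t j = 1; j <= m; ++j)
      f[i][j] = std::max({f[i - 1][j - 1] + nwScore(a[i - 1], b[j - 1]),
                          f[i - 1][j] + gap,
                          f[i][j - 1] + gap});

  // Trace back from the bottom-right cell to rebuild one optimal alignment.
  std::string ra, rb;
  std::size_t i = n, j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 &&
        f[i][j] == f[i - 1][j - 1] + nwScore(a[i - 1], b[j - 1])) {
      ra += a[--i]; rb += b[--j];   // match or mismatch column
    } else if (i > 0 && f[i][j] == f[i - 1][j] + gap) {
      ra += a[--i]; rb += '-';      // gap in the second sequence
    } else {
      ra += '-'; rb += b[--j];      // gap in the first sequence
    }
  }
  std::reverse(ra.begin(), ra.end());
  std::reverse(rb.begin(), rb.end());
  return {ra, rb};
}
```

Under these assumptions, aligning "ACGT" with "AGT" and a gap penalty of -1 yields the rows "ACGT" and "A-GT".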
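AlignedSequenceContainer * bpp::SiteContainerTools::alignNW (const Sequence &seq1, const Sequence &seq2, const AlphabetIndex2 &s, double opening, double extending) throw (AlphabetMismatchException) |
static |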
Align two sequences using the Needleman-Wunsch dynamic algorithm.
If the input sequences contain gaps, they will be ignored.
seq1 | The first sequence. |
seq2 | The second sequence. |
s | The score matrix to use. |
opening | Gap opening penalty. |
extending | Gap extending penalty. |
AlphabetMismatchException | If the sequences and the score matrix do not share the same alphabet. |
Definition at line 658 of file SiteContainerTools.cpp.
References bpp::AlignedSequenceContainer::addSequence(), and bpp::SequenceTools::removeGaps().
VectorSiteContainer * bpp::SiteContainerTools::bootstrapSites (const SiteContainer &sites) |
static |
Bootstrap sites in an alignment.
Original site positions will be kept. The resulting container will hence probably have duplicated positions. You may wish to call the reindexSites() method on the returned container.
Note: This method will be optimal with a container with vertical storage like VectorSiteContainer.
sites | An input alignment to sample. |
Definition at line 811 of file SiteContainerTools.cpp.
References bpp::SiteContainer::getNumberOfSites().
void bpp::SiteContainerTools::changeGapsToUnknownCharacters (SiteContainer &sites) |
static |
Change all gaps to unknown state in a container, according to its alphabet.
For DNA alphabets, this changes all '-' to 'N'.
sites | The container to be modified. |
Definition at line 184 of file SiteContainerTools.cpp.
References bpp::SequenceContainer::getAlphabet(), bpp::OrderedSequenceContainer::getNumberOfSequences(), bpp::SiteContainer::getNumberOfSites(), bpp::Alphabet::getUnknownCharacterCode(), and bpp::Alphabet::isGap().
Referenced by bpp::SequenceApplicationTools::getSitesToAnalyse().
void bpp::SiteContainerTools::changeUnresolvedCharactersToGaps (SiteContainer &sites) |
static |
Change all unresolved characters to gaps in a container, according to its alphabet.
For DNA alphabets, this changes all 'N', 'M', 'R', etc. to '-'.
sites | The container to be modified. |
Definition at line 201 of file SiteContainerTools.cpp.
References bpp::SequenceContainer::getAlphabet(), bpp::Alphabet::getGapCharacterCode(), bpp::OrderedSequenceContainer::getNumberOfSequences(), bpp::SiteContainer::getNumberOfSites(), and bpp::Alphabet::isUnresolved().
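double bpp::SiteContainerTools::computeSimilarity (const Sequence &seq1, const Sequence &seq2, bool dist=false, const std::string &gapOption=SIMILARITY_NODOUBLEGAP, bool unresolvedAsGap=true) throw (SequenceNotAlignedException, AlphabetMismatchException, Exception) |
static |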
Compute the similarity/distance score between two aligned sequences.
The similarity measure is computed as the proportion of identical matches. The distance between the two sequences is defined as 1 - similarity. This function can be used with any type of alphabet.
seq1 | The first sequence. |
seq2 | The second sequence. |
dist | Shall we return a distance instead of similarity? |
gapOption | How to deal with gaps: one of SIMILARITY_ALL, SIMILARITY_NOFULLGAP, SIMILARITY_NODOUBLEGAP or SIMILARITY_NOGAP. |
unresolvedAsGap | Tell if unresolved characters must be considered as gaps when counting. If set to true, the gap option will also apply to unresolved characters. |
SequenceNotAlignedException | If the two sequences do not have the same length. |
AlphabetMismatchException | If the two sequences do not share the same alphabet type. |
Exception | If an invalid gapOption is passed. |
Definition at line 825 of file SiteContainerTools.cpp.
References bpp::Alphabet::getGapCharacterCode(), bpp::Alphabet::isGap(), and bpp::Alphabet::isUnresolved().
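The basic measure described above (proportion of identical positions, with distance defined as 1 - similarity) can be illustrated with a standalone sketch. It deliberately ignores the gapOption and unresolvedAsGap handling, whose exact rules are not described here: every column is counted, gaps included. `similaritySketch` is a hypothetical name, and `std::invalid_argument` stands in for the documented exceptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Proportion of identical positions between two aligned rows.
// Assumption: all columns are counted; no gap filtering is applied.
double similaritySketch(const std::string& s1, const std::string& s2,
                        bool dist = false)
{
  if (s1.size() != s2.size())
    throw std::invalid_argument("sequences are not aligned");
  if (s1.empty())
    throw std::invalid_argument("empty sequences");
  std::size_t same = 0;
  for (std::size_t i = 0; i < s1.size(); ++i)
    if (s1[i] == s2[i]) ++same;
  const double sim = static_cast<double>(same) / static_cast<double>(s1.size());
  return dist ? 1.0 - sim : sim;  // distance is defined as 1 - similarity
}
```

For "ACGT" and "ACGA", three of four positions are identical, so the sketch returns 0.75 as a similarity and 0.25 as a distance.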
DistanceMatrix * bpp::SiteContainerTools::computeSimilarityMatrix (const SiteContainer &sites, bool dist=false, const std::string &gapOption=SIMILARITY_NOFULLGAP, bool unresolvedAsGap=true) |
static |
Compute the similarity matrix of an alignment.
The similarity measure is computed as the proportion of identical matches. The distance between two sequences is defined as 1 - similarity. This function can be used with any type of alphabet. Several options concerning gaps and unresolved characters are available through the gapOption and unresolvedAsGap parameters.
sites | The input alignment. |
dist | Shall we return a distance instead of similarity? |
gapOption | How to deal with gaps. |
unresolvedAsGap | Tell if unresolved characters must be considered as gaps when counting. If set to true, the gap option will also apply to unresolved characters. |
Definition at line 880 of file SiteContainerTools.cpp.
References bpp::OrderedSequenceContainer::getNumberOfSequences(), bpp::OrderedSequenceContainer::getSequence(), and bpp::OrderedSequenceContainer::getSequencesNames().
std::map< size_t, size_t > bpp::SiteContainerTools::getAlignmentPositions (const Sequence &seq) |
static |
Get the index of each alignment position in an aligned sequence.
If the sequence contains no gap, the translated and the original positions are the same. Position numbers start at 1.
seq | The sequence to translate. |
Definition at line 444 of file SiteContainerTools.cpp.
References bpp::SymbolList::size().
std::vector< int > bpp::SiteContainerTools::getColumnScores (const Matrix< size_t > &positions1, const Matrix< size_t > &positions2, int na=0) |
static |
Compare an alignment to a reference alignment, and compute the column scores.
Calculations are made according to the formula for the "CS" score in Thompson et al. (1999), Nucleic Acids Research 27(13):2682–2690.
positions1 | Alignment index for the test alignment. |
positions2 | Alignment index for the reference alignment. |
na | The score to use if the tested column contains only gaps. |
Definition at line 989 of file SiteContainerTools.cpp.
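One way to read the column score is: a test column scores 1 when the reference alignment contains a column with exactly the same residues (same sequence positions, gaps included), and 0 otherwise. The sketch below follows that reading and is not taken from the library source. `PositionMatrix` and `columnScoresSketch` are hypothetical names; the matrices are indexed `[sequence][column]`, hold 1-based sequence positions with 0 for gaps, and are assumed to have the same number of rows.

```cpp
#include <cstddef>
#include <vector>

// [sequence][column]; entries are 1-based sequence positions, 0 for a gap.
using PositionMatrix = std::vector<std::vector<std::size_t>>;

std::vector<int> columnScoresSketch(const PositionMatrix& test,
                                    const PositionMatrix& ref, int na = 0)
{
  const std::size_t nSeq = test.size();
  const std::size_t nCol = nSeq ? test[0].size() : 0;
  const std::size_t nRefCol = ref.empty() ? 0 : ref[0].size();
  std::vector<int> scores(nCol, na);
  for (std::size_t c = 0; c < nCol; ++c) {
    // Use the first residue of the column as an anchor.
    std::size_t anchor = nSeq;
    for (std::size_t s = 0; s < nSeq; ++s)
      if (test[s][c] > 0) { anchor = s; break; }
    if (anchor == nSeq) continue;  // gap-only column keeps the 'na' score
    scores[c] = 0;
    // Find the reference column holding the same anchor residue, then
    // require every sequence to agree between the two columns.
    for (std::size_t r = 0; r < nRefCol; ++r) {
      if (ref[anchor][r] != test[anchor][c]) continue;
      bool same = true;
      for (std::size_t s = 0; s < nSeq; ++s)
        if (test[s][c] != ref[s][r]) { same = false; break; }
      scores[c] = same ? 1 : 0;
      break;
    }
  }
  return scores;
}
```

The position matrices are of the kind filled by the matrix overload of getSequencePositions.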
SiteContainer * bpp::SiteContainerTools::getCompleteSites (const SiteContainer &sites) |
static |
Retrieves complete sites from SiteContainer.
This function builds a new SiteContainer instance with only complete sites, i.e. sites with fully resolved states (no gaps, no unknown characters). The container passed as input is not modified; all sites are copied.
sites | The container to analyse. |
Definition at line 77 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::CompleteSiteContainerIterator::hasMoreSites(), bpp::CompleteSiteContainerIterator::nextSite(), and bpp::VectorSiteContainer::setSequencesNames().
Referenced by bpp::SequenceApplicationTools::getSitesToAnalyse().
Sequence * bpp::SiteContainerTools::getConsensus (const SiteContainer &sc, const std::string &name="consensus", bool ignoreGap=true, bool resolveUnknown=false) |
static |
Create the consensus sequence of the alignment.
In case of ambiguity (for instance an AATT site), one state will be chosen arbitrarily.
sc | A site container. |
name | The name of the sequence object that will be created. |
ignoreGap | Tell whether gaps must be ignored when counting. If true (gaps are not counted), only fully gapped sites will result in a gap in the consensus sequence. |
resolveUnknown | Tell whether unknown characters must be resolved. In a DNA sequence for instance, N will be counted as A=1/4, T=1/4, G=1/4 and C=1/4. Otherwise it will be counted as N=1. If this option is set to true, a consensus sequence will never contain an unknown character. |
Definition at line 142 of file SiteContainerTools.cpp.
References bpp::SequenceContainer::getAlphabet(), bpp::SymbolListTools::getFrequencies(), bpp::SimpleSiteContainerIterator::hasMoreSites(), and bpp::SimpleSiteContainerIterator::nextSite().
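The per-site majority rule and the effect of ignoreGap can be illustrated with plain strings. This is a sketch under stated assumptions, not the library code: `consensusSketch` is a hypothetical name, rows are assumed to have equal length, '-' is taken as the gap character, resolveUnknown is not modelled, and ties are broken by character order (the documentation only says the choice is arbitrary).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Majority-rule consensus over the columns of an alignment.
std::string consensusSketch(const std::vector<std::string>& aln,
                            bool ignoreGap = true)
{
  std::string cons;
  if (aln.empty()) return cons;
  for (std::size_t c = 0; c < aln[0].size(); ++c) {
    // Count the states observed in this column.
    std::map<char, std::size_t> counts;
    for (const std::string& row : aln)
      if (!(ignoreGap && row[c] == '-')) ++counts[row[c]];
    // Pick the most frequent state; a column with nothing counted
    // (only gaps, with ignoreGap set) yields a gap.
    char best = '-';
    std::size_t bestCount = 0;
    for (const auto& kv : counts)
      if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }
    cons += best;
  }
  return cons;
}
```

With the rows "A-", "A-" and "AC", the sketch returns "AC" when gaps are ignored and "A-" when they are counted.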
SiteContainer * bpp::SiteContainerTools::getSelectedPositions (const SiteContainer &sequences, const SiteSelection &selection) |
static |
Create a new container with a specified set of positions.
A new VectorSiteContainer is created with the specified positions. The destruction of the container is up to the user.
Positions are specified by their index, beginning at 0, and are converted to site positions given the word length of the alphabet.
No position verification is performed, based on the assumption that the container passed as an argument is a correct one. Redundant selection is not checked, so be careful with what you're doing!
sequences | The container from which sequences are to be taken. |
selection | The positions to retrieve. |
Definition at line 110 of file SiteContainerTools.cpp.
References bpp::SequenceContainer::getAlphabet(), and bpp::Alphabet::getStateCodingSize().
SiteContainer * bpp::SiteContainerTools::getSelectedSites (const SiteContainer &sequences, const SiteSelection &selection) |
static |
Create a new container with a specified set of sites.
A new VectorSiteContainer is created with the specified sites. The destruction of the container is up to the user. Sites are specified by their index, beginning at 0. No position verification is performed, based on the assumption that the container passed as an argument is a correct one. Redundant selection is not checked, so be careful with what you're doing!
sequences | The container from which sequences are to be taken. |
selection | The positions of all sites to retrieve. |
Definition at line 92 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SequenceContainer::getGeneralComments(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::SiteContainer::getSite(), bpp::AbstractSequenceContainer::setGeneralComments(), and bpp::VectorSiteContainer::setSequencesNames().
Referenced by bpp::SequenceApplicationTools::getSiteContainer().
std::map< size_t, size_t > bpp::SiteContainerTools::getSequencePositions (const Sequence &seq) |
static |
Get the index of each sequence position in an aligned sequence.
If the sequence contains no gap, the translated and the original positions are the same. Position numbers start at 1.
seq | The sequence to translate. |
Definition at line 425 of file SiteContainerTools.cpp.
References bpp::SymbolList::size().
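void bpp::SiteContainerTools::getSequencePositions (const SiteContainer &sites, Matrix< size_t > &positions) |
static |
Fill a numeric matrix with the size of the alignment, containing each sequence position.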
Positions start at 1, gaps have "position" 0.
sites | The input alignment. |
positions | A matrix object which is going to be resized and filled with the corresponding positions. |
Definition at line 969 of file SiteContainerTools.cpp.
References bpp::SequenceContainer::getAlphabet(), bpp::Alphabet::getGapCharacterCode(), bpp::OrderedSequenceContainer::getNumberOfSequences(), bpp::SiteContainer::getNumberOfSites(), and bpp::OrderedSequenceContainer::getSequence().
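The content of the resulting matrix can be illustrated with plain strings: each residue receives its 1-based index within its own ungapped sequence, and each gap receives 0. In this sketch `sequencePositionsSketch` is a hypothetical name, '-' is assumed to be the gap character, and a vector of vectors replaces the Matrix class.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// result[s][c] = 1-based position of the residue of sequence s at column c,
// or 0 if sequence s has a gap at that column.
std::vector<std::vector<std::size_t>> sequencePositionsSketch(
    const std::vector<std::string>& aln)
{
  std::vector<std::vector<std::size_t>> positions;
  positions.reserve(aln.size());
  for (const std::string& row : aln) {
    std::vector<std::size_t> line;
    line.reserve(row.size());
    std::size_t count = 0;  // residues seen so far in this sequence
    for (char ch : row)
      line.push_back(ch == '-' ? 0 : ++count);
    positions.push_back(line);
  }
  return positions;
}
```

For the rows "A-CG" and "-TT-", the sketch produces the lines (1, 0, 2, 3) and (0, 1, 2, 0).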
SiteContainer * bpp::SiteContainerTools::getSitesWithoutGaps (const SiteContainer &sites) |
static |
Retrieves sites without gaps from SiteContainer.
This function builds a new SiteContainer instance with only sites without gaps. The container passed as input is not modified; all sites are copied.
sites | The container to analyse. |
Definition at line 62 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::NoGapSiteContainerIterator::hasMoreSites(), bpp::NoGapSiteContainerIterator::nextSite(), and bpp::VectorSiteContainer::setSequencesNames().
Referenced by bpp::SequenceApplicationTools::getSitesToAnalyse().
std::vector< double > bpp::SiteContainerTools::getSumOfPairsScores (const Matrix< size_t > &positions1, const Matrix< size_t > &positions2, double na=0) |
static |
Compare an alignment to a reference alignment, and compute the sum-of-pairs scores.
Calculations are made according to the formula for the "SPS" score in Thompson et al. (1999), Nucleic Acids Research 27(13):2682–2690.
positions1 | Alignment index for the test alignment. |
positions2 | Alignment index for the reference alignment. |
na | The score to use if the tested column is not testable, that is, if it does not contain at least two residues. |
Definition at line 1034 of file SiteContainerTools.cpp.
void bpp::SiteContainerTools::merge (SiteContainer &seqCont1, const SiteContainer &seqCont2, bool leavePositionAsIs=false) throw (AlphabetMismatchException, Exception) |
static |
Add the content of a site container to an existing one.
The input containers are supposed to have unique sequence names. If it is not the case, several things can happen:
seqCont1 | First container. |
seqCont2 | Second container. This container must contain sequences with the same names as in seqCont1. Additional sequences will be ignored. |
leavePositionAsIs | Tell if site positions should be left unchanged. Otherwise (the default), the size of container 1 is added to the positions in container 2. |
AlphabetMismatchException | If the alphabets of the two containers do not match. |
Exception | If sequence names do not match. |
Definition at line 923 of file SiteContainerTools.cpp.
References bpp::SiteContainer::addSite(), bpp::SiteContainer::getNumberOfSites(), bpp::Site::getPosition(), bpp::SequenceContainerTools::getSelectedSequences(), and bpp::SiteContainer::getSite().
SiteContainer * bpp::SiteContainerTools::removeGapOnlySites (const SiteContainer &sites) |
static |
Get a site set without gap-only sites.
This function builds a new SiteContainer instance without sites with only gaps. The container passed as input is not modified; all sites are copied.
sites | The container to analyse. |
Definition at line 218 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SiteContainer::getNumberOfSites(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::SiteContainer::getSite(), bpp::SiteTools::isGapOnly(), and bpp::VectorSiteContainer::setSequencesNames().
void bpp::SiteContainerTools::removeGapOnlySites (SiteContainer &sites) |
static |
Remove gap-only sites from a site set.
sites | The container from which the sites have to be removed. |
Definition at line 234 of file SiteContainerTools.cpp.
References bpp::SiteContainer::deleteSite(), bpp::SiteContainer::deleteSites(), bpp::SiteContainer::getNumberOfSites(), bpp::SiteContainer::getSite(), and bpp::SiteTools::isGapOnly().
SiteContainer * bpp::SiteContainerTools::removeGapOrUnresolvedOnlySites (const SiteContainer &sites) |
static |
Get a site set without gap/unresolved-only sites.
This function builds a new SiteContainer instance without sites with only gaps or unresolved characters. The container passed as input is not modified; all sites are copied.
sites | The container to analyse. |
Definition at line 265 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SiteContainer::getNumberOfSites(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::SiteContainer::getSite(), bpp::SiteTools::isGapOrUnresolvedOnly(), and bpp::VectorSiteContainer::setSequencesNames().
void bpp::SiteContainerTools::removeGapOrUnresolvedOnlySites (SiteContainer &sites) |
static |
Remove gap/unresolved-only sites from a site set.
sites | The container from which the sites have to be removed. |
Definition at line 281 of file SiteContainerTools.cpp.
References bpp::SiteContainer::deleteSite(), bpp::SiteContainer::deleteSites(), bpp::SiteContainer::getNumberOfSites(), bpp::SiteContainer::getSite(), bpp::SiteTools::isGapOnly(), and bpp::SiteTools::isGapOrUnresolvedOnly().
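SiteContainer * bpp::SiteContainerTools::removeGapSites (const SiteContainer &sites, double maxFreqGaps) |
static |
Get a site set with sites with less than a given amount of gaps.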
sites | The container from which the sites have to be removed. |
maxFreqGaps | The maximum frequency of gaps in each site. |
Definition at line 312 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SymbolListTools::getFrequencies(), bpp::SiteContainer::getNumberOfSites(), bpp::OrderedSequenceContainer::getSequencesNames(), bpp::SiteContainer::getSite(), and bpp::VectorSiteContainer::setSequencesNames().
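The filtering rule can be illustrated on plain strings: compute the gap frequency of each column and keep the column only if that frequency does not exceed maxFreqGaps. This is a sketch, not the library code. `removeGapSitesSketch` is a hypothetical name, rows are assumed to have equal length with '-' as the gap character, and keeping columns whose frequency is exactly equal to the threshold is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return a copy of the alignment without the columns whose gap
// frequency is above maxFreqGaps. The input is left untouched.
std::vector<std::string> removeGapSitesSketch(
    const std::vector<std::string>& aln, double maxFreqGaps)
{
  std::vector<std::string> out(aln.size());
  if (aln.empty()) return out;
  for (std::size_t c = 0; c < aln[0].size(); ++c) {
    std::size_t gaps = 0;
    for (const std::string& row : aln)
      if (row[c] == '-') ++gaps;
    const double freq =
        static_cast<double>(gaps) / static_cast<double>(aln.size());
    if (freq <= maxFreqGaps)  // assumption: the boundary value is kept
      for (std::size_t r = 0; r < aln.size(); ++r)
        out[r] += aln[r][c];
  }
  return out;
}
```

A threshold of 0 keeps only gap-free columns, which is the behaviour described for getSitesWithoutGaps.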
void bpp::SiteContainerTools::removeGapSites (SiteContainer &sites, double maxFreqGaps) |
static |
Remove sites with a given amount of gaps.
sites | The container from which the sites have to be removed. |
maxFreqGaps | The maximum frequency of gaps in each site. |
Definition at line 328 of file SiteContainerTools.cpp.
References bpp::SiteContainer::deleteSite(), bpp::SymbolListTools::getFrequencies(), bpp::SiteContainer::getNumberOfSites(), and bpp::SiteContainer::getSite().
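SiteContainer * bpp::SiteContainerTools::removeStopCodonSites (const SiteContainer &sites, const GeneticCode &gCode) throw (AlphabetException) |
static |
Get a site set without stop codons, if the alphabet is a CodonAlphabet, otherwise throws an Exception.
This function builds a new SiteContainer instance without sites that have at least one stop codon. The container passed as input is not modified; all sites are copied.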
sites | The container to analyse. |
gCode | the genetic code to use to determine stop codons. |
Definition at line 340 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::CodonSiteTools::hasStop(), and bpp::VectorSiteContainer::setSequencesNames().
Referenced by bpp::SequenceApplicationTools::getSitesToAnalyse().
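SiteContainer * bpp::SiteContainerTools::resolveDottedAlignment (const SiteContainer &dottedAln, const Alphabet *resolvedAlphabet) throw (AlphabetException, Exception) |
static |
Resolve a container with "." notations.
Each "." in the input alignment is replaced by the character found at the same site in a "reference" sequence. The reference sequence need not be the first in the container. The alphabet of the input alignment must be an instance of the DefaultAlphabet class, the only one which supports dot characters. A new alignment is created and returned, with the specified alphabet.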
If several sequences that may be considered as reference are found, the first one is used.
dottedAln | The input alignment. |
resolvedAlphabet | The alphabet of the output alignment. |
AlphabetException | If the alphabet of the input alignment is not of class DefaultAlphabet, or if one character does not match with the output alphabet. |
Exception | If no reference sequence was found, or if the input alignment contains no sequence. |
Definition at line 359 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::SymbolList::getChar(), bpp::BasicSymbolList::getChar(), bpp::Site::getPosition(), bpp::AlphabetTools::isDefaultAlphabet(), bpp::VectorSiteContainer::setSequencesNames(), and bpp::SymbolList::size().
Referenced by bpp::NexusIOSequence::appendAlignmentFromStream().
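The resolution step can be illustrated on plain strings. The sketch below is not the library code and rests on one assumption not stated above: the reference is taken to be the first sequence that contains no dot. `resolveDotsSketch` is a hypothetical name, rows are assumed to have equal length, alphabet checking is omitted, and `std::invalid_argument` stands in for the documented exceptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Replace every '.' by the reference character at the same site.
std::vector<std::string> resolveDotsSketch(
    const std::vector<std::string>& dotted)
{
  if (dotted.empty())
    throw std::invalid_argument("input alignment contains no sequence");
  // Assumption: the reference is the first sequence without any dot.
  const std::string* ref = nullptr;
  for (const std::string& s : dotted)
    if (s.find('.') == std::string::npos) { ref = &s; break; }
  if (ref == nullptr)
    throw std::invalid_argument("no reference sequence found");
  std::vector<std::string> resolved = dotted;
  for (std::string& s : resolved)
    for (std::size_t i = 0; i < s.size(); ++i)
      if (s[i] == '.') s[i] = (*ref)[i];
  return resolved;
}
```

With the illustrative input rows "ATGCCGTTGG", ".C...A..C." and "..A....C..", the sketch returns "ATGCCGTTGG", "ACGCCATTCG" and "ATACCGTCGG".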
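VectorSiteContainer * bpp::SiteContainerTools::sampleSites (const SiteContainer &sites, size_t nbSites, std::vector< size_t > *index=0) |
static |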
Sample sites in an alignment.
Original site positions will be kept. The resulting container will hence probably have duplicated positions. You may wish to call the reindexSites() method on the returned container.
Note: This method will be optimal with a container with vertical storage like VectorSiteContainer.
sites | An input alignment to sample. |
nbSites | The size of the resulting container. |
index | [out] If non-null the underlying vector will be appended with the original site indices. |
Definition at line 796 of file SiteContainerTools.cpp.
References bpp::VectorSiteContainer::addSite(), bpp::SequenceContainer::getAlphabet(), bpp::SiteContainer::getNumberOfSites(), bpp::OrderedSequenceContainer::getSequencesNames(), and bpp::SiteContainer::getSite().
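Site sampling can be sketched on plain strings as drawing column indices uniformly and copying the corresponding columns. The code below is an illustration only: `sampleSitesSketch` is a hypothetical name, the random engine is passed explicitly, and sampling with replacement is an assumption (it is consistent with the remark on duplicated positions). Under the same assumption, a bootstrap replicate is the case where nbSites equals the number of sites of the input.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Draw nbSites columns uniformly, with replacement, from an alignment.
// If `index` is non-null, the drawn column indices are appended to it.
std::vector<std::string> sampleSitesSketch(
    const std::vector<std::string>& aln, std::size_t nbSites,
    std::mt19937& rng, std::vector<std::size_t>* index = nullptr)
{
  std::vector<std::string> out(aln.size());
  if (aln.empty() || aln[0].empty()) return out;
  std::uniform_int_distribution<std::size_t> pick(0, aln[0].size() - 1);
  for (std::size_t k = 0; k < nbSites; ++k) {
    const std::size_t c = pick(rng);
    for (std::size_t r = 0; r < aln.size(); ++r)
      out[r] += aln[r][c];  // copy the whole column
    if (index != nullptr) index->push_back(c);
  }
  return out;
}
```

Passing a seeded `std::mt19937` makes a run reproducible, and the returned indices allow the sampled columns to be traced back to the original alignment.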
std::map< size_t, size_t > bpp::SiteContainerTools::translateAlignment (const Sequence &seq1, const Sequence &seq2) throw (AlphabetMismatchException, Exception) |
static |
Translate alignment positions from an aligned sequence to the same sequence in a different alignment.
Takes each position (starting at 1) in sequence 1, and looks for the corresponding position in sequence 2. The two sequences must be the same, except for the gaps. If neither sequence contains gaps, or if the gaps are at the same place in both sequences, the translated positions will be the same as the original positions.
seq1 | The sequence to translate. |
seq2 | The reference sequence. |
AlphabetMismatchException | If the sequences do not share the same alphabet. |
Exception | If the sequences do not match. |
Definition at line 463 of file SiteContainerTools.cpp.
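The translation can be sketched on plain strings by walking both rows in parallel and pairing the k-th residue of one with the k-th residue of the other. This is an illustration, not the library code: `translateAlignmentSketch` is a hypothetical name, '-' is assumed to be the gap character, only residue columns of the first row are mapped (how gap columns are treated by the real method is not stated here), and `std::invalid_argument` stands in for the documented exception.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Map 1-based alignment positions of seq1 to 1-based alignment positions
// of seq2, where both rows hold the same residues with different gaps.
std::map<std::size_t, std::size_t> translateAlignmentSketch(
    const std::string& seq1, const std::string& seq2)
{
  std::map<std::size_t, std::size_t> tln;
  std::size_t j = 0;  // cursor in seq2
  for (std::size_t i = 0; i < seq1.size(); ++i) {
    if (seq1[i] == '-') continue;
    while (j < seq2.size() && seq2[j] == '-') ++j;  // skip gaps in seq2
    if (j == seq2.size() || seq2[j] != seq1[i])
      throw std::invalid_argument("sequences do not match");
    tln[i + 1] = j + 1;
    ++j;
  }
  // Any residue left in seq2 means the two rows differ.
  while (j < seq2.size() && seq2[j] == '-') ++j;
  if (j != seq2.size())
    throw std::invalid_argument("sequences do not match");
  return tln;
}
```

For "A-CG" and "AC--G", the sketch maps position 1 to 1, position 3 to 2 and position 4 to 5.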
std::map< size_t, size_t > bpp::SiteContainerTools::translateSequence (const SiteContainer &sequences, size_t i1, size_t i2) |
static |
Translate sequence positions from a sequence to another in the same alignment.
Takes each position (starting at 1) in sequence 1, and looks for the corresponding position in sequence 2 at the same site. If no corresponding position is available (i.e. if there is a gap in sequence 2 at the corresponding position), 0 is returned.
sequences | The alignment to use. |
i1 | The index of the sequence to translate. |
i2 | The index of the reference sequence. |
Definition at line 531 of file SiteContainerTools.cpp.
References bpp::SiteContainer::getNumberOfSites(), and bpp::OrderedSequenceContainer::getSequence().
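The mapping can be sketched on plain strings by counting residues in both rows column by column. This is an illustration rather than the library code: `translateSequenceSketch` is a hypothetical name, the alignment is a vector of equal-length strings, and '-' is assumed to be the gap character.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Map each 1-based residue position of row i1 to the residue position of
// row i2 at the same site, or to 0 when row i2 has a gap there.
std::map<std::size_t, std::size_t> translateSequenceSketch(
    const std::vector<std::string>& aln, std::size_t i1, std::size_t i2)
{
  const std::string& s1 = aln.at(i1);
  const std::string& s2 = aln.at(i2);
  std::map<std::size_t, std::size_t> tln;
  std::size_t p1 = 0, p2 = 0;  // residues seen so far in each row
  for (std::size_t c = 0; c < s1.size(); ++c) {
    const bool res1 = s1[c] != '-';
    const bool res2 = s2.at(c) != '-';
    if (res1) ++p1;
    if (res2) ++p2;
    if (res1) tln[p1] = res2 ? p2 : 0;
  }
  return tln;
}
```

For the rows "AC-GT" and "A-TG-", positions 1 and 3 of the first row map to positions 1 and 3 of the second, while positions 2 and 4 map to 0.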
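const std::string bpp::SiteContainerTools::SIMILARITY_ALL = "all sites" |
static |
Definition at line 436 of file SiteContainerTools.h.
const std::string bpp::SiteContainerTools::SIMILARITY_NODOUBLEGAP = "no double gap" |
static |
Definition at line 438 of file SiteContainerTools.h.
const std::string bpp::SiteContainerTools::SIMILARITY_NOFULLGAP = "no full gap" |
static |
Definition at line 437 of file SiteContainerTools.h.
const std::string bpp::SiteContainerTools::SIMILARITY_NOGAP = "no gap" |
static |
Definition at line 439 of file SiteContainerTools.h.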