bpp-seq
2.2.0
Utility functions for codon sites.
#include <Bpp/Seq/CodonSiteTools.h>
Public Member Functions
CodonSiteTools ()
virtual ~CodonSiteTools ()
Static Public Member Functions
static bool hasGapOrStop (const Site &site, const GeneticCode &gCode) throw (AlphabetException)
    Check whether a codon site contains gaps or stop codons.
static bool hasStop (const Site &site, const GeneticCode &gCode) throw (AlphabetException)
    Check whether a codon site contains a stop codon.
static bool isMonoSitePolymorphic (const Site &site) throw (AlphabetException, EmptySiteException)
    Check whether a polymorphic codon site is polymorphic at only one of its three positions.
static bool isSynonymousPolymorphic (const Site &site, const GeneticCode &gCode) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)
    Check whether the polymorphism at a codon site is synonymous.
static Site * generateCodonSiteWithoutRareVariant (const Site &site, const GeneticCode &gCode, double freqmin) throw (AlphabetException, EmptySiteException)
    Generate a codon site without rare variants.
static size_t numberOfDifferences (int i, int j, const CodonAlphabet &ca)
    Compute the number of differences between two codons.
static double numberOfSynonymousDifferences (int i, int j, const GeneticCode &gCode, bool minchange=false)
    Compute the number of synonymous differences between two codons.
static double piSynonymous (const Site &site, const GeneticCode &gCode, bool minchange=false) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)
    Compute the synonymous pi per codon site.
static double piNonSynonymous (const Site &site, const GeneticCode &gCode, bool minchange=false) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)
    Compute the non-synonymous pi per codon site.
static double numberOfSynonymousPositions (int i, const GeneticCode &gCode, double ratio=1.0) throw (Exception)
    Return the number of synonymous positions of a codon.
static double meanNumberOfSynonymousPositions (const Site &site, const GeneticCode &gCode, double ratio=1) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)
    Return the mean number of synonymous positions per codon site.
static size_t numberOfSubsitutions (const Site &site, const GeneticCode &gCode, double freqmin=0.) throw (AlphabetException, EmptySiteException)
    Return the number of substitutions per codon site.
static size_t numberOfNonSynonymousSubstitutions (const Site &site, const GeneticCode &gCode, double freqmin=0.) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)
    Return the number of non-synonymous substitutions per codon site.
static std::vector< size_t > fixedDifferences (const Site &siteIn, const Site &siteOut, int i, int j, const GeneticCode &gCode) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)
    Return a vector with the number of fixed synonymous and non-synonymous differences per codon site.
static bool isFourFoldDegenerated (const Site &site, const GeneticCode &gCode)
Static Public Member Functions inherited from bpp::SymbolListTools
static void getCounts (const SymbolList &list, std::map< int, size_t > &counts)
    Count all states in the list.
static void getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, size_t > > &counts) throw (DimensionException)
    Count all pairs of states for two lists of the same size.
static void getCounts (const SymbolList &list, std::map< int, double > &counts, bool resolveUnknowns)
    Count all states in the list, optionally resolving unknown characters.
static void getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &counts, bool resolveUnknowns) throw (DimensionException)
    Count all pairs of states for two lists of the same size, optionally resolving unknown characters.
static void getFrequencies (const SymbolList &list, std::map< int, double > &frequencies, bool resolveUnknowns=false)
    Get the frequencies of all states in the list.
static void getFrequencies (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &frequencies, bool resolveUnknowns=false) throw (DimensionException)
    Get the frequencies of all state pairs for two lists of the same size.
static double getGCContent (const SymbolList &list, bool ignoreUnresolved=true, bool ignoreGap=true) throw (AlphabetException)
    Get the GC content of a symbol list.
static size_t getNumberOfDistinctPositions (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)
    Get the number of distinct positions.
static size_t getNumberOfPositionsWithoutGap (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)
    Get the number of positions without gap.
static void changeGapsToUnknownCharacters (SymbolList &l)
    Change all gap elements to unknown characters.
static void changeUnresolvedCharactersToGaps (SymbolList &l)
    Change all unknown characters to gap elements.
Utility functions for codon sites.
Definition at line 60 of file CodonSiteTools.h.
CodonSiteTools ()   [inline]
Definition at line 64 of file CodonSiteTools.h.
virtual ~CodonSiteTools ()   [inline, virtual]
Definition at line 65 of file CodonSiteTools.h.
static void changeGapsToUnknownCharacters (SymbolList &l)   [static, inherited]
Change all gap elements to unknown characters.
Parameters:
  l  The input list of characters.
Definition at line 180 of file SymbolListTools.cpp.
References bpp::SymbolList::getAlphabet(), bpp::Alphabet::getUnknownCharacterCode(), bpp::Alphabet::isGap(), and bpp::SymbolList::size().
static void changeUnresolvedCharactersToGaps (SymbolList &l)   [static, inherited]
Change all unknown characters to gap elements.
Parameters:
  l  The input list of characters.
Definition at line 189 of file SymbolListTools.cpp.
References bpp::SymbolList::getAlphabet(), bpp::Alphabet::getGapCharacterCode(), bpp::Alphabet::isUnresolved(), and bpp::SymbolList::size().
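The two gap-conversion functions above can be illustrated with a minimal C++ sketch on plain DNA strings. The conventions are assumptions for illustration only: '-' stands for the gap, 'N' for the unknown character, and any character other than A, C, G, T or '-' is treated as unresolved (mirroring the isUnresolved() reference). The real methods work on SymbolList integer codes through the alphabet.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch conventions (not Bio++'s integer codes): '-' is the gap,
// 'N' the unknown character, anything but A, C, G, T or '-' is unresolved.
constexpr char kGap = '-';
constexpr char kUnknown = 'N';

bool isResolvedBase(char c) {
  return c == 'A' || c == 'C' || c == 'G' || c == 'T';
}

// Replace every gap with the unknown character, in place.
void changeGapsToUnknownCharacters(std::string& seq) {
  for (char& c : seq)
    if (c == kGap) c = kUnknown;
}

// Replace every unresolved character (ambiguity codes and N) with a gap, in place.
void changeUnresolvedCharactersToGaps(std::string& seq) {
  for (char& c : seq)
    if (c != kGap && !isResolvedBase(c)) c = kGap;
}
```

For example, "AC-GN" becomes "ACNGN" with the first function, and "ARC-N" becomes "A-C--" with the second.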
static std::vector< size_t > fixedDifferences (const Site &siteIn, const Site &siteOut, int i, int j, const GeneticCode &gCode) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)   [static]
Return a vector with the number of fixed synonymous and non-synonymous differences per codon site.
Compute the number of synonymous and non-synonymous differences between the consensus codon of siteIn (i) and that of siteOut (j), which are fixed within each alignment. Example:
Here, the first position is a fixed non-synonymous difference, and the third position is a synonymous difference that is not fixed (it is polymorphic in siteIn). The returned vector is thus [0,1]. For complex codons (codons that differ at more than one position), the path that gives the minimum number of non-synonymous changes is chosen: this method calls numberOfSynonymousDifferences with minchange=true, since otherwise a non-integer number could be returned.
Rare variants (<= freqmin) can be excluded.
Parameters:
  siteIn   a Site
  siteOut  a Site
  i        an integer (the consensus codon of siteIn)
  j        an integer (the consensus codon of siteOut)
  gCode    a GeneticCode
Exceptions:
  AlphabetException          If the alphabet associated with one of the sites is not a codon alphabet.
  AlphabetMismatchException  If the codon alphabet of either site does not match the codon alphabet of the genetic code.
  EmptySiteException         If one of the sites has size 0.
Definition at line 682 of file CodonSiteTools.cpp.
References bpp::CodonAlphabet::getFirstPosition(), bpp::CodonAlphabet::getNucleicAlphabet(), bpp::CodonAlphabet::getSecondPosition(), bpp::CodonAlphabet::getThirdPosition(), bpp::AlphabetTools::isCodonAlphabet(), and bpp::SiteTools::isConstant().
static Site * generateCodonSiteWithoutRareVariant (const Site &site, const GeneticCode &gCode, double freqmin) throw (AlphabetException, EmptySiteException)   [static]
Generate a codon site without rare variants.
Rare variants are replaced by the most frequent allele. This method is used to exclude rare variants in some analyses, such as the McDonald-Kreitman test (McDonald & Kreitman, 1991, Nature 351, pp. 652-654). For an application, see for example Fay et al. (2001, Genetics 158, pp. 1227-1234).
Parameters:
  site     a Site
  gCode    The genetic code according to which stop codons are specified.
  freqmin  a double; alleles with a frequency strictly lower than freqmin are replaced.
Exceptions:
  AlphabetException   If the alphabet associated with the site is not a codon alphabet.
  EmptySiteException  If the site has size 0.
Definition at line 158 of file CodonSiteTools.cpp.
References bpp::CodonAlphabet::getFirstPosition(), bpp::SymbolListTools::getFrequencies(), bpp::CodonAlphabet::getNucleicAlphabet(), bpp::CodonAlphabet::getSecondPosition(), bpp::CodonAlphabet::getThirdPosition(), bpp::BasicSymbolList::getValue(), bpp::AlphabetTools::isCodonAlphabet(), and bpp::SiteTools::isConstant().
Referenced by numberOfSubsitutions().
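The replacement of rare variants can be sketched as follows, under stated assumptions. A site is modelled as a vector of resolved three-letter codons, and alleles are counted separately at each codon position; the per-position treatment is only suggested by the getFirstPosition()/getSecondPosition()/getThirdPosition() references and is an assumption here. The function name withoutRareVariants is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Sketch: alleles are counted per codon position (an assumption), and any
// nucleotide whose frequency is strictly lower than freqmin is replaced by the
// most frequent nucleotide at that position.
std::vector<std::string> withoutRareVariants(std::vector<std::string> site, double freqmin) {
  if (site.empty()) return site;  // the real method throws EmptySiteException
  const double n = static_cast<double>(site.size());
  for (std::size_t pos = 0; pos < 3; ++pos) {
    std::map<char, std::size_t> counts;
    for (const auto& codon : site) ++counts[codon[pos]];
    // Most frequent nucleotide here (ties go to the first in A, C, G, T order).
    char major = counts.begin()->first;
    std::size_t majorCount = 0;
    for (const auto& [base, count] : counts)
      if (count > majorCount) { major = base; majorCount = count; }
    // Replace alleles whose frequency is strictly lower than freqmin.
    for (auto& codon : site)
      if (static_cast<double>(counts[codon[pos]]) / n < freqmin) codon[pos] = major;
  }
  return site;
}
```

With the site ATT, ATT, ATT, ATC and freqmin = 0.3, the C at the third position (frequency 0.25) is replaced by T; with freqmin = 0.2 the site is left unchanged.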
static void getCounts (const SymbolList &list, std::map< int, size_t > &counts)   [inline, static, inherited]
Count all states in the list.
Parameters:
  list    The list.
  counts  The output map to store the counts (existing counts will be incremented).
Definition at line 70 of file SymbolListTools.h.
References bpp::SymbolList::getContent().
Referenced by bpp::SiteTools::getNumberOfDistinctCharacters(), bpp::SequenceApplicationTools::getSitesToAnalyse(), bpp::SiteTools::isParsimonyInformativeSite(), and numberOfNonSynonymousSubstitutions().
static void getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, size_t > > &counts) throw (DimensionException)   [inline, static, inherited]
Count all pairs of states for two lists of the same size.
NB: The two lists do not need to share the same alphabet! The states of the first list are used as the first index in the output, and those of the second list as the second index.
Parameters:
  list1   The first list.
  list2   The second list.
  counts  The output map to store the counts (existing counts will be incremented).
Definition at line 90 of file SymbolListTools.h.
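The two integer-count overloads of getCounts can be sketched on plain vectors of state codes. This is an illustration of the documented contract (existing counts are incremented; the first list supplies the outer key in the pair version), not the library code, which takes SymbolList objects.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Single-list version: counts already present in the map are incremented.
void getCounts(const std::vector<int>& list, std::map<int, std::size_t>& counts) {
  for (int state : list) ++counts[state];
}

// Pair version: the first list's state is the outer key, the second list's the inner key.
void getCounts(const std::vector<int>& list1, const std::vector<int>& list2,
               std::map<int, std::map<int, std::size_t>>& counts) {
  assert(list1.size() == list2.size());  // the real method throws DimensionException
  for (std::size_t i = 0; i < list1.size(); ++i) ++counts[list1[i]][list2[i]];
}
```

Calling the single-list version twice on the same map accumulates the counts, which is why the documentation stresses that existing counts are incremented rather than reset.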
static void getCounts (const SymbolList &list, std::map< int, double > &counts, bool resolveUnknowns)   [static, inherited]
Count all states in the list, optionally resolving unknown characters.
For instance, in DNA, N will be counted as A=1/4, T=1/4, C=1/4, G=1/4.
Parameters:
  list             The list.
  counts           The output map to store the counts (existing counts will be incremented).
  resolveUnknowns  Tells whether unknown characters must be resolved.
Definition at line 51 of file SymbolListTools.cpp.
References bpp::Alphabet::getAlias(), bpp::SymbolList::getAlphabet(), and bpp::SymbolList::getContent().
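Resolving unknown characters amounts to splitting one count evenly over all the states a symbol may stand for. The sketch below assumes DNA characters and a hypothetical, partial alias table standing in for Alphabet::getAlias(); only N, R and Y are handled as ambiguous.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical partial alias table (the real method queries Alphabet::getAlias()).
// Resolved bases map to themselves.
std::string aliases(char c) {
  switch (c) {
    case 'N': return "ACGT";
    case 'R': return "AG";
    case 'Y': return "CT";
    default:  return std::string(1, c);
  }
}

// Counts states; with resolveUnknowns, a state with k possible resolutions adds
// 1/k to each of them instead of being counted as itself.
void getCounts(const std::string& list, std::map<char, double>& counts, bool resolveUnknowns) {
  for (char c : list) {
    if (!resolveUnknowns) {
      counts[c] += 1.0;
      continue;
    }
    const std::string alias = aliases(c);
    for (char a : alias) counts[a] += 1.0 / static_cast<double>(alias.size());
  }
}
```

For "AN" with resolution enabled, A receives 1 + 1/4 = 1.25 and each of C, G, T receives 0.25; without resolution, N is simply counted once.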
static void getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &counts, bool resolveUnknowns) throw (DimensionException)   [static, inherited]
Count all pairs of states for two lists of the same size, optionally resolving unknown characters.
For instance, in DNA, N will be counted as A=1/4, T=1/4, C=1/4, G=1/4.
NB: The two lists do not need to share the same alphabet! The states of the first list are used as the first index in the output, and those of the second list as the second index.
Parameters:
  list1            The first list.
  list2            The second list.
  counts           The output map to store the counts (existing counts will be incremented).
  resolveUnknowns  Tells whether unknown characters must be resolved.
Definition at line 73 of file SymbolListTools.cpp.
static void getFrequencies (const SymbolList &list, std::map< int, double > &frequencies, bool resolveUnknowns=false)   [static, inherited]
Get the frequencies of all states in the list.
Parameters:
  list             The list.
  frequencies      The output map with all states and corresponding frequencies. Existing frequencies, if any, will be erased.
  resolveUnknowns  Tells whether unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4, T=1/4, C=1/4, G=1/4.
Definition at line 96 of file SymbolListTools.cpp.
References bpp::SymbolList::size().
Referenced by generateCodonSiteWithoutRareVariant(), bpp::SiteContainerTools::getConsensus(), bpp::SequenceApplicationTools::getSitesToAnalyse(), meanNumberOfSynonymousPositions(), piNonSynonymous(), piSynonymous(), and bpp::SiteContainerTools::removeGapSites().
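Unlike getCounts, getFrequencies clears its output before filling it. A minimal sketch on a vector of state codes (resolution of unknown characters omitted), counting first and dividing once to avoid accumulating rounding error:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Sketch: relative frequency of each state code. The output map is cleared
// first, matching "existing frequencies will be erased".
void getFrequencies(const std::vector<int>& list, std::map<int, double>& frequencies) {
  frequencies.clear();
  if (list.empty()) return;  // this sketch returns an empty map rather than dividing by 0
  std::map<int, std::size_t> counts;
  for (int state : list) ++counts[state];
  const double n = static_cast<double>(list.size());
  for (const auto& [state, count] : counts)
    frequencies[state] = static_cast<double>(count) / n;
}
```

The behaviour on an empty list is this sketch's own choice; the documentation does not specify it.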
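static void getFrequencies (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &frequencies, bool resolveUnknowns=false) throw (DimensionException)   [static, inherited]
Get the frequencies of all state pairs for two lists of the same size.
Parameters:
  list1            The first list.
  list2            The second list.
  frequencies      The output map with all state pairs and corresponding frequencies. Existing frequencies, if any, will be erased.
  resolveUnknowns  Tells whether unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4, T=1/4, C=1/4, G=1/4.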
Definition at line 107 of file SymbolListTools.cpp.
static double getGCContent (const SymbolList &list, bool ignoreUnresolved=true, bool ignoreGap=true) throw (AlphabetException)   [static, inherited]
Get the GC content of a symbol list.
Parameters:
  list              The list.
  ignoreUnresolved  Do not count unresolved states. Otherwise, weight each ambiguous state by its probability of being G or C (e.g. the R state counts for 0.5).
  ignoreGap         Do not count gaps in the total.
Exceptions:
  AlphabetException  If the list is not made of nucleotide states.
Definition at line 119 of file SymbolListTools.cpp.
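The interplay of the two flags can be sketched for DNA strings. Assumptions: '-' is the gap; ambiguity codes follow the standard IUPAC meanings, weighted by the fraction of their resolutions that are G or C (consistent with the documented R = 0.5 example); an empty total returns 0. This is not the library code.

```cpp
#include <cassert>
#include <string>

// Probability that a (possibly ambiguous) IUPAC nucleotide code stands for G or C.
double gcWeight(char c) {
  switch (c) {
    case 'G': case 'C': case 'S': return 1.0;
    case 'R': case 'Y': case 'K': case 'M': case 'N': return 0.5;
    case 'B': case 'V': return 2.0 / 3.0;
    case 'D': case 'H': return 1.0 / 3.0;
    default: return 0.0;  // A, T, W and anything unexpected
  }
}

bool isResolved(char c) { return c == 'A' || c == 'C' || c == 'G' || c == 'T'; }

// Sketch for DNA strings with '-' as the gap character.
double getGCContent(const std::string& seq, bool ignoreUnresolved = true, bool ignoreGap = true) {
  double gc = 0.0, total = 0.0;
  for (char c : seq) {
    if (c == '-') {
      if (!ignoreGap) total += 1.0;  // a gap enlarges the total but never adds GC
      continue;
    }
    if (ignoreUnresolved && !isResolved(c)) continue;
    gc += gcWeight(c);
    total += 1.0;
  }
  return total > 0.0 ? gc / total : 0.0;
}
```

For "GCR", the result is 1.0 with the defaults (R is skipped) and 2.5/3 when ignoreUnresolved is false; for "GC--" with ignoreGap = false, the gaps halve the content to 0.5.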
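static size_t getNumberOfDistinctPositions (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)   [static, inherited]
Get the number of distinct positions, that is, positions at which the two lists differ.
The comparison is performed from position 0 up to the minimum size of the two lists.
Parameters:
  l1  SymbolList 1.
  l2  SymbolList 2.
Exceptions:
  AlphabetMismatchException  If the two lists do not have the same alphabet type.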
Definition at line 158 of file SymbolListTools.cpp.
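static size_t getNumberOfPositionsWithoutGap (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)   [static, inherited]
Get the number of positions without gap.
The comparison is performed from position 0 up to the minimum size of the two lists.
Parameters:
  l1  SymbolList 1.
  l2  SymbolList 2.
Exceptions:
  AlphabetMismatchException  If the two lists do not have the same alphabet type.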
Definition at line 169 of file SymbolListTools.cpp.
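Both pairwise comparisons stop at the shorter list. A minimal sketch on DNA strings, assuming '-' as the gap and reading "without gap" as "neither list has a gap at that position" (an interpretation, not confirmed by the text above):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Both functions compare positions 0 .. min(size1, size2) - 1, as documented.
std::size_t numberOfDistinctPositions(const std::string& l1, const std::string& l2) {
  const std::size_t n = std::min(l1.size(), l2.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (l1[i] != l2[i]) ++count;
  return count;
}

// Assumption: a position counts when neither list carries a gap ('-') there.
std::size_t numberOfPositionsWithoutGap(const std::string& l1, const std::string& l2) {
  const std::size_t n = std::min(l1.size(), l2.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (l1[i] != '-' && l2[i] != '-') ++count;
  return count;
}
```

Comparing "ACGT" with "AGGTT" yields one distinct position; the trailing T of the longer list is never examined.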
static bool hasGapOrStop (const Site &site, const GeneticCode &gCode) throw (AlphabetException)   [static]
Check whether a codon site contains gaps or stop codons.
Parameters:
  site   a Site
  gCode  The genetic code according to which stop codons are specified.
Exceptions:
  AlphabetException  If the alphabet associated with the site is not a codon alphabet.
Definition at line 60 of file CodonSiteTools.cpp.
References bpp::AlphabetTools::isCodonAlphabet().
static bool hasStop (const Site &site, const GeneticCode &gCode) throw (AlphabetException)   [static]
Check whether a codon site contains a stop codon.
Parameters:
  site   a Site
  gCode  The genetic code according to which stop codons are specified.
Exceptions:
  AlphabetException  If the alphabet associated with the site is not a codon alphabet.
Definition at line 75 of file CodonSiteTools.cpp.
References bpp::AlphabetTools::isCodonAlphabet().
Referenced by bpp::SiteContainerTools::removeStopCodonSites().
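hasStop and hasGapOrStop reduce to a scan over the codons of the site. The sketch below hard-codes the stop codons of the standard genetic code (NCBI table 1), whereas the real methods take them from the GeneticCode argument; writing a gap codon as "---" is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stop codons of the standard genetic code only; other codes differ, which is
// why the real methods take a GeneticCode argument.
bool isStop(const std::string& codon) {
  return codon == "TAA" || codon == "TAG" || codon == "TGA";
}

bool hasStop(const std::vector<std::string>& site) {
  return std::any_of(site.begin(), site.end(), isStop);
}

// Assumption for this sketch: a gap codon is written "---".
bool hasGapOrStop(const std::vector<std::string>& site) {
  return std::any_of(site.begin(), site.end(),
                     [](const std::string& c) { return c == "---" || isStop(c); });
}
```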
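static bool isFourFoldDegenerated (const Site &site, const GeneticCode &gCode)   [static]
Parameters:
  site   The site to analyze.
  gCode  The genetic code to use.
If the polymorphism at the site is non-synonymous (see isSynonymousPolymorphic()), the method returns false; otherwise, the result relies on GeneticCode::isFourFoldDegenerated() applied to the codons of the site.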
Definition at line 807 of file CodonSiteTools.cpp.
References bpp::BasicSymbolList::getValue(), bpp::SiteTools::isConstant(), bpp::GeneticCode::isFourFoldDegenerated(), isSynonymousPolymorphic(), and bpp::BasicSymbolList::size().
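static bool isMonoSitePolymorphic (const Site &site) throw (AlphabetException, EmptySiteException)   [static]
Check whether a polymorphic codon site is polymorphic at only one of its three positions.
Parameters:
  site  a Site
Exceptions:
  AlphabetException   If the alphabet associated with the site is not a codon alphabet.
  EmptySiteException  If the site has size 0.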
Definition at line 90 of file CodonSiteTools.cpp.
References bpp::CodonAlphabet::getFirstPosition(), bpp::CodonAlphabet::getNucleicAlphabet(), bpp::CodonAlphabet::getSecondPosition(), bpp::CodonAlphabet::getThirdPosition(), bpp::AlphabetTools::isCodonAlphabet(), and bpp::SiteTools::isConstant().
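The check can be pictured as splitting the codons into their three nucleotide positions (as the getFirstPosition()/getSecondPosition()/getThirdPosition() references suggest) and counting the variable ones. In this sketch a constant site returns false; how the real method treats constant sites is not stated above, so that is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Sketch: true when exactly one of the three codon positions carries more than
// one nucleotide.
bool isMonoSitePolymorphic(const std::vector<std::string>& site) {
  assert(!site.empty());  // the real method throws EmptySiteException
  int variablePositions = 0;
  for (std::size_t pos = 0; pos < 3; ++pos) {
    std::set<char> bases;
    for (const auto& codon : site) bases.insert(codon[pos]);
    if (bases.size() > 1) ++variablePositions;
  }
  return variablePositions == 1;
}
```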
static bool isSynonymousPolymorphic (const Site &site, const GeneticCode &gCode) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)   [static]
Check whether the polymorphism at a codon site is synonymous.
Parameters:
  site   a Site
  gCode  a GeneticCode
Exceptions:
  AlphabetException          If the alphabet associated with the site is not a codon alphabet.
  AlphabetMismatchException  If the codon alphabet of the site does not match the codon alphabet of the genetic code.
  EmptySiteException         If the site has size 0.
Definition at line 128 of file CodonSiteTools.cpp.
References bpp::AlphabetTools::isCodonAlphabet(), and bpp::SiteTools::isConstant().
Referenced by isFourFoldDegenerated().
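A polymorphism is synonymous when every codon at the site translates to the same amino acid. The sketch below hard-codes the standard genetic code (NCBI table 1, bases ordered T, C, A, G) in place of a GeneticCode object and accepts only resolved codons.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Standard genetic code (NCBI table 1); bases indexed T=0, C=1, A=2, G=3; '*' = stop.
char translate(const std::string& codon) {
  static const std::string kTable =
      "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
  assert(codon.size() == 3);
  auto idx = [](char b) {
    switch (b) {
      case 'T': return 0;
      case 'C': return 1;
      case 'A': return 2;
      case 'G': return 3;
    }
    return -1;
  };
  const int i = idx(codon[0]), j = idx(codon[1]), k = idx(codon[2]);
  assert(i >= 0 && j >= 0 && k >= 0);  // resolved codons only
  return kTable[16 * i + 4 * j + k];
}

// Sketch: the polymorphism is synonymous when all codons encode the same amino acid.
bool isSynonymousPolymorphic(const std::vector<std::string>& site) {
  assert(!site.empty());  // the real method throws EmptySiteException
  const char aa = translate(site.front());
  for (const auto& codon : site)
    if (translate(codon) != aa) return false;
  return true;
}
```

For instance, CTT, CTC and TTA all encode leucine, so that site counts as synonymous, while ATT (Ile) versus CTT (Leu) does not.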
static double meanNumberOfSynonymousPositions (const Site &site, const GeneticCode &gCode, double ratio=1) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)   [static]
Return the mean number of synonymous positions per codon site.
A codon position is considered x% synonymous if x% of the possible mutations at that position are synonymous. The transition/transversion ratio can be taken into account through the ratio parameter. The mean is computed over the VectorSite.
Unresolved and stop codons are counted as 0.
Parameters:
  site   a Site
  gCode  a GeneticCode
  ratio  a double, the transition/transversion ratio (default 1)
Exceptions:
  AlphabetException          If the alphabet associated with the site is not a codon alphabet.
  AlphabetMismatchException  If the codon alphabet of the site does not match the codon alphabet of the genetic code.
  EmptySiteException         If the site has size 0.
Definition at line 563 of file CodonSiteTools.cpp.
References bpp::SymbolListTools::getFrequencies(), and bpp::AlphabetTools::isCodonAlphabet().
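static size_t numberOfDifferences (int i, int j, const CodonAlphabet &ca)   [static]
Compute the number of differences between two codons.
Parameters:
  i   an int (the first codon)
  j   an int (the second codon)
  ca  a CodonAlphabet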
Definition at line 220 of file CodonSiteTools.cpp.
References bpp::CodonAlphabet::getFirstPosition(), bpp::CodonAlphabet::getSecondPosition(), and bpp::CodonAlphabet::getThirdPosition().
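The number of differences is the count of codon positions, out of three, at which the nucleotides differ. A sketch on codon strings instead of Bio++ integer codon codes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch: compares the first, second and third positions of two codons.
std::size_t numberOfDifferences(const std::string& c1, const std::string& c2) {
  assert(c1.size() == 3 && c2.size() == 3);
  std::size_t diff = 0;
  for (std::size_t p = 0; p < 3; ++p)
    if (c1[p] != c2[p]) ++diff;
  return diff;
}
```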
static size_t numberOfNonSynonymousSubstitutions (const Site &site, const GeneticCode &gCode, double freqmin=0.) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)   [static]
Return the number of non-synonymous substitutions per codon site.
It is assumed that the path linking two amino acids involves only one substitution per step.
Rare variants (<= freqmin) can be excluded. For complex codons, the path that gives the minimum number of non-synonymous changes is chosen: this method calls numberOfSynonymousDifferences with minchange=true, since otherwise a non-integer number could be returned.
Parameters:
  site     a Site
  gCode    a GeneticCode
  freqmin  a double; SNPs with a frequency strictly lower than freqmin are excluded (default 0).
Exceptions:
  AlphabetException          If the alphabet associated with the site is not a codon alphabet.
  AlphabetMismatchException  If the codon alphabet of the site does not match the codon alphabet of the genetic code.
  EmptySiteException         If the site has size 0.
Definition at line 633 of file CodonSiteTools.cpp.
References bpp::SymbolListTools::getCounts(), bpp::SiteTools::hasGap(), bpp::AlphabetTools::isCodonAlphabet(), and bpp::SiteTools::isConstant().
static size_t numberOfSubsitutions (const Site &site, const GeneticCode &gCode, double freqmin=0.) throw (AlphabetException, EmptySiteException)   [static]
Return the number of substitutions per codon site.
No recombination is assumed; that is, for complex codons, homoplasy is assumed. Example:
Here, 3 substitutions are counted. Assuming that the last codon (AGC) is a recombinant between ATC and AGT would have led to counting only 2 substitutions.
Rare variants (<= freqmin) can be excluded.
Parameters:
  site     a Site
  gCode    a GeneticCode
  freqmin  a double; SNPs with a frequency strictly lower than freqmin are excluded (default 0).
Exceptions:
  AlphabetException   If the alphabet associated with the site is not a codon alphabet.
  EmptySiteException  If the site has size 0.
Definition at line 588 of file CodonSiteTools.cpp.
References generateCodonSiteWithoutRareVariant(), bpp::CodonAlphabet::getFirstPosition(), bpp::CodonAlphabet::getNucleicAlphabet(), bpp::SiteTools::getNumberOfDistinctCharacters(), bpp::CodonAlphabet::getSecondPosition(), bpp::CodonAlphabet::getThirdPosition(), bpp::BasicSymbolList::getValue(), bpp::SiteTools::hasGap(), bpp::AlphabetTools::isCodonAlphabet(), bpp::SiteTools::isConstant(), and bpp::BasicSymbolList::size().
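One way to count substitutions without recombination is to take the larger of two lower bounds: each distinct codon beyond the first needs at least one mutation, and each extra nucleotide state at a codon position needs its own mutation. That this is exactly the Bio++ formula is an assumption, although it is consistent with the getNumberOfDistinctCharacters() and per-position references listed above. The name countSubstitutions is hypothetical, and rare-variant filtering and gap handling are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Sketch: max(distinct codons - 1, sum over positions of distinct bases - 1).
std::size_t countSubstitutions(const std::vector<std::string>& site) {
  assert(!site.empty());
  // Every distinct codon beyond the first requires at least one mutation...
  const std::size_t sCodon = std::set<std::string>(site.begin(), site.end()).size() - 1;
  // ...and every extra nucleotide state at a position requires its own mutation.
  std::size_t sBase = 0;
  for (std::size_t pos = 0; pos < 3; ++pos) {
    std::set<char> bases;
    for (const auto& codon : site) bases.insert(codon[pos]);
    sBase += bases.size() - 1;
  }
  return std::max(sCodon, sBase);
}
```

For the codons ATT, ATC, AGT and AGC, the sketch returns 3 because of the four distinct codons, although only two extra nucleotide states occur: the homoplasy assumption forces one position to change twice. Without AGC, both bounds equal 2.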
static double numberOfSynonymousDifferences (int i, int j, const GeneticCode &gCode, bool minchange=false)   [static]
Compute the number of synonymous differences between two codons.
For complex codons: if minchange = false (the default), the different paths are weighted equally; if minchange = true, the path with the minimum number of non-synonymous changes is chosen. Paths that include stop codons are excluded.
Parameters:
  i          an int (the first codon)
  j          an int (the second codon)
  gCode      a GeneticCode
  minchange  a boolean, false by default
Definition at line 234 of file CodonSiteTools.cpp.
References bpp::GeneticCode::areSynonymous(), bpp::CodonAlphabet::getCodon(), bpp::WordAlphabet::getPositions(), bpp::GeneticCode::getSourceAlphabet(), and bpp::GeneticCode::isStop().
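The path logic can be sketched by enumerating every order in which the differing positions change, one change per step. Assumptions of this sketch: the standard genetic code is hard-coded, only intermediate codons are checked for stops, and 0 is returned when every path is excluded. It is an illustration of the documented rules, not the library implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Standard genetic code (NCBI table 1); bases indexed T=0, C=1, A=2, G=3; '*' = stop.
char translate(const std::string& codon) {
  static const std::string kTable =
      "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
  assert(codon.size() == 3);
  auto idx = [](char b) {
    switch (b) {
      case 'T': return 0;
      case 'C': return 1;
      case 'A': return 2;
      case 'G': return 3;
    }
    return -1;
  };
  const int i = idx(codon[0]), j = idx(codon[1]), k = idx(codon[2]);
  assert(i >= 0 && j >= 0 && k >= 0);  // resolved codons only
  return kTable[16 * i + 4 * j + k];
}

double numberOfSynonymousDifferences(const std::string& from, const std::string& to,
                                     bool minchange = false) {
  std::vector<int> positions;  // positions where the codons differ, in increasing order
  for (int p = 0; p < 3; ++p)
    if (from[p] != to[p]) positions.push_back(p);
  const int k = static_cast<int>(positions.size());
  if (k == 0) return 0.0;

  double synSum = 0.0;
  int validPaths = 0;
  int minNonSyn = k + 1;
  // positions starts sorted, so next_permutation visits all k! change orders.
  do {
    std::string current = from;
    int syn = 0;
    bool valid = true;
    for (int step = 0; step < k; ++step) {
      std::string next = current;
      next[positions[step]] = to[positions[step]];
      // Paths passing through an intermediate stop codon are excluded.
      if (step < k - 1 && translate(next) == '*') { valid = false; break; }
      if (translate(next) == translate(current)) ++syn;
      current = next;
    }
    if (!valid) continue;  // in a do-while, continue still evaluates the condition
    ++validPaths;
    synSum += syn;
    minNonSyn = std::min(minNonSyn, k - syn);
  } while (std::next_permutation(positions.begin(), positions.end()));

  if (validPaths == 0) return 0.0;  // this sketch's choice when every path hits a stop
  // minchange: the path with fewest non-synonymous steps has k - minNonSyn synonymous ones.
  return minchange ? static_cast<double>(k - minNonSyn) : synSum / validPaths;
}
```

For AGA to CGT (both Arg), one order is fully synonymous (via CGA) and the other fully non-synonymous (via AGT, Ser), so the sketch returns 1.0 by default and 2.0 with minchange = true. For TGG to AGA, the order through the stop codon TGA is discarded, leaving 1.0.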
static double numberOfSynonymousPositions (int i, const GeneticCode &gCode, double ratio=1.0) throw (Exception)   [static]
Return the number of synonymous positions of a codon.
A codon position is considered x% synonymous if x% of the possible mutations at that position are synonymous. The transition/transversion ratio can be taken into account through the ratio parameter.
Unresolved codons and stop codons return 0.
Parameters:
  i      an int (the codon)
  gCode  a GeneticCode
  ratio  a double, the transition/transversion ratio (default 1)
Definition at line 522 of file CodonSiteTools.cpp.
References bpp::CodonAlphabet::getCodon(), bpp::WordAlphabet::getPositions(), and bpp::WordAlphabet::isUnresolved().
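The "x% synonymous" rule can be sketched by scoring the nine single-nucleotide mutants of a codon. The weighting is an assumption of this sketch: a transition counts ratio/(ratio+2) and a transversion 1/(ratio+2), so the three mutants at one position always sum to 1, and mutants that are stop codons contribute nothing. The standard genetic code is hard-coded in place of a GeneticCode object.

```cpp
#include <cassert>
#include <string>

// Standard genetic code (NCBI table 1); bases indexed T=0, C=1, A=2, G=3; '*' = stop.
char translate(const std::string& codon) {
  static const std::string kTable =
      "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
  assert(codon.size() == 3);
  auto idx = [](char b) {
    switch (b) {
      case 'T': return 0;
      case 'C': return 1;
      case 'A': return 2;
      case 'G': return 3;
    }
    return -1;
  };
  const int i = idx(codon[0]), j = idx(codon[1]), k = idx(codon[2]);
  assert(i >= 0 && j >= 0 && k >= 0);  // resolved codons only
  return kTable[16 * i + 4 * j + k];
}

bool isPurine(char b) { return b == 'A' || b == 'G'; }

// Sums the weights of the synonymous single-nucleotide mutants of a codon.
double numberOfSynonymousPositions(const std::string& codon, double ratio = 1.0) {
  const char aa = translate(codon);
  if (aa == '*') return 0.0;  // stop codons count as 0, as documented
  double synPositions = 0.0;
  for (int pos = 0; pos < 3; ++pos) {
    for (char base : std::string("TCAG")) {
      if (base == codon[pos]) continue;
      std::string mutant = codon;
      mutant[pos] = base;
      if (translate(mutant) != aa) continue;  // non-synonymous or stop mutant
      const bool transition = isPurine(base) == isPurine(codon[pos]);
      synPositions += (transition ? ratio : 1.0) / (ratio + 2.0);
    }
  }
  return synPositions;
}
```

With ratio = 2, CTA (Leu) scores 1.5: its third position is fully synonymous (1.0), and the C to T transition at the first position, which gives TTA (also Leu), adds 0.5.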
static double piNonSynonymous (const Site &site, const GeneticCode &gCode, bool minchange=false) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)   [static]
Compute the non-synonymous pi per codon site.
The following formula is used:
$$\pi = \frac{n}{n-1} \sum_{i,j} x_i x_j P_{ij}$$
where n is the number of sequences, $x_i$ and $x_j$ are the frequencies of each codon type occurring at the site, and $P_{ij}$ is the number of non-synonymous differences between these codons. Be careful: here, pi is not normalized by the number of non-synonymous sites. If minchange = false (the default), the different paths are weighted equally; if minchange = true, the path with the minimum number of non-synonymous changes is chosen.
Parameters:
  site       a Site
  gCode      a GeneticCode
  minchange  a boolean, false by default
Exceptions:
  AlphabetException          If the alphabet associated with the site is not a codon alphabet.
  AlphabetMismatchException  If the codon alphabet of the site does not match the codon alphabet of the genetic code.
  EmptySiteException         If the site has size 0.
Definition at line 485 of file CodonSiteTools.cpp.
References bpp::SymbolListTools::getFrequencies(), bpp::AlphabetTools::isCodonAlphabet(), and bpp::SiteTools::isConstant().
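The pi formula can be sketched independently of the difference measure: compute codon frequencies at the site, sum x_i x_j P_ij over all ordered codon pairs, and apply the n/(n-1) factor. The callback `differences` is a hypothetical stand-in for P_ij (for example a count of non-synonymous or synonymous differences); the summation over all ordered pairs (i, j) follows the formula as written above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Sketch of pi = n/(n-1) * sum_{i,j} x_i x_j P_ij over all ordered codon pairs.
double piPerSite(const std::vector<std::string>& site,
                 const std::function<double(const std::string&, const std::string&)>& differences) {
  const double n = static_cast<double>(site.size());
  assert(n > 1.0);  // the n/(n-1) factor needs at least two sequences
  std::map<std::string, double> freq;  // x_i: frequency of each codon type
  for (const auto& codon : site) freq[codon] += 1.0;
  for (auto& entry : freq) entry.second /= n;
  double pi = 0.0;
  for (const auto& [ci, xi] : freq)
    for (const auto& [cj, xj] : freq)
      pi += xi * xj * differences(ci, cj);
  return pi * n / (n - 1.0);
}
```

With a plain count of differing positions as the callback, the site ATT, ATT, ATC, ATC gives 2 × (0.5 × 0.5 × 1) × 4/3 = 2/3. Passing a synonymous-difference count instead would give a value analogous to piSynonymous.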
static double piSynonymous (const Site &site, const GeneticCode &gCode, bool minchange=false) throw (AlphabetException, AlphabetMismatchException, EmptySiteException)   [static]
Compute the synonymous pi per codon site.
The following formula is used:
$$\pi = \frac{n}{n-1} \sum_{i,j} x_i x_j P_{ij}$$
where n is the number of sequences, $x_i$ and $x_j$ are the frequencies of each codon type occurring at the site, and $P_{ij}$ is the number of synonymous differences between these codons. Be careful: here, pi is not normalized by the number of synonymous sites.
If minchange = false (the default), the different paths are weighted equally; if minchange = true, the path with the minimum number of non-synonymous changes is chosen.
Parameters:
  site       a Site
  gCode      a GeneticCode
  minchange  a boolean, false by default
Exceptions:
  AlphabetException          If the alphabet associated with the site is not a codon alphabet.
  AlphabetMismatchException  If the codon alphabet of the site does not match the codon alphabet of the genetic code.
  EmptySiteException         If the site has size 0.
Definition at line 453 of file CodonSiteTools.cpp.
References bpp::SymbolListTools::getFrequencies(), bpp::AlphabetTools::isCodonAlphabet(), and bpp::SiteTools::isConstant().