bpp-seq  2.2.0
bpp::SiteTools Class Reference

Utility methods for working with sites.

#include <Bpp/Seq/SiteTools.h>


Public Member Functions

 SiteTools ()
 
virtual ~SiteTools ()
 

Static Public Member Functions

static bool hasGap (const Site &site)
 
static bool isGapOnly (const Site &site)
 
static bool isGapOrUnresolvedOnly (const Site &site)
 
static bool hasUnknown (const Site &site)
 
static bool isComplete (const Site &site)
 
static bool isConstant (const Site &site, bool ignoreUnknown=false, bool unresolvedRaisesException=true) throw (EmptySiteException)
 Tell if a site is constant, that is, if it displays the same state in all sequences that do not present a gap.
 
static bool areSitesIdentical (const Site &site1, const Site &site2)
 
static double variabilityShannon (const Site &site, bool resolveUnknowns) throw (EmptySiteException)
 Compute the Shannon entropy index of a site.
 
static double variabilityFactorial (const Site &site) throw (EmptySiteException)
 Compute the factorial diversity index of a site.
 
static double mutualInformation (const Site &site1, const Site &site2, bool resolveUnknowns) throw (DimensionException,EmptySiteException)
 Compute the mutual information between two sites.
 
static double entropy (const Site &site, bool resolveUnknowns) throw (EmptySiteException)
 Compute the entropy of a site. This is an alias of the method variabilityShannon.
 
static double jointEntropy (const Site &site1, const Site &site2, bool resolveUnknowns) throw (DimensionException,EmptySiteException)
 Compute the joint entropy between two sites.
 
static double heterozygosity (const Site &site) throw (EmptySiteException)
 Compute the heterozygosity index of a site.
 
static size_t getNumberOfDistinctCharacters (const Site &site) throw (EmptySiteException)
 Give the number of distinct characters at a site.
 
static bool hasSingleton (const Site &site) throw (EmptySiteException)
 Tell if a site has singletons.
 
static bool isParsimonyInformativeSite (const Site &site) throw (EmptySiteException)
 Tell if a site is a parsimony informative site.
 
static bool isTriplet (const Site &site) throw (EmptySiteException)
 Tell if a site has more than 2 distinct characters.
 
static void getCounts (const SymbolList &list, std::map< int, size_t > &counts)
 Count all states in the list.
 
static void getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, size_t > > &counts) throw (DimensionException)
 Count all pairs of states for two lists of the same size.
 
static void getCounts (const SymbolList &list, std::map< int, double > &counts, bool resolveUnknowns)
 Count all states in the list, optionally resolving unknown characters.
 
static void getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &counts, bool resolveUnknowns) throw (DimensionException)
 Count all pairs of states for two lists of the same size, optionally resolving unknown characters.
 
static void getFrequencies (const SymbolList &list, std::map< int, double > &frequencies, bool resolveUnknowns=false)
 Get the frequencies of all states in the list.
 
static void getFrequencies (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &frequencies, bool resolveUnknowns=false) throw (DimensionException)
 Get the frequencies of all state pairs for two lists of the same size.
 
static double getGCContent (const SymbolList &list, bool ignoreUnresolved=true, bool ignoreGap=true) throw (AlphabetException)
 Get the GC content of a symbol list.
 
static size_t getNumberOfDistinctPositions (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)
 Get the number of distinct positions.
 
static size_t getNumberOfPositionsWithoutGap (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)
 Get the number of positions without gap.
 
static void changeGapsToUnknownCharacters (SymbolList &l)
 Change all gap elements to unknown characters.
 
static void changeUnresolvedCharactersToGaps (SymbolList &l)
 Change all unresolved characters to gap elements.
 

Detailed Description

Utility methods for working with sites.

Definition at line 57 of file SiteTools.h.

Constructor & Destructor Documentation

◆ SiteTools()

bpp::SiteTools::SiteTools ( )
inline

Definition at line 61 of file SiteTools.h.

◆ ~SiteTools()

virtual bpp::SiteTools::~SiteTools ( )
inline, virtual

Definition at line 62 of file SiteTools.h.

Member Function Documentation

◆ areSitesIdentical()

bool SiteTools::areSitesIdentical (const Site &site1, const Site &site2)
static
Parameters
site1: The first site.
site2: The second site.
Returns
True if the two sites have the same content (and, of course, the same alphabet).

Definition at line 121 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::Alphabet::getAlphabetType(), and bpp::BasicSymbolList::size().

◆ changeGapsToUnknownCharacters()

void SymbolListTools::changeGapsToUnknownCharacters (SymbolList &l)
static, inherited

Change all gap elements to unknown characters.

Parameters
l: The input list of characters.

Definition at line 180 of file SymbolListTools.cpp.

References bpp::SymbolList::getAlphabet(), bpp::Alphabet::getUnknownCharacterCode(), bpp::Alphabet::isGap(), and bpp::SymbolList::size().
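
As a rough illustration of the replacement described above, the following standalone sketch works on a plain vector of integer state codes instead of a SymbolList. The function name gapsToUnknown and the codes -1 (gap) and 14 (unknown) are assumptions made for this example; the library obtains the actual codes from the alphabet.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

const int kGap = -1;      // hypothetical gap code, for illustration only
const int kUnknown = 14;  // hypothetical unknown-character code

// Replace every gap code in the list with the unknown-character code, in place.
void gapsToUnknown(std::vector<int>& states)
{
  std::replace(states.begin(), states.end(), kGap, kUnknown);
}
```

The list is modified in place, which mirrors the non-const SymbolList reference in the signature above.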

◆ changeUnresolvedCharactersToGaps()

void SymbolListTools::changeUnresolvedCharactersToGaps (SymbolList &l)
static, inherited

Change all unresolved characters to gap elements.

Parameters
l: The input list of characters.

Definition at line 189 of file SymbolListTools.cpp.

References bpp::SymbolList::getAlphabet(), bpp::Alphabet::getGapCharacterCode(), bpp::Alphabet::isUnresolved(), and bpp::SymbolList::size().

◆ entropy()

static double bpp::SiteTools::entropy (const Site &site, bool resolveUnknowns) throw (EmptySiteException)
inline, static

Compute the entropy of a site. This is an alias of the method variabilityShannon.

\[ I = - \sum_x f_x\cdot \ln(f_x) \]

where $f_x$ is the frequency of state $x$.

Author
J. Dutheil
Parameters
site: A site.
resolveUnknowns: Tell if unknown characters must be resolved.
Returns
The Shannon entropy index of this site.
Exceptions
EmptySiteException: If the site has size 0.

Definition at line 178 of file SiteTools.h.

References variabilityShannon().
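
As a worked illustration of the formula above (the site itself is made up for the example), consider a site with four sequences showing A, A, C and G, so that $f_A = 1/2$ and $f_C = f_G = 1/4$:

\[ I = -\left(\tfrac{1}{2}\ln\tfrac{1}{2} + 2\cdot\tfrac{1}{4}\ln\tfrac{1}{4}\right) = \tfrac{3}{2}\ln 2 \approx 1.04 \]

A constant site has a single state with $f_x = 1$ and therefore $I = 0$.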

◆ getCounts() [1/4]

static void bpp::SymbolListTools::getCounts (const SymbolList &list, std::map< int, size_t > &counts)
inline, static, inherited

Count all states in the list.

Author
J. Dutheil
Parameters
list: The list.
counts: The output map to store the counts (existing counts will be incremented).

Definition at line 70 of file SymbolListTools.h.

References bpp::SymbolList::getContent().

Referenced by getNumberOfDistinctCharacters(), bpp::SequenceApplicationTools::getSitesToAnalyse(), isParsimonyInformativeSite(), and bpp::CodonSiteTools::numberOfNonSynonymousSubstitutions().
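
The accumulating behaviour of the output map can be sketched without the library. The function below is a standalone illustration over a plain vector of integer state codes; the name countStates is hypothetical and the code is not the library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Tally integer state codes into an output map.
// Existing entries are incremented rather than reset.
void countStates(const std::vector<int>& states, std::map<int, std::size_t>& counts)
{
  for (int s : states)
    ++counts[s];  // operator[] value-initializes a missing key to 0
}
```

Calling the function twice with the same map adds the second tally to the first, so callers that want fresh counts should pass an empty map.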

◆ getCounts() [2/4]

static void bpp::SymbolListTools::getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, size_t > > &counts) throw (DimensionException)
inline, static, inherited

Count all pairs of states for two lists of the same size.

NB: The two lists do not need to share the same alphabet! The states of the first list will be used as the first index in the output, and the ones from the second list as the second index.

Author
J. Dutheil
Parameters
list1: The first list.
list2: The second list.
counts: The output map to store the counts (existing counts will be incremented).

Definition at line 90 of file SymbolListTools.h.
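
The nested output map can be pictured as a contingency table indexed first by the state in the first list and then by the state in the second. A minimal standalone sketch, assuming plain vectors of integer codes and using std::invalid_argument as a stand-in for DimensionException (the name countPairs is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// counts[x][y] is the number of positions showing state x in the first
// list and state y in the second. Existing entries are incremented.
void countPairs(const std::vector<int>& list1, const std::vector<int>& list2,
                std::map<int, std::map<int, std::size_t>>& counts)
{
  if (list1.size() != list2.size())
    throw std::invalid_argument("countPairs: lists must have the same size");
  for (std::size_t i = 0; i < list1.size(); ++i)
    ++counts[list1[i]][list2[i]];
}
```

Because the two indices are kept separate, the codes of the two lists never need to be comparable, which is consistent with the note that the alphabets may differ.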

◆ getCounts() [3/4]

void SymbolListTools::getCounts (const SymbolList &list, std::map< int, double > &counts, bool resolveUnknowns)
static, inherited

Count all states in the list, optionally resolving unknown characters.

For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.

Author
J. Dutheil
Parameters
list: The list.
counts: The output map to store the counts (existing counts will be incremented).
resolveUnknowns: Tell if unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.
Returns
Nothing directly; the states and their corresponding counts are stored in counts.

Definition at line 51 of file SymbolListTools.cpp.

References bpp::Alphabet::getAlias(), bpp::SymbolList::getAlphabet(), and bpp::SymbolList::getContent().
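
The fractional counting of N can be sketched as follows. This is a standalone illustration, not the library code: the encoding (A=0, C=1, G=2, T=3, N=14) and the name countStatesResolving are assumptions, and only the fully unknown state N is handled, whereas the library resolves ambiguous states through the alphabet.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical encoding for this sketch only: A=0, C=1, G=2, T=3, N=14.
const int kUnknown = 14;

// Count states as doubles. With resolveUnknowns, each N is split evenly
// over the four resolved states (1/4 each); otherwise N is tallied as is.
void countStatesResolving(const std::vector<int>& states,
                          std::map<int, double>& counts, bool resolveUnknowns)
{
  for (int s : states) {
    if (resolveUnknowns && s == kUnknown) {
      for (int r = 0; r < 4; ++r)
        counts[r] += 0.25;
    } else {
      counts[s] += 1.0;
    }
  }
}
```

With resolution enabled the counts still sum to the list length, which is why the output map holds double values rather than integers.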

◆ getCounts() [4/4]

void SymbolListTools::getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &counts, bool resolveUnknowns) throw (DimensionException)
static, inherited

Count all pairs of states for two lists of the same size, optionally resolving unknown characters.

For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.

NB: The two lists do not need to share the same alphabet! The states of the first list will be used as the first index in the output, and the ones from the second list as the second index.

Author
J. Dutheil
Parameters
list1: The first list.
list2: The second list.
counts: The output map to store the counts (existing counts will be incremented).
resolveUnknowns: Tell if unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.
Returns
Nothing directly; the state pairs and their corresponding counts are stored in counts.

Definition at line 73 of file SymbolListTools.cpp.

◆ getFrequencies() [1/2]

void SymbolListTools::getFrequencies (const SymbolList &list, std::map< int, double > &frequencies, bool resolveUnknowns=false)
static, inherited

Get the frequencies of all states in the list.

Author
J. Dutheil
Parameters
list: The list.
frequencies: The output map with all states and corresponding frequencies. Existing frequencies will be erased if any.
resolveUnknowns: Tell if unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.

Definition at line 96 of file SymbolListTools.cpp.

References bpp::SymbolList::size().

Referenced by bpp::CodonSiteTools::generateCodonSiteWithoutRareVariant(), bpp::SiteContainerTools::getConsensus(), bpp::SequenceApplicationTools::getSitesToAnalyse(), bpp::CodonSiteTools::meanNumberOfSynonymousPositions(), bpp::CodonSiteTools::piNonSynonymous(), bpp::CodonSiteTools::piSynonymous(), and bpp::SiteContainerTools::removeGapSites().
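
Unlike the counting methods, this method overwrites its output map. A minimal standalone sketch of that contract, assuming a plain vector of integer codes and no resolution of unknown characters (the name stateFrequencies and the handling of an empty list are choices made for the example):

```cpp
#include <cassert>
#include <map>
#include <vector>

// Relative frequency of each state code in a list.
// The output map is cleared first, so stale entries do not survive.
void stateFrequencies(const std::vector<int>& states, std::map<int, double>& frequencies)
{
  frequencies.clear();
  if (states.empty())
    return;  // sketch choice: an empty list yields an empty map
  const double n = static_cast<double>(states.size());
  for (int s : states)
    frequencies[s] += 1.0;        // first pass: raw counts
  for (auto& kv : frequencies)
    kv.second /= n;               // second pass: normalize by list size
}
```

Counting first and dividing once avoids accumulating rounding error from repeatedly adding 1/n.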

◆ getFrequencies() [2/2]

void SymbolListTools::getFrequencies (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &frequencies, bool resolveUnknowns=false) throw (DimensionException)
static, inherited

Get the frequencies of all state pairs for two lists of the same size.

Author
J. Dutheil
Parameters
list1: The first list.
list2: The second list.
frequencies: The output map with all state pairs and corresponding frequencies. Existing frequencies will be erased if any.
resolveUnknowns: Tell if unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.

Definition at line 107 of file SymbolListTools.cpp.

◆ getGCContent()

double SymbolListTools::getGCContent (const SymbolList &list, bool ignoreUnresolved=true, bool ignoreGap=true) throw (AlphabetException)
static, inherited

Get the GC content of a symbol list.

Parameters
list: The list.
ignoreUnresolved: Do not count unresolved states. Otherwise, weight each state by its probability in case of ambiguity (e.g. the R state counts for 0.5).
ignoreGap: Do not count gaps in the total.
Returns
The proportion of G and C states in the list.
Exceptions
AlphabetException: If the list is not made of nucleotide states.

Definition at line 119 of file SymbolListTools.cpp.
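
The interplay of the two flags can be sketched on a plain character string. This is a standalone illustration under several assumptions: the name gcContent is hypothetical, only the ambiguity codes R, Y, K, M and N are weighted (each is half G/C), ignored unresolved states are left out of both numerator and total, other symbols are skipped instead of raising AlphabetException, and a list with no countable position yields 0.

```cpp
#include <cassert>
#include <string>

// Proportion of G and C among the counted positions of a nucleotide string.
double gcContent(const std::string& seq, bool ignoreUnresolved = true, bool ignoreGap = true)
{
  double gc = 0.0, total = 0.0;
  for (char c : seq) {
    if (c == '-') {
      if (!ignoreGap) total += 1.0;  // gaps enlarge the total only on request
    } else if (c == 'G' || c == 'C') {
      gc += 1.0; total += 1.0;
    } else if (c == 'A' || c == 'T') {
      total += 1.0;
    } else if (!ignoreUnresolved &&
               (c == 'R' || c == 'Y' || c == 'K' || c == 'M' || c == 'N')) {
      gc += 0.5; total += 1.0;       // e.g. R is A or G, so it counts for 0.5
    }
  }
  return total > 0.0 ? gc / total : 0.0;
}
```

With the defaults, "GC--" gives 1; passing ignoreGap=false brings the gaps into the denominator and the value drops to 0.5.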

◆ getNumberOfDistinctCharacters()

size_t SiteTools::getNumberOfDistinctCharacters (const Site &site) throw (EmptySiteException)
static

Give the number of distinct characters at a site.

Parameters
site: A site.
Returns
The number of distinct characters in the given site.

Definition at line 333 of file SiteTools.cpp.

References bpp::SymbolListTools::getCounts(), and isConstant().

Referenced by isTriplet(), and bpp::CodonSiteTools::numberOfSubsitutions().

◆ getNumberOfDistinctPositions()

size_t SymbolListTools::getNumberOfDistinctPositions (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)
static, inherited

Get the number of distinct positions.

The comparison is performed from position 0 up to the smaller of the two list sizes.

Parameters
l1: SymbolList 1.
l2: SymbolList 2.
Returns
The number of distinct positions.
Exceptions
AlphabetMismatchException: If the two lists do not have the same alphabet type.

Definition at line 158 of file SymbolListTools.cpp.
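
The truncation to the shorter list can be sketched as follows, using plain vectors of integer codes. The function name distinctPositions is hypothetical, and the alphabet check that raises AlphabetMismatchException is left out of this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of positions at which the two lists differ, comparing only
// positions 0 .. min(size1, size2) - 1.
std::size_t distinctPositions(const std::vector<int>& l1, const std::vector<int>& l2)
{
  const std::size_t n = std::min(l1.size(), l2.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (l1[i] != l2[i])
      ++count;
  return count;
}
```

Positions beyond the end of the shorter list are never compared, so a length difference alone does not contribute to the result.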

◆ getNumberOfPositionsWithoutGap()

size_t SymbolListTools::getNumberOfPositionsWithoutGap (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)
static, inherited

Get the number of positions without gap.

The comparison is performed from position 0 up to the smaller of the two list sizes.

Parameters
l1: SymbolList 1.
l2: SymbolList 2.
Returns
The number of positions without gap.
Exceptions
AlphabetMismatchException: If the two lists do not have the same alphabet type.

Definition at line 169 of file SymbolListTools.cpp.

◆ hasGap()

◆ hasSingleton()

bool SiteTools::hasSingleton (const Site &site) throw (EmptySiteException)
static

Tell if a site has singletons.

Parameters
site: A site.
Returns
True if the site has singletons.

Definition at line 354 of file SiteTools.cpp.

References isConstant().

◆ hasUnknown()

bool SiteTools::hasUnknown (const Site &site)
static
Parameters
site: A site.
Returns
True if the site contains one or several unknown characters.

Definition at line 95 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::Alphabet::getUnknownCharacterCode(), and bpp::BasicSymbolList::size().

◆ heterozygosity()

double SiteTools::heterozygosity (const Site &site) throw (EmptySiteException)
static

Compute the heterozygosity index of a site.

\[ H = 1 - \sum_x f_x^2 \]

where $f_x$ is the frequency of state $x$.

Parameters
site: A site.
Returns
The heterozygosity index of this site.
Exceptions
EmptySiteException: If the site has size 0.

Definition at line 319 of file SiteTools.cpp.
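
The formula above can be evaluated with a few lines of standalone code. In this sketch the site is a plain vector of integer codes, every code is treated as an ordinary state (how the library treats gaps or unknown characters is not specified here), the name heterozygositySketch is hypothetical, and std::invalid_argument stands in for EmptySiteException.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// H = 1 - sum_x f_x^2, where f_x is the frequency of state x in the site.
double heterozygositySketch(const std::vector<int>& site)
{
  if (site.empty())
    throw std::invalid_argument("heterozygositySketch: empty site");
  std::map<int, std::size_t> counts;
  for (int s : site)
    ++counts[s];
  const double n = static_cast<double>(site.size());
  double sumSquares = 0.0;
  for (const auto& kv : counts) {
    const double f = static_cast<double>(kv.second) / n;
    sumSquares += f * f;
  }
  return 1.0 - sumSquares;
}
```

A constant site gives H = 0, and two states at equal frequency give H = 0.5.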

◆ isComplete()

bool SiteTools::isComplete (const Site &site)
static

◆ isConstant()

bool SiteTools::isConstant (const Site &site, bool ignoreUnknown=false, bool unresolvedRaisesException=true) throw (EmptySiteException)
static

Tell if a site is constant, that is, if it displays the same state in all sequences that do not present a gap.

Parameters
site: A site.
ignoreUnknown: If true, positions with unknown characters will be ignored. Otherwise, a site with a single state plus any uncertain state will not be considered constant.
unresolvedRaisesException: In ambiguous cases (a gap-only site, for instance), throw an exception. Otherwise, return false.
Returns
True if the site is made of only one state.
Exceptions
EmptySiteException: If the site has size 0, or if the site cannot be resolved (for instance, it is made of gaps only) and unresolvedRaisesException is set to true.

Definition at line 141 of file SiteTools.cpp.

Referenced by bpp::CodonSiteTools::fixedDifferences(), bpp::CodonSiteTools::generateCodonSiteWithoutRareVariant(), getNumberOfDistinctCharacters(), hasSingleton(), bpp::CodonSiteTools::isFourFoldDegenerated(), bpp::CodonSiteTools::isMonoSitePolymorphic(), isParsimonyInformativeSite(), bpp::CodonSiteTools::isSynonymousPolymorphic(), bpp::CodonSiteTools::numberOfNonSynonymousSubstitutions(), bpp::CodonSiteTools::numberOfSubsitutions(), bpp::CodonSiteTools::piNonSynonymous(), and bpp::CodonSiteTools::piSynonymous().
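
The interaction of the two flags can be sketched on a plain vector of integer codes. This is one possible reading of the description above rather than the library implementation: the codes -1 (gap) and 14 (unknown), the name isConstantSketch, and the use of std::invalid_argument in place of EmptySiteException are all assumptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

const int kGap = -1;      // hypothetical gap code
const int kUnknown = 14;  // hypothetical unknown-character code

bool isConstantSketch(const std::vector<int>& site, bool ignoreUnknown = false,
                      bool unresolvedRaisesException = true)
{
  if (site.empty())
    throw std::invalid_argument("isConstantSketch: empty site");
  bool found = false;
  int reference = 0;
  for (int s : site) {
    if (s == kGap) continue;                        // gaps never break constancy
    if (ignoreUnknown && s == kUnknown) continue;   // skipped only on request
    if (!found) { reference = s; found = true; }
    else if (s != reference) return false;          // a second state was seen
  }
  if (!found) {  // nothing but gaps (or ignored unknowns): ambiguous case
    if (unresolvedRaisesException)
      throw std::invalid_argument("isConstantSketch: site cannot be resolved");
    return false;
  }
  return true;
}
```

With the defaults, a site such as A, N, A is reported as not constant, and it becomes constant once ignoreUnknown is set to true.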

◆ isGapOnly()

bool SiteTools::isGapOnly (const Site &site)
static
Parameters
site: A site.
Returns
True if the site contains only gaps.

Definition at line 69 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::Alphabet::isGap(), and bpp::BasicSymbolList::size().

Referenced by bpp::SiteContainerTools::removeGapOnlySites(), and bpp::SiteContainerTools::removeGapOrUnresolvedOnlySites().

◆ isGapOrUnresolvedOnly()

bool SiteTools::isGapOrUnresolvedOnly (const Site &site)
static
Parameters
site: A site.
Returns
True if the site contains only gaps or unresolved characters.

Definition at line 82 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::Alphabet::isGap(), bpp::Alphabet::isUnresolved(), and bpp::BasicSymbolList::size().

Referenced by bpp::SiteContainerTools::removeGapOrUnresolvedOnlySites().

◆ isParsimonyInformativeSite()

bool SiteTools::isParsimonyInformativeSite (const Site &site) throw (EmptySiteException)
static

Tell if a site is a parsimony informative site.

At least two distinct characters must be present.

Parameters
site: A site.
Returns
True if the site is parsimony informative.

Definition at line 374 of file SiteTools.cpp.

References bpp::SymbolListTools::getCounts(), and isConstant().
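
For orientation, the usual textbook criterion is that at least two distinct states each occur in at least two sequences. The standalone sketch below implements that criterion on a plain vector of integer codes; whether SiteTools applies exactly this rule is not stated in this page, so treat the criterion, the name isParsimonyInformativeSketch, and the handling of gaps (none) as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// True if at least two distinct states each occur at least twice.
bool isParsimonyInformativeSketch(const std::vector<int>& site)
{
  std::map<int, std::size_t> counts;
  for (int s : site)
    ++counts[s];
  std::size_t frequentStates = 0;
  for (const auto& kv : counts)
    if (kv.second >= 2)
      ++frequentStates;
  return frequentStates >= 2;
}
```

Under this criterion a site with a single variant sequence (a singleton) is variable but not informative.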

◆ isTriplet()

bool SiteTools::isTriplet (const Site &site) throw (EmptySiteException)
static

Tell if a site has more than 2 distinct characters.

Parameters
site: A site.
Returns
True if the site has more than 2 distinct characters.

Definition at line 397 of file SiteTools.cpp.

References getNumberOfDistinctCharacters().

◆ jointEntropy()

double SiteTools::jointEntropy (const Site &site1, const Site &site2, bool resolveUnknowns) throw (DimensionException, EmptySiteException)
static

Compute the joint entropy between two sites.

\[ H_{i,j} = - \sum_x \sum_y p_{x,y}\ln\left(p_{x,y}\right) \]

where $p_{x,y}$ is the frequency of the pair $(x,y)$.

Author
J. Dutheil
Parameters
site1: First site.
site2: Second site.
resolveUnknowns: Tell if unknown characters must be resolved.
Returns
The joint entropy for the pair of sites.
Exceptions
DimensionException: If the sites do not have the same length.
EmptySiteException: If the sites have size 0.

Definition at line 269 of file SiteTools.cpp.
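
The joint entropy formula above can be evaluated directly from pair counts. The following standalone sketch works on plain vectors of integer codes without resolving unknown characters; the name jointEntropySketch is hypothetical and std::invalid_argument stands in for both documented exceptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// H = -sum_{x,y} p_{x,y} * ln(p_{x,y}) over the pairs observed at equal positions.
double jointEntropySketch(const std::vector<int>& site1, const std::vector<int>& site2)
{
  if (site1.size() != site2.size())
    throw std::invalid_argument("jointEntropySketch: sites differ in length");
  if (site1.empty())
    throw std::invalid_argument("jointEntropySketch: empty sites");
  std::map<std::pair<int, int>, std::size_t> counts;
  for (std::size_t i = 0; i < site1.size(); ++i)
    ++counts[std::make_pair(site1[i], site2[i])];
  const double n = static_cast<double>(site1.size());
  double h = 0.0;
  for (const auto& kv : counts) {
    const double p = static_cast<double>(kv.second) / n;
    assert(p > 0.0);  // every tallied pair occurs at least once
    h -= p * std::log(p);
  }
  return h;
}
```

Four sequences showing four different pairs give ln 4, while two perfectly linked sites with two states at equal frequency give ln 2.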

◆ mutualInformation()

double SiteTools::mutualInformation (const Site &site1, const Site &site2, bool resolveUnknowns) throw (DimensionException, EmptySiteException)
static

Compute the mutual information between two sites.

\[ MI = \sum_x \sum_y p_{x,y}\ln\left(\frac{p_{x,y}}{p_x \cdot p_y}\right) \]

where $p_x$ and $p_y$ are the frequencies of states $x$ and $y$, and $p_{x,y}$ is the frequency of the pair $(x,y)$.

Author
J. Dutheil
Parameters
site1: First site.
site2: Second site.
resolveUnknowns: Tell if unknown characters must be resolved.
Returns
The mutual information for the pair of sites.
Exceptions
DimensionException: If the sites do not have the same length.
EmptySiteException: If the sites have size 0.

Definition at line 222 of file SiteTools.cpp.
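
A minimal standalone sketch of the mutual information formula above, assuming plain vectors of integer codes and no resolution of unknown characters. The name mutualInformationSketch is hypothetical, and std::invalid_argument stands in for both documented exceptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// MI = sum_{x,y} p_{x,y} * ln(p_{x,y} / (p_x * p_y))
double mutualInformationSketch(const std::vector<int>& site1, const std::vector<int>& site2)
{
  if (site1.size() != site2.size())
    throw std::invalid_argument("mutualInformationSketch: sites differ in length");
  if (site1.empty())
    throw std::invalid_argument("mutualInformationSketch: empty sites");
  std::map<int, double> px, py;
  std::map<std::pair<int, int>, double> pxy;
  const double w = 1.0 / static_cast<double>(site1.size());  // weight of one sequence
  for (std::size_t i = 0; i < site1.size(); ++i) {
    px[site1[i]] += w;
    py[site2[i]] += w;
    pxy[std::make_pair(site1[i], site2[i])] += w;
  }
  double mi = 0.0;
  for (const auto& kv : pxy) {
    const double p = kv.second;
    mi += p * std::log(p / (px[kv.first.first] * py[kv.first.second]));
  }
  return mi;
}
```

Only observed pairs enter the sum, so no term with $p_{x,y} = 0$ is ever evaluated. Two independent sites give 0, and two identical sites give the Shannon entropy of either one.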

◆ variabilityFactorial()

double SiteTools::variabilityFactorial (const Site &site) throw (EmptySiteException)
static

Compute the factorial diversity index of a site.

\[ F = \frac{\log\left(\left(\sum_x p_x\right)!\right)}{\sum_x \log(p_x)!} \]

where $p_x$ is the number of times state $x$ is observed in the site.

Author
J. Dutheil
Parameters
site: A site.
Returns
The factorial diversity index of this site.
Exceptions
EmptySiteException: If the site has size 0.

Definition at line 304 of file SiteTools.cpp.

◆ variabilityShannon()

double SiteTools::variabilityShannon (const Site &site, bool resolveUnknowns) throw (EmptySiteException)
static

Compute the Shannon entropy index of a site.

\[ I = - \sum_x f_x\cdot \ln(f_x) \]

where $f_x$ is the frequency of state $x$.

Author
J. Dutheil
Parameters
site: A site.
resolveUnknowns: Tell if unknown characters must be resolved.
Returns
The Shannon entropy index of this site.
Exceptions
EmptySiteException: If the site has size 0.

Definition at line 202 of file SiteTools.cpp.

Referenced by entropy().
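
The Shannon index can be computed in a few lines once the state counts are known. The sketch below is a standalone illustration on a plain vector of integer codes, without resolution of unknown characters; the name shannonIndex is hypothetical and std::invalid_argument stands in for EmptySiteException.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// I = -sum_x f_x * ln(f_x), where f_x is the frequency of state x.
double shannonIndex(const std::vector<int>& site)
{
  if (site.empty())
    throw std::invalid_argument("shannonIndex: empty site");
  std::map<int, std::size_t> counts;
  for (int s : site)
    ++counts[s];
  const double n = static_cast<double>(site.size());
  double index = 0.0;
  for (const auto& kv : counts) {
    const double f = static_cast<double>(kv.second) / n;
    assert(f > 0.0);  // every tallied state occurs at least once
    index -= f * std::log(f);
  }
  return index;
}
```

The natural logarithm is used, as in the formula, so the result is expressed in nats rather than bits.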


The documentation for this class was generated from the following files: