bpp-popgen  2.2.0
bpp::SequenceStatistics Class Reference

Static class providing methods to compute statistics on sequence data. More...

#include <Bpp/PopGen/SequenceStatistics.h>

Public Member Functions

double fstHudson92 (const PolymorphismSequenceContainer &psc, size_t id1, size_t id2)

Static Public Member Functions

static unsigned int numberOfPolymorphicSites (const PolymorphismSequenceContainer &psc, bool gapflag=true, bool ignoreUnknown=true)
 Compute the number of polymorphic sites in an alignment. More...
static unsigned int numberOfParsimonyInformativeSites (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Compute the number of parsimony informative sites in an alignment. More...
static unsigned int numberOfSingletons (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Count the number of singleton nucleotides in an alignment. More...
static unsigned int totalNumberOfMutations (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Count the total number of mutations in an alignment. More...
static unsigned int totalNumberOfMutationsOnExternalBranches (const PolymorphismSequenceContainer &ing, const PolymorphismSequenceContainer &outg) throw (Exception)
 Count the total number of mutations in external branches. More...
static unsigned int numberOfTriplets (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Compute the number of triplets in an alignment. More...
static double heterozygosity (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Compute the sum of per site heterozygosity in an alignment. More...
static double squaredHeterozygosity (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Compute the sum of per site squared heterozygosity in an alignment. More...
static double gcContent (const PolymorphismSequenceContainer &psc)
 Compute the mean GC content in an alignment. More...
static std::vector< unsigned int > gcPolymorphism (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Return the number of GC alleles and the total number of alleles at polymorphic sites only. More...
static double watterson75 (const PolymorphismSequenceContainer &psc, bool gapflag=true, bool ignoreUnknown=true)
 Compute diversity estimator Theta of Watterson (1975, Theor Popul Biol, 7 pp256-276) More...
static double tajima83 (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Compute diversity estimator Theta of Tajima (1983, Genetics, 105 pp437-460) More...
static double fayWu2000 (const PolymorphismSequenceContainer &psc, const Sequence &ancestralSites)
 Compute diversity estimator Theta H (eq. 3) of Fay and Wu (2000, Genetics, 155: 1405-1413) More...
static unsigned int dvk (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Return the number of haplotypes in the sample. Depaulis and Veuille (1998, Mol Biol Evol, 12 pp1788-1790) More...
static double dvh (const PolymorphismSequenceContainer &psc, bool gapflag=true)
 Return the haplotype diversity of a sample. Depaulis and Veuille (1998, Mol Biol Evol, 12 pp1788-1790) More...
static unsigned int numberOfTransitions (const PolymorphismSequenceContainer &psc)
 Return the number of transitions. More...
static unsigned int numberOfTransversions (const PolymorphismSequenceContainer &psc)
 Return the number of transversions. More...
static double ratioOfTransitionsTransversions (const PolymorphismSequenceContainer &psc) throw (Exception)
 Return the ratio of transitions/transversions. More...
static unsigned int numberOfSitesWithStopCodon (const PolymorphismSequenceContainer &psc, const GeneticCode &gCode, bool gapflag=true)
 Compute the number of codon sites with stop codon. More...
static unsigned int numberOfMonoSitePolymorphicCodons (const PolymorphismSequenceContainer &psc, bool stopflag=true, bool gapflag=true)
 Compute the number of polymorphic codons with only one mutated site. More...
static unsigned int numberOfSynonymousPolymorphicCodons (const PolymorphismSequenceContainer &psc, const GeneticCode &gc)
 Compute the number of synonymous polymorphic codon sites. More...
static double watterson75Synonymous (const PolymorphismSequenceContainer &psc, const GeneticCode &gc)
 Compute the Watterson (1975, Theor Popul Biol, 7 pp256-276) estimator for synonymous positions. More...
static double watterson75NonSynonymous (const PolymorphismSequenceContainer &psc, const GeneticCode &gc)
 Compute the Watterson (1975, Theor Popul Biol, 7 pp256-276) estimator for non-synonymous positions. More...
static double piSynonymous (const PolymorphismSequenceContainer &psc, const GeneticCode &gc, bool minchange=false)
 Compute the synonymous nucleotide diversity, pi. More...
static double piNonSynonymous (const PolymorphismSequenceContainer &psc, const GeneticCode &gc, bool minchange=false)
 Compute the non-synonymous nucleotide diversity, pi. More...
static double meanNumberOfSynonymousSites (const PolymorphismSequenceContainer &psc, const GeneticCode &gc, double ratio=1.)
 Compute the mean number of synonymous sites in an alignment. More...
static double meanNumberOfNonSynonymousSites (const PolymorphismSequenceContainer &psc, const GeneticCode &gc, double ratio=1.)
 Compute the mean number of non-synonymous sites in an alignment. More...
static unsigned int numberOfSynonymousSubstitutions (const PolymorphismSequenceContainer &psc, const GeneticCode &gc, double freqmin=0.)
 Compute the number of synonymous substitutions in an alignment. More...
static unsigned int numberOfNonSynonymousSubstitutions (const PolymorphismSequenceContainer &psc, const GeneticCode &gc, double freqmin=0.)
 Compute the number of non-synonymous substitutions in an alignment. More...
static std::vector< unsigned int > fixedDifferences (const PolymorphismSequenceContainer &pscin, const PolymorphismSequenceContainer &pscout, PolymorphismSequenceContainer &psccons, const GeneticCode &gc)
 Compute the number of fixed differences between two alignments. More...
static std::vector< unsigned int > mkTable (const PolymorphismSequenceContainer &ingroup, const PolymorphismSequenceContainer &outgroup, const GeneticCode &gc, double freqmin=0.)
 return a vector containing Pa, Ps, Da, Ds More...
static double neutralityIndex (const PolymorphismSequenceContainer &ingroup, const PolymorphismSequenceContainer &outgroup, const GeneticCode &gc, double freqmin=0.)
 return the neutrality index NI = (Pa/Ps)/(Da/Ds) (Rand & Kann 1996, Mol. Biol. Evol. 13 pp735-748) More...
static double tajimaDss (const PolymorphismSequenceContainer &psc, bool gapflag=true) throw (ZeroDivisionException)
 Return the Tajima's D test (Tajima 1989, Genetics 123 pp 585-595). More...
static double tajimaDtnm (const PolymorphismSequenceContainer &psc, bool gapflag=true) throw (ZeroDivisionException)
 Return the Tajima's D test (Tajima 1989, Genetics 123 pp 585-595). More...
static double fuLiD (const PolymorphismSequenceContainer &ingroup, const PolymorphismSequenceContainer &outgroup, bool original=true) throw (ZeroDivisionException)
 Return the Fu and Li D test (Fu & Li 1993, Genetics, 133 pp693-709). More...
static double fuLiDStar (const PolymorphismSequenceContainer &group) throw (ZeroDivisionException)
 Return the Fu and Li D* test (Fu & Li 1993, Genetics, 133 pp693-709). More...
static double fuLiF (const PolymorphismSequenceContainer &ingroup, const PolymorphismSequenceContainer &outgroup, bool original=true) throw (ZeroDivisionException)
 Return the Fu and Li F test (Fu & Li 1993, Genetics, 133 pp693-709). More...
static double fuLiFStar (const PolymorphismSequenceContainer &group) throw (ZeroDivisionException)
 Return the Fu and Li F* test (Fu & Li 1993, Genetics, 133 pp693-709). More...
static PolymorphismSequenceContainer * generateLdContainer (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.)
 Generate a special PolymorphismSequenceContainer for linkage disequilibrium analysis. More...
static Vdouble pairwiseDistances1 (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 Give the vector of the pairwise distances between site positions corresponding to an LD PolymorphismSequenceContainer. More...
static Vdouble pairwiseDistances2 (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 Give the vector of all mean pairwise distances between two sites for an LD PolymorphismSequenceContainer. More...
static Vdouble pairwiseD (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 Give the vector of all mean pairwise D values between two sites (Lewontin & Kojima 1964, Evolution 14 pp458-472). More...
static Vdouble pairwiseDprime (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 Give the vector of all mean pairwise D' values between two sites (Lewontin 1964, Genetics 49 pp49-67). More...
static Vdouble pairwiseR2 (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 Give the vector of all mean pairwise R² values between two sites (Hill & Robertson 1968, Theor. Appl. Genet., 38 pp226-231). More...
static double meanD (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give mean D over all pairwise comparisons More...
static double meanDprime (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give mean D' over all pairwise comparisons More...
static double meanR2 (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give mean R² over all pairwise comparisons More...
static double meanDistance1 (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give mean pairwise distances between sites / method 1: differences between sequences are not taken into account More...
static double meanDistance2 (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give mean pairwise distances between sites / method 2: differences between sequences are taken into account More...
static double originRegressionD (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give the slope of the regression |D| = 1+a*distance More...
static double originRegressionDprime (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give the slope of the regression |D'| = 1+a*distance More...
static double originRegressionR2 (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give the slope of the regression R² = 1+a*distance More...
static Vdouble linearRegressionD (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give the slope and the origin of the regression |D| = a*distance+b More...
static Vdouble linearRegressionDprime (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give the slope and the origin of the regression |D'| = a*distance+b More...
static Vdouble linearRegressionR2 (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give the slope and the origin of the regression R² = a*distance+b More...
static double inverseRegressionR2 (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)
 give the slope of the regression R² = 1/(1+a*distance) More...
static double hudson87 (const PolymorphismSequenceContainer &psc, double precision=0.000001, double cinf=0.001, double csup=10000.)
 give estimate of C=4Nr using Hudson method (Hudson 1987, Genet. Res., 50 pp245-250) More...
static void testUsefulValues (std::ostream &s, size_t n)
 Test useful values. More...

Static Private Member Functions

static unsigned int getNumberOfMutations_ (const Site &site)
 Count the number of mutations for a site. More...
static unsigned int getNumberOfSingletons_ (const Site &site)
 Count the number of singletons for a site. More...
static unsigned getNumberOfDerivedSingletons_ (const Site &site_in, const Site &site_out)
 Count the number of singletons for a site. More...
static std::map< std::string, double > getUsefulValues_ (size_t n)
 Get useful values for theta estimators. More...
static double getVD_ (size_t n, double a1, double a2, double cn)
 Get the vD value of equation (32) in Fu & Li (1993, Genetics, 133 pp693-709). More...
static double getUD_ (double a1, double vD)
 Get the uD value of equation (32) in Fu & Li (1993, Genetics, 133 pp693-709). More...
static double getVDstar_ (size_t n, double a1, double a2, double dn)
 Get the vD* value of the D* equation in Fu & Li (1993, Genetics, 133 pp693-709). More...
static double getUDstar_ (size_t n, double a1, double vDs)
 Get the uD* value of the D* equation in Fu & Li (1993, Genetics, 133 pp693-709). More...
static double leftHandHudson_ (const PolymorphismSequenceContainer &psc)
 Give the left-hand term of equation (4) in Hudson (1987, Genet. Res., 50 pp245-250). This term is used in hudson87. More...
static double rightHandHudson_ (double c, size_t n)
 Give the right-hand term of equation (4) in Hudson (1987, Genet. Res., 50 pp245-250). This term is used in hudson87. More...

Detailed Description

Static class providing methods to compute statistics on sequence data.

Author
  Sylvain Gaillard

Definition at line 69 of file SequenceStatistics.h.

Member Function Documentation

◆ dvh()

double SequenceStatistics::dvh (const PolymorphismSequenceContainer &psc, bool gapflag=true)

Return the haplotype diversity of a sample. Depaulis and Veuille (1998, Mol Biol Evol, 12 pp1788-1790)

Parameters
  psc      a PolymorphismSequenceContainer
  gapflag  flag set by default to true if you don't want to take gaps into account
Author
  Éric Bazin
Todo
  • remove unneeded Sequence Container recopy
  • work on Sequence rather than on string

Definition at line 412 of file SequenceStatistics.cpp.

References bpp::PolymorphismSequenceContainer::getSequenceCount().

◆ dvk()

unsigned int SequenceStatistics::dvk (const PolymorphismSequenceContainer &psc, bool gapflag=true)

Return the number of haplotypes in the sample. Depaulis and Veuille (1998, Mol Biol Evol, 12 pp1788-1790)

Parameters
  psc      a PolymorphismSequenceContainer
  gapflag  flag set by default to true if you don't want to take gaps into account
Author
  Éric Bazin
Todo
  • remove unneeded Sequence Container recopy
  • work on Sequence rather than on string

Definition at line 374 of file SequenceStatistics.cpp.
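
As an illustration of what counting haplotypes involves, here is a minimal standalone sketch. It does not use the bpp classes: the alignment is a plain vector of equal-length strings, and countHaplotypes and skipGapSites are hypothetical names standing in for dvk and gapflag. It assumes that ignoring gaps means masking every column that contains '-' before comparing sequences, which may differ from the library's behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of bpp-popgen): count distinct haplotypes.
// When skipGapSites is true, every column containing '-' is masked first.
std::size_t countHaplotypes(const std::vector<std::string>& seqs,
                            bool skipGapSites = true)
{
  if (seqs.empty()) return 0;
  const std::size_t len = seqs[0].size();
  std::vector<bool> keep(len, true);
  for (const std::string& s : seqs) {
    assert(s.size() == len);  // sequences must be aligned
    for (std::size_t i = 0; skipGapSites && i < len; ++i)
      if (s[i] == '-') keep[i] = false;
  }
  // Each distinct masked sequence is one haplotype.
  std::set<std::string> haplotypes;
  for (const std::string& s : seqs) {
    std::string masked;
    for (std::size_t i = 0; i < len; ++i)
      if (keep[i]) masked.push_back(s[i]);
    haplotypes.insert(masked);
  }
  return haplotypes.size();
}
```

With the sequences ACGT, ACGT, ACGA and A-GT, masking the gapped column leaves two haplotypes, while keeping it gives three.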

◆ fayWu2000()

double SequenceStatistics::fayWu2000 (const PolymorphismSequenceContainer &psc, const Sequence &ancestralSites)

Compute diversity estimator Theta H (eq. 3) of Fay and Wu (2000, Genetics, 155: 1405-1413)

Parameters
  psc             a PolymorphismSequenceContainer
  ancestralSites  a Sequence containing the ancestral states (reconstructed independently) to fold the mutations in the psc SequenceContainer
Author
  Benoit Nabholz

Definition at line 327 of file SequenceStatistics.cpp.

◆ fixedDifferences()

vector< unsigned int > SequenceStatistics::fixedDifferences (const PolymorphismSequenceContainer &pscin, const PolymorphismSequenceContainer &pscout, PolymorphismSequenceContainer &psccons, const GeneticCode &gc)

Compute the number of fixed differences between two alignments.

Gaps and unresolved sites are automatically excluded.

For complex codons, the path that gives the minimum number of non-synonymous changes is chosen. The argument minchange=true is passed to numberOfSynonymousDifferences, which this method uses; otherwise a non-integer number could be returned.

Parameters
  pscin    a PolymorphismSequenceContainer
  pscout   a PolymorphismSequenceContainer
  psccons  a PolymorphismSequenceContainer
  gc       a GeneticCode
Author
  Sylvain Glémin

Definition at line 728 of file SequenceStatistics.cpp.

◆ fstHudson92()

double SequenceStatistics::fstHudson92 (const PolymorphismSequenceContainer &psc, size_t id1, size_t id2)

Fst of Hudson, Slatkin and Maddison

Taken from eq. 3 of Hudson, Slatkin and Maddison 1992 Genetics 132:153

\[ F_{st} = 1 - \frac{H_w}{H_b} \]

where $H_w$ is the mean number of differences between different sequences sampled from the same subpopulation, and $H_b$ is the mean number of differences between sequences sampled from the two different subpopulations.

Parameters
  psc  a PolymorphismSequenceContainer with at least two populations
  id1  the id of population 1
  id2  the id of population 2
Author
  Benoit Nabholz

Definition at line 898 of file SequenceStatistics.cpp.
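
The formula above can be sketched without the bpp classes, using plain vectors of aligned strings for the two populations. All names below (Population, countDifferences, meanWithin, meanBetween, fstHudsonSketch) are hypothetical. One assumption matters: $H_w$ is taken here as the unweighted average of the two within-population means, and the library may weight the within-population comparisons differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Population = std::vector<std::string>;

// Number of positions at which two aligned sequences differ.
std::size_t countDifferences(const std::string& a, const std::string& b)
{
  assert(a.size() == b.size());
  std::size_t d = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] != b[i]) ++d;
  return d;
}

// Mean number of differences over all pairs of distinct sequences in one population.
double meanWithin(const Population& pop)
{
  assert(pop.size() >= 2);
  double sum = 0.0;
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < pop.size(); ++i)
    for (std::size_t j = i + 1; j < pop.size(); ++j) {
      sum += static_cast<double>(countDifferences(pop[i], pop[j]));
      ++pairs;
    }
  return sum / static_cast<double>(pairs);
}

// Mean number of differences over all pairs made of one sequence from each population.
double meanBetween(const Population& p1, const Population& p2)
{
  assert(!p1.empty() && !p2.empty());
  double sum = 0.0;
  for (const std::string& a : p1)
    for (const std::string& b : p2)
      sum += static_cast<double>(countDifferences(a, b));
  return sum / static_cast<double>(p1.size() * p2.size());
}

// Fst = 1 - Hw / Hb, with Hw assumed to be the average of the two within-population means.
double fstHudsonSketch(const Population& p1, const Population& p2)
{
  const double hw = 0.5 * (meanWithin(p1) + meanWithin(p2));
  const double hb = meanBetween(p1, p2);
  assert(hb > 0.0);  // Fst is undefined when the populations do not differ
  return 1.0 - hw / hb;
}
```

For populations {AAAA, AAAT} and {TTTT, TTTA}, $H_w = 1$ and $H_b = 3.5$, so the sketch returns $1 - 1/3.5$.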

◆ fuLiD()

double SequenceStatistics::fuLiD (const PolymorphismSequenceContainer &ingroup, const PolymorphismSequenceContainer &outgroup, bool original=true) throw (ZeroDivisionException)

Return the Fu and Li D test (Fu & Li 1993, Genetics, 133 pp693-709).

Parameters
  ingroup   a PolymorphismSequenceContainer
  outgroup  a PolymorphismSequenceContainer
  original  true: use the Fu & Li method, false: use mutations in external branches
Exceptions
  ZeroDivisionException  if eta == 0
Author
  Sylvain Gaillard
  Khalid Belkhir

If original is set to false, the number of mutations in external branches is used. If the outgroup contains more than one sequence, sites with more than one variant are not considered for external branch mutations.

Definition at line 817 of file SequenceStatistics.cpp.

◆ fuLiDStar()

double SequenceStatistics::fuLiDStar (const PolymorphismSequenceContainer &group) throw (ZeroDivisionException)

Return the Fu and Li D* test (Fu & Li 1993, Genetics, 133 pp693-709).

Parameters
  group  a PolymorphismSequenceContainer
Author
  Sylvain Gaillard

Definition at line 835 of file SequenceStatistics.cpp.

◆ fuLiF()

double SequenceStatistics::fuLiF (const PolymorphismSequenceContainer &ingroup, const PolymorphismSequenceContainer &outgroup, bool original=true) throw (ZeroDivisionException)

Return the Fu and Li F test (Fu & Li 1993, Genetics, 133 pp693-709).

Parameters
  ingroup   a PolymorphismSequenceContainer
  outgroup  a PolymorphismSequenceContainer
  original  true: use the Fu & Li method, false: use mutations in external branches
Author
  Sylvain Gaillard
  Khalid Belkhir

If original is set to false, the number of mutations in external branches is used. If the outgroup contains more than one sequence, sites with more than one variant are not considered for external branch mutations.

Definition at line 857 of file SequenceStatistics.cpp.

◆ fuLiFStar()

double SequenceStatistics::fuLiFStar (const PolymorphismSequenceContainer &group) throw (ZeroDivisionException)

Return the Fu and Li F* test (Fu & Li 1993, Genetics, 133 pp693-709).

Parameters
  group  a PolymorphismSequenceContainer
Author
  Sylvain Gaillard

Definition at line 876 of file SequenceStatistics.cpp.

◆ gcContent()

double SequenceStatistics::gcContent (const PolymorphismSequenceContainer &psc)

Compute the mean GC content in an alignment.

Parameters
  psc  a PolymorphismSequenceContainer

Definition at line 227 of file SequenceStatistics.cpp.
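
A minimal standalone sketch of a mean GC content computation follows. It is only one plausible reading of "mean GC content": the GC fraction is computed per sequence over resolved nucleotides (A, C, G, T) and then averaged across sequences. How the library treats gaps and unknown characters is not stated here, so that handling is an assumption, and meanGcContent is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: average over sequences of the per-sequence GC fraction.
// Characters other than A, C, G, T (gaps, N, ...) are ignored (assumption).
double meanGcContent(const std::vector<std::string>& seqs)
{
  double sum = 0.0;
  std::size_t used = 0;
  for (const std::string& s : seqs) {
    std::size_t gc = 0, resolved = 0;
    for (char c : s) {
      if (c == 'G' || c == 'C') { ++gc; ++resolved; }
      else if (c == 'A' || c == 'T') { ++resolved; }
    }
    if (resolved == 0) continue;  // nothing to measure in this sequence
    sum += static_cast<double>(gc) / static_cast<double>(resolved);
    ++used;
  }
  assert(used > 0);  // at least one sequence must contain resolved nucleotides
  return sum / static_cast<double>(used);
}
```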

◆ gcPolymorphism()

std::vector< unsigned int > SequenceStatistics::gcPolymorphism (const PolymorphismSequenceContainer &psc, bool gapflag=true)

Return the number of GC alleles and the total number of alleles at polymorphic sites only.

G vs C and A vs T polymorphisms are not taken into account.

SG 15/03/2010: The code of this method is not clear. See implementation for more details.

Parameters
  psc      a PolymorphismSequenceContainer
  gapflag  a boolean set by default to true if you don't want to take gaps into account
Returns
  A std::vector of size 2 containing the number of GC alleles and the total number of alleles.

Definition at line 235 of file SequenceStatistics.cpp.

◆ generateLdContainer()

PolymorphismSequenceContainer * SequenceStatistics::generateLdContainer (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.)

Generate a special PolymorphismSequenceContainer for linkage disequilibrium analysis.

Create a PolymorphismSequenceContainer with only polymorphic sites: the value 1 is assigned to the most frequent allele, and 0 to the least frequent. This container is needed to compute linkage disequilibrium statistics. It should be used before excluding gaps, but sites with gaps are not counted as polymorphic sites. Singletons can be excluded. Polymorphic sites whose lowest allele frequency is below a threshold can be excluded. Only polymorphic sites with 2 alleles are kept.

Parameters
  psc            a PolymorphismSequenceContainer
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Author
  Sylvain Glémin

Definition at line 941 of file SequenceStatistics.cpp.
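
The recoding rules described above can be sketched on a plain vector of aligned strings. This is not the library implementation: recodeForLd is a hypothetical name, the output is a vector of '0'/'1' strings rather than a PolymorphismSequenceContainer, and the tie-breaking rule (when both alleles are equally frequent, the alphabetically smaller one is coded 1) is an assumption made only so that the sketch is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: keep gap-free biallelic sites and recode them as
// '1' (most frequent allele) / '0' (least frequent allele).
std::vector<std::string> recodeForLd(const std::vector<std::string>& aln,
                                     bool keepSingleton = true,
                                     double freqMin = 0.0)
{
  std::vector<std::string> out(aln.size());
  if (aln.empty()) return out;
  const std::size_t len = aln[0].size();
  for (const std::string& s : aln) assert(s.size() == len);

  for (std::size_t i = 0; i < len; ++i) {
    std::map<char, std::size_t> counts;  // allele -> number of sequences
    for (const std::string& s : aln) ++counts[s[i]];
    // Sites with gaps are not counted as polymorphic; only 2-allele sites are kept.
    if (counts.count('-') > 0 || counts.size() != 2) continue;

    auto first = counts.begin();
    auto second = std::next(first);
    const char major = (first->second >= second->second) ? first->first : second->first;
    const std::size_t minorCount = std::min(first->second, second->second);

    if (!keepSingleton && minorCount == 1) continue;
    const double minorFreq =
        static_cast<double>(minorCount) / static_cast<double>(aln.size());
    if (minorFreq < freqMin) continue;

    for (std::size_t k = 0; k < aln.size(); ++k)
      out[k].push_back(aln[k][i] == major ? '1' : '0');
  }
  return out;
}
```

For the alignment AACG, AATG, ACTG, TCT-, the last column is dropped because of the gap and the three remaining columns are recoded; with keepSingleton set to false only the second column survives.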

◆ getNumberOfDerivedSingletons_()

unsigned int SequenceStatistics::getNumberOfDerivedSingletons_ (const Site &site_in, const Site &site_out)

Count the number of singletons for a site.

Counts the singletons that are not in site_out (a site from the outgroup); site_in is a site from the ingroup.

Author
  Khalid Belkhir

Definition at line 1562 of file SequenceStatistics.cpp.

◆ getNumberOfMutations_()

unsigned int SequenceStatistics::getNumberOfMutations_ ( const Site &  site)

Count the number of mutations for a site.

Definition at line 1533 of file SequenceStatistics.cpp.

◆ getNumberOfSingletons_()

unsigned int SequenceStatistics::getNumberOfSingletons_ ( const Site &  site)

Count the number of singletons for a site.

Definition at line 1549 of file SequenceStatistics.cpp.

◆ getUD_()

double SequenceStatistics::getUD_ (double a1, double vD)

Get the uD value of equation (32) in Fu & Li (1993, Genetics, 133 pp693-709).

Parameters
  a1  as described in getUsefulValues_
  vD  as provided by getVD_
Returns
  the uD value as double
Author
  Sylvain Gaillard

Definition at line 1640 of file SequenceStatistics.cpp.

◆ getUDstar_()

double SequenceStatistics::getUDstar_ (size_t n, double a1, double vDs)

Get the uD* value of the D* equation in Fu & Li (1993, Genetics, 133 pp693-709).

Parameters
  n    the number of observed sequences
  a1   as described in getUsefulValues_
  vDs  as provided by getVDstar_
Returns
  the uD* value as double
Author
  Sylvain Gaillard

Definition at line 1673 of file SequenceStatistics.cpp.

◆ getUsefulValues_()

std::map< std::string, double > SequenceStatistics::getUsefulValues_ (size_t n)

Get useful values for theta estimators.

Parameters
  n  the number of observed sequences
Returns
  A map with 11 values. Keys are a1, a2, a1n, b1, b2, c1, c2, cn, dn, e1 and e2. The values are:

\[ a_1=\sum_{i=1}^{n-1}\frac{1}{i} \qquad a_2=\sum_{i=1}^{n-1}\frac{1}{i^2} \]

\[ a_{1n}=\sum_{i=1}^{n}\frac{1}{i} \]

\[ b_1=\frac{n+1}{3(n-1)} \qquad b_2=\frac{2(n^2+n+3)}{9n(n-1)} \]

\[ c_1=b_1-\frac{1}{a_1} \qquad c_2=b_2-\frac{n+2}{a_1 \cdot n}+\frac{a_2}{a_1^2} \]

\[ c_n=2\frac{na_1-2(n-1)}{(n-1)(n-2)} \]

\[ d_n=c_n+\frac{n-2}{(n-1)^2}+\frac{2}{n-1}\left(\frac{3}{2}-\frac{2a_{1n}-3}{n-2}-\frac{1}{n}\right) \]

\[ e_1=\frac{c_1}{a_1} \qquad e_2=\frac{c_2}{a_1^2+a_2} \]

where $n$ is the number of observed sequences.

Author
  Sylvain Gaillard

Definition at line 1584 of file SequenceStatistics.cpp.
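
The eleven formulas above translate directly into code. The following standalone sketch is not the library implementation; usefulValuesSketch is a hypothetical name, and the requirement n >= 3 is an assumption added here because $c_n$ and $d_n$ divide by $n-2$.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: evaluate a1, a2, a1n, b1, b2, c1, c2, cn, dn, e1, e2
// for a sample of n sequences, following the formulas listed above.
std::map<std::string, double> usefulValuesSketch(std::size_t n)
{
  assert(n >= 3);  // cn and dn divide by (n - 2)
  const double nn = static_cast<double>(n);

  double a1 = 0.0, a2 = 0.0;
  for (std::size_t i = 1; i < n; ++i) {
    const double di = static_cast<double>(i);
    a1 += 1.0 / di;         // sum of 1/i for i = 1..n-1
    a2 += 1.0 / (di * di);  // sum of 1/i^2 for i = 1..n-1
  }
  const double a1n = a1 + 1.0 / nn;  // sum of 1/i for i = 1..n

  const double b1 = (nn + 1.0) / (3.0 * (nn - 1.0));
  const double b2 = 2.0 * (nn * nn + nn + 3.0) / (9.0 * nn * (nn - 1.0));
  const double c1 = b1 - 1.0 / a1;
  const double c2 = b2 - (nn + 2.0) / (a1 * nn) + a2 / (a1 * a1);
  const double cn = 2.0 * (nn * a1 - 2.0 * (nn - 1.0)) / ((nn - 1.0) * (nn - 2.0));
  const double dn = cn + (nn - 2.0) / ((nn - 1.0) * (nn - 1.0))
                  + 2.0 / (nn - 1.0)
                    * (1.5 - (2.0 * a1n - 3.0) / (nn - 2.0) - 1.0 / nn);
  const double e1 = c1 / a1;
  const double e2 = c2 / (a1 * a1 + a2);

  return {{"a1", a1}, {"a2", a2}, {"a1n", a1n}, {"b1", b1}, {"b2", b2},
          {"c1", c1}, {"c2", c2}, {"cn", cn},   {"dn", dn}, {"e1", e1},
          {"e2", e2}};
}
```

For n = 4 this gives $a_1 = 11/6$, $a_2 = 49/36$, $a_{1n} = 25/12$, $b_1 = 5/9$ and $c_n = 4/9$.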

◆ getVD_()

double SequenceStatistics::getVD_ (size_t n, double a1, double a2, double cn)

Get the vD value of equation (32) in Fu & Li (1993, Genetics, 133 pp693-709).

Parameters
  n   the number of observed sequences
  a1  as described in getUsefulValues_
  a2  as described in getUsefulValues_
  cn  as described in getUsefulValues_
Returns
  the vD value as double
Author
  Sylvain Gaillard

Definition at line 1631 of file SequenceStatistics.cpp.

◆ getVDstar_()

double SequenceStatistics::getVDstar_ (size_t n, double a1, double a2, double dn)

Get the vD* value of the D* equation in Fu & Li (1993, Genetics, 133 pp693-709).

Parameters
  n   the number of observed sequences
  a1  as described in getUsefulValues_
  a2  as described in getUsefulValues_
  dn  as described in getUsefulValues_
Returns
  the vD* value as double
Author
  Sylvain Gaillard

Definition at line 1645 of file SequenceStatistics.cpp.

◆ heterozygosity()

double SequenceStatistics::heterozygosity (const PolymorphismSequenceContainer &psc, bool gapflag=true)

Compute the sum of per site heterozygosity in an alignment.

Parameters
  psc      a PolymorphismSequenceContainer
  gapflag  a boolean set by default to true if you don't want to take gaps into account

Definition at line 188 of file SequenceStatistics.cpp.
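
To make "sum of per site heterozygosity" concrete, here is a minimal standalone sketch. It assumes the per-site heterozygosity is $1 - \sum_i p_i^2$, where $p_i$ are the allele frequencies at the site; this is a common definition, but this page does not say which formula the library uses or whether a sample-size correction is applied. The name sumSiteHeterozygosity and the skipGapSites flag (standing in for gapflag) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: sum over sites of 1 - sum(p_i^2).
// When skipGapSites is true, columns containing '-' are skipped.
double sumSiteHeterozygosity(const std::vector<std::string>& aln,
                             bool skipGapSites = true)
{
  if (aln.empty()) return 0.0;
  const std::size_t len = aln[0].size();
  const double n = static_cast<double>(aln.size());
  double total = 0.0;
  for (std::size_t i = 0; i < len; ++i) {
    std::map<char, std::size_t> counts;  // state -> number of sequences
    for (const std::string& s : aln) {
      assert(s.size() == len);  // sequences must be aligned
      ++counts[s[i]];
    }
    if (skipGapSites && counts.count('-') > 0) continue;
    double sumSquares = 0.0;
    for (const auto& kv : counts) {
      const double p = static_cast<double>(kv.second) / n;
      sumSquares += p * p;
    }
    total += 1.0 - sumSquares;
  }
  return total;
}
```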

◆ hudson87()

double SequenceStatistics::hudson87 (const PolymorphismSequenceContainer &psc, double precision=0.000001, double cinf=0.001, double csup=10000.)

Give an estimate of C=4Nr using the Hudson method (Hudson 1987, Genet. Res., 50 pp245-250).

Parameters
  psc        a PolymorphismSequenceContainer
  precision  default value = 0.000001
  cinf       initial value, by default cinf = 0.001
  csup       initial value, by default csup = 10000
Author
  Sylvain Glémin

Definition at line 1475 of file SequenceStatistics.cpp.

◆ inverseRegressionR2()

double SequenceStatistics::inverseRegressionR2 (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)

Give the slope of the regression R² = 1/(1+a*distance).

This fits the theoretical prediction R² = 1/(1+4Nr). The slope is given in R² per kb.

Parameters
  psc            a PolymorphismSequenceContainer
  distance1      a boolean (true to use distance1, false to use distance2, false by default)
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Exceptions
  DimensionException  if the number of sites or the number of sequences is lower than 2
Author
  Sylvain Glémin

Definition at line 1451 of file SequenceStatistics.cpp.

◆ leftHandHudson_()

double SequenceStatistics::leftHandHudson_ (const PolymorphismSequenceContainer &psc)

Give the left-hand term of equation (4) in Hudson (1987, Genet. Res., 50 pp245-250). This term is used in hudson87.

Parameters
  psc  a PolymorphismSequenceContainer

Definition at line 1688 of file SequenceStatistics.cpp.

◆ linearRegressionD()

Vdouble SequenceStatistics::linearRegressionD (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)

Give the slope and the origin of the regression |D| = a*distance+b.

The slope is given in |D| per kb.

Parameters
  psc            a PolymorphismSequenceContainer
  distance1      a boolean (true to use distance1, false to use distance2, false by default)
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Exceptions
  DimensionException  if the number of sites or the number of sequences is lower than 2
Author
  Sylvain Glémin

Definition at line 1388 of file SequenceStatistics.cpp.

◆ linearRegressionDprime()

Vdouble SequenceStatistics::linearRegressionDprime (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)

Give the slope and the origin of the regression |D'| = a*distance+b.

The slope is given in |D'| per kb.

Parameters
  psc            a PolymorphismSequenceContainer
  distance1      a boolean (true to use distance1, false to use distance2, false by default)
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Exceptions
  DimensionException  if the number of sites or the number of sequences is lower than 2
Author
  Sylvain Glémin

Definition at line 1409 of file SequenceStatistics.cpp.

◆ linearRegressionR2()

Vdouble SequenceStatistics::linearRegressionR2 (const PolymorphismSequenceContainer &psc, bool distance1=false, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)

Give the slope and the origin of the regression R² = a*distance+b.

The slope is given in R² per kb.

Parameters
  psc            a PolymorphismSequenceContainer
  distance1      a boolean (true to use distance1, false to use distance2, false by default)
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Exceptions
  DimensionException  if the number of sites or the number of sequences is lower than 2
Author
  Sylvain Glémin

Definition at line 1430 of file SequenceStatistics.cpp.
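
The three linearRegression methods return a slope a and an origin b for a fit of the form y = a*distance+b. As an illustration only, the sketch below fits such a line by ordinary least squares on two plain vectors. Least squares is an assumption here (this page does not name the fitting method), linearFit is a hypothetical name, and the caller is assumed to supply distances already expressed in kb.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: ordinary least-squares fit of y = a*x + b.
// Returns {a, b}: the slope first, then the origin (intercept).
std::vector<double> linearFit(const std::vector<double>& x,
                              const std::vector<double>& y)
{
  assert(x.size() == y.size() && x.size() >= 2);
  const double n = static_cast<double>(x.size());

  double meanX = 0.0, meanY = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= n;
  meanY /= n;

  double sxx = 0.0, sxy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sxx += (x[i] - meanX) * (x[i] - meanX);
    sxy += (x[i] - meanX) * (y[i] - meanY);
  }
  assert(sxx > 0.0);  // all distances identical: slope undefined

  const double slope = sxy / sxx;
  return {slope, meanY - slope * meanX};
}
```

With distances 1, 2, 3, 4 and values 0.9, 0.7, 0.5, 0.3, the fitted slope is -0.2 and the origin is 1.1.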

◆ meanD()

double SequenceStatistics::meanD (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)

Give the mean D over all pairwise comparisons.

Parameters
  psc            a PolymorphismSequenceContainer
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Exceptions
  DimensionException  if the number of sites or the number of sequences is lower than 2
Author
  Sylvain Glémin

Definition at line 1272 of file SequenceStatistics.cpp.

◆ meanDistance1()

double SequenceStatistics::meanDistance1 (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)

Give the mean pairwise distance between sites, method 1: differences between sequences are not taken into account.

Parameters
  psc            a PolymorphismSequenceContainer
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Exceptions
  DimensionException  if the number of sites is lower than 2
Author
  Sylvain Glémin

Definition at line 1304 of file SequenceStatistics.cpp.

◆ meanDistance2()

double SequenceStatistics::meanDistance2 (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)

Give the mean pairwise distance between sites, method 2: differences between sequences are taken into account.

Parameters
  psc            a PolymorphismSequenceContainer
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Exceptions
  DimensionException  if the number of sites is lower than 2
Author
  Sylvain Glémin

Definition at line 1317 of file SequenceStatistics.cpp.

◆ meanDprime()

double SequenceStatistics::meanDprime (const PolymorphismSequenceContainer &psc, bool keepsingleton=true, double freqmin=0.) throw (DimensionException)

Give the mean D' over all pairwise comparisons.

Parameters
  psc            a PolymorphismSequenceContainer
  keepsingleton  a boolean (true by default, false to exclude singletons)
  freqmin        a double (excludes sites whose lowest allele frequency is below the threshold freqmin; 0 by default)
Exceptions
  DimensionException  if the number of sites or the number of sequences is lower than 2
Author
  Sylvain Glémin

Definition at line 1278 of file SequenceStatistics.cpp.

◆ meanNumberOfNonSynonymousSites()

double SequenceStatistics::meanNumberOfNonSynonymousSites (const PolymorphismSequenceContainer &psc, const GeneticCode &gc, double ratio=1.)

Compute the mean number of non-synonymous sites in an alignment.

A site is x% synonymous if x% of possible mutations are synonymous. The transition/transversion ratio can be taken into account (use the variable ratio). Gaps are automatically excluded.

Parameters
  psc    a PolymorphismSequenceContainer
  gc     a GeneticCode
  ratio  a double
Author
  Éric Bazin

Definition at line 685 of file SequenceStatistics.cpp.

◆ meanNumberOfSynonymousSites()

double SequenceStatistics::meanNumberOfSynonymousSites (const PolymorphismSequenceContainer &psc, const GeneticCode &gc, double ratio=1.)

Compute the mean number of synonymous sites in an alignment.

A site is x% synonymous if x% of possible mutations are synonymous. The transition/transversion ratio can be taken into account (use the variable ratio). Gaps and unresolved sites are automatically excluded.

Parameters
  psc    a PolymorphismSequenceContainer
  gc     a GeneticCode
  ratio  a double
Author
  Sylvain Glémin
  Éric Bazin

Definition at line 671 of file SequenceStatistics.cpp.

◆ meanR2()

double SequenceStatistics::meanR2 ( const PolymorphismSequenceContainer &  psc,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the mean R² over all pairwise comparisons.

psc  a PolymorphismSequenceContainer
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites or the number of sequences is lower than 2
Sylvain Glémin

Definition at line 1291 of file SequenceStatistics.cpp.

◆ mkTable()

vector< unsigned int > SequenceStatistics::mkTable ( const PolymorphismSequenceContainer &  ingroup,
const PolymorphismSequenceContainer &  outgroup,
const GeneticCode &  gc,
double  freqmin = 0.
)

Return a vector containing Pa, Ps, Da, Ds.

Gaps and unresolved sites are automatically excluded.

ingroup  a PolymorphismSequenceContainer
outgroup  a PolymorphismSequenceContainer
gc  a GeneticCode
freqmin  a double; SNPs with a frequency strictly lower than freqmin are excluded
Sylvain Glémin

Definition at line 753 of file SequenceStatistics.cpp.

References bpp::PolymorphismSequenceContainer::addSequence(), and bpp::PolymorphismSequenceContainer::setAsOutgroupMember().

◆ neutralityIndex()

double SequenceStatistics::neutralityIndex ( const PolymorphismSequenceContainer &  ingroup,
const PolymorphismSequenceContainer &  outgroup,
const GeneticCode &  gc,
double  freqmin = 0.
)

Return the neutrality index NI = (Pa/Ps)/(Da/Ds) (Rand & Kann 1996, Mol. Biol. Evol. 13 pp735-748).

Returns -1 if Ps or Da is zero. Gaps and unresolved sites are automatically excluded.

ingroup  a PolymorphismSequenceContainer
outgroup  a PolymorphismSequenceContainer
gc  a GeneticCode
freqmin  a double; SNPs with a frequency strictly lower than freqmin are excluded
Sylvain Glémin

Definition at line 778 of file SequenceStatistics.cpp.
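
The NI formula and its -1 sentinel can be illustrated with a small standalone sketch. The helper name neutralityIndexFromTable is hypothetical and is not part of bpp-popgen; it only assumes the {Pa, Ps, Da, Ds} ordering documented for mkTable().

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a bpp-popgen function): computes NI from a
// McDonald-Kreitman table laid out as {Pa, Ps, Da, Ds}.
double neutralityIndexFromTable(const std::vector<unsigned int>& mk)
{
  if (mk.size() != 4)
    throw std::invalid_argument("expected {Pa, Ps, Da, Ds}");
  const double pa = mk[0], ps = mk[1], da = mk[2], ds = mk[3];
  // Mirror the documented sentinel: -1 when Ps or Da is zero.
  if (ps == 0. || da == 0.)
    return -1.;
  return (pa / ps) / (da / ds);
}
```

With Pa = 2, Ps = 4, Da = 1, Ds = 4 the ratio is (2/4)/(1/4) = 2, which is usually read as an excess of non-synonymous polymorphism relative to divergence.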

◆ numberOfMonoSitePolymorphicCodons()

unsigned int SequenceStatistics::numberOfMonoSitePolymorphicCodons ( const PolymorphismSequenceContainer &  psc,
bool  stopflag = true,
bool  gapflag = true
)

Compute the number of polymorphic codons with only one mutated site.

psc  a PolymorphismSequenceContainer
stopflag  a boolean, true by default; when true, stop codons and undefined sites are not taken into account
gapflag  a boolean, true by default; when true, gaps are not taken into account
Sylvain Glémin

Definition at line 586 of file SequenceStatistics.cpp.

◆ numberOfNonSynonymousSubstitutions()

unsigned int SequenceStatistics::numberOfNonSynonymousSubstitutions ( const PolymorphismSequenceContainer &  psc,
const GeneticCode &  gc,
double  freqmin = 0.
)

Compute the number of non-synonymous substitutions in an alignment.

Gaps and unresolved sites are automatically excluded.

For complex codons, the path that gives the minimum number of non-synonymous changes is chosen: the argument minchange=true is passed to numberOfSynonymousDifferences, which this method uses. Otherwise, a non-integer number could be returned.

psc  a PolymorphismSequenceContainer
gc  a GeneticCode
freqmin  a double; SNPs with a frequency strictly lower than freqmin are excluded

Definition at line 715 of file SequenceStatistics.cpp.

◆ numberOfParsimonyInformativeSites()

unsigned int SequenceStatistics::numberOfParsimonyInformativeSites ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true
)

Compute the number of parsimony informative sites in an alignment.

psc  a PolymorphismSequenceContainer
gapflag  a boolean, true by default; when true, gaps are not taken into account

Definition at line 92 of file SequenceStatistics.cpp.

◆ numberOfPolymorphicSites()

unsigned int SequenceStatistics::numberOfPolymorphicSites ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true,
bool  ignoreUnknown = true
)

Compute the number of polymorphic sites in an alignment.

The number of polymorphic sites is also known as the number of segregating sites $S$.

Gaps are considered mutations, so to count polymorphic sites without gaps, set the gapflag parameter to true.

psc  a PolymorphismSequenceContainer
gapflag  a boolean, true by default; when true, gaps are not taken into account
ignoreUnknown  a boolean, true by default; when true, unknown states are ignored

Definition at line 72 of file SequenceStatistics.cpp.
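
The effect of gapflag on the count of segregating sites can be sketched outside the library. The function below is a hypothetical stand-in operating on plain strings, not the bpp-popgen implementation; it assumes that gapflag = true skips any site containing a gap ('-'), and that with gapflag = false a gap counts as one more state.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: the alignment is a vector of equal-length strings.
unsigned int countPolymorphicSites(const std::vector<std::string>& aln,
                                   bool gapflag = true)
{
  if (aln.empty())
    return 0;
  unsigned int s = 0;
  for (std::size_t site = 0; site < aln[0].size(); ++site)
  {
    std::set<char> states;
    bool hasGap = false;
    for (const std::string& seq : aln)
    {
      if (seq[site] == '-')
        hasGap = true;
      states.insert(seq[site]);
    }
    if (gapflag && hasGap)
      continue;              // assumed meaning of gapflag = true
    if (states.size() > 1)
      ++s;                   // more than one state: segregating site
  }
  return s;
}
```

For the alignment {"ACGT-", "ACGTA", "ATGTA"}, only the second column is counted when gapflag is true; the last column is added when gapflag is false.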

◆ numberOfSingletons()

unsigned int SequenceStatistics::numberOfSingletons ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true
)

Count the number of singleton nucleotides in an alignment.

psc  a PolymorphismSequenceContainer
gapflag  a boolean, true by default; when true, gaps are not taken into account
Sylvain Gaillard

Definition at line 112 of file SequenceStatistics.cpp.

◆ numberOfSitesWithStopCodon()

unsigned int SequenceStatistics::numberOfSitesWithStopCodon ( const PolymorphismSequenceContainer &  psc,
const GeneticCode &  gCode,
bool  gapflag = true
)

Compute the number of codon sites with a stop codon.

psc  a PolymorphismSequenceContainer
gCode  the genetic code to use
gapflag  a boolean, true by default; when true, gaps are not taken into account
Sylvain Glémin

Definition at line 561 of file SequenceStatistics.cpp.

◆ numberOfSynonymousPolymorphicCodons()

unsigned int SequenceStatistics::numberOfSynonymousPolymorphicCodons ( const PolymorphismSequenceContainer &  psc,
const GeneticCode &  gc
)

Compute the number of synonymous polymorphic codon sites.

Gaps and unresolved sites are automatically excluded.

psc  a PolymorphismSequenceContainer
gc  a GeneticCode
Sylvain Glémin
Éric Bazin

Definition at line 609 of file SequenceStatistics.cpp.

◆ numberOfSynonymousSubstitutions()

unsigned int SequenceStatistics::numberOfSynonymousSubstitutions ( const PolymorphismSequenceContainer &  psc,
const GeneticCode &  gc,
double  freqmin = 0.
)

Compute the number of synonymous substitutions in an alignment.

Gaps and unresolved sites are automatically excluded.

For complex codons, the path that gives the minimum number of non-synonymous changes is chosen: the argument minchange=true is passed to numberOfSynonymousDifferences, which this method uses. Otherwise, a non-integer number could be returned.

psc  a PolymorphismSequenceContainer
gc  a GeneticCode
freqmin  a double; SNPs with a frequency strictly lower than freqmin are excluded

Definition at line 701 of file SequenceStatistics.cpp.

◆ numberOfTransitions()

unsigned int SequenceStatistics::numberOfTransitions ( const PolymorphismSequenceContainer &  psc)

Return the number of transitions.

psc  a PolymorphismSequenceContainer
Éric Bazin

Definition at line 459 of file SequenceStatistics.cpp.

◆ numberOfTransversions()

unsigned int SequenceStatistics::numberOfTransversions ( const PolymorphismSequenceContainer &  psc)

Return the number of transversions.

psc  a PolymorphismSequenceContainer
Éric Bazin

Definition at line 490 of file SequenceStatistics.cpp.

◆ numberOfTriplets()

unsigned int SequenceStatistics::numberOfTriplets ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true
)

Compute the number of triplets in an alignment.

psc  a PolymorphismSequenceContainer
gapflag  a boolean, true by default; when true, gaps are not taken into account
Sylvain Glémin

Definition at line 129 of file SequenceStatistics.cpp.

◆ originRegressionD()

double SequenceStatistics::originRegressionD ( const PolymorphismSequenceContainer &  psc,
bool  distance1 = false,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the slope a of the regression |D| = 1 + a*distance.

The slope is given in |D| per kb.

psc  a PolymorphismSequenceContainer
distance1  a boolean (true to use distance1, false to use distance2; false by default)
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites or the number of sequences is lower than 2
Sylvain Glémin

Definition at line 1334 of file SequenceStatistics.cpp.
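
A regression of the form y = 1 + a*distance has its intercept fixed at 1, so only the slope is estimated. The sketch below assumes ordinary least squares, which gives a = sum(x_i * (y_i - 1)) / sum(x_i^2); the function name is hypothetical and the library's actual fitting procedure is not shown in this page.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: least-squares slope of y = 1 + a * x
// (intercept constrained to 1). x holds distances, y holds |D| values.
double slopeWithUnitIntercept(const std::vector<double>& x,
                              const std::vector<double>& y)
{
  if (x.size() != y.size() || x.empty())
    throw std::invalid_argument("x and y must be non-empty and of equal size");
  double sxy = 0., sxx = 0.;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sxy += x[i] * (y[i] - 1.);
    sxx += x[i] * x[i];
  }
  if (sxx == 0.)
    throw std::invalid_argument("all distances are zero");
  return sxy / sxx;
}
```

If the distances are expressed in base pairs, multiplying the returned slope by 1000 expresses it per kb.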

◆ originRegressionDprime()

double SequenceStatistics::originRegressionDprime ( const PolymorphismSequenceContainer &  psc,
bool  distance1 = false,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the slope a of the regression |D'| = 1 + a*distance.

The slope is given in |D'| per kb.

psc  a PolymorphismSequenceContainer
distance1  a boolean (true to use distance1, false to use distance2; false by default)
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites or the number of sequences is lower than 2
Sylvain Glémin

Definition at line 1352 of file SequenceStatistics.cpp.

◆ originRegressionR2()

double SequenceStatistics::originRegressionR2 ( const PolymorphismSequenceContainer &  psc,
bool  distance1 = false,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the slope a of the regression R² = 1 + a*distance.

The slope is given in R² per kb.

psc  a PolymorphismSequenceContainer
distance1  a boolean (true to use distance1, false to use distance2; false by default)
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites or the number of sequences is lower than 2
Sylvain Glémin

Definition at line 1370 of file SequenceStatistics.cpp.

◆ pairwiseD()

Vdouble SequenceStatistics::pairwiseD ( const PolymorphismSequenceContainer &  psc,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the vector of all pairwise D values between two sites (Lewontin & Kojima 1960, Evolution 14 pp458-472).

psc  a PolymorphismSequenceContainer
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites or the number of sequences is lower than 2
Sylvain Glémin

Definition at line 1145 of file SequenceStatistics.cpp.
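
For one pair of biallelic sites, D is the haplotype frequency minus the product of the allele frequencies, D = p_AB - p_A * p_B. The following standalone sketch computes it from two alignment columns; the function name and the choice of reference alleles by the caller are assumptions made for illustration, not the library's interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: each string holds one character per sequence for a
// given site. a and b are the reference alleles at site 1 and site 2.
double ldD(const std::string& site1, const std::string& site2, char a, char b)
{
  if (site1.size() != site2.size() || site1.empty())
    throw std::invalid_argument("columns must be non-empty and of equal size");
  const double n = static_cast<double>(site1.size());
  double nA = 0., nB = 0., nAB = 0.;
  for (std::size_t i = 0; i < site1.size(); ++i)
  {
    const bool hasA = (site1[i] == a);
    const bool hasB = (site2[i] == b);
    if (hasA) nA += 1.;
    if (hasB) nB += 1.;
    if (hasA && hasB) nAB += 1.;
  }
  return nAB / n - (nA / n) * (nB / n);
}
```

Two perfectly associated columns such as "AAGG" and "CCTT" give D = 0.5 - 0.25 = 0.25, while "AAGG" and "CTCT" give D = 0.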

◆ pairwiseDistances1()

Vdouble SequenceStatistics::pairwiseDistances1 ( const PolymorphismSequenceContainer &  psc,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the vector of pairwise distances between site positions corresponding to an LD PolymorphismSequenceContainer.

Assumes that all sequences have the same length.

psc  a PolymorphismSequenceContainer
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites is lower than 2
Sylvain Glémin

Definition at line 1011 of file SequenceStatistics.cpp.
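
The idea of a vector of pairwise distances between site positions can be sketched as follows. The helper is hypothetical: it takes the positions of the retained sites directly, and its pair ordering (i < j, row by row) is an assumption that may not match the ordering used by the library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: distance between every pair of site positions.
std::vector<double> pairwisePositionDistances(const std::vector<std::size_t>& pos)
{
  if (pos.size() < 2)
    throw std::invalid_argument("at least two sites are required");
  std::vector<double> d;
  for (std::size_t i = 0; i + 1 < pos.size(); ++i)
  {
    for (std::size_t j = i + 1; j < pos.size(); ++j)
    {
      // Unsigned-safe absolute difference.
      const std::size_t gap = pos[j] > pos[i] ? pos[j] - pos[i] : pos[i] - pos[j];
      d.push_back(static_cast<double>(gap));
    }
  }
  return d;
}
```

Sites at positions 3, 10 and 25 yield the distances 7, 22 and 15.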

◆ pairwiseDistances2()

Vdouble SequenceStatistics::pairwiseDistances2 ( const PolymorphismSequenceContainer &  psc,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the vector of all mean pairwise distances between two sites, corresponding to an LD PolymorphismSequenceContainer.

Pairwise distances are computed for each sequence separately, excluding gaps. The mean is then taken over all the sequences.

psc  a PolymorphismSequenceContainer
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites is lower than 2
Sylvain Glémin

Definition at line 1067 of file SequenceStatistics.cpp.
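
One possible reading of "computed for each sequence separately, excluding gaps" is sketched below for a single pair of alignment columns. The helper is hypothetical and the counting convention (non-gap characters in the half-open interval (i, j]) is an assumption; the library may count differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: gap-free distance between columns i and j, computed
// per sequence and then averaged over sequences. '-' marks a gap.
double meanGapFreeDistance(const std::vector<std::string>& aln,
                           std::size_t i, std::size_t j)
{
  if (aln.empty() || i >= j)
    throw std::invalid_argument("need at least one sequence and i < j");
  double sum = 0.;
  for (const std::string& seq : aln)
  {
    if (j >= seq.size())
      throw std::out_of_range("site index beyond sequence end");
    std::size_t d = 0;
    for (std::size_t k = i + 1; k <= j; ++k)
      if (seq[k] != '-')
        ++d;                 // gaps do not contribute to the distance
    sum += static_cast<double>(d);
  }
  return sum / static_cast<double>(aln.size());
}
```

For the two sequences "ACGTA" and "A--TA", columns 0 and 3 are 3 residues apart in the first sequence and 1 residue apart in the second, giving a mean of 2.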

◆ pairwiseDprime()

Vdouble SequenceStatistics::pairwiseDprime ( const PolymorphismSequenceContainer &  psc,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the vector of all pairwise D' values between two sites (Lewontin 1964, Genetics 49 pp49-67).

psc  a PolymorphismSequenceContainer
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites or the number of sequences is lower than 2
Sylvain Glémin

Definition at line 1178 of file SequenceStatistics.cpp.
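
D' rescales D by the largest magnitude it can reach given the allele frequencies. The sketch below uses the usual definition, with D_max = min(p_A(1-p_B), (1-p_A)p_B) when D is positive and min(p_A p_B, (1-p_A)(1-p_B)) when D is negative. The function name, the signed return value and the 0 returned for a monomorphic site are assumptions, not documented library behaviour.

```cpp
#include <algorithm>

// Hypothetical helper: normalised disequilibrium D' = D / D_max.
double ldDprime(double d, double pA, double pB)
{
  const double dmax = (d >= 0.)
    ? std::min(pA * (1. - pB), (1. - pA) * pB)
    : std::min(pA * pB, (1. - pA) * (1. - pB));
  if (dmax == 0.)
    return 0.;               // assumption: report 0 when D' is undefined
  return d / dmax;
}
```

With p_A = p_B = 0.5, a D of 0.25 is the maximum attainable, so D' is 1.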

◆ pairwiseR2()

Vdouble SequenceStatistics::pairwiseR2 ( const PolymorphismSequenceContainer &  psc,
bool  keepsingleton = true,
double  freqmin = 0.
) throw (DimensionException)

Give the vector of all pairwise R² values between two sites (Hill & Robertson 1968, Theor. Appl. Genet., 38 pp226-231).

psc  a PolymorphismSequenceContainer
keepsingleton  a boolean (true by default; false to exclude singletons)
freqmin  a double (sites whose lowest allele frequency is below freqmin are excluded; 0 by default)
DimensionException  if the number of sites or the number of sequences is lower than 2
Sylvain Glémin

Definition at line 1234 of file SequenceStatistics.cpp.
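
R² for a pair of biallelic sites is the squared disequilibrium divided by the product of the four allele frequencies, R² = D² / (p_A (1-p_A) p_B (1-p_B)). A minimal standalone sketch, with a hypothetical function name and an exception for monomorphic sites that is an assumption:

```cpp
#include <stdexcept>

// Hypothetical helper: squared correlation between two biallelic sites.
double ldR2(double d, double pA, double pB)
{
  const double denom = pA * (1. - pA) * pB * (1. - pB);
  if (denom == 0.)
    throw std::domain_error("R2 is undefined when a site is monomorphic");
  return d * d / denom;
}
```

For D = 0.25 and p_A = p_B = 0.5 the denominator is 0.0625, so R² is 1.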

◆ piNonSynonymous()

double SequenceStatistics::piNonSynonymous ( const PolymorphismSequenceContainer &  psc,
const GeneticCode &  gc,
bool  minchange = false
)

Compute the non-synonymous nucleotide diversity, pi.

Gaps and unresolved sites are automatically excluded. If minchange = false (the default), the different paths are equally weighted. If minchange = true, the path with the minimum number of non-synonymous changes is chosen.

psc  a PolymorphismSequenceContainer
gc  a GeneticCode
minchange  a boolean, false by default
Sylvain Glémin
Éric Bazin

Definition at line 657 of file SequenceStatistics.cpp.

◆ piSynonymous()

double SequenceStatistics::piSynonymous ( const PolymorphismSequenceContainer &  psc,
const GeneticCode &  gc,
bool  minchange = false
)

Compute the synonymous nucleotide diversity, pi.

Gaps and unresolved sites are automatically excluded. If minchange = false (the default), the different paths are equally weighted. If minchange = true, the path with the minimum number of non-synonymous changes is chosen.

psc  a PolymorphismSequenceContainer
gc  a GeneticCode
minchange  a boolean, false by default
Sylvain Glémin
Éric Bazin

Definition at line 643 of file SequenceStatistics.cpp.

◆ ratioOfTransitionsTransversions()

double SequenceStatistics::ratioOfTransitionsTransversions ( const PolymorphismSequenceContainer &  psc)
throw (Exception)

Return the ratio of transitions/transversions.

psc  a PolymorphismSequenceContainer
Éric Bazin

Definition at line 521 of file SequenceStatistics.cpp.
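
The distinction behind the transition and transversion counts can be sketched in a few lines: a transition swaps two purines (A, G) or two pyrimidines (C, T), and any other change is a transversion. The helpers below are hypothetical and standalone. This page does not say when the Exception is thrown, so the zero-transversion guard is an assumption.

```cpp
#include <stdexcept>

bool isPurine(char c) { return c == 'A' || c == 'G'; }
bool isPyrimidine(char c) { return c == 'C' || c == 'T'; }

// True when a <-> b stays within purines or within pyrimidines.
bool isTransition(char a, char b)
{
  return a != b &&
         ((isPurine(a) && isPurine(b)) || (isPyrimidine(a) && isPyrimidine(b)));
}

// Hypothetical helper: ratio of two already computed counts.
double tsTvRatio(unsigned int nTs, unsigned int nTv)
{
  if (nTv == 0)  // assumed failure case: the ratio is undefined
    throw std::invalid_argument("no transversion: ratio undefined");
  return static_cast<double>(nTs) / static_cast<double>(nTv);
}
```

With 6 transitions and 3 transversions the ratio is 2.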

◆ rightHandHudson_()

double SequenceStatistics::rightHandHudson_ ( double  c,
size_t  n
)

Give the right-hand term of equation (4) in Hudson (1987, Genet. Res., 50 pp245-250). This term is used in hudson87.

Definition at line 1714 of file SequenceStatistics.cpp.

◆ squaredHeterozygosity()

double SequenceStatistics::squaredHeterozygosity ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true
)

Compute the sum of per site squared heterozygosity in an alignment.

psc  a PolymorphismSequenceContainer
gapflag  a boolean, true by default; when true, gaps are not taken into account

Definition at line 205 of file SequenceStatistics.cpp.

◆ tajima83()

double SequenceStatistics::tajima83 ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true
)

Compute the diversity estimator Theta of Tajima (1983, Genetics, 105 pp437-460).

\[ \hat{\theta}_\pi = \sum_{i=1}^{S} \left(1-\sum_{j=1}^{4} \frac{k_{j,i}\times\left(k_{j,i}-1\right)} {n_i\times\left(n_i-1\right)}\right) \qquad \textrm{with }k_{j,i}>0 \]

where $k_{j,i}$ is the count of the jth state at the ith site, $n_i$ the number of nucleotides at the ith site and $S$ the number of polymorphic sites.

psc  a PolymorphismSequenceContainer
gapflag  a flag, true by default; when true, gaps are not taken into account
Sylvain Gaillard

Definition at line 286 of file SequenceStatistics.cpp.
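
The estimator above can be evaluated directly from per-site state counts. The sketch below is a hypothetical standalone function working on a gap-free alignment stored as strings; it sums over all sites, which is equivalent to summing over the S polymorphic ones because a monomorphic site contributes 1 - 1 = 0.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: Tajima's theta_pi for equal-length, gap-free sequences.
double thetaPi(const std::vector<std::string>& aln)
{
  if (aln.size() < 2)
    return 0.;
  const double n = static_cast<double>(aln.size());
  double theta = 0.;
  for (std::size_t site = 0; site < aln[0].size(); ++site)
  {
    std::map<char, double> k;          // count of each state at this site
    for (const std::string& seq : aln)
      k[seq[site]] += 1.;
    double homozygosity = 0.;
    for (const auto& kv : k)
      homozygosity += kv.second * (kv.second - 1.) / (n * (n - 1.));
    theta += 1. - homozygosity;
  }
  return theta;
}
```

For the four sequences "AC", "AC", "AT", "GT" the two sites contribute 1/2 and 2/3, giving 7/6, which equals the mean number of pairwise differences (7 differences over 6 pairs).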

◆ tajimaDss()

double SequenceStatistics::tajimaDss ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true
) throw (ZeroDivisionException)

Return Tajima's D test statistic (Tajima 1989, Genetics 123 pp 585-595).

Calculation using the number of polymorphic (segregating) sites.

\[ D=\frac{\hat{\theta}_\pi-\hat{\theta}_S}{\sqrt{\textrm{V}\left(\hat{\theta}_\pi-\hat{\theta}_S\right)}} =\frac{\hat{\theta}_\pi-\hat{\theta}_S}{\sqrt{e_1S+e_2S(S-1)}} \]

psc  a PolymorphismSequenceContainer
gapflag  a flag, true by default; when true, gaps are not taken into account
ZeroDivisionException  if S == 0
Sylvain Gaillard

Definition at line 791 of file SequenceStatistics.cpp.
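
The constants e_1 and e_2 are not defined on this page. The sketch below fills them in with the standard definitions from Tajima (1989): a_1 and a_2 are the sums of 1/i and 1/i² for i from 1 to n-1, and e_1, e_2 are derived from them through b_1, b_2, c_1, c_2. The function is hypothetical and standalone; it takes n, S and theta_pi as plain numbers and signals S == 0 with a standard exception instead of ZeroDivisionException.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: Tajima's D from sample size n, number of segregating
// sites S and the diversity estimate thetaPi.
double tajimaD(std::size_t n, unsigned int S, double thetaPi)
{
  if (n < 2)
    throw std::invalid_argument("at least two sequences are required");
  if (S == 0)
    throw std::domain_error("S == 0: D is undefined");
  double a1 = 0., a2 = 0.;
  for (std::size_t i = 1; i < n; ++i)
  {
    const double di = static_cast<double>(i);
    a1 += 1. / di;
    a2 += 1. / (di * di);
  }
  const double dn = static_cast<double>(n);
  const double s = static_cast<double>(S);
  const double b1 = (dn + 1.) / (3. * (dn - 1.));
  const double b2 = 2. * (dn * dn + dn + 3.) / (9. * dn * (dn - 1.));
  const double c1 = b1 - 1. / a1;
  const double c2 = b2 - (dn + 2.) / (a1 * dn) + a2 / (a1 * a1);
  const double e1 = c1 / a1;
  const double e2 = c2 / (a1 * a1 + a2);
  // Numerator: theta_pi minus Watterson's theta_S = S / a1.
  return (thetaPi - s / a1) / std::sqrt(e1 * s + e2 * s * (s - 1.));
}
```

For n = 4, S = 2 and theta_pi = 7/6, the numerator is about 0.0758 and the standard deviation about 0.128, giving D of roughly 0.59.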

◆ tajimaDtnm()

double SequenceStatistics::tajimaDtnm ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true
) throw (ZeroDivisionException)

Return Tajima's D test statistic (Tajima 1989, Genetics 123 pp 585-595).

Calculation using the total number of mutations.

\[ D=\frac{\hat{\theta}_\pi-\frac{\eta}{a_1}}{\sqrt{e_1\eta+e_2\eta(\eta-1)}} \]

psc  a PolymorphismSequenceContainer
gapflag  a flag, true by default; when true, gaps are not taken into account
ZeroDivisionException  if eta == 0
Sylvain Gaillard

Definition at line 804 of file SequenceStatistics.cpp.

◆ testUsefulValues()

void SequenceStatistics::testUsefulValues ( std::ostream &  s,
size_t  n
)

Test useful values.

s  an ostream to which the values are written
n  the number of observed sequences
Sylvain Gaillard

Definition at line 1503 of file SequenceStatistics.cpp.

◆ totalNumberOfMutations()

unsigned int SequenceStatistics::totalNumberOfMutations ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true
)

Count the total number of mutations in an alignment.

This count assumes an infinite site model.

psc  a PolymorphismSequenceContainer
gapflag  a boolean, true by default; when true, gaps are not taken into account
Sylvain Gaillard

Definition at line 149 of file SequenceStatistics.cpp.

◆ totalNumberOfMutationsOnExternalBranches()

unsigned int SequenceStatistics::totalNumberOfMutationsOnExternalBranches ( const PolymorphismSequenceContainer &  ing,
const PolymorphismSequenceContainer &  outg
) throw (Exception)

Count the total number of mutations on external branches.

This is counted as the number of distinct singleton nucleotides in the ingroup that are not shared with the outgroup. A site is ignored if it contains more than one variant in the outgroup, or if it contains unresolved variants or gaps.

ing  a PolymorphismSequenceContainer, the ingroup alignment
outg  a PolymorphismSequenceContainer, the outgroup alignment
Exception  if ing and outg are not of the same size (number of sites)
Khalid Belkhir

Definition at line 166 of file SequenceStatistics.cpp.

◆ watterson75()

double SequenceStatistics::watterson75 ( const PolymorphismSequenceContainer &  psc,
bool  gapflag = true,
bool  ignoreUnknown = true
)

Compute the diversity estimator Theta of Watterson (1975, Theor Popul Biol, 7 pp256-276).

\[ \hat{\theta}_S=\frac{S}{a_1} \]

where $S$ is the number of polymorphic sites and $a_1$ is described in SequenceStatistics::getUsefulValues_().

psc  a PolymorphismSequenceContainer
gapflag  a flag, true by default; when true, gaps are not taken into account
ignoreUnknown  a boolean, true by default; when true, unknown states are ignored
Sylvain Gaillard

Definition at line 276 of file SequenceStatistics.cpp.
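
As a worked illustration of the formula, the sketch below assumes the standard definition of a_1 as the sum of 1/i for i from 1 to n-1, where n is the number of sequences. The function is hypothetical and takes S and n as plain numbers; it is not the library implementation.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: Watterson's theta_S = S / a1.
double wattersonTheta(unsigned int S, std::size_t n)
{
  if (n < 2)
    throw std::invalid_argument("at least two sequences are required");
  double a1 = 0.;
  for (std::size_t i = 1; i < n; ++i)
    a1 += 1. / static_cast<double>(i);   // harmonic number H(n-1)
  return static_cast<double>(S) / a1;
}
```

With 2 segregating sites in a sample of 4 sequences, a_1 = 1 + 1/2 + 1/3 and the estimate is about 1.09.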

◆ watterson75NonSynonymous()

double SequenceStatistics::watterson75NonSynonymous ( const PolymorphismSequenceContainer &  psc,
const GeneticCode &  gc
)

Compute the Watterson (1975, Theor Popul Biol, 7 pp256-276) estimator for non-synonymous positions.

Gaps and unresolved sites are automatically excluded.

For complex codons, the path that gives the minimum number of non-synonymous changes is chosen: the argument minchange=true is passed to numberOfSynonymousDifferences, which this method uses. Otherwise, a non-integer number could be returned.

psc  a PolymorphismSequenceContainer
gc  a GeneticCode
Sylvain Glémin

Definition at line 633 of file SequenceStatistics.cpp.

◆ watterson75Synonymous()

double SequenceStatistics::watterson75Synonymous ( const PolymorphismSequenceContainer &  psc,
const GeneticCode &  gc
)

Compute the Watterson (1975, Theor Popul Biol, 7 pp256-276) estimator for synonymous positions.

Gaps and unresolved sites are automatically excluded.

For complex codons, the path that gives the minimum number of non-synonymous changes is chosen: the argument minchange=true is passed to numberOfSynonymousDifferences, which this method uses. Otherwise, a non-integer number could be returned.

psc  a PolymorphismSequenceContainer
gc  a GeneticCode
Sylvain Glémin

Definition at line 623 of file SequenceStatistics.cpp.

The documentation for this class was generated from the following files: