bpp-seq  2.1.0
bpp::SiteTools Class Reference

Utility methods dealing with sites.

#include <Bpp/Seq/SiteTools.h>

Public Member Functions

 SiteTools ()
virtual ~SiteTools ()

Static Public Member Functions

static bool hasGap (const Site &site)
static bool isGapOnly (const Site &site)
static bool isGapOrUnresolvedOnly (const Site &site)
static bool hasUnknown (const Site &site)
static bool hasStopCodon (const Site &site)
static bool isComplete (const Site &site)
static bool isConstant (const Site &site, bool ignoreUnknown=false, bool unresolvedRaisesException=true) throw (EmptySiteException)
 Tell if a site is constant, that is displaying the same state in all sequences that do not present a gap.
static bool areSitesIdentical (const Site &site1, const Site &site2)
static double variabilityShannon (const Site &site, bool resolveUnknowns) throw (EmptySiteException)
 Compute the Shannon entropy index of a site.
static double variabilityFactorial (const Site &site) throw (EmptySiteException)
 Compute the factorial diversity index of a site.
static double mutualInformation (const Site &site1, const Site &site2, bool resolveUnknowns) throw (DimensionException,EmptySiteException)
 Compute the mutual information between two sites.
static double entropy (const Site &site, bool resolveUnknowns) throw (EmptySiteException)
 Compute the entropy of a site. This is an alias of method variabilityShannon.
static double jointEntropy (const Site &site1, const Site &site2, bool resolveUnknowns) throw (DimensionException,EmptySiteException)
 Compute the joint entropy between two sites.
static double heterozygosity (const Site &site) throw (EmptySiteException)
 Compute the heterozygosity index of a site.
static size_t getNumberOfDistinctCharacters (const Site &site) throw (EmptySiteException)
 Give the number of distinct characters at a site.
static bool hasSingleton (const Site &site) throw (EmptySiteException)
 Tell if a site has singletons.
static bool isParsimonyInformativeSite (const Site &site) throw (EmptySiteException)
 Tell if a site is a parsimony informative site.
static bool isTriplet (const Site &site) throw (EmptySiteException)
 Tell if a site has more than 2 distinct characters.
static void getCounts (const SymbolList &list, std::map< int, size_t > &counts)
 Count all states in the list.
static void getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, size_t > > &counts) throw (DimensionException)
 Count all pairs of states for two lists of the same size.
static void getCounts (const SymbolList &list, std::map< int, double > &counts, bool resolveUnknowns)
 Count all states in the list, optionally resolving unknown characters.
static void getCounts (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &counts, bool resolveUnknowns) throw (DimensionException)
 Count all pairs of states for two lists of the same size, optionally resolving unknown characters.
static void getFrequencies (const SymbolList &list, std::map< int, double > &frequencies, bool resolveUnknowns=false)
 Get all state frequencies in the list.
static void getFrequencies (const SymbolList &list1, const SymbolList &list2, std::map< int, std::map< int, double > > &frequencies, bool resolveUnknowns=false) throw (DimensionException)
 Get all state pair frequencies for two lists of the same size.
static double getGCContent (const SymbolList &list, bool ignoreUnresolved=true, bool ignoreGap=true) throw (AlphabetException)
 Get the GC content of a symbol list.
static size_t getNumberOfDistinctPositions (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)
 Get the number of distinct positions.
static size_t getNumberOfPositionsWithoutGap (const SymbolList &l1, const SymbolList &l2) throw (AlphabetMismatchException)
 Get the number of positions without gap.
static void changeGapsToUnknownCharacters (SymbolList &l)
 Change all gap elements to unknown characters.
static void changeUnresolvedCharactersToGaps (SymbolList &l)
 Change all unresolved characters to gap elements.

Detailed Description

Utility methods dealing with sites.

Definition at line 57 of file SiteTools.h.


Constructor & Destructor Documentation

bpp::SiteTools::SiteTools ( ) [inline]

Definition at line 61 of file SiteTools.h.

virtual bpp::SiteTools::~SiteTools ( ) [inline, virtual]

Definition at line 62 of file SiteTools.h.


Member Function Documentation

bool SiteTools::areSitesIdentical ( const Site &  site1,
const Site &  site2 
) [static]
Parameters:
site1  The first site.
site2  The second site.
Returns:
True if the two sites have the same content (and, of course, alphabet).

Definition at line 137 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::Alphabet::getAlphabetType(), and bpp::BasicSymbolList::size().

void SymbolListTools::changeGapsToUnknownCharacters ( SymbolList &  l) [static, inherited]

Change all gap elements to unknown characters.

Parameters:
l  The input list of characters.

Definition at line 180 of file SymbolListTools.cpp.

References bpp::SymbolList::getAlphabet(), bpp::Alphabet::getUnknownCharacterCode(), bpp::Alphabet::isGap(), and bpp::SymbolList::size().

void SymbolListTools::changeUnresolvedCharactersToGaps ( SymbolList &  l) [static, inherited]

Change all unresolved characters to gap elements.

Parameters:
l  The input list of characters.

Definition at line 189 of file SymbolListTools.cpp.

References bpp::SymbolList::getAlphabet(), bpp::Alphabet::getGapCharacterCode(), bpp::Alphabet::isUnresolved(), and bpp::SymbolList::size().

static double bpp::SiteTools::entropy ( const Site &  site,
bool  resolveUnknowns 
) throw (EmptySiteException) [inline, static]

Compute the entropy of a site. This is an alias of method variabilityShannon.

\[ I = - \sum_x f_x\cdot \ln(f_x) \]

where $f_x$ is the frequency of state $x$.

Author:
J. Dutheil
Parameters:
site  A site.
resolveUnknowns  Tell if unknown characters must be resolved.
Returns:
The Shannon entropy index of this site.
Exceptions:
EmptySiteException  If the site has size 0.

Definition at line 183 of file SiteTools.h.

References variabilityShannon().

static void bpp::SymbolListTools::getCounts ( const SymbolList &  list,
std::map< int, size_t > &  counts 
) [inline, static, inherited]

Count all states in the list.

Author:
J. Dutheil
Parameters:
list  The list.
counts  The output map to store the counts (existing counts will be incremented).

Definition at line 70 of file SymbolListTools.h.

References bpp::SymbolList::getContent().

Referenced by getNumberOfDistinctCharacters(), bpp::SequenceApplicationTools::getSitesToAnalyse(), isParsimonyInformativeSite(), and bpp::CodonSiteTools::numberOfNonSynonymousSubstitutions().

static void bpp::SymbolListTools::getCounts ( const SymbolList &  list1,
const SymbolList &  list2,
std::map< int, std::map< int, size_t > > &  counts 
) throw (DimensionException) [inline, static, inherited]

Count all pairs of states for two lists of the same size.

NB: The two lists do not need to share the same alphabet! The states of the first list will be used as the first index in the output, and the ones from the second list as the second index.

Author:
J. Dutheil
Parameters:
list1  The first list.
list2  The second list.
counts  The output map to store the counts (existing counts will be incremented).

Definition at line 90 of file SymbolListTools.h.

References bpp::SymbolList::size().
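
As a rough illustration of the pair counting described above, the standalone sketch below tallies pairs of characters taken position by position from two strings. It does not use Bio++: the function name countStatePairs, the use of char instead of integer state codes, and the std::invalid_argument exception (standing in for DimensionException) are all assumptions made for the example.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Bio++): count pairs of states observed
// at the same position in two lists of equal size.
// counts[a][b] is incremented each time state a (first list) is paired
// with state b (second list); existing entries are kept and incremented.
void countStatePairs(const std::string& list1,
                     const std::string& list2,
                     std::map<char, std::map<char, std::size_t> >& counts)
{
  if (list1.size() != list2.size())
    throw std::invalid_argument("lists differ in size");
  for (std::size_t i = 0; i < list1.size(); ++i)
    ++counts[list1[i]][list2[i]];
}
```

Because the first index comes from the first list and the second index from the second list, the two inputs may use different alphabets, as the note above points out.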

void SymbolListTools::getCounts ( const SymbolList &  list,
std::map< int, double > &  counts,
bool  resolveUnknowns 
) [static, inherited]

Count all states in the list, optionally resolving unknown characters.

For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.

Author:
J. Dutheil
Parameters:
list  The list.
counts  The output map to store the counts (existing counts will be incremented).
resolveUnknowns  Tell if unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.
Returns:
A map with all states and corresponding counts.

Definition at line 51 of file SymbolListTools.cpp.

References bpp::Alphabet::getAlias(), bpp::SymbolList::getAlphabet(), and bpp::SymbolList::getContent().
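
The fractional counting described above can be sketched without Bio++ as follows. This is only an illustration under stated assumptions, not the library's implementation: the function name countStates is hypothetical, states are plain characters rather than integer codes, and only the DNA character N is resolved.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of Bio++): count DNA states in a list,
// optionally spreading each 'N' evenly over A, C, G and T.
// Existing entries in `counts` are incremented, not reset.
void countStates(const std::string& list,
                 std::map<char, double>& counts,
                 bool resolveUnknowns)
{
  for (char c : list) {
    if (resolveUnknowns && c == 'N') {
      // One unknown contributes a quarter of a count to each base.
      for (char s : std::string("ACGT")) counts[s] += 0.25;
    } else {
      counts[c] += 1.0;
    }
  }
}
```

With resolveUnknowns set to true, the list "AANC" yields A=2.25, C=1.25, G=0.25 and T=0.25, and no entry is created for N.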

void SymbolListTools::getCounts ( const SymbolList &  list1,
const SymbolList &  list2,
std::map< int, std::map< int, double > > &  counts,
bool  resolveUnknowns 
) throw (DimensionException) [static, inherited]

Count all pairs of states for two lists of the same size, optionally resolving unknown characters.

For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.

NB: The two lists do not need to share the same alphabet! The states of the first list will be used as the first index in the output, and the ones from the second list as the second index.

Author:
J. Dutheil
Parameters:
list1  The first list.
list2  The second list.
counts  The output map to store the counts (existing counts will be incremented).
resolveUnknowns  Tell if unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.
Returns:
A map with all states and corresponding counts.

Definition at line 73 of file SymbolListTools.cpp.

void SymbolListTools::getFrequencies ( const SymbolList &  list,
std::map< int, double > &  frequencies,
bool  resolveUnknowns = false 
) [static, inherited]

Get all state frequencies in the list.

Author:
J. Dutheil
Parameters:
list  The list.
resolveUnknowns  Tell if unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.
frequencies  The output map with all states and corresponding frequencies. Existing frequencies will be erased if any.

Definition at line 96 of file SymbolListTools.cpp.

References bpp::SymbolList::size().

Referenced by bpp::CodonSiteTools::generateCodonSiteWithoutRareVariant(), bpp::SiteContainerTools::getConsensus(), bpp::SequenceApplicationTools::getSitesToAnalyse(), bpp::CodonSiteTools::meanNumberOfSynonymousPositions(), bpp::CodonSiteTools::piNonSynonymous(), bpp::CodonSiteTools::piSynonymous(), and bpp::SiteContainerTools::removeGapSites().

void SymbolListTools::getFrequencies ( const SymbolList &  list1,
const SymbolList &  list2,
std::map< int, std::map< int, double > > &  frequencies,
bool  resolveUnknowns = false 
) throw (DimensionException) [static, inherited]

Get all state pair frequencies for two lists of the same size.

Author:
J. Dutheil
Parameters:
list1  The first list.
list2  The second list.
resolveUnknowns  Tell if unknown characters must be resolved. For instance, in DNA, N will be counted as A=1/4,T=1/4,C=1/4,G=1/4.
frequencies  The output map with all state pairs and corresponding frequencies. Existing frequencies will be erased if any.

Definition at line 107 of file SymbolListTools.cpp.

double SymbolListTools::getGCContent ( const SymbolList &  list,
bool  ignoreUnresolved = true,
bool  ignoreGap = true 
) throw (AlphabetException) [static, inherited]

Get the GC content of a symbol list.

Parameters:
list  The list.
ignoreUnresolved  Do not count unresolved states. Otherwise, weight each state by its probability in case of ambiguity (e.g. the R state counts for 0.5).
ignoreGap  Do not count gaps in the total.
Returns:
The proportion of G and C states in the list.
Exceptions:
AlphabetException  If the list is not made of nucleotide states.

Definition at line 119 of file SymbolListTools.cpp.

size_t SiteTools::getNumberOfDistinctCharacters ( const Site &  site) throw (EmptySiteException) [static]

Give the number of distinct characters at a site.

Parameters:
site  A site.
Returns:
The number of distinct characters in the given site.

Definition at line 349 of file SiteTools.cpp.

References bpp::SymbolListTools::getCounts(), and isConstant().

Referenced by isTriplet(), and bpp::CodonSiteTools::numberOfSubsitutions().

size_t SymbolListTools::getNumberOfDistinctPositions ( const SymbolList &  l1,
const SymbolList &  l2 
) throw (AlphabetMismatchException) [static, inherited]

Get the number of distinct positions.

The comparison is performed from position 0 to the minimum size of the two lists.

Parameters:
l1  SymbolList 1.
l2  SymbolList 2.
Returns:
The number of distinct positions.
Exceptions:
AlphabetMismatchException  If the two lists do not have the same alphabet type.

Definition at line 158 of file SymbolListTools.cpp.

size_t SymbolListTools::getNumberOfPositionsWithoutGap ( const SymbolList &  l1,
const SymbolList &  l2 
) throw (AlphabetMismatchException) [static, inherited]

Get the number of positions without gap.

The comparison is performed from position 0 to the minimum size of the two lists.

Parameters:
l1  SymbolList 1.
l2  SymbolList 2.
Returns:
The number of positions without gap.
Exceptions:
AlphabetMismatchException  If the two lists do not have the same alphabet type.

Definition at line 169 of file SymbolListTools.cpp.
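
The two pairwise comparisons above both run from position 0 to the length of the shorter list. The standalone sketch below shows that loop on plain strings. The function names are hypothetical, no alphabet check is made, and the reading that a gap in either list disqualifies a position is an assumption, not something the description above confirms.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of Bio++). Both compare the two strings
// from position 0 up to the length of the shorter one.

// Number of positions at which the two strings differ.
std::size_t countDistinctPositions(const std::string& l1, const std::string& l2)
{
  const std::size_t n = std::min(l1.size(), l2.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (l1[i] != l2[i]) ++count;
  return count;
}

// Number of positions at which neither string has a gap ('-').
// Assumption: a gap in either list disqualifies the position.
std::size_t countPositionsWithoutGap(const std::string& l1, const std::string& l2)
{
  const std::size_t n = std::min(l1.size(), l2.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (l1[i] != '-' && l2[i] != '-') ++count;
  return count;
}
```

Positions beyond the end of the shorter string are never visited, so they contribute to neither count.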

bool SiteTools::hasSingleton ( const Site &  site) throw (EmptySiteException) [static]

Tell if a site has singletons.

Parameters:
site  A site.
Returns:
True if the site has singletons.

Definition at line 370 of file SiteTools.cpp.

References isConstant().

bool SiteTools::hasStopCodon ( const Site &  site) [static]
Parameters:
site  A site.
Returns:
True if the alphabet is a CodonAlphabet and the site contains a stop codon.

Definition at line 108 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::CodonAlphabet::isStop(), and bpp::BasicSymbolList::size().

Referenced by bpp::SiteContainerTools::removeStopCodonSites().

bool SiteTools::hasUnknown ( const Site &  site) [static]
Parameters:
site  A site.
Returns:
True if the site contains one or several unknown characters.

Definition at line 95 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::Alphabet::getUnknownCharacterCode(), and bpp::BasicSymbolList::size().

double SiteTools::heterozygosity ( const Site &  site) throw (EmptySiteException) [static]

Compute the heterozygosity index of a site.

\[ H = 1 - \sum_x f_x^2 \]

where $f_x$ is the frequency of state $x$.

Parameters:
site  A site.
Returns:
The heterozygosity index of this site.
Exceptions:
EmptySiteException  If the site has size 0.

Definition at line 335 of file SiteTools.cpp.

References bpp::MapTools::getValues().
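
The formula above can be evaluated directly from state counts. The following standalone sketch does so for a site given as a plain string. It does not call Bio++: the function name heterozygosityIndex is hypothetical, std::invalid_argument stands in for EmptySiteException, and every character (gaps included) is treated as an ordinary state, which may differ from the library.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Bio++): H = 1 - sum_x f_x^2,
// where f_x is the frequency of character x in the site.
double heterozygosityIndex(const std::string& site)
{
  if (site.empty())
    throw std::invalid_argument("empty site");

  std::map<char, std::size_t> counts;
  for (char c : site) ++counts[c];

  const double n = static_cast<double>(site.size());
  double sumSquares = 0.0;
  for (const auto& kv : counts) {
    const double f = static_cast<double>(kv.second) / n;
    sumSquares += f * f;
  }
  return 1.0 - sumSquares;
}
```

A constant site such as "AAAA" gives 0, and the site "AACC" gives 1 - (0.25 + 0.25) = 0.5.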

bool SiteTools::isComplete ( const Site &  site) [static]

bool SiteTools::isConstant ( const Site &  site,
bool  ignoreUnknown = false,
bool  unresolvedRaisesException = true 
) throw (EmptySiteException) [static]

Tell if a site is constant, that is displaying the same state in all sequences that do not present a gap.

Parameters:
site  A site.
ignoreUnknown  If true, positions with unknown characters will be ignored. Otherwise, a site with a single state plus any uncertain state will not be considered constant.
unresolvedRaisesException  In ambiguous cases (a gap-only site, for instance), throw an exception. Otherwise, return false.
Returns:
True if the site is made of only one state.
Exceptions:
EmptySiteException  If the site has size 0, or if the site cannot be resolved (for instance, if it is made of gaps only) and unresolvedRaisesException is set to true.

Definition at line 157 of file SiteTools.cpp.

Referenced by bpp::CodonSiteTools::fixedDifferences(), bpp::CodonSiteTools::generateCodonSiteWithoutRareVariant(), getNumberOfDistinctCharacters(), hasSingleton(), bpp::CodonSiteTools::isFourFoldDegenerated(), bpp::CodonSiteTools::isMonoSitePolymorphic(), isParsimonyInformativeSite(), bpp::CodonSiteTools::isSynonymousPolymorphic(), bpp::CodonSiteTools::numberOfNonSynonymousSubstitutions(), bpp::CodonSiteTools::numberOfSubsitutions(), bpp::CodonSiteTools::piNonSynonymous(), and bpp::CodonSiteTools::piSynonymous().
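
The interplay of the two flags is easier to follow in code. The sketch below mimics the documented behaviour on a plain DNA string, with '-' as the gap and 'N' as the unknown character. It is an assumption-laden illustration rather than the library's code: the function name isConstantColumn is hypothetical, std::invalid_argument stands in for EmptySiteException, and the treatment of a site made only of unknown characters is a guess.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Bio++): is a DNA column constant?
// '-' is a gap and 'N' the unknown character.
bool isConstantColumn(const std::string& column,
                      bool ignoreUnknown = false,
                      bool unresolvedRaisesException = true)
{
  if (column.empty())
    throw std::invalid_argument("empty site");

  char reference = '\0';  // first informative state seen so far
  for (char c : column) {
    if (c == '-') continue;                   // gaps never break constancy
    if (c == 'N' && ignoreUnknown) continue;  // optionally skip unknowns
    if (reference == '\0') reference = c;
    else if (c != reference) return false;
  }
  if (reference == '\0') {                    // nothing informative was found
    if (unresolvedRaisesException)
      throw std::invalid_argument("site cannot be resolved");
    return false;
  }
  return true;
}
```

With the defaults, "AA-A" is constant and "AAN" is not. Passing ignoreUnknown = true makes "AAN" constant, and a gap-only column either throws or returns false depending on unresolvedRaisesException.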

bool SiteTools::isGapOnly ( const Site &  site) [static]
Parameters:
site  A site.
Returns:
True if the site contains only gaps.

Definition at line 69 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::Alphabet::isGap(), and bpp::BasicSymbolList::size().

Referenced by bpp::SiteContainerTools::removeGapOnlySites(), and bpp::SiteContainerTools::removeGapOrUnresolvedOnlySites().

bool SiteTools::isGapOrUnresolvedOnly ( const Site &  site) [static]
Parameters:
site  A site.
Returns:
True if the site contains only gaps or unresolved characters.

Definition at line 82 of file SiteTools.cpp.

References bpp::BasicSymbolList::getAlphabet(), bpp::Alphabet::isGap(), bpp::Alphabet::isUnresolved(), and bpp::BasicSymbolList::size().

Referenced by bpp::SiteContainerTools::removeGapOrUnresolvedOnlySites().

bool SiteTools::isParsimonyInformativeSite ( const Site &  site) throw (EmptySiteException) [static]

Tell if a site is a parsimony informative site.

At least two distinct characters must be present.

Parameters:
site  A site.
Returns:
True if the site is parsimony informative.

Definition at line 390 of file SiteTools.cpp.

References bpp::SymbolListTools::getCounts(), and isConstant().
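
The description above only states that at least two distinct characters must be present. The usual textbook criterion is stricter: at least two states must each occur in at least two sequences. The sketch below implements that textbook criterion on a plain string. Whether isParsimonyInformativeSite applies exactly this rule, and how it treats gaps, is not stated here, so the rule, the gap handling and the function name are all assumptions.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of Bio++): textbook parsimony-informative
// test. Gaps ('-') are skipped, which is an assumption of this sketch.
bool isParsimonyInformative(const std::string& column)
{
  std::map<char, std::size_t> counts;
  for (char c : column)
    if (c != '-') ++counts[c];

  std::size_t sharedStates = 0;  // states seen in at least two sequences
  for (const auto& kv : counts)
    if (kv.second >= 2) ++sharedStates;
  return sharedStates >= 2;
}
```

Under this criterion "AACC" is informative, while "AAAC" and "ACGT" are not, because in both of them fewer than two states are shared by two or more sequences.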

bool SiteTools::isTriplet ( const Site &  site) throw (EmptySiteException) [static]

Tell if a site has more than 2 distinct characters.

Parameters:
site  A site.
Returns:
True if the site has more than 2 distinct characters.

Definition at line 413 of file SiteTools.cpp.

References getNumberOfDistinctCharacters().

double SiteTools::jointEntropy ( const Site &  site1,
const Site &  site2,
bool  resolveUnknowns 
) throw (DimensionException,EmptySiteException) [static]

Compute the joint entropy between two sites.

\[ H_{i,j} = - \sum_x \sum_y p_{x,y}\ln\left(p_{x,y}\right) \]

where $p_{x,y}$ is the frequency of the pair $(x,y)$.

Author:
J. Dutheil
Parameters:
site1  First site.
site2  Second site.
resolveUnknowns  Tell if unknown characters must be resolved.
Returns:
The joint entropy for the pair of sites.
Exceptions:
DimensionException  If the sites do not have the same length.
EmptySiteException  If the sites have size 0.

Definition at line 285 of file SiteTools.cpp.

double SiteTools::mutualInformation ( const Site &  site1,
const Site &  site2,
bool  resolveUnknowns 
) throw (DimensionException,EmptySiteException) [static]

Compute the mutual information between two sites.

\[ MI = \sum_x \sum_y p_{x,y}\ln\left(\frac{p_{x,y}}{p_x \cdot p_y}\right) \]

where $p_x$ and $p_y$ are the frequencies of states $x$ and $y$, and $p_{x,y}$ is the frequency of the pair $(x,y)$.

Author:
J. Dutheil
Parameters:
site1  First site.
site2  Second site.
resolveUnknowns  Tell if unknown characters must be resolved.
Returns:
The mutual information for the pair of sites.
Exceptions:
DimensionException  If the sites do not have the same length.
EmptySiteException  If the sites have size 0.

Definition at line 238 of file SiteTools.cpp.
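
To make the formula above concrete, the standalone sketch below evaluates it for two sites given as plain strings of equal length. It is not the Bio++ code: the function name mutualInformationNats is hypothetical, std::invalid_argument stands in for DimensionException and EmptySiteException, and unknown characters and gaps are treated as ordinary states (no resolveUnknowns handling).

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper (not part of Bio++): mutual information, in nats,
// between two sites of equal length.
double mutualInformationNats(const std::string& site1, const std::string& site2)
{
  if (site1.size() != site2.size())
    throw std::invalid_argument("sites differ in length");
  if (site1.empty())
    throw std::invalid_argument("empty site");

  // Marginal and joint frequencies, accumulated position by position.
  std::map<char, double> px, py;
  std::map<std::pair<char, char>, double> pxy;
  const double w = 1.0 / static_cast<double>(site1.size());
  for (std::size_t i = 0; i < site1.size(); ++i) {
    px[site1[i]] += w;
    py[site2[i]] += w;
    pxy[std::make_pair(site1[i], site2[i])] += w;
  }

  // Sum over observed pairs only; unobserved pairs contribute nothing.
  double mi = 0.0;
  for (const auto& kv : pxy) {
    const double p = kv.second;
    mi += p * std::log(p / (px[kv.first.first] * py[kv.first.second]));
  }
  return mi;
}
```

Two perfectly linked sites such as "AACC" and "GGTT" give ln 2, while "AACC" and "GTGT", whose states are paired independently, give 0.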

double SiteTools::variabilityFactorial ( const Site &  site) throw (EmptySiteException) [static]

Compute the factorial diversity index of a site.

\[ F = \frac{\log\left(\left(\sum_x p_x\right)!\right)}{\sum_x \log\left(p_x!\right)} \]

where $p_x$ is the number of times state $x$ is observed in the site.

Author:
J. Dutheil
Parameters:
site  A site.
Returns:
The factorial diversity index of this site.
Exceptions:
EmptySiteException  If the site has size 0.

Definition at line 320 of file SiteTools.cpp.

References bpp::NumTools::fact(), bpp::VectorTools::fact(), bpp::MapTools::getValues(), and bpp::VectorTools::sum().

double SiteTools::variabilityShannon ( const Site &  site,
bool  resolveUnknowns 
) throw (EmptySiteException) [static]

Compute the Shannon entropy index of a site.

\[ I = - \sum_x f_x\cdot \ln(f_x) \]

where $f_x$ is the frequency of state $x$.

Author:
J. Dutheil
Parameters:
site  A site.
resolveUnknowns  Tell if unknown characters must be resolved.
Returns:
The Shannon entropy index of this site.
Exceptions:
EmptySiteException  If the site has size 0.

Definition at line 218 of file SiteTools.cpp.

Referenced by entropy().
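
As an illustration of the Shannon index defined above, the following standalone sketch computes it for a site given as a plain string. It does not use Bio++: the function name shannonEntropy is hypothetical, std::invalid_argument stands in for EmptySiteException, and gaps and unknown characters are counted as ordinary states, which may differ from the library's handling.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Bio++): Shannon entropy, in nats,
// of the characters observed at one site.
double shannonEntropy(const std::string& site)
{
  if (site.empty())
    throw std::invalid_argument("empty site");

  std::map<char, std::size_t> counts;
  for (char c : site) ++counts[c];

  const double n = static_cast<double>(site.size());
  double entropy = 0.0;
  for (const auto& kv : counts) {
    const double f = static_cast<double>(kv.second) / n;  // frequency f_x
    entropy -= f * std::log(f);
  }
  return entropy;
}
```

A constant site has entropy 0, and a site split evenly between two states, such as "AACC", has entropy ln 2.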


The documentation for this class was generated from the following files: