List of methods

From Bio++ Wiki

Here is a list of methods available in the Bio++ libraries, with the corresponding class or function names, links and references.

Sequence analysis

Data structures

Method Class(es) / Function(s) Library References / Links / Notes
Simple sequence data structure BasicSequence bpp-seq
Sequence with annotations SequenceWithAnnotation bpp-seq
Sequence with quality scores SequenceWithQuality bpp-seq
Simple container of sequences VectorSequenceContainer bpp-seq Provides access and edit by id and by index
Alignment container, optimized for sequence access AlignedSequenceContainer bpp-seq Sequence access is <math>O(1)</math>, site access is <math>O(n)</math>, where n is the number of sequences.
Alignment container, optimized for site access VectorSiteContainer bpp-seq Sequence access is <math>O(l)</math>, site access is <math>O(1)</math>, where l is the number of sites.
Alignment container, optimized for memory usage CompressedVectorSiteContainer bpp-seq Same efficiency as VectorSiteContainer, yet with reduced memory footprint. Sequence edition is not possible, and meta information such as original position or any inherited attribute is lost.
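
The access costs quoted above follow from the storage layout. The sketch below is a minimal illustration and does not use the Bio++ classes: RowMajorAlignment and ColumnMajorAlignment are hypothetical types that store one string per sequence and one string per site, respectively. The cheap access returns a reference, while the other access has to visit every stored string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row-major alignment: one string per sequence.
struct RowMajorAlignment {
  std::vector<std::string> rows;

  // Sequence access: a reference to the stored row, no copy.
  const std::string& sequence(std::size_t i) const { return rows.at(i); }

  // Site access: one character is read from each of the n sequences.
  std::string site(std::size_t j) const {
    std::string column;
    column.reserve(rows.size());
    for (const std::string& row : rows) column.push_back(row.at(j));
    return column;
  }
};

// Hypothetical column-major alignment: one string per site.
struct ColumnMajorAlignment {
  std::vector<std::string> columns;

  // Site access: a reference to the stored column, no copy.
  const std::string& site(std::size_t j) const { return columns.at(j); }

  // Sequence access: one character is read from each of the l sites.
  std::string sequence(std::size_t i) const {
    std::string row;
    row.reserve(columns.size());
    for (const std::string& column : columns) row.push_back(column.at(i));
    return row;
  }
};

// Convert a row-major alignment into a column-major one.
ColumnMajorAlignment transpose(const RowMajorAlignment& aln) {
  ColumnMajorAlignment out;
  if (aln.rows.empty()) return out;
  const std::size_t length = aln.rows.front().size();
  for (const std::string& row : aln.rows) assert(row.size() == length);
  for (std::size_t j = 0; j < length; ++j) out.columns.push_back(aln.site(j));
  return out;
}
```

Which layout is preferable depends on the dominant access pattern: per-site computations favour the column-major layout, while sequence editing favours the row-major one.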

Methods on sequences / sites

Method Class(es) / Function(s) Library Site or sequence? References / Links / Notes
Sequence manipulations bpp::SequenceTools bpp-seq Sequences Get a subsequence, concatenate sequences, etc.
In silico molecular biology bpp::SequenceTools bpp-seq Sequences Complement, reverse, transcribe, reverse-transcribe, revert, etc.
Remove or replace stop codons bpp::SequenceTools bpp-seq Sequences
Remove or replace gaps and unresolved characters bpp::SequenceTools, bpp::SiteTools, bpp::SymbolListTools bpp-seq Both
Get percent identity bpp::SequenceTools::getPercentIdentity bpp-seq Sequences
Count characters, compute GC content bpp::SequenceTools, bpp::SiteTools, bpp::SymbolListTools bpp-seq Both
Tell if a site has gaps, unresolved characters, stop codon or is constant, has a singleton, etc. bpp::SiteTools bpp-seq Sites
Tell if a site is parsimony informative bpp::SiteTools::isParsimonyInformativeSite bpp-seq Sites
Compute heterozygosity bpp::SiteTools::heterozygosity bpp-seq Sites
Compute Shannon entropy and factorial variability bpp::SiteTools::variabilityShannon, bpp::SiteTools::variabilityFactorial bpp-seq Sites
Compute joint entropy and mutual information bpp::SiteTools::jointEntropy, bpp::SiteTools::mutualInformation bpp-seq Sites Takes two sites as input
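
To make two of the per-site statistics above concrete, here is a minimal standalone sketch. It is not the bpp::SiteTools implementation. A site is assumed to be a plain string with one character per sequence, every character (gaps included) is counted as a state, and the entropy uses the natural logarithm. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>

// Count how often each state occurs in a site (alignment column).
std::map<char, std::size_t> countStates(const std::string& site) {
  std::map<char, std::size_t> counts;
  for (char state : site) ++counts[state];
  return counts;
}

// Shannon entropy H = -sum_x p(x) ln p(x) over the states observed in the site.
double shannonEntropy(const std::string& site) {
  assert(!site.empty());
  const double n = static_cast<double>(site.size());
  double entropy = 0.0;
  for (const auto& entry : countStates(site)) {
    const double p = static_cast<double>(entry.second) / n;
    entropy -= p * std::log(p);
  }
  return entropy;
}

// A site is parsimony informative if at least two states each occur
// in at least two sequences.
bool isParsimonyInformative(const std::string& site) {
  std::size_t frequentStates = 0;
  for (const auto& entry : countStates(site)) {
    if (entry.second >= 2) ++frequentStates;
  }
  return frequentStates >= 2;
}
```

A caller that wants gaps or unresolved characters ignored would have to filter them out of the string before calling these functions.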

Methods on containers

Method Class(es) / Function(s) Library Sequences aligned? References / Links / Notes
Append containers bpp::SequenceContainerTools::append bpp-seq no
Merge containers bpp::SequenceContainerTools::merge bpp-seq no
Convert container type / alphabet, extract specific positions / sequences bpp::SequenceContainerTools bpp-seq no Several methods available, check the API documentation for details
Compute frequencies, GC content, etc. bpp::SequenceContainerTools::getCounts bpp-seq no Several methods available, check API documentation of the class.
Filter sites according to gap and unresolved character content bpp::SiteContainerTools bpp-seq yes Several methods available, check API documentation of the class.
Get consensus sequence bpp::SiteContainerTools::getConsensus bpp-seq yes
Sample or bootstrap sites bpp::SiteContainerTools::sampleSites, bpp::SiteContainerTools::bootstrapSites bpp-seq yes
Compute similarity matrix bpp::SiteContainerTools::computeSimilarityMatrix bpp-seq yes
Compare alignments bpp::SiteContainerTools::getColumnScores and bpp::SiteContainerTools::getSumOfPairsScores bpp-seq yes These scores are used in the Heads-or-Tails or Guidance methods to assess alignment reliability.
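
As an illustration of two operations on aligned data listed above, the following sketch computes a majority-rule consensus and draws a bootstrap sample of sites. It is a standalone example, not the bpp::SiteContainerTools code. The alignment is assumed to be a vector of column strings, ties in the consensus are broken in favour of the smallest character, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Most frequent state of one site; ties go to the smallest character.
char majorityState(const std::string& site) {
  assert(!site.empty());
  std::map<char, std::size_t> counts;
  for (char state : site) ++counts[state];
  char best = site[0];
  std::size_t bestCount = 0;
  for (const auto& entry : counts) {
    if (entry.second > bestCount) {
      best = entry.first;
      bestCount = entry.second;
    }
  }
  return best;
}

// Majority-rule consensus: one state per site.
std::string consensus(const std::vector<std::string>& sites) {
  std::string result;
  result.reserve(sites.size());
  for (const std::string& site : sites) result.push_back(majorityState(site));
  return result;
}

// Bootstrap: draw as many sites as the alignment has, with replacement.
std::vector<std::string> bootstrapSites(const std::vector<std::string>& sites,
                                        std::mt19937& rng) {
  assert(!sites.empty());
  std::uniform_int_distribution<std::size_t> pick(0, sites.size() - 1);
  std::vector<std::string> sample;
  sample.reserve(sites.size());
  for (std::size_t i = 0; i < sites.size(); ++i) {
    sample.push_back(sites[pick(rng)]);
  }
  return sample;
}
```

Passing the random generator explicitly keeps the resampling reproducible when a fixed seed is used.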

File formats

Supported formats are listed here with their corresponding parser classes. For each format, the table indicates whether reading and/or writing is implemented. The streaming option indicates that the parser also implements an iterator function, so that it is possible to loop over all sequences without storing them all in memory.
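
The streaming idea can be sketched as follows for a Fasta-like input. This is a minimal standalone illustration, not the Bio++ Fasta parser: FastaRecord and FastaStream are hypothetical names, and only one record is held in memory at a time.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// One sequence read from the stream (hypothetical type).
struct FastaRecord {
  std::string name;
  std::string residues;
};

// Reads records one at a time from any std::istream.
class FastaStream {
public:
  explicit FastaStream(std::istream& in) : in_(in) {}

  // Fill `rec` with the next record; return false when the input is exhausted.
  bool next(FastaRecord& rec) {
    std::string line;
    if (pendingHeader_.empty()) {
      // Skip anything before the next header line.
      while (std::getline(in_, line)) {
        if (!line.empty() && line[0] == '>') {
          pendingHeader_ = line;
          break;
        }
      }
      if (pendingHeader_.empty()) return false;
    }
    assert(!pendingHeader_.empty());
    rec.name = pendingHeader_.substr(1);
    rec.residues.clear();
    pendingHeader_.clear();
    // Accumulate sequence lines until the next header or end of input.
    while (std::getline(in_, line)) {
      if (!line.empty() && line[0] == '>') {
        pendingHeader_ = line;  // Remember it for the following call.
        break;
      }
      rec.residues += line;
    }
    return true;
  }

private:
  std::istream& in_;
  std::string pendingHeader_;
};
```

A typical loop is `FastaRecord rec; while (stream.next(rec)) { ... }`, which processes arbitrarily large files with memory proportional to the longest sequence.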

Format Class Library Reading Writing Streaming References / Links / Notes
Fasta Fasta bpp-seq yes yes yes
Mase Mase bpp-seq yes yes no
Clustal Clustal bpp-seq yes yes no
Phylip sequential Phylip bpp-seq yes yes no
Phylip interleaved Phylip bpp-seq yes yes no
Phylip sequential, extended (for PAML and PhyML) Phylip bpp-seq yes yes no
Phylip interleaved, extended (for PAML and PhyML) Phylip bpp-seq yes yes no
Nexus NexusIOSequence bpp-seq yes no no
GenBank GenBank bpp-seq yes no no Only raw sequences are imported, annotations are ignored.
DCSE DCSE bpp-seq yes no no Format used by the Dedicated Comparative Sequence Editor, which could encode RNA secondary structure. Does not seem to be maintained anymore?
Stockholm Stockholm bpp-seq no yes no Contains structure information, although the current parser does not support this. This is the format used notably by PFam and RFam.
FastQ Fastq bpp-seq-omics yes yes yes Quality scores are also imported. The parser returns SequenceWithQuality objects.
Multiple Alignment Format (MAF) MafParser / OutputMafIterator bpp-seq-omics yes yes yes Streaming is performed on alignment blocks. Meta information, including quality scores are supported. Uses specific classes for alignment blocks and sequences.

Phylogenetics

Method Class(es) / Function(s) Library Reference Links
Neighbor Joining NeighborJoining bpp-phyl Saitou and Nei (1987)
BioNJ BioNJ bpp-phyl Gascuel (1997)
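
For readers unfamiliar with distance methods, the sketch below shows a single agglomeration step of the classical neighbor-joining algorithm: choosing the pair that minimizes the Q criterion and computing the two new branch lengths. It is a standalone illustration, not the NeighborJoining class, and it does not cover the BioNJ variance weighting. NjJoin and selectNeighbors are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Result of one neighbor-joining step (hypothetical type).
struct NjJoin {
  std::size_t i;
  std::size_t j;
  double lengthI;  // Branch length from taxon i to the new node.
  double lengthJ;  // Branch length from taxon j to the new node.
};

// Pick the pair minimizing Q(i,j) = (n-2) d(i,j) - r_i - r_j,
// where r_i is the sum of distances from taxon i to all other taxa.
NjJoin selectNeighbors(const std::vector<std::vector<double>>& d) {
  const std::size_t n = d.size();
  assert(n >= 3);
  std::vector<double> r(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    assert(d[i].size() == n);
    for (std::size_t j = 0; j < n; ++j) r[i] += d[i][j];
  }
  const double m = static_cast<double>(n - 2);
  NjJoin best{0, 1, 0.0, 0.0};
  double bestQ = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      const double q = m * d[i][j] - r[i] - r[j];
      if (q < bestQ) {
        bestQ = q;
        best.i = i;
        best.j = j;
      }
    }
  }
  // Branch lengths from the joined taxa to their new parent node.
  best.lengthI = 0.5 * d[best.i][best.j] + (r[best.i] - r[best.j]) / (2.0 * m);
  best.lengthJ = d[best.i][best.j] - best.lengthI;
  return best;
}
```

A full implementation would then replace the two joined taxa by the new node, update the distance matrix and repeat until three taxa remain.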

Substitution models

These models can be used for pairwise distance estimation, likelihood estimation, sequence simulation, ancestral sequence reconstruction, etc.
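
As a worked example of pairwise distance estimation, the simplest nucleotide model (JC69) gives a closed-form distance d = -3/4 ln(1 - 4p/3), where p is the observed proportion of differing sites. The sketch below is standalone and does not use the bpp-phyl classes. The function names are hypothetical, and gaps are treated like any other character.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Observed proportion of differing positions between two aligned sequences.
double proportionDifferent(const std::string& a, const std::string& b) {
  assert(a.size() == b.size() && !a.empty());
  std::size_t differences = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) ++differences;
  }
  return static_cast<double>(differences) / static_cast<double>(a.size());
}

// JC69 distance (expected substitutions per site) from the proportion p.
// The formula is only defined for p < 3/4.
double jc69Distance(double p) {
  assert(p >= 0.0 && p < 0.75);
  return -0.75 * std::log(1.0 - 4.0 * p / 3.0);
}

// Inverse relation: expected proportion of differing sites at distance d.
double jc69ExpectedDifference(double d) {
  return 0.75 * (1.0 - std::exp(-4.0 * d / 3.0));
}
```

The correction matters for divergent sequences: as p approaches 3/4 the estimated distance grows without bound, reflecting multiple substitutions at the same site.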


Substitution models for nucleotides

Model Class(es) / Function(s) Library Comment Reference Links
Jukes-Cantor model for nucleotides JC69 bpp-phyl Jukes & Cantor (1969), Evolution of protein molecules, 121-123 in Mammalian protein metabolism
Kimura 1980 K80 bpp-phyl Kimura (1980)
Felsenstein 1984 F84 bpp-phyl Felsenstein (1984), Phylip version 2.6
Hasegawa, Kishino & Yano 1985 HKY85 bpp-phyl Hasegawa et al. (1985)
Tamura 92 T92 bpp-phyl for strong transition-transversion and G+C content biases Tamura (1992)
Tamura & Nei 1993 TN93 bpp-phyl Tamura & Nei (1993)
General Time-Reversible substitution model GTR bpp-phyl Yang (1994)
Lobry 1995 L95 bpp-phyl No-strand bias Lobry (1995)
Rzhetsky & Nei 1995 RN95 bpp-phyl Rzhetsky and Nei (1995)
Strand symmetric reversible model SSR bpp-phyl Hobolth et al. (2007)

Substitution models for proteins

Model Class(es) / Function(s) Library Comment Reference Links
Jukes-Cantor model for proteins JC69 bpp-phyl Jukes & Cantor (1969), Evolution of protein molecules, 121-123 in Mammalian protein metabolism
Dayhoff, Schwartz & Orcutt DSO78 bpp-phyl Kosiol & Goldman (2005)
Jones, Taylor & Thornton 1992 JTT92 bpp-phyl Jones et al. (1992)
Whelan & Goldman 2001 WAG01 bpp-phyl Whelan & Goldman (2001)
CAT model bpp-phyl See LLG08 models Lartillot & Philippe (2004); Le et al. (2008)
Le & Gascuel 2008 LG08 bpp-phyl mixture substitution model for proteins Le et al. (2008)
EX2 model LLG08_EX2 bpp-phyl mixture model: buried/exposed sites Le et al. (2008)
EX3 model LLG08_EX3 bpp-phyl mixture model: buried/intermediate/highly exposed sites Le et al. (2008)
EHO model LLG08_EHO bpp-phyl mixture model: helix/elongated/other sites Le et al. (2008)
UL2 model LLG08_UL2 bpp-phyl mixture of 2 models built by unsupervised method Le et al. (2008)
UL3 model LLG08_UL3 bpp-phyl mixture of 3 models, Q1, Q2, Q3 built by unsupervised method Le et al. (2008)

Substitution models for codons

Model Class(es) / Function(s) Library Comment Reference Links
Goldman & Yang 1994 GY94 bpp-phyl uses biochemical distances between residues Goldman & Yang (1994)
Muse & Gaut 1994 MG94 bpp-phyl Muse & Gaut (1994)
Yang & Nielsen 1998 YN98 bpp-phyl Yang & Nielsen (1998)
M0 model YNGKP_M1 bpp-phyl homogeneous YN98 class model Yang et al. (2000)
M1 model YNGKP_M1 bpp-phyl mixture of YN98 class models Yang et al. (2000)
M2 model YNGKP_M2 bpp-phyl mixture of YN98 class models Yang et al. (2000)
M3 model YNGKP_M3 bpp-phyl mixture of YN98 class models Yang et al. (2000)
M7 model YNGKP_M7 bpp-phyl mixture of YN98 class models Yang et al. (2000)
M8 model YNGKP_M8 bpp-phyl mixture of YN98 class models Yang et al. (2000)

Covarion models

Model Class(es) / Function(s) Library Comment Reference Links
Tuffley & Steel 1998 TS98 bpp-phyl Tuffley & Steel (1998)
Galtier 2001 G2001 bpp-phyl Galtier (2001)

Miscellaneous

Model Class(es) / Function(s) Library Comment Reference Links
YpR YpR bpp-phyl Dinucleotides transition model Bérard et al. (2008)
RN95∩L95 RN95s bpp-phyl Intersection of models RN95 and L95 Lobry (1995)
Rivas-Eddy RE08 bpp-phyl substitution model with gap characters Rivas & Eddy (2008)
Custom model UserProteinSubstitutionModel bpp-phyl model customizable by user
2-states substitution model BinarySubstitutionModel bpp-phyl

Population genetics

Method Class(es) / Function(s) Library Reference Links / Notes
Alignment container for sequences with reference to groups (populations) identifiers. PolymorphismSequenceContainer bpp-popgen
Container for allelic data PolymorphismMultiGContainer bpp-popgen
Expected heterozygosity or Gene diversity SequenceStatistics / heterozygosity bpp-popgen Weir (1996)
Genetic diversity estimator <math>\theta</math> of Watterson SequenceStatistics / watterson75 bpp-popgen Watterson (1975) Also exists for synonymous and non-synonymous sites, via the functions watterson75Synonymous / watterson75NonSynonymous
Mean nucleotide diversity estimator <math>\pi</math> of Tajima SequenceStatistics / tajima83 bpp-popgen Tajima (1983) Also exists for synonymous and non-synonymous sites, via the functions piSynonymous / piNonSynonymous
Diversity estimator H of Fay and Wu SequenceStatistics / FayWu2000 bpp-popgen Fay and Wu (2000)
Haplotype number in the sample SequenceStatistics / DVK bpp-popgen Depaulis and Veuille (1998)
Haplotype diversity in the sample SequenceStatistics / DVH bpp-popgen Depaulis and Veuille (1998)
Scaled recombination parameter (C = 4Nr) SequenceStatistics / hudson87 bpp-popgen Hudson (1987)
McDonald-Kreitman contingency table SequenceStatistics / MKtable bpp-popgen McDonald and Kreitman (1991)
Neutrality-index (NI) SequenceStatistics / neutralityIndex bpp-popgen Rand and Kann (1996)
Tajima's D SequenceStatistics / tajimaDSS and tajimaDTNM bpp-popgen Tajima (1989) tajimaDSS is the calculation using the number of polymorphic (segregating) sites and tajimaDTNM is the calculation using the total number of mutations.
Fu and Li (1993) statistics D, D*, F and F* SequenceStatistics / fuliD, fuliDstar, fuliF and fuliFstar bpp-popgen Fu and Li (1993)
Fst from frequencies at polymorphic sites SequenceStatistics / FstHudson92 bpp-popgen Hudson, Slatkin, Maddison (1992) Taken from eq. 3 of Hudson, Slatkin and Maddison (1992)
F statistics of Weir and Cockerham (including Fit, Fis and Fst) MultilocusGenotypeStatistics / getAllelesFstats bpp-popgen Weir and Cockerham (1984)
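
The two classical diversity estimators listed above can be sketched in a few lines. This is a standalone illustration, not the SequenceStatistics code. It assumes a sample given as equal-length strings without gaps, counts a site as segregating when at least two different states are observed, and returns per-locus (not per-site) values. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of sites where at least two different states are observed.
std::size_t segregatingSites(const std::vector<std::string>& sample) {
  assert(!sample.empty());
  const std::size_t length = sample.front().size();
  for (const std::string& seq : sample) assert(seq.size() == length);
  std::size_t count = 0;
  for (std::size_t j = 0; j < length; ++j) {
    for (std::size_t i = 1; i < sample.size(); ++i) {
      if (sample[i][j] != sample[0][j]) {
        ++count;
        break;
      }
    }
  }
  return count;
}

// Watterson's estimator: theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i.
double wattersonTheta(const std::vector<std::string>& sample) {
  const std::size_t n = sample.size();
  assert(n >= 2);
  double a = 0.0;
  for (std::size_t i = 1; i < n; ++i) a += 1.0 / static_cast<double>(i);
  return static_cast<double>(segregatingSites(sample)) / a;
}

// Tajima's estimator: mean number of pairwise differences.
double tajimaPi(const std::vector<std::string>& sample) {
  const std::size_t n = sample.size();
  assert(n >= 2);
  std::size_t differences = 0;
  for (std::size_t a = 0; a < n; ++a) {
    assert(sample[a].size() == sample[0].size());
    for (std::size_t b = a + 1; b < n; ++b) {
      for (std::size_t j = 0; j < sample[a].size(); ++j) {
        if (sample[a][j] != sample[b][j]) ++differences;
      }
    }
  }
  const double pairs = static_cast<double>(n) * static_cast<double>(n - 1) / 2.0;
  return static_cast<double>(differences) / pairs;
}
```

Tajima's D is built on the difference between these two estimates, scaled by an estimate of its standard deviation.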

Numerical methods

Algebra

Method Class(es) / Function(s) Library References / Links / Notes
Matrix storage and manipulation Matrix, RowMatrix, LinearMatrix bpp-core Matrix is the interface (template class according to scalar type). Currently two implementations exist.
Matrix operations MatrixTools bpp-core Includes addition, multiplication, scaling, transposition, Hadamard product, Kronecker sum, power, exponentiation, inversion, Taylor series, etc.
Eigenvalues EigenValue bpp-core Computes eigenvalues and eigenvectors.
LU decomposition LUDecomposition bpp-core
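
As an example of one of the matrix operations mentioned above, the sketch below approximates the matrix exponential by a truncated Taylor series, exp(A) ≈ sum_{k=0}^{K} A^k / k!. It is a standalone illustration on nested std::vector, not the MatrixTools implementation. The Matrix alias, the function names and the default of 30 terms are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// n x n identity matrix.
Matrix identity(std::size_t n) {
  Matrix m(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) m[i][i] = 1.0;
  return m;
}

// Product of two square matrices of the same size.
Matrix multiply(const Matrix& a, const Matrix& b) {
  const std::size_t n = a.size();
  assert(b.size() == n);
  Matrix c(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t j = 0; j < n; ++j)
        c[i][j] += a[i][k] * b[k][j];
  return c;
}

// Truncated Taylor series of exp(A), using `terms` terms after the identity.
Matrix expTaylor(const Matrix& a, int terms = 30) {
  const std::size_t n = a.size();
  for (const auto& row : a) assert(row.size() == n);
  Matrix result = identity(n);
  Matrix term = identity(n);  // Holds A^k / k! at step k.
  for (int k = 1; k <= terms; ++k) {
    term = multiply(term, a);
    for (auto& row : term)
      for (double& x : row) x /= static_cast<double>(k);
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
        result[i][j] += term[i][j];
  }
  return result;
}
```

The series converges for any matrix, but it loses accuracy when the entries are large, which is why eigendecomposition or scaling-and-squaring is often preferred in practice.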

Functions

Numerical derivatives

Method Class(es) / Function(s) Library References / Links / Notes
Two points method TwoPointsNumericalDerivative bpp-core Only first order derivatives are supported.
Three points method ThreePointsNumericalDerivative bpp-core First and second order derivatives, as well as cross derivatives are supported.
Five points method FivePointsNumericalDerivative bpp-core First and second order derivatives are implemented, but no cross derivatives.
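
The finite-difference formulas behind these methods can be sketched as below. This is a standalone illustration with free functions, not the Bio++ wrapper classes. In particular, the two-point variant is assumed here to be the forward difference, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Two points (assumed forward difference): f'(x) ~ (f(x+h) - f(x)) / h.
template <class F>
double twoPointFirst(F f, double x, double h) {
  assert(h > 0.0);
  return (f(x + h) - f(x)) / h;
}

// Three points, first order: f'(x) ~ (f(x+h) - f(x-h)) / (2h).
template <class F>
double threePointFirst(F f, double x, double h) {
  assert(h > 0.0);
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Three points, second order: f''(x) ~ (f(x+h) - 2 f(x) + f(x-h)) / h^2.
template <class F>
double threePointSecond(F f, double x, double h) {
  assert(h > 0.0);
  return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
}

// Five points, first order:
// f'(x) ~ (-f(x+2h) + 8 f(x+h) - 8 f(x-h) + f(x-2h)) / (12h).
template <class F>
double fivePointFirst(F f, double x, double h) {
  assert(h > 0.0);
  return (-f(x + 2.0 * h) + 8.0 * f(x + h) - 8.0 * f(x - h) + f(x - 2.0 * h))
         / (12.0 * h);
}
```

More points cost more function evaluations per derivative but reduce the truncation error: it is of order h for the forward difference, h^2 for the three-point formulas and h^4 for the five-point one.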

Parameter transformation

Method Class(es) / Function(s) Library References / Links / Notes
Transform parameters defined on (a, b) to parameters defined on (-inf, +inf), where a or b can also be infinite bpp::ReparametrizationFunctionWrapper bpp-core
Get first order derivatives of transformed function bpp::ReparametrizationDerivableFirstOrder bpp-core
Get second order derivatives of transformed function bpp::ReparametrizationDerivableSecondOrder bpp-core
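
The principle of reparametrization can be illustrated with a logistic mapping between a bounded interval and the real line. The sketch below is standalone and is not necessarily the transformation used by the Bio++ wrapper classes. It handles only finite bounds, and IntervalTransform and reparametrize are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Maps x in (a, b) to y in (-inf, +inf) and back (logistic transform).
struct IntervalTransform {
  double a;
  double b;

  // y = ln((x - a) / (b - x))
  double toUnbounded(double x) const {
    assert(a < x && x < b);
    return std::log((x - a) / (b - x));
  }

  // x = a + (b - a) / (1 + exp(-y)); stays within [a, b] for any y.
  double toBounded(double y) const {
    return a + (b - a) / (1.0 + std::exp(-y));
  }
};

// Wrap a function of the bounded parameter into a function of the
// unbounded one, so that an unconstrained optimizer can be used on it.
template <class F>
auto reparametrize(F f, IntervalTransform t) {
  return [f, t](double y) { return f(t.toBounded(y)); };
}
```

An optimizer then works on y without any constraint, and the optimum is mapped back with toBounded. By the chain rule, derivatives with respect to y are those with respect to x multiplied by dx/dy.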

Minimization

Method Class(es) / Function(s) Library References / Links / Notes
Golden section bpp::GoldenSectionSearch bpp-core
Brent bpp::BrentOneDimension bpp-core
Newton (1D) bpp::NewtonOneDimension bpp-core
Downhill simplex bpp::DownhillSimplexMethod bpp-core
Powell bpp::PowellMultiDimensions bpp-core
Conjugate gradient bpp::ConjugateGradientMultiDimensions bpp-core
Broyden–Fletcher–Goldfarb–Shanno bpp::BfgsMultiDimensions bpp-core
Brent on each dimension bpp::SimpleMultiDimensions bpp-core
Newton on each dimension bpp::SimpleNewtonMultiDimensions bpp-core
Combination of methods bpp::MetaOptimizer bpp-core This special optimizer lets you define which parameters are optimized with which optimizer. The meta-optimizer then iterates over all optimizers until global convergence.
PseudoNewton, dedicated to phylogenetic likelihood functions bpp::PseudoNewtonOptimizer bpp-phyl This method implements the Newton-Raphson algorithm in multiple dimensions, ignoring the cross derivatives. The method implements backward moves in case of likelihood increase (Felsenstein-Churchill's correction, see [1]).
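
The general idea of a Newton step that ignores cross derivatives, combined with backward moves, can be sketched as follows for a minimization problem. This is a simplified standalone illustration, not the bpp::PseudoNewtonOptimizer code. It assumes that the second derivatives are positive, halves the step whenever the function value would get worse, and uses hypothetical names and tolerances.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Point = std::vector<double>;

// Minimize f with Newton steps that use only the diagonal of the Hessian.
// `gradient` returns the first derivatives, `diagHessian` the second
// derivatives d2f/dx_i^2 (cross derivatives are ignored).
Point pseudoNewtonMinimize(const std::function<double(const Point&)>& f,
                           const std::function<Point(const Point&)>& gradient,
                           const std::function<Point(const Point&)>& diagHessian,
                           Point x, double tolerance = 1e-6,
                           int maxIterations = 200) {
  for (int iteration = 0; iteration < maxIterations; ++iteration) {
    const Point g = gradient(x);
    double largest = 0.0;
    for (double gi : g) largest = std::fmax(largest, std::fabs(gi));
    if (largest < tolerance) break;  // Converged on the gradient.

    const Point h = diagHessian(x);
    Point step(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      assert(h[i] > 0.0);  // The sketch assumes local convexity.
      step[i] = -g[i] / h[i];
    }

    // Backward moves: halve the step while it makes the function worse.
    const double current = f(x);
    Point candidate(x.size());
    for (int halvings = 0; halvings <= 50; ++halvings) {
      for (std::size_t i = 0; i < x.size(); ++i) candidate[i] = x[i] + step[i];
      if (f(candidate) <= current) break;
      for (double& s : step) s *= 0.5;
    }
    x = candidate;
  }
  return x;
}
```

For a likelihood function, f would be the negative log-likelihood, so a step that makes f larger is one that makes the likelihood worse.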