bpp-seq  2.4.0
bpp::Alphabet Class Reference abstract

The Alphabet interface.

#include <Bpp/Seq/Alphabet/Alphabet.h>

Public Member Functions

 Alphabet ()
 
virtual ~Alphabet ()
 
virtual std::string getName (int state) const =0
 Get the complete name of a state given its int description.
 
virtual std::string getName (const std::string &state) const =0
 Get the complete name of a state given its string description.
 
virtual int getIntCodeAt (size_t stateIndex) const =0
 
virtual const std::string & getCharCodeAt (size_t stateIndex) const =0
 
virtual size_t getStateIndex (int state) const =0
 
virtual size_t getStateIndex (const std::string &state) const =0
 
virtual std::string getAlphabetType () const =0
 Identification method.
 
virtual unsigned int getStateCodingSize () const =0
 Get the size of the string coding a state.
 
virtual bool equals (const Alphabet &alphabet) const =0
 Comparison of alphabets.
 
The Clonable interface
Alphabet * clone () const =0
 
Tests
virtual bool isIntInAlphabet (int state) const =0
 Tell if a state (specified by its int description) is allowed by the alphabet.
 
virtual bool isCharInAlphabet (const std::string &state) const =0
 Tell if a state (specified by its string description) is allowed by the alphabet.
 
State access
virtual const AlphabetState & getStateAt (size_t stateIndex) const =0
 Get a state given its index.
 
virtual const AlphabetState & getState (int state) const =0
 Get a state given its int description.
 
virtual const AlphabetState & getState (const std::string &state) const =0
 Get a state given its string description.
 
Conversion methods
virtual std::string intToChar (int state) const =0
 Give the string description of a state given its int description.
 
virtual int charToInt (const std::string &state) const =0
 Give the int description of a state given its string description.
 
Sizes.
virtual size_t getNumberOfStates () const =0
 This is a convenient alias for getNumberOfChars(), returning a size_t instead of unsigned int.
 
virtual unsigned int getNumberOfChars () const =0
 Get the number of supported characters in this alphabet, including generic characters (e.g. returns 20 for the DNA alphabet).
 
virtual unsigned int getNumberOfTypes () const =0
 Get the number of distinct states in the alphabet (e.g. returns 15 for the DNA alphabet). This is the number of integers used for state description.
 
virtual unsigned int getSize () const =0
 Get the number of resolved states in the alphabet (e.g. returns 4 for the DNA alphabet). This is the method you'll need in most cases.
 
Utility methods
virtual std::vector< int > getAlias (int state) const =0
 Get all resolved states that match a generic state.
 
virtual std::vector< std::string > getAlias (const std::string &state) const =0
 Get all resolved states that match a generic state.
 
virtual int getGeneric (const std::vector< int > &states) const =0
 Get the generic state that matches a set of states.
 
virtual std::string getGeneric (const std::vector< std::string > &states) const =0
 Get the generic state that matches a set of states.
 
virtual const std::vector< int > & getSupportedInts () const =0
 
virtual const std::vector< std::string > & getSupportedChars () const =0
 
virtual const std::vector< std::string > & getResolvedChars () const =0
 
virtual int getUnknownCharacterCode () const =0
 
virtual int getGapCharacterCode () const =0
 
virtual bool isGap (int state) const =0
 
virtual bool isGap (const std::string &state) const =0
 
virtual bool isUnresolved (int state) const =0
 
virtual bool isUnresolved (const std::string &state) const =0
 

Detailed Description

The Alphabet interface.

An alphabet object defines all the states allowed for a particular type of sequence. Each state is coded both as a string and as an integer. The string description is the one found in the text (human-readable) description of sequences, typically in sequence files. For computational purposes, however, it is often more efficient to store sequences as vectors of integers. The link between the two descriptions is made via the Alphabet classes and the two methods intToChar() and charToInt(). The Alphabet interface also provides other methods, such as getting the full name of a state.
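The two-way coding described above can be illustrated with a minimal, self-contained sketch. ToyAlphabet, encode() and decode() are hypothetical names invented for this example, and the int codes are chosen for illustration only; the sketch mirrors the idea behind charToInt() and intToChar(), not the Bio++ implementation.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy alphabet, NOT the bpp::Alphabet API: it only mirrors the
// idea of a two-way mapping between string codes and int codes.
class ToyAlphabet {
public:
  ToyAlphabet() {
    // Illustrative codes chosen for this sketch.
    add("-", -1);
    add("A", 0);
    add("C", 1);
    add("G", 2);
    add("T", 3);
  }

  // String description -> int description; throws on unknown input.
  int charToInt(const std::string& state) const {
    auto it = charToInt_.find(state);
    if (it == charToInt_.end())
      throw std::invalid_argument("unknown char state: " + state);
    return it->second;
  }

  // Int description -> string description; throws on unknown input.
  std::string intToChar(int state) const {
    auto it = intToChar_.find(state);
    if (it == intToChar_.end())
      throw std::invalid_argument("unknown int state: " + std::to_string(state));
    return it->second;
  }

private:
  void add(const std::string& c, int i) {
    charToInt_[c] = i;
    intToChar_.emplace(i, c);
  }
  std::map<std::string, int> charToInt_;
  std::map<int, std::string> intToChar_;
};

// Encode a text sequence (one character per state) as a vector of ints.
std::vector<int> encode(const ToyAlphabet& alpha, const std::string& text) {
  std::vector<int> codes;
  codes.reserve(text.size());
  for (char c : text) codes.push_back(alpha.charToInt(std::string(1, c)));
  return codes;
}

// Decode a vector of ints back to its text form.
std::string decode(const ToyAlphabet& alpha, const std::vector<int>& codes) {
  std::string text;
  for (int code : codes) text += alpha.intToChar(code);
  return text;
}
```

With these assumptions, encode(alpha, "AC-GT") followed by decode() gives back the original text, and an unknown character raises an exception, which plays the role that BadCharException plays in the real interface.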

An Alphabet object itself stores the states as AlphabetState objects, in a potentially arbitrary but consistent order. All states are then indexed from 0 to getNumberOfChars() - 1. The number of states is equal to the number of string representations, but is usually higher than the number of int representations, as several characters can share the same int code (for instance X, N and ? in nucleotide alphabets).
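The relation between indexed states, string codes and shared int codes can be sketched as follows. ToyState, toyStates(), numberOfChars(), numberOfTypes() and stateIndex() are hypothetical names, and the state list, codes and names are assumptions of this example rather than the contents of any Bio++ alphabet.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an alphabet state; not the real bpp::AlphabetState.
struct ToyState {
  int num;             // int code
  std::string letter;  // string code
  std::string name;    // full name
};

// Illustrative state list; the codes and names are assumptions of this sketch.
const std::vector<ToyState>& toyStates() {
  static const std::vector<ToyState> states = {
      {-1, "-", "Gap"},
      {0, "A", "Adenine"},
      {1, "C", "Cytosine"},
      {2, "G", "Guanine"},
      {3, "T", "Thymine"},
      {14, "N", "Unresolved base"},
      {14, "X", "Unresolved base"},
      {14, "?", "Unresolved base"}};
  return states;
}

// Number of string codes: one per stored state.
std::size_t numberOfChars() { return toyStates().size(); }

// Number of distinct int codes: shared codes are counted once.
std::size_t numberOfTypes() {
  std::set<int> codes;
  for (const ToyState& s : toyStates()) codes.insert(s.num);
  return codes.size();
}

// Index of the first state carrying a given int code.
std::size_t stateIndex(int code) {
  const std::vector<ToyState>& states = toyStates();
  for (std::size_t i = 0; i < states.size(); ++i)
    if (states[i].num == code) return i;
  throw std::out_of_range("no state with int code " + std::to_string(code));
}
```

In this toy list there are eight string codes but only six distinct int codes, because N, X and ? all map to 14; a lookup by int code returns the first matching state.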

Alphabet objects may throw several exceptions derived from the AlphabetException class.

See also
AlphabetException, BadCharException, BadIntException

Definition at line 130 of file Alphabet.h.

Constructor & Destructor Documentation

bpp::Alphabet::Alphabet ( )
inline

Definition at line 134 of file Alphabet.h.

Member Function Documentation

virtual int bpp::Alphabet::charToInt ( const std::string &  state) const
pure virtual

Give the int description of a state given its string description.

Parameters
state  The string description.
Returns
The int description.
Exceptions
BadCharException  When state is not a valid char description.

Implemented in bpp::WordAlphabet, bpp::CodonAlphabet, bpp::AbstractAlphabet, bpp::LetterAlphabet, and bpp::RNY.

Referenced by bpp::BasicSymbolList::addElement(), bpp::EdSymbolList::addElement(), bpp::StringSequenceTools::codeSequence(), bpp::BasicSymbolList::setContent(), bpp::EdSymbolList::setContent(), bpp::BasicSymbolList::setElement(), bpp::EdSymbolList::setElement(), and ~Alphabet().

virtual bool bpp::Alphabet::equals ( const Alphabet &  alphabet) const
pure virtual

Comparison of alphabets.

virtual std::vector<int> bpp::Alphabet::getAlias ( int  state) const
pure virtual

Get all resolved states that match a generic state.

If the given state is not a generic code then the output vector will contain this unique code.

Parameters
state  The alias to resolve.
Returns
A vector of resolved states.
Exceptions
BadIntException  When state is not a valid integer.

Implemented in bpp::WordAlphabet, bpp::CodonAlphabet, bpp::AbstractAlphabet, bpp::ProteicAlphabet, bpp::RNY, bpp::NumericAlphabet, bpp::DNA, bpp::RNA, and bpp::BinaryAlphabet.

Referenced by bpp::SymbolListTools::getCounts(), bpp::SequenceTools::getPutativeHaplotypes(), bpp::AlphabetTools::match(), bpp::SequenceTools::subtractHaplotype(), and ~Alphabet().

virtual std::vector<std::string> bpp::Alphabet::getAlias ( const std::string &  state) const
pure virtual

Get all resolved states that match a generic state.

If the given state is not a generic code then the output vector will contain this unique code.

Parameters
state  The alias to resolve.
Returns
A vector of resolved states.
Exceptions
BadCharException  When state is not a valid char description.

Implemented in bpp::WordAlphabet, bpp::CodonAlphabet, bpp::AbstractAlphabet, bpp::ProteicAlphabet, bpp::RNY, bpp::NumericAlphabet, bpp::DNA, bpp::RNA, and bpp::BinaryAlphabet.
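The behaviour described for getAlias() can be sketched with a small lookup table. resolveAlias() is a hypothetical helper, not a Bio++ function; it covers only a subset of the IUPAC nucleotide ambiguity codes and uses std::invalid_argument where the real interface documents BadCharException.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of Bio++: resolves a one-letter nucleotide
// code into the resolved bases it stands for, using a subset of the IUPAC
// ambiguity codes.
std::vector<std::string> resolveAlias(const std::string& state) {
  static const std::map<std::string, std::vector<std::string>> table = {
      {"A", {"A"}},
      {"C", {"C"}},
      {"G", {"G"}},
      {"T", {"T"}},
      {"R", {"A", "G"}},            // purine
      {"Y", {"C", "T"}},            // pyrimidine
      {"N", {"A", "C", "G", "T"}},  // any base
  };
  auto it = table.find(state);
  if (it == table.end())
    throw std::invalid_argument("not a valid char description: " + state);
  // A resolved code maps to a vector holding only itself.
  return it->second;
}
```

A resolved input such as "A" comes back as a one-element vector, matching the rule that a non-generic code yields a vector containing this unique code.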

virtual std::string bpp::Alphabet::getAlphabetType ( ) const
pure virtual

Identification method.

Used to tell if two alphabets describe the same type of sequences. For instance, this method is used by sequence containers to compare two alphabets and allow or deny addition of sequences.

Returns
A text describing the alphabet.

Implemented in bpp::WordAlphabet, bpp::ProteicAlphabet, bpp::CodonAlphabet, bpp::LexicalAlphabet, bpp::RNY, bpp::NumericAlphabet, bpp::DNA, bpp::DefaultAlphabet, bpp::RNA, bpp::CaseMaskedAlphabet, bpp::IntegerAlphabet, and bpp::BinaryAlphabet.

Referenced by bpp::MapSequenceContainer::addSequence(), bpp::VectorSiteContainer::addSequence(), bpp::VectorSequenceContainer::addSequence(), bpp::VectorSiteContainer::addSite(), bpp::CompressedVectorSiteContainer::addSite(), bpp::AlignedSequenceContainer::addSite(), bpp::SiteContainerTools::alignNW(), bpp::BasicSequence::append(), bpp::SequenceWithAnnotation::append(), bpp::SequenceTools::areSequencesIdentical(), bpp::SiteTools::areSitesIdentical(), bpp::SequenceTools::combineSequences(), bpp::SequenceWithQualityTools::complement(), bpp::SiteContainerTools::computeSimilarity(), bpp::SequenceWithQualityTools::concatenate(), bpp::SequenceTools::concatenate(), bpp::AbstractAlphabet::equals(), bpp::SequenceApplicationTools::getAlphabet(), bpp::RNY::getAlphabetType(), bpp::CodonAlphabet::getAlphabetType(), bpp::SymbolListTools::getNumberOfDistinctPositions(), bpp::SequenceTools::getPercentIdentity(), bpp::SequenceContainerTools::merge(), bpp::SequenceWithAnnotation::merge(), bpp::SiteContainerTools::merge(), bpp::NucleicAcidsReplication::reverse(), bpp::AbstractReverseTransliterator::reverse(), bpp::WordAlphabet::reverse(), bpp::SequenceWithQualityTools::reverseTranscript(), bpp::MapSequenceContainer::setSequence(), bpp::VectorSiteContainer::setSequence(), bpp::VectorSequenceContainer::setSequence(), bpp::MapSequenceContainer::setSequenceByKey(), bpp::VectorSiteContainer::setSite(), bpp::CompressedVectorSiteContainer::setSite(), bpp::AlignedSequenceContainer::setSite(), bpp::SequenceWithQualityTools::transcript(), bpp::NucleicAcidsReplication::translate(), bpp::AbstractTransliterator::translate(), bpp::WordAlphabet::translate(), bpp::SiteContainerTools::translateAlignment(), and ~Alphabet().
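The way a container might use an alphabet type string to allow or deny the addition of a sequence can be sketched as below. ToySequence and ToyContainer are hypothetical types written for this example, and the type strings used with them are arbitrary; the actual Bio++ containers are not reproduced here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration; they are not Bio++ classes.
struct ToySequence {
  std::string alphabetType;  // plays the role of the getAlphabetType() text
  std::string content;
};

class ToyContainer {
public:
  explicit ToyContainer(const std::string& alphabetType)
      : alphabetType_(alphabetType) {}

  // Accept the sequence only if its alphabet type matches the container's.
  void addSequence(const ToySequence& seq) {
    if (seq.alphabetType != alphabetType_)
      throw std::invalid_argument("alphabet mismatch: expected " +
                                  alphabetType_ + ", got " + seq.alphabetType);
    sequences_.push_back(seq);
  }

  std::size_t size() const { return sequences_.size(); }

private:
  std::string alphabetType_;
  std::vector<ToySequence> sequences_;
};
```

Comparing a short text identifier keeps the check independent of the concrete alphabet class, which is the design choice this sketch tries to convey.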

virtual const std::string& bpp::Alphabet::getCharCodeAt ( size_t  stateIndex) const
pure virtual
Returns
The char code of a given state.
Parameters
stateIndex  The index of the state to fetch.

Implemented in bpp::AbstractAlphabet.

Referenced by ~Alphabet().

virtual int bpp::Alphabet::getGeneric ( const std::vector< int > &  states) const
pure virtual

Get the generic state that matches a set of states.

If the given states contain generic codes, each generic code is first resolved and the new generic state is then returned. If only a single resolved state is given, the function returns this state.

Parameters
states  A vector of states to resolve.
Returns
An int code for the computed state.
Exceptions
BadIntException  When a state is not a valid integer.

Implemented in bpp::WordAlphabet, bpp::CodonAlphabet, bpp::AbstractAlphabet, bpp::ProteicAlphabet, bpp::DNA, and bpp::RNA.

Referenced by bpp::PhredPoly::nextSequence(), bpp::SequenceTools::subtractHaplotype(), and ~Alphabet().

virtual std::string bpp::Alphabet::getGeneric ( const std::vector< std::string > &  states) const
pure virtual

Get the generic state that matches a set of states.

If the given states contain generic codes, each generic code is first resolved and the new generic state is then returned. If only a single resolved state is given, the function returns this state.

Parameters
states  A vector of states to resolve.
Returns
A string code for the computed state.
Exceptions
BadCharException  When a state is not a valid char description.
CharStateNotSupportedException  When the alphabet does not support a char state for the unresolved state.

Implemented in bpp::WordAlphabet, bpp::CodonAlphabet, bpp::AbstractAlphabet, bpp::ProteicAlphabet, bpp::DNA, and bpp::RNA.
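The resolve-then-merge rule described for getGeneric() can be sketched with a bitmask over the IUPAC nucleotide codes. This is one possible technique chosen for the example, not necessarily how Bio++ implements it; kIupacByMask, maskOf() and genericOf() are hypothetical names.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not the Bio++ implementation. Each resolved base owns
// one bit (A=1, C=2, G=4, T=8); the IUPAC code of a set of bases sits at the
// index equal to the OR of their bits. Index 0 (empty set) is a placeholder.
const std::string kIupacByMask = "-ACMGRSVTWYHKDBN";

// Resolve a one-letter code (resolved or generic) into its set of bases,
// expressed as a bitmask.
unsigned maskOf(const std::string& state) {
  if (state.size() == 1) {
    std::string::size_type pos = kIupacByMask.find(state[0]);
    if (pos != std::string::npos && pos != 0) return static_cast<unsigned>(pos);
  }
  throw std::invalid_argument("not a valid char description: " + state);
}

// Merge all input states and return the code that stands for their union.
std::string genericOf(const std::vector<std::string>& states) {
  if (states.empty()) throw std::invalid_argument("empty state set");
  unsigned mask = 0;
  for (const std::string& s : states) mask |= maskOf(s);
  return std::string(1, kIupacByMask[mask]);
}
```

Because generic inputs are turned into masks before merging, passing {"R", "C"} is equivalent to passing {"A", "G", "C"}, and a single resolved input is returned unchanged.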

virtual int bpp::Alphabet::getIntCodeAt ( size_t  stateIndex) const
pure virtual
Returns
The int code of a given state.
Parameters
stateIndex  The index of the state to fetch.

Implemented in bpp::AbstractAlphabet.

Referenced by ~Alphabet().

virtual std::string bpp::Alphabet::getName ( int  state) const
pure virtual

Get the complete name of a state given its int description.

When several states share the same int code (e.g. N and X in nucleic alphabets), this method returns the name of the first one found in the vector.

Parameters
state  The int description of the given state.
Returns
The name of the state.
Exceptions
BadIntException  When state is not a valid integer.

Implemented in bpp::AbstractAlphabet.

Referenced by ~Alphabet().

virtual std::string bpp::Alphabet::getName ( const std::string &  state) const
pure virtual

Get the complete name of a state given its string description.

When several states share the same int code (e.g. N and X in nucleic alphabets), this method returns the name of the first one found in the vector.

Parameters
state  The string description of the given state.
Returns
The name of the state.
Exceptions
BadCharException  When state is not a valid char description.

Implemented in bpp::WordAlphabet, and bpp::AbstractAlphabet.

virtual unsigned int bpp::Alphabet::getNumberOfChars ( ) const
pure virtual

Get the number of supported characters in this alphabet, including generic characters (e.g. returns 20 for the DNA alphabet).

Returns
The total number of supported character descriptions.

Implemented in bpp::AbstractAlphabet.

Referenced by bpp::AlphabetTools::checkAlphabetCodingSize(), and ~Alphabet().

virtual size_t bpp::Alphabet::getNumberOfStates ( ) const
pure virtual

This is a convenient alias for getNumberOfChars(), returning a size_t instead of unsigned int.

This function is typically used in loops over all states of an alphabet.

Implemented in bpp::AbstractAlphabet.

Referenced by ~Alphabet().
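A loop over all states, as mentioned above, might look like the following sketch. StubAlphabet is a hypothetical stand-in that exposes three methods named after the interface (getNumberOfStates(), getIntCodeAt(), getCharCodeAt()) but is not derived from bpp::Alphabet, and listStates() is an invented helper; the codes are illustrative.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stub exposing three methods named after the interface above;
// it is not derived from bpp::Alphabet and its codes are illustrative.
class StubAlphabet {
public:
  std::size_t getNumberOfStates() const { return chars_.size(); }
  int getIntCodeAt(std::size_t stateIndex) const { return ints_.at(stateIndex); }
  const std::string& getCharCodeAt(std::size_t stateIndex) const {
    return chars_.at(stateIndex);
  }

private:
  std::vector<int> ints_ = {-1, 0, 1, 2, 3};
  std::vector<std::string> chars_ = {"-", "A", "C", "G", "T"};
};

// Walk over all state indices with a size_t counter, so the loop variable and
// the bound share the same unsigned type.
template <class AlphabetLike>
std::string listStates(const AlphabetLike& alpha) {
  std::ostringstream out;
  for (std::size_t i = 0; i < alpha.getNumberOfStates(); ++i) {
    if (i > 0) out << ' ';
    out << alpha.getCharCodeAt(i) << '=' << alpha.getIntCodeAt(i);
  }
  return out.str();
}
```

Using a size_t bound avoids a signed/unsigned or narrowing comparison in the loop condition, which is presumably the convenience the alias is meant to provide.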

virtual unsigned int bpp::Alphabet::getNumberOfTypes ( ) const
pure virtual

Get the number of distinct states in the alphabet (e.g. returns 15 for the DNA alphabet). This is the number of integers used for state description.

Returns
The number of distinct states.

Implemented in bpp::WordAlphabet, bpp::NucleicAlphabet, bpp::CodonAlphabet, bpp::ProteicAlphabet, bpp::LexicalAlphabet, bpp::RNY, bpp::DefaultAlphabet, bpp::NumericAlphabet, bpp::CaseMaskedAlphabet, bpp::IntegerAlphabet, and bpp::BinaryAlphabet.

Referenced by bpp::AlphabetTools::checkAlphabetCodingSize(), bpp::CaseMaskedAlphabet::getNumberOfTypes(), and ~Alphabet().

virtual const std::vector<std::string>& bpp::Alphabet::getResolvedChars ( ) const
pure virtual
Returns
A list of all resolved character codes.

Note for developers of new alphabets: we return a const reference here since the list is supposed to be stored within the class and should not be modified outside the class.

Implemented in bpp::AbstractAlphabet.

Referenced by ~Alphabet().
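The note for developers can be illustrated with a minimal sketch of a class that stores its list internally and hands out a const reference. ToyCharList is a hypothetical class written for this example and is not a Bio++ alphabet.

```cpp
#include <string>
#include <vector>

// Hypothetical class, not a Bio++ alphabet: the list lives inside the object
// and callers get read-only access through a const reference (no copy).
class ToyCharList {
public:
  const std::vector<std::string>& getResolvedChars() const { return resolved_; }

private:
  std::vector<std::string> resolved_ = {"A", "C", "G", "T"};
};
```

Every call returns a reference to the same stored vector, so no copy is made, and the const qualifier prevents callers from modifying the list. The reference is only valid while the owning object is alive.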

virtual const AlphabetState& bpp::Alphabet::getState ( int  state) const
pure virtual

Get a state given its int description.

Note: several states can share the same int value. This function returns one of them.

Parameters
state  The int description.
Returns
The AlphabetState.
Exceptions
BadIntException  When state is not a valid integer.

Implemented in bpp::AbstractAlphabet, bpp::NucleicAlphabet, and bpp::ProteicAlphabet.

Referenced by ~Alphabet().

virtual const AlphabetState& bpp::Alphabet::getState ( const std::string &  state) const
pure virtual

Get a state given its string description.

Parameters
state  The string description.
Returns
The AlphabetState.
Exceptions
BadCharException  When state is not a valid string.

Implemented in bpp::AbstractAlphabet, bpp::NucleicAlphabet, and bpp::ProteicAlphabet.

virtual const AlphabetState& bpp::Alphabet::getStateAt ( size_t  stateIndex) const
pure virtual

Get a state given its index.

Parameters
stateIndex  The index of the state.
Returns
The AlphabetState.
Exceptions
IndexOutOfBoundsException  When the index is not valid.

Implemented in bpp::AbstractAlphabet, bpp::NucleicAlphabet, bpp::NumericAlphabet, and bpp::ProteicAlphabet.

Referenced by ~Alphabet().

virtual unsigned int bpp::Alphabet::getStateCodingSize ( ) const
pure virtual

Get the size of the string coding a state.

Returns
The size of the string coding each state in the alphabet.
Author
Sylvain Gaillard

Implemented in bpp::WordAlphabet, bpp::CodonAlphabet, and bpp::AbstractAlphabet.

Referenced by bpp::SiteContainerTools::getSelectedPositions(), bpp::Phylip::writeInterleaved(), bpp::Phylip::writeSequential(), and ~Alphabet().

virtual size_t bpp::Alphabet::getStateIndex ( int  state) const
pure virtual
Returns
The index of the state with the corresponding int code.

Implemented in bpp::AbstractAlphabet.

Referenced by bpp::UserAlphabetIndex1::getIndex(), bpp::SimpleScore::getIndex(), bpp::UserAlphabetIndex1::setIndex(), and ~Alphabet().

virtual size_t bpp::Alphabet::getStateIndex ( const std::string &  state) const
pure virtual
Returns
The index of the state with corresponding char code.

Implemented in bpp::AbstractAlphabet.

virtual const std::vector<std::string>& bpp::Alphabet::getSupportedChars ( ) const
pure virtual
Returns
A list of all supported character codes.

Note for developers of new alphabets: we return a const reference here since the list is supposed to be stored within the class and should not be modified outside the class.

Implemented in bpp::AbstractAlphabet.

Referenced by ~Alphabet().

virtual const std::vector<int>& bpp::Alphabet::getSupportedInts ( ) const
pure virtual
Returns
A list of all supported int codes.

Note for developers of new alphabets: we return a const reference here since the list is supposed to be stored within the class and should not be modified outside the class.

Implemented in bpp::AbstractAlphabet.

Referenced by ~Alphabet().

virtual std::string bpp::Alphabet::intToChar ( int  state) const
pure virtual

Give the string description of a state given its int description.

virtual bool bpp::Alphabet::isCharInAlphabet ( const std::string &  state) const
pure virtual

Tell if a state (specified by its string description) is allowed by the alphabet.

Parameters
state  The string description.
Returns
'true' if the state is known.

Implemented in bpp::AbstractAlphabet, and bpp::LetterAlphabet.

Referenced by bpp::BasicSequence::append(), bpp::SequenceWithAnnotation::append(), bpp::BasicSymbolList::setContent(), bpp::EdSymbolList::setContent(), and ~Alphabet().

virtual bool bpp::Alphabet::isGap ( const std::string &  state) const
pure virtual
Parameters
state  The state to test.
Returns
'true' if the state is a gap.

Implemented in bpp::AbstractAlphabet.

virtual bool bpp::Alphabet::isIntInAlphabet ( int  state) const
pure virtual

Tell if a state (specified by its int description) is allowed by the alphabet.

Parameters
state  The int description.
Returns
'true' if the state is known.

Implemented in bpp::AbstractAlphabet.

Referenced by bpp::BasicSequence::append(), bpp::SequenceWithAnnotation::append(), bpp::BasicSymbolList::setContent(), bpp::EdSymbolList::setContent(), and ~Alphabet().

virtual bool bpp::Alphabet::isUnresolved ( const std::string &  state) const
pure virtual

The documentation for this class was generated from the following file: