Sequence

Sequences are the core data of the Bio++ libraries. They come as character strings, read from text files or from a database. Interpreting a sequence requires an alphabet, which is used to encode the sequence and to ensure the translation between the computer representation and the human-readable representation. In some cases, a sequence can also be associated with features such as gene annotations or quality scores.
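The role of the alphabet can be illustrated with a minimal, self-contained sketch. The ToyAlphabet class below is hypothetical and is not the Bio++ Alphabet API: it only assumes that each state is a single character and that its integer code is its position in the list of states.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy alphabet (NOT a Bio++ class): each state is one character,
// and its integer code is its position in `states`.
class ToyAlphabet {
public:
  explicit ToyAlphabet(std::string states) : states_(std::move(states)) {}

  // Human representation -> computer representation.
  int charToInt(char c) const {
    const std::string::size_type pos = states_.find(c);
    if (pos == std::string::npos)
      throw std::invalid_argument(std::string("Unknown state: ") + c);
    return static_cast<int>(pos);
  }

  // Computer representation -> human representation.
  char intToChar(int code) const {
    if (code < 0 || static_cast<std::size_t>(code) >= states_.size())
      throw std::out_of_range("Unknown state code");
    return states_[static_cast<std::size_t>(code)];
  }

  // Encode a whole character string, state by state.
  std::vector<int> encode(const std::string& text) const {
    std::vector<int> codes;
    codes.reserve(text.size());
    for (char c : text) codes.push_back(charToInt(c));
    return codes;
  }

  // Decode a series of integer codes back into a character string.
  std::string decode(const std::vector<int>& codes) const {
    std::string text;
    text.reserve(codes.size());
    for (int code : codes) text.push_back(intToChar(code));
    return text;
  }

private:
  std::string states_;
};
```

With ToyAlphabet dna("ACGT"), calling dna.encode("GATTACA") gives the codes 2 0 3 3 0 1 0, and dna.decode restores the original string. Characters outside the alphabet are rejected, which is the kind of check a plain string cannot provide.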

Depending on the user's need, there are several ways to manipulate sequences in Bio++.

Sequences as strings
The simplest way to manipulate sequences is to store them as character strings (using std::string). The class StringSequenceTools offers several methods to process such sequences. While this is the easiest way to process sequences, it is rather limited when it comes to more complex data manipulation, particularly when states span more than one character (for instance, codon sequences).
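The limitation with multi-character states can be seen in a short sketch. The helper splitIntoCodons below is hypothetical and is not part of StringSequenceTools; it assumes non-overlapping codons of three characters read from the first position.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (NOT part of StringSequenceTools): cut a plain string
// into non-overlapping three-character states, starting at position 0.
std::vector<std::string> splitIntoCodons(const std::string& seq) {
  if (seq.size() % 3 != 0)
    throw std::invalid_argument("Sequence length is not a multiple of 3");
  std::vector<std::string> codons;
  codons.reserve(seq.size() / 3);
  for (std::size_t i = 0; i < seq.size(); i += 3)
    codons.push_back(seq.substr(i, 3));
  return codons;
}
```

For the string "ATGGCCTAA", size() reports 9 characters while the sequence holds only 3 codon states, so every position-based operation on the raw string has to carry this index arithmetic itself. A dedicated sequence object tied to a codon alphabet removes that burden.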

Sequences as dedicated objects
Most methods in Bio++ require a full object implementation of sequence data. In Bio++ 2.00, the class hierarchy was rewritten in order to accommodate several implementations. It remains, however, to a large extent backward compatible with previous versions of Bio++.

Lists of symbols
The most basic feature of a sequence is to store its constitutive series of elements, together with the associated alphabet required to decode it. Basic operations on the sequence include changing, inserting or deleting elements. The SymbolList interface defines all of these operations. There are currently two implementations of this interface:
 * BasicSymbolList, offering a minimal implementation (which was used by Bio++ < 2.00),
 * EdSymbolList, 'Ed' standing for event-driven. This implementation defines the SymbolListListener and SymbolListEvent classes, and allows you to capture any modification of the sequence through the appropriate events.

The Sequence interface
The Sequence interface inherits from the SymbolList interface, and adds some simple features like sequence names and comments. It also contains some utility methods for automatically converting a sequence from/to a character string. Two implementations are available:
 * BasicSequence, which is based on the BasicSymbolList implementation, and
 * SequenceWithAnnotation, which offers an event-driven implementation based on the EdSymbolList class. In addition, a SequenceAnnotation interface is defined, extending the SymbolListListener interface. Sequence annotations can therefore be handled in a very general way by the SequenceWithAnnotation class, as a special case of listeners. Some utility methods dedicated to annotations are provided.

The SequenceWithQuality class is a special case of SequenceWithAnnotation. It contains a mandatory annotation, an instance of the SequenceQuality class, which holds sequence quality scores (such as those obtained from the phred format). For convenience, this class provides methods to edit the scores together with the sequence.
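The idea of editing scores together with the sequence can be sketched as follows. ToyScoredSequence is a hypothetical class written for illustration only; it does not reproduce the SequenceWithQuality API, and it simply assumes one integer score per residue.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy class (NOT the Bio++ SequenceWithQuality API): residues
// and per-residue scores are edited together so they never get out of step.
class ToyScoredSequence {
public:
  ToyScoredSequence(std::string residues, std::vector<int> scores)
      : residues_(std::move(residues)), scores_(std::move(scores)) {
    if (residues_.size() != scores_.size())
      throw std::invalid_argument("One score is required per residue");
  }

  // Insert a residue and its score at the same position.
  void insert(std::size_t pos, char residue, int score) {
    if (pos > residues_.size())
      throw std::out_of_range("Insertion position out of range");
    const std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(pos);
    residues_.insert(residues_.begin() + offset, residue);
    scores_.insert(scores_.begin() + offset, score);
  }

  // Remove a residue and its score at the same position.
  void erase(std::size_t pos) {
    if (pos >= residues_.size())
      throw std::out_of_range("Deletion position out of range");
    const std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(pos);
    residues_.erase(residues_.begin() + offset);
    scores_.erase(scores_.begin() + offset);
  }

  const std::string& residues() const { return residues_; }
  const std::vector<int>& scores() const { return scores_; }

private:
  std::string residues_;
  std::vector<int> scores_;
};
```

Because every edit goes through one method that updates both containers, a caller cannot delete a residue and forget its score. In Bio++ the same guarantee is described as coming from the annotation listening to sequence events rather than from a hand-written pair of containers.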

Extending the sequence classes and capturing sequence edition events
The EdSymbolList class fires events every time the sequence content is modified. Depending on the modification, several events can be generated:
 * SymbolListEditionEvent when the full content was affected, for instance after using the setContent method,
 * SymbolListDeletionEvent and SymbolListInsertionEvent in case of an indel, for instance after calling addElement, removeElement or resize,
 * SymbolListSubstitutionEvent when the sequence content was changed, for instance with a call to setElement.

Each of these events is fired twice: once before attempting the modification, and once after the modification was performed. These events can be caught by implementing the SymbolListListener interface and adding an instance of the resulting class to the sequence object using the addSymbolListListener method.
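The before/after notification pattern described in this section can be sketched with a small, self-contained example. All names below (ToyEvent, ToyListener, ToyEventList, LoggingListener) are hypothetical and do not reproduce the SymbolListListener or SymbolListEvent signatures; the sketch only assumes that a listener receives one call before and one call after each modification.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event description (NOT a Bio++ class).
struct ToyEvent {
  std::string kind;      // "substitution", "insertion" or "deletion"
  std::size_t position;  // position affected by the modification
};

// Hypothetical listener interface: one hook before, one hook after.
class ToyListener {
public:
  virtual ~ToyListener() = default;
  virtual void beforeChange(const ToyEvent&) {}
  virtual void afterChange(const ToyEvent&) {}
};

// Hypothetical event-driven list: every edit notifies listeners twice.
class ToyEventList {
public:
  explicit ToyEventList(std::string content) : content_(std::move(content)) {}

  // Listeners are not owned; they must outlive this object.
  void addListener(ToyListener* listener) { listeners_.push_back(listener); }

  void setElement(std::size_t pos, char c) {
    if (pos >= content_.size()) throw std::out_of_range("Bad position");
    const ToyEvent event{"substitution", pos};
    for (ToyListener* l : listeners_) l->beforeChange(event);
    content_[pos] = c;
    for (ToyListener* l : listeners_) l->afterChange(event);
  }

  void insertElement(std::size_t pos, char c) {
    if (pos > content_.size()) throw std::out_of_range("Bad position");
    const ToyEvent event{"insertion", pos};
    for (ToyListener* l : listeners_) l->beforeChange(event);
    content_.insert(content_.begin() + static_cast<std::ptrdiff_t>(pos), c);
    for (ToyListener* l : listeners_) l->afterChange(event);
  }

  void deleteElement(std::size_t pos) {
    if (pos >= content_.size()) throw std::out_of_range("Bad position");
    const ToyEvent event{"deletion", pos};
    for (ToyListener* l : listeners_) l->beforeChange(event);
    content_.erase(content_.begin() + static_cast<std::ptrdiff_t>(pos));
    for (ToyListener* l : listeners_) l->afterChange(event);
  }

  const std::string& content() const { return content_; }

private:
  std::string content_;
  std::vector<ToyListener*> listeners_;
};

// Example listener that records every notification it receives.
class LoggingListener : public ToyListener {
public:
  std::vector<std::string> log;
  void beforeChange(const ToyEvent& e) override { log.push_back("before " + e.kind); }
  void afterChange(const ToyEvent& e) override { log.push_back("after " + e.kind); }
};
```

Registering a LoggingListener on a ToyEventList holding "ACGT" and then calling setElement(1, 'T') followed by deleteElement(0) leaves the content as "TGT" and records four entries: before and after the substitution, then before and after the deletion. An annotation that must stay aligned with the sequence would typically do its bookkeeping in the "after" hook, while the "before" hook is a natural place for validation.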

Example of usage
Here is a simple introduction to the Sequence class.

HOW TO USE THAT FILE:


 * General comments are written using the * * syntax.
 * Code lines are switched off using '//'. To activate those lines, just remove the '//' characters!
 * You're welcome to extensively modify that file!


 * A way to compile the file is:


 * Here is the code: