Container

From Bio++ Wiki
Jump to: navigation, search


Here is a simple introduction to the Container class.

HOW TO USE THAT FILE:

  • General comments are written using the * * syntaxe.
  • Code lines are switched off using '//'. To activate those lines, just remove the '//' characters!
  • You're welcome to extensively modify that file!
  • A way to compile the file is:

<source lang="bash"> WHEREISBIOPP=$HOME/local

g++ -I$WHEREISBIOPP/include -L$WHEREISBIOPP/lib -lbpp-core -lbpp-seq ExContainer.cpp -o excontainer </source>


  • Here is the code:


<source lang="cpp"> /*

* We start by including what we'll need, and sort the inclusions a bit:
*/

/*

* From the STL:
*/
  1. include <iostream> //to be able to output stuff in the terminal.

/*

* We'll use the standard template library namespace:
*/

using namespace std;

/*

* From SeqLib:
*/
  1. include <Bpp/Seq/Alphabet.all> /* this includes all alphabets in one shot */
  2. include <Bpp/Seq/Container.all> /* this includes all containers */
  3. include <Bpp/Seq/StateProperties/DefaultNucleotideScore.h>
  4. include <Bpp/Seq/Io.all> /* this includes all sequence readers and writers */

/*

* We'll need a few tools from the Bio++ Utils library:
*/
  1. include <Bpp/App/ApplicationTools.h>

/*

* All Bio++ functions are also in a namespace, so we'll use it:
*/

using namespace bpp;

/*----------------------------------------------------------------------------------------------------*/ /*

* Now starts the real stuff...
*/


int main(int args, char ** argv) {

 /*
  * We surround our code with a try-catch block, in case some error occurs:
  */
 try
   {
     cout << "Hello World!" << endl;
     /*
      * Containers are object that store a family of sequences.
      * They hence play a major role in the Bio++ libraries.
      *
      * We well first create a simple container from scratch:
      */
     // Sequence *seq1 = new BasicSequence("Sequence 1", "GATTACTGATTACATGGT", &AlphabetTools::DNA_ALPHABET);
     // Sequence *seq2 = new BasicSequence("Sequence 2", "GATCACAATGTTACGCT", &AlphabetTools::DNA_ALPHABET);
     // VectorSequenceContainer* vsc = new VectorSequenceContainer(&AlphabetTools::DNA_ALPHABET);
     // vsc->addSequence(*seq1);
     // vsc->addSequence(*seq2);
     // cout << "This container has " << vsc->getNumberOfSequences() << " sequences." << endl;
     /*
      * The VectorSequenceContainer class is one of the simplest container.
      * It is a mere vector of sequence (it is hence an OrderedSequenceContainer).
      *
      * Sequence can be accessed either by their name or by their position:
      */
     // cout << vsc->getSequence("Sequence 1").toString() << endl;
     // cout << vsc->getSequence(0).toString() << endl;
     /*
      * Another important point is that sequence are cloned when being added to the container,
      * so that the container has full control over the data. As a consequence:
      * - deleting the sequence object is safe and does not damage the container
      * - deleting the container wil not damage your initial sequences.
      * As a consequence, a sequence object can be safely added to several containers and being destroyed after that:
      */
     // delete seq1;
     // delete seq2;
     // cout << vsc->getSequence("Sequence 1").toString() << endl;
     /* still works! */
     /*
      * As before, we have several utilitary methods available.
      */
     // cout << "Is that an alignment? " << (SequenceContainerTools::sequencesHaveTheSameLength(*vsc) ? "yes" : "no") << endl;
     // SiteContainer *sites = SiteContainerTools::alignNW(vsc->getSequence(0), vsc->getSequence(1), DefaultNucleotideScore(&AlphabetTools::DNA_ALPHABET), -5);
     // cout << sites->getSequence(0).toString() << endl;
     // cout << sites->getSequence(1).toString() << endl;
     // cout << "Is that an alignment? " << (SequenceContainerTools::sequencesHaveTheSameLength(*sites) ? "yes" : "no") << endl;
     /*
      * Alignement are stored in a particular structure called SiteContainer.
      * A SiteContainer has all the properties of a SequenceContainer + specific methods
      * to access the data per column. Check the documentation of those classes.
      */
     // cout << "This container has " << sites->getNumberOfSites() << " positions." << endl;
     // delete sites;
     /* ----------------
      * QUESTION 1: Look at the SiteContainerTools class, and try to:
      * - remove the gaps from the alignment
      * - change the gaps to unresolve characters
      * - compute the frequencies of nucleotides
      * ----------------
      */
     /*
      * In most cases, containers are created from a file and not from scratch.
      * We hence need to introduce a new kind of objects, called sequence reader (ISequence).
      * These objects have a method 'read' that creates a container from a file and an alphabet.
      */
     // Fasta fasReader;
     // OrderedSequenceContainer *sequences = fasReader.read("TIMnuc.aln.fasta.txt", &AlphabetTools::DNA_ALPHABET);
     // cout << "This container has " << sequences->getNumberOfSequences() << " sequences." << endl;
     // cout << "Is that an alignment? " << (SequenceContainerTools::sequencesHaveTheSameLength(*sequences) ? "yes" : "no") << endl;
     /*
      * The Fasta format can store sequences which are aligned or not.
      * It hence returns a SequenceContainer object, not an alignment.
      * Since we know that the sequences are aligned, we will create a SiteContainer
      * object from the SequenceContainer. There are two types of SiteContainer objects in Bio++,
      * the AlignedSequenceContainer, which is a specialization of the VectorSequenceContainer,
      * and the VectorSiteContainer. They differ by their storage of the data.
      * We will compare them to assess their properties:
      */
     // SiteContainer *asc = new AlignedSequenceContainer(*sequences);
     // SiteContainer *vsc2 = new VectorSiteContainer(*sequences);
     // unsigned int nrep = 1000; /* WATCHOUT!!! reduce this number if this is too slow on your computer! */
     // ApplicationTools::startTimer();
     // for(unsigned int j = 0; j < nrep; j++)
     // {
     //   ApplicationTools::displayGauge(j,nrep-1, '=');
     //   for(unsigned int i = 0; i < asc->getNumberOfSequences(); i++) asc->getSequence(i);
     // }
     // ApplicationTools::displayTime("\nTotal time used for sequence access in ASC:");
     // ApplicationTools::startTimer();
     // for(unsigned int j = 0; j < nrep; j++)
     // {
     //   ApplicationTools::displayGauge(j,nrep-1, '=');
     //   for(unsigned int i = 0; i < vsc2->getNumberOfSequences(); i++) vsc2->getSequence(i);
     // }
     // ApplicationTools::displayTime("\nTotal time used for sequence access in VSC2:");
     // ApplicationTools::startTimer();
     // for(unsigned int j = 0; j < nrep; j++)
     // {
     //   ApplicationTools::displayGauge(j,nrep-1, '=');
     //   for(unsigned int i = 0; i < asc->getNumberOfSites(); i++) asc->getSite(i);
     // }
     // ApplicationTools::displayTime("\nTotal time used for site access in ASC:");
     // ApplicationTools::startTimer();
     // for(unsigned int j = 0; j < nrep; j++)
     // {
     //   ApplicationTools::displayGauge(j,nrep-1, '=');
     //   for(unsigned int i = 0; i < vsc2->getNumberOfSites(); i++) vsc2->getSite(i);
     // }
     // ApplicationTools::displayTime("\nTotal time used for site access in VSC2:");
     // delete asc;
     // delete vsc2;
     /* ----------------
      * QUESTION 2: At last, a piece of useful code!
      * Using the help of the Bio++ API docuementation, write a piece of code that:
      * 1) Read a file in the Fasta format,
      * 2) Check that the sequences are aligned,
      * 3) Remove all the positions with gaps,
      * 4) Display a few statistics, like the nucleotides frequencies, the average, min and max sequence lengths, etc.
      * 5) Write the resulting alignment in a format understandable by PAML.
      * ----------------
      */
   }
 catch(Exception& e)
   {
     cout << "Bio++ exception:" << endl;
     cout << e.what() << endl;
     return(-1);
   }
 catch(exception& e)
   {
     cout << "Any other exception:" << endl;
     cout << e.what() << endl;
     return(-1);
   }
 return(0);

}

</source>