...

CRUX (Creating-Reference-libraries-Using-eXisting-tools) generates taxonomic reference libraries by querying your primers against an ecoPCR database. Anacapa then uses these libraries for taxonomic assignment of your sequences. The purpose of Step 3 is to download the required databases and then use them to generate this ecoPCR database.

Anacapa comes with several pre-built ecoPCR databases for defined primer sets. These are listed in the ‘High level overview’ section of the Anacapa GitHub page: GitHub - Anacapa.

If you are using primers that aren’t on this list, you’ll need to construct your own CRUX ecoPCR database by following this guide.

...

This command uses the NCBI taxonomy database (downloaded to ~/anacapa/crux_db/TAXO) and the EMBL database (downloaded to ~/anacapa/crux_db/EMBL/*.dat). It writes the converted ecoPCR database to /Obitools_databases/OB_dat_EMBL_std/ and prefixes the generated ecoPCR database files with OB_dat_EMBL_std....
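Before launching the conversion, a quick check that both inputs are where the command expects them can save a failed multi-hour run. The Python sketch below assumes the layout described above (~/anacapa/crux_db/TAXO and ~/anacapa/crux_db/EMBL/*.dat); the function name `check_crux_inputs` is hypothetical, and it only checks that files are present, not that they are valid.

```python
"""Pre-flight check before running obiconvert (a sketch, not part of Anacapa).

Assumes the directory layout described on this page:
  ~/anacapa/crux_db/TAXO        extracted NCBI taxonomy dump
  ~/anacapa/crux_db/EMBL/*.dat  EMBL flat files
"""
from pathlib import Path


def check_crux_inputs(crux_db: Path) -> list[str]:
    """Return a list of problems found; an empty list means the inputs look present."""
    problems = []
    taxo = crux_db / "TAXO"
    embl = crux_db / "EMBL"

    # The taxonomy directory must exist and contain the extracted dump files.
    if not taxo.is_dir() or not any(taxo.iterdir()):
        problems.append(f"missing or empty taxonomy directory: {taxo}")

    # obiconvert is pointed at EMBL/*.dat, so at least one .dat file should exist.
    dat_files = sorted(embl.glob("*.dat")) if embl.is_dir() else []
    if not dat_files:
        problems.append(f"no *.dat files found in {embl}")

    return problems


if __name__ == "__main__":
    issues = check_crux_inputs(Path.home() / "anacapa" / "crux_db")
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Taxonomy and EMBL inputs found.")
```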

If you have downloaded and extracted all the databases into the correct directories, you should now see obiconvert running and printing messages like the following:

Reading taxonomy dump file...
List all taxonomy rank...
Indexing taxonomy...
Indexing parent and rank...
Adding scientific name...
Adding taxid alias...
Adding deleted taxid...
....

In our initial testing on the mammal EMBL databases, this took about 8 hours to complete. Note that a PBS interactive session has a maximum time limit of 12 hours (we requested 11 hours when we started our session). If you are working with a larger dataset, e.g. vertebrates or invertebrates, the conversion may take much longer than an interactive session can run. In that case you will need to submit the above obiconvert command as a PBS script (again, see HPC for instructions on how to do this).
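If you do need a batch job, the Python sketch below compares an estimated runtime against the 12-hour interactive limit and writes a PBS Pro job script around your command. The walltime, CPU and memory values, the job name, and the helpers `needs_batch_job` and `make_pbs_script` are hypothetical placeholders: check your HPC's queue limits, add any environment setup lines you used for obiconvert, and paste the obiconvert command from above unchanged in place of the placeholder.

```python
"""Sketch: decide whether obiconvert fits in an interactive session and, if not,
write a PBS Pro batch script for it. Resource values are hypothetical placeholders."""

INTERACTIVE_LIMIT_HOURS = 12  # PBS interactive session limit quoted above


def needs_batch_job(estimated_hours: float, margin_hours: float = 1.0) -> bool:
    """True if the estimated runtime plus a safety margin exceeds the interactive limit."""
    return estimated_hours + margin_hours > INTERACTIVE_LIMIT_HOURS


def make_pbs_script(command: str, walltime_hours: int, ncpus: int = 1, mem_gb: int = 16) -> str:
    """Return the text of a PBS Pro job script that runs `command` once."""
    lines = [
        "#!/bin/bash",
        f"#PBS -l walltime={walltime_hours:02d}:00:00",
        f"#PBS -l select=1:ncpus={ncpus}:mem={mem_gb}gb",
        "#PBS -N obiconvert",  # hypothetical job name
        'cd "$PBS_O_WORKDIR"',  # run from the directory qsub was called in
        # Add your environment setup lines (modules/conda) here if needed.
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # ~8 h for mammals fits; the 30 h estimate for a larger dataset is hypothetical.
    for label, hours in [("mammals", 8), ("larger dataset", 30)]:
        print(label, "needs batch job:", needs_batch_job(hours))
    print(make_pbs_script("<paste your obiconvert command here>", walltime_hours=48))
    # Save the output as e.g. obiconvert.pbs and submit it with: qsub obiconvert.pbs
```

The one-hour margin reflects the choice above of requesting 11 rather than 12 hours; adjust it to taste.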

Step 4: Running CRUX

Once you have downloaded and converted the required databases (Step 3), you can run CRUX.

CRUX generates taxonomic reference libraries by querying your primers against an ecoPCR database you generated in Step 3. Anacapa then uses these libraries for taxonomic assignment of your sequences.
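To make "querying your primers against an ecoPCR database" more concrete, the toy Python sketch below performs in-silico PCR on a single sequence: it finds forward-primer sites and downstream sites of the reverse primer's reverse complement, allowing a small number of mismatches and IUPAC degenerate bases, and returns the region between them. This only illustrates the idea; it is not how ecoPCR or CRUX are implemented (for example, it scans one strand of one sequence), and all function names and defaults are hypothetical.

```python
"""Toy in-silico PCR (illustration only, not ecoPCR's or CRUX's algorithm).
Scans the forward strand of one sequence with a per-primer mismatch budget."""

# IUPAC codes: which template bases each primer base can pair with.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer: str, site: str) -> int:
    """Count positions where the template base is not allowed by the primer's code."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, site))


def find_sites(primer: str, seq: str, max_mm: int) -> list[int]:
    """Start positions where `primer` matches `seq` with at most `max_mm` mismatches."""
    n = len(primer)
    return [i for i in range(len(seq) - n + 1) if mismatches(primer, seq[i:i + n]) <= max_mm]


def in_silico_pcr(seq: str, fwd: str, rev: str, max_mm: int = 1,
                  min_len: int = 1, max_len: int = 1000) -> list[str]:
    """Return primer-trimmed amplicons lying between a forward-primer site and a
    downstream reverse-primer site, filtered by amplicon length."""
    rev_rc = revcomp(rev)  # the reverse primer binds the opposite strand
    amplicons = []
    for f in find_sites(fwd, seq, max_mm):
        start = f + len(fwd)
        for r in find_sites(rev_rc, seq, max_mm):
            if r >= start and min_len <= r - start <= max_len:
                amplicons.append(seq[start:r])
    return amplicons


if __name__ == "__main__":
    seq = "AAAA" + "ACGTAC" + "GGGGCCCC" + "TTGCA" + "AAAA"
    print(in_silico_pcr(seq, fwd="ACGTAC", rev="TGCAA", max_mm=0))  # ['GGGGCCCC']
```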

Cleanup

Running the Anacapa workflow involves downloading and generating several large databases. These will keep taking up space on the HPC until you remove them.
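Before deleting anything, it can help to see how much space each database directory uses. The Python sketch below only reports sizes and removes nothing; the directory list is an assumption based on the paths mentioned on this page (~/anacapa/crux_db/TAXO and ~/anacapa/crux_db/EMBL), so add the converted ecoPCR database directory and anything else you created.

```python
"""Report disk usage of Anacapa database directories before cleanup (a sketch).
Read-only: nothing is deleted. The directory list below is an assumption."""
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under `path` (0 if it does not exist)."""
    if not path.exists():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def human(nbytes: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(nbytes)
    for unit in ["B", "KiB", "MiB", "GiB"]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__":
    crux_db = Path.home() / "anacapa" / "crux_db"
    for sub in ["TAXO", "EMBL"]:  # add your other database directories here
        p = crux_db / sub
        print(f"{p}: {human(dir_size_bytes(p))}")
```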

...