Source:
https://github.com/stajichlab/AAFTF
Goal:
Assemble the genomes of two M. alpina strains sequenced at QUT.
Identify DAG (diacylglycerol) and phospholipase (PLA, PLB, PLC and/or PLD) genes.
Method
De novo assembly of yeast genomes using the AAFTF pipeline. More information:
https://github.com/stajichlab/AAFTF
(pending) FunGAP pipeline for yeast genome annotation:
https://github.com/CompSynBioLab-KoreaUniv/FunGAP
Preliminary Results
https://wiki.qut.edu.au/display/erbts/M.+alpina+de+novo+genome+assemblies
Resources
Using conda environments:
https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
Methodology
1. Installing tools and dependencies
Create a conda environment called ‘aaftf’ and install several tools needed by the workflow:
conda create -n aaftf "python>=3.6" bbmap trimmomatic bowtie2 bwa pilon sourmash \
    blast minimap2 spades megahit novoplasty biopython
Activate the conda environment:
conda activate aaftf #to deactivate the conda environment type: conda deactivate
Install the aaftf pipeline as follows:
python -m pip install git+https://github.com/stajichlab/AAFTF.git
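Before submitting any jobs, it can help to confirm that the tools are visible on the PATH of the activated environment. The helper below is a minimal sketch, not part of AAFTF: check_tools is a hypothetical function name, and the command names in the commented example (for instance bbduk.sh, blastn, spades.py) are assumptions about what the conda packages install.

```shell
#!/bin/bash
# check_tools: hypothetical helper, not part of AAFTF.
# Prints each named command that is NOT found on PATH (one per line)
# and returns 1 if at least one is missing, 0 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run after 'conda activate aaftf'); command names are assumptions:
# check_tools AAFTF bbduk.sh trimmomatic bowtie2 bwa pilon sourmash \
#     blastn minimap2 spades.py megahit || echo "tools listed above are missing" >&2
```

An empty output with exit status 0 means every listed command was found.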
2. Running AAFTF pipeline
To run jobs on the HPC, we use PBS Pro submission scripts, which hand jobs to a scheduler that then places them in the queue. Use the PBS Pro script below to run 1) quality trimming and filtering; 2) de novo assembly; and 3) screening for contaminant (vector) sequences using the AAFTF pipeline:
#!/bin/bash -l
#PBS -N AAFTF
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=8:mem=16gb

#QUT eResearch, R.A.Barrero; Modified: 26-Apr-2022
#Usage: qsub launch_AAFTF_pipeline.pbs

cd $PBS_O_WORKDIR

#activate the AAFTF pipeline environment
conda activate aaftf

###################
###  VARIABLES  ###
###################
MEM=16                  # amount of memory in GB; can be higher, for example 32 (then also raise mem= in the #PBS line)
BASE=Mort               # prefix name for assembly
READSDIR=/work/speight_team/projects/yeast_genomes/data/Mort_Nextera-357721696
TRIMREAD=reads_trimmed  # name of output folder for quality trimmed reads
CPU=8                   # number of cpu cores to use

###################
###  PIPELINE   ###
###################

############################################
### STEP1: Quality trimming of raw reads ###
############################################
#make output directory for QCed reads
mkdir -p $TRIMREAD

#run the AAFTF pipeline
AAFTF trim --method bbduk --memory $MEM -c $CPU \
    --left $READSDIR/${BASE}_R1.fastq.gz --right $READSDIR/${BASE}_R2.fastq.gz \
    -o $TRIMREAD/${BASE} --trimmomatic_adaptors nextera_adaptor.fa

# filter out possible contamination sequences.
# This step may take a lot of memory depending on how many filtering libraries you use
AAFTF filter -c $CPU --memory $MEM --aligner bbduk \
    -o $TRIMREAD/${BASE} --left $TRIMREAD/${BASE}_1P.fastq.gz --right $TRIMREAD/${BASE}_2P.fastq.gz

############################################
### STEP2: De novo genome assembly       ###
############################################
# Additional variables
LEFT=$TRIMREAD/${BASE}_filtered_1.fastq.gz
RIGHT=$TRIMREAD/${BASE}_filtered_2.fastq.gz
WORKDIR=working_AAFTF
OUTDIR=genomes
ASMFILE=$OUTDIR/${BASE}.spades.fasta

#make work and output directories
mkdir -p $WORKDIR $OUTDIR

# run the de novo genome assembly
AAFTF assemble -c $CPU --mem $MEM \
    --left $LEFT --right $RIGHT \
    -o $ASMFILE -w $WORKDIR/spades_$BASE

############################################
### STEP3: Check and remove vector seqs  ###
############################################
# Additional variables
ASMFILE=$OUTDIR/${BASE}.spades.fasta
VECTRIM=$OUTDIR/${BASE}.vecscreen.fasta
#mkdir -p $WORKDIR $OUTDIR

#run trimming of vector sequences
AAFTF vecscreen -c $CPU -i $ASMFILE -o $VECTRIM
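The script assumes that the paired-end reads are named ${BASE}_R1.fastq.gz and ${BASE}_R2.fastq.gz inside READSDIR. A small pre-flight check placed before STEP1 can stop the job early if either file is missing, instead of failing partway through trimming. This is a sketch only; check_reads is a hypothetical helper and is not provided by AAFTF.

```shell
#!/bin/bash
# check_reads: hypothetical helper, not part of AAFTF.
# Usage: check_reads READSDIR BASE
# Returns 0 only if both paired-end read files exist and are non-empty;
# otherwise reports each problem file on stderr and returns 1.
check_reads() {
    local readsdir=$1
    local base=$2
    local status=0
    local f
    for f in "$readsdir/${base}_R1.fastq.gz" "$readsdir/${base}_R2.fastq.gz"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty read file: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example inside the PBS script, after the VARIABLES section:
# check_reads "$READSDIR" "$BASE" || exit 1
```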
The above template runs the AAFTF pipeline for the Mort strain. To use it for the Alpina strain, modify the following two variables in the script:
BASE=Alpina
READSDIR=/work/speight_team/projects/yeast_genomes/data/Alpina_Nextera-357932578
Once you have changed the above variables, submit the launch_AAFTF_pipeline.pbs script to the scheduler as follows:
qsub launch_AAFTF_pipeline.pbs
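As an alternative to editing the script for each strain, the two strain-specific variables could be given defaults that are overridden at submission time. The sketch below assumes the BASE and READSDIR assignments in the script are replaced by the two lines shown; it relies on the -v option of qsub, which passes a comma-separated list of variables into the job environment. This is a suggested variation, not how the original script is written.

```shell
#!/bin/bash
# Replacement for the BASE= and READSDIR= lines in launch_AAFTF_pipeline.pbs.
# ${VAR:-default} keeps a value passed in from the environment and falls back
# to the Mort settings when nothing was passed.
BASE=${BASE:-Mort}
READSDIR=${READSDIR:-/work/speight_team/projects/yeast_genomes/data/Mort_Nextera-357721696}
echo "Assembling strain $BASE from $READSDIR"

# Submission examples (run on the HPC login node):
# qsub launch_AAFTF_pipeline.pbs
# qsub -v BASE=Alpina,READSDIR=/work/speight_team/projects/yeast_genomes/data/Alpina_Nextera-357932578 launch_AAFTF_pipeline.pbs
```

With this change the same file serves both strains, and the job output records which strain and read directory were used.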
Check the progress of the pipeline:
qjobs #alternatively use: qstat -u USERNAME
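The queue status shows whether the job is running, but not which step it has reached. One rough way to see that is to look for the output file each step writes, using the file names from the script above. The function below is a sketch under that assumption; report_outputs is a hypothetical helper, and a file that exists may still be incomplete while its step is running.

```shell
#!/bin/bash
# report_outputs: hypothetical helper, not part of AAFTF.
# Usage: report_outputs BASE [TRIMDIR] [OUTDIR]
# Run from the directory the job was submitted from. Prints "done" or
# "pending" for the expected output file of each pipeline step.
report_outputs() {
    local base=$1
    local trimdir=${2:-reads_trimmed}
    local outdir=${3:-genomes}
    local f
    for f in \
        "$trimdir/${base}_1P.fastq.gz" \
        "$trimdir/${base}_filtered_1.fastq.gz" \
        "$outdir/${base}.spades.fasta" \
        "$outdir/${base}.vecscreen.fasta"; do
        if [ -s "$f" ]; then
            echo "done     $f"
        else
            echo "pending  $f"
        fi
    done
}

# Example:
# report_outputs Mort
```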