Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

This guide provides a step-by-step guide to 1) convert BAM files (i.e., public) to FASTQ; and 2) run the nextflow nf-core/sarek variant calling pipeline.

...

Code Block
conda activate liver

Prepare a file called environment.yml - Tip: use a text editor (i.e., vim, nano, or other) to copy and paste the code below into the file.

...

Move to the folder where all the BAM files are present and prepare the following script (i.e.,launch_BAM2FASTQ.pbs):

Code Block
#!/bin/bash -l
#PBS -N BAM2FASTQ
#PBS -l walltime=24:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4

cd $PBS_O_WORKDIR

#activate the conda environment with the necessary tools
conda activate liver

#Sort reads in BAM file by indentifier-name (-n) using 4 CPUs (-@ 4). Note 'prefix' for sorted file noted after $i (input BAM file)
for i in `ls --color=never *.bam`
do
  echo $i
  samtools sort -@ 4 -n $i ${i%%.bam}_sorted
done

#Extract paired end reads in FASTQ format
for file in `ls --color=never *sorted.bam`
do
  echo $file
  bedtools bamtofastq -i $file -fq ${file%%.bam}_R1.fastq -fq2 ${file%%.bam}_R2.fastq
  #compress FASTQ files to run using the sarek pipeline
  gzip -c -9 ${file%%.bam}_R1.fastq > ${file%%.bam}_R1.fastq.gz
  gzip -c -9 ${file%%.bam}_R1.fastq > ${file%%.bam}_R2.fastq.gz
done

Submit the job to the PBS scehdulerscheduler:

Code Block
qsub launch_BAM2FASTQ.pbs

...

To run Sarek 3 files are required:

  1. launch.pbs → details how to run the workflow

  2. ~/.nextflow./config → specify how to run the workflow in the HPC

  3. samplesheet.csv → provides information on the samples and data to be used (i.e., FASTQ, BAM or CRAM)

...

Code Block
#!/bin/bash -l
#PBS -N sarek
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=1:mem=5gb
cd $PBS_O_WORKDIR
NXF_OPTS='-Xms1g -Xmx4g'
module load java

nextflow run nf-core/sarek \
        -r 3.1.1 \
        -profile singularity \
        --genome GATK.GRCh38 \
        --input indexsamplesheet.csv \
        -config nextflow.config

To run variant calling and annotation use the following script instead:

Code Block
#!/bin/bash -l
#PBS -N sarek
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=1:mem=5gb
cd $PBS_O_WORKDIR
NXF_OPTS='-Xms1g -Xmx4g'
module load java

nextflow run nf-core/sarek \
        -r 3.1.1 \
        -profile singularity \
        --genome GATK.GRCh38 \
        --input indexsamplesheet.csv \
        -config nextflow.config \
        --wes true \
        --tools 'deepvariant|,freebayes|,haplotypecaller|,manta|,mpileup|,snpeff|,strelka|,vep' \
        -resume

~/.nextflow./config file: (Note: You may already have this file if you installed Nextflow using this guide )

Code Block
singularity {
    cacheDir = '$HOME/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}

conda {
    cacheDir = '$HOME/NXF_CONDA_CACHEDIR'
}

singularity {
    enabled = true
    autoMounts = true
}

process {
  executor = 'pbspro'
  beforeScript = {
      """
      source $HOME/.bashrc
      source $HOME/.profile
      """
  }
  scratch = false
  cleanup = false
}

...