Exosome variant analysis

This guide describes, step by step, how to 1) convert BAM files (e.g., publicly available data) to FASTQ; and 2) run the Nextflow nf-core/sarek variant calling pipeline.

Create a Conda environment with the tools needed for downstream analysis

Create a Python 3.7 environment:

conda create --name liver python=3.7

Activate the conda environment:

conda activate liver

Copy and paste the code below into a file called environment.yml. Tip: use a text editor (e.g., vim or nano).

channels:
  - bioconda
  - conda-forge
dependencies:
  - bedtools
  - samtools
  - seqkit
  - vcftools
  - emboss

With the environment still active, run the following command to install the additional tools:

conda env update --file environment.yml
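
Once the update finishes, it can be worth confirming that each tool is actually on the PATH before moving on. The snippet below is a minimal sketch, assuming the environment is still active. check_tools is a hypothetical helper name (not part of conda), and seqret is used only as a representative EMBOSS program, since the emboss package installs many executables rather than one called "emboss".

```shell
#!/bin/bash
# check_tools: hypothetical helper, not part of conda or any tool listed above.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executables expected from the packages in environment.yml.
# seqret stands in for the EMBOSS suite (assumption: any EMBOSS program would do).
check_tools bedtools samtools seqkit vcftools seqret \
    || echo "Some tools are missing; consider re-running the conda env update step."
```

If any line reads "missing", the most likely causes are that the environment was not active during the update or that the update did not complete.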

To deactivate the conda environment, run:

conda deactivate

Convert BAM to FASTQ

STEP1: Install bedtools if it is not already available (it is included in the environment.yml above)

conda install -c bioconda bedtools

STEP2: Move to the folder that contains the BAM files and run the following PBS script:

#!/bin/bash -l
#PBS -N BAM2FASTQ
#PBS -l walltime=12:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4

cd $PBS_O_WORKDIR

conda activate liver

# sort reads in the BAM file by read name (-n) so that mates are adjacent
# (samtools >= 1.3 syntax: the output file is given with -o)
samtools sort -@ 4 -n -o input_sorted.bam input.bam

#extract paired end reads
bedtools bamtofastq -i input_sorted.bam -fq output_r1.fastq -fq2 output_r2.fastq
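
The script above handles a single file named input.bam. Since STEP2 is run from a folder holding several BAM files, the same two commands can be wrapped in a loop. The following is a rough sketch, not a tested production script: sample_name and convert_all_bams are hypothetical helper names, the output naming scheme (sample_r1.fastq, sample_r2.fastq) is an assumption, and it presumes samtools 1.3 or later, bedtools on the PATH, and paired-end reads in every BAM.

```shell
#!/bin/bash
# sample_name: hypothetical helper that strips the directory and the
# .bam extension from a path, e.g. /data/a.bam -> a
sample_name() {
    basename "$1" .bam
}

# convert_all_bams: hypothetical helper that converts every BAM file in the
# current directory to a pair of FASTQ files.
# Assumes samtools (>= 1.3) and bedtools are available and reads are paired-end.
convert_all_bams() {
    local bam sample
    for bam in *.bam; do
        [ -e "$bam" ] || continue          # glob matched nothing
        sample=$(sample_name "$bam")
        # Skip name-sorted intermediates left over from an earlier run
        case "$sample" in *_sorted) continue ;; esac
        # Sort by read name so that mates are adjacent
        samtools sort -@ 4 -n -o "${sample}_sorted.bam" "$bam"
        # Extract paired-end reads into R1/R2 FASTQ files
        bedtools bamtofastq -i "${sample}_sorted.bam" \
            -fq "${sample}_r1.fastq" -fq2 "${sample}_r2.fastq"
    done
}
```

To use it inside the PBS script, the two single-file commands could be replaced with the function definitions followed by a call to convert_all_bams. Files are processed one after another, so the walltime request may need to be raised for large batches.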