This guide explains, step by step, how to 1) convert BAM files (e.g., publicly available data) to FASTQ; and 2) run the Nextflow nf-core/sarek variant calling pipeline.
Create a Conda environment with the tools needed for downstream analysis
Create a Python 3.7 environment:
conda create --name liver python=3.7
Activate the conda environment:
conda activate liver
Copy and paste the code below into a file called environment.yml. Tip: use a text editor (e.g., vim or nano).
channels:
  - bioconda
  - conda-forge
dependencies:
  - bedtools
  - samtools
  - seqkit
  - vcftools
  - emboss
Run the following command to install the additional tools listed in environment.yml into the liver environment:
conda env update --name liver --file environment.yml
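A quick way to confirm that the installation worked is to check that each tool is on PATH inside the activated environment. The sketch below is a hypothetical helper (check_tools is not part of any package). It relies only on the shell builtin command -v, and it checks seqret as a representative EMBOSS program, because EMBOSS installs many separate binaries rather than a single one called emboss.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns the number of missing tools (0 means everything was found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, after conda activate liver, run check_tools bedtools samtools seqkit vcftools seqret. It prints one line per missing tool and exits with a non-zero status if anything is missing.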
To deactivate the conda environment, run:
conda deactivate
Convert BAM to FASTQ
STEP 1: Install bedtools if it is not already available:
conda install -c bioconda bedtools
STEP 2: Change to the folder that contains the BAM files, save the following PBS script (replacing input.bam with the name of your BAM file), and submit it with qsub:
#!/bin/bash -l
#PBS -N BAM2FASTQ
#PBS -l walltime=12:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4

cd "$PBS_O_WORKDIR"

# activate the environment created above (it provides samtools and bedtools)
conda activate liver

# sort reads in the BAM file by read name (-n) so that mates are adjacent
samtools sort -@ 4 -n -o input_sorted.bam input.bam

# extract paired-end reads into two FASTQ files
bedtools bamtofastq -i input_sorted.bam -fq output_r1.fastq -fq2 output_r2.fastq
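The script above handles a single file named input.bam. If the folder holds several BAM files, the same two commands can be wrapped in a loop. The sketch below is an illustration only: the function names, the _namesorted.bam / _R1.fastq / _R2.fastq naming scheme and the DRY_RUN switch are assumptions, not part of this guide. It also includes a rough check that both FASTQ files of a pair contain the same number of records, assuming unwrapped 4-line FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical batch version of the PBS script: converts every *.bam in the
# current folder. Set DRY_RUN=1 to print the commands instead of running them.
bam_to_fastq_all() {
  local threads="${1:-4}" bam sample
  for bam in *.bam; do
    [ -e "$bam" ] || continue                          # glob matched nothing
    case "$bam" in *_namesorted.bam) continue ;; esac  # skip intermediate files
    sample="${bam%.bam}"
    # sort by read name so that mates are adjacent
    ${DRY_RUN:+echo} samtools sort -@ "$threads" -n -o "${sample}_namesorted.bam" "$bam" || return 1
    # write mate 1 and mate 2 to separate FASTQ files
    ${DRY_RUN:+echo} bedtools bamtofastq -i "${sample}_namesorted.bam" \
      -fq "${sample}_R1.fastq" -fq2 "${sample}_R2.fastq" || return 1
  done
}

# Number of records in an unwrapped FASTQ file (assumes 4 lines per record).
count_fastq_records() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Succeeds only if both files of a pair hold the same number of records.
fastq_pairs_match() {
  [ "$(count_fastq_records "$1")" -eq "$(count_fastq_records "$2")" ]
}
```

For example, inside the PBS job you could call bam_to_fastq_all 4 and then, for each sample, fastq_pairs_match sample_R1.fastq sample_R2.fastq (where sample is a placeholder for the BAM file name without .bam). Running DRY_RUN=1 bam_to_fastq_all 4 first is a cheap way to review the commands before submitting.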