Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
  • Here we describe the process to fetch and extract unmapped reads onto a reference genome(s)/transcriptome using BWA.

  • This guide assumes that conda is already installed - if you have not yet done so, see here.

Pre-requisites

Strategy #1: BWA aligner

STEP 1.1 - (Optional) Create a conda environment with necessary tools if not yet available

...

Code Block
#!/bin/bash -l
#PBS -N bwa_mapping
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#define VARIABLES for the location of your FASTQ files and reference genome(s)
########################################################
R1=/path/to/sample_trimmed_R1.fastq
R2=/path/to/sample_trimmed_R1.fastq
MYGENOMES=/path/to/bwa_index/my_reference_genomes.fasta
SAMPLEID='LocationX'
#########################################################

conda activate myanalyses

#create a BWA genome index file prior mapping (this needs to be done once and then either remove or commented out using "#" as the first character in the line )
bwa index ${MYGENOMES}

#map reads onto reference
bwa mem ${READ1} ${READ2}  > ${SAMPLEID}_aln-pe.sam

#convert SAM to BAM
samtools view -S -b ${SAMPLEID}_aln-pe.sam > ${SAMPLEID}_aln-pe.bam

#retrive umapped reads
samtools view -b -f 4 ${SAMPLEID}_aln-pe.bam > ${SAMPLEID}_unmapped.bam

#sort unmapped reads by name
samtools sort -n -o ${SAMPLEID}_unmapped.sorted.bam ${SAMPLEID}_unmapped.bam

#extract FASTQ files for R1 and R2 unmapped reads
bedtools bamtofastq [OPTIONS] -i ${SAMPLEID}_unmapped.sorted.bam -fq ${SAMPLEID}_unmapped_R1.bam -fq2 ${SAMPLEID}_unmapped_R2.bam

Strategy #2: bowtie aligner

STEP 2.1 - (Optional) Create a conda environment with necessary tools if not yet available

...

  • Here we aim to generate a mapped file (BAM) that can be used to identify unmapped sequences of interest to run a follow-up de novo genome assembly, for example.

  • See bowtie user manual for more details: https://bowtie-bio.sourceforge.net/manual.shtml

  • For the mapping, we can use either individual FASTQ files for each sample or merged files for a group of related samples - if you merge the FASTQ files, these need to be merged using the sample order for both READ1 (*R1.fastq) and READ2 (*R2.fastq) files. We assume that adaptors have already been removed from the raw reads and the individual or merged files are quality processed reads (for example, processed by either cutadapt, trimmomatic, trim galore, fastp, or another QC tool)

Build bowtie-index for ref genomes:

Code Block
bowtie-build -f my_references.fasta my_references.fasta

Run mapping allowing no mismatches (identity = 100%)

Code Block
#!/bin/bash -l
#PBS -N bwa_mapping
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#define VARIABLES for the location of your FASTQ files and reference genome(s)
####################################################################
READ1=’/path/to/mySample_trimmed_R1.fastq’
READ2=’/path/to/mySample_trimmed_R2.fastq’
BOWTIEINDEX=’/path/to/bowtie-index/my_references.fasta’
#####################################################################

#run mapping:
bowtie --threads 8 -a -v 0 --best --strata --sam $BOWTIEINDEX \
        -1 $READ1 \
        -2 $READ2 > bowtie_v0_mapped_reads_onto_my_references.sam