Extracting unmapped sequences

  • Here we describe how to map reads onto one or more reference genomes or transcriptomes with BWA or bowtie, and how to fetch and extract the reads that remain unmapped.

  • This guide assumes that conda is already installed. If you have not installed it yet, see here.

Pre-requisites

Strategy #1: BWA aligner

STEP 1.1 - (Optional) Create a conda environment with necessary tools if not yet available

Create a conda environment. You can give it any name (e.g., one related to the workflow or tool being installed). For simplicity, here we will call it ‘myanalyses’:

conda create --name myanalyses python=3.7

Activate the conda environment.

conda activate myanalyses

Let’s now install the necessary tools one at a time (faster than installing them all at once):

conda install -c bioconda bwa

To look for other bioinformatics tools to install, search the tool of interest at: https://anaconda.org

STEP 1.2 - Mapping of reads onto reference sequences (e.g., abundant genomic sequences found in an environmental sample).
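The commands for this step are not listed on this page, so the following is only a minimal sketch and not necessarily the procedure the authors used. It assumes that samtools is installed alongside bwa (samtools is not mentioned in STEP 1.1), and the function name `bwa_unmapped_pairs` and all file names are hypothetical. The sketch indexes the reference, maps paired reads with `bwa mem`, and writes the pairs in which both mates are unmapped (SAM flag 4 + 8 = 12) to new FASTQ files.

```shell
# Sketch: map paired reads with BWA and extract pairs where both mates are unmapped.
# Assumes bwa and samtools are on PATH; the function name is hypothetical.
bwa_unmapped_pairs() {
    local ref="$1" r1="$2" r2="$3" prefix="$4" threads="${5:-4}"

    # Build the BWA index next to the reference FASTA (only needed once per reference)
    bwa index "$ref" || return 1

    # Map the read pairs and write a coordinate-sorted BAM
    bwa mem -t "$threads" "$ref" "$r1" "$r2" \
        | samtools sort -@ "$threads" -o "${prefix}.sorted.bam" - || return 1

    # Keep pairs with both mates unmapped (flag 12), then name-sort them
    # so that the two mates of each pair are adjacent
    samtools view -b -f 12 "${prefix}.sorted.bam" \
        | samtools sort -n -o "${prefix}.unmapped.bam" - || return 1

    # Convert the unmapped pairs back to paired FASTQ files
    samtools fastq -1 "${prefix}.unmapped_R1.fastq" -2 "${prefix}.unmapped_R2.fastq" \
        -0 /dev/null -s /dev/null -n "${prefix}.unmapped.bam"
}
```

For example: `bwa_unmapped_pairs reference.fasta sample_R1.fastq sample_R2.fastq sample 8`. The resulting `sample.unmapped_R1.fastq` and `sample.unmapped_R2.fastq` could then be passed to a de novo assembler.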

Strategy #2: bowtie aligner

STEP 2.1 - (Optional) Create a conda environment with necessary tools if not yet available

(Optional) If not yet created in STEP 1.1, create a conda environment. You can give it any name (e.g., one related to the workflow or tool being installed). For simplicity, here we will call it ‘myanalyses’.

Activate the conda environment.

Let’s now install the necessary tools one at a time (faster than installing them all at once):
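The install commands for this strategy are not listed on this page. A minimal sketch, assuming the bioconda package names `bowtie` and `samtools` (samtools is not mentioned in this guide; it is assumed here only for handling SAM/BAM output) and a hypothetical helper name, `install_bowtie_tools`:

```shell
# Sketch: install each tool with its own conda call, mirroring the
# one-at-a-time approach used for bwa in STEP 1.1.
# The function name is hypothetical; samtools is an assumption.
install_bowtie_tools() {
    local tool
    for tool in bowtie samtools; do
        conda install -c bioconda "$tool" || return 1
    done
}
```

Run `install_bowtie_tools` after `conda activate myanalyses` so that the tools are installed into that environment.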

STEP 2.2 - Mapping of reads onto reference sequences (e.g., abundant genomic sequences found in an environmental sample).

  • Here we aim to generate a mapped file (BAM) that can be used to identify unmapped sequences of interest, for example to run a follow-up de novo genome assembly.

  • See bowtie user manual for more details: https://bowtie-bio.sourceforge.net/manual.shtml

  • For the mapping, we can use either individual FASTQ files for each sample or merged files for a group of related samples. If you merge the FASTQ files, they need to be merged using the same sample order for both the READ1 (*R1.fastq) and READ2 (*R2.fastq) files, so that read pairs stay in sync. We assume that adaptors have already been removed from the raw reads and that the individual or merged files contain quality-processed reads (for example, processed with cutadapt, trimmomatic, trim galore, fastp, or another QC tool).
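The merging described in the last point can be sketched as follows. This is an illustration only: the function name `merge_paired_fastq` is hypothetical, and it assumes uncompressed files named `<sample>_R1.fastq` and `<sample>_R2.fastq`. Because a single loop appends both files of each sample, the sample order is identical in the merged READ1 and READ2 files.

```shell
# Sketch: concatenate per-sample FASTQ files into merged R1/R2 files,
# using the same sample order for both. Function name and the
# <sample>_R1.fastq / <sample>_R2.fastq naming are assumptions.
merge_paired_fastq() {
    local out_prefix="$1" sample
    shift

    # Start from empty output files so reruns do not append to old results
    : > "${out_prefix}_R1.fastq"
    : > "${out_prefix}_R2.fastq"

    # Append READ1 and READ2 of each sample in the order given on the command line
    for sample in "$@"; do
        cat "${sample}_R1.fastq" >> "${out_prefix}_R1.fastq" || return 1
        cat "${sample}_R2.fastq" >> "${out_prefix}_R2.fastq" || return 1
    done
}
```

For example: `merge_paired_fastq merged sampleA sampleB sampleC` writes `merged_R1.fastq` and `merged_R2.fastq`. The output prefix should differ from every sample name, otherwise an input file would be overwritten.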

Build a bowtie index for the reference genomes:
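The indexing command is not listed on this page. A minimal sketch, assuming bowtie (version 1) is installed and using a hypothetical helper name, `build_bowtie_index`, with placeholder file names. `bowtie-build` accepts a comma-separated list of FASTA files, so several reference genomes can go into one index:

```shell
# Sketch: build one bowtie index from one or more reference FASTA files.
# Function name is hypothetical; assumes bowtie-build is on PATH.
build_bowtie_index() {
    local index_prefix="$1" fasta_list
    shift
    if [ "$#" -lt 1 ]; then
        echo "usage: build_bowtie_index <index_prefix> <ref1.fasta> [ref2.fasta ...]" >&2
        return 1
    fi

    # Join the FASTA paths with commas, as bowtie-build expects
    fasta_list=$(printf '%s,' "$@")
    fasta_list=${fasta_list%,}

    bowtie-build "$fasta_list" "$index_prefix"
}
```

For example: `build_bowtie_index ref_index genomeA.fasta genomeB.fasta`. The index files are written with the prefix `ref_index`, which is the value later passed to the aligner.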

Run the mapping allowing no mismatches (identity = 100%):
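The mapping command is not listed on this page. The following is a minimal sketch, not necessarily the authors' exact command. It assumes bowtie version 1.3 or later (for the `-x` index option), an index prefix created with `bowtie-build`, and samtools for the BAM conversion; the function name `bowtie_exact_map` and the file names are hypothetical. `-v 0` reports only alignments with zero mismatches, and `--un` writes the reads that fail to align to separate FASTQ files.

```shell
# Sketch: map paired reads with bowtie allowing no mismatches, keep a sorted
# BAM of the alignments, and write the unaligned reads to FASTQ.
# Function name is hypothetical; assumes bowtie (>= 1.3) and samtools on PATH.
bowtie_exact_map() {
    local index_prefix="$1" r1="$2" r2="$3" out_prefix="$4" threads="${5:-4}"

    # -v 0 : alignments may contain 0 mismatches
    # -S   : SAM output (to stdout, piped into samtools)
    # --un : unaligned reads; for paired input bowtie inserts _1 / _2 into
    #        the file name (here <out_prefix>.unmapped_1.fastq and _2.fastq)
    bowtie -v 0 -p "$threads" -S \
        --un "${out_prefix}.unmapped.fastq" \
        -x "$index_prefix" -1 "$r1" -2 "$r2" \
        | samtools sort -@ "$threads" -o "${out_prefix}.sorted.bam" -
}
```

For example: `bowtie_exact_map ref_index sample_R1.fastq sample_R2.fastq sample 8`.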
