Data

Experiment Accession | sample | FASTQ | Experiment Title | Organism Name | Instrument | Submitter | Study Accession | Study Title | Sample Accession | Total Size (Mb) | Total Spots | Total Bases | Library Strategy | Library Source | Library Selection
SRX14748451 | S1 | SRR18645307 | Homo sapiens | Homo sapiens | MinION | Drexel University | SRP367676 | Multiplex structural variant detection by whole-genome mapping and nanopore sequencing. | SRS12509856 | 821.1 | 348226 | 972620520 | OTHER | GENOMIC | other
SRX19406878 | S2 | SRR23513621 | NA12878 DNA sequencing from nanopore WSG consortium - basecalled sequences (Guppy 6.1.3 super accuracy) | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP421403 | Curated publicly available nanopore datasets | SRS16801715 | 78526.8 | 11173458 | 97545895593 | WGS | GENOMIC | RANDOM
ERX8211413 | S3 | ERR8578833 | MinION sequencing | Homo sapiens | MinION | the university of hong kong | ERP135493 | Target enrichment sequencing and variant calling on medical exome using ONT MinION | ERS10590135 | 8961.02 | 9636172 | 10382057986 | Targeted-Capture | GENOMIC | PCR
ERX8211414 | S4 | ERR8578834 | MinION sequencing | Homo sapiens | MinION | the university of hong kong | ERP135493 | Target enrichment sequencing and variant calling on medical exome using ONT MinION | ERS10590135 | 10669.72 | 10644000 | 12212807287 | Targeted-Capture | GENOMIC | PCR
SRX13322984 | S5 | SRR17138639 | Nanopore targeted sequencing (ReadUntil/ReadFish) of NA12878-HG001- basecalled sequences | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP349335 | Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing | SRS11230712 | 6629.97 | 5513156 | 7815960904 | WGS | GENOMIC | other
SRX13323057 | S6 | SRR17138566 | Nanopore targeted sequencing (ReadUntil/ReadFish) of NA12878-HG001- basecalled sequences | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP349335 | Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing | SRS11230747 | 17107.98 | 12278391 | 20238395479 | WGS | GENOMIC | other
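The Total Spots and Total Bases columns can be turned into quick per-dataset summaries. This is a minimal sketch, assuming each spot is one single-end nanopore read, so that mean read length is Total Bases divided by Total Spots; the function name read_stats is hypothetical and the numbers are copied from the table.

```shell
#!/usr/bin/env bash
# Sketch: derive total yield and mean read length from the dataset table.
# Assumes one spot = one single-end nanopore read, so
#   mean read length = Total Bases / Total Spots
read_stats() {
  # stdin:  "sample total_spots total_bases" per line
  # stdout: sample<TAB>total Gb (2 d.p.)<TAB>mean read length in bp (rounded)
  awk 'NF == 3 { printf "%s\t%.2f\t%.0f\n", $1, $3 / 1e9, $3 / $2 }'
}

read_stats <<'EOF'
S1 348226 972620520
S2 11173458 97545895593
S3 9636172 10382057986
S4 10644000 12212807287
S5 5513156 7815960904
S6 12278391 20238395479
EOF
```

Dividing Total Bases by the size of the human genome (roughly 3.1 Gb) gives a rough average depth for a whole-genome run such as S2; for the targeted datasets (S3 to S6, judging by their titles) that figure says little about depth over the targeted regions.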

Mapping

Let’s run the wf-alignment pipeline with the --help option to list its available parameters. Nextflow needs Java, so we load the java module first:

Code Block
module load java
nextflow run epi2me-labs/wf-alignment -profile singularity --help
Code Block
N E X T F L O W  ~  version 23.12.0-edge
Launching `https://github.com/epi2me-labs/wf-alignment` [nostalgic_galileo] DSL2 - revision: e1fd7a51dc [master]
WARN: Config setting `prov.formats` is not defined, no provenance reports will be produced

||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-alignment v1.1.2-ge1fd7a5
--------------------------------------------------------------------------------
Typical pipeline command:

  nextflow run epi2me-labs/wf-alignment \
        --fastq 'wf-alignment-demo/fastq' \
        --references 'wf-alignment-demo/references'

Input Options
  --fastq                [string]  FASTQ files to use in the analysis.
  --bam                  [string]  BAM or unaligned BAM (uBAM) files to use in the analysis.
  --analyse_unclassified [boolean] Analyse unclassified reads from input directory. By default the workflow will not process reads in the unclassified 
                                   directory. 
  --references           [string]  Path to a directory containing FASTA reference files.
  --reference_mmi_file   [string]  Path to an MMI index file to be used as reference.
  --counts               [string]  Path to a CSV file containing expected counts as a control.

Sample Options
  --sample_sheet         [string]  A CSV file used to map barcodes to sample aliases. The sample sheet can be provided when the input data is a directory 
                                   containing sub-directories with FASTQ files. 
  --sample               [string]  A single sample name for non-multiplexed data. Permissible if passing a single .fastq(.gz) file or directory of .fastq(.gz) 
                                   files. 

Output Options
  --out_dir              [string]  Directory for output of all workflow results. [default: output]
  --prefix               [string]  Optional prefix attached to each of the output filenames.

Advanced options
  --depth_coverage       [boolean] Calculate depth coverage statistics and include them in the report. [default: true]
  --minimap_preset       [choice]  Pre-defined parameter sets for `minimap2`, covering most common use cases. [default: dna]
                                   * dna
                                   * rna
  --minimap_args         [string]  String of command line arguments to be passed on to `minimap2`.

Miscellaneous Options
  --threads              [integer] Number of CPU threads to use for the alignment step. [default: 4]
  --disable_ping         [boolean] Enable to prevent sending a workflow ping.

Other parameters
  --monochrome_logs      [boolean] null
  --validate_params      [boolean] null [default: true]
  --show_hidden_params   [boolean] null

!! Hiding 4 params, use --show_hidden_params to show them !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-alignment for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x
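Once the parameters are known, a first alignment run can be assembled from the options listed above (--fastq, --references, --sample, --out_dir, --threads). This is a template sketch rather than a tested recipe: the directory names are hypothetical placeholders, and the command is only printed unless RUN=1 is set, so it can be checked before anything is launched.

```shell
#!/usr/bin/env bash
# Sketch: assemble a wf-alignment command for one sample and print it.
# The directory names below are hypothetical placeholders; override them
# with environment variables.
FASTQ_DIR="${FASTQ_DIR:-data/S1/fastq}"      # directory of .fastq(.gz) files for one sample
REF_DIR="${REF_DIR:-references}"             # directory containing FASTA reference file(s)
OUT_DIR="${OUT_DIR:-results/S1_alignment}"   # where the workflow writes its results

cmd=(
  nextflow run epi2me-labs/wf-alignment
  -profile singularity
  --fastq "$FASTQ_DIR"
  --references "$REF_DIR"
  --sample S1
  --out_dir "$OUT_DIR"
  --threads 8
)

# Show the exact command (shell-quoted); it is only executed when RUN=1.
printf '%q ' "${cmd[@]}"
echo
if [[ "${RUN:-0}" == 1 ]]; then
  "${cmd[@]}"
fi
```

For example, save it as run_alignment.sh (a hypothetical name), run module load java, then RUN=1 FASTQ_DIR=/path/to/fastq REF_DIR=/path/to/references bash run_alignment.sh.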

Variant calling

Code Block
nextflow run epi2me-labs/wf-human-variation -profile singularity --help

...
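The wf-human-variation help text is long, so it can help to reduce it to parameter names and types, for example to compare it with the wf-alignment options listed earlier. This is a minimal sketch, assuming its parameter lines use the same "--name [type] description" layout as the wf-alignment listing; list_params is a hypothetical helper, and it strips ANSI colour codes in case the help output contains them.

```shell
#!/usr/bin/env bash
# Sketch: extract "--name<TAB>type" pairs from an epi2me-labs --help listing.
# Assumes parameter lines look like:
#   --fastq                [string]  FASTQ files to use in the analysis.
list_params() {
  awk '{
    gsub(/\033\[[0-9;]*m/, "")   # drop ANSI colour codes, if present
    if ($0 ~ /^ *--[A-Za-z0-9_]+ +\[[a-z]+]/)
      printf "%s\t%s\n", $1, substr($2, 2, length($2) - 2)
  }'
}
```

For example: nextflow run epi2me-labs/wf-human-variation -profile singularity --help | list_params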

Overview of today’s session:

  • Learn to use conda to install tools and create conda environments

  • Hands-on exercises:

    • Quality Control (QC) of raw Nanopore data

    • Mapping of processed Nanopore data onto a reference genome

    • Run the epi2me-labs/wf-human-variation nextflow pipeline

Public Nanopore Datasets

Experiment Accession | sample | FASTQ | Experiment Title | Organism Name | Instrument | Submitter | Study Accession | Study Title | Sample Accession | Total Size (Mb) | Total Spots | Total Bases | Library Strategy | Library Source | Library Selection
SRX14748451 | S1 | SRR18645307 | Homo sapiens | Homo sapiens | MinION | Drexel University | SRP367676 | Multiplex structural variant detection by whole-genome mapping and nanopore sequencing. | SRS12509856 | 821.1 | 348226 | 972620520 | OTHER | GENOMIC | other
ERX8211413 | S3 | ERR8578833 | MinION sequencing | Homo sapiens | MinION | the university of hong kong | ERP135493 | Target enrichment sequencing and variant calling on medical exome using ONT MinION | ERS10590135 | 8961.02 | 9636172 | 10382057986 | Targeted-Capture | GENOMIC | PCR
SRX13322984 | S5 | SRR17138639 | Nanopore targeted sequencing (ReadUntil/ReadFish) of NA12878-HG001- basecalled sequences | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP349335 | Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing | SRS11230712 | 6629.97 | 5513156 | 7815960904 | WGS | GENOMIC | other
SRX13323057 | S6 | SRR17138566 | Nanopore targeted sequencing (ReadUntil/ReadFish) of NA12878-HG001- basecalled sequences | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP349335 | Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing | SRS11230747 | 17107.98 | 12278391 | 20238395479 | WGS | GENOMIC | other

What is conda?

Conda is a powerful command line tool for package and environment management that runs on Windows, macOS, and Linux.

Installing conda

Source: https://conda.io/projects/conda/en/latest/user-guide/install/index.html

To install conda, you must first pick the right installer for you. The following are the most popular installers currently available:

Miniconda

Miniconda is a minimal installer provided by Anaconda. Use this installer if you want to install most packages yourself.

Anaconda Distribution

Anaconda Distribution is a full-featured installer that comes with a suite of packages for data science, as well as Anaconda Navigator, a GUI application for working with conda environments.

NOTE: If you have already installed conda, you do not need to follow the steps below.

Download the Miniconda installer for your system: https://docs.anaconda.com/free/miniconda/

As we are working on the HPC (Linux), copy and paste the following into your terminal to install Miniconda:

Code Block
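# create the install directory
mkdir -p ~/miniconda3
# download the latest Linux x86_64 installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
# -b: run without prompts, -u: update an existing install, -p: install location
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
# remove the installer once it has run
rm -f ~/miniconda3/miniconda.sh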

After installing, initialise the new Miniconda installation for the bash shell by running:

Code Block
~/miniconda3/bin/conda init bash

Now close your terminal and open it again to be able to use conda.

Once logged back in, you will be able to access the conda “base” environment.
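conda init bash works by adding a short block to ~/.bashrc, which is only read when a new shell starts; this is why the terminal has to be reopened. The sketch below checks for that block, assuming it starts with the marker line conda normally writes (# >>> conda initialize >>>); has_conda_init is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check whether `conda init bash` has written its block to a bashrc file.
# Assumes the block starts with the marker line "# >>> conda initialize >>>".
has_conda_init() {
  local rc="${1:-$HOME/.bashrc}"   # defaults to the current user's ~/.bashrc
  grep -qF '# >>> conda initialize >>>' "$rc" 2>/dev/null
}

if has_conda_init; then
  echo "conda init block found in ~/.bashrc"
else
  echo "no conda init block in ~/.bashrc; rerun: ~/miniconda3/bin/conda init bash"
fi
```

If the block is present, running source ~/.bashrc in the current terminal should have the same effect as reopening it.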

Configure the conda channels as recommended by Bioconda (https://bioconda.github.io/):

Code Block
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
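Each conda config --add places the channel at the top of the list, so after the commands above conda-forge has the highest priority, then bioconda, then defaults; with channel_priority set to strict, a package is taken from the highest-priority channel that provides it. A small check, assuming conda config --show channels prints one "  - name" line per channel; channel_order is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the configured channels on one line, highest priority first.
# Reads the output of `conda config --show channels`, assumed to look like:
#   channels:
#     - conda-forge
#     - bioconda
#     - defaults
channel_order() {
  awk '/^ *- / { printf "%s%s", sep, $2; sep = " " } END { print "" }'
}
```

On a fresh install, conda config --show channels | channel_order should print conda-forge bioconda defaults.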

Source for information below: https://astrobiomike.github.io/unix/conda-intro

...

Base environment

The “base” conda environment is, like it sounds, kind of our home base inside conda. We wouldn’t want to install lots of complicated programs here, as the more things added, the more likely something is going to end up having a conflict. But the base environment is somewhere we might want to install smaller programs that we tend to use a lot (example below).

...

Making a new environment

The simplest way we can create a new conda environment is like so:

Code Block
conda create -n new-env

Here the base command is conda create, and we specify the name of our new environment with -n (here “new-env”). Conda will check a few things and tell us where it is going to put the environment; when we type y and press Enter, it will be created.

...

Entering an environment

To enter that environment, we need to execute:

Code Block
conda activate new-env

And now we can see our prompt has changed to have (new-env) at the front, telling us we are in that environment.

If we had forgotten the name, or wanted to see all of our environments, we can do so with:

Code Block
conda env list

This prints all of the available conda environments, with an asterisk next to the one we are currently in.
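The same listing can be used from scripts, for example to find the active environment or to create an environment only if it does not exist yet. A sketch, assuming the usual conda env list layout (comment lines starting with #, then a name, an optional *, and a path); both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: helpers that read the text printed by `conda env list`.
# Assumes "# ..." header lines followed by "name [*] path" rows,
# where "*" marks the environment we are currently in.

# Print the name of the active environment (the row marked with "*").
active_env_from_list() {
  awk '!/^#/ && NF >= 3 && $2 == "*" { print $1 }'
}

# Exit with status 0 if an environment named $1 is listed, 1 otherwise.
env_in_list() {
  awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}
```

For example, conda env list | active_env_from_list prints the current environment, and conda env list | env_in_list new-env || conda create -n new-env creates new-env only when it is missing.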

...

Exiting an environment

We can exit whatever conda environment we are currently in by running:

Code Block
conda deactivate

...

Making an environment with a specific python version

By default, conda create does not pin a Python version (an environment created with no packages listed, as above, contains no Python at all). We can specify the version we want in the command:

Code Block
conda create -n python-v2.7 python=2.7

Breakdown

  • conda create – this is our base command

  • -n python-v2.7 – we are naming the environment “python-v2.7”

  • python=2.7 – here we are specifying the python version to use within the environment
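To confirm the environment really uses the requested version, activate it and run python --version. Python 2 prints its version to stderr (recent Python 3 releases print it to stdout), so redirect stderr when capturing it. A sketch with a hypothetical helper that compares the major.minor version:

```shell
#!/usr/bin/env bash
# Sketch: succeed if a "Python X.Y.Z" line on stdin matches the expected X.Y.
python_minor_matches() {
  local expected="$1" line version
  read -r line || [[ -n "$line" ]] || return 1
  version="${line#Python }"
  # accept "2.7" exactly, or "2.7." followed by a patch level, but not "2.70"
  [[ "$version" == "$expected" || "$version" == "$expected".* ]]
}
```

For example, after conda activate python-v2.7: python --version 2>&1 | python_minor_matches 2.7 && echo "Python 2.7 OK"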

...

Removing an environment

Here is how we can remove an environment, by passing its name to the -n flag:

Code Block
conda deactivate # we can't be inside the environment we want to remove

conda env remove -n python-v2.7
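Because we cannot be inside the environment we want to remove, a small wrapper can refuse to try. The sketch assumes the CONDA_DEFAULT_ENV variable, which conda activate sets to the name of the active environment; safe_env_remove is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: remove a conda environment, but refuse if it is the active one.
safe_env_remove() {
  local name="${1:-}"
  if [[ -z "$name" ]]; then
    echo "usage: safe_env_remove NAME" >&2
    return 2
  fi
  # conda activate sets CONDA_DEFAULT_ENV to the active environment's name
  if [[ "${CONDA_DEFAULT_ENV:-}" == "$name" ]]; then
    echo "'$name' is active; run conda deactivate first" >&2
    return 1
  fi
  conda env remove -n "$name"
}
```

For example: safe_env_remove python-v2.7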

...