Overview of today’s session:
Learn to use conda to install tools and create conda environments
Hands-on exercises:
Quality Control (QC) of raw Nanopore data
Mapping of processed Nanopore data onto a reference genome
Run the epi2me-labs/wf-human-variation Nextflow pipeline
Public Nanopore Datasets
Experiment Accession | Sample | FASTQ | Experiment Title | Organism Name | Instrument | Submitter | Study Accession | Study Title | Sample Accession | Total Size, Mb | Total Spots | Total Bases | Library Name | Library Strategy | Library Source | Library Selection |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SRX19406880 |  |  |  |  |  |  | SRP421403 | Curated publicly available nanopore datasets | SRS16801714 | 16129.79 | 17584007 | 19592288842 | GM12878_cDNA_fastq_guppy_6.1.3_hac | RNA-Seq | TRANSCRIPTOMIC | cDNA |
SRX14748451 | S1 | SRR18645307 | Homo sapiens | Homo sapiens | MinION | Drexel University | SRP367676 | Multiplex structural variant detection by whole-genome mapping and nanopore sequencing. | SRS12509856 | 821.1 | 348226 | 972620520 |  | OTHER | GENOMIC | other |
ERX8211413 | S3 | ERR8578833 | MinION sequencing | Homo sapiens | MinION | the university of hong kong | ERP135493 | Target enrichment sequencing and variant calling on medical exome using ONT MinION | ERS10590135 | 8961.02 | 9636172 | 10382057986 |  | Targeted-Capture | GENOMIC | PCR |
SRX13322984 | S5 | SRR17138639 | Nanopore targeted sequencing (ReadUntil/ReadFish) of NA12878-HG001- basecalled sequences | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP349335 | Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing | SRS11230712 | 6629.97 | 5513156 | 7815960904 |  | WGS | GENOMIC | other |
SRX19406878 |  |  |  |  |  |  | SRP421403 | Curated publicly available nanopore datasets | SRS16801715 | 78526.8 | 11173458 | 97545895593 | NA12878_DNA_fastq_guppy_6.1.3_sup | WGS | GENOMIC | RANDOM |
SRX13323057 | S6 | SRR17138566 | Nanopore targeted sequencing (ReadUntil/ReadFish) of NA12878-HG001- basecalled sequences | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP349335 | Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing | SRS11230747 | 17107.98 | 12278391 | 20238395479 |  | WGS | GENOMIC | other |
SRX19406876 |  |  | GM12878 directRNA sequencing from nanopore WSG consortium - basecalled sequences (Guppy 6.1.3 high accuracy) | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP421403 | Curated publicly available nanopore datasets | SRS16801714 | 8763.24 | 9729457 | 9869880442 | GM12878_directRNA_fastq_guppy_6.1.3_hac | RNA-Seq | TRANSCRIPTOMIC |  |
What is conda?
Conda is a command-line tool for package and environment management that runs on Windows, macOS, and Linux.
Installing conda
Source: https://conda.io/projects/conda/en/latest/user-guide/install/index.html
To install conda, you must first pick the right installer for you. The following are the most popular installers currently available:
Miniconda
Miniconda is a minimal installer provided by Anaconda. Use this installer if you want to install most packages yourself.
Anaconda Distribution
Anaconda Distribution is a full-featured installer that comes with a suite of packages for data science, as well as Anaconda Navigator, a GUI application for working with conda environments.
NOTE: If you have already installed conda, you can skip the steps below |
---|
Download the Miniconda installer for your system: https://docs.anaconda.com/free/miniconda/
As we are working on the HPC (Linux), copy and paste the following into your terminal to install Miniconda:
Code Block |
---|
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh |
After installing, initialise the newly installed Miniconda by running the following:
Code Block |
---|
~/miniconda3/bin/conda init bash |
Now close your terminal and open it again to be able to use conda.
Once logged in, you will be able to access the conda “base” environment.
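As a quick sanity check after reopening the terminal, the sketch below tests whether the conda command is visible to the shell. The variable name conda_status is purely illustrative and not part of conda.

```shell
# Check whether the conda command is available in this shell session.
if command -v conda >/dev/null 2>&1; then
    conda --version            # prints the installed conda version
    conda_status="found"
else
    echo "conda not found: reopen the terminal or re-run '~/miniconda3/bin/conda init bash'"
    conda_status="missing"
fi
echo "conda status: $conda_status"
```

If the status is "missing", the most likely cause is that the terminal was not restarted after running conda init.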
Configure the conda channels (see https://bioconda.github.io/):
Code Block |
---|
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict |
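To confirm what these commands did, conda can print its current channel settings. Each --add places the channel at the top of the list, so the channel added last (conda-forge) should end up with the highest priority, followed by bioconda and then defaults. This is a minimal sketch; the variable channels_checked is only illustrative.

```shell
# Show the configured channels and the channel priority mode.
# Guarded so the snippet does nothing harmful if conda is not on PATH.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels           # channel list, highest priority first
    conda config --show channel_priority   # should report "strict" after the step above
    channels_checked="yes"
else
    echo "conda is not on PATH; nothing to show"
    channels_checked="no"
fi
```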
Source for information below: https://astrobiomike.github.io/unix/conda-intro
...
Base environment
The “base” conda environment is, like it sounds, kind of our home base inside conda. We wouldn’t want to install lots of complicated programs here, as the more things added, the more likely something is going to end up having a conflict. But the base environment is somewhere we might want to install smaller programs that we tend to use a lot (example below).
...
Making a new environment
The simplest way we can create a new conda environment is like so:
Code Block |
---|
conda create -n new-env |
Here the base command is conda create, and -n specifies the name of our new environment (here “new-env”). Conda will check a few things and tell us where it is going to put the environment; once we press y and Enter, it will be created.
...
Entering an environment
To enter that environment, we need to execute:
Code Block |
---|
conda activate new-env |
And now we can see our prompt has changed to have (new-env) at the front, telling us we are in that environment.
If we had forgotten the name, or wanted to see all of our environments, we can do so with:
Code Block |
---|
conda env list |
This prints all of the available conda environments, with an asterisk next to the one we are currently in.
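The same information can be used from a script. The sketch below reads CONDA_DEFAULT_ENV, a variable that conda sets when an environment is activated, and defines a small helper that searches the output of conda env list for a given name. The helper env_exists and the variable current_env are hypothetical names introduced for illustration; they are not conda commands.

```shell
# Name of the currently active environment, or "none" if nothing is activated.
current_env="${CONDA_DEFAULT_ENV:-none}"
echo "Active conda environment: $current_env"

# Hypothetical helper: succeeds if an environment with the given name is listed.
# The first column of `conda env list` holds the environment name.
env_exists() {
    conda env list 2>/dev/null | awk '{print $1}' | grep -qx "$1"
}

if env_exists new-env; then
    echo "new-env exists"
else
    echo "new-env not found (or conda is not on PATH)"
fi
```

A check like this is handy before creating or removing an environment, so that a script does not fail on a name that is already taken or already gone.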
...
Exiting an environment
We can exit whatever conda environment we are currently in by running:
Code Block |
---|
conda deactivate |
...
Making an environment with a specific python version
A bare conda create -n command installs no packages, so the new environment does not get its own python. To include Python, we name it in the command, and we can pin a specific version if we’d like:
Code Block |
---|
conda create -n python-v2.7 python=2.7 |
Breakdown
conda create: this is our base command
-n python-v2.7: we are naming the environment “python-v2.7”
python=2.7: here we are specifying the python version to use within the environment
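To confirm which Python an environment actually provides, one option is conda run, which executes a command inside a named environment without activating it. The helper name env_python_version below is hypothetical and only wraps that call; it assumes the environment has already been created.

```shell
# Hypothetical helper: print the Python version inside a named environment.
# `conda run -n NAME CMD` runs CMD inside the environment NAME.
env_python_version() {
    conda run -n "$1" python --version
}

# Example (assumes the python-v2.7 environment exists):
#   env_python_version python-v2.7
```

For the environment created above, the reported version should start with 2.7. If it does not, the python=2.7 specification was probably left out of the create command.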
...
Removing an environment
And here is how we can remove an environment, by providing its name to the -n flag:
Code Block |
---|
conda deactivate # we can't be inside the environment we want to remove
conda env remove -n python-v2.7 |
...