Variant (VCF) analysis using R in Jupyter
Requirements
A VCF file generated by a variant calling workflow.
The Jupyter Notebook was designed for the output of the nf-core Sarek pipeline (https://nf-co.re/sarek), but it can use any VCF file as input.
An HPC account. If you don’t have an HPC account, request one through HEAT (https://heat2.qut.edu.au/HEAT/Modules/SelfService/#home).
The Jupyter Notebooks in this guide are not tied to the QUT Jupyter Hub and can be run on any Jupyter Hub. There are freely accessible options outside QUT and, with some configuration, you can set up a Jupyter Hub on your local PC and run the Notebooks there. Setting up a local Jupyter Hub is beyond the scope of this guide, but guides on how to do this are available online.
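As a quick check on the first requirement, the sketch below tests whether a file starts with the `##fileformat=VCF` header line that the VCF specification requires. The helper name `is_vcf` and the file name `sample.vcf.gz` are hypothetical placeholders, and the check only looks at the first line, so it is a sanity check and not a full validation.

```shell
#!/usr/bin/env bash
# Sketch: check that a file looks like a VCF before loading it in the Notebook.
# is_vcf is a hypothetical helper name, not part of Sarek or any R package.
is_vcf() {
    file="$1"
    # Fail early if the file is missing or unreadable.
    [ -r "$file" ] || return 1
    # Read the first line, decompressing on the fly for .gz files.
    case "$file" in
        *.gz) first_line="$(gzip -dc "$file" 2>/dev/null | head -n 1)" ;;
        *)    first_line="$(head -n 1 "$file")" ;;
    esac
    # A VCF file must begin with a ##fileformat=VCF... line.
    case "$first_line" in
        "##fileformat=VCF"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: replace sample.vcf.gz (placeholder) with the path to your own file.
if is_vcf "sample.vcf.gz"; then
    echo "Looks like a VCF file"
else
    echo "Not a readable VCF file"
fi
```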
Purpose
This guide is primarily an educational tool for researchers who want to learn how to use R packages to analyse variant data. It is aimed at beginner and intermediate R users, but may also be useful to advanced R users who are new to variant analysis.
Environment
The workflow is packaged in a Jupyter Notebook (https://jupyter.org/), which provides a web-based interface for running code. A Notebook holds code alongside notes and text written in Markdown (https://www.ibm.com/docs/en/watson-studio-local/1.2.3?topic=notebooks-markdown-jupyter-cheatsheet) and HTML.
Each Jupyter Notebook runs a specific programming language or environment, called a ‘kernel’. In this case we run the R kernel, which allows us to analyse variant data with various R packages.
Setting up Jupyter
In this guide you will use the Jupyter Hub on the QUT compute cluster. Access the Jupyter Hub here (recall that you need an HPC account to access it):
https://jupyterhub.eres.qut.edu.au/hub/spawn
You will see a launcher tab on the right and the contents of your HPC home directory on the left.
The launcher tab will currently have two options for creating a new Notebook: a Python Notebook and a Bash Notebook.
Before we can create R Notebooks we need to install R, which takes a few simple commands in the terminal. Open a terminal window by clicking on the ‘terminal’ button:
Then run the following commands, one at a time (copy and paste each one into the prompt and hit enter).
We’ll be creating an ‘environment’ using mamba.
First, load the mamba module:
module load mamba
Then, create a new environment called ‘rnotebook’:
mamba create --name rnotebook
Then activate that environment:
conda activate rnotebook
Then install R and the R kernel for Jupyter in that environment:
conda install -c r r-irkernel
This will take a while. Once it finishes, go back to the launcher tab and you’ll see a new R Notebook option:
Now you’re ready to run R Notebooks.
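If the R Notebook option does not appear, one way to check from the terminal is to list the kernels Jupyter knows about. The sketch below assumes the `jupyter` command is available in your terminal and that the kernel is registered under the name `ir` (the default name used by IRkernel). The helper name `has_r_kernel` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether Jupyter can see an R kernel.
# Assumes `jupyter` is on your PATH and the kernel is registered as "ir"
# (the IRkernel default); adjust the name if your setup differs.
has_r_kernel() {
    # `jupyter kernelspec list` prints one indented "name  path" row per kernel.
    jupyter kernelspec list 2>/dev/null | grep -Eq '^[[:space:]]*ir[[:space:]]'
}

if has_r_kernel; then
    echo "R kernel found"
else
    echo "R kernel not found (or jupyter is not on your PATH)"
fi
```

If the kernel is not listed, re-running the install step with the ‘rnotebook’ environment active is a reasonable first thing to try.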
Gary dataset:
/work/liver/nextflow/sarek/individual/sarek_VCFs_annotation
All VCF files and annotation files are here.
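To see which VCF files are available in this directory, a listing along the following lines can be run from the terminal. This is a sketch that assumes the files end in `.vcf` or `.vcf.gz`, which this guide does not state. The helper name `list_vcfs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list VCF files under a directory, including subdirectories.
# Assumes files end in .vcf or .vcf.gz (the guide does not state the extensions).
list_vcfs() {
    find "$1" -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) | sort
}

# Dataset directory as given in this guide.
VCF_DIR="/work/liver/nextflow/sarek/individual/sarek_VCFs_annotation"
if [ -d "$VCF_DIR" ]; then
    list_vcfs "$VCF_DIR"
else
    echo "Directory not found: $VCF_DIR"
fi
```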