Variant (vcf) analysis using R in Jupyter

 

Requirements

 

  • A vcf file generated by a variant calling workflow.

The Jupyter Notebook was designed to use the output of the nf-core/sarek pipeline (https://nf-co.re/sarek), but it can take any vcf file as input.
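Before loading a vcf file into R, it can help to confirm from the terminal that the file looks like a VCF. The following is a minimal sketch, not part of the original Notebook: the file name example.vcf and its two records are hypothetical stand-ins created only so the commands can be run as written. It checks the first header line and counts the variant records (lines that do not start with '#').

```shell
# Work in a temporary directory so nothing in your home directory is touched.
tmpdir=$(mktemp -d)

# Hypothetical example file; replace with the path to your own vcf file.
vcf="$tmpdir/example.vcf"

# Write a tiny two-variant VCF (illustrative data only).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t50\tPASS\t.\nchr1\t200\t.\tC\tT\t60\tPASS\t.\n' > "$vcf"

# A VCF file starts with a '##fileformat=' line.
first_line=$(head -n 1 "$vcf")

# Header lines start with '#'; every other line is one variant record.
n_variants=$(grep -vc '^#' "$vcf")

echo "$first_line"
echo "variant records: $n_variants"
```

If your file is compressed (for example a .vcf.gz file), decompress it on the fly with gunzip -c and pipe the result into the same head and grep commands.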

The Jupyter Notebooks outlined in this guide can be used on any Jupyter Hub, not only the one at QUT. There are various freely accessible options outside QUT and, with some configuration, you can set up a Jupyter Hub on your local PC and run the Notebooks there directly. Setting up a local Jupyter Hub is beyond the scope of this guide, but guides on how to do this are available online.

Purpose

 

This Notebook is primarily an educational tool for researchers who want to learn how to use R packages to analyse variant data. It is aimed at beginner and intermediate R users, but may also be useful to advanced R users who want to learn variant analysis.

Environment

 

The workflow is packaged in a Jupyter Notebook (https://jupyter.org/), which provides a web-based interface for running code. Code is embedded in the Notebook alongside notes and text written in Markdown (https://www.ibm.com/docs/en/watson-studio-local/1.2.3?topic=notebooks-markdown-jupyter-cheatsheet) and HTML.

An individual Jupyter Notebook runs a specific programming language or environment, called a ‘kernel’. In this case we are running the R kernel, which allows us to analyse variant data using various R packages.

Setting up Jupyter

 

In this guide you will be using the Jupyter Hub on the QUT compute cluster. Access the Jupyter Hub here (recall that you need an HPC account to access this):

https://jupyterhub.eres.qut.edu.au/hub/spawn

You will see a launcher tab on the right and the contents of your HPC home directory on the left.

The launcher tab will currently have two options for creating a new Notebook: a Python Notebook and a Bash Notebook.

We first need to install R so that we can create R Notebooks. This takes a few simple commands in the terminal. Open a terminal window by clicking on the ‘terminal’ button:

Then run the following commands, one at a time (copy and paste each into the prompt and hit enter).

We’ll be creating an ‘environment’ using mamba

First, load the mamba module

module load mamba

Then, create a new environment called ‘rnotebook’

mamba create --name rnotebook

Then activate that environment

conda activate rnotebook

Then install R and the R kernel for Jupyter in that environment.

conda install -c r r-irkernel

This will take a while. Once it finishes, go back to the launcher tab and you’ll see a new R notebook option:

Now you’re ready to run R notebooks.
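If the R option does not appear in the launcher, one way to investigate is to list the kernels that Jupyter can see from the terminal. The sketch below is a troubleshooting aid, not part of the original setup steps. It assumes the jupyter command is available in your terminal session, and that the R kernel is registered under its usual default name, ir. Both are assumptions that may not hold on every Jupyter Hub.

```shell
# Check whether the jupyter command-line tool is available in this shell.
if command -v jupyter >/dev/null 2>&1; then
    # List the registered kernels; the R kernel is usually shown as "ir" (assumed default name).
    jupyter kernelspec list
    kernel_check="listed"
else
    # Without the jupyter command we cannot list kernels from this shell.
    echo "jupyter is not on PATH in this shell"
    kernel_check="jupyter-missing"
fi
```

If no R kernel is listed, check that the rnotebook environment was activated before the install step was run.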

 

 

 

 

Gary dataset:

/work/liver/nextflow/sarek/individual/sarek_VCFs_annotation

All VCF files and annotation files are here.
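To see which VCF files a directory like this contains, you can search it from the terminal. The following is a minimal sketch: it builds a temporary demo directory with hypothetical file names (sampleA.vcf.gz, sampleB.vcf, notes.txt) so that it can be run anywhere, and it assumes that VCF files end in .vcf or .vcf.gz. On the HPC, point demo_dir at the dataset directory above instead and skip the touch line.

```shell
# Temporary demo directory standing in for the real dataset directory.
demo_dir=$(mktemp -d)

# Hypothetical file names, created empty purely for illustration.
touch "$demo_dir/sampleA.vcf.gz" "$demo_dir/sampleB.vcf" "$demo_dir/notes.txt"

# Find plain and gzip-compressed VCF files (assumed extensions: .vcf and .vcf.gz).
vcf_files=$(find "$demo_dir" -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) | sort)

# Count the non-empty lines of the listing, one per file found.
n_files=$(printf '%s\n' "$vcf_files" | grep -c .)

echo "$vcf_files"
echo "VCF files found: $n_files"
```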