Hands-on smRNAseq training

Public small RNA-seq data

Public small RNA-seq data (mouse):

https://www.ebi.ac.uk/ena/browser/view/PRJNA861019 Integrative analysis of renal microRNA and mRNA to identify hub genes and pivotal pathways associated with Cyclosporine-induced acute kidney injury in mice

Work in the HPC

Before we start using the HPC, let’s start an interactive session:

qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=1:mem=4gb

Get a copy of the scripts to be used in this module

Use the terminal to log into the HPC and create a small_RNAseq/scripts folder to hold the scripts used to run the nf-core/smrnaseq pipeline. For example:

mkdir -p $HOME/workshop/small_RNAseq/scripts
cp /work/training/small_rnaseq/scripts/* $HOME/workshop/small_RNAseq/scripts/
ls -l $HOME/workshop/small_RNAseq/scripts/

  • Line 1: The -p option creates parent directories as required. Thus the command creates $HOME/workshop/, $HOME/workshop/small_RNAseq/ and the subfolder $HOME/workshop/small_RNAseq/scripts/ if they do not already exist.

  • Line 2: Copies all files (as indicated by the asterisk) from /work/training/small_rnaseq/scripts/ to the newly created folder $HOME/workshop/small_RNAseq/scripts/

  • Line 3: Lists the content of $HOME/workshop/small_RNAseq/scripts/ to confirm that the files were copied.

Copy public data to your $HOME

mkdir -p $HOME/workshop/small_RNAseq/data
cp /work/training/smallRNAseq/data/* $HOME/workshop/small_RNAseq/data/
# list the content of the $HOME/workshop/small_RNAseq/data/

  • Line 1: Creates the folder $HOME/workshop/small_RNAseq/data/ (and any missing parent folders).

  • Line 2: Copies all files (as indicated by the asterisk) from /work/training/smallRNAseq/data/ to the newly created folder $HOME/workshop/small_RNAseq/data/

  • Line 3: A quick challenge. See the previous section for hints.

Create a folder for running the nf-core small RNA-seq pipeline

Let’s create a “runs” folder to run the nf-core/smrnaseq pipeline:

  • Lines 1-4: create sub-folders for each exercise

  • Line 5: change the directory to the folder “run1_test”

  • Line 6: print the current working directory
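As a minimal sketch, the six commands described above could look like the following. Only “run1_test” is named in this guide; the location of the “runs” folder under $HOME/workshop/small_RNAseq/ and the other three folder names are placeholders, so replace them with the names used in your session.

```shell
mkdir -p "$HOME/workshop/small_RNAseq/runs/run1_test"
mkdir -p "$HOME/workshop/small_RNAseq/runs/run2_placeholder"   # hypothetical folder name
mkdir -p "$HOME/workshop/small_RNAseq/runs/run3_placeholder"   # hypothetical folder name
mkdir -p "$HOME/workshop/small_RNAseq/runs/run4_placeholder"   # hypothetical folder name
cd "$HOME/workshop/small_RNAseq/runs/run1_test"
pwd
```

The last command should print a path ending in /runs/run1_test.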

Exercise 1: Running a test with nf-core sample data

First, let’s check that the nf-core/smrnaseq pipeline runs correctly by running a test using sample data.

Copy the launch_nf-core_RNAseq_test.pbs to the working directory

View the content of the script as follows:

Download Reference microRNA sequences from miRBase

First, let’s download a copy of the miRBase reference sequences, including hairpin and mature microRNA sequences.

microRNA mature sequences:

Hairpin sequences:

Fetch the genomic coordinates for precursor and mature sequences:

Alternatively, submit the following PBS Pro script to the cluster. Before running the script, create a ‘reference’ folder (e.g., /myteam/data/reference/).
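The three downloads above (mature sequences, hairpin sequences and genomic coordinates) can be sketched as one script that runs either in the interactive session or as a PBS Pro job. Treat everything specific in it as an assumption: the download URLs, the species code (mmu, chosen here to match the mouse dataset), the resource requests and the REF_DIR location are not taken from this guide, so check them against the miRBase download page and your own folder layout. The hypothetical DRY_RUN switch only prints the commands by default.

```shell
#!/bin/bash
#PBS -N mirbase_download
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=01:00:00

# Assumptions (not from this guide): URLs, species code and REF_DIR.
REF_DIR="${REF_DIR:-$HOME/workshop/small_RNAseq/reference}"
SPECIES="${SPECIES:-mmu}"       # miRBase species code: mmu = mouse, hsa = human
BASE_URL="https://www.mirbase.org/download"
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands, 0 = download

# Create the reference folder before downloading into it
mkdir -p "$REF_DIR"

fetch() {
    # Download one file into REF_DIR, or show the command in dry-run mode
    local url="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "wget -P $REF_DIR $url"
    else
        wget -P "$REF_DIR" "$url"
    fi
}

fetch "$BASE_URL/mature.fa"         # mature microRNA sequences
fetch "$BASE_URL/hairpin.fa"        # hairpin (precursor) sequences
fetch "$BASE_URL/$SPECIES.gff3"     # genomic coordinates
```

Once the printed commands look right, rerun with DRY_RUN=0 on a node that has internet access.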

Run a test

Before running the pipeline with real data, run the following test:

To submit the above command to the HPC cluster, prepare the following script:
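One way to prepare such a script without opening an editor is to write it with a here-document. The following is a sketch only: the job name, resource requests, the “java” module name and the use of the singularity profile are assumptions to adapt to your cluster, while “test” is the profile that nf-core pipelines ship for running with bundled sample data.

```shell
# Write a minimal PBS Pro launch script for the built-in test profile.
# Assumptions: resources, module name and container profile are placeholders.
cat > launch_nf-core_smallRNAseq_test.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfsmrnaseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Run from the folder where qsub was called
cd "$PBS_O_WORKDIR"

# Hypothetical module name; check "module avail" on your system
module load java

# Limit the memory used by the Nextflow head process
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/smrnaseq \
    -profile test,singularity \
    --outdir results
EOF

# Show the generated script
cat launch_nf-core_smallRNAseq_test.pbs
```

Because the here-document delimiter is quoted, variables such as $PBS_O_WORKDIR are written literally and only expanded when the job runs.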

Submitting the job

Once you have created the samplesheet.csv file and have a copy of the launch_nf-core_smallRNAseq_test.pbs script, submit the job to the HPC as follows:

Monitoring the Run

Use the command

to check on the job that you are running. Note that Nextflow will launch additional jobs during the run.

You can also check the .nextflow.log file for details on what is going on.
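As a sketch of both checks, the two helper functions below (hypothetical names, not part of the training scripts) list your jobs and show the end of the Nextflow log. They assume that qstat is the job-status command on your PBS Pro cluster; your site may provide its own wrapper instead.

```shell
# Assumption: qstat is the PBS Pro job-status command on your cluster.
check_jobs() {
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "$USER"
    else
        echo "qstat not found: run this on the HPC login node"
    fi
}

# Print the last 20 lines of the Nextflow log in the current folder.
check_log() {
    if [ -f .nextflow.log ]; then
        tail -n 20 .nextflow.log
    else
        echo "No .nextflow.log in $PWD yet"
    fi
}

check_jobs
check_log
```

Run check_log from the folder where the job was submitted, since Nextflow writes .nextflow.log to its launch directory.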

Preparing a sample metadata file

Now let’s prepare a samplesheet.csv file that specifies the names of your samples and the location of the raw FASTQ files.

To generate the above file, let’s use the following shell script (called “create_nf-core_smallRNAseq_samplesheet.sh”):
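A minimal sketch of such a script is shown here. It assumes single-end files ending in .fastq.gz and a two-column “sample,fastq_1” header; both are assumptions, so compare the header with the samplesheet format documented for the pipeline version you run. The function name make_samplesheet and the default DIR value are illustrative.

```shell
#!/bin/bash
# Sketch of create_nf-core_smallRNAseq_samplesheet.sh
# Assumptions: single-end *.fastq.gz files and a "sample,fastq_1" header.

DIR="${DIR:-$HOME/workshop/small_RNAseq/data}"   # folder with the raw FASTQ files

make_samplesheet() {
    local dir="$1" out="$2" fq sample
    echo "sample,fastq_1" > "$out"            # header line
    for fq in "$dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue              # skip when no file matches
        sample="$(basename "$fq" .fastq.gz)"  # sample name = file name without extension
        echo "${sample},${fq}" >> "$out"
    done
}

make_samplesheet "$DIR" samplesheet.csv
cat samplesheet.csv
```

Each FASTQ file becomes one row, with the sample name taken from the file name.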

Assign to the “DIR” variable above the path where the raw FASTQ files are located. For example:

Copy and paste the path into the above script using vi or Vim (check the prerequisites above).

Run the Nextflow nf-core/smrnaseq pipeline

Create a launch_nfsmRNAseq.pbs file that has the following information:
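As a rough sketch, the file could be created with a here-document like the one below. The parameter names (--input, --outdir, --mirtrace_species, --mature, --hairpin, --mirna_gtf) follow the nf-core/smrnaseq documentation as far as I know, but they can change between releases, so check them against the version you run. The resource requests, the “java” module name, the mmu species code and the reference paths are placeholders.

```shell
# Write a minimal launch script for a run with your own samplesheet.
# Assumptions: resources, module name, species code and reference paths.
cat > launch_nfsmRNAseq.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=48:00:00

# Run from the folder where qsub was called
cd "$PBS_O_WORKDIR"

# Hypothetical module name; check "module avail" on your system
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

# Hypothetical folder holding the miRBase files
REF="$HOME/workshop/small_RNAseq/reference"

nextflow run nf-core/smrnaseq \
    -profile singularity \
    --input samplesheet.csv \
    --outdir results \
    --mirtrace_species mmu \
    --mature "$REF/mature.fa" \
    --hairpin "$REF/hairpin.fa" \
    --mirna_gtf "$REF/mmu.gff3"
EOF

# Check the script for shell syntax errors before submitting it
bash -n launch_nfsmRNAseq.pbs && echo "syntax OK"
```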

Submit the job to the HPC cluster:

Monitor the progress:

R differential expression script