Hands-on smRNAseq training
Public small RNA-seq data
Public mouse small RNA-seq data:
https://www.ebi.ac.uk/ena/browser/view/PRJNA861019 Integrative analysis of renal microRNA and mRNA to identify hub genes and pivotal pathways associated with Cyclosporine-induced acute kidney injury in mice
Work in the HPC
Before we start using the HPC, let’s start an interactive session:
qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=1:mem=4gb
Get a copy of the scripts to be used in this module
Use the terminal to log into the HPC, create a folder for the workshop scripts, and copy the scripts into it. For example:
mkdir -p $HOME/workshop/small_RNAseq/scripts
cp /work/training/small_rnaseq/scripts/* $HOME/workshop/small_RNAseq/scripts/
ls -l $HOME/workshop/small_RNAseq/scripts/
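Line 1: The -p option tells mkdir to create parent directories as required. Thus the line 1 command creates $HOME/workshop/, $HOME/workshop/small_RNAseq/ and the subfolder $HOME/workshop/small_RNAseq/scripts/ in one step.
Line 2: Copies all files (as indicated by the asterisk) from /work/training/small_rnaseq/scripts/ to the newly created folder $HOME/workshop/small_RNAseq/scripts/
Line 3: Lists the copied files to confirm they arrived.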
Copy public data to your $HOME
mkdir -p $HOME/workshop/small_RNAseq/data
cp /work/training/smallRNAseq/data/* $HOME/workshop/small_RNAseq/data/
# list the content of the $HOME/workshop/small_RNAseq/data/
Line 1: The first command creates the folder $HOME/workshop/small_RNAseq/data/
Line 2: Copies all files (as indicated by the asterisk) from /work/training/smallRNAseq/data/ to the newly created $HOME/workshop/small_RNAseq/data/ folder
Line 3: a quick challenge - see the previous section for hints
Create a folder for running the nf-core small RNA-seq pipeline
Let’s create a “runs” folder to run the nf-core/smrnaseq pipeline:
Lines 1-4: create sub-folders for each exercise
Line 5: change the directory to the folder “run1_test”
Line 6: print the current working directory
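The six lines described above might look like the following sketch. Only “run1_test” is named in this guide; the other three folder names and the RUNS variable are placeholders chosen for illustration, so rename them to match your exercises.

```shell
# Base folder for all pipeline runs (RUNS is an illustrative variable name)
RUNS="$HOME/workshop/small_RNAseq/runs"

# Lines 1-4: one sub-folder per exercise
# (only run1_test is named in this guide; the other names are placeholders)
mkdir -p "$RUNS/run1_test"
mkdir -p "$RUNS/run2_placeholder"
mkdir -p "$RUNS/run3_placeholder"
mkdir -p "$RUNS/run4_placeholder"

# Line 5: move into the folder for the first exercise
cd "$RUNS/run1_test"

# Line 6: confirm where we are
pwd
```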
Exercise 1: Running a test with nf-core sample data
First, let’s assess the execution of the nf-core/smrnaseq pipeline by running a test using sample data.
Copy the launch_nf-core_RNAseq_test.pbs script to the working directory
View the content of the script as follows:
Download Reference microRNA sequences from miRBase
First, let’s download a copy of the miRBase reference sequences, including hairpin and mature microRNA sequences.
microRNA mature sequences:
Hairpin sequences:
Fetch the genomic coordinates for precursors and mature sequences:
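The three downloads above could be scripted roughly as follows. This is a sketch under stated assumptions: the REF_DIR location, the species code (mmu, for the mouse dataset used here) and the download URLs are not taken from this guide, so check them against the miRBase download page before use. By default the script only prints the wget commands; set DRY_RUN=0 to actually fetch the files.

```shell
#!/bin/bash
# Sketch: fetch miRBase reference files into a 'reference' folder.
# REF_DIR, SPECIES and BASE_URL are assumptions; verify them before use.
REF_DIR="${REF_DIR:-$HOME/workshop/small_RNAseq/reference}"
SPECIES="${SPECIES:-mmu}"                      # miRBase three-letter code; mmu = mouse
BASE_URL="https://www.mirbase.org/download"    # assumed download location
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands

# mature sequences, hairpin sequences, genomic coordinates
FILES="mature.fa hairpin.fa ${SPECIES}.gff3"

mkdir -p "$REF_DIR"
for f in $FILES; do
    if [ "$DRY_RUN" = "1" ]; then
        echo "wget -P $REF_DIR $BASE_URL/$f"
    else
        wget -P "$REF_DIR" "$BASE_URL/$f"
    fi
done
```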
Alternatively, submit the following PBS Pro script to the cluster. Before running the script, create a ‘reference’ folder (e.g., /myteam/data/reference/ ).
Run a test
Before running the pipeline with real data, run the following test:
To submit the above command to the HPC cluster, prepare the following script:
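A launch script for the test could look like the sketch below, which writes the file without submitting it. The job name, resource requests, the “module load java” line and the choice of Singularity as the container engine are assumptions about your cluster, not settings taken from this guide; adjust them to your site.

```shell
# Write a PBS Pro launch script for the pipeline test (nothing is submitted here)
cat > launch_nf-core_smallRNAseq_test.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfsmrnaseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Run from the folder the job was submitted from
cd "$PBS_O_WORKDIR"

# Assumption: Java (needed by Nextflow) is provided as a module on your cluster
module load java

# 'test' uses the pipeline's bundled sample data; 'singularity' is an assumed container engine
nextflow run nf-core/smrnaseq -profile test,singularity --outdir results
EOF
```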
Submitting the job
Once you have created the samplesheet.csv file and have a copy of the launch_nf-core_smallRNAseq_test.pbs script, submit the job to the HPC as follows:
Monitoring the Run
Use the command
to check on the job that you are running. Note that Nextflow will launch additional jobs during the run.
You can also check the .nextflow.log file for details on what is going on.
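As a minimal sketch of both checks, the helper below lists your jobs and prints the end of the Nextflow log. The function name check_run is hypothetical, and the use of the standard PBS qstat client is an assumption; your site may provide a different job-listing command.

```shell
# check_run: hypothetical helper combining the two monitoring steps
check_run() {
    # List your queued and running jobs, if the PBS 'qstat' client is available
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "$(whoami)"
    fi
    # Print the most recent Nextflow log lines, if a log exists in this folder
    if [ -f .nextflow.log ]; then
        tail -n 20 .nextflow.log
    fi
}

# Run it from the folder where the pipeline was launched
check_run
```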
Preparing a sample metadata file
Now let’s prepare a samplesheet.csv file that specifies the name of your samples and the location of the raw FASTQ files
To generate the above file, let’s use the following shell script (called “create_nf-core_smallRNAseq_samplesheet.sh”)
Assign to the “DIR” variable above the path where the raw FASTQ files are located. For example:
Copy and paste the path into the above script using VI or VIM (check prerequisites above).
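The samplesheet script described above can be sketched as follows. It assumes single-end reads, files ending in .fastq.gz, and a two-column layout (sample,fastq_1); the function name make_samplesheet is made up for this example. Check the column names against the documentation of the pipeline version you are running.

```shell
#!/bin/bash
# Sketch of create_nf-core_smallRNAseq_samplesheet.sh
# DIR: folder that holds the raw FASTQ files (edit this path)
DIR="${DIR:-$HOME/workshop/small_RNAseq/data}"

# make_samplesheet <fastq_dir> <output_csv>  (hypothetical helper)
make_samplesheet() {
    fastq_dir="$1"
    out_csv="$2"
    # Header row: assumed single-end layout
    echo "sample,fastq_1" > "$out_csv"
    for fq in "$fastq_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue               # no FASTQ files found: skip the literal pattern
        sample=$(basename "$fq" .fastq.gz)     # sample name = file name without the extension
        echo "${sample},${fq}" >> "$out_csv"
    done
}

make_samplesheet "$DIR" samplesheet.csv
```

Each FASTQ file becomes one row, with the sample name derived from the file name.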
Run the nf-core/smrnaseq pipeline
Create a launch_nfsmRNAseq.pbs file that has the following information:
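One possible version of this file is sketched below. The resource requests, the module line, the Singularity profile, the GRCm38 genome key and the mmu species code are assumptions chosen to match the mouse dataset used here, not values from this guide. Options describing the library-preparation protocol differ between pipeline versions and are left out, so add the one your version expects.

```shell
# Write the PBS Pro launch script for the full run (nothing is submitted here)
cat > launch_nfsmRNAseq.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=48:00:00

# Run from the folder the job was submitted from
cd "$PBS_O_WORKDIR"

# Assumption: Java (needed by Nextflow) is provided as a module on your cluster
module load java

# samplesheet.csv is the sample metadata file prepared earlier
nextflow run nf-core/smrnaseq \
    -profile singularity \
    --input samplesheet.csv \
    --genome GRCm38 \
    --mirtrace_species mmu \
    --outdir results \
    -resume
EOF
```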
Submit the job to the HPC cluster:
Monitor the progress:
R differential expression script