Table of Contents |
---|
Public small RNA-seq data
...
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq/scripts
cp /work/training/2024/smallRNAseq/scripts/* $HOME/workshop/small_RNAseq/scripts/
ls -l $HOME/workshop/small_RNAseq/scripts/ |
Line 1: The -p option creates parent directories as required. This single command therefore creates $HOME/workshop/, $HOME/workshop/small_RNAseq/ and the subfolder $HOME/workshop/small_RNAseq/scripts/
Line 2: Copies all files (matched by the asterisk) from /work/training/2024/smallRNAseq/scripts/ to the newly created folder $HOME/workshop/small_RNAseq/scripts/
Line 3: Lists the files in the scripts folder
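The behaviour of `mkdir -p` and the asterisk can be tried safely in a throw-away folder before touching your home directory. The sketch below is only an illustration: the temporary folder and the file names `a.sh` and `b.sh` are hypothetical stand-ins, not workshop files.

```shell
#!/bin/sh
# Toy demonstration in a temporary folder; all paths below are hypothetical.
demo=$(mktemp -d)

# A stand-in "source" folder holding two script files
mkdir -p "$demo/source/scripts"
touch "$demo/source/scripts/a.sh" "$demo/source/scripts/b.sh"

# -p creates every missing parent folder in a single call
mkdir -p "$demo/home/workshop/small_RNAseq/scripts"

# The asterisk expands to every file in the source folder
cp "$demo"/source/scripts/* "$demo/home/workshop/small_RNAseq/scripts/"

# List what arrived in the destination folder
copied=$(ls "$demo/home/workshop/small_RNAseq/scripts/")
echo "$copied"

# Clean up the temporary folder
rm -rf "$demo"
```

Without `-p`, the `mkdir` call would fail because the intermediate folders do not exist yet.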
Copy multiple subdirectories and files using rsync
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq/data/
rsync -rv /work/training/2024/smallRNAseq/data/ $HOME/workshop/small_RNAseq/data/ |
Line 1: The first command creates the folder $HOME/workshop/small_RNAseq/data/
Line 2: rsync copies all subfolders and files from the specified source folder to the selected destination folder. The -r option copies directories and their contents recursively; -v prints verbose messages about the files being transferred
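One detail worth checking is the trailing slash on the source folder: with `source/`, rsync copies the contents of the folder rather than the folder itself. The sketch below tries this in a temporary folder; the folder and file names are hypothetical, and the rsync step is skipped if rsync is not installed on the machine where you try it.

```shell
#!/bin/sh
# Toy demonstration in a temporary folder; all names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src/human" "$demo/dst"
touch "$demo/src/human/sample1.fastq.gz"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash on the source: copy the contents of src/ into dst/
    rsync -r "$demo/src/" "$demo/dst/"
    # Show the copied files relative to the destination folder
    result=$(cd "$demo/dst" && find . -type f)
else
    result="rsync is not installed here"
fi
echo "$result"

# Clean up the temporary folder
rm -rf "$demo"
```

The subfolder `human/` ends up directly inside the destination, which mirrors how the workshop data folders land under `data/`.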
Create a folder for running the nf-core small RNA-seq pipeline
...
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq/run1_test
mkdir -p $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/ |
Lines 1-2: create sub-folders for each exercise
Line 3: change the directory to the folder “small_RNAseq”
Exercise 1: Running a test with nf-core sample data
First, let’s assess the execution of the nf-core/smrnaseq pipeline by running a test using sample data.
...
Code Block |
---|
cat launch_nf-core_smallRNAseq_human_test.pbs |
#!/bin/bash -l
#PBS -N sRNAseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#PBS -m abe
cd $PBS_O_WORKDIR #work in current directory (folder)
module load java #load java and set up memory settings to run nextflow
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow run nf-core/smrnaseq -r 2.3.1 \
    -profile test,singularity \
    --outdir results # run the test |
---|
where:
nextflow command: nextflow run
pipeline name: nf-core/smrnaseq
pipeline version: -r 2.3.1
container type and sample data: -profile test,singularity
output directory: --outdir results
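To see how these parts fit together, the command can be assembled from variables and printed without running it. This is only a sketch: the variable names are made up for illustration, and nothing is submitted or executed.

```shell
#!/bin/sh
# Build the test command from its parts and print it (nothing is executed).
# The variable names are hypothetical; the values come from the script above.
pipeline="nf-core/smrnaseq"     # pipeline name
version="2.3.1"                 # pipeline version, passed with -r
profile="test,singularity"      # sample data and container type
outdir="results"                # output directory

cmd="nextflow run $pipeline -r $version -profile $profile --outdir $outdir"
echo "$cmd"
```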
Submitting the job
Now we can submit the small RNAseq test job to the HPC scheduler:
...
Monitoring the Run
Code Block |
---|
qjobs |
Exercise 2: Running the small RNA pipeline using public human data
The pipeline requires preparing at least 2 files:
Metadata file (samplesheet.csv) that specifies the “sample name” and “location of FASTQ files” ('Read 1').
PBS Pro script (launch_nf-core_smallRNAseq_human.pbs) with instructions to run the pipeline
How to create the metadata file (samplesheet.csv):
Change to the data folder directory:
...
Code Block |
---|
cp /work/training/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/small_RNAseq/data/human |
Note: you could replace ‘$HOME/workshop/small_RNAseq/data/human’ with ‘.’ (a dot). The dot means ‘current directory’, so the file is copied to the directory where you are currently located.
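The dot shorthand can be tried in a temporary folder first. In this sketch the folder layout and the file name `example.sh` are hypothetical; only the use of `.` as the copy destination is the point.

```shell
#!/bin/sh
# Toy demonstration in a temporary folder; all names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/scripts" "$demo/data/human"
echo 'echo hello' > "$demo/scripts/example.sh"

# Move into the destination folder, then copy "to here" using a dot
cd "$demo/data/human" || exit 1
cp "$demo/scripts/example.sh" .

# List the current directory to confirm the file arrived
here=$(ls)
echo "$here"

# Leave the temporary folder before deleting it
cd / && rm -rf "$demo"
```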
View the content of the script:
Code Block |
---|
cat create_nf-core_smallRNAseq_samplesheet.sh |
#!/bin/bash -l
#User defined variables.
##########################################################
#double quotes are needed so that $HOME is expanded
DIR="$HOME/workshop/small_RNAseq/data/human"
INDEX='samplesheet.csv'
##########################################################
#load python module
module load python/3.10.8-gcccore-12.2.0
#fetch the script to create the sample metadata table
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py
#generate initial sample metadata file
./fastq_dir_to_samplesheet.py $DIR index.csv \
    --strandedness auto \
    --read1_extension .fastq.gz
#format index file
cat index.csv | awk -F "," '{print $1 "," $2}' > ${INDEX}
#Remove intermediate files:
rm index.csv fastq_dir_to_samplesheet.py |
---|
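The awk step in the script keeps only the first two comma-separated columns of the intermediate table. The sketch below applies the same awk command to a small made-up table; the column names and sample rows are assumptions for illustration and may differ from what fastq_dir_to_samplesheet.py actually writes.

```shell
#!/bin/sh
# Hypothetical four-column table standing in for the intermediate index.csv
index=$(mktemp)
cat > "$index" <<'EOF'
sample,fastq_1,fastq_2,strandedness
SRR000001,/path/to/SRR000001.fastq.gz,,auto
SRR000002,/path/to/SRR000002.fastq.gz,,auto
EOF

# Keep only the first two comma-separated columns (sample name and Read 1 path)
samplesheet=$(awk -F "," '{print $1 "," $2}' "$index")
echo "$samplesheet"

# Remove the temporary file
rm -f "$index"
```

Each output line then holds just the sample name and the location of its FASTQ file, which matches the two fields the metadata file is described as needing.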
...
Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_human.pbs)
Copy and paste the code below to the terminal:
Code Block |
---|
cp $HOME/workshop/small_RNAseq/data/human/samplesheet.csv $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_human.pbs $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/run2_smallRNAseq_human |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: Copy the launch_nf-core_smallRNAseq_human.pbs submission script to the working directory
Line 3: Move to the working directory
View the content of the launch_nf-core_smallRNAseq_human.pbs script:
Code Block |
---|
cat launch_nf-core_smallRNAseq_human.pbs |
#!/bin/bash -l
#PBS -N sRNAseq_human
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#PBS -m abe
cd $PBS_O_WORKDIR #run the tasks in the current working directory
module load java #load java and assign up to 4GB RAM memory for nextflow to use
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow run nf-core/smrnaseq -r 2.3.1 \
    -profile singularity \
    --outdir results \
    --input samplesheet.csv \
    --genome GRCh38-local \
    --mirtrace_species hsa \
    --three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \
    --fastp_min_length 18 \
    --fastp_max_length 30 \
    --hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \
    --mature /work/training/smallRNAseq/data/mirbase/mature.fa \
    --mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \
    -resume #run the small RNAseq pipeline |
---|
Submit the job to the HPC cluster:
...
Code Block |
---|
/work/training/2024/smallRNAseq/runs/run2_smallRNAseq_human |
The results of the miRNA profiling can be found in the folder called “mirna_quant/edger_qc”:
Code Block |
---|
├── results/
│ ├── bowtie_index
│ ├── edger
│ ├── fastp
│ ├── fastqc
│ ├── genome
│ ├── index
│ ├── mirdeep
│ ├── mirna_quant
│ ├── mirdeep2
│ ├── mirtop
│ ├── mirtrace
│ ├── multiqc
│ ├── pipeline_info
│ ├── samtools
│ └── unmapped |
Inside the “mirna_quant/edger_qc” folder, find the “mature_counts.csv” file:
...
Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either using the R script or using a script called “transpose_csv.py”.
Let’s initially create a “DESeq2” folder and copy the files needed for the statistical analysis:
...
Code Block |
---|
python transpose_csv.py --input mature_counts.csv --out mature_counts.txt |
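To illustrate what transposing means here, the sketch below swaps the rows and columns of a tiny made-up table using awk. It is not the workshop's transpose_csv.py, the values are invented, and the real counts file may use a different layout or delimiter.

```shell
#!/bin/sh
# Toy counts table with made-up values
counts=$(mktemp)
cat > "$counts" <<'EOF'
miRNA,sampleA,sampleB
miR-1,10,20
miR-2,30,40
EOF

# Store every cell, then print each original column as one output row
transposed=$(awk -F "," '
    {
        for (i = 1; i <= NF; i++) cell[NR, i] = $i
        if (NF > ncol) ncol = NF
    }
    END {
        for (i = 1; i <= ncol; i++) {
            line = cell[1, i]
            for (j = 2; j <= NR; j++) line = line "," cell[j, i]
            print line
        }
    }' "$counts")
echo "$transposed"

# Remove the temporary file
rm -f "$counts"
```

After the swap, each original column has become a row, so the first output row holds the original first column.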
Differential expression analysis using RStudio
Differential expression analysis for smRNA-Seq is similar to regular RNA-Seq. Since you have already done the step-wise analysis in session 3, in this session we will streamline the analysis by running a single R script.
...
a. Open Windows Explorer.
b. Go to: H:\workshop\small_RNAseq
c. Create a new folder here called ‘DESeq2’ (NOTE: R is case-sensitive, so it must be named exactly like this)
...
c. Hit the save button and save this file in the working directory you created above (H:\workshop\small_RNAseq\DESeq2). Name the R script ‘DESeq2.R’.
...
Using RStudio, create a Text File and paste in the contents of this script.
Save it as launch_R.pbs in H:\workshop\small_RNAseq\DESeq2 (the same folder as DESeq2.R). Remember, H: points to your HPC home folder.
...