Version 2.1.0 hands-on exercises

Download reference microRNA sequences from miRBase

Fetch a copy of the microRNA hairpin (precursor) sequences:

wget https://www.mirbase.org/download/CURRENT/hairpin.fa
gzip -c hairpin.fa > hairpin.fa.gz

Then fetch the mature sequences:

wget https://www.mirbase.org/download/CURRENT/mature.fa
gzip -c mature.fa > mature.fa.gz
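
A quick sanity check after downloading is to count the records in each FASTA file, since an interrupted download or an HTML error page would contain few or no header lines. The sketch below is only an illustration; the function name count_fasta_records is hypothetical and not part of miRBase or the pipeline. It assumes uncompressed FASTA input in which every record starts with a ">" header line.

```shell
# Print "<file><TAB><number of records>" for each FASTA file given.
# A record is counted as one line beginning with ">".
count_fasta_records() {
    local f
    for f in "$@"; do
        printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
    done
}
```

For example, count_fasta_records hairpin.fa mature.fa prints one line per file. A count of 0 suggests the download should be repeated.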

Run a test

Before running the pipeline with real data, run the following test:

nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0

To submit the above command to the HPC cluster, prepare the following script (named launch.pbs in the steps below):

#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work in the directory the job was submitted from
cd "$PBS_O_WORKDIR"
#load java and export the JVM memory settings so that nextflow picks them up
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0

Submitting the job

Once you have created the folder for the run, the samplesheet.csv file, nextflow.config, and launch.pbs, you are ready to submit.
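
Before submitting, it can help to confirm that the three files named above are actually present in the run folder. The following is a minimal sketch, not part of the pipeline; the function name check_run_files is hypothetical, and the file names are taken from the sentence above.

```shell
# Check that the files needed for the run exist in a directory
# (default: the current directory). Missing files are reported on stderr
# and the function returns a non-zero status.
check_run_files() {
    local dir="${1:-.}" missing=0 f
    for f in samplesheet.csv nextflow.config launch.pbs; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_run_files from inside the run folder prints nothing when all three files are present.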

Submit the run with this command:

qsub launch.pbs

Monitoring the run

To check on the jobs you are running, use the command

qstat -u $USER

Alternatively, use the command

qjobs

Nextflow will launch additional jobs during the run.

You can also check the .nextflow.log file in the launch directory for details on the progress of the run and any errors.
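
One way to skim the log is to count the lines that mention ERROR and then look at the most recent entries. The sketch below is only an illustration; the function name summarise_nextflow_log is hypothetical, and it assumes that error entries in the log contain the word ERROR, which may not hold for every kind of failure.

```shell
# Print the number of lines containing "ERROR" in a Nextflow log,
# followed by the last N lines of the log (default: 20).
# Arguments: [log file, default .nextflow.log] [number of trailing lines]
summarise_nextflow_log() {
    local log="${1:-.nextflow.log}" n="${2:-20}"
    if [ ! -f "$log" ]; then
        echo "log file not found: $log" >&2
        return 1
    fi
    echo "ERROR lines: $(grep -c 'ERROR' "$log")"
    tail -n "$n" "$log"
}
```

Called without arguments from the launch directory, it reads .nextflow.log and prints the last 20 lines.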

Preparing a sample metadata file

Now let’s prepare a samplesheet.csv file that specifies the names of your samples and the locations of their raw FASTQ files:

sample,fastq_1
SRR24302008,/path/to/raw/FASTQ/files/SRR24302008.fastq.gz
SRR24302009,/path/to/raw/FASTQ/files/SRR24302009.fastq.gz
SRR24302010,/path/to/raw/FASTQ/files/SRR24302010.fastq.gz
SRR24302011,/path/to/raw/FASTQ/files/SRR24302011.fastq.gz
SRR24302012,/path/to/raw/FASTQ/files/SRR24302012.fastq.gz
SRR24302013,/path/to/raw/FASTQ/files/SRR24302013.fastq.gz
SRR24302014,/path/to/raw/FASTQ/files/SRR24302014.fastq.gz
SRR24302015,/path/to/raw/FASTQ/files/SRR24302015.fastq.gz
SRR24302016,/path/to/raw/FASTQ/files/SRR24302016.fastq.gz
SRR24302017,/path/to/raw/FASTQ/files/SRR24302017.fastq.gz
SRR24302018,/path/to/raw/FASTQ/files/SRR24302018.fastq.gz
SRR24302019,/path/to/raw/FASTQ/files/SRR24302019.fastq.gz
SRR24302020,/path/to/raw/FASTQ/files/SRR24302020.fastq.gz
SRR24302021,/path/to/raw/FASTQ/files/SRR24302021.fastq.gz
SRR24302022,/path/to/raw/FASTQ/files/SRR24302022.fastq.gz
SRR24302023,/path/to/raw/FASTQ/files/SRR24302023.fastq.gz
SRR24302024,/path/to/raw/FASTQ/files/SRR24302024.fastq.gz
SRR24302025,/path/to/raw/FASTQ/files/SRR24302025.fastq.gz
SRR24302026,/path/to/raw/FASTQ/files/SRR24302026.fastq.gz
SRR24302027,/path/to/raw/FASTQ/files/SRR24302027.fastq.gz
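
Because a mistyped path in this file only surfaces once the pipeline starts, a small check of the samplesheet can save a failed submission. The sketch below is an illustration under stated assumptions: the function name check_samplesheet is hypothetical, and it assumes the two-column layout shown above (a "sample,fastq_1" header followed by one sample per line, with no commas inside names or paths).

```shell
# Validate a two-column samplesheet:
#   - the first line must be exactly "sample,fastq_1"
#   - every FASTQ path listed must exist as a regular file
# Problems are reported on stderr; returns non-zero if any are found.
check_samplesheet() {
    local sheet="$1" status=0 header sample fastq
    {
        IFS= read -r header
        if [ "$header" != "sample,fastq_1" ]; then
            echo "unexpected header: $header" >&2
            return 1
        fi
        # the "|| [ -n ... ]" part also handles a last line without a newline
        while IFS=, read -r sample fastq || [ -n "$sample" ]; do
            [ -n "$sample" ] || continue   # skip blank lines
            if [ ! -f "$fastq" ]; then
                echo "missing FASTQ for $sample: $fastq" >&2
                status=1
            fi
        done
    } < "$sheet"
    return "$status"
}
```

For example, check_samplesheet samplesheet.csv && echo "samplesheet OK" prints the confirmation only when the header matches and every file is found.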

To generate the above file, let’s use the following PBS Pro script (saved as, for example, “launch_create_smRNAseq_samplesheet.pbs”):

#!/bin/bash -l
#PBS -N samplesheet
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=12:00:00

#work in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

#User defined variables
##########################################################
DIR='/path/to/raw/FASTQ/files'
INDEX='samplesheet.csv'
##########################################################

#load python module
module load python/3.10.8-gcccore-12.2.0

#fetch the script to create the sample metadata table
wget https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py

#generate initial sample metadata file
./fastq_dir_to_samplesheet.py "$DIR" index.csv \
        --strandedness auto \
        --read1_extension .fastq.gz

#keep only the first two columns (sample and fastq_1) of the index file
awk -F "," '{print $1 "," $2}' index.csv > "${INDEX}"

#Remove intermediate files:
rm index.csv fastq_dir_to_samplesheet.py
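
For single-end data named like the files above, the same two-column table can also be produced without the helper script. The following is a minimal alternative sketch, not the nf-core method; the function name make_samplesheet is hypothetical, and it assumes that every file ending in .fastq.gz in the directory is one sample whose name is the file name without that suffix.

```shell
# Write a two-column samplesheet (sample,fastq_1) listing every
# *.fastq.gz file in a directory, in the shell's glob (sorted) order.
# Arguments: <directory with FASTQ files> <output CSV>
make_samplesheet() {
    local dir="$1" out="$2" f
    echo "sample,fastq_1" > "$out"
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue   # nothing matched: the pattern stays unexpanded
        echo "$(basename "$f" .fastq.gz),$f" >> "$out"
    done
}
```

For example, make_samplesheet /path/to/raw/FASTQ/files samplesheet.csv writes the header followed by one line per FASTQ file. An output containing only the header means no matching files were found.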