Fetch a copy of the microRNA hairpin sequences:
wget https://www.mirbase.org/download/CURRENT/hairpin.fa
gzip -c hairpin.fa > hairpin.fa.gz
Mature sequences:
wget https://www.mirbase.org/download/CURRENT/mature.fa
gzip -c mature.fa > mature.fa.gz
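A quick sanity check after downloading is to count the FASTA records in each file. The sketch below assumes a standard FASTA layout in which every record starts with a `>` header line; `count_fasta_records` is a hypothetical helper name, not part of miRBase or the pipeline.

```shell
# count_fasta_records FILE
# Print the number of FASTA records (lines starting with ">") in FILE.
# Files ending in .gz are decompressed on the fly; anything else is read as plain text.
count_fasta_records() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi
    case "$1" in
        *.gz) gzip -dc "$1" | grep -c '^>' ;;
        *)    grep -c '^>' "$1" ;;
    esac
}
```

For example, `count_fasta_records hairpin.fa` and `count_fasta_records hairpin.fa.gz` should print the same number if the compression step worked.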
Before running the pipeline with real data, run the following test:
nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0
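The test run needs both `nextflow` and `singularity` to be available on the PATH. A minimal sketch of a pre-flight check follows; `require_cmds` is a hypothetical helper name, and on some clusters these tools may only become available after a `module load`.

```shell
# require_cmds CMD...
# Report every command that cannot be found on PATH (to stderr).
# Returns 0 if all commands were found, 1 otherwise.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use would be `require_cmds nextflow singularity || exit 1` before launching the test.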
To submit the above command to the HPC cluster, prepare the following script:
#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#load java and set up memory settings to run nextflow
module load java
#export the variable so that the nextflow process can see it
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0
Once you have created the folder for the run, the samplesheet.csv file, nextflow.config, and launch.pbs, you are ready to submit.
Submit the run with this command:
qsub launch.pbs
To check on the jobs you are running, use the command:
qstat -u $USER
Alternatively, use the command:
qjobs
Nextflow will launch additional jobs during the run.
You can also check the .nextflow.log file for details on what is going on.
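Because .nextflow.log can grow long, it may help to pull out only the warnings and errors. This is a minimal sketch that assumes the log level appears on each line as a space-delimited `WARN` or `ERROR` token, which is how Nextflow log lines are usually laid out; `nf_log_problems` is a hypothetical helper name.

```shell
# nf_log_problems [LOGFILE]
# Print WARN and ERROR lines from a Nextflow log, each prefixed with its line number.
# LOGFILE defaults to .nextflow.log in the current directory.
# Returns non-zero when the file is unreadable or when no such lines are found.
nf_log_problems() {
    log="${1:-.nextflow.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read: $log" >&2
        return 1
    fi
    grep -n -E ' (WARN|ERROR) ' "$log"
}
```

Run it from the folder where the pipeline was launched, for example `nf_log_problems | tail`.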
Now let’s prepare a samplesheet.csv file that specifies the names of your samples and the locations of the raw FASTQ files:
sample,fastq_1
SRR24302008,/path/to/raw/FASTQ/files/SRR24302008.fastq.gz
SRR24302009,/path/to/raw/FASTQ/files/SRR24302009.fastq.gz
SRR24302010,/path/to/raw/FASTQ/files/SRR24302010.fastq.gz
SRR24302011,/path/to/raw/FASTQ/files/SRR24302011.fastq.gz
SRR24302012,/path/to/raw/FASTQ/files/SRR24302012.fastq.gz
SRR24302013,/path/to/raw/FASTQ/files/SRR24302013.fastq.gz
SRR24302014,/path/to/raw/FASTQ/files/SRR24302014.fastq.gz
SRR24302015,/path/to/raw/FASTQ/files/SRR24302015.fastq.gz
SRR24302016,/path/to/raw/FASTQ/files/SRR24302016.fastq.gz
SRR24302017,/path/to/raw/FASTQ/files/SRR24302017.fastq.gz
SRR24302018,/path/to/raw/FASTQ/files/SRR24302018.fastq.gz
SRR24302019,/path/to/raw/FASTQ/files/SRR24302019.fastq.gz
SRR24302020,/path/to/raw/FASTQ/files/SRR24302020.fastq.gz
SRR24302021,/path/to/raw/FASTQ/files/SRR24302021.fastq.gz
SRR24302022,/path/to/raw/FASTQ/files/SRR24302022.fastq.gz
SRR24302023,/path/to/raw/FASTQ/files/SRR24302023.fastq.gz
SRR24302024,/path/to/raw/FASTQ/files/SRR24302024.fastq.gz
SRR24302025,/path/to/raw/FASTQ/files/SRR24302025.fastq.gz
SRR24302026,/path/to/raw/FASTQ/files/SRR24302026.fastq.gz
SRR24302027,/path/to/raw/FASTQ/files/SRR24302027.fastq.gz
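Before submitting a run, the layout of the samplesheet can be checked with a few lines of awk. The sketch below only covers the two-column, single-end layout shown above; `check_samplesheet` is a hypothetical helper, and the rule that paths must end in `.fastq.gz` or `.fq.gz` is an assumption rather than something this page states.

```shell
# check_samplesheet FILE
# Returns 0 if FILE starts with the header "sample,fastq_1" and every
# non-blank data row has exactly two non-empty comma-separated fields,
# the second ending in .fastq.gz or .fq.gz. Problems are printed to stdout.
check_samplesheet() {
    awk -F ',' '
        BEGIN { bad = 0 }
        NR == 1 {
            if ($0 != "sample,fastq_1") {
                print "line 1: unexpected header: " $0
                bad = 1
            }
            next
        }
        /^[[:space:]]*$/ { next }    # ignore blank lines
        NF != 2 || $1 == "" || $2 == "" {
            print "line " NR ": expected 2 non-empty fields"
            bad = 1
            next
        }
        $2 !~ /\.(fastq|fq)\.gz$/ {
            print "line " NR ": not a gzipped FASTQ path: " $2
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, `check_samplesheet samplesheet.csv && qsub launch.pbs` would only submit when the file passes. The check does not test whether the FASTQ files actually exist.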
To generate the above file, let’s use the following PBS Pro script (named, for example, “launch_create_smRNAseq_samplesheet.pbs”):
#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=12:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#User defined variables
##########################################################
DIR='/path/to/raw/FASTQ/files'
INDEX='samplesheet.csv'
##########################################################

#load python module
module load python/3.10.8-gcccore-12.2.0

#fetch the script to create the sample metadata table
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py

#generate initial sample metadata file
./fastq_dir_to_samplesheet.py $DIR index.csv \
    --strandedness auto \
    --read1_extension .fastq.gz

#format index file
cat index.csv | awk -F "," '{print $1 "," $2}' > ${INDEX}

#Remove intermediate files:
rm index.csv fastq_dir_to_samplesheet.py
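For single-end data stored as one `<sample>.fastq.gz` file per sample, the same two-column table can also be built directly in the shell. This is an alternative sketch, not a description of what `fastq_dir_to_samplesheet.py` does internally; `make_samplesheet` is a hypothetical helper, and the sample name is assumed to be the file name without the `.fastq.gz` suffix.

```shell
# make_samplesheet DIR
# Print a "sample,fastq_1" table with one row per *.fastq.gz file in DIR.
# Paths are written exactly as DIR is given, so pass an absolute path.
make_samplesheet() {
    dir="$1"
    echo "sample,fastq_1"
    for fq in "$dir"/*.fastq.gz; do
        # skip the unexpanded pattern when the directory has no matches
        [ -e "$fq" ] || continue
        echo "$(basename "$fq" .fastq.gz),$fq"
    done
}
```

It could be used as `make_samplesheet /path/to/raw/FASTQ/files > samplesheet.csv`.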