Prepared by the eResearch Office, QUT.
...
```bash
nextflow run nf-core/rnaseq \
    --input index.csv \
    --genome GRCm38 \
    --aligner star_salmon \
    -profile singularity \
    -r 3.3
```
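Note: if the run was interrupted, did not complete a particular step, or you want to modify a parameter for a particular step, you do not need to re-run every process from scratch. Nextflow can resume the workflow with the "-resume" option, which re-uses the cached results of every step whose inputs and parameters have not changed: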
```bash
nextflow run nf-core/rnaseq \
    --input index.csv \
    --genome GRCm38 \
    --aligner star_salmon \
    -profile singularity \
    -r 3.3 \
    -resume
```
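Because "-resume" only re-uses cached steps when the inputs and parameters are unchanged, it helps to keep the options in one place. The sketch below wraps the command in a hypothetical shell function, run_rnaseq (not part of Nextflow or nf-core), so that the first run and any resumed run are built from exactly the same arguments; setting DRY_RUN=1 prints the command instead of running it. Launch the resumed run from the same folder as the original one, since Nextflow keeps its cache in the .nextflow and work directories there (`nextflow log` lists the previous runs in that folder).

```bash
#!/bin/bash
# Hypothetical helper (not part of Nextflow/nf-core): build the pipeline
# command once so the initial and resumed runs use identical parameters.
# Set DRY_RUN=1 to print the command instead of executing it.
run_rnaseq() {
    local -a cmd
    cmd=(nextflow run nf-core/rnaseq
        --input index.csv
        --genome GRCm38
        --aligner star_salmon
        -profile singularity
        -r 3.3)
    # Passing "resume" as the first argument appends Nextflow's -resume flag
    if [ "${1:-}" = "resume" ]; then
        cmd+=(-resume)
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${cmd[*]}"      # show the command only
    else
        "${cmd[@]}"           # run the pipeline
    fi
}
```

For example, call `run_rnaseq` for the first attempt and `run_rnaseq resume` after an interruption.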
Preparing a ‘samplesheet.csv’ file
A samplesheet (in this guide called index.csv) tells the workflow where to find the read 1 (R1) and read 2 (R2) FASTQ files of each sample, together with other information about the samples to be processed, such as the ‘group’ (i.e., control or infected), the replicate number and the orientation (strandedness) of the reads (i.e., forward, reverse or unstranded). See below for examples of index.csv files.
Example index.csv (previous versions):
...
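For the -r 3.3 release used in this guide, our understanding of the nf-core/rnaseq 3.x usage documentation is that the samplesheet header is sample,fastq_1,fastq_2,strandedness, i.e. ‘group’ and replicate are no longer separate columns and are usually encoded in the sample name (e.g. control_rep1). The sketch below builds such a file from paired-end FASTQ files. The helper name make_samplesheet and the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming pattern are assumptions, so please check the usage documentation for the release you run.

```bash
#!/bin/bash
# Hypothetical helper: make_samplesheet <fastq_dir> <strandedness> <output_csv>
# Assumes paired-end files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
# and the nf-core/rnaseq 3.x header (sample,fastq_1,fastq_2,strandedness).
make_samplesheet() {
    local fastq_dir="$1" strandedness="$2" out="$3"
    local r1 r2 sample
    echo "sample,fastq_1,fastq_2,strandedness" > "$out"
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                   # no match: glob left unexpanded
        sample="$(basename "$r1" _R1.fastq.gz)"    # strip suffix to get the sample name
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"        # matching read 2 file
        if [ ! -e "$r2" ]; then
            echo "Missing read 2 for sample $sample" >&2
            return 1
        fi
        # Absolute paths let the pipeline find the files from any work directory
        echo "$sample,$(realpath "$r1"),$(realpath "$r2"),$strandedness" >> "$out"
    done
}
```

For example, `make_samplesheet fastq reverse index.csv` writes one row per sample found in a (hypothetical) fastq folder.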
To run this on the HPC, a PBS submission script needs to be created in the folder you have created for this run. For example, create a file called launch.pbs using a text editor of your choice (e.g., vim, vi or nano) and then copy and paste the code below:
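```bash
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Work in the folder the job was submitted from
cd "$PBS_O_WORKDIR"

# Load Java and set the memory settings for Nextflow
# (export is needed so that the nextflow command sees NXF_OPTS)
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

# Run the rnaseq pipeline
nextflow run nf-core/rnaseq -profile singularity -r 3.3 \
    --input index.csv \
    --genome GRCm38 \
    --aligner star_salmon
```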
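Once launch.pbs is saved, it is submitted with qsub and can be monitored with qstat. The sketch below assumes a PBS Pro scheduler; pbs_stdout_name is a hypothetical helper reflecting the default PBS naming of the job's standard output file (<job name>.o<job number>, here based on the nfrna2 name set with #PBS -N), which is where Nextflow's progress messages end up. Nextflow also writes a detailed .nextflow.log in the run folder.

```bash
#!/bin/bash
# Sketch assuming PBS Pro (qsub/qstat available on the login node).
# Hypothetical helper: default PBS stdout file is <job name>.o<job number>.
pbs_stdout_name() {
    local job_name="$1" job_id="$2"
    echo "${job_name}.o${job_id%%.*}"   # drop the server suffix: 12345.pbs -> 12345
}

if command -v qsub >/dev/null 2>&1 && [ -f launch.pbs ]; then
    job_id="$(qsub launch.pbs)"         # qsub prints the new job ID
    echo "Submitted $job_id"
    qstat -u "$USER"                     # list your queued and running jobs
    echo "Job output will be written to $(pbs_stdout_name nfrna2 "$job_id")"
else
    echo "qsub or launch.pbs not found; run this from the run folder on the HPC" >&2
fi
```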
Additional options
We recommend running the nf-core/rnaseq pipeline once and then inspecting the FastQC reports in the results folder to check whether sequence biases are present at the 5' and 3' ends of the reads. If they are, the PBS script below tells the pipeline to remove a fixed number of bases from the 5' end (--clip_r1 for read 1, --clip_r2 for read 2) or from the 3' end (--three_prime_clip_r1 for read 1, --three_prime_clip_r2 for read 2) of each read. We can also ask the pipeline to remove ribosomal RNA reads (--remove_ribo_rna, performed with SortMeRNA), as these sequences are not informative for gene expression analysis.
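To make the clipping options concrete, the sketch below applies the values used in the script below (--clip_r1 10 and --three_prime_clip_r1 2) to a single 32-base read. It is a simplified illustration in plain bash, not Trim Galore (which the pipeline uses for this step): Trim Galore applies the 3' clip after adapter and quality trimming, whereas here the read is assumed to be already clean. clip_read is a hypothetical helper.

```bash
#!/bin/bash
# Simplified illustration (not Trim Galore): remove clip5 bases from the
# 5' end and clip3 bases from the 3' end of one read.
clip_read() {
    local seq="$1" clip5="$2" clip3="$3"
    local keep=$(( ${#seq} - clip5 - clip3 ))   # bases left after both clips
    if (( keep <= 0 )); then
        echo ""                                  # nothing left of the read
    else
        echo "${seq:clip5:keep}"                 # skip clip5 bases, keep the next 'keep'
    fi
}

# 10 biased bases (TTTTTTTTTT) + 20 informative bases + 2 bases (GG) = 32 bases
read1="TTTTTTTTTTACGTACGTACGTACGTACGTGG"
trimmed="$(clip_read "$read1" 10 2)"
echo "$trimmed (${#trimmed} bases)"   # prints: ACGTACGTACGTACGTACGT (20 bases)
```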
```bash
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Work in the current directory (the folder the job was submitted from)
cd "$PBS_O_WORKDIR"

# Load Java and set up the memory settings to run Nextflow
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

# Run the rnaseq pipeline
# -with-dag can write the flowchart as .png, .pdf, .svg or .html
nextflow run nf-core/rnaseq \
    -profile singularity \
    -r 3.3 \
    --input index.csv \
    --genome GRCm38 \
    --aligner star_rsem \
    --min_mapped_reads 5 \
    --clip_r1 10 \
    --clip_r2 10 \
    --three_prime_clip_r1 2 \
    --three_prime_clip_r2 2 \
    --remove_ribo_rna \
    -dump-channels \
    -with-dag flowchart.png
```
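The -with-dag flowchart.png option relies on Graphviz to draw the image; according to the Nextflow documentation, .png, .pdf and .svg output need Graphviz, while .html does not. If you are unsure whether Graphviz is available where the job runs, the hypothetical helper below (dag_output, not part of Nextflow) picks the file name at run time and falls back to HTML.

```bash
#!/bin/bash
# Hypothetical helper: choose a DAG file name Nextflow can render.
# Assumes .png needs Graphviz's 'dot' command and .html does not.
dag_output() {
    local base="${1:-flowchart}"
    if command -v dot >/dev/null 2>&1; then
        echo "${base}.png"    # Graphviz found: render an image
    else
        echo "${base}.html"   # no Graphviz: fall back to HTML
    fi
}

# In launch.pbs the last option would then read:
#   -with-dag "$(dag_output flowchart)"
echo "DAG will be written to $(dag_output flowchart)"
```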
...