...

Running the pipeline using custom data

Prepare a sample sheet file that specifies the input files to be used. To do this, we use an nf-core script to generate the ‘samplesheet.csv’ file as follows:

Code Block
#download script and make executable
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py

#generate the samplesheet.csv file
./fastq_dir_to_samplesheet.py /path/to/directory/containing/fastq_files/ samplesheet.csv \
    --strandedness reverse \
    --read1_extension R1.fastq.gz \
    --read2_extension R2.fastq.gz

Example of a typical command to run an RNA-seq analysis for mouse samples:

...

Preparing a ‘samplesheet.csv’ file

Prepare an index.csv file containing information about the samples to be processed. Example index.csv files are shown below.

Example index.csv (previous versions):

Code Block
group,replicate,fastq_1,fastq_2,strandedness
control,1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
control,2,/path/to/fastq/control-2_R1.fastq.gz,/path/to/fastq/control-2_R2.fastq.gz,unstranded
control,3,/path/to/fastq/control-3_R1.fastq.gz,/path/to/fastq/control-3_R2.fastq.gz,unstranded
infected,1,/path/to/fastq/infected-1_R1.fastq.gz,/path/to/fastq/infected-1_R2.fastq.gz,unstranded
infected,2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
infected,3,/path/to/fastq/infected-3_R1.fastq.gz,/path/to/fastq/infected-3_R2.fastq.gz,unstranded
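
If you already have a sheet in the older layout, it can be rewritten into the four-column layout used by the Version 3.10.1 example on this page. The following is a minimal sketch, not an official nf-core tool: the function name convert_samplesheet is hypothetical, and the naming scheme <group>_rep<replicate> for the sample column is an assumption.

```shell
#!/bin/bash
# Sketch: convert group,replicate,fastq_1,fastq_2,strandedness
# into sample,fastq_1,fastq_2,strandedness.
# ASSUMPTION: sample names are built as <group>_rep<replicate>.
convert_samplesheet() {
    awk -F',' -v OFS=',' '
        NR == 1 { print "sample", "fastq_1", "fastq_2", "strandedness"; next }  # new header
        NF == 5 { print $1 "_rep" $2, $3, $4, $5 }                              # merge group and replicate
    ' "$1"
}

# Demo on a throwaway file that uses the placeholder paths from this page
old_sheet=$(mktemp)
cat > "$old_sheet" <<'EOF'
group,replicate,fastq_1,fastq_2,strandedness
control,1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
infected,2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
EOF
convert_samplesheet "$old_sheet"
rm -f "$old_sheet"
```

Redirect the output to a new file (for example, convert_samplesheet index.csv > samplesheet.csv) and inspect it before running the pipeline.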

Index format for current version 3.3:

...

Example index.csv (Version 3.10.1):

Code Block
sample,fastq_1,fastq_2,strandedness
control_rep1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
control_rep2,/path/to/fastq/control-2_R1.fastq.gz,/path/to/fastq/control-2_R2.fastq.gz,unstranded
control_rep3,/path/to/fastq/control-3_R1.fastq.gz,/path/to/fastq/control-3_R2.fastq.gz,unstranded
infected_rep1,/path/to/fastq/infected-1_R1.fastq.gz,/path/to/fastq/infected-1_R2.fastq.gz,unstranded
infected_rep2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
infected_rep3,/path/to/fastq/infected-3_R1.fastq.gz,/path/to/fastq/infected-3_R2.fastq.gz,unstranded
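
Hand-edited sheets easily pick up copy-paste slips, such as an R1 file paired with the R2 file of a different replicate. The sketch below checks a four-column sheet before submission. It is an illustration only: the function name check_samplesheet is hypothetical, and it assumes the mates are named *_R1.fastq.gz and *_R2.fastq.gz, as in the read extension options used earlier on this page.

```shell
#!/bin/bash
# Sketch: sanity-check a sample,fastq_1,fastq_2,strandedness sheet.
# ASSUMPTION: paired files differ only in _R1.fastq.gz versus _R2.fastq.gz.
check_samplesheet() {
    local sheet=$1 status=0
    local _header sample fq1 fq2 _rest fq expected
    {
        read -r _header    # skip the column header
        while IFS=',' read -r sample fq1 fq2 _rest || [ -n "$sample" ]; do
            [ -z "$sample" ] && continue
            # every listed file must exist
            for fq in "$fq1" "$fq2"; do
                [ -z "$fq" ] && continue    # single-end rows leave fastq_2 empty
                [ -f "$fq" ] || { echo "MISSING: $sample: $fq"; status=1; }
            done
            # the R2 name must be the R1 name with the suffix swapped
            if [ -n "$fq2" ]; then
                expected=$(printf '%s' "$fq1" | sed 's/_R1\.fastq\.gz$/_R2.fastq.gz/')
                [ "$expected" = "$fq2" ] || { echo "MISMATCH: $sample: $fq1 vs $fq2"; status=1; }
            fi
        done
    } < "$sheet"
    return "$status"
}

# Usage: check_samplesheet samplesheet.csv && echo "samplesheet looks consistent"
```

The function returns a non-zero status if any file is missing or any pair looks inconsistent, so it can gate a qsub call in a wrapper script.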

...

Code Block
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/rnaseq -profile singularity -r 3.10.1 --input samplesheet.csv --genome GRCm38 --aligner star_salmon

We recommend running the nextflow nf-core/rnaseq pipeline once and then assessing the fastqc results folder to check whether sequence biases are present at the 5' and 3' ends of the reads. We can then use the PBS script below to tell the pipeline to remove a defined number of bases from the 5' end (--clip_r1 or --clip_r2) or the 3' end (--three_prime_clip_r1 or --three_prime_clip_r2). We can also tell the pipeline to remove ribosomal RNA (--remove_ribo_rna), as these sequences are non-informative.
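
One way to shortlist the FastQC reports worth opening first is to scan each report's summary.txt for the "Per base sequence content" module. The sketch below assumes the standard FastQC summary layout (status, module name and file name, separated by tabs). The function name flag_sequence_bias and the results/fastqc path in the usage comment are illustrative assumptions, so adjust them to your own output folder.

```shell
#!/bin/bash
# Sketch: list reads whose FastQC "Per base sequence content" module did not pass.
# Reads summary.txt content from the files given as arguments, or from stdin.
flag_sequence_bias() {
    awk -F'\t' '$2 == "Per base sequence content" && $1 != "PASS" { print $1 "\t" $3 }' "$@"
}

# Demo with two made-up summary lines (file names are placeholders)
printf 'PASS\tBasic Statistics\tcontrol-1_R1.fastq.gz\nFAIL\tPer base sequence content\tcontrol-1_R1.fastq.gz\n' \
    | flag_sequence_bias

# Usage on real reports (ASSUMED location of the FastQC zip files):
#   for z in results/fastqc/*_fastqc.zip; do unzip -p "$z" '*/summary.txt'; done | flag_sequence_bias
```

A WARN or FAIL here is only a prompt to look at the per-base plots in the corresponding HTML report; how many bases to clip is still a judgment made from those plots.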

Code Block
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#load java and set up memory settings to run nextflow
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

#run the rnaseq pipeline
#optionally append -with-dag flowchart.png to draw the workflow graph;
#-with-dag can output files in .png, .pdf, .svg or .html
#optionally append --remove_ribo_rna to remove ribosomal RNA reads
nextflow run nf-core/rnaseq --input samplesheet.csv \
        --outdir results \
        -r 3.10.1 \
        --genome GRCh38 \
        -profile singularity \
        --aligner star_rsem \
        --clip_r1 10 \
        --clip_r2 10 \
        --three_prime_clip_r1 1 \
        --three_prime_clip_r2 1

Submitting the job

Once you have created the folder for the run, the input.tsv file, nextflow.config, and launch.pbs, you are ready to submit.

Submit the run with this command (on Lyra):

...

Code Block
qstat -u $USER

Alternatively, use the following command:

Code Block
qjobs

Either command lets you check on the jobs you are running. Nextflow will launch additional jobs during the run.

...

Finally, if you have configured the connection to the NFTower, you can log on and check your run.