...
Running the pipeline using custom data
Prepare a sample sheet file that specifies the input files to be used. To do this, we use an nf-core script to generate the ‘samplesheet.csv’ file as follows:
Code Block |
---|
#download script and make executable
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py
#generate the samplesheet.csv file
./fastq_dir_to_samplesheet.py /path/to/directory/containing/fastq_files/ samplesheet.csv \
--strandedness reverse \
--read1_extension R1.fastq.gz \
--read2_extension R2.fastq.gz |
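Before launching the pipeline, it can help to confirm that every FASTQ path written to the sample sheet actually exists. The following is a minimal sketch, not part of nf-core: the function name check_samplesheet_paths is hypothetical, and it assumes a comma-separated file whose header row contains fastq_1 and fastq_2 columns.

```shell
#!/bin/bash
# Hypothetical helper (not part of nf-core): list FASTQ paths in a sample sheet
# that do not exist on disk. Assumes a comma-separated file whose header row
# contains fastq_1 and fastq_2 columns.
check_samplesheet_paths() {
    local sheet="$1"
    local missing=0
    local c1 c2 f1 f2 f line
    # Locate the fastq_1 and fastq_2 columns from the header row.
    c1=$(head -n 1 "$sheet" | tr ',' '\n' | grep -n -x 'fastq_1' | cut -d: -f1)
    c2=$(head -n 1 "$sheet" | tr ',' '\n' | grep -n -x 'fastq_2' | cut -d: -f1)
    if [ -z "$c1" ] || [ -z "$c2" ]; then
        echo "ERROR: fastq_1/fastq_2 columns not found in header" >&2
        return 2
    fi
    # Read every data row (including a last row without a trailing newline).
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        f1=$(printf '%s\n' "$line" | cut -d, -f"$c1")
        f2=$(printf '%s\n' "$line" | cut -d, -f"$c2")
        for f in "$f1" "$f2"; do
            # An empty field is skipped (for example fastq_2 in single-end data).
            if [ -n "$f" ] && [ ! -f "$f" ]; then
                echo "MISSING: $f"
                missing=1
            fi
        done
    done < <(tail -n +2 "$sheet")
    return "$missing"
}
```

For example, `check_samplesheet_paths samplesheet.csv && echo "all FASTQ files found"` prints the confirmation only when no path is reported as missing.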
Example of a typical command to run an RNA-seq analysis for mouse samples:
...
Preparing a ‘samplesheet.csv’ file
Prepare an index.csv file describing the samples to be processed. See the example index.csv files below.
Example index.csv (previous versions):
Code Block |
---|
group,replicate,fastq_1,fastq_2,strandedness
control,1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
control,2,/path/to/fastq/control-2_R1.fastq.gz,/path/to/fastq/control-2_R2.fastq.gz,unstranded
control,3,/path/to/fastq/control-3_R1.fastq.gz,/path/to/fastq/control-3_R2.fastq.gz,unstranded
infected,1,/path/to/fastq/infected-1_R1.fastq.gz,/path/to/fastq/infected-1_R2.fastq.gz,unstranded
infected,2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
infected,3,/path/to/fastq/infected-3_R1.fastq.gz,/path/to/fastq/infected-3_R2.fastq.gz,unstranded |
Index format for current version 3.3:
...
Example index.csv (Version 3.10.1):
Code Block |
---|
sample,fastq_1,fastq_2,strandedness
control_rep1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
control_rep2,/path/to/fastq/control-2_R1.fastq.gz,/path/to/fastq/control-2_R2.fastq.gz,unstranded
control_rep3,/path/to/fastq/control-3_R1.fastq.gz,/path/to/fastq/control-3_R2.fastq.gz,unstranded
infected_rep1,/path/to/fastq/infected-1_R1.fastq.gz,/path/to/fastq/infected-1_R2.fastq.gz,unstranded
infected_rep2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
infected_rep3,/path/to/fastq/infected-3_R1.fastq.gz,/path/to/fastq/infected-3_R2.fastq.gz,unstranded |
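A copy-paste slip, such as pairing the read-1 file of one sample with the read-2 file of another, is easy to miss in a sheet like this. The sketch below flags such rows. It is an illustration only: the function name check_fastq_pairs is hypothetical, and it assumes the _R1/_R2 naming used in the examples, with the tag appearing once in each path.

```shell
#!/bin/bash
# Hypothetical helper: flag rows whose fastq_1 and fastq_2 paths differ by more
# than the read tag. Assumes the _R1/_R2 naming convention and that the tag
# occurs once in each path.
check_fastq_pairs() {
    awk -F',' '
        NR == 1 {
            # Find the fastq_1 and fastq_2 columns by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == "fastq_1") c1 = i
                if ($i == "fastq_2") c2 = i
            }
            if (!c1 || !c2) {
                print "ERROR: fastq_1/fastq_2 columns not found"
                bad = 2
                exit
            }
            next
        }
        # Skip blank rows and single-end samples (empty fastq_2).
        NF == 0 || $c2 == "" { next }
        {
            a = $c1
            b = $c2
            sub(/_R1/, "", a)   # strip the read tag from each path
            sub(/_R2/, "", b)
            if (a != b) {
                print "MISMATCH line " NR ": " $c1 " vs " $c2
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running `check_fastq_pairs index.csv` returns a non-zero exit status and prints one MISMATCH line per suspicious row, so it can be placed ahead of the job submission step.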
...
Code Block |
---|
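#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#work on current directory (folder)
cd "$PBS_O_WORKDIR"
#load java and export memory settings so that nextflow can see them
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
#run the rnaseq pipeline
nextflow run nf-core/rnaseq -profile singularity -r 3.10.1 \
--input samplesheet.csv \
--outdir results \
--genome GRCm38 \
--aligner star_salmon |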
We recommend running the nf-core/rnaseq pipeline once and then checking the fastqc results folder to assess whether sequence biases are present at the 5' and 3' ends of the sequences. Then, we can use the PBS script below to tell the pipeline to remove a defined number of bases from the 5'-end (--clip_r1 or --clip_r2) or the 3'-end (--three_prime_clip_r1 or --three_prime_clip_r2). We can also tell the pipeline to remove ribosomal RNA, as these sequences are non-informative.
Code Block |
---|
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#work on current directory (folder)
cd "$PBS_O_WORKDIR"
#load java and set up memory settings to run nextflow
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
#run the rnaseq pipeline
#with-dag can output files in .png, .pdf, .svg or .html
nextflow run nf-core/rnaseq --input samplesheet.csv \
--outdir results \
-r 3.10.1 \
--genome GRCh38 \
-profile singularity \
--aligner star_rsem \
--min_mapped_reads 5 \
--clip_r1 10 \
--clip_r2 10 \
--three_prime_clip_r1 1 \
--three_prime_clip_r2 1 \
--remove_ribo_rna \
-dump-channels \
-with-dag flowchart.png |
Submitting the job
Once you have created the folder for the run, the input.tsv file, nextflow.config, and launch.pbs, you are ready to submit.
Submit the run with this command (on Lyra):
...
Code Block |
---|
qstat -u $USER |
Alternatively, use the following command:
Code Block |
---|
qjobs |
Either command lets you check on the jobs you are running. Nextflow will launch additional jobs during the run.
...
Finally, if you have configured the connection to NFTower, you can log on and check your run.