...
The following link describes how to load Java and install Nextflow on our local HPC at QUT (Lyra): Nextflow
3B. Installing a suitable environment management system
...
By default the pipeline expects a single quality-filtered fastq file per sample.
If you want to provide raw fastq files, the samples must have been prepared with the QIAGEN QIAseq miRNA library kit. To run the initial quality filtering step on your raw fastq files, set the --qualityfilter parameter to true in the config file and use the --bowtie_db_dir parameter to specify the path to the directory that holds the required bowtie indices. These indices are used to: 1) filter non-informative reads (using the blacklist bowtie indices for the DERIVE_USABLE_READS process) and 2) optionally derive the origin of the filtered reads obtained (RNA_SOURCE_PROFILE process). The required fasta files are available at https://github.com/maelyg/bowtie_indices.git and bowtie indices can be built from these using the commands:
Code Block
git clone https://github.com/maelyg/bowtie_indices.git
gunzip blacklist_v2.fasta.gz
# you might need to activate the environment cached by conda or singularity in order to run bowtie
# for example:
conda activate /path_to_cached_environment/virreport-77d02f3abe1d8ba5f8dfdff194142de9
# then run the bowtie command
bowtie-build -f blacklist_v2.fasta blacklist
The directory in which the bowtie indices are located will need to be specified in the nextflow.config file:
Code Block
params {
  bowtie_db_dir = '/path_to_bowtie_idx_directory'
}
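Before launching the pipeline, it can help to confirm that this directory really contains a built index. The sketch below assumes bowtie (version 1) output, which writes six files per index with the `.ebwt` suffix (`.ebwtl` for large indices). The function name `check_bowtie_index` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): succeeds only if all six
# bowtie1 index files for the given basename exist in the given directory.
check_bowtie_index() {
  dir="$1"
  name="$2"
  for part in 1 2 3 4 rev.1 rev.2; do
    # Small indices use the .ebwt suffix, large ones use .ebwtl
    if [ ! -e "$dir/$name.$part.ebwt" ] && [ ! -e "$dir/$name.$part.ebwtl" ]; then
      echo "missing: $dir/$name.$part.ebwt" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_bowtie_index /path_to_bowtie_idx_directory blacklist && echo "blacklist index found"` would report whether the blacklist index built earlier is in place.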
If you want to derive an RNA source profile of your fastq files, you will need to specify:
Code Block
params {
  rna_source_profile = true
}
You will also need to build the other indices from the fasta files included in https://github.com/maelyg/bowtie_indices.git (i.e. rRNA, plant_tRNA, plant_noncoding, plant_pt_mt_other_genes, artefacts, miRNA, virus).
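Building the remaining indices can be scripted. The loop below is a minimal sketch that assumes the repository stores each reference as a gzipped `<name>.fasta.gz` file, as it does for `blacklist_v2.fasta.gz`, and that bowtie is available on the PATH. Using the file stem as the index basename is an assumption: the blacklist example above names its index `blacklist` rather than `blacklist_v2`, so check which basenames the pipeline expects and adjust accordingly. Both `index_basename` and `build_all_indices` are hypothetical names, and `rRNA.fasta.gz` in the comment is an illustrative file name only.

```shell
# Hypothetical helper: derive an index basename from a fasta file name,
# e.g. rRNA.fasta.gz -> rRNA.
# ASSUMPTION: the pipeline may expect different basenames than the file stem.
index_basename() {
  base="${1##*/}"        # drop any leading directory
  base="${base%.gz}"     # drop the .gz suffix if present
  printf '%s\n' "${base%.fasta}"
}

# Decompress every *.fasta.gz in the current directory and build a
# bowtie index from each one.
build_all_indices() {
  for fa_gz in *.fasta.gz; do
    [ -e "$fa_gz" ] || continue        # the glob matched nothing
    fa="${fa_gz%.gz}"
    gunzip -c "$fa_gz" > "$fa"         # keep the compressed original
    bowtie-build -f "$fa" "$(index_basename "$fa_gz")"
  done
}
```

Run from inside the directory that holds the fasta files, for instance `cd bowtie_indices && build_all_indices`, after activating the environment that provides bowtie.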
The quality filtering step will create the 00_quality_filtering folder under the results folder:
Code Block
results/
├── 00_quality_filtering
│   ├── sample_name
│   │   ├── sample_name_18-25nt_cutadapt.log
│   │   ├── sample_name_fastqc.html
│   │   ├── sample_name_fastqc.zip
│   │   ├── sample_name_21-22nt_cutadapt.log
│   │   ├── sample_name_21-22nt.fastq.gz
│   │   ├── sample_name_24nt_cutadapt.log
│   │   ├── sample_name_blacklist_filter.log
│   │   ├── sample_name_fastp.html
│   │   ├── sample_name_fastp.json
│   │   ├── sample_name_qual_filtering_cutadapt.log
│   │   ├── sample_name_quality_trimmed_fastqc.html
│   │   ├── sample_name_quality_trimmed_fastqc.zip
│   │   ├── sample_name_quality_trimmed.fastq.gz
│   │   ├── sample_name_read_length_dist.pdf
│   │   ├── sample_name_read_length_dist.txt
│   │   ├── sample_name_truseq_adapter_cutadapt.log
│   │   └── sample_name_umi_tools.log
│   └── qc_report
│       ├── read_origin_counts.txt
│       ├── read_origin_detailed_pc.txt
│       ├── read_origin_pc_summary.txt
│       ├── run_qc_report.txt
│       └── run_read_size_distribution.pdf
If your sequencing run was split across multiple lanes, you might have several raw fastq files per sample. You can feed these directly to the pipeline by specifying the --merge-lane parameter. The fastq files will be merged into one fastq file per sample before downstream analysis is performed. The sample name used will be the sampleid provided in the index.csv file. In the example below, 2 fastq files were generated for 1 sample named CT103:
...