...

The following link describes how to load Java and install Nextflow on our local HPC at QUT (Lyra): Nextflow

3B. Installing a suitable environment management system

...

By default the pipeline expects a single quality-filtered fastq file per sample.

If you want to provide raw fastq files, the samples must have been prepared with the QIAGEN QIAseq miRNA library kit. To run the initial quality filtering step on your raw fastq files, set the --qualityfilter parameter to true in the config file and specify the path to the directory which holds the required bowtie indices (using the --bowtie_db_dir parameter). These indices are used to: 1) filter non-informative reads (using the blacklist bowtie indices for the DERIVE_USABLE_READS process) and 2) optionally derive the origin of the filtered reads obtained (RNA_SOURCE_PROFILE process).

The required fasta files are available at https://github.com/maelyg/bowtie_indices.git. Bowtie indices can be built from them using the following commands:

Code Block

git clone https://github.com/maelyg/bowtie_indices.git
# move into the cloned repository (assuming the fasta files sit at its top level)
cd bowtie_indices
gunzip blacklist_v2.fasta.gz
# you might need to activate the environment cached by conda or singularity in order to run bowtie
# for example:
conda activate /path_to_cached_environment/virreport-77d02f3abe1d8ba5f8dfdff194142de9
# then build the blacklist index
bowtie-build -f blacklist_v2.fasta blacklist

The directory in which the bowtie indices are located will need to be specified in the nextflow.config file:

Code Block
params { bowtie_db_dir = '/path_to_bowtie_idx_directory' }

If you want to derive an RNA source profile of your fastq files, you will need to specify:

Code Block
params { rna_source_profile = true }

You will also need to build the other indices from the fasta files included in https://github.com/maelyg/bowtie_indices.git (i.e. rRNA, plant_tRNA, plant_noncoding, plant_pt_mt_other_genes, artefacts, miRNA, virus).
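
Building the remaining indices one at a time is repetitive, so a small loop may help. The sketch below is only an illustration. The helper name build_all_indices is hypothetical. It assumes the fasta files have already been decompressed and end in .fasta, and that each index base name is simply the fasta file name without its extension. Check the index names the pipeline expects before relying on it (the blacklist index, for instance, is named blacklist rather than blacklist_v2). The BOWTIE_BUILD variable is also an assumption, added so that the commands can be previewed without running them.

```shell
# Hypothetical helper: build one bowtie index per *.fasta file in a directory.
# Assumption: index base name = fasta file name without the .fasta extension.
build_all_indices() {
    fasta_dir="$1"
    # BOWTIE_BUILD can be overridden (e.g. with "echo") to preview the commands.
    builder="${BOWTIE_BUILD:-bowtie-build}"
    for fasta in "$fasta_dir"/*.fasta; do
        # Skip the unexpanded pattern when the directory holds no fasta file.
        [ -e "$fasta" ] || continue
        base="$(basename "$fasta" .fasta)"
        # Write the index files next to the fasta file they were built from.
        $builder -f "$fasta" "$fasta_dir/$base"
    done
}
```

Running `BOWTIE_BUILD=echo build_all_indices /path_to_bowtie_idx_directory` prints the arguments that would be passed to bowtie-build; dropping the BOWTIE_BUILD=echo prefix runs the builds.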

The quality filtering step will create the 00_quality_filtering folder under the results folder:

Code Block

results/
├── 00_quality_filtering
    ├── sample_name
    │   ├── sample_name_18-25nt_cutadapt.log
    │   ├── sample_name_fastqc.html
    │   ├── sample_name_fastqc.zip
    │   ├── sample_name_21-22nt_cutadapt.log
    │   ├── sample_name_21-22nt.fastq.gz
    │   ├── sample_name_24nt_cutadapt.log
    │   ├── sample_name_blacklist_filter.log
    │   ├── sample_name_fastp.html
    │   ├── sample_name_fastp.json
    │   ├── sample_name_qual_filtering_cutadapt.log
    │   ├── sample_name_quality_trimmed_fastqc.html
    │   ├── sample_name_quality_trimmed_fastqc.zip
    │   ├── sample_name_quality_trimmed.fastq.gz
    │   ├── sample_name_read_length_dist.pdf
    │   ├── sample_name_read_length_dist.txt
    │   ├── sample_name_truseq_adapter_cutadapt.log
    │   └── sample_name_umi_tools.log
    └── qc_report
        ├── read_origin_counts.txt
        ├── read_origin_detailed_pc.txt
        ├── read_origin_pc_summary.txt
        ├── run_qc_report.txt
        └── run_read_size_distribution.pdf
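
To confirm that the quality filtering step completed for a given sample, one option is to check that a few of the files listed above exist. The following is a minimal sketch, assuming the folder layout shown above. The function name check_quality_filtering and the choice of files to check are illustrative and are not part of the pipeline.

```shell
# Hypothetical helper: report expected quality-filtering outputs missing for a sample.
# Assumption: outputs follow the results/00_quality_filtering/<sample>/ layout.
check_quality_filtering() {
    results_dir="$1"
    sample="$2"
    sample_dir="$results_dir/00_quality_filtering/$sample"
    missing=0
    # Only a subset of the output files is checked here.
    for suffix in quality_trimmed.fastq.gz blacklist_filter.log read_length_dist.txt; do
        if [ ! -f "$sample_dir/${sample}_${suffix}" ]; then
            echo "missing: $sample_dir/${sample}_${suffix}"
            missing=1
        fi
    done
    # Return 0 when every checked file exists, 1 otherwise.
    return "$missing"
}
```

For example, `check_quality_filtering results sample_name` prints one line per missing file and returns a non-zero status if any of the checked files is absent.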

If your sequencing run was split across multiple lanes, you might have several raw fastq files per sample. You can feed these directly to the pipeline by specifying the --merge-lane parameter. The fastq files will be merged into a single fastq file per sample before downstream analysis is performed. The sample name used will be the sampleid provided in the index.csv file. In the example below, 2 fastq files were generated for 1 sample named CT103:

...
