...

The following link describes the specific steps to load Java and install Nextflow on our local HPC at QUT (Lyra): Nextflow

3B. Installing a suitable environment management system

...

  • By default the pipeline expects a single quality-filtered fastq file per sample.

  • If you want to provide raw fastq files instead, the samples must have been prepared with the QIAGEN QIAseq miRNA library kit. To run the initial quality filtering step on your raw fastq files, set the --qualityfilter parameter to true in the nextflow.config file and use the --bowtie_db_dir parameter to specify the directory holding the required bowtie indices. These indices are used to: 1) filter out non-informative reads (the blacklist bowtie indices, used by the DERIVE_USABLE_READS process) and 2) optionally derive the origin of the filtered reads (RNA_SOURCE_PROFILE process).

    The required fasta files are available at https://github.com/maelyg/bowtie_indices.git, and the bowtie indices can be built from them using the following commands:

    Code Block
    git clone https://github.com/maelyg/bowtie_indices.git
    cd bowtie_indices
    gunzip blacklist_v2.fasta.gz
    # bowtie-build must be available: you might need to use the environment cached by the pipeline
    # (in your conda or singularity cache directory), for example with a cached conda environment:
    conda activate /path_to_cached_environment/virreport-77d02f3abe1d8ba5f8dfdff194142de9
    # then build the index; 'blacklist' is the prefix given to the index files
    bowtie-build -f blacklist_v2.fasta blacklist
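    Bowtie (version 1, which provides bowtie-build) writes each index as six files sharing the prefix passed to bowtie-build: .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. Before pointing the pipeline at the directory, a quick completeness check can be run with a sketch like the one below. The function name check_bowtie_index is hypothetical, and the check assumes small-index .ebwt files (bowtie writes .ebwtl files for very large references).

```shell
# check_bowtie_index PREFIX  (hypothetical helper)
# Returns 0 if all six bowtie (v1) small-index files exist and are non-empty
# for PREFIX; otherwise prints each missing file to stderr and returns 1.
check_bowtie_index() {
  local prefix="$1" missing=0 ext
  for ext in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
    if [[ ! -s "${prefix}.${ext}" ]]; then
      echo "missing: ${prefix}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

    For example, running check_bowtie_index blacklist from the directory where the index was built returns a non-zero status and lists the missing files if the build did not complete.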

    The directory in which the bowtie indices are located will need to be specified in the nextflow.config file:

    Code Block
    params {
      bowtie_db_dir = '/path_to_bowtie_idx_directory'
    }

    If you want to derive an RNA source profile of your fastq files, you will need to specify:

    Code Block
    params {
      rna_source_profile = true
    }

    You will also need to build the other indices from the fasta files included in https://github.com/maelyg/bowtie_indices.git (i.e. rRNA, plant_tRNA, plant_noncoding, plant_pt_mt_other_genes, artefacts, miRNA and virus).
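    The same two steps (decompress, then run bowtie-build) apply to each of these fasta files. A minimal helper is sketched below. The function name build_index is hypothetical; it assumes bowtie-build is on your PATH (see the environment note in the commands above) and that the fasta files are gzipped like blacklist_v2.fasta.gz. The index prefix the pipeline expects for each database is not specified here, so choose it to match the VirReport documentation.

```shell
# build_index FASTA PREFIX  (hypothetical helper)
# If FASTA ends in .gz, decompress it to a sibling file (keeping the .gz),
# then build a bowtie (v1) index whose files are named PREFIX.*.ebwt.
build_index() {
  local fasta="$1" prefix="$2"
  if [[ "$fasta" == *.gz ]]; then
    # Decompress next to the original, leaving the .gz in place
    gzip -dc "$fasta" > "${fasta%.gz}" || return 1
    fasta="${fasta%.gz}"
  fi
  bowtie-build -f "$fasta" "$prefix"
}
```

    For example, if the rRNA sequences are shipped as rRNA.fasta.gz (a hypothetical file name; check the repository), build_index rRNA.fasta.gz rRNA decompresses it to rRNA.fasta and builds index files with the prefix rRNA. Repeat for the other databases listed above.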

    The quality filtering step will create the 00_quality_filtering folder under the results folder:

    Code Block
    results/
    └── 00_quality_filtering
        ├── sample_name
        │   ├── sample_name_18-25nt_cutadapt.log
        │   ├── sample_name_fastqc.html
        │   ├── sample_name_fastqc.zip
        │   ├── sample_name_21-22nt_cutadapt.log
        │   ├── sample_name_21-22nt.fastq.gz
        │   ├── sample_name_24nt_cutadapt.log
        │   ├── sample_name_blacklist_filter.log
        │   ├── sample_name_fastp.html
        │   ├── sample_name_fastp.json
        │   ├── sample_name_qual_filtering_cutadapt.log
        │   ├── sample_name_quality_trimmed_fastqc.html
        │   ├── sample_name_quality_trimmed_fastqc.zip
        │   ├── sample_name_quality_trimmed.fastq.gz
        │   ├── sample_name_read_length_dist.pdf
        │   ├── sample_name_read_length_dist.txt
        │   ├── sample_name_truseq_adapter_cutadapt.log
        │   └── sample_name_umi_tools.log
        └── qc_report
            ├── read_origin_counts.txt
            ├── read_origin_detailed_pc.txt
            ├── read_origin_pc_summary.txt
            ├── run_qc_report.txt
            └── run_read_size_distribution.pdf
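    For a quick per-sample read count after filtering, the quality-trimmed fastq files in this folder can be counted directly, relying only on the FASTQ convention of four lines per read. A minimal sketch follows; the function name count_trimmed_reads is hypothetical and it assumes the results/00_quality_filtering/sample_name/ layout shown above.

```shell
# count_trimmed_reads RESULTS_DIR  (hypothetical helper)
# Prints "sample<TAB>reads" for every *_quality_trimmed.fastq.gz found under
# RESULTS_DIR/00_quality_filtering/<sample>/ (FASTQ: 4 lines per read).
count_trimmed_reads() {
  local dir="${1:-results}/00_quality_filtering" fq sample lines
  for fq in "$dir"/*/*_quality_trimmed.fastq.gz; do
    [[ -e "$fq" ]] || continue   # no match: the glob is left as a literal string
    sample="$(basename "$(dirname "$fq")")"
    lines="$(gzip -dc "$fq" | wc -l)"
    printf '%s\t%d\n' "$sample" "$(( lines / 4 ))"
  done
}
```

    Running count_trimmed_reads results prints one tab-separated line per sample folder with its number of quality-trimmed reads.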

    If your sequencing run was split over multiple lanes, you may have several raw fastq files per sample. You can feed these directly to the pipeline and specify the --merge_lane parameter: the fastq files of each sample will be merged into a single fastq file before downstream analysis, and the sample name used will be the sampleid provided in the index.csv file. In the example below, 2 fastq files were generated for 1 sample named CT103:

...

Code Block
#!/bin/bash -l
#PBS -N VirReport
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=05:00:00


cd "$PBS_O_WORKDIR"
module load java
# export so that the Nextflow launcher picks up the JVM memory settings
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run eresearchqut/VirReport -profile singularity -resume --indexfile index.csv \
                                    --merge_lane --qualityfilter --rna_source_profile \
                                    --bowtie_db_dir /path_to_bowtie_indices \
                                    --dedup \
                                    --virreport_ncbi --blast_viral_db_path /path_to_ncbi_databases \
                                    --detection_reporting_nt
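With --merge_lane, the fastq files listed under the same sampleid in index.csv are merged into a single fastq file per sample before analysis. If you prefer to merge lanes yourself beforehand, something like the sketch below would do it. This is not the pipeline's own implementation: the function name merge_lanes and the output naming sampleid.fastq.gz are hypothetical, and it assumes the sampleid,samplepath layout of index.csv with gzipped fastq files. Concatenated gzip files form a valid multi-member gzip stream, so the files can be joined with cat without decompressing them. It needs bash 4 or later for the associative array.

```shell
# merge_lanes INDEX_CSV OUTDIR  (hypothetical helper)
# For each sampleid in INDEX_CSV (header: sampleid,samplepath), concatenates its
# fastq.gz files, in the order listed, into OUTDIR/<sampleid>.fastq.gz.
merge_lanes() {
  local index="$1" outdir="$2" sampleid samplepath out
  local -A seen=()
  mkdir -p "$outdir" || return 1
  while IFS=, read -r sampleid samplepath || [[ -n "$sampleid" ]]; do
    samplepath="${samplepath%$'\r'}"   # tolerate Windows line endings
    # skip blank lines and the header row
    [[ -z "$sampleid" || "$sampleid" == "sampleid" ]] && continue
    out="$outdir/$sampleid.fastq.gz"
    if [[ -z "${seen[$sampleid]:-}" ]]; then
      : > "$out"                       # first file for this sample: start fresh
      seen[$sampleid]=1
    fi
    cat "$samplepath" >> "$out" || return 1
  done < "$index"
}
```

For example, merge_lanes index.csv merged_fastq writes one merged file per sampleid into merged_fastq/. Re-running it overwrites rather than appends, because each output is truncated the first time its sampleid is seen.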

Example 3:

In the PBS job below, homology searches will be run against NCBI and the PVirDB. The pipeline will also run VirusDetect in parallel.

Code Block
#!/bin/bash -l
#PBS -N VirReport
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=05:00:00


cd "$PBS_O_WORKDIR"
module load java
# export so that the Nextflow launcher picks up the JVM memory settings
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run eresearchqut/VirReport -profile singularity -resume --indexfile index.csv \
                                    --merge_lane --qualityfilter --rna_source_profile \
                                    --bowtie_db_dir /path_to_bowtie_indices \
                                    --dedup \
                                    --virreport_ncbi --blast_viral_db_path /path_to_ncbi_databases --detection_reporting_nt \
                                    --virreport_viral_db --blast_viral_db_path /path_to_local_viral_database --detection_reporting_viral_db \
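                                    --virusdetect --virusdetect_db_path

Example index.csv for a run split over two lanes (one fastq file per lane, i.e. two files per sampleid):

Code Block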
sampleid,samplepath
MT500,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/ELL11002/ELL11002A3/MT500_S3_L001_R1_001.fastq.gz
MT500,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/ELL11002/ELL11002A3/MT500_S3_L002_R1_001.fastq.gz
MT502,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/ELL11002/ELL11002A5/MT502_S5_L001_R1_001.fastq.gz
MT502,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/ELL11002/ELL11002A5/MT502_S5_L002_R1_001.fastq.gz
MT512,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/LEL11109/LEL11109A1/Fn1_S25_L001_R1_001.fastq.gz
MT512,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/LEL11109/LEL11109A1/Fn1_S25_L002_R1_001.fastq.gz
MT524,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/LEL11109/LEL11109A13/FraD3_S28_L001_R1_001.fastq.gz
MT524,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/LEL11109/LEL11109A13/FraD3_S28_L002_R1_001.fastq.gz
CT113,/work/hia_mt18005/raw_data/20220629_RAMACIOTTI_DES10730/DES10730A20/CT_113_S20_L001_R1_001.fastq.gz
CT113,/work/hia_mt18005/raw_data/20220629_RAMACIOTTI_DES10730/DES10730A20/CT_113_S20_L002_R1_001.fastq.gz
CT140,/work/hia_mt18005/raw_data/20220629_RAMACIOTTI_DES10730/DES10730A47/CT_140_S47_L001_R1_001.fastq.gz
CT140,/work/hia_mt18005/raw_data/20220629_RAMACIOTTI_DES10730/DES10730A47/CT_140_S47_L002_R1_001.fastq.gz
MT515,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/LEL11109/LEL11109A4/Cn1_S19_L001_R1_001.fastq.gz
MT515,/work/hia_mt18005/raw_data/20220915_RAMACIOTTI_ELL11002_LEL11109/LEL11109/LEL11109A4/Cn1_S19_L002_R1_001.fastq.gz
MT005,/work/hia_mt18005/raw_data/20210618_RAMACIOTTI_ELL9278/ELL9278/ELL9278A04/MT005_S4_L001_R1_001.fastq.gz
MT005,/work/hia_mt18005/raw_data/20210618_RAMACIOTTI_ELL9278/ELL9278/ELL9278A04/MT005_S4_L002_R1_001.fastq.gz
2223PEQ041,/work/hia_mt18005/raw_data/20221018_RAMACIOTTI_LEL11294/LEL11294/LEL11294A15/2223PEQ041_S15_L001_R1_001.fastq.gz
2223PEQ041,/work/hia_mt18005/raw_data/20221018_RAMACIOTTI_LEL11294/LEL11294/LEL11294A15/2223PEQ041_S15_L002_R1_001.fastq.gz
MT447,/work/hia_mt18005/raw_data/20220218_RAMACIOTTI_LEL10024/MT447_S40_L001_R1_001.fastq.gz
MT447,/work/hia_mt18005/raw_data/20220218_RAMACIOTTI_LEL10024/MT447_S40_L002_R1_001.fastq.gz
MT449,/work/hia_mt18005/raw_data/20220218_RAMACIOTTI_LEL10024/MT449_S33_L001_R1_001.fastq.gz
MT449,/work/hia_mt18005/raw_data/20220218_RAMACIOTTI_LEL10024/MT449_S33_L002_R1_001.fastq.gz
2223PEQ012,/work/hia_mt18005/raw_data/20221018_RAMACIOTTI_LEL11291/LEL11291/LEL11291A12/2223PEQ012_S12_L001_R1_001.fastq.gz
2223PEQ012,/work/hia_mt18005/raw_data/20221018_RAMACIOTTI_LEL11291/LEL11291/LEL11291A12/2223PEQ012_S12_L002_R1_001.fastq.gz
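Before submitting the job, it can help to check that every samplepath in index.csv exists and that each sampleid groups the expected number of files (two per sample in the example above, one per lane). A minimal sketch, assuming the two-column sampleid,samplepath layout with a header line; the function name check_index is hypothetical and it requires bash 4 or later.

```shell
# check_index INDEX_CSV  (hypothetical helper)
# Verifies the header, reports any samplepath that does not exist, and prints
# "sampleid<TAB>number_of_files" in first-seen order. Returns 1 on any problem.
check_index() {
  local index="$1" header sampleid samplepath status=0
  local -A nfiles=()
  local -a order=()
  IFS= read -r header < "$index"
  if [[ "${header%$'\r'}" != "sampleid,samplepath" ]]; then
    echo "unexpected header: $header" >&2
    return 1
  fi
  while IFS=, read -r sampleid samplepath || [[ -n "$sampleid" ]]; do
    samplepath="${samplepath%$'\r'}"   # tolerate Windows line endings
    [[ -z "$sampleid" ]] && continue
    if [[ ! -f "$samplepath" ]]; then
      echo "missing file for $sampleid: $samplepath" >&2
      status=1
    fi
    [[ -z "${nfiles[$sampleid]:-}" ]] && order+=("$sampleid")
    nfiles[$sampleid]=$(( ${nfiles[$sampleid]:-0} + 1 ))
  done < <(tail -n +2 "$index")
  for sampleid in "${order[@]}"; do
    printf '%s\t%d\n' "$sampleid" "${nfiles[$sampleid]}"
  done
  return "$status"
}
```

Running check_index index.csv prints one tab-separated line per sampleid with its file count, reports missing files on stderr, and returns a non-zero status if the header is unexpected or any file is missing.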

...