Overview
Create a metadata “samplesheet.csv” for small RNAseq datasets.
Learn to use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).
Learn how to prepare a PBS script to run the expression profiling of small RNAs against the reference miRBase database annotated microRNAs.
Preparing the pipeline inputs
The pipeline requires preparing at least 2 files:
...
PBS Pro script (launch_nf-core_RNAseq_QC.pbs) with instructions to run the pipeline
...
Nextflow.config - revision 2.3.1 of the nf-core/smrnaseq pipeline may not be able to identify the location of reference adapter sequences, thus, we will use a local nextflow.config file to tell Nextflow where to find the reference adapters necessary to trim the raw small_RNA-Seq data
A. Create the metadata file (samplesheet.csv):
Change to the data folder directory:
Code Block |
---|
cd $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Copy the bash script to the working folder
...
Overview
Create a metadata “samplesheet.csv” for small RNAseq datasets.
Learn to use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).
Learn how to prepare a PBS script to run the expression profiling of small RNAs against the reference miRBase database annotated microRNAs.
Preparing the pipeline inputs
The pipeline requires preparing at least 2 files:
Metadata file (samplesheet.csv) thatspecifies the name of the samples, location of FASTQ files ('Read 1' and ‘Read 2’), and strandedness (forward, reverse, or auto. Note: auto is used when the strandedness of the data is unknown)
PBS Pro script (launch_nf-core_RNAseq_QC.pbs) with instructions to run the pipeline
Nextflow.config - revision 2.3.1 of the nf-core/smrnaseq pipeline may not be able to identify the location of reference adapter sequences, thus, we will use a local nextflow.config file to tell Nextflow where to find the reference adapters necessary to trim the raw small_RNA-Seq data
A. Create the metadata file (samplesheet.csv):
Change to the data folder directory:
Code Block |
---|
cd $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Copy the bash script to the working folder
Code Block |
---|
cp /work/training/2024/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Note: you could replace ‘$HOME/workshop/data’ with “.” A dot indicates ‘current directory’ and will copy the file to the directory where you are currently located
View the content of the script:
Code Block |
---|
cat create_nf-core_smallRNAseq_samplesheet.sh |
...
NOTE: modify ‘read1_extension’ as appropriate for your data. For example: _1.fastq.gz or _R1_001.fastq.gz or _R1.fq.gz , etc
Let’s generate the metadata file by running the following command:
Code Block |
---|
sh create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Check the newly created samplesheet.csv file:
Code Block |
---|
cat samplesheet.csv |
sample,fastq_1 ERR409878,/work/training/2024/smallRNAseq/data/human_disease/ERR409878.fastq.gz ERR409879,/work/training/2024/smallRNAseq/data/human_disease/ERR409879.fastq.gz ERR409880,/work/training/2024/smallRNAseq/data/human_disease/ERR409880.fastq.gz ERR409881,/work/training/2024/smallRNAseq/data/human_disease/ERR409881.fastq.gz ERR409882,/work/training/2024/smallRNAseq/ |
---|
...
data/ |
---|
...
human_disease/ERR409882.fastq.gz ERR409883,/work/training/2024/smallRNAseq/data/human_disease/ERR409883.fastq.gz ERR409884,/work/training/2024/smallRNAseq/data/human_disease |
---|
...
Note: you could replace ‘$HOME/workshop/data’ with “.” A dot indicates ‘current directory’ and will copy the file to the directory where you are currently located
View the content of the script:
Code Block |
---|
cat create_nf-core_smallRNAseq_samplesheet.sh |
...
NOTE: modify ‘read1_extension’ as appropriate for your data. For example: _1.fastq.gz or _R1_001.fastq.gz or _R1.fq.gz , etc
Let’s generate the metadata file by running the following command:
...
/ERR409884.fastq.gz ERR409885,/work/training/2024/smallRNAseq/data/human_disease/ERR409885.fastq.gz ERR409886,/work/training/2024/smallRNAseq/data/human_disease/ERR409886.fastq.gz ERR409887,/work/training/2024/smallRNAseq/data/human_disease/ERR409887.fastq.gz ERR409888,/work/training/2024/smallRNAseq/data/human_disease/ERR409888.fastq.gz ERR409889,/work/training/2024/smallRNAseq/data/human_disease |
---|
...
Check the newly created samplesheet.csv file:
Code Block |
---|
cat samplesheet.csv |
sample,fastq_1 ERR409878/ERR409889.fastq.gz ERR409890,/work/training/2024/smallRNAseq/data/human_disease/ERR409890.fastq.gz ERR409891,/work/training/2024/smallRNAseq/data/human_disease/ERR409878ERR409891.fastq.gz ERR409879ERR409892,/work/training/2024/smallRNAseq/data/human_disease/ERR409879ERR409892.fastq.gz ERR409880ERR409893,/work/training/2024/smallRNAseq/data/human_disease/ERR409880ERR409893.fastq.gz ERR409881ERR409894,/work/training/2024/smallRNAseq/data/human_disease/ERR409881ERR409894.fastq.gz ERR409882ERR409895,/work/training/2024/smallRNAseq/data/human_disease/ERR409882ERR409895.fastq.gz ERR409883ERR409896,/work/training/2024/smallRNAseq/data/human_disease/ERR409883ERR409896.fastq.gz ERR409884ERR409897,/work/training/2024/smallRNAseq/data/human_disease/ERR409884ERR409897.fastq.gz ERR409885ERR409898,/work/training/2024/smallRNAseq/data/human_disease/ERR409885ERR409898.fastq.gz ERR409886ERR409899,/work/training/2024/smallRNAseq/data/human_disease/ERR409886ERR409899.fastq.gz ERR409887ERR409900,/work/training/2024/smallRNAseq/data/human_disease/ERR409887ERR409900.fastq.gzERR409888,/work/training/2024/ |
---|
B. Prepare PBS Pro script to run the nf-core/smrnaseq pipeline
Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_miRBase.pbs)
Copy and paste the code below to the terminal:
Code Block |
---|
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/ |
...
ERR409889,/work/training/2024/smallRNAseq/data/human_disease/ERR409889.fastq.gz
...
samplesheet.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase cp /work/training/2024/smallRNAseq/ |
...
ERR409891,/work/training/2024/smallRNAseq/data/human_disease/ERR409891.fastq.gz
...
scripts/launch_nf-core_smallRNAseq_miRBase.pbs $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase cp /work/training/2024/smallRNAseq/ |
...
ERR409893,/work/training/2024/smallRNAseq/data/human_disease/ERR409893.fastq.gz
ERR409894,/work/training/2024/smallRNAseq/data/human_disease/ERR409894.fastq.gz
ERR409895,/work/training/2024/smallRNAseq/data/human_disease/ERR409895.fastq.gz
ERR409896,/work/training/2024/smallRNAseq/data/human_disease/ERR409896.fastq.gz
ERR409897,/work/training/2024/smallRNAseq/data/human_disease/ERR409897.fastq.gz
ERR409898,/work/training/2024/smallRNAseq/data/human_disease/ERR409898.fastq.gz
ERR409899,/work/training/2024/smallRNAseq/data/human_disease/ERR409899.fastq.gz
ERR409900,/work/training/2024/smallRNAseq/data/human_disease/ERR409900.fastq.gz
B. Prepare PBS Pro script to run the nf-core/smrnaseq pipeline
Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_miRBase.pbs)
Copy and paste the code below to the terminal:
Code Block |
---|
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/launch_nf-core_smallRNAseq_miRBase.pbs $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/nextflow.config $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: Copy the launch_nf-core_smallRNAseq_human.pbs submission script to the working directory
Line 3: Copy the nextflow.config file from shared folder to my working directory.
Line 4: move to the working directory
View the content of the launch_nf-core_RNAseq_QC.pbs
script:
Code Block |
---|
catscripts/nextflow.config $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: Copy the launch_nf-core_smallRNAseq_human.pbs submission script to the working directory
Line 3: Copy the nextflow.config file from shared folder to my working directory.
Line 4: move to the working directory
View the content of the launch_nf-core_RNAseq_QC.pbs
script:
Code Block |
---|
cat launch_nf-core_smallRNAseq_miRBase.pbs |
#!/bin/bash -l #PBS -N nfsmallRNAseq #PBS -l select=1:ncpus=2:mem=4gb #PBS -l walltime=24:00:00 #run the tasks in the current working directory cd $PBS_O_WORKDIR #load java and assign up to 4GB RAM memory for nextflow to use module load java export NXF_OPTS='-Xms1g -Xmx4g'
#run the small RNAseq pipeline nextflow run nf-core/smrnaseq -r 2.3.1 \ -profile singularity \ --outdir results \ --input samplesheet.csv \ --genome GRCh38-local \ --mirtrace_species hsa \ --three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \ --fastp_min_length 18 \ --fastp_max_length 30 \ --hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \ --mature /work/training/smallRNAseq/data/mirbase/mature.fa \ --mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \ -resume |
---|
Print the “nextflow.config” file (see below). Note: if a config file is placed in the working folder it can override parameters define by the global ~/.nextflow/config file or the config file define as part of the pipeline.
Code Block |
---|
singularity {
runOptions = '-B $HOME/.nextflow/assets/nf-core/smrnaseq/assets'
} |
Submit the job to the HPC cluster:
Code Block |
---|
qsub launch_nf-core_smallRNAseq_miRBase.pbs |
#!/bin/bash -l
#PBS -N nfsmallRNAseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#run the tasks in the current working directory
cd $PBS_O_WORKDIR
#load java and assign up to 4GB RAM memory for nextflow to use
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
#run the small RNAseq pipeline
nextflow run nf-core/smrnaseq -r 2.3.1 \
-profile singularity \
--outdir results \
--input samplesheet.csv \
--genome GRCh38-local \
--mirtrace_species hsa \
--three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \
--fastp_min_length 18 \
--fastp_max_length 30 \
--hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \
--mature /work/training/smallRNAseq/data/mirbase/mature.fa \
--mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \
-resume
Print the “nextflow.config” file (see below). Note: if a config file is placed in the working folder it can override parameters define by the global ~/.nextflow/config file or the config file define as part of the pipeline.
Code Block |
---|
singularity {
runOptions = '-B $HOME/.nextflow/assets/nf-core/smrnaseq/assets'
} |
Submit the job to the HPC cluster:
Code Block |
---|
qsub launch_nf-core_smallRNAseq_miRBase.pbs |
Monitor the progress:
Code Block |
---|
qjobs |
The job will take several hours to run, hence we will use precomputed results for the statistical analysis in the next section.
Outputs
Code Block |
---|
results/
├── bowtie_index
│ ├── mirna_hairpin
│ └── mirna_mature
├── fastp
│ └── on_raw
├── fastqc
│ ├── raw
│ └── trimmed
├── mirna_quant
│ ├── bam
│ ├── edger_qc <----- Expression mature miRNA (mature_counts.csv) and precursor-miRNAs (haripin_counts.csv) counts can be found in this subfolder.
│ ├── mirtop
│ ├── reference
│ └── seqcluster
├── mirtrace
│ ├── mirtrace-report.html
│ ├── mirtrace-results.json
│ ├── mirtrace-stats-contamination_basic.tsv
│ ├── mirtrace-stats-contamination_detailed.tsv
│ ├── mirtrace-stats-length.tsv
│ ├── mirtrace-stats-mirna-complexity.tsv
│ ├── mirtrace-stats-phred.tsv
│ ├── mirtrace-stats-qcstatus.tsv
│ ├── mirtrace-stats-rnatype.tsv
│ ├── qc_passed_reads.all.collapsed
│ └── qc_passed_reads.rnatype_unknown.collapsed
├── multiqc
│ ├── multiqc_data
│ ├── multiqc_plots
│ └── multiqc_report.html
└── pipeline_info
├── execution_report_2024-08-20_16-55-53.html
├── execution_timeline_2024-08-20_16-55-53.html
├── execution_trace_2024-08-20_16-55-53.txt
├── nf_core_smrnaseq_software_mqc_versions.yml
├── params_2024-08-20_16-56-04.json
└── pipeline_dag_2024-08-20_16-55-53.html |
STEP1: copy metadata (samplesheet.csv) into the working folder (run2_RNAseq)
Code Block |
---|
cp $HOME/workshop/2024-2/session4_RNAseq/data/mouse/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: move to the working directory
Copy the PBS Pro script to run the nf-core/rnaseq pipeline:
Code Block |
---|
cp $HOME/workshop/2024-2/session4_RNAseq/scripts/launch_nf-core_RNAseq_pipeline.pbs $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq |
NOTE: if you had issues with the above lines. Alternatively, run the following code to copy the sample sheet.csv and launch files:
Code Block |
---|
cp /work/training/2024/rnaseq/data/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cp /work/training/2024/rnaseq/scripts/launch_nf-core_RNAseq_pipeline.pbs
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq |
Adjusting the Trim Galore (read trimming) options
Print the content of the launch_RNAseq.pbs
script:
Code Block |
---|
cat launch_nf-core_RNAseq_pipeline.pbs |
Monitor the progress:
Code Block |
---|
qjobs |
The job will take several hours to run, hence we will use precomputed results for the statistical analysis in the next section.
Outputs
The pipeline will produce two folders, one called “work,” where all the processing is done, and another called “results,” where we can find the pipeline's outputs. The content of the results folder is as follows:
Code Block |
---|
results/
├── bowtie_index
│ ├── mirna_hairpin
│ └── mirna_mature
├── fastp
│ └── on_raw
├── fastqc
│ ├── raw
│ └── trimmed
├── mirna_quant
│ ├── bam
│ ├── edger_qc <----- Expression mature miRNA (mature_counts.csv) and precursor-miRNAs (haripin_counts.csv) counts can be found in this subfolder.
│ ├── mirtop
│ ├── reference
│ └── seqcluster
├── mirtrace
│ ├── mirtrace-report.html
│ ├── mirtrace-results.json
│ ├── mirtrace-stats-contamination_basic.tsv
│ ├── mirtrace-stats-contamination_detailed.tsv
│ ├── mirtrace-stats-length.tsv
│ ├── mirtrace-stats-mirna-complexity.tsv
│ ├── mirtrace-stats-phred.tsv
│ ├── mirtrace-stats-qcstatus.tsv
│ ├── mirtrace-stats-rnatype.tsv
│ ├── qc_passed_reads.all.collapsed
│ └── qc_passed_reads.rnatype_unknown.collapsed
├── multiqc
│ ├── multiqc_data
│ ├── multiqc_plots
│ └── multiqc_report.html
└── pipeline_info
├── execution_report_2024-08-20_16-55-53.html
├── execution_timeline_2024-08-20_16-55-53.html
├── execution_trace_2024-08-20_16-55-53.txt
├── nf_core_smrnaseq_software_mqc_versions.yml
├── params_2024-08-20_16-56-04.json
└── pipeline_dag_2024-08-20_16-55-53.html |
The quantification of the mature miRNA and hairpin expressions can be found in the /results/mirna_quant/edger_qc directory.
Code Block |
---|
cd /results/mirna_quant/edger_qc |
Code Block |
---|
hairpin_counts.csv
hairpin_CPM_heatmap.pdf
hairpin_edgeR_MDS_distance_matrix.txt
hairpin_edgeR_MDS_plot_coordinates.txt
hairpin_edgeR_MDS_plot.pdf
hairpin_log2CPM_sample_distances_dendrogram.pdf
hairpin_log2CPM_sample_distances_heatmap.pdf
hairpin_log2CPM_sample_distances.txt
hairpin_logtpm.csv
hairpin_logtpm.txt
hairpin_normalized_CPM.txt
hairpin_unmapped_read_counts.txt
mature_counts.csv <----- Expression mature miRNA. This file will be used to identify differentially expressed miRNAs (Session 7)
mature_CPM_heatmap.pdf
mature_edgeR_MDS_distance_matrix.txt
mature_edgeR_MDS_plot_coordinates.txt
mature_edgeR_MDS_plot.pdf
mature_log2CPM_sample_distances_dendrogram.pdf
mature_log2CPM_sample_distances_heatmap.pdf
mature_log2CPM_sample_distances.txt
mature_logtpm.csv
mature_logtpm.txt
mature_normalized_CPM.txt
mature_unmapped_read_counts.txt |
Submitting the job
Code Block |
---|
qsub launch_nf-core_RNAseq_pipeline.pbs |
Monitoring the Run
Code Block |
---|
qjobs |
Outputs
The pipeline will produce two folders, one called “work,” where all the processing is done, and another called “results,” where we can find the pipeline's outputs. The content of the results folder is as follows:
...
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq/scripts cp /work/training/smallRNAseq/scripts/* $HOME/workshop/small_RNAseq/scripts/ ls -l $HOME/workshop/small_RNAseq/scripts/ |
Line 1: The -p indicates create 'parental directories as required. Thus the line 1 command creates both /workshop/ and the subfolder /workshop/scripts/
Line 2: Copies all files from /work/datasets/workshop/scripts/ as noted by an asterisk to the newly created folder $HOME/workshop/scripts/
Line 3: List the files in the script folder
Copy multiple subdirectories and files using rsync
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq/data/ rsync -rv /work/training/smallRNAseq/data/ $HOME/workshop/small_RNAseq/data/ |
Line 1: The first command creates the folder /scripts/
Line 2: rsync copies all subfolders and files from the specified source folder to the selected destination folder. The -r = recursively will copy directories and files; -v = verbose messages of the transfer of files
Create a folder for running the nf-core small RNA-seq pipeline
...
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq mkdir $HOME/workshop/small_RNAseq/run1_test mkdir $HOME/workshop/small_RNAseq/run2_smallRNAseq_human cd $HOME/workshop/small_RNAseq/ |
Lines 1-4: create sub-folders for each exercise
Line 5: change the directory to the folder “small_RNAseq”
Exercise 1: Running a test with nf-core sample data
First, let’s assess the execution of the nf-core/rnaseq pipeline by running a test using sample data.
...
Code Block |
---|
cat launch_nf-core_smallRNAseq_test.pbs |
#!/bin/bash -l #PBS -N nfsmrnaseq #PBS -l select=1:ncpus=2:mem=4gb #PBS -l walltime=24:00:00 #work on current directory (folder) cd $PBS_O_WORKDIR #load java and set up memory settings to run nextflow module load java export NXF_OPTS='-Xms1g -Xmx4g' # run the test nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0 |
---|
where:
nextflow command: nextflow run
pipeline name: nf-core/smrnaseq
pipeline version: -r 2.1.0
container type and sample data: -profile test,singularity
output directory: --outdir results
Submitting the job
Now we can submit the small RNAseq test job to the HPC scheduler:
...
Monitoring the Run
Code Block |
---|
qjobs |
Exercise 2: Running the small RNA pipeline using public human data
The pipeline requires preparing at least 2 files:
Metadata file (samplesheet.csv) thatspecifies the “sample name” and “location of FASTQ files” ('Read 1').
PBS Pro script (launch_nf-core_smallRNAseq_human.pbs) with instructions to run the pipeline
Create the metadata file (samplesheet.csv):
Change to the data folder directory:
...
Code Block |
---|
cp /work/training/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/small_RNAseq/data/human |
Note: you could replace ‘$HOME/workshop/data’ with “.” A dot indicates ‘current directory’ and will copy the file to the directory where you are currently located
View the content of the script:
Code Block |
---|
cat create_nf-core_smallRNAseq_samplesheet.sh |
#!/bin/bash -l #User defined variables. ########################################################## DIR='$HOME/workshop/small_RNAseq/data/human' INDEX='samplesheet.csv' ########################################################## #load python module module load python/3.10.8-gcccore-12.2.0 #fetch the script to create the sample metadata table wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py chmod +x fastq_dir_to_samplesheet.py #generate initial sample metadata file ./fastq_dir_to_samplesheet.py $DIR index.csv \ --strandedness auto \ --read1_extension .fastq.gz #format index file cat index.csv | awk -F "," '{print $1 "," $2}' > ${INDEX} #Remove intermediate files: rm index.csv fastq_dir_to_samplesheet.py |
---|
...
Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_human.pbs)
Copy and paste the code below to the terminal:
Code Block |
---|
cp $HOME/workshop/small_RNAseq/data/human/samplesheet.csv $HOME/workshop/small_RNAseq/run2_smallRNAseq_human cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_human.pbs $HOME/workshop/small_RNAseq/run2_smallRNAseq_human cd $HOME/workshop/small_RNAseq/run2_smallRNAseq_human |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: copy the launch_nf-core_smallRNAseq_human.pbs submission script to the working directory
Line 3: move to the working directory
View the content of the launch_nf-core_RNAseq_QC.pbs
script:
Code Block |
---|
cat launch_nf-core_smallRNAseq_human.pbs |
#!/bin/bash -l #PBS -N nfsmallRNAseq #PBS -l select=1:ncpus=2:mem=4gb #PBS -l walltime=24:00:00 #PBS -m abe
#run the tasks in the current working directory cd $PBS_O_WORKDIR #load java and assign up to 4GB RAM memory for nextflow to use module load java export NXF_OPTS='-Xms1g -Xmx4g'
#run the small RNAseq pipeline nextflow run nf-core/smrnaseq -r 2.1.0 \ -profile singularity \ --outdir results \ --input samplesheet.csv \ --genome GRCh38-local \ --mirtrace_species hsa \ --three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \ --fastp_min_length 18 \ --fastp_max_length 30 \ --hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \ --mature /work/training/smallRNAseq/data/mirbase/mature.fa \ --mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \ -resume |
---|
...
Note: the “mature_counts.csv” needs to be transposed prior running the statistical analysis. This can be done either user the R script or using a script called “transpose_csv.py”.
Let’s initially create a “DESeq2” folder and copy the files needed for the statistical analysis:
...