Samplesheet input
Nextflow pipelines generally need an input file, often referred to as a samplesheet, which contains information about the samples you would like to analyse.
...
The samplesheet has to be a comma-separated file with a header row and a minimum set of columns, which varies depending on the pipeline you want to run.
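As a minimal sketch of this requirement, the Python snippet below checks that a comma-separated samplesheet has a header row containing a given set of columns. The function name `missing_columns` and the sample content are illustrative assumptions, not part of any nf-core tool; the columns that are actually required depend on the pipeline.

```python
import csv
import io


def missing_columns(samplesheet_text, required):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(samplesheet_text))
    header = next(reader, [])  # first row is the header; empty list if no rows
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical samplesheet content, used only for illustration.
example = "sample,fastq_1\nS1,S1_R1.fastq.gz\n"

print(missing_columns(example, ["sample", "fastq_1"]))
print(missing_columns(example, ["sample", "fastq_1", "fastq_2"]))
```

The first call reports no missing columns, while the second reports `fastq_2`, because the header does not declare it. A file that uses a delimiter other than a comma would fail the same check, since its header would be read as a single column.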
Examples of samplesheets
For the nf-core/smrnaseq pipeline, the samplesheet has to be a comma-separated file with the following 2 columns:
...
Column names have to be specified in a header row, as shown in the samplesheet example below:
...
sample,fastq_1
Clone1_N1,s3://ngi-igenomes/test-data/smrnaseq/C1-N1-R1_S4_L001_R1_001.fastq.gz
Clone1_N3,s3://ngi-igenomes/test-data/smrnaseq/C1-N3-R1_S6_L001_R1_001.fastq.gz
Clone9_N1,s3://ngi-igenomes/test-data/smrnaseq/C9-N1-R1_S7_L001_R1_001.fastq.gz
Clone9_N2,s3://ngi-igenomes/test-data/smrnaseq/C9-N2-R1_S8_L001_R1_001.fastq.gz
Clone9_N3,s3://ngi-igenomes/test-data/smrnaseq/C9-N3-R1_S9_L001_R1_001.fastq.gz
Control_N1,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N1-R1_S1_L001_R1_001.fastq.gz
Control_N2,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N2-R1_S2_L001_R1_001.fastq.gz
Control_N3,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N3-R1_S3_L001_R1_001.fastq.gz
...
For the nf-core/rnaseq pipeline, the samplesheet has to be a comma-separated file with the following 4 columns:
...
Column names have to be specified in a header row, as shown in the samplesheet example below:
...
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
...
Note that in this example the same sample (CONTROL_REP1) was sequenced across 3 lanes. The nf-core/rnaseq pipeline will concatenate the raw reads from these lanes before performing any downstream analysis.
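To make the grouping concrete, the sketch below collects samplesheet rows that share the same `sample` value, which is the set of files that would be concatenated per sample. It only illustrates the grouping idea using the example rows above; the function name `group_by_sample` is hypothetical, and this is not how the pipeline itself is implemented.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the nf-core/rnaseq samplesheet example above.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
"""


def group_by_sample(samplesheet_text):
    """Map each sample name to the list of (fastq_1, fastq_2) pairs listed for it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        groups[row["sample"]].append((row["fastq_1"], row["fastq_2"]))
    return dict(groups)


groups = group_by_sample(SAMPLESHEET)
for sample, runs in groups.items():
    # One line per sample: its name and how many lanes (rows) it has.
    print(sample, len(runs))
```

Here the three rows collapse into a single sample, CONTROL_REP1, with three pairs of FASTQ files, one pair per lane.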
Exercise 1
The following samplesheet file for the nf-core/rnaseq pipeline consisting of both single- and paired-end data is ready for analysis.
...
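When reading a mixed samplesheet like this, it helps to know that in the nf-core/rnaseq format a single-end row leaves the `fastq_2` column empty. The sketch below counts distinct samples and classifies rows as single- or paired-end on that basis. The function name `summarise` and the demo rows are hypothetical and are not the exercise samplesheet.

```python
import csv
import io


def summarise(samplesheet_text):
    """Return (number of distinct samples, row counts by read layout)."""
    samples = set()
    counts = {"paired": 0, "single": 0}
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        samples.add(row["sample"])
        # A row with an empty fastq_2 field is treated as single-end.
        layout = "paired" if (row.get("fastq_2") or "").strip() else "single"
        counts[layout] += 1
    return len(samples), counts


# Hypothetical rows for illustration only.
demo = (
    "sample,fastq_1,fastq_2,strandedness\n"
    "A,A_L001_R1.fastq.gz,A_L001_R2.fastq.gz,auto\n"
    "A,A_L002_R1.fastq.gz,A_L002_R2.fastq.gz,auto\n"
    "B,B_R1.fastq.gz,,auto\n"
)

n_samples, layout_counts = summarise(demo)
print(n_samples, layout_counts)
```

Rows that repeat a sample name count once towards the number of samples, so the demo has two samples spread over three rows.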
There are 6 samples in total, as ...
Exercise 2
Find the minimum set of columns required in the samplesheet to run nf-core/ampliseq.
You will need to go to the usage page of nf-core/ampliseq, which can be found at https://nf-co.re/ampliseq/2.9.0/docs/usage#samplesheet-input (make sure you are viewing the documentation for the latest version of the pipeline). The input specification section states that the samplesheet must contain at least 2 columns:
Input folder
Some pipelines, such as nf-core/ampliseq, let you directly specify the path to the folder that contains your input FASTQ files, as an alternative to using a samplesheet.
...