...
One of the core features of Nextflow is the ability to cache task executions and re-use them in subsequent runs to minimize duplicate work. Resumability is useful both for recovering from errors and for iteratively developing a pipeline.
Resume option
You can enable resumability in Nextflow with the -resume flag when launching a pipeline with nextflow run.
All task executions are automatically saved to the task cache, regardless of the -resume
option (so that you always have the option to resume later).
Structure of work folder
When Nextflow runs, it assigns a unique ID to each task. This unique ID is used to create a separate execution directory, within the work directory, where the task is executed and its results are stored. A task’s unique ID is generated as a 128-bit hash number.
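As a rough illustration of how a task hash maps to a location on disk, the sketch below splits a 32-character hexadecimal hash (128 bits) into the two-level path used under the work directory. The hash value is only an illustrative example, and the variable names are assumptions made for this sketch.

```shell
# Illustrative full task hash: 32 hex characters = 128 bits.
task_hash="facd3e49b63eadd6248aa357083763c1"

# The first two characters name the subdirectory under work/,
# the remaining 30 characters name the task execution directory.
prefix=$(printf '%s' "$task_hash" | cut -c1-2)
rest=$(printf '%s' "$task_hash" | cut -c3-)

task_dir="work/$prefix/$rest"
echo "$task_dir"
```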
...
When a task requires recomputation, i.e. the conditions above are not fulfilled, the downstream tasks are automatically invalidated.
...
By default, the pipeline results are cached in the work directory, which is created in the directory where the pipeline is launched.
We can use the tree command to list the contents of the work directory. Note: By default, tree does not print hidden files (those beginning with a dot). Use the -a option to view all files.
Code Block |
---|
tree -a work |
Example of a work directory from the nf-core/smrnaseq pipeline:
Code Block |
---|
work/
├── 00
│   └── ba88dbe5efa0127c9872d349b32714
│       ├── Clone9_N3.fastp.fastq.gz -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/36/1a0fc3ca0c220051ed3ca36d7d2974/Clone9_N3.fastp.fastq.gz
│       ├── Clone9_N3_mature.bam
│       ├── Clone9_N3_mature.sam
│       ├── .command.begin
│       ├── .command.err
│       ├── .command.log
│       ├── .command.out
│       ├── .command.run
│       ├── .command.sh
│       ├── .command.trace
│       ├── .exitcode
│       ├── fasta_bidx.1.ebwt -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/ef/afac598b79ec7f0bd3d8ee66506fa4/fasta_bidx.1.ebwt
│       ├── fasta_bidx.2.ebwt -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/ef/afac598b79ec7f0bd3d8ee66506fa4/fasta_bidx.2.ebwt
│       ├── fasta_bidx.3.ebwt -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/ef/afac598b79ec7f0bd3d8ee66506fa4/fasta_bidx.3.ebwt
│       ├── fasta_bidx.4.ebwt -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/ef/afac598b79ec7f0bd3d8ee66506fa4/fasta_bidx.4.ebwt
│       ├── fasta_bidx.rev.1.ebwt -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/ef/afac598b79ec7f0bd3d8ee66506fa4/fasta_bidx.rev.1.ebwt
│       ├── fasta_bidx.rev.2.ebwt -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/ef/afac598b79ec7f0bd3d8ee66506fa4/fasta_bidx.rev.2.ebwt
│       ├── unmapped
│       │   └── Clone9_N3_mature_unmapped.fq.gz
│       └── versions.yml
├── 02
│   └── 87c26bf3248f4fafea7bfd75f4dc8a
│       ├── C1-N3-R1_S6_L001_R1_001.fastq.gz -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/stage-449030e3-e32d-4010-bc1c-9d6f1f62cc45/d2/11a8747c3a99874284cb3e80c0fa33/C1-N3-R1_S6_L001_R1_001.fastq.gz
│       ├── Clone1_N3.raw_fastqc.html
│       ├── Clone1_N3.raw_fastqc.zip
│       ├── Clone1_N3.raw.gz -> C1-N3-R1_S6_L001_R1_001.fastq.gz
│       ├── .command.begin
│       ├── .command.err
│       ├── .command.log
│       ├── .command.out
│       ├── .command.run
│       ├── .command.sh
│       ├── .command.trace
│       ├── .exitcode
│       └── versions.yml
├── 04
│   ├── 4597052a2b5e6fcc4266f2461b0884
│   │   ├── .command.begin
│   │   ├── .command.err
│   │   ├── .command.log
│   │   ├── .command.out
│   │   ├── .command.run
│   │   ├── .command.sh
│   │   ├── .command.trace
│   │   ├── Control_N2.raw_fastqc.html
│   │   ├── Control_N2.raw_fastqc.zip
│   │   ├── Control_N2.raw.gz -> Ctl-N2-R1_S2_L001_R1_001.fastq.gz
│   │   ├── Ctl-N2-R1_S2_L001_R1_001.fastq.gz -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/stage-449030e3-e32d-4010-bc1c-9d6f1f62cc45/0a/a104e9b9dbac753afb050de1d079eb/Ctl-N2-R1_S2_L001_R1_001.fastq.gz
│   │   ├── .exitcode
│   │   └── versions.yml
│   └── 9e40d148b1fe948d43fc18132875a0
│       ├── Clone1_N1_mature_hairpin.bam -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/66/7553250d29e8612d19d1feb0347a58/Clone1_N1_mature_hairpin.bam
│       ├── Clone1_N1_mature_hairpin.sorted.bam
│       ├── .command.begin
│       ├── .command.err
│       ├── .command.log
│       ├── .command.out
│       ├── .command.run
│       ├── .command.sh
│       ├── .command.trace
│       ├── .exitcode
│       ├── hairpin.fa_igenome.fa_idx.fa -> /mnt/hpccs01/home/gauthiem/smrnaseq_cl/work/95/6d0db42195231bb9861871c7700798/hairpin.fa_igenome.fa_idx.fa
│       └── versions.yml
.... |
Task execution directory
Within the work directory there are multiple task execution directories. There is one directory for each time a process is executed. These task directories are identified by the process execution hash. For example, the task directory fa/cd3e49b63eadd6248aa357083763c1 would be the location for the process identified by the hash fa/cd3e49.
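The short hash is the form Nextflow prints in its console output (for example [fa/cd3e49]), so it is often the starting point for locating a task directory. A minimal sketch of that lookup follows; it builds a throwaway directory with mktemp in place of a real work directory, and the variable names are assumptions made for illustration.

```shell
# Build a throwaway work directory so the lookup can be tried without running a pipeline.
work_dir=$(mktemp -d)
mkdir -p "$work_dir/fa/cd3e49b63eadd6248aa357083763c1"

# Short hash as it would be copied from the console output.
short_hash="fa/cd3e49"

# Expand the short hash to the full task directory with a shell glob.
task_dir=$(ls -d "$work_dir/$short_hash"*)
echo "Task directory: ${task_dir#"$work_dir"/}"
```

Against a real run, the same glob would be applied to the actual work directory, for example ls -d work/fa/cd3e49*.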
...
.command.sh: The command script.
.command.run: A Bash script that Nextflow generates to execute the .command.sh script, handling the necessary environment setup and command execution details.
.command.out: The complete job standard output.
.command.err: The complete job standard error.
.command.log: The wrapper execution output.
.command.begin: A file created as soon as the job is launched.
.exitcode: A file containing the task exit code.
Any task input files (symlinks)
Any task output files
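When a task fails, the files listed above are usually the first place to look. The sketch below shows one way to check them. It writes a mock task directory with made-up contents (the samtools command and error message are purely hypothetical) so that it can be tried without running a pipeline.

```shell
# Mock task directory with hypothetical contents, standing in for a real one under work/.
task_dir=$(mktemp -d)
printf '1' > "$task_dir/.exitcode"
printf 'samtools: command not found\n' > "$task_dir/.command.err"
printf '#!/bin/bash -ue\nsamtools sort in.bam\n' > "$task_dir/.command.sh"

# Read the exit code; on failure, show the command that ran and its standard error.
exit_code=$(cat "$task_dir/.exitcode")
if [ "$exit_code" -ne 0 ]; then
    echo "Task failed with exit code $exit_code"
    echo "--- command ---"
    cat "$task_dir/.command.sh"
    echo "--- stderr ---"
    cat "$task_dir/.command.err"
fi
```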
Specifying another work directory
Depending on your script, this work folder can take a lot of disk space. You can specify another work directory using the command line option -w. Note: Using a different work directory means that all tasks will need to re-run from the beginning.
Clean the work directory
If you are sure you won’t resume your pipeline execution, you can clean the work folder using the nextflow clean command. It is good practice to do so regularly.
Code Block |
---|
nextflow clean [run_name|session_id] [options] |
If no run name or session id is provided, it will clean the latest run.
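Because cleaning deletes cached results, it can help to preview the effect first. The sketch below wraps the -n (dry run) and -f (force) options of nextflow clean in two small helper functions. The function names are hypothetical and exist only for this example, and nothing is executed until one of them is called.

```shell
# Hypothetical helper: list what would be removed without deleting anything (dry run).
preview_clean() {
    nextflow clean -n "$@"
}

# Hypothetical helper: actually remove the work files; -f is required to force deletion.
force_clean() {
    nextflow clean -f "$@"
}
```

For instance, calling preview_clean with no arguments previews the latest run, and force_clean followed by a run name removes the work files of that run.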