25S1W1 - Preparing working directory

Before running the nf-core/sarek pipeline with real data, we will first prepare the working directory and copy the scripts and data needed for the exercises.

Work on the HPC (aqua)

Before we start using the HPC, let’s start an interactive session:

qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=1:mem=4gb

You should now be in your home directory. If unsure, run the following command to move there:

cd ~

List the existing files and folders:

ls -l

Let’s create a folder for the workshop:

Option #1 (recommended): create the whole path with a single command:

mkdir -p $HOME/workshop/2025/S1W1/variant_calling

Option #2: making directories one at a time:

mkdir $HOME/workshop
mkdir $HOME/workshop/2025/
mkdir $HOME/workshop/2025/S1W1/
mkdir $HOME/workshop/2025/S1W1/variant_calling
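Whichever option you used, the result can be checked with a short sketch like the one below. It repeats `mkdir -p`, which is harmless when the folder already exists, and then confirms the folder is present. The variable name `WORKDIR` is only an illustrative shorthand and is not defined by the workshop scripts.

```shell
# WORKDIR is a hypothetical shorthand for the workshop folder path
WORKDIR="$HOME/workshop/2025/S1W1/variant_calling"

# -p creates any missing parent folders and does not fail if the folder already exists
mkdir -p "$WORKDIR"

# Confirm the folder is present before moving on
if [ -d "$WORKDIR" ]; then
    echo "Found: $WORKDIR"
else
    echo "Missing: $WORKDIR" >&2
fi
```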

 

Get a copy of the scripts to be used in this module

Now let’s create a ‘scripts’ folder and copy all the scripts that we will be using in the session:

mkdir -p $HOME/workshop/2025/S1W1/variant_calling/scripts
cp /work/training/2025/S1W1/session2_variant_calling/scripts/* $HOME/workshop/2025/S1W1/variant_calling/scripts/
  • Line 1: The -p option creates parent directories as required. The command therefore creates the ‘scripts’ folder together with any missing folders above it in $HOME/workshop/2025/S1W1/variant_calling/scripts/

  • Line 2: Copies all files (as indicated by the asterisk) from /work/training/2025/S1W1/session2_variant_calling/scripts/ to the newly created folder $HOME/workshop/2025/S1W1/variant_calling/scripts/
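The two steps above can also be wrapped in a small function that stops early when the source folder is not reachable. This is only a sketch: `copy_scripts` is a hypothetical helper name, not part of the workshop material, and it behaves like the plain `cp` above (it does not copy sub-folders).

```shell
# copy_scripts is a hypothetical helper: copy every file from SRC into DEST,
# creating DEST first, and report how many entries DEST then holds.
copy_scripts() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "Source folder not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    cp "$src"/* "$dest"/ || return 1
    echo "Files now in $dest: $(ls -1 "$dest" | wc -l)"
}

# Paths taken from the commands above
copy_scripts /work/training/2025/S1W1/session2_variant_calling/scripts \
    "$HOME/workshop/2025/S1W1/variant_calling/scripts" \
    || echo "Copy skipped or failed; check that the source path is reachable."
```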

Let’s check the list of files copied:

ls -l $HOME/workshop/2025/S1W1/variant_calling/scripts/
.
├── create_samplesheet_nf-core_sarek.py
├── launch_nf-core_sarek_liver.pbs
├── launch_nf-core_sarek_trio.pbs
├── run_create_sarek_samplesheet.sh
└── samplesheet.csv

Create folders for running the nf-core/sarek pipeline

Let’s create a “runs” folder with one sub-folder per exercise for running the nf-core/sarek pipeline, and then move into the working directory. For example:

mkdir -p $HOME/workshop/2025/S1W1/variant_calling/runs/run1_trio
mkdir -p $HOME/workshop/2025/S1W1/variant_calling/runs/run2_liver
cd $HOME/workshop/2025/S1W1/variant_calling
  • Lines 1-2: create a sub-folder for each exercise (“run1_trio” and “run2_liver”)

  • Line 3: change the directory to the “variant_calling” working folder
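If more runs are added later, the same folders can be created with a loop instead of one `mkdir` per run. A minimal sketch, assuming the folder names used above; `RUNS_DIR` is a hypothetical shorthand variable:

```shell
# RUNS_DIR is a hypothetical shorthand for the folder that holds all runs
RUNS_DIR="$HOME/workshop/2025/S1W1/variant_calling/runs"

# Create one sub-folder per exercise
for run in run1_trio run2_liver; do
    mkdir -p "$RUNS_DIR/$run"
done

# List the run folders to confirm they exist
ls -d "$RUNS_DIR"/*/
```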

(Optional): Running a test with nf-core sample data

First, let’s assess the execution of the nf-core/sarek pipeline by running a test using sample data.

Create a “run_test” folder, move into it, and copy the launch_nf-core_sarek_test.pbs script into it:

mkdir -p $HOME/workshop/2025/S1W1/variant_calling/runs/run_test
cd $HOME/workshop/2025/S1W1/variant_calling/runs/run_test
cp $HOME/workshop/2025/S1W1/variant_calling/scripts/launch_nf-core_sarek_test.pbs .

View the content of the script as follows:

cat launch_nf-core_sarek_test.pbs
The key parts of the nextflow command in the script are:
  • nextflow command: nextflow run

  • pipeline name: nf-core/sarek

  • pipeline version: -r 3.4.4

  • container type and sample data: -profile test,singularity

  • output directory: --outdir results
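Putting the components listed above together, the command inside the script should look roughly like the one printed by this sketch. The variable names are illustrative only, and the real script also contains PBS directives, and possibly further options, that are not reproduced here.

```shell
PIPELINE="nf-core/sarek"      # pipeline name
VERSION="3.4.4"               # pipeline version (-r)
PROFILE="test,singularity"    # sample data and container type (-profile)
OUTDIR="results"              # output directory (--outdir)

# Assemble the command from its parts and print it (it is not executed here)
CMD="nextflow run $PIPELINE -r $VERSION -profile $PROFILE --outdir $OUTDIR"
echo "$CMD"
```

Printing the command rather than running it makes it easy to compare against the output of `cat launch_nf-core_sarek_test.pbs` before submitting the job.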

Submitting the job

Submit the test job to the HPC cluster as follows:

qsub launch_nf-core_sarek_test.pbs

Monitoring the Run

qjobs
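Where `qjobs` is not available, the standard PBS `qstat` command gives similar information. The sketch below is an assumption about how you might wrap it; `show_my_jobs` is a hypothetical helper name, and the exact columns shown depend on the PBS installation.

```shell
# show_my_jobs is a hypothetical helper that wraps the standard PBS qstat command
show_my_jobs() {
    if command -v qstat >/dev/null 2>&1; then
        # -u restricts the listing to jobs owned by the given user
        qstat -u "$(id -un)"
    else
        echo "qstat not found; run this on the HPC."
    fi
}

show_my_jobs
```

In the `qstat` listing, a job state of Q typically means the job is queued and R means it is running.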

Outputs:

The test run should take about 14 min to complete. The run outputs are written to the “results” folder:

results/
├── csv
│   ├── markduplicates.csv
│   ├── markduplicates_no_table.csv
│   ├── recalibrated.csv
│   └── variantcalled.csv
├── multiqc
│   ├── multiqc_data
│   ├── multiqc_plots
│   └── multiqc_report.html
├── pipeline_info
│   ├── execution_report_2024-05-08_15-28-38.html
│   ├── execution_timeline_2024-05-08_15-28-38.html
│   ├── execution_trace_2024-05-08_15-28-38.txt
│   ├── params_2024-05-08_15-41-30.json
│   ├── pipeline_dag_2024-05-08_15-28-38.html
│   └── software_versions.yml
├── preprocessing
│   ├── markduplicates
│   ├── recalibrated
│   └── recal_table
├── reports
│   ├── bcftools
│   ├── fastqc
│   ├── markduplicates
│   ├── mosdepth
│   ├── samtools
│   └── vcftools
├── tabix
│   ├── genome.bed.gz
│   └── genome.bed.gz.tbi
└── variant_calling
    └── strelka
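A quick way to confirm that the run produced the main outputs is to test for a few of the items in the listing above. This is a minimal sketch: `check_outputs` is a hypothetical helper, and the three items it checks are an arbitrary selection from the listing rather than an official completeness test.

```shell
# check_outputs is a hypothetical helper: report which expected items exist
# under the given results folder; returns non-zero if any are missing.
check_outputs() {
    results_dir="$1"
    missing=0
    for item in multiqc/multiqc_report.html pipeline_info variant_calling/strelka; do
        if [ -e "$results_dir/$item" ]; then
            echo "OK       $item"
        else
            echo "MISSING  $item"
            missing=1
        fi
    done
    return "$missing"
}

# Run from the folder where the job was submitted
check_outputs results || echo "Some outputs are missing; the run may still be in progress or may have failed."
```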

Once the pipeline has finished running, assess the QC report:

NOTE: To proceed, you need to be on QUT’s WiFi network or signed in via VPN.

To browse the working folder on the HPC, type the following address into your file browser:

Windows PC

\\hpc-fs\work\training\rnaseq

Mac

smb://hpc-fs/work/training/rnaseq

Look in the “multiqc” folder and open the HTML report (multiqc_report.html). Evaluate the nucleotide distributions at the 5'-end and 3'-end of the sequenced reads (Read1 and Read2).

Go to next section: 25S1W1 - Case study 1: GiB family trio
