Table of Contents
...
You should see that several output directories and files have been created in your ‘ampliseq_test’ directory. These contain the test analysis results. Have a look through them, as they are similar to the output of a full ampliseq run on your own dataset.
Need instructions on setting up Nextflow Tower
Q for Craig:
Do we need to add any of this to the ~/.nextflow/config file? Perhaps just for Tower?
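process {
    executor = 'pbspro'
    // Run each task in a temporary scratch directory on the compute node
    scratch = true
    // Commands run before each task; single quotes stop Groovy from
    // interpolating $HOME, so the shell expands it when the task runs
    beforeScript = '''
        mkdir -p /data1/whatmorp/singularity/mnt/session
        source $HOME/.bashrc
        source $HOME/.profile
    '''
}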
...
sampleID | forwardReads | reverseReads |
---|---|---|
groupA_1 | /home/myproject/fastq/sample1_S22_L001_R1.fastq.gz | /home/myproject/fastq/sample1_S22_L001_R2.fastq.gz |
groupA_2 | /home/myproject/fastq/sample2_S23_L001_R1.fastq.gz | /home/myproject/fastq/sample2_S23_L001_R2.fastq.gz |
etc… |
To create the manifest using ls, sed, paste and awk, first change into the directory containing your fastq files.
List the forward (R1) and reverse (R2) fastq files into two separate files:
ls -1 *_R1*.fastq.gz > read1
ls -1 *_R2*.fastq.gz > read2
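Because paste later joins read1 and read2 line by line, the two lists must be in matching order. A quick check, assuming forward and reverse file names differ only by ‘_R1’ versus ‘_R2’ (`check_pairs` is a hypothetical helper name, not part of ampliseq):

```shell
# Report any line where the read2 entry is not the read1 entry with _R1 -> _R2.
# Assumes (hypothetically) that forward/reverse names differ only in that tag.
check_pairs() {
    paste "$1" "$2" | awk -F'\t' '
        {
            expected = $1
            sub(/_R1/, "_R2", expected)   # replace the first _R1 with _R2
            if (expected != $2) {
                print "Line " NR ": " $1 " does not pair with " $2
                bad = 1
            }
        }
        END { exit bad }'
}
```

Usage: `check_pairs read1 read2 && echo "All read pairs line up"`. A missing or extra file in either list also shows up as a mismatch, because every later line shifts.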
Next, create a list of sample IDs. If the sample names are part of the file names, they can be extracted using sed. For example:
sed 's/_S.*//' read1 > ID
In this example, the sample file names look like this: ‘Raw8h_S10_L001_R1_001.fastq.gz’
The sample ID is ‘Raw8h’. The sed command above removes ‘_S’ and everything after it, leaving just the sample ID. Depending on how your sample files are named, you may need to modify this sed command to produce a correct list of sample IDs.
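The `_S.*` pattern removes everything from the first ‘_S’ onward, so a sample ID that itself contains ‘_S’ (for example the hypothetical ‘Raw_Soil’) would be cut short. A minimal sketch of a stricter pattern, assuming Illumina-style ‘_S<number>_L<number>_R1’ file names (`extract_ids` is a hypothetical helper name):

```shell
# Strip an Illumina-style _S<digits>_L<digits>_R1... suffix to get the sample ID.
# Assumes file names follow the SampleID_S10_L001_R1_001.fastq.gz convention.
extract_ids() {
    sed -E 's/_S[0-9]+_L[0-9]+_R1.*$//' "$@"
}

# Example with hypothetical file names: 'Raw_Soil' keeps its '_Soil' part here,
# whereas sed 's/_S.*//' would shorten it to 'Raw'.
printf '%s\n' 'Raw8h_S10_L001_R1_001.fastq.gz' 'Raw_Soil_S11_L001_R1_001.fastq.gz' | extract_ids
# prints: Raw8h, then Raw_Soil (one per line)
```

Usage: `extract_ids read1 > ID` produces the same ID file as the original command when none of your sample IDs contain ‘_S’.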
Paste the three lists together as tab-delimited columns, prepending the full path of your fastq directory to each file name, and save the output as ‘manifest.txt’:
paste ID read1 read2 | awk '{print $1 "\t" "/path/to/your/nextflow/myproject/fastq/" $2 "\t" "/path/to/your/nextflow/myproject/fastq/" $3}' > manifest.txt
Finally, add a header row to manifest.txt containing the three column names: ‘sampleID’, ‘forwardReads’ and ‘reverseReads’.
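The steps above can also be combined into a single bash function that pairs each R1 file with its R2 file by name and writes the header row. This is a sketch under stated assumptions, not part of ampliseq: reverse-read names are assumed to differ from forward-read names only by ‘_R1’ → ‘_R2’, the sample ID is assumed to be everything before the first ‘_S’ (the same rule as the sed example), and `make_manifest` is a hypothetical helper name.

```shell
# Build a tab-separated ampliseq manifest (with header) from one fastq directory.
# Assumptions (hypothetical): R2 names = R1 names with _R1 -> _R2, and the
# sample ID is everything before the first '_S', as with sed 's/_S.*//'.
make_manifest() {
    local fastq_dir out r1 r2 base id
    fastq_dir=$(cd "$1" && pwd) || return 1   # absolute path to the fastq directory
    out=$2

    # Header row with the three column names
    printf 'sampleID\tforwardReads\treverseReads\n' > "$out"

    for r1 in "$fastq_dir"/*_R1*.fastq.gz; do
        [ -e "$r1" ] || continue              # no R1 files matched the pattern
        base=${r1##*/}                        # file name without the directory
        r2="$fastq_dir/${base/_R1/_R2}"       # expected reverse-read file
        if [ ! -e "$r2" ]; then
            echo "No reverse-read file found for $base" >&2
            return 1
        fi
        id=${base%%_S*}                       # same result as sed 's/_S.*//'
        printf '%s\t%s\t%s\n' "$id" "$r1" "$r2" >> "$out"
    done
}
```

Usage: `make_manifest /path/to/your/nextflow/myproject/fastq manifest.txt`. Pairing files by name, rather than pasting two separately listed files, means a missing reverse-read file stops the function instead of silently shifting rows.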
Metadata file
This is a tab-separated values file (.tsv) that QIIME2 requires to compare taxonomic diversity with phenotype (e.g. how diversity varies between experimental treatments). It contains the same sample IDs as the manifest file, plus a column for each category of metadata you have for the samples. This may include sequence barcodes, experimental treatment group (e.g. high fat vs low fat) and any other measurements taken, such as age, date collected, tissue type, sex, collection location, weight and length. QIIME2 will compare every metadata column with the taxonomic results, then calculate and plot correlations and diversity indices. See here for more details:
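Because the sample IDs must match between the manifest and the metadata file, a quick check can catch typos before a long run. A minimal sketch, assuming both files are tab-separated with a header row and the sample IDs in the first column (`missing_from_metadata` is a hypothetical helper name):

```shell
# List sample IDs that are in the manifest but missing from the metadata file.
# Assumes tab-separated files, one header row, and sample IDs in column 1.
missing_from_metadata() {
    comm -23 \
        <(tail -n +2 "$1" | cut -f1 | sort -u) \
        <(tail -n +2 "$2" | cut -f1 | sort -u)
}
```

Usage: `missing_from_metadata manifest.txt metadata.tsv`. No output means every sample in the manifest has a row in the metadata file.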
...
If you have not yet been set up on the HPC, or have not used it before, follow this link for information on how to get access to and use the HPC:
Need a link here for HPC access and usage
Creating a shared workspace on the HPC
...
To request a node using PBS, submit a shell script that specifies your memory, CPU and walltime (maximum run time) requirements, followed by the commands needed to run your analysis. For an overview of submitting a PBS job, see here:
Need a link here for creating PBS jobs
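A minimal sketch of such a job script for an ampliseq run, written to a file with a here-document. The resource values, job name and module name are placeholders, not recommendations for this HPC. If the pbspro executor is set in your Nextflow config, Nextflow submits each pipeline step as its own PBS job, so this job mainly needs enough resources to run Nextflow itself.

```shell
# Write a hypothetical PBS Pro job script for ampliseq to ampliseq_job.pbs.
# Resource values and the module name are placeholders; adjust for your HPC.
cat > ampliseq_job.pbs <<'EOF'
#!/bin/bash
#PBS -N ampliseq
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=48:00:00
#PBS -j oe

# Start in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Hypothetical module name; check `module avail` on your cluster
module load nextflow

# Add your ampliseq parameters (manifest, metadata, primers, etc.) to this command
nextflow run nf-core/ampliseq -profile singularity
EOF
```

Submit the job with `qsub ampliseq_job.pbs` and check its status with `qstat -u $USER`.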
Alternatively, you can start an ‘interactive’ node using the following:
...