Table of Contents |
---|
...
If you haven’t been set up or have used the HPC previously, click on this link for information on how to get access to and use the HPC:
Need a link here for HPC access and usage
Creating a shared workspace on the HPC
...
To request a node using PBS, submit a shell script containing your RAM/CPU/analysis time requirements and the code needed to run your analysis. For an overview of submitting a PBS job, see here:
Need a link here for creating PBS jobs
Alternatively, you can start up an ‘interactive’ node, using the following:
...
ccs generates a report file, called ‘ccs_report.txt’ by default. You should give it a unique name by adding --report-file <sample_name>_ccs_report.txt
(e.g. ‘samples_pool_1_ccs_report.txt') to the ccs command, or else when you run ccs on another sample the previous ccs_report.txt will be overwritten.
ccs with Q20 (default)
...
m54105_201123_035001
...
m54105_201125_211823
...
m54105_201125_005856
...
m54105_201126_173825
...
ZMWs input (A)
...
330737
...
263116
...
217071
...
ZMWs generating CCS (B)
...
251126 (75.93%)
...
208122 (79.10%)
...
168434 (77.59%)
...
138080 (80.85%)
...
ZMWs filtered (C)
...
79611 (24.07%)
...
54994 (20.90%)
...
48637 (22.41%)
...
32713 (19.15%)
...
Exclusive ZMW counts for (C)
...
Median length filter
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Below SNR threshold
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Lacking full passes
...
54483 (68.44%)
...
39690 (72.17%)
...
32964 (67.78%)
...
22249 (68.01%)
...
Heteroduplex insertions
...
470 (0.59%)
...
306 (0.56%)
...
280 (0.58%)
...
191 (0.58%)
...
Coverage drops
...
82 (0.10%)
...
34 (0.06%)
...
53 (0.11%)
...
32 (0.10%)
...
Insufficient draft cov
...
1045 (1.31%)
...
889 (1.62%)
...
635 (1.31%)
...
610 (1.86%)
...
Draft too different
...
809 (1.02%)
...
337 (0.61%)
...
430 (0.88%)
...
160 (0.49%)
...
Draft generation error
...
3212 (4.03%)
...
1570 (2.85%)
...
1767 (3.63%)
...
1354 (4.14%)
...
Draft above --max-length
...
2 (0.00%)
...
1 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Draft below --min-length
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Reads failed polishing
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Empty coverage windows
...
14 (0.02%)
...
13 (0.02%)
...
11 (0.02%)
...
10 (0.03%)
...
CCS did not converge
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
1 (0.00%)
...
CCS below minimum RQ
...
19496 (24.49%)
...
12155 (22.10%)
...
12497 (25.69%)
...
8106 (24.78%)
...
Unknown error
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
Q30
...
m54105_201123_035001
...
m54105_201125_211823
...
m54105_201125_005856
...
m54105_201126_173825
...
ZMWs input (A)
...
330737
...
263116
...
217071
...
170793
...
ZMWs generating CCS (B)
...
209855 (63.45%)
...
177981 (67.64%)
...
138732 (63.91%)
...
116886 (68.44%)
...
ZMWs filtered (C)
...
120882 (36.55%)
...
85135 (32.36%)
...
78339 (36.09%)
...
53907 (31.56%)
...
Exclusive ZMW counts for (C)
...
Median length filter
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Below SNR threshold
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Lacking full passes
...
54483 (45.07%)
...
39690 (46.62%)
...
32964 (42.08%)
...
22249 (41.27%)
...
Heteroduplex insertions
...
470 (0.39%)
...
306 (0.36%)
...
280 (0.36%)
...
191 (0.35%)
...
Coverage drops
...
82 (0.07%)
...
34 (0.04%)
...
53 (0.07%)
...
32 (0.06%)
...
Insufficient draft cov
...
1045 (0.86%)
...
889 (1.04%)
...
635 (0.81%)
...
610 (1.13%)
...
Draft too different
...
809 (0.67%)
...
337 (0.40%)
...
430 (0.55%)
...
160 (0.30%)
...
Draft generation error
...
3212 (2.66%)
...
1570 (1.84%)
...
1767 (2.26%)
...
1354 (2.51%)
...
Draft above --max-length
...
2 (0.00%)
...
1 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Draft below --min-length
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Reads failed polishing
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
Empty coverage windows
...
14 (0.01%)
...
13 (0.02%)
...
11 (0.01%)
...
10 (0.02%)
...
CCS did not converge
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
1 (0.00%)
...
CCS below minimum RQ
...
60767 (50.27%)
...
42296 (49.68%)
...
42199 (53.87%)
...
29300 (54.35%)
...
Unknown error
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
...
0 (0.00%)
DemultiplexingDemultiplexing
A complete guide for demultiplexing is found here:
https://github.com/PacificBiosciences/barcoding
The ccs output is a bam file containing your multiplexed (i.e. pooled) samples. Each sample will have been barcoded. To extract each sample from the pool, run lima
You will need to find out from your sequencing provider which barcode set was used. This is provided as a fasta file containing the full list of barcodes. The barcodes fasta file used as input to run lima, where each barcode in the file is matched against each read. Note that you can provide a fasta file with just the barcodes you know were used in the current pool you are demultiplexing, or you can provide the full
A few of the main barcode sets used by PacBio are:
Barcoded F/R Universal Primers Plate - 96 v1
https://www.pacb.com/wp-content/uploads/Sequel_RSII_96_barcodes_v1.fasta_.zip
Barcoded F/R Universal Primers Plate - 96 v2
https://www.pacb.com/wp-content/uploads/Sequel_96_barcodes_v2.zip
Barcoded Overhang Adapter Kit – 8A/8B (16 pooled samples)
https://www.pacb.com/wp-content/uploads/Sequel_16_Barcodes_v3.zip
It’s likely that your samples were barcoded with one of these sets (and you can run your data against each of these to see if your reads match any of the barcodes), but again, check with the person/company that sequenced your data to be sure.
Run lima on each of your samples pools (bam files output from the ccs step) like so:
Code Block |
---|
lima --same --ccs --split-bam-named <ccs_output>.bam Sequel_96_barcodes_v1.fasta /path/to/output/pool1/pool1_demux.bam |
In the above example symmetrical barcodes were used - i.e. the same barcode on the 3' and 5' read ends, thus the --same
parameter was used.
If your reads used asymmetrical barcodes, use the --different
parameter.
--split-bam-named
tells lima to output separate bam files, one for each sample. Note that bam files are output for every barcode match, which, if using the --different
parameter can produce 1000’s of bam files, one for each read that match a barcode pair combination. Most of these files will only have a few reads. The true sample files will be the largest ones (which should match the samples and barcodes table provided to you by your sequencing company).
<ccs_output>.bam
is the circular consensus sequence error corrected bam file generated by ccs (see the ccs section in this guide).
Sequel_96_barcodes_v1.fasta
is the fasta of the barcodes used for your samples. Again, check with your sequencing company to get the correct barcodes set.
/path/to/output/pool1/pool1_demux.bam
is a user-provided path and suffix for your samples. In this case I called it ‘pool 1’ and suffixed all the output bam files with ‘pool1_demux’. Create a directory path and suffix that is suitable and meaningful for your data.
Demultiplexing output
Every barcode in the provided fasta file will be matched against the reads in your provided bam file containing your pooled samples.
This will often generate several output bam files where a barcode matched to just a few reads. These are ‘false positives’. The largest files by far, with the majority of reads, should be the correct samples. Check that these match the samples table provided to you from your sequencing company (which should have barcodes used per sample). These are the files you want to use in downstream analysis.
The output bam files are named according to the barcode pair that matched the reads contained within, with the suffix you provided in your lima command.
Additional files are a table of read counts per primer pair, e.g.
IdxFirst | IdxCombined | IdxFirstNamed | IdxCombinedNamed | Counts | MeanScore |
3 | 3 | bc1004 | bc1004 | 2 | 56 |
5 | 5 | bc1006 | bc1006 | 2 | 56 |
7 | 7 | bc1008 | bc1008 | 1 | 25 |
13 | 13 | bc1014 | bc1014 | 1416 | 96 |
16 | 16 | bc1017 | bc1017 | 8254 | 96 |
17 | 17 | bc1018 | bc1018 | 7647 | 95 |
18 | 18 | bc1019 | bc1019 | 8615 | 95 |
You can see that this is the output of a symmetrically barcoded dataset, as the barcodes are the same.
You can also see that barcode ‘bc1004’ matched just 2 reads (thus doesn’t represent a true sample) and barcode bc1017 is a true sample with a large amount of matched reads.
In addition to the table of reads counts is a summary of matched reads, e.g.
ZMWs input (A) : 209855
ZMWs above all thresholds (B) : 200943 (96%)
ZMWs below any threshold (C) : 8912 (4%)
ZMW marginals for (C):
Below min length : 0 (0%)
Below min score : 0 (0%)
Below min end score : 0 (0%)
Below min passes : 0 (0%)
Below min score lead : 455 (5%)
Below min ref span : 361 (4%)
Without SMRTbell adapter : 0 (0%)
Undesired diff pairs : 8407 (94%)
ZMWs for (B):
With same pair : 200943 (100%)
Coefficient of correlation : 52.75%
ZMWs for (A):
Allow diff pair : 209855 (100%)
Allow same pair : 209855 (100%)
Reads for (B):
Above length : 209322 (100%)
Below length : 0 (0%)