Exercise 2: Run nf-core/sarek using data from a second family trio (HapMap; Genome in a Bottle)
The pipeline requires preparing at least 2 files:
...
Family trio:
NIST ID | NCBI BioSample | Description | Biological sample source |
---|---|---|---|
HG005 (Chinese Son) | SAMN03283350 | National Institute of Standards and Technology (NIST) RM 8393: male of East Asian (Chinese) ancestry; participant (hu91BD69) in the Personal Genome Project: http://www.personalgenomes.org; history of: eczema; lactose intolerance; nearsightedness; same subject as GM26107 (stem cell); father is GM24694 (lymph); mother is GM24695 (lymph). | https://tsapps.nist.gov/srmext/msds/8393(QTY10)-MSDS.pdf https://catalog.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=NA24631&Product=DNA |
HG006 (Chinese Father) | SAMN03283348 | Participant (huCA017E) in the Personal Genome Project: http://www.personalgenomes.org; history of: blood clot, hepatitis B, high cholesterol, prostate gland enlargement, and vitamin D deficiency; father of GM24631 (lymph) / GM26107 (stem cell); mother is GM24695 (lymph); whole genome sequence data is available on the PGP website. | https://catalog.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=NA24694&Product=DNA |
HG007 (Chinese Mother) | SAMN03283349 | Participant (hu38168C) in the Personal Genome Project: http://www.personalgenomes.org; osteopenia; heterozygous AP3B1 gene variant c.280-6dupT (Chr 5, genomic start 77524069, transcript NM_003664.4); heterozygous PRKAR1A gene variant c.349-5dupT (Chr 17, genomic start 66519861, transcript NM_002734.4); heterozygous STAT3 gene variant c.1601-8dupT (Chr 17, genomic start 40475651, transcript NM_139276.2); medication: Fosamax; mother of GM24631 (lymph) / GM26107 (stem cell); father is GM24694 (lymph); whole genome sequence data is available on the PGP website. | https://catalog.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=NA24695&Product=DNA |
So far we have learned to run the nf-core/sarek pipeline using two case studies: i) East European family trio (NA12878, NA12891 and NA12892) and ii) liver samples. Let’s now prepare to run the nf-core/sarek pipeline for the family trio described above.
Find the data location and the pipeline launch script below.
Data location:
Code Block |
---|
/work/training/sarek/data/trio2 |
Launch script:
Code Block |
---|
/work/training/sarek/scripts/launch_sarek_trio2.pbs |
Hints:
Create a new run directory (e.g., run4_trio2)
Create a samplesheet.csv metadata file for the FASTQ files in the above data folder
Copy the samplesheet.csv file and the PBS launch script to the run directory and submit the job
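If you prefer to generate the sample sheet rather than type it by hand, the second hint could be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not stated on this page: the FASTQ files are assumed to be named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, each individual is given its own patient ID, every sample is assumed to have a single lane, and the column names (`patient`, `sample`, `lane`, `fastq_1`, `fastq_2`) follow the CSV layout described in the nf-core/sarek 3.x usage documentation. The function names `build_rows` and `write_samplesheet` are hypothetical helpers. Check the actual file names in the data folder and the sarek version used in this training before relying on it.

```python
import csv
from pathlib import Path

# Assumed (hypothetical) naming convention for paired-end reads.
R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"

# Column order assumed from the nf-core/sarek 3.x samplesheet layout.
FIELDS = ["patient", "sample", "lane", "fastq_1", "fastq_2"]


def build_rows(fastq_dir):
    """Return one samplesheet row per R1/R2 pair found in fastq_dir."""
    rows = []
    for r1 in sorted(Path(fastq_dir).glob("*" + R1_SUFFIX)):
        # Sample name is the file name with the R1 suffix removed.
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if not r2.exists():
            raise FileNotFoundError(f"Missing R2 mate for {r1.name}")
        rows.append(
            {
                "patient": sample,  # assumption: one patient ID per individual
                "sample": sample,
                "lane": "lane_1",  # assumption: a single lane per sample
                "fastq_1": str(r1),
                "fastq_2": str(r2),
            }
        )
    return rows


def write_samplesheet(rows, out_path):
    """Write the rows to a CSV file with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

For this exercise you would call `write_samplesheet(build_rows("/work/training/sarek/data/trio2"), "samplesheet.csv")` from inside your run directory, then open the resulting file to confirm it lists one row per family member.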
NOTE: If you have any questions, please ask one of our friendly trainers.