Exercise 2: Run nf-core/sarek using data from a second family trio (HapMap; Genome in a Bottle)

 

The pipeline requires preparing at least 2 files:

...

Family trio:

HG005 (Chinese Son)

  • NCBI BioSample: SAMN03283350
  • Description: National Institute of Standards and Technology (NIST) RM 8393, male of East Asian (Chinese) ancestry
  • Biological sample source: Participant (hu91BD69) in the Personal Genome Project: http://www.personalgenomes.org; history of: eczema, lactose intolerance, and nearsightedness; same subject as GM26107 (stem cell); father is GM24694 (lymph); mother is GM24695 (lymph).
  • https://tsapps.nist.gov/srmext/msds/8393(QTY10)-MSDS.pdf
  • https://catalog.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=NA24631&Product=DNA

HG006 (Chinese Father)

  • NCBI BioSample: SAMN03283348
  • Biological sample source: Participant (huCA017E) in the Personal Genome Project: http://www.personalgenomes.org; history of: blood clot, hepatitis B, high cholesterol, prostate gland enlargement, and vitamin D deficiency; father of GM24631 (lymph) / GM26107 (stem cell); mother is GM24695 (lymph); whole genome sequence data is available on the PGP website.
  • https://catalog.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=NA24694&Product=DNA

HG007 (Chinese Mother)

  • NCBI BioSample: SAMN03283349
  • Biological sample source: Participant (hu38168C) in the Personal Genome Project: http://www.personalgenomes.org; osteopenia; heterozygous AP3B1 gene variant c.280-6dupT (Chr 5, genomic start 77524069, transcript NM_003664.4); heterozygous PRKAR1A gene variant c.349-5dupT (Chr 17, genomic start 66519861, transcript NM_002734.4); heterozygous STAT3 gene variant c.1601-8dupT (Chr 17, genomic start 40475651, transcript NM_139276.2); medication: Fosamax; mother of GM24631 (lymph) / GM26107 (stem cell); father is GM24694 (lymph); whole genome sequence data is available on the PGP website.
  • https://catalog.coriell.org/0/Sections/Search/Sample_Detail.aspx?Ref=NA24695&Product=DNA

So far we have learned to run the nf-core/sarek pipeline using two case studies: i) a family trio of European ancestry (NA12878, NA12891 and NA12892) and ii) liver samples. Let’s now prepare to run the nf-core/sarek pipeline for the family trio described above.

Find the data location and the pipeline launch script below.

Data location:

/work/training/sarek/data/trio2

Launch script:

/work/training/sarek/scripts/launch_sarek_trio2.pbs

Hints:

  • Create a new run directory (e.g., run4_trio2)

  • Create a samplesheet.csv metadata file for the FASTQ files in the above data folder

  • Copy samplesheet.csv and the PBS launch script to the run directory and submit the job
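
The sample sheet step in the hints above could be scripted roughly as follows. This is only a minimal sketch under several assumptions that are not stated on this page: the FASTQ files are assumed to be named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` (check the actual names in the data folder), each family member is treated as its own patient, the lane label `lane_1` is a placeholder, and the sex values are inferred from the son/father/mother labels. The column names follow the FASTQ input layout described in the nf-core/sarek usage documentation; the function names are hypothetical.

```python
import csv
from pathlib import Path

# Column layout for FASTQ input, as described in the nf-core/sarek usage docs.
COLUMNS = ["patient", "sex", "status", "sample", "lane", "fastq_1", "fastq_2"]

# Assumption: sex inferred from the son / father / mother labels of the trio.
SEX = {"HG005": "XY", "HG006": "XY", "HG007": "XX"}

# Assumption: hypothetical file naming, adjust to the real FASTQ names.
R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"


def build_rows(fastq_dir: Path) -> list[dict]:
    """Build one sample sheet row per R1/R2 pair found in fastq_dir."""
    rows = []
    for r1 in sorted(fastq_dir.glob(f"*{R1_SUFFIX}")):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if not r2.exists():
            raise FileNotFoundError(f"missing mate file for {r1.name}")
        rows.append(
            {
                "patient": sample,            # assumption: one patient per family member
                "sex": SEX.get(sample, "NA"),
                "status": 0,                  # 0 = normal (germline) sample
                "sample": sample,
                "lane": "lane_1",             # assumption: placeholder lane label
                "fastq_1": str(r1),
                "fastq_2": str(r2),
            }
        )
    return rows


def write_samplesheet(rows: list[dict], out_path: Path) -> None:
    """Write the rows to a CSV file with the sarek header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

For this exercise, one might create the run directory first (e.g., run4_trio2) and then call `write_samplesheet(build_rows(Path("/work/training/sarek/data/trio2")), Path("run4_trio2/samplesheet.csv"))`. Inspect the resulting file before submitting the job, since any of the assumptions above may not match the training data.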

NOTE: If you have any questions, please ask one of our friendly trainers.