Batch rename fasta headers

Aim:

We will use the seqkit replace tool to rename FASTA headers in batch, using a list that pairs each current header with its desired new name.

Requirements

If seqkit is not yet available, install it as follows:

#STEP1: Activate the ConsGenome environment
conda activate ConsGenome
#STEP2: Install seqkit from the bioconda channel
conda install -c bioconda seqkit

Input files

1. FASTA file: prepare an input FASTA file (here sample.fasta) in which each header is the sequence accession number. For example:

>KY709128
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>MG894709
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACATCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACTTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>KY818102
TGAGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGAGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>KT827367
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTTAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCACGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>JF967937
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACATACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCCTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>FJ687476
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTCAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTGTCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
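
Column 1 of the names list has to match these header IDs exactly. As a minimal sketch, the hypothetical shell helper list_fasta_ids below prints the ID (the first word after ">") of every header, so the first column can be taken from the FASTA file itself instead of being typed by hand. If seqkit is already installed, seqkit seq -n -i sample.fasta should produce the same list.

```shell
#!/bin/bash
# Hypothetical helper (name chosen for illustration): print the ID, i.e. the
# first word after '>', of every header line in a FASTA file. These IDs are
# the keys that column 1 of the names list must match exactly.
list_fasta_ids() {
    awk '/^>/ { print substr($1, 2) }' "$1"
}

# Example (assumes the example above was saved as sample.fasta):
# list_fasta_ids sample.fasta > ids.txt
```

Only header lines are read, and any description after the first space is ignored, which matches how the accession is used as the lookup key.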

2. New names list: also prepare a two-column list (here names.txt), for example Column1 (Accession Number) and Column2 (AccessionNumber_CollectionYear_Country), with the two columns separated by a tab:

KY709128	KY709128_2010_Philippines
MG894709	MG894709_2012_Philippines
KY818102	KY818102_2011_Philippines
KT827367	KT827367_2010_China
JF967937	JF967937_2010_Philippines
FJ687476	FJ687476_2007_South_Korea
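
seqkit reads its key-value file as tab-delimited, and tabs copied from a web page often turn into spaces. A minimal sketch, assuming the file is named names.txt as in the script step: it writes the example list with explicit tabs and then checks that every line has exactly two tab-separated fields.

```shell
#!/bin/bash
# Write the example names list with a real tab between the two columns.
# printf reuses its format for each pair of arguments: accession, new name.
printf '%s\t%s\n' \
    KY709128 KY709128_2010_Philippines \
    MG894709 MG894709_2012_Philippines \
    KY818102 KY818102_2011_Philippines \
    KT827367 KT827367_2010_China \
    JF967937 JF967937_2010_Philippines \
    FJ687476 FJ687476_2007_South_Korea \
    > names.txt

# Sanity check: report any line that does not have exactly two
# tab-separated fields, and exit non-zero if one is found.
awk -F'\t' 'NF != 2 { printf "names.txt line %d is malformed\n", NR; bad = 1 }
            END { exit bad }' names.txt
```

The same awk check can be run on a names list prepared in a spreadsheet or text editor before submitting the job.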

Batch rename script

Place both input files (names.txt and sample.fasta) in the same folder as the launch_batch_rename.pbs script, then run it on the HPC as follows:
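
What launch_batch_rename.pbs contains depends on the cluster. The following is a hedged sketch of such a job script, assuming a PBS Pro style scheduler, the ConsGenome conda environment from the Requirements step, and a tab-separated names.txt. The #PBS resource requests and the output name sample_renamed.fasta are placeholders chosen for illustration, not values from the original script.

```shell
#!/bin/bash
#PBS -N batch_rename
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=00:10:00
#PBS -j oe
# Sketch of launch_batch_rename.pbs. The #PBS lines above are placeholder
# assumptions (PBS Pro syntax); adjust them to your cluster's policies.

# PBS jobs start in $HOME, so move to the folder the job was submitted from.
cd "${PBS_O_WORKDIR:-$PWD}"

# Activate the conda environment that contains seqkit
# (assumes conda is on PATH inside the job environment).
eval "$(conda shell.bash hook)"
conda activate ConsGenome

# -p '^(\S+)'   capture the first word of each header (the accession) as the key
# -r '{kv}'     replace it with the matching value from the key-value file
# -k names.txt  tab-delimited file: accession <TAB> new name
# --keep-key    keep the original accession if it is not listed in names.txt
# -w 0          write each sequence on a single line (no wrapping)
seqkit replace -p '^(\S+)' -r '{kv}' -k names.txt --keep-key -w 0 \
    sample.fasta > sample_renamed.fasta
```

On PBS systems a job script like this is usually submitted with qsub launch_batch_rename.pbs. Because the pattern captures only the first word of each header, any description text after the accession is carried over unchanged into the renamed header.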
