Batch rename fasta headers
Aim:
We will use the seqkit replace command to batch rename the FASTA headers from a list of the desired names.
Requirements
If seqkit is not yet available, install it as follows:
#STEP1: Activate the ConsGenome environment
conda activate ConsGenome
#STEP2: Install seqkit
conda install -c bioconda seqkit
Input files
1. FASTA file: prepare an input FASTA file (sample.fasta) with the sequence accession number as the header. For example:
>KY709128
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>MG894709
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACATCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACTTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>KY818102
TGAGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGAGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>KT827367
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTTAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCACGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>JF967937
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACATACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCCTAAAAGGGGTATCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
>FJ687476
TGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGTCGGGAGCTACGTGGGTGGATGTAGTGCTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACCAACACTGGACATTGAACTCCTGAAGACGGAGGTCACAAACCCTGCAGTCCTGCGCAAACTGTGCATTGAAGCTAAAATATCAAATACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAGCAGGACACGAACTTTGTGTGCCGACGAACGCTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTTGGAAAAGGTAGCTTAATAACGTGTGCTAAGTTCAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATATGAAAACTTAAAATATTCAGTCATAGTCACCGTACACACTGGAGACCAACACCAAGTTGGAAATGAGACCACAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCTACGTCGGAAATACAGCTGACAGACTACGGAGCTCTAACACTGGATTGTTCACCTAGAACAGGACTAGACTTTAATGAGATGGTGTTGTTGACGATGAAAGAAAAATCATGGCTCGTCCACAAACAATGGTTTCTGGACCTACCACTGCCTTGGACCTCAGGGGCCTCAACATCCCAAGAGACTTGGAATAGACAAGACCTGCTGGTCACATTCAAGACAGCTCATGCAAAAAAGCAGGAAGTAGTCGTGCTAGGATCACAAGAAGGAGCAATGCACACTGCGCTGACTGGAGCGACAGAAATCCAAACGTCTGGAACGACAACAATTTTTGCAGGGCACCTGAAATGCAGACTAAAAATGGATAAACTGACCTTAAAAGGGGTGTCATATGTAATGTGCACAGGGTCATTCAAGCTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAACTGTTCTAGTGCAAGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAGAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCAACATTGAAGCGGAGCCACCTTTTGGGGAGAGCTACCTTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACTAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGCGGAGCACGAAGGATGGCCATCCTGGGAGACACCGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACATCTGTGGGAAAACTGATACACCAGATTTTTGGGACTGCGTATGGAGTCTTGTTCAGCGGGGTTTCTTGGACCATGAAAATAGGAATAGGGATTCTGCTGACATGGCTAGGATTAAATTCAAGGAGCACATCCCTTTCAATGACGTGTATCGCAGTCGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCG
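2. New names list: also prepare a list (names.txt) with two columns, for example Column1 (Accession Number) and Column2 (AccessionNumber_CollectionYear_Country). seqkit replace reads this list as a tab-delimited key-value file, so separate the two columns with a tab: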
KY709128 KY709128_2010_Philippines
MG894709 MG894709_2012_Philippines
KY818102 KY818102_2011_Philippines
KT827367 KT827367_2010_China
JF967937 JF967937_2010_Philippines
FJ687476 FJ687476_2007_South_Korea
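Before renaming, it can help to confirm that every header in the FASTA file has a matching entry in the names list. The following is a minimal sketch in plain shell and awk, not part of the original workflow; the function name check_names is made up for illustration, and it assumes names.txt is tab-delimited with the accession number in the first column. A names list separated by spaces instead of tabs makes every ID show up as missing, so the check also catches that mistake.

```shell
# check_names FASTA NAMES   (hypothetical helper, not part of seqkit)
# Prints every FASTA header ID that has no entry in column 1 of the
# tab-delimited names list; returns non-zero if any ID is missing.
check_names() {
    awk -F '\t' '
        # First file (the names list): remember each accession number.
        FILENAME == ARGV[1] { if (NF >= 2) known[$1] = 1; next }
        # Second file (the FASTA): inspect header lines only.
        /^>/ {
            id = substr($0, 2)        # drop the leading ">"
            sub(/[ \t].*$/, "", id)   # keep the ID up to the first whitespace
            if (!(id in known)) { print id; missing = 1 }
        }
        END { exit missing + 0 }
    ' "$2" "$1"
}

# Run the check only when both input files are present in the current folder.
if [ -f sample.fasta ] && [ -f names.txt ]; then
    if check_names sample.fasta names.txt; then
        echo "All headers have a new name."
    else
        echo "The IDs listed above are missing from names.txt." >&2
    fi
fi
```

If the check lists any IDs, add them to names.txt (or fix the column separator) before running the rename.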
Batch rename script
Place both input files (names.txt and sample.fasta) in the same folder as the ‘launch_batch_rename.pbs’ script, then run the script on the HPC.
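The contents of launch_batch_rename.pbs are not reproduced on this page. As a rough sketch only, the renaming step inside such a script could look like the following. The function name rename_headers and the output file name renamed.fasta are assumptions made for illustration, and the scheduler directives are left out because they are site-specific. The pattern captures the header ID up to the first whitespace, {kv} replaces it with the matching value from the tab-delimited names.txt, and --keep-key leaves the original ID in place when the list has no entry for it.

```shell
# rename_headers FASTA NAMES OUTPUT   (hypothetical wrapper around seqkit replace)
rename_headers() {
    seqkit replace \
        --pattern '^(\S+)' \
        --replacement '{kv}' \
        --kv-file "$2" \
        --keep-key \
        "$1" > "$3"
}

# Run only when both input files are present in the current folder.
if [ -f sample.fasta ] && [ -f names.txt ]; then
    rename_headers sample.fasta names.txt renamed.fasta
    # List the new headers for a quick visual check.
    grep '^>' renamed.fasta
fi
```

With the example inputs above, the first header would be expected to change from >KY709128 to >KY709128_2010_Philippines. Writing to a new file rather than overwriting sample.fasta keeps the original available if the names list needs correcting.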