Aim:
This page provides tips on how to cluster oligonucleotide sequences (e.g., aptamers, miRNAs) based on their sequence identity using two strategies: 1) the mapper.pl script from the miRDeep2 package, and 2) the CD-HIT clustering approach.
...
Code Block |
---|
mapper.pl --help |
No config or reads file could be found |
/Users/barrero/anaconda3/bin/mapper.pl input_file_reads |
This script takes as input a file with deep sequencing reads (these can be in different formats, see the options below). The script then processes the reads and/or maps them to the reference genome, as designated by the options given. |
Options: |
Read input file: |
-a input file is seq.txt format |
-b input file is qseq.txt format |
-c input file is fasta format |
-e input file is fastq format |
-d input file is a config file (see miRDeep2 documentation). options -a, -b, -c or -e must be given with option -d. |
Preprocessing/mapping: |
-g three-letter prefix for reads (by default 'seq') |
-h parse to fasta format |
-i convert rna to dna alphabet (to map against genome) |
-j remove all entries that have a sequence that contains letters other than a,c,g,t,u,n,A,C,G,T,U,N |
-k seq clip 3' adapter sequence |
-l int discard reads shorter than int nts, default = 18 |
-m collapse reads |
-p genome map to genome (must be indexed by bowtie-build). The 'genome' string must be the prefix of the bowtie index. For instance, if the first indexed file is called 'h_sapiens_37_asm.1.ebwt' then the prefix is 'h_sapiens_37_asm'. |
-q map with one mismatch in the seed (mapping takes longer) |
-r int a read is allowed to map up to this number of positions in the genome default is 5 |
Output files: |
-s file print processed reads to this file |
-t file print read mappings to this file |
Other: |
-u do not remove directory with temporary files |
-v outputs progress report |
-n overwrite existing files |
-o number of threads to use for bowtie |
Example of use: |
$HOME/anaconda3/bin/mapper.pl reads_seq.txt -a -h -i -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p h_sapiens_37_asm -s reads.fa -t reads_vs_genome.arf -v |
...
Code Block |
---|
mapper.pl S32_19to21nt.rename.fasta -c -m -s S32_19to21nt.collapsed.fa |
Where:
-c input is a FASTA file (see above for other input options)
-m collapse identical sequences into a single entry and record its copy number
-s output file for the processed (collapsed) reads
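To make the effect of -m concrete, the following is a minimal sketch of what "collapsing" means. It is not mapper.pl's implementation: the function name `collapse_fasta`, the `seq_<n>_x<count>` ID layout, the sort order, and the input records are assumptions made for illustration, and it expects each sequence on a single line.

```shell
# Hypothetical helper (not part of miRDeep2): collapse identical sequences
# read from a FASTA stream on stdin.
# Assumes one sequence line per record (no wrapped sequences).
collapse_fasta() {
  awk '
    /^>/ { next }               # skip header lines
    NF   { count[$0]++ }        # tally each distinct sequence
    END  { for (s in count) printf "%d %s\n", count[s], s }
  ' |
  LC_ALL=C sort -k1,1nr -k2,2 | # most abundant first, ties by sequence
  awk '{ printf ">seq_%d_x%d\n%s\n", NR - 1, $1, $2 }'
}

# Illustrative input: three made-up reads, two of them identical.
printf '>r1\nACGT\n>r2\nTTGA\n>r3\nACGT\n' | collapse_fasta
# prints:
# >seq_0_x2
# ACGT
# >seq_1_x1
# TTGA
```

For real data, use the mapper.pl command above; the sketch only illustrates how identical reads end up as one entry carrying a count.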
Example: collapsed identical sequences, with the copy number given as a suffix of the sequence ID (e.g., _x57828 means the sequence occurred 57,828 times)
...
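Once the reads are collapsed, it is often handy to tabulate the copy numbers. The sketch below pulls the number after the last `_x` out of each ID. The function name `copy_numbers` and the sample records are hypothetical, and the code assumes every ID ends in `_x<count>` and that each sequence sits on a single line.

```shell
# Hypothetical helper: print "ID <tab> copy number <tab> sequence" for each
# record of a collapsed FASTA stream read from stdin.
copy_numbers() {
  awk '
    /^>/ {
      id = substr($0, 2)        # drop the leading ">"
      n = id
      sub(/^.*_x/, "", n)       # keep only what follows the last "_x"
      next
    }
    NF { printf "%s\t%s\t%s\n", id, n, $0 }
  '
}

# Illustrative records (sequences are made up; the _x57828 suffix is the
# one shown in the example above).
printf '>seq_0_x57828\nACGT\n>seq_1_x12\nTTGA\n' | copy_numbers
```

With the output file from the command above, this could be run as `copy_numbers < S32_19to21nt.collapsed.fa`, optionally piped through `sort -k2,2nr` to rank sequences by abundance.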