Perl Practice: Calculating the GC Content of Sequences in a FASTA File, and How to Get the Index or Key When Sorting an Array or Hash

Source: Internet
Author: User



First, about the program:



Function: Calculate the percentage of G and C in each sequence of a FASTA file, then output the highest percentage and the ID of the sequence it belongs to.



Input: a FASTA-format file


>seq1
CGCCGAGCGCTTGACCTCCAGCAAGACGCCGTCTGGCACATGCAACGAGCTGTAGCAGAC
>seq2
ATGCCTAGAACGTTCGAGACTTCTCGGGTGCGGTAGAATTAGCCATTCGACCGACTTCCA
GCATCTGCGAGCCGCCTGTTGATTGCATCCGCCGGGGACGCAACAAGGCAAGGCCCTAAC


Output: the ID of the sequence with the highest GC content, followed by that content (this is the result for the input above)


seq1
63.333333%





Second, the programming approach and the code



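When a line is a header line (it starts with ">"), capture the sequence ID and skip to the next iteration of the loop. When a line is a sequence line, count the G and C characters and the total number of characters on that line, and add both counts to the running totals for the current ID. Both counts rely on Perl's context rules: a global match in list context returns every match, and an array used as a number yields its element count. (This is also why replacing @num with a scalar $num in the statement @num = ($seq =~ m/(.)/g); gives a wrong result. In scalar context the match only reports whether it succeeded, so $num holds 1 instead of the number of characters on the line.)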
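Before the full program, the context rule can be seen in isolation. The following is only a minimal sketch; the helper names count_via_array, count_via_scalar and count_via_empty_list are made up for this illustration and are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helpers. Each one works on its own copy of the string, so the
# position remembered by one /g match cannot affect another.

# List context: the match returns every captured character,
# and an array evaluated as a number gives its element count.
sub count_via_array {
    my ($s) = @_;
    my @chars = ($s =~ m/(.)/g);
    return scalar @chars;
}

# Scalar context: the match only reports success, so the result is 1
# for any non-empty line, no matter how long the line is.
sub count_via_scalar {
    my ($s) = @_;
    my $num = ($s =~ m/(.)/g);
    return $num;
}

# Assigning to an empty list forces list context; the outer scalar
# assignment then receives the number of matches.
sub count_via_empty_list {
    my ($s) = @_;
    my $num = () = $s =~ m/(.)/g;
    return $num;
}

my $line = 'ATGCCG';
printf "array: %d, scalar: %d, empty list: %d\n",
    count_via_array($line), count_via_scalar($line), count_via_empty_list($line);
# prints "array: 6, scalar: 1, empty list: 6"
```

If a scalar is preferred over an intermediate array, the empty-list form is one way to keep the count correct.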


use strict;
use warnings;

my %GC_content;   # id => number of G and C characters (later: GC fraction)
my %sequences;    # id => total number of characters in the sequence
my $id;           # ID of the sequence currently being read
my @num;          # intermediate variable: the characters matched on a single line

while (my $seq = <>) {
    chomp($seq);
    if ($seq =~ m/^>(.*)/) {      # header line: remember the ID and move on
        $id = $1;
        next;
    }
    @num = ($seq =~ m/(G|C)/g);   # list context: every G or C on this line
    $GC_content{$id} += @num;     # an array used as a number yields its element count
    @num = ($seq =~ m/(.)/g);     # list context: every character (the statement the note above refers to)
    $sequences{$id} += @num;
}

foreach (keys(%GC_content)) {
    $GC_content{$_} /= $sequences{$_};   # turn the count into a fraction
}
# Sort the IDs by GC fraction, highest first
my @sort = sort { $GC_content{$b} <=> $GC_content{$a} } keys(%GC_content);
printf("%s\n%.6f%%\n", $sort[0], $GC_content{$sort[0]} * 100);
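As an alternative to counting regex matches, the same totals can be gathered with tr/// and length, which avoids the intermediate array altogether. This is only a sketch of a different technique, not the author's method: the sub name max_gc is hypothetical, and it assumes the ID is the first whitespace-free word after ">" and that lowercase g and c should also be counted.

```perl
use strict;
use warnings;

# Hypothetical helper: takes FASTA lines, returns (best_id, gc_fraction).
sub max_gc {
    my (@lines) = @_;
    my (%gc, %len, $id);
    for my $line (@lines) {
        chomp $line;
        next if $line eq '';                 # ignore blank lines
        if ($line =~ /^>(\S+)/) {            # header line: remember the ID
            $id = $1;
            next;
        }
        next unless defined $id;             # skip data before the first header
        $gc{$id}  += ($line =~ tr/GCgc//);   # tr with an empty replacement only counts
        $len{$id} += length $line;
    }
    # Take the first ID after sorting by GC fraction, highest first
    my ($best) = sort { $gc{$b} / $len{$b} <=> $gc{$a} / $len{$a} } keys %len;
    return unless defined $best;             # no sequences found
    return ($best, $gc{$best} / $len{$best});
}

my @fasta = (
    ">seq1\n",
    "CGCCGAGCGCTTGACCTCCAGCAAGACGCCGTCTGGCACATGCAACGAGCTGTAGCAGAC\n",
    ">seq2\n",
    "ATGCCTAGAACGTTCGAGACTTCTCGGGTGCGGTAGAATTAGCCATTCGACCGACTTCCA\n",
    "GCATCTGCGAGCCGCCTGTTGATTGCATCCGCCGGGGACGCAACAAGGCAAGGCCCTAAC\n",
);
my ($best_id, $fraction) = max_gc(@fasta);
printf "%s\n%.6f%%\n", $best_id, $fraction * 100;
```

For the sample input this reports seq1 with 38 G/C characters out of 60, which matches the 63.333333% shown earlier.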








Third, tips



The magic of Perl, the magical sort!!



How to sort an array (or hash) and get the indices (or keys) in sorted order:


 
# Numeric sort:
my @arr = qw(2 3 41 2 34);
my @result1 = sort { $a <=> $b } @arr;
# Get the indices, ordered by the elements they point to:
my @result2 = sort { $arr[$a] <=> $arr[$b] } 0 .. $#arr;
# Get the keys, ordered by their values:
my %hash = (
    one   => 1,
    two   => 5,
    three => 9,
);
my @result3 = sort { $hash{$a} <=> $hash{$b} } keys(%hash);
print "Numeric sort: @result1\nGet index: @result2\nGet key: @result3\n";
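When only the largest element is needed, as in the GC content program, a full sort is not strictly required. The sketch below uses reduce from the core List::Util module to find the key or index of the maximum in a single pass. The sub names max_key and max_index are hypothetical, and this is offered as an optional variation rather than a replacement for the sort idiom.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical helper: key of the largest value in a hash (passed by reference).
sub max_key {
    my ($h) = @_;
    return reduce { $h->{$a} >= $h->{$b} ? $a : $b } keys %$h;
}

# Hypothetical helper: index of the largest element in an array (passed by reference).
# On ties, the earliest index wins because >= keeps the current best.
sub max_index {
    my ($arr) = @_;
    return reduce { $arr->[$a] >= $arr->[$b] ? $a : $b } 0 .. $#$arr;
}

my %hash = (one => 1, two => 5, three => 9);
my @arr  = (2, 3, 41, 2, 34);
print "Key of largest value: ", max_key(\%hash), "\n";       # three
print "Index of largest element: ", max_index(\@arr), "\n";  # 2
```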







