First, about the program:
Function: calculates the GC content (the percentage of G and C bases) of each sequence in a FASTA file, then outputs the highest value and the ID of that sequence.
Input: a FASTA-format file
>seq1
CGCCGAGCGCTTGACCTCCAGCAAGACGCCGTCTGGCACATGCAACGAGCTGTAGCAGAC
>seq2
ATGCCTAGAACGTTCGAGACTTCTCGGGTGCGGTAGAATTAGCCATTCGACCGACTTCCA
GCATCTGCGAGCCGCCTGTTGATTGCATCCGCCGGGGACGCAACAAGGCAAGGCCCTAAC
Output: the ID of the sequence with the highest GC content, followed by that GC content (this is the result for the input above)
seq1
63.333333%
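As a quick check of the expected output, the GC content of seq1 can be computed on its own. The sketch below counts characters with tr///, which is a different technique from the regex matching used in the program later on; the helper name gc_percent is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of G and C characters in a sequence string.
sub gc_percent {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/GC//);   # tr/// returns the number of matching characters
    return 100 * $gc / length($seq);
}

my $seq1 = 'CGCCGAGCGCTTGACCTCCAGCAAGACGCCGTCTGGCACATGCAACGAGCTGTAGCAGAC';
printf("%.6f%%\n", gc_percent($seq1));   # 38 of the 60 bases are G or C
```

Note that this helper only counts uppercase G and C, the same as the input shown above.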
Second, the programming approach and the code
When a line is a header line (one beginning with >), take the sequence ID from it and skip to the next iteration of the loop. When a line is not a header, it is a sequence line: add the number of G and C characters and the total number of characters on that line to the running totals for the current ID. Both counts rely on Perl context: a global match in list context returns every match, and an array used as a number gives its element count. (One thing still puzzles me: replacing @num with $num on line 14 gives a wrong result. If you know why, please leave a comment.)
1 use strict;
2 my %GC_content;   # id => G+C count (converted to a fraction after reading)
3 my %sequences;    # id => total number of characters in the sequence
4 my ($id, $sum);   # $id: ID of the sequence being read; $sum is declared but unused
5 my @num;          # intermediate variable: the characters matched on a single line
6 while (my $seq = <>) {
7     chomp($seq);
8     if ($seq =~ m/^>(.*)/) {       # header line: remember the ID and move on
9         $id = $1;
10         next;
11     }
12     @num = ($seq =~ m/(G|C)/g);    # list context: every G or C on this line
13     $GC_content{$id} += @num;      # array in numeric context: its element count
14     @num = ($seq =~ m/(.)/g);      # every character on this line
15     $sequences{$id} += @num;
16 }
17
18 foreach (keys(%GC_content)) {
19     $GC_content{$_} /= $sequences{$_};   # G+C count / length = GC fraction
20 }
21 my @sort = sort { $GC_content{$b} <=> $GC_content{$a} } keys(%GC_content);   # IDs, highest GC first
22 printf("%s\n%.6f%%\n", $sort[0], $GC_content{$sort[0]} * 100);
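On the question raised above about swapping @num for $num, the likely cause is Perl context. Assigning a m//g match to an array evaluates the match in list context, which returns every match, and the array then yields its element count when used as a number. Assigning the same match to a plain scalar evaluates it in scalar context, which only reports whether one match succeeded, so each line would add 1 instead of its character count. A minimal sketch of the difference (the variable names are for illustration only):

```perl
use strict;
use warnings;

my $seq = 'ATGCCG';

# List context: the match returns all matched characters (G, C, C, G).
my @num = ($seq =~ m/(G|C)/g);
my $list_count = @num;                 # array in scalar context => element count

# The "= () =" idiom forces list context and yields the count without an array.
my $count = () = $seq =~ m/(G|C)/g;

# Scalar context: the match returns 1 for a single successful match,
# even though the right-hand side is wrapped in parentheses.
my $scalar_result = ($seq =~ m/(G|C)/g);

print "$list_count $count $scalar_result\n";   # prints "4 4 1"
```

So if a scalar is preferred over the intermediate array, the count has to be taken in list context first, for example with the "= () =" form shown here.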
Third, tips
The magic of Perl: the magical sort!
How to sort an array (or hash) and get back the subscripts (or keys) in sorted order:
# Numeric sort:
my @arr = qw(2 3 41 2 34);
my @result1 = sort { $a <=> $b } @arr;
# Get the subscripts (indices ordered by the value they point to):
my @result2 = sort { $arr[$a] <=> $arr[$b] } 0 .. $#arr;
# Get the keys (ordered by their values):
my %hash = (
    one   => 1,
    two   => 5,
    three => 9
);
my @result3 = sort { $hash{$a} <=> $hash{$b} } keys(%hash);
print "Number sorting: @result1\nGet subscript: @result2\nGet key: @result3\n";
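One reason to sort subscripts rather than values is to reorder several parallel arrays consistently. The sketch below uses made-up data (@ids and @gc are hypothetical): it sorts the subscripts by descending value and then uses array slices to apply that order to both arrays.

```perl
use strict;
use warnings;

# Hypothetical parallel arrays: sequence IDs and their GC percentages.
my @ids = qw(seqA seqB seqC);
my @gc  = (41.5, 63.3, 52.0);

# Subscripts of @gc ordered from the largest value to the smallest.
my @order = sort { $gc[$b] <=> $gc[$a] } 0 .. $#gc;

# An array slice applies the same order to each array.
my @ids_sorted = @ids[@order];
my @gc_sorted  = @gc[@order];

print "@order\n";        # prints "1 2 0"
print "@ids_sorted\n";   # prints "seqB seqC seqA"
```

The first element of @order is then the subscript of the maximum, which mirrors how $sort[0] is used in the GC program above.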