The following DNA sequence is stored in F:\perl\data.txt on Windows:
AAAAAAAAAAAAAAGGGGGGGTTTTCCCCCCCC
CCCCCGTCGTAGTAAAGTATGCAGTAGCVG
CCCCCCCCCCGGGGGGGGAAAAAAAAAAAAAAATTTTTTAT
AAACG
Here is the program:
# This program counts the number of A, T, G, and C bases in a DNA sequence.
# Start with all four base counts at 0.
$count_A = 0;
$count_T = 0;
$count_C = 0;
$count_G = 0;
# The sequence must first be merged into a single line.
# Ask for the path and file name of the file to process, for example:
# f:\\perl\\data.txt
print "please input the Path just like this f:\\\\perl\\\\data.txt\n";
chomp($dna_filename = <STDIN>);
# Open the file
open(DNAFILENAME, $dna_filename) || die("can not open the file!");
# Read the lines of the file into an array
@DNA = <DNAFILENAME>;
# The next two steps merge all lines into one string, then remove all whitespace characters
$DNA = join('', @DNA);
$DNA =~ s/\s//g;
# Split the DNA string into single characters and assign them to an array
@DNA = split('', $DNA);
# Read the elements of the array in turn and count the four bases
foreach $base (@DNA)
{
    if ($base eq 'A')
    {
        $count_A = $count_A + 1;
    }
    elsif ($base eq 'T')
    {
        $count_T = $count_T + 1;
    }
    elsif ($base eq 'C')
    {
        $count_C = $count_C + 1;
    }
    elsif ($base eq 'G')
    {
        $count_G = $count_G + 1;
    }
    else
    {
        # Any character other than A, T, C, or G is reported
        print "error\n";
    }
}
# Output the final result
print "A = $count_A\n";
print "T = $count_T\n";
print "C = $count_C\n";
print "G = $count_G\n";
Here are the results of the run:
F:\>perl\a.pl
please input the Path just like this f:\\perl\\data.txt
f:\\perl\\data.txt
error
A = 40
T = 17
C = 27
G = 24
F:\>
Notice that the output contains an error line. Why?
Take a closer look at the original DNA sequence at the top: its second line contains a V (in ...GTAGCVG). V is not one of the four bases, so the program prints error.
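To see which character triggers the message, and where it sits, the check can report it directly. The following is only a minimal sketch, not part of the original program: the short sequence in $dna is a made-up example, and the regular expression [^ATCG] is one possible way to find characters that are not bases.

```perl
use strict;
use warnings;

# Hypothetical short sequence containing one invalid character (V).
my $dna = "GTAGCVGAAT";

# Collect every character that is not A, T, C, or G,
# together with its position counted from 0.
my @unexpected;
while ($dna =~ /([^ATCG])/g) {
    # pos() is the offset just after the one-character match, so subtract 1.
    push @unexpected, [ $1, pos($dna) - 1 ];
}

foreach my $hit (@unexpected) {
    my ($char, $position) = @$hit;
    printf "unexpected character %s at position %d\n", $char, $position;
}
```

For this example string the sketch reports the V at position 5.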
In the program above, the DNA sequence is merged into one line, all whitespace characters are removed, the split function turns $DNA into an array, and the bases are then counted. Is there a better way?
In fact, Perl has a function for this: substr.
Let's take a look at how this function is used first. The substr function works with only a part of a larger string: it takes a piece out of a long string. That is the feature we use here.
$small_fragment = substr($large_fragment, $start_position, $length);
Here $start_position is where the piece you want begins (positions are counted from 0) and $length is how many characters to take. We are counting the individual bases in the DNA, so each piece we process is a single base, and $length must be set to 1.
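As a quick illustration of these arguments, here is a minimal sketch. The sequence in $dna and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $dna = "ATGCCGTA";   # hypothetical example sequence

my $first  = substr($dna, 0, 1);   # "A": one character at position 0
my $codon  = substr($dna, 0, 3);   # "ATG": three characters starting at position 0
my $fourth = substr($dna, 3, 1);   # "C": positions are counted from 0
my $tail   = substr($dna, 5);      # "GTA": with no length, it runs to the end of the string

print "$first $codon $fourth $tail\n";
```

With a length of 1 and a start position that advances by one each time, substr hands back the sequence one base at a time, which is what the loop in the modified program relies on.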
Below we write the modified code:
# This program counts the number of A, T, G, and C bases in a DNA sequence.
# Start with all four base counts at 0.
$count_A = 0;
$count_T = 0;
$count_C = 0;
$count_G = 0;
# The sequence must first be merged into a single line.
# Ask for the path and file name of the file to process (on Windows, write it like this example):
# f:\\perl\\data.txt
print "please input the Path just like this f:\\\\perl\\\\data.txt\n";
chomp($dna_filename = <STDIN>);
# Open the file
open(DNAFILENAME, $dna_filename) || die("can not open the file!");
# Read the lines of the file into an array
@DNA = <DNAFILENAME>;
# The next two steps merge all lines into one string, then remove all whitespace characters
$DNA = join('', @DNA);
$DNA =~ s/\s//g;
# Read the characters of the string in turn and count the four bases
for ($position = 0; $position < length($DNA); ++$position)
{
    # Take one character at the current position
    $base = substr($DNA, $position, 1);
    if ($base eq 'A')
    {
        $count_A = $count_A + 1;
    }
    elsif ($base eq 'T')
    {
        $count_T = $count_T + 1;
    }
    elsif ($base eq 'C')
    {
        $count_C = $count_C + 1;
    }
    elsif ($base eq 'G')
    {
        $count_G = $count_G + 1;
    }
    else
    {
        # Any character other than A, T, C, or G is reported
        print "error\n";
    }
}
# Output the final result
print "A = $count_A\n";
print "T = $count_T\n";
print "C = $count_C\n";
print "G = $count_G\n";
The results obtained are as follows:
F:\>perl\a.pl
please input the Path just like this f:\\perl\\data.txt
f:\\perl\\data.txt
error
A = 40
T = 17
C = 27
G = 24
F:\>
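Both versions print the same counts, because they visit the same characters in the same order and differ only in how each character is fetched. The sketch below puts the two approaches side by side on a short made-up sequence. Note the assumptions: the subroutine names count_with_split and count_with_substr are hypothetical, and a hash is used for the counts instead of the four separate variables above, purely to keep the example short.

```perl
use strict;
use warnings;

# Count bases by splitting the string into an array of characters.
sub count_with_split {
    my ($dna) = @_;
    my %count = (A => 0, T => 0, C => 0, G => 0);
    foreach my $base (split('', $dna)) {
        # Characters other than A, T, C, G are simply skipped here.
        $count{$base}++ if exists $count{$base};
    }
    return \%count;
}

# Count bases by reading one character at a time with substr.
sub count_with_substr {
    my ($dna) = @_;
    my %count = (A => 0, T => 0, C => 0, G => 0);
    for (my $position = 0; $position < length($dna); ++$position) {
        my $base = substr($dna, $position, 1);
        $count{$base}++ if exists $count{$base};
    }
    return \%count;
}

my $dna = "AATGCVGGC";   # hypothetical sequence with one invalid V
my $by_split  = count_with_split($dna);
my $by_substr = count_with_substr($dna);

foreach my $base (qw(A T C G)) {
    printf "%s: split = %d, substr = %d\n",
        $base, $by_split->{$base}, $by_substr->{$base};
}
```

The substr version never builds the intermediate array of single characters; it reads directly from the string.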