The following DNA sequence is stored in F:\perl\data.txt on Windows:
AAAAAAAAAAAAAAGGGGGTTTTCCCCCCCC
CCCCCGTCGTAGTAAAGTATGCAGTAGCVG
CCCCCCCCGGGGGGGGAAAAAAAAAAAAATTTTTTAT
AAACG
The following is a program:
# The following program counts the number of A, T, G and C bases in a DNA sequence.
# First set the counts of the four bases to 0
$count_A = 0;
$count_T = 0;
$count_C = 0;
$count_G = 0;
# The sequence has to be merged into a single line first.
# Determine the path and file name of the file to be processed
# (on Windows, write it as in the example below):
# F:\perl\data.txt
print "please input the Path just like this f:\\perl\\data.txt\n";
chomp($DNA_filename = <STDIN>);
# Open the file
open(DNAFILENAME, $DNA_filename) || die("can not open the file!");
# Read the file into an array, one line per element
@DNA = <DNAFILENAME>;
# Merge all lines into one string in the following two steps, then remove all whitespace
$DNA = join('', @DNA);
$DNA =~ s/\s//g;
# Split the DNA into single characters and assign them to the array
@DNA = split('', $DNA);
# Then read the elements of the array in sequence and count the four bases
foreach $base (@DNA)
{
    if ($base eq 'A')
    {
        $count_A = $count_A + 1;
    }
    elsif ($base eq 'T')
    {
        $count_T = $count_T + 1;
    }
    elsif ($base eq 'C')
    {
        $count_C = $count_C + 1;
    }
    elsif ($base eq 'G')
    {
        $count_G = $count_G + 1;
    }
    else
    {
        # Anything that is not A, T, C or G is reported
        print "error\n";
    }
}
# Output the final result
print "A = $count_A\n";
print "T = $count_T\n";
print "C = $count_C\n";
print "G = $count_G\n";
The running result is as follows:
F:\>perl\a.pl
please input the Path just like this f:\perl\data.txt
F:\perl\data.txt
error
A = 40
T = 17
C = 27
G = 24
F:\>
Notice that the output contains an error message. Why?
Take a closer look at the original DNA sequence at the top: its second line contains a V, which is not one of the four bases, so the program prints error.
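To see where such a character sits, one option is to report its position as well. The following is a minimal sketch and not part of the original program: the subroutine name find_unexpected is hypothetical, and the second line of data.txt is given inline instead of being read from the file.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): return a list of
# [position, character] pairs for every character that is not A, T, C or G.
sub find_unexpected {
    my ($dna) = @_;
    my @found;
    # With /g in a while loop, each iteration moves on to the next match.
    while ($dna =~ /([^ATCG])/g) {
        # pos() is the offset just after the match, so subtract 1
        # to get the 0-based position of the matched character.
        push @found, [pos($dna) - 1, $1];
    }
    return @found;
}

# Second line of data.txt, given inline for illustration.
my $line = 'CCCCCGTCGTAGTAAAGTATGCAGTAGCVG';
foreach my $hit (find_unexpected($line)) {
    my ($position, $char) = @$hit;
    print "unexpected character $char at position $position\n";
}
```

For this line the sketch reports the V at position 28, counting from 0.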
In this program, the DNA sequence is merged into one line and all whitespace is removed. Then $DNA is converted into an array with the split function, and the array is analyzed element by element. Is there a better way?
In fact, Perl has a function for this: substr.
Let's take a look at how it is used. The substr function works with only a part of a larger string: given a long string, it extracts a fragment of it. That is the feature used here.
$little_string = substr($large_string, $start_position, $length)
small fragment = substr(large string, position where the fragment starts, length of the fragment)
Positions are counted from 0. Here we want to count the bases in the DNA, so each fragment to be processed is a single base, and $length must be set to 1. That is all we need.
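As a small illustration of the substr call, the sketch below pulls fragments out of a short string. The sequence is made up for this example and is not taken from data.txt.

```perl
use strict;
use warnings;

# Illustrative sequence, not taken from data.txt.
my $large_string = 'ATGCCGTA';

# Three characters starting at position 2 (positions start at 0).
my $little_string = substr($large_string, 2, 3);
print "$little_string\n";    # prints GCC

# With a length of 1, substr returns a single base.
my $base = substr($large_string, 0, 1);
print "$base\n";             # prints A
```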
Here is the modified code:
# The following program counts the number of A, T, G and C bases in a DNA sequence.
# First set the counts of the four bases to 0
$count_A = 0;
$count_T = 0;
$count_C = 0;
$count_G = 0;
# The sequence has to be merged into a single line first.
# Determine the path and file name of the file to be processed
# (on Windows, write it as in the example below):
# F:\perl\data.txt
print "please input the Path just like this f:\\perl\\data.txt\n";
chomp($DNA_filename = <STDIN>);
# Open the file
open(DNAFILENAME, $DNA_filename) || die("can not open the file!");
# Read the file into an array, one line per element
@DNA = <DNAFILENAME>;
# Merge all lines into one string in the following two steps, then remove all whitespace
$DNA = join('', @DNA);
$DNA =~ s/\s//g;
# Then read the characters of the string in sequence and count the four bases
for ($position = 0; $position < length($DNA); ++$position)
{
    # Take the single character at the current position
    $base = substr($DNA, $position, 1);
    if ($base eq 'A')
    {
        $count_A = $count_A + 1;
    }
    elsif ($base eq 'T')
    {
        $count_T = $count_T + 1;
    }
    elsif ($base eq 'C')
    {
        $count_C = $count_C + 1;
    }
    elsif ($base eq 'G')
    {
        $count_G = $count_G + 1;
    }
    else
    {
        # Anything that is not A, T, C or G is reported
        print "error\n";
    }
}
# Output the final result
print "A = $count_A\n";
print "T = $count_T\n";
print "C = $count_C\n";
print "G = $count_G\n";
The result is as follows:
F:\>perl\a.pl
please input the Path just like this f:\perl\data.txt
F:\perl\data.txt
error
A = 40
T = 17
C = 27
G = 24
F:\>
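The four counters and the if/elsif chain can also be folded into a hash keyed by base. The following is an alternative sketch rather than the program above: the subroutine name count_bases is hypothetical, it takes the sequence as a string instead of reading a file, and it tallies unexpected characters under the key other instead of printing error.

```perl
use strict;
use warnings;

# Hypothetical helper: count A, T, C and G in a string with substr.
# Characters that are not one of the four bases are tallied as 'other'.
sub count_bases {
    my ($dna) = @_;
    $dna =~ s/\s//g;    # remove all whitespace, including line breaks
    my %count = map { $_ => 0 } qw(A T C G);
    my $other = 0;
    for (my $position = 0; $position < length($dna); ++$position) {
        my $base = substr($dna, $position, 1);
        if (exists $count{$base}) {
            $count{$base} = $count{$base} + 1;
        }
        else {
            $other = $other + 1;
        }
    }
    return (%count, other => $other);
}

# Made-up two-line input for illustration, not the contents of data.txt.
my %result = count_bases("ATGC\nAAVG");
foreach my $key (qw(A T C G other)) {
    print "$key = $result{$key}\n";
}
```

For the made-up input this prints A = 3, T = 1, C = 1, G = 2 and other = 1, one per line.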