Contents
1. Length statistics for FASTA, FA and FASTQ files
1. Length statistics for FASTA, FA and FASTQ files. For FASTQ: the number of reads, the length of a single read, and the total length of all reads (the main goal is the total length; the rest is easy to get with standard Linux tools). For FASTA (e.g. a contig file): the number of sequences, the name and length of each sequence, and the total length.
Approach: this is a typical problem of reading a file line by line, extracting a field, and computing its length.
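FASTQ is simple: every record occupies exactly four lines (header, sequence, '+', quality), so there are many ways to solve it. You can read the file line by line and pick lines by position, or read the header and the following sequence line together. Note that a quality line can itself begin with '@', so matching headers with /^@/ alone is unreliable; it is safer to consume all four lines of each record. The output is small: the length of a single read, the number of reads, and the total read length; there is no other useful information.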
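The "pick lines by position" approach can be sketched as below. It assumes every record is exactly four lines with no wrapped sequences; `fastq_stats` is a hypothetical helper name, and the in-memory demo data is made up. The first quality line deliberately starts with '@' to show a case that a header-only match on /^@/ would get wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count reads, sum their lengths, and remember the
# length of the first read. Assumes strict four-line FASTQ records.
sub fastq_stats {
    my ($fh) = @_;
    my ($n, $total, $first_len) = (0, 0, undef);
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        next unless $line_no % 4 == 2;   # lines 2, 6, 10, ... hold sequences
        chomp $line;
        $n++;
        $total += length $line;
        $first_len = length $line unless defined $first_len;
    }
    return ($n, $total, $first_len);
}

# Demo on an in-memory file (three-argument open on a scalar reference).
# The quality line "@@II" starts with '@' but is never mistaken for a header.
my $data = "\@r1\nACGT\n+\n\@\@II\n\@r2\nACGTAA\n+\nIIIIII\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my ($n, $total, $first) = fastq_stats($fh);
print "reads=$n total=$total first_len=$first\n";
```

Counting lines by position avoids any regex on headers, at the cost of breaking if a file ever contains wrapped (multi-line) FASTQ sequences.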
FASTA is slightly more complex: there is no fixed pattern, because each sequence is wrapped into an arbitrary number of short lines, so it can only be read line by line. This raises a problem: how do you detect the end of a record? (This is one of the biggest drawbacks of line-by-line reading: the loop never gets a chance to act after the last record.) The solution is to print each record's length at the moment the next header line is matched, and to print the last record's length after the read loop ends.
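A different way around the end-of-record problem (a swapped-in technique, not the method used in the script) is to change Perl's input record separator `$/` to "\n>", so that each read returns one whole record and the last record simply ends at end of file. The sketch below assumes Unix line endings; `fasta_lengths` is a hypothetical helper name and the demo data is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [name, length] pairs, one per FASTA record.
sub fasta_lengths {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";                      # restored automatically when the sub returns
    while (my $chunk = <$fh>) {
        chomp $chunk;                      # chomp strips the current $/, i.e. "\n>"
        $chunk =~ s/^>//;                  # only the first record still starts with '>'
        my ($header, @lines) = split /\n/, $chunk;
        next unless defined $header;       # e.g. leading blank line before the first '>'
        my ($name) = $header =~ /^(\S+)/;
        next unless defined $name;
        my $len = 0;
        $len += length $_ for @lines;      # wrapped sequence lines are summed
        push @records, [$name, $len];
    }
    return @records;
}

my $data = ">seq1 first contig\nACGTAC\nGT\n>seq2\nAAA\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = fasta_lengths($fh);
my $total = 0;
for my $rec (@records) {
    print "$rec->[0]\t$rec->[1]\n";
    $total += $rec->[1];
}
print "Total num of sequences is ", scalar(@records), "\n";
print "The total length is $total\n";
```

Because `local` restores `$/` when the sub exits, the rest of a program keeps normal line-by-line reading.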
The code is as follows:
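#!/usr/bin/perl
# Author: zxlee
# Function: compute sequence lengths for FASTQ or FASTA (.fasta/.fa) files.
# Usage: `perl script_name fastq/fasta/fa_file_name`
#   Prints the total length to the screen and writes details to ./result_len.txt.
use strict;
use warnings;

# shift takes the 1st command-line argument; call shift again for the 2nd
my $infile = shift or die "Usage: $0 <file.fastq|file.fasta|file.fa>\n";
open my $in,  '<', $infile            or die "Cannot open $infile: $!";
open my $out, '>', './result_len.txt' or die "Cannot write ./result_len.txt: $!";

my $total_len = 0;   # sum of all sequence lengths
my $seq_num   = 0;   # number of reads (FASTQ) or sequences (FASTA)
my $len;             # FASTA: width of the first sequence line

if ($infile =~ /\.fastq$/) {
    while (my $header = <$in>) {
        next if $header !~ /^@\S+/;   # skip anything that is not a record header
        my $seq = <$in>;              # sequence line; $_ is not set by this read!
        last if !defined $seq;        # truncated file
        <$in>;                        # skip the '+' separator line
        <$in>;                        # skip the quality line (it may start with '@')
        chomp $seq;
        $seq_num++;
        $total_len += length $seq;
        # right after the first read, $total_len equals that read's length
        print {$out} "read_len = $total_len\n" if $seq_num == 1;
    }
    print {$out} "Total num of reads is $seq_num\n";
}
elsif ($infile =~ /\.(fasta|fa)$/) {   # alternation matches either extension
    my $chr_len   = 0;
    my $in_record = 0;                 # true once the first header has been seen
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # a new header closes the previous record, so print its length now
            print {$out} "$chr_len\n" if $in_record;
            print {$out} "$1\t";
            $chr_len   = 0;
            $in_record = 1;
            $seq_num++;
        } else {
            $len = length $line if !defined $len;
            $chr_len   += length $line;
            $total_len += length $line;
        }
    }
    # the last record is never followed by a header, so print it after the loop
    print {$out} "$chr_len\n" if $in_record;
    print {$out} "Total num of sequences is $seq_num\n";
    print {$out} "one line has $len\n" if defined $len;
}
else {
    die "Unrecognized extension (expected .fastq, .fasta or .fa): $infile\n";
}
print "The total length is $total_len\n";
close $in;
close $out;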
Bioinformatics Perl Scripts in Practice