Bioinformatics Perl Scripts in Practice

Source: Internet
Author: User
Tags: perl script


Index


1. Computing lengths for FASTA/FA and FASTQ files: the number of reads, the single-read length and the total read length for FASTQ; the number of sequences, each sequence's name and length, and the total length for a FASTA contig file.

1. Computing lengths for FASTA/FA and FASTQ files. For FASTQ: the number of reads, the length of a single read and the total read length (the total length is the main goal; the other statistics are easy to get with standard Linux commands). For FASTA: the number of sequences, each sequence's name and length, and the total length.


Approach: this is a typical problem of reading a file line by line, extracting a field and computing its length.

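FASTQ is simple: every record is exactly four lines, so there are many ways to solve it. You can read and match line by line, or read several lines at a time. There is little to output: the single-read length, the number of reads and the summed read length; nothing else is of much value. Note that matching only the header line by its leading '@' is fragile, because '@' is also a valid quality character, so a quality line can start with it too. Consuming whole four-line records avoids that trap.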
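A minimal sketch of the four-lines-at-a-time approach, assuming standard unwrapped four-line FASTQ (no multi-line sequence or quality). The sub name `fastq_stats` and the sample reads are hypothetical, for illustration only; the sample is read through an in-memory filehandle so the snippet runs without an input file. The second read's quality line deliberately starts with '@'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count reads and sum their lengths by consuming one
# 4-line FASTQ record (header, sequence, '+', quality) per loop pass.
# Assumes unwrapped 4-line records.
sub fastq_stats {
    my ($fh) = @_;
    my ($reads, $total, $first_len) = (0, 0, undef);
    while (defined(my $header = <$fh>)) {
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        die "Truncated FASTQ record\n" unless defined $qual;
        die "Expected '\@' header, got: $header" unless $header =~ /^\@/;
        chomp $seq;
        $reads++;
        $total += length $seq;
        $first_len //= length $seq;   # length of the first read
    }
    return ($reads, $total, $first_len);
}

# Usage on an in-memory sample; the quality line of r2 starts with '@'.
my $sample = "\@r1\nACGT\n+\nIIII\n\@r2\nACGTAC\n+\n\@IIIII\n";
open my $fh, '<', \$sample or die $!;
my ($reads, $total, $first_len) = fastq_stats($fh);
print "reads=$reads total=$total first_len=$first_len\n";
# prints: reads=2 total=10 first_len=4
```

Because the quality line is never tested against the header pattern, a leading '@' in it cannot be mistaken for a new read.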

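FASTA is slightly more complex: there is no fixed record size, because each sequence is wrapped across a variable number of short lines, so the file can only be read and matched line by line. That raises a problem: how do you detect the end of a sequence? A sequence is only known to be complete when the next header appears, so a line-by-line loop never gets a chance to handle the last record inside the loop (one of the biggest drawbacks of reading line by line!). The solution is to print each sequence's length at the point where the following header is matched, and to print the last one outside the loop, after the whole file has been read.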
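A different technique from the line-by-line approach described above (not the author's method): Perl's input record separator `$/` can be set to `"\n>"`, so each `<$fh>` read returns one complete FASTA record and the last record needs no special handling. This is a minimal sketch assuming the file starts with a `>` header line; the sub name `fasta_lengths` and the sample sequences are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [name, length] pairs, reading one whole FASTA
# record per <$fh> call by changing the input record separator.
# Assumes the input starts with a '>' header line.
sub fasta_lengths {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";              # restored automatically when the sub returns
    while (my $rec = <$fh>) {
        chomp $rec;                # removes a trailing "\n>" (the current $/)
        $rec =~ s/^>//;            # only the first record still starts with '>'
        my ($header, @lines) = split /\n/, $rec;
        next unless defined $header && $header =~ /(\S+)/;
        my $name = $1;             # first word of the header, as in /^>(\S+)/
        my $seq  = join '', @lines;
        $seq =~ s/\s+//g;          # drop stray whitespace, including "\r"
        push @records, [$name, length $seq];
    }
    return @records;
}

my $sample = ">chr1 first\nACGTACGT\nACG\n>chr2\nAAAA\n";
open my $fh, '<', \$sample or die $!;
my @records = fasta_lengths($fh);
my $total = 0;
for my $r (@records) {
    print "$r->[0]\t$r->[1]\n";    # name <TAB> length
    $total += $r->[1];
}
print "sequences=", scalar(@records), " total=$total\n";
# prints:
# chr1	11
# chr2	4
# sequences=2 total=15
```

The trade-off is memory: each record is held as one string, which is fine for contigs, but a whole chromosome record is read in a single piece.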


The code is as follows:


 
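#!/usr/bin/perl
#Author: zxlee
#Function: compute the length of a FASTQ or FASTA (.fasta/.fa) file.
#Usage: `perl script_name fastq/fasta/fa_file_name`; prints the total length to the screen
#       and writes the details to ./result_len.txt.

use strict;
use warnings;

# First command-line argument; call shift again to get the 2nd one.
my $infile = shift or die "Usage: $0 <file.fastq|file.fq|file.fasta|file.fa>\n";
open my $in,  '<', $infile            or die "Cannot open $infile: $!\n";
open my $out, '>', './result_len.txt' or die "Cannot write ./result_len.txt: $!\n";

my $total_len = 0;  # sum of all sequence lengths
my $seq_num   = 0;  # number of reads (FASTQ) or sequences (FASTA)
my $len;            # length of the first read (FASTQ) or of the first sequence line (FASTA)

if ($infile =~ /\.(fastq|fq)$/) {
    # Each FASTQ record is exactly four lines: @header, sequence, '+' line, quality.
    # Consume whole records, because a quality line may also start with '@'.
    while (my $header = <$in>) {
        next if $header =~ /^\s*$/;    # tolerate blank lines between records
        my $seq  = <$in>;              # $header holds line 1, so read the sequence explicitly
        my $plus = <$in>;
        my $qual = <$in>;
        die "Truncated FASTQ record near line $.\n" unless defined $qual;
        chomp $seq;
        $seq_num++;
        $total_len += length $seq;
        if ($seq_num == 1) {
            $len = length $seq;
            print $out "reads_len = $len\n";   # single-read length, taken from the first read
        }
    }
    print $out "Total num of reads is $seq_num\n";
}
elsif ($infile =~ /\.(fasta|fa)$/) {   # the regex alternation matches either extension
    my ($name, $chr_len) = (undef, 0);
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # A new header means the previous sequence is complete, so report it now.
            print $out "$name\t$chr_len\n" if defined $name;
            $name    = $1;
            $chr_len = 0;
            $seq_num++;
        }
        else {
            # Width of the first non-empty sequence line.
            $len = length $line if !defined $len && length $line;
            $chr_len   += length $line;
            $total_len += length $line;
        }
    }
    # No header follows the last sequence, so report it after the loop.
    print $out "$name\t$chr_len\n" if defined $name;
    print $out "Total num of sequences is $seq_num\n";
    print $out "one line has $len\n" if defined $len;
}
else {
    die "Unrecognized extension for $infile (expected .fastq, .fq, .fasta or .fa)\n";
}
print "The total length is $total_len\n";
close $in;
close $out;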



