標籤:
索引
1.統計fasta、fa和fastq檔案的長度,統計fastq的reads個數,單個reads長度,reads總長度;統計fasta檔案中contig的個數,列出名稱,單條的長度,以及總長度。
1.統計fasta、fa和fastq檔案的長度,統計fastq的reads個數,單個reads長度,reads總長度(主要是統計總長度,其他在Linux下很簡單就實現了);統計fasta檔案中contig的個數,列出名稱,單條的長度,以及總長度。
思路整理:這是個典型的逐行讀檔案,取欄位,計算長度問題。
fastq簡單:四行一輪,解決辦法很多,可以逐行讀取,逐行匹配;也可以一次讀兩行;輸出也少,單reads長度,reads數量,reads長度總和,也沒有其他有價值的資訊。
fasta略微複雜:沒有規律,因為序列被切成短的,只能逐行讀,逐行匹配;逐行讀有個問題,怎麼檢測結束?(這是逐行讀的一個最大缺陷,你無法操縱最後一次!!!)因此,只能把最後一次放在讀迴圈的外面,每次輸出的點就在匹配title那個地方。
代碼如下:
#!/usr/bin/perl#Author:zxlee#Function: compute the length of fastq or fasta, fa.#usage: `perl script_name fastq/fasta/fa_file_name`, it will show the total length, also a detail file.use strict;use warnings;my $infile = shift; #give 1st default para to it, you can go on shift to get the 2st paraopen IN, "<$infile" or die $!;open OUT, ">./result_len.txt" or die $!;our $total_len = 0;our $seq_num = 0;our $len;if($infile =~ /fastq$/){ while(<IN>){ next if not /^@\S+/; my $seq = <IN>; #your cannot use $_ here!!! chomp($seq); $seq_num += 1; $total_len += length($seq); print OUT "\nreads_len = $total_len\n" if $seq_num == 1; } print OUT "Total num of reads is $seq_num\n";}elsif($infile =~ /(fasta|fa)$/){ # easy way, not use "OR" my $chr_len = 0; while(<IN>){ chomp; my $line = $_; if ($line =~ /^>(\S+)/){ print OUT "$chr_len\n" if $chr_len != 0; print OUT "$1\t"; $chr_len = 0; }else{ $len = length($line) if $total_len == 0; $chr_len += length($line); $total_len += length($line); } } print OUT "$chr_len\n"; print OUT "one line has $len\n";}print "The total length is $total_len\n";close(IN);close(OUT);
生物資訊 perl 指令碼實戰