Suppose there is a sequence like this:
1 2
1 2
2 1
1 3
1 4
1 5
4 1
We need to get the following results:
1 3
1 5
2 1
4 1
That is, duplicate pairs are merged, and when a pair appears in both orders, only one orientation is kept. The following Perl script implements this.
Code 1:
#!/usr/bin/perl
use strict;
use warnings;

my $filename;
my %hash;
my @information;
my $key1;
my $key2;

print "please input the file name, like this: f:\\perl\\data.txt\n";
chomp($filename = <STDIN>);

# First pass: record every unique pair as a key in a hash of hashes.
open(IN, "$filename") or die("can not open");
while (<IN>)
{
    chomp;
    @information = split /\s+/, $_;
    if (exists $hash{$information[0]}{$information[1]})
    {
        next;    # duplicate pair, skip it
    }
    else
    {
        $hash{$information[0]}{$information[1]} = 'a';
    }
}
close IN;

# Second pass: if a pair also appears in reversed order, delete this
# orientation so that only one of the two survives.
open(IN, "$filename") or die("can not open");
while (<IN>)
{
    @information = split /\s+/, $_;
    if (exists $hash{$information[1]}{$information[0]})
    {
        delete $hash{$information[0]}{$information[1]};
    }
    else
    {
        next;
    }
}
close IN;

# Print the surviving pairs in numeric order.
open(OUT, ">f:\\A_B_result.txt") or die("can not open");
foreach $key1 (sort { $a <=> $b } keys %hash)
{
    foreach $key2 (sort { $a <=> $b } keys %{$hash{$key1}})
    {
        print OUT "$key1 $key2\n";
    }
}
close OUT;
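For reference, a sample session, assuming the script above is saved as dedup_pairs.pl (a file name chosen here only for illustration) and the input sequence from the top of this section is in f:\perl\data.txt:

perl dedup_pairs.pl
please input the file name, like this: f:\perl\data.txt
f:\perl\data.txt

f:\A_B_result.txt then contains the four expected pairs: 1 3, 1 5, 2 1 and 4 1.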
Code 2:
Suppose we have a 10 GB file in which many rows are duplicated, and we need to merge the repeated rows into a single row. How can we implement this?
cat data | sort | uniq > new_data  # this works, but it takes several hours before the result comes back
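(For what it's worth, sort -u data > new_data collapses the duplicates in a single process, but it still pays the full cost of sorting 10 GB.)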
The following small Perl tool does the same job. The principle is very simple: build a hash in which the content of each line is a key and the value is the number of times that line occurs. The script is as follows:
#!/usr/bin/perl
# Author: CaoJiangfeng
# Date: 2011-09-28
# Version: 1.0
use warnings;
use strict;

my %hash;
my $script = $0;    # get the script name

sub usage
{
    printf("Usage:\n");
    printf("perl $script <source_file> <dest_file>\n");
}

# If the number of parameters is less than 2, exit the script
if ($#ARGV + 1 < 2) {
    &usage;
    exit 0;
}

my $source_file = $ARGV[0];    # file to remove duplicate rows from
my $dest_file   = $ARGV[1];    # file after duplicate rows have been removed

open(FILE, "<$source_file") or die "Cannot open file $!\n";
open(SORTED, ">$dest_file") or die "Cannot open file $!\n";

while (defined(my $line = <FILE>))
{
    chomp($line);
    $hash{$line} += 1;
    # print "$line, $hash{$line}\n";
}

# Print each unique line and the number of times it appears to the target file.
foreach my $k (keys %hash) {
    print SORTED "$k, $hash{$k}\n";
}
close(FILE);
close(SORTED);
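A quick run, assuming the script is saved as remove_dup.pl (again, a name chosen only for illustration):

perl remove_dup.pl data.txt new_data.txt

Each line of new_data.txt holds one unique row followed by its occurrence count, e.g. "some row, 3". Unlike the sort | uniq pipeline the output is unsorted, and memory use grows with the number of distinct rows, but a single pass over the file is typically much faster than sorting all 10 GB.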
Code 3:
Use a Perl script to delete repeated elements from an array.
#!/usr/bin/perl
use strict;

my %hash;
my @array = (1 .. 10, 5, 20, 2, 3, 4, 5);

# grep keeps only the qualifying elements: the counter for $_ is
# incremented on every occurrence, and only the first occurrence
# (count 1) passes the < 2 test.
@array = grep { ++$hash{$_} < 2 } @array;
print join(" ", @array);
print "\n";