Writing a file-content similarity comparison in Ruby

Source: Internet
Author: User

1. Definition of similarity

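Let t1 and t2 be the two texts, and let T1 and T2 be the lists of 4-character strings they are split into, with |T1| = n and |T2| = m. Let C be the strings of T2 that also occur in T1, with |C| = s. The similarity of t1 and t2 is then $p = \frac{s}{\max(n, m)}$, with $p \in [0, 1]$.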
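As a worked example with two made-up texts (not the article's test files), t1 = "abcdefgh" splits into two strings and t2 = "abcdxyzwefgh" splits into three, two of which also occur in t1:

```latex
T_1 = [\texttt{abcd},\ \texttt{efgh}], \quad n = 2 \\
T_2 = [\texttt{abcd},\ \texttt{xyzw},\ \texttt{efgh}], \quad m = 3 \\
C = [\texttt{abcd},\ \texttt{efgh}], \quad s = |C| = 2 \\
p = \frac{s}{\max(n, m)} = \frac{2}{3} \approx 0.667
```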

2. Design of similarity detection algorithm

Algorithm design:

Treat every 4 characters as one string and split t1 and t2 into such strings; if fewer than 4 characters are left at the end, pad them with spaces. Count the strings in each text, writing |T1| = n and |T2| = m, and set s = 0. Take the first string of T1 and check whether it occurs in T2. If it does, add 1 to s, delete that occurrence from T2, and keep checking T2 until the string no longer occurs there. Then move on to the next string of T1 and check it against T2 in the same way. Stop when every string of T1 has been checked or T2 is empty. Dividing the final s by the larger of n and m gives the similarity of t1 and t2. Run the detection once with T1 as the template and once with T2 as the template, and take the smaller of the two similarities.
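The procedure above can be sketched on in-memory strings as follows. This is a sketch, not the article's code: it assumes Ruby 2.7 or later for Array#tally, and it replaces the repeated deletions from T2 with a table of string counts. That yields the same s, because every copy of a matching string in T2 is still counted exactly once. The names chunks, one_way and similarity are hypothetical.

```ruby
# Split text into size-character strings, padding the last one with spaces.
def chunks(text, size = 4)
  pad = (size - text.size % size) % size
  (text + " " * pad).each_char.each_slice(size).map(&:join)
end

# One direction: t1 is the template whose strings are looked up in t2.
def one_way(t1, t2)
  a = chunks(t1)
  b = chunks(t2)
  return 1.0 if a.empty? && b.empty? # assumption: two empty texts count as identical
  counts = b.tally                   # string => number of occurrences in t2
  # each distinct template string adds every copy of itself found in t2,
  # just as the delete loop removes and counts all copies at once
  s = a.uniq.sum { |c| counts.fetch(c, 0) }
  s.fdiv([a.size, b.size].max)
end

# Run both directions and keep the smaller score.
def similarity(t1, t2)
  [one_way(t1, t2), one_way(t2, t1)].min
end

puts one_way("abcd", "abcdabcd")    # => 1.0
puts one_way("abcdabcd", "abcd")    # => 0.5
puts similarity("abcd", "abcdabcd") # => 0.5
```

The two one-way scores differ whenever a string repeats: "abcd" against "abcdabcd" scores 1.0, but the reverse scores only 0.5. Taking the smaller value keeps a short text from looking identical to a longer one that merely repeats it.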


The following is an implementation in Ruby (txtcmp.rb):

# txtcmp.rb

# Pad str with spaces so that its length is a multiple of i.
def fill_str(str, i = 4)
  return str if str.size % i == 0
  str << " " * (i - str.size % i)
end

# Similarity with f0 as the template (one direction only): the 4-character
# strings of f1 that also occur in f0, divided by the string count of the
# longer file.
def txt_cmp(f0, f1)
  str_f0, str_f1 = fill_str(File.read(f0)), fill_str(File.read(f1))
  a0, a1 = str_f0.scan(/.{4}/m), str_f1.scan(/.{4}/m)
  n, m, s = a0.size, a1.size, 0
  a0.each do |txt|
    if a1.include?(txt)
      size = a1.size
      # delete every copy of txt from a1 and count how many were removed
      s += size - a1.keep_if { |item| item != txt }.size
    end
    break if a1.size == 0
  end
  s / [n, m].max.to_f
rescue => e
  abort "error: #{e.message}\n" + e.backtrace[0..2].join("\n")
end

abort "You must cmp 2 txt files" if ARGV.size != 2
r = txt_cmp(f0 = ARGV[0], f1 = ARGV[1])
puts "#{f0} and #{f1} semblance is #{r * 100}%"


Here are the four test files, 1.txt, 2.txt, a.txt and b.txt:

1.txt

NFC East rival quarterbacks Tony Romo of the Dallas Cowboys and Eli Manning of the New York Giants now have something else in common: they've used the same wedding planner to help them tie the knot. Todd Fiscus, the man with the plan, set up what he called "man food" at Dallas' Arlington Hall on Saturday when Romo married former Miss Missouri Candace Crawford. "I had a lot of football players to feed," said Fiscus, who had pizza and short ribs on the menu.

However, Romo apparently put all the tunes together. "Tony picked out every song, and when it plays, and what the key things are," Fiscus said.

Sounds like a very orderly occasion, but there was one wild card: whether Cowboys owner Jerry Jones would be able to attend. With the continued lockout, owners and players are not supposed to have any contact away from the negotiating table. But Jones received special dispensation from the NFL to attend, just as the Green Bay Packers recently were informed that they would, in fact, receive their Super Bowl rings in a June ceremony no matter what the labor situation is at that time. Jones was there, along with virtually all of Romo's teammates.

It is unknown whether Jones and Romo actually discussed any labor issues at the wedding; we're guessing this was more of a "friendly," though Jones is one of the most powerful owners on the NFL's side of things, and Romo's marquee value gives him a lot of play on the other side.

"I've gotten special permission," Jones recently told ESPN's Ed Werder. "More than anything, (I got the) right ticket from him and his fiancee, Romo's wife-to-be. (It's) one of the prettiest invitations I've ever seen.

"So, yes, I'll be there and (I'm) proud for him. He's got the best end of this deal."

Romo, who had been linked romantically before with Jessica Simpson and Carrie Underwood, proposed to Crawford last December. Crawford's brother Chace is known for his role on the TV show "Gossip Girl" and has also been linked romantically with Underwood.

According to the new Mrs. Romo, the lockout could play a part in the couple's plans for a honeymoon; usually around this time of year, her husband would be participating in minicamps and other off-season workouts.

"This lockout has been quite a dent in the honeymoon idea," she told WFAA-TV. "We'll see. We haven't really gotten there yet. We're taking a day at a time with the lockout. We (aren't) even sure if we're gonna get to go (on) one."

2.txt

Officially, Memorial Day, observed on the last Monday of May (this year it's May), honors the war dead. Unofficially, the day honors the start of summer. (More on this in a moment.)

The upcoming three-day weekend has prompted searches on Yahoo! for "When is Memorial Day," "What is Memorial Day," and "Memorial Day." The day was originally known as "Decoration Day" because the day was dedicated to the Civil War dead, when mourners would decorate gravesites as a remembrance.

The holiday was first widely observed in May 1868, when 5,000 people helped decorate the gravesites of 20,000 Union and Confederate soldiers buried at Arlington National Cemetery. (Some parts of the South still remember the Confederate Army with Confederate Memorial Day.)

After World War I, the observances were widened to honor the fallen from all American wars--and in 1971, Congress declared Memorial Day a national holiday.

Towns across the country now honor military personnel with services, parades, and fireworks. A National Moment of Remembrance takes place at 3 p.m. At Arlington National Cemetery, headstones are graced with small American flags.

This is not to be confused with Veterans Day, which is observed on November 11 to honor military veterans, both alive and dead.

However, confusion abounds anyway, with the weekend marking for many the kickoff of summer, and it's reserved for weekend getaways, picnics, and sales. Searches on 'Memorial Day sales,' 'Memorial Day recipes,' and 'Memorial Day weekend' are just some of the lookups related to the festivities.

a.txt

23l4kj23 klgjdlskgj235 3lkj 0952ru lkfj LKQEJFG
2t34lktj3409t UJ34GJKLEJEGLEKJFDKLSAFJALSFJ
Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Sdgakdgjsdalgjaslfjsalkfjsadlf

b.txt

23l4kj23 klgjdlskgj235 3lkj 0952ru lkfj LKQEJFG
2t34lktj3409t UJ34GJKLEJEGLEKJFDKLSAFJALSFJ
Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Sdgakdgjsdalgjaslfjsalkfjsadlf

Running the script on both pairs of files gives:


ruby -Eiso-8859-14 txtcmp.rb 1.txt 2.txt
1.txt and 2.txt semblance is 8.65384615384615%

ruby txtcmp.rb a.txt b.txt
a.txt and b.txt semblance is 79.5454545454545%


Because 1.txt contains bytes that are not valid UTF-8, running the comparison with the default encoding raises an error, so the first run sets the external encoding explicitly with -E.
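The -E switch changes the default external encoding for the whole interpreter. The same effect can be had per file inside the script. The sketch below assumes the inputs are either single-byte text or UTF-8 with a few stray bytes; read_latin1 and read_scrubbed are hypothetical helper names that could stand in for the file reads in txt_cmp.

```ruby
# Read a file as ISO-8859-1, an in-script counterpart of running with -E.
# Every byte is a valid ISO-8859-1 character, so scan(/.{4}/m) cannot raise
# "invalid byte sequence", but a multi-byte UTF-8 character is split into
# several single-byte characters.
def read_latin1(path)
  File.read(path, encoding: "ISO-8859-1")
end

# Read a file as UTF-8 and replace every invalid byte sequence with "?",
# keeping valid multi-byte characters intact.
def read_scrubbed(path)
  File.read(path, encoding: "UTF-8").scrub("?")
end
```

The first choice mirrors the article's command line exactly; the second keeps 4-character strings meaningful for real UTF-8 text at the cost of treating all invalid bytes as the same character.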
