Line-based de-duplication of text in Python

Source: Internet
Author: User

Requirement:

Each line contains a number after /promotion/. Lines with the same number are considered the same line, and only one of them is kept.
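As a minimal sketch of what "the number after /promotion/" means, the snippet below pulls that number out of a line with a regular expression. The helper name promotion_id and the pattern are assumptions made for illustration; the article itself takes the first space-separated field as the key instead.

```python
import re

# Hypothetical pattern: the digits that directly follow "/promotion/".
PROMOTION_RE = re.compile(r"/promotion/(\d+)")

def promotion_id(line):
    """Return the number after /promotion/ as a string, or None if there is none."""
    match = PROMOTION_RE.search(line)
    return match.group(1) if match else None

print(promotion_id("/promotion/25113 LandingPage/mhd"))  # 25113
print(promotion_id("/promotion/25199 com/LandingPage"))  # 25199
print(promotion_id("no promotion here"))                 # None
```

Two lines are duplicates under the requirement exactly when this helper returns the same value for both.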

Approach:

Use a dictionary together with string splitting.

Create an empty dictionary.

Read the text line by line and take the first part of each line (the text before the first space) as the key. For each line, look the key up in the dictionary. If the key is not found, store the line in the dictionary under that key. Otherwise the line is a duplicate and nothing is stored. This keeps exactly one line for each group of duplicate lines.

The text is as follows:

    /promotion/232 utm_source
    /promotion/237 LandingPage/borrowExtend/? ;
    /promotion/25113 LandingPage/mhd
    /promotion/25113 LandingPage/mhd
    /promotion/25199 com/LandingPage
    /promotion/254 LandingPage/mhd/mhd4/? ;
    /promotion/259 LandingPage/ydy/? ;
    /promotion/25113 LandingPage/mhd
    /promotion/25199 com/LandingPage
    /promotion/25199 com/LandingPage

The program is as follows:

    line_dict_uniq = dict()
    with open('1.txt', 'r') as fd:
        for line in fd:
            # The first space-separated field, e.g. "/promotion/232", is the key
            key = line.split(' ')[0]
            # Note: this scans every stored value on each iteration
            if key not in line_dict_uniq.values():
                line_dict_uniq[key] = line
            else:
                continue

    print(line_dict_uniq)
    print(len(line_dict_uniq))
    # This prints the retained lines (one per distinct key) and their count.
    # The result could also be written to a file; that code is omitted here.
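The program above leaves out writing the result to a file. A minimal sketch of that step could look like the following. The function name write_unique_lines and the output file name uniq.txt are hypothetical, and the output order follows the dictionary's insertion order, which is only guaranteed from Python 3.7 onward.

```python
import os
import tempfile

def write_unique_lines(line_dict_uniq, out_path):
    """Write every retained line to out_path and return how many were written."""
    with open(out_path, 'w') as out:
        for line in line_dict_uniq.values():
            # Lines read from a file normally end with '\n'; add one if it is missing.
            out.write(line if line.endswith('\n') else line + '\n')
    return len(line_dict_uniq)

# Small stand-in for the dictionary built by the de-duplication program.
example = {
    '/promotion/25113': '/promotion/25113 LandingPage/mhd\n',
    '/promotion/25199': '/promotion/25199 com/LandingPage\n',
}
out_path = os.path.join(tempfile.gettempdir(), 'uniq.txt')  # hypothetical output name
print(write_unique_lines(example, out_path))  # 2
```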

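The program above is inefficient: the test against line_dict_uniq.values() scans every stored line on each iteration. Because the values are whole lines rather than keys, the test also never finds a match, so duplicates are only collapsed by overwriting the same dictionary entry. Testing membership against the dictionary's keys is a hash lookup and skips duplicates as intended. The improved version is as follows: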

    line_dict_uniq = dict()
    with open('1.txt', 'r') as fd:
        for line in fd:
            key = line.split(' ')[0]
            # Membership test on the dict itself is a hash lookup on its keys
            if key not in line_dict_uniq:
                line_dict_uniq[key] = line
            else:
                continue

    print(line_dict_uniq)
    print(len(line_dict_uniq))
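When only the retained lines are needed and not the key-to-line mapping, the same idea can be sketched with a set of already-seen keys and a generator, so that lines are emitted as they are read. This is an alternative under that assumption, not the article's own program; the function name unique_by_first_field is hypothetical.

```python
def unique_by_first_field(lines):
    """Yield each line whose first space-separated field has not been seen before."""
    seen = set()
    for line in lines:
        key = line.split(' ')[0]
        if key in seen:
            continue  # duplicate key: skip this line
        seen.add(key)
        yield line

sample = [
    '/promotion/25113 LandingPage/mhd',
    '/promotion/25199 com/LandingPage',
    '/promotion/25113 LandingPage/mhd',
]
for line in unique_by_first_field(sample):
    print(line)
```

The function accepts any iterable of lines, so an open file object can be passed in place of the sample list. It keeps the first occurrence of each key.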

