Text:
Each line contains a number after /promotion/. Lines with the same number are considered duplicates, and only one line should be kept for each number.
Approach:
Use a dictionary together with string splitting.
Create an empty dictionary.
Read the file line by line and split each line to get its first field (the /promotion/ number) as the key. If the key is not yet in the dictionary, store the line under that key. Otherwise a line with that key has already been stored, so the current line is a duplicate and is skipped. This keeps exactly one line for each key.
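The steps above can be sketched as a small function that works on any list of strings, so it can be tried without a file. This is a minimal sketch, assuming whitespace separates the key from the rest of the line; the function name dedupe_lines is hypothetical and not part of the original program.

```python
def dedupe_lines(lines):
    """Keep the first line seen for each key (the first whitespace-separated field)."""
    line_dict_uniq = {}  # key -> first line with that key; dicts keep insertion order (Python 3.7+)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line: it has no key, so skip it
        key = fields[0]
        if key not in line_dict_uniq:  # hash lookup on the dict's keys
            line_dict_uniq[key] = line
        # otherwise the key is already stored: this line is a duplicate and is dropped
    return list(line_dict_uniq.values())


sample = [
    "/promotion/232 utm_source",
    "/promotion/25113 LANDINGPAGE/MHD",
    "/promotion/25113 landingpage/mhd",
    "/promotion/232 other",
]
print(dedupe_lines(sample))
# ['/promotion/232 utm_source', '/promotion/25113 LANDINGPAGE/MHD']
```

Because a line is stored only when its key is new, the first occurrence of each key wins. Assigning unconditionally (line_dict_uniq[key] = line) would also leave one line per key, but it would keep the last occurrence instead.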
The sample text is as follows:
/promotion/232 utm_source
/promotion/237 landingpage/borrowextend/?;
/promotion/25113 LANDINGPAGE/MHD
/promotion/25113 landingpage/mhd
/promotion/25199 com/LandingPage
/promotion/254 landingpage/mhd/mhd4/?;
/promotion/259 landingpage/ydy/?;
/promotion/25113 LANDINGPAGE/MHD
/promotion/25199 com/landingpage
/promotion/25199 com/LandingPage
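The sample above can be checked by extracting the key from every line and collecting the distinct keys. A minimal sketch, assuming the sample is pasted into a string (sample_text is a hypothetical name) and that the stray space in "/ promotion/254" is an extraction artifact, so that line is read as /promotion/254.

```python
sample_text = """\
/promotion/232 utm_source
/promotion/237 landingpage/borrowextend/?;
/promotion/25113 LANDINGPAGE/MHD
/promotion/25113 landingpage/mhd
/promotion/25199 com/LandingPage
/promotion/254 landingpage/mhd/mhd4/?;
/promotion/259 landingpage/ydy/?;
/promotion/25113 LANDINGPAGE/MHD
/promotion/25199 com/landingpage
/promotion/25199 com/LandingPage
"""

lines = sample_text.splitlines()
keys = [line.split()[0] for line in lines]  # first field of each line

# dict.fromkeys keeps each key once, in order of first appearance
unique_keys = list(dict.fromkeys(keys))
print(len(lines), "lines,", len(unique_keys), "unique keys")  # 10 lines, 6 unique keys
print(unique_keys)
```

Only the first field is compared, so the two 25113 lines that differ in case after the space still count as duplicates. Of the ten lines, six survive and four are dropped (two extra 25113 lines and two extra 25199 lines).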
The program is as follows:
line_dict_uniq = dict()
with open('1.txt', 'r') as fd:
    for line in fd:
        key = line.split(' ')[0]
        # Note: this compares the key against the stored *lines*, so in practice
        # it never matches. Duplicates still disappear only because assigning to
        # an existing key overwrites it, so the last line for each key is kept.
        if key not in line_dict_uniq.values():
            line_dict_uniq[key] = line
        else:
            continue
print(line_dict_uniq)
print(len(line_dict_uniq))
# This prints the deduplicated lines (each key appears only once). In practice
# you would write the result to a file; that code is omitted here.
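The program above is inefficient: values() has to scan every stored line for each line read, and it tests the wrong collection. Checking the key against the dictionary's keys instead is a hash lookup, and it keeps the first occurrence of each key: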
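line_dict_uniq = dict()
with open('1.txt', 'r') as fd:
    for line in fd:
        key = line.split(' ')[0]
        # Membership on a dict checks its keys with a hash lookup
        if key not in line_dict_uniq:
            line_dict_uniq[key] = line  # first occurrence of this key: keep it
        else:
            continue                    # key already stored: duplicate line, skip it
print(line_dict_uniq)
print(len(line_dict_uniq))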
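The comment in the first program says the step that writes the result back to a file was left out. A minimal sketch of that step follows, assuming UTF-8 text with whitespace-separated fields; the function name dedupe_file and the output name 1_uniq.txt are hypothetical. So that it runs on its own, the demo first writes a few sample lines into a temporary directory.

```python
import tempfile
from pathlib import Path


def dedupe_file(src, dst, encoding="utf-8"):
    """Copy src to dst, keeping only the first line for each key (first field).

    Returns the number of lines written.
    """
    line_dict_uniq = {}
    with open(src, encoding=encoding) as fd:
        for line in fd:
            fields = line.split()
            if fields and fields[0] not in line_dict_uniq:
                line_dict_uniq[fields[0]] = line
    with open(dst, "w", encoding=encoding) as out:
        out.writelines(line_dict_uniq.values())  # each line still ends with its own "\n"
    return len(line_dict_uniq)


# Demo in a throwaway directory so nothing outside it is touched
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "1.txt"
    dst = Path(tmp) / "1_uniq.txt"
    src.write_text(
        "/promotion/232 utm_source\n"
        "/promotion/25113 LANDINGPAGE/MHD\n"
        "/promotion/25113 landingpage/mhd\n",
        encoding="utf-8",
    )
    written = dedupe_file(src, dst)
    result = dst.read_text(encoding="utf-8")

print(written)  # 2
print(result, end="")
```

Holding the kept lines in a dict means memory grows with the number of distinct keys. For very large files, one option is to keep only the keys in a set and write each new line to the output as soon as it is read.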