Below is an example of removing duplicate data in Python, with duplicates judged on one or more attributes (columns). I hope it serves as a useful reference.
Steps for removing duplicate rows with the pandas module in Python:
1) Use the duplicated method of a DataFrame to get a Boolean Series that flags each row: a row that repeats an earlier row is marked True, and every other row (including the first occurrence of a repeated row) is marked False;
2) Then use the drop_duplicates method of the DataFrame to get a new DataFrame with the duplicate rows removed.
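The two steps can be sketched on a small hypothetical DataFrame (the data below is made up for illustration). It shows that drop_duplicates keeps exactly the rows where duplicated returns False.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical in every column.
frame = pd.DataFrame({"state": [1, 1, 2], "pop": ["a", "a", "c"]})

# Step 1: Boolean Series, True for any row that repeats an earlier row.
is_dup = frame.duplicated()
print(is_dup.tolist())  # [False, True, False]

# Step 2: drop_duplicates returns a new DataFrame without those rows.
deduped = frame.drop_duplicates()

# Same result as keeping only the rows where duplicated() is False.
manual = frame[~is_dup]
print(deduped.equals(manual))  # True
print(deduped)
```

Note that the original index labels (0 and 2 here) are kept; the remaining rows are not renumbered.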
Notes:
If no argument is passed to duplicated or drop_duplicates, both methods compare all columns by default. If you pass specific column names, for example frame.drop_duplicates(['state']), only those columns (here, the state column) are used to decide whether rows are duplicates.
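To deduplicate on several attributes at once, pass a list with more than one column name. A sketch, assuming hypothetical columns state, year and pop: a row counts as a duplicate only if it matches an earlier row on every listed column.

```python
import pandas as pd

# Hypothetical data with two key attributes: state and year.
frame = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Ohio", "Nevada"],
    "year": [2000, 2000, 2001, 2000],
    "pop": [1.5, 1.7, 3.6, 2.4],
})

# No arguments: all columns are compared; pop differs everywhere,
# so no row is a duplicate.
print(frame.duplicated().tolist())  # [False, False, False, False]

# Only state and year are compared: row 1 repeats row 0 on both keys.
print(frame.duplicated(["state", "year"]).tolist())  # [False, True, False, False]

# The same column list works for drop_duplicates (its subset parameter).
deduped = frame.drop_duplicates(subset=["state", "year"])
print(deduped.index.tolist())  # [0, 2, 3]
```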
A concrete example follows:
>>> import pandas as pd
>>> data = {'state': [1, 1, 2, 2], 'pop': ['a', 'b', 'c', 'd']}
>>> frame = pd.DataFrame(data)
>>> frame
   state pop
0      1   a
1      1   b
2      2   c
3      2   d
>>> isduplicated = frame.duplicated()
>>> print(isduplicated)
0    False
1    False
2    False
3    False
dtype: bool
>>> frame = frame.drop_duplicates(['state'])
>>> frame
   state pop
0      1   a
2      2   c
>>> isduplicated = frame.duplicated(['state'])
>>> print(isduplicated)
0    False
2    False
dtype: bool
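In the session above, drop_duplicates(['state']) kept rows 0 and 2, the first row for each state. That comes from the default keep='first' of pandas' duplicated and drop_duplicates. A short sketch of the other keep options on the same data (this parameter is not discussed in the original text):

```python
import pandas as pd

# Same data as the session above, before deduplication.
frame = pd.DataFrame({"state": [1, 1, 2, 2], "pop": ["a", "b", "c", "d"]})

# keep="first" (default): the first row of each state survives.
first = frame.drop_duplicates(["state"])
# keep="last": the last row of each state survives.
last = frame.drop_duplicates(["state"], keep="last")
# keep=False: every row whose state appears more than once is dropped.
none_kept = frame.drop_duplicates(["state"], keep=False)

print(first.index.tolist())      # [0, 2]
print(last.index.tolist())       # [1, 3]
print(none_kept.index.tolist())  # []
```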