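pip install pandas
pip install xlrd
Note: xlrd 2.0 and later read only .xls files. To read .xlsx files with a recent pandas, install openpyxl as well (pip install openpyxl).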
When a workbook holds a large number of records, sorting and processing them in Excel is laborious and the program can stop responding. pandas handles this case well.
# We'll use the data structures and data analysis tools provided by the pandas library
import pandas as pd

# Import the data from an Excel workbook into a DataFrame
# path = '/documents/analysis/python/examples/2015sales.xlsx'
path = 'f:/python/an.xlsx'
xlsx = pd.ExcelFile(path)
df = pd.read_excel(xlsx, sheet_name='Sheet1')

# Add a new boolean column to the DataFrame that identifies a duplicated
# record, compared on the 'IP' column (False = not a duplicate; True = duplicate)
df['is_duplicated'] = df.duplicated(['IP'])

# Summing a boolean column gives the number of duplicate records
# df['is_duplicated'].sum()

# Get the duplicated records; if you need the non-duplicates, use False instead
df_dup = df.loc[df['is_duplicated'] == True]

# Finally, save the selected records to a CSV file
df_dup.to_csv('dup.csv', encoding='utf-8')
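One detail worth knowing about duplicated(): by default it leaves the first occurrence of each value unflagged and marks only the later repeats. The sketch below illustrates this on a small in-memory DataFrame instead of the Excel file. The sample rows and the 'hits' column are made up for illustration; only the 'IP' column name comes from the code above.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet.
df = pd.DataFrame({
    "IP": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2", "10.0.0.1"],
    "hits": [5, 3, 7, 1, 4, 2],
})

# Default keep='first': the first occurrence of each IP is NOT flagged,
# only the later repeats are.
later_repeats = df[df.duplicated(["IP"])]

# keep=False flags every row whose IP appears more than once,
# including the first occurrence.
all_repeats = df[df.duplicated(["IP"], keep=False)]

# drop_duplicates keeps one row per IP (the first occurrence by default).
unique_rows = df.drop_duplicates(["IP"])

# Summing the boolean mask counts the flagged rows.
dup_count = int(df.duplicated(["IP"]).sum())

print(later_repeats)
print(all_repeats)
print(unique_rows)
print(dup_count)
```

If the goal is to review every record that shares an IP with another record, keep=False is probably the more useful choice. If the goal is a de-duplicated table, drop_duplicates() gets there in one step.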
Ref: https://33sticks.com/python-for-business-identifying-duplicate-data/
Python pandas: getting duplicate records from Excel