Python list deduplication methods you should know
Preface
List deduplication is a common problem when writing Python scripts: no matter where the source data comes from, once we have turned it into a list it may not be exactly what we want, and the most common issue is that elements in the list are repeated. When that happens, the first thing to do is deduplicate the list.
Let's start with the simplest method, which uses Python's built-in set data type.
Assume that our list data is as follows:
level_names = [ u'Second Level', u'Second Level', u'Second Level', u'First Level', u'First Level']
Because the elements of a set cannot be repeated, duplicate elements are automatically removed when the list is converted to a set. That is the basic principle; the code is as follows:
>>> the_list = set(level_names)
>>> print(the_list)
set([u'Second Level', u'First Level'])
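If you need a list again afterwards, you can simply convert the set back. A quick sketch (reusing the level_names data above; the printed order is only an example):

# Dedupe with set(), then convert back to a list.
# Note that the resulting order is arbitrary and may not match the
# original list, which is the drawback discussed below.
level_names = [u'Second Level', u'Second Level', u'Second Level',
               u'First Level', u'First Level']
the_list = list(set(level_names))
print(the_list)  # e.g. [u'Second Level', u'First Level'], order not guaranteed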
The disadvantage of this method is that the original order of the list is not preserved when you convert back to a list. If you don't care about order, this is the simplest answer. Some friends may think it is too easy to count as a real technique, and that is exactly why interview questions about list deduplication are usually phrased like this:
Please write a method to deduplicate a list (set cannot be used)
Since set is explicitly forbidden, this trick is sometimes unavailable. That is a little inconvenient for us, but we still have other methods.
We all know that a list can be traversed, and traversal solves the problem easily: define an empty list, iterate over the list that holds the data, and check each element against the new list, appending it only if it is not already there. The code is as follows:
the_list = []
for level in level_names:
    if level not in the_list:
        the_list.append(level)
print(the_list)
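If you use this pattern in more than one place, it can be wrapped in a small helper. A minimal sketch (the function name dedupe_keep_order is just an illustrative choice, not something from the original article):

def dedupe_keep_order(items):
    # Return a new list with duplicates removed, keeping first-seen order.
    result = []
    for item in items:
        if item not in result:  # linear scan of result, fine for small inputs
            result.append(item)
    return result

print(dedupe_keep_order(level_names))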
Is this method acceptable? It is fine for small lists in general, but it does not hold up for a very large list: as the_list grows, every not in check has to scan the list element by element, so the lookups get slower and slower as the data volume increases.
You may ask: what should I do with a huge list, then? Is there a better way? Of course, let's continue. Since checking membership in a list hurts efficiency, we switch to another idea and do the check against a set instead. Why is a set faster? Because a set locates values by hash: although a set is unordered, each element's position is determined by its hash, so checking whether a specific element exists takes roughly constant time. People online have compared element lookup in a list versus a set: with the same data, the list version took 16 minutes and the set version took 52 seconds, which speaks for itself. Here is the code:
the_list = []
the_set = set()
for level in level_names:
    if level not in the_set:
        the_set.add(level)
        the_list.append(level)
print(the_list)
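If you would rather verify the difference yourself than rely on numbers found online, you can time both versions with the standard timeit module. A rough sketch (the data size and value range are arbitrary choices for illustration):

import random
import timeit

# Build a large list that contains many duplicates (illustrative size only).
data = [random.randint(0, 1000) for _ in range(100000)]

def dedupe_with_list(items):
    seen = []
    out = []
    for item in items:
        if item not in seen:  # linear scan of a list
            seen.append(item)
            out.append(item)
    return out

def dedupe_with_set(items):
    seen = set()
    out = []
    for item in items:
        if item not in seen:  # average O(1) hash lookup
            seen.add(item)
            out.append(item)
    return out

print(timeit.timeit(lambda: dedupe_with_list(data), number=1))
print(timeit.timeit(lambda: dedupe_with_set(data), number=1))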
Summary
That is all for this article. I hope it helps you in your study or work. If you have any questions, please leave a message.