Some basic efficiency tips for Python 3 beginners
Sometimes I ask myself why I did not know a simpler way to do something in Python 3. When I went looking for answers, I naturally found code that is more concise, more effective, and less bug-prone. In general (not just in this article), there are more such tasks than I imagined; this is the first batch of non-obvious features that led me to more effective, simpler, and more maintainable code.
Dictionary
keys() and items() in a dictionary
You can perform many interesting set-like operations on dictionary keys and items:
    aa = {'mike': 'male', 'kathy': 'female', 'steve': 'male', 'hillary': 'female'}
    bb = {'mike': 'male', 'ben': 'male', 'hillary': 'female'}
    aa.keys() & bb.keys()  # {'mike', 'hillary'}  # these are set-like
    aa.keys() - bb.keys()  # {'kathy', 'steve'}
    # If you want to get the common key-value pairs in the two dictionaries
    aa.items() & bb.items()  # {('mike', 'male'), ('hillary', 'female')}
It's so concise!
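As a further illustration, the same set-like views can be used to compare two versions of a settings dictionary. This is only a minimal sketch; the names old, new, added, removed and changed are made up for this example.

```python
# Hypothetical example data: two versions of a settings dictionary
old = {'host': 'localhost', 'port': 8080, 'debug': True}
new = {'host': 'localhost', 'port': 9090, 'verbose': False}

added = new.keys() - old.keys()      # keys that exist only in new
removed = old.keys() - new.keys()    # keys that exist only in old
# keys present in both dictionaries whose values differ
changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}

print(added)    # {'verbose'}
print(removed)  # {'debug'}
print(changed)  # {'port'}
```

Note that set operations on items() need hashable values, so this style works best when the values are strings, numbers, or tuples.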
Verify the existence of a key in the dictionary
How many times have you written the following code?
    # ls is any iterable of (key, value) pairs
    dictionary = {}
    for k, v in ls:
        if k not in dictionary:
            dictionary[k] = []
        dictionary[k].append(v)
This code is not that bad, but why do you always need to use the if statement?
    from collections import defaultdict
    dictionary = defaultdict(list)  # missing keys default to an empty list
    for k, v in ls:
        dictionary[k].append(v)
This is clearer: the redundant, distracting if statement is gone.
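To make the grouping concrete, here is a small runnable sketch. The input list ls and the name grouped are invented for illustration.

```python
from collections import defaultdict

# Hypothetical input: (key, value) pairs with repeated keys
ls = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

grouped = defaultdict(list)
for k, v in ls:
    grouped[k].append(v)  # a missing key is created with an empty list

print(dict(grouped))  # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}

# Caveat: merely reading a missing key also creates it
_ = grouped['missing']
print('missing' in grouped)  # True
```

The caveat at the end is worth remembering: use `in` or `.get()` when you only want to check for a key without inserting it.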
Update a dictionary with another dictionary
    from itertools import chain
    a = {'x': 1, 'y': 2, 'z': 3}
    b = {'y': 5, 's': 10, 'x': 3, 'z': 6}
    # Update a with b
    c = dict(chain(a.items(), b.items()))
    c  # {'x': 3, 'y': 5, 'z': 6, 's': 10}
This looks good, but not concise. See if we can do better:
    c = a.copy()
    c.update(b)
Clearer and more readable!
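If you are on Python 3.5 or later, dictionary unpacking is another option that builds the merged dictionary in one expression. The sketch below reuses the a and b values from above; on Python 3.9 or later, `a | b` should give the same result.

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'y': 5, 's': 10, 'x': 3, 'z': 6}

# Unpack both dictionaries into a new one; for duplicate keys,
# the value from the later dictionary (b) wins
c = {**a, **b}

print(c)  # {'x': 3, 'y': 5, 'z': 6, 's': 10}
print(a)  # {'x': 1, 'y': 2, 'z': 3}  (a itself is unchanged)
```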
Obtain the maximum value from a dictionary
If you want to obtain the maximum value in a dictionary, you might write:
    aa = {k: sum(range(k)) for k in range(10)}
    aa  # {0: 0, 1: 0, 2: 1, 3: 3, 4: 6, 5: 10, 6: 15, 7: 21, 8: 28, 9: 36}
    max(aa.values())  # 36
This works, but if you also need the key, you have to look it up from the value. Instead, we can use zip to flatten the dictionary into (value, key) pairs and get both back at once:
    max(zip(aa.values(), aa.keys()))
    # (36, 9) => value, key pair
Similarly, if you want to traverse a dictionary from the largest to the smallest, you can do this:
    sorted(zip(aa.values(), aa.keys()), reverse=True)
    # [(36, 9), (28, 8), (21, 7), (15, 6), (10, 5), (6, 4), (3, 3), (1, 2), (0, 1), (0, 0)]
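An alternative to zip, shown here only as a sketch, is to pass a key function to max and sorted. This keeps the pairs in the usual (key, value) order. The names best_key and by_value are invented for this example.

```python
aa = {k: sum(range(k)) for k in range(10)}

# Iterating a dict yields its keys; aa.get ranks each key by its value
best_key = max(aa, key=aa.get)
print(best_key, aa[best_key])  # 9 36

# Sort (key, value) pairs by value, largest first
by_value = sorted(aa.items(), key=lambda kv: kv[1], reverse=True)
print(by_value[:3])  # [(9, 36), (8, 28), (7, 21)]
```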
Unpack any number of items from a list
We can use the magic of * to capture any number of items and put them in a list:
    def compute_average_salary(person_salary):
        person, *salary = person_salary
        return person, (sum(salary) / float(len(salary)))
    person, average_salary = compute_average_salary(["mike", 40000, 50000, 60000])
    person  # 'mike'
    average_salary  # 50000.0
This is not so interesting on its own, but what if I tell you it can also work like this:
    def compute_average_salary(person_salary_age):
        person, *salary, age = person_salary_age
        return person, (sum(salary) / float(len(salary))), age
    person, average_salary, age = compute_average_salary(["mike", 40000, 50000, 60000, 42])
    age  # 42
It looks simple!
When you have a dictionary with string keys and list values, you do not have to traverse the dictionary to process the values in sequence. You can use a flat representation instead (a list of lists), as shown below:
    # Instead of doing this (dictionary, ls and process are placeholders)
    for k, v in dictionary.items():
        process(v)
    # we separate the head from the rest and process the values
    # as a list, as above. head plays the role of the key
    for head, *rest in ls:
        process(rest)
    # if that is not very clear, consider the following example
    aa = {k: list(range(k)) for k in range(5)}  # range is lazy, hence list()
    aa  # {0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2], 4: [0, 1, 2, 3]}
    for k, v in aa.items():
        print(sum(v))
    # prints 0, 0, 1, 3, 6 (one per line)
    # Instead
    aa = [[ii] + list(range(jj)) for ii, jj in enumerate(range(5))]
    for head, *rest in aa:
        print(sum(rest))
    # prints 0, 0, 1, 3, 6 (one per line)
You can unpack a list into head, *rest, tail, and so on.
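A short sketch of the head, *rest, tail pattern follows. The record and variable names are made up for illustration.

```python
# Hypothetical record: a name, any number of salaries, then an age
record = ['mike', 40000, 50000, 60000, 42]

name, *salaries, age = record
print(name)      # mike
print(salaries)  # [40000, 50000, 60000]
print(age)       # 42

# The starred name always receives a list, even an empty one
first, *rest = [1]
print(first, rest)  # 1 []
```

Only one starred name is allowed per unpacking, and the non-starred names must each get exactly one item.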
Counter from collections
collections is one of my favorite libraries in Python. Apart from the built-in types, this is where you should look if you need other data structures.
Part of my daily work is counting a large number of words, most of them not very important. Some people may say that you can use the words as dictionary keys and their counts as values. Before I came across Counter in collections, I might have agreed (yes, this whole introduction is because of Counter).
Assume that the Wikipedia article on Python has been read into a string and split into a list of words (kept in order):
    import re
    # python_string holds the article text
    word_list = list(map(lambda k: k.lower().strip(), re.split(r'[;,:(\.\s)]\s*', python_string)))
    word_list[:10]  # ['python', 'is', 'a', 'widely', 'used', 'general-purpose', 'high-level', 'programming', 'language', '[17][18][19]']
It looks good so far, but suppose you want to count the words in this list:
    from collections import defaultdict  # again, collections!
    dictionary = defaultdict(int)
    for word in word_list:
        dictionary[word] += 1
This is not so bad, but with Counter you save time for more meaningful things.
    from collections import Counter
    counter = Counter(word_list)
    # Getting the most common 10 words
    counter.most_common(10)
    # [('the', 164), ('and', 161), ('a', 138), ('python', 138), ('of', 131), ('is', 102), ('to', 91), ('in', 88), ('', 56)]
    list(counter.keys())[:10]  # just like a dictionary; the keys view must become a list before slicing
    # ['', 'limited', 'all', 'code', 'managed', 'multi-paradigm', 'exponentiation', 'fromosing', 'dynamic']
This is very concise, but look at the methods available on Counter:
    dir(counter)  # abbreviated; the exact list varies by Python version
    # ['__add__', '__and__', ..., '__or__', ..., '__sub__', ..., 'clear', 'copy', 'elements', 'fromkeys', 'get',
    #  'items', 'keys', 'most_common', 'pop', 'popitem', 'setdefault', 'subtract', 'update', 'values']
Do you see the __add__ and __sub__ methods? Yes, Counter supports addition and subtraction. So if you have a lot of text to count words in, you do not necessarily need Hadoop: you can build one Counter per chunk of text (the map step) and add them together (the reduce step). That gives you a small MapReduce built on Counter. You may want to thank me later.
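The map and reduce idea can be sketched in a few lines. The two text chunks below are invented sample data, and splitting on whitespace is a simplification of real tokenizing.

```python
from collections import Counter

# Hypothetical chunks of text, e.g. one per file or per worker
chunks = ['the cat and the hat', 'the dog and the log']

# "map": count the words of each chunk independently
partial_counts = [Counter(chunk.split()) for chunk in chunks]

# "reduce": add the partial counters, starting from an empty Counter
total = sum(partial_counts, Counter())
print(total.most_common(2))  # [('the', 4), ('and', 2)]

# Subtraction removes counts and drops entries that fall to zero or below
remaining = total - partial_counts[0]
print(remaining['the'], remaining['cat'])  # 2 0
```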
Flatten nested lists
The chain function in itertools (not collections) can be used to flatten nested lists.
    from itertools import chain
    ls = [[kk] + list(range(kk)) for kk in range(5)]
    flattened_list = list(chain(*ls))
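One detail worth seeing in action: chain removes only one level of nesting. The sketch below uses chain.from_iterable, which takes the outer list directly instead of unpacking it with *. The nested list is made-up sample data.

```python
from itertools import chain

# Hypothetical nested data; the second sublist contains a deeper list
nested = [[1, 2], [3, [4, 5]], []]

# from_iterable walks the outer list lazily, without * unpacking
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, [4, 5]]
```

The inner [4, 5] survives as a list, so deeper structures need a recursive approach.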
Open two files at the same time
If you are processing a file line by line and writing the processed lines to another file, you may be tempted to write it like this:
    with open(input_file_path) as inputfile:
        with open(output_file_path, 'w') as outputfile:
            for line in inputfile:
                outputfile.write(process(line))
However, you can open multiple files in the same with statement, as shown below:
    with open(input_file_path) as inputfile, open(output_file_path, 'w') as outputfile:
        for line in inputfile:
            outputfile.write(process(line))
This is more concise!
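For a version you can run as is, here is a sketch that creates its own input file in a temporary directory. The process function is a hypothetical stand-in that simply upper-cases each line, and the file names are arbitrary.

```python
import os
import tempfile

def process(line):
    # Hypothetical stand-in for real per-line processing
    return line.upper()

with tempfile.TemporaryDirectory() as tmp:
    input_file_path = os.path.join(tmp, 'in.txt')
    output_file_path = os.path.join(tmp, 'out.txt')
    with open(input_file_path, 'w') as f:
        f.write('first line\nsecond line\n')

    # Both files are opened, and later closed, by a single with statement
    with open(input_file_path) as inputfile, open(output_file_path, 'w') as outputfile:
        for line in inputfile:
            outputfile.write(process(line))

    with open(output_file_path) as f:
        result = f.read()

print(result)
```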
Find the Monday for any date
If you want to normalize dates (for example, to the Monday before or after a given date), you can do it as follows:
    import datetime
    # some_date is any datetime.date or datetime.datetime
    previous_monday = some_date - datetime.timedelta(days=some_date.weekday())
    # Similarly, you could map to next monday as well
    next_monday = some_date + datetime.timedelta(days=-some_date.weekday(), weeks=1)
That is all there is to it.
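Here is the same arithmetic with a concrete date, as a quick check. The date was picked arbitrarily for this example; weekday() returns 0 for Monday through 6 for Sunday.

```python
import datetime

some_date = datetime.date(2024, 5, 15)  # a Wednesday, so weekday() == 2

# Step back by the number of days elapsed since Monday
previous_monday = some_date - datetime.timedelta(days=some_date.weekday())
# Step back to Monday, then forward one week
next_monday = some_date + datetime.timedelta(days=-some_date.weekday(), weeks=1)

print(previous_monday)  # 2024-05-13
print(next_monday)      # 2024-05-20

# If the date is already a Monday, "previous" Monday is the date itself
monday = datetime.date(2024, 5, 13)
print(monday - datetime.timedelta(days=monday.weekday()))  # 2024-05-13
```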
Process HTML
If you crawl a website out of interest, you will constantly run into HTML tags. To strip the tags and keep only the text, you can use html.parser:
    from html.parser import HTMLParser

    class HTMLStrip(HTMLParser):
        def __init__(self):
            super().__init__()  # sets up the parser state and calls reset()
            self.ls = []
        def handle_data(self, d):
            self.ls.append(d)
        def get_data(self):
            return ''.join(self.ls)
        @staticmethod
        def strip(snippet):
            html_strip = HTMLStrip()
            html_strip.feed(snippet)
            html_strip.close()  # flush any text that is still buffered
            clean_text = html_strip.get_data()
            return clean_text

    # html_snippet is any string containing HTML
    snippet = HTMLStrip.strip(html_snippet)
If you just want to escape HTML:
    import html
    escaped_snippet = html.escape(html_snippet)
    # Back to html snippets (html.unescape is new in Python 3.4)
    html_snippet = html.unescape(escaped_snippet)
    # and so forth ...
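To see what escaping actually produces, here is a small round trip on a made-up snippet. The names raw, escaped and restored are used only in this example.

```python
import html

raw = '<b>Tom & Jerry</b>'  # hypothetical snippet

# escape() replaces &, < and > (and, by default, quotes) with entities
escaped = html.escape(raw)
print(escaped)  # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

# unescape() turns the entities back into characters
restored = html.unescape(escaped)
print(restored == raw)  # True
```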