Python3 Study Notes 4 --- reference: http://python3-cookbook.readthedocs.io/zh_CN/latest

Source: Internet
Author: User

Data Structures and Algorithms (4)

1.16 Filtering sequence elements

The simplest way to filter sequence elements is to use a list comprehension. For example:

>>> mylist = [1, 4, -5, 10, -7, 2, 3, -1]
>>> [n for n in mylist if n > 0]
[1, 4, 10, 2, 3]
>>> [n for n in mylist if n < 0]
[-5, -7, -1]
>>>

A potential drawback of a list comprehension is that a very large input produces a very large result list, which occupies a large amount of memory. If memory is a concern, you can use a generator expression to produce the filtered elements iteratively. For example:

>>> pos = (n for n in mylist if n > 0)
>>> pos
<generator object <genexpr> at 0x1006a0eb0>
>>> for x in pos:
...     print(x)
...
1
4
10
2
3
>>>
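To make the memory difference concrete, here is a minimal sketch that compares the two forms with sys.getsizeof(). The input size is an arbitrary assumption, and getsizeof() measures only the container object itself, not the integers it refers to, so treat the numbers as a rough indication only.

```python
import sys

data = range(100_000)  # arbitrary size chosen for illustration

as_list = [n for n in data if n % 2 == 0]   # materializes every result
as_gen = (n for n in data if n % 2 == 0)    # produces results on demand

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(list_size > gen_size)   # the list container is far larger

# The generator still yields the same elements once it is consumed.
same = list(as_gen) == as_list
print(same)
```

The generator stays small regardless of the input size because it holds only its own state, not the results.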

Sometimes the filtering rule is complex and cannot be expressed simply in a list comprehension or generator expression. For example, the filtering process may need to handle exceptions or other complicated details. In this case, you can put the filtering code into its own function and use the built-in filter() function. Example:

values = ['1', '2', '-3', '-', '4', 'N/A', '5']

def is_int(val):
    try:
        x = int(val)
        return True
    except ValueError:
        return False

ivals = list(filter(is_int, values))
print(ivals)
# Outputs ['1', '2', '-3', '4', '5']

List comprehensions and generator expressions are usually the easiest way to filter data. They can also transform the data while filtering it. For example:

>>> mylist = [1, 4, -5, 10, -7, 2, 3, -1]
>>> import math
>>> [math.sqrt(n) for n in mylist if n > 0]
[1.0, 2.0, 3.1622776601683795, 1.4142135623730951, 1.7320508075688772]
>>>

A variation on filtering is to replace the values that do not meet the criteria with a new value rather than discard them. For example, in a column of data you may want not only to find the positive numbers but also to replace every non-positive number with a specified value. Moving the filter criterion into a conditional expression solves this easily, as shown below:

>>> clip_neg = [n if n > 0 else 0 for n in mylist]
>>> clip_neg
[1, 4, 0, 10, 0, 2, 3, 0]
>>> clip_pos = [n if n < 0 else 0 for n in mylist]
>>> clip_pos
[0, 0, -5, 0, -7, 0, 0, -1]
>>>

Another noteworthy filtering tool is itertools.compress(). It takes an iterable and an accompanying Boolean selector sequence as input, and outputs the items of the iterable whose corresponding selector is True. This function is useful when you need to filter one sequence using another, related sequence. For example, suppose you have the following two columns of data:

 

addresses = [
    '5412 N CLARK',
    '5148 N CLARK',
    '5800 E 58TH',
    '2122 N CLARK',
    '5645 N RAVENSWOOD',
    '1060 W ADDISON',
    '4801 N BROADWAY',
    '1039 W GRANVILLE',
]
counts = [0, 3, 10, 4, 1, 7, 6, 1]

Now suppose you want to output all the addresses whose corresponding count value is greater than 5. You can do this:

>>> from itertools import compress
>>> more5 = [n > 5 for n in counts]
>>> more5
[False, False, True, False, False, True, True, False]
>>> list(compress(addresses, more5))
['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']
>>>

  

The key point here is to first create a sequence of Booleans indicating which elements meet the condition. The compress() function then picks out the items at the positions where this sequence is True.

Like filter(), compress() returns an iterator. Therefore, if you need a list, you must use list() to convert the result.
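As a rough illustration of what compress() does, the sketch below pairs each item with its selector using zip() and compares the result with compress(). The shortened data and the threshold of 2 are assumptions made for brevity, not part of the example above.

```python
from itertools import compress

addresses = ['5412 N CLARK', '5148 N CLARK', '5800 E 58TH', '2122 N CLARK']
counts = [0, 3, 10, 4]

selectors = [n > 2 for n in counts]
picked = compress(addresses, selectors)   # an iterator; nothing is computed yet
via_compress = list(picked)

# Roughly equivalent comprehension that pairs each item with its selector
via_zip = [addr for addr, keep in zip(addresses, selectors) if keep]

print(via_compress)
print(via_compress == via_zip)
print(list(picked))   # the iterator is exhausted after a single pass
```

Because the result is an iterator, it can be consumed only once; convert it to a list if you need to reuse it.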

1.17 Extracting a subset from a dictionary

A dictionary comprehension handles this easily. For example:

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}
# Make a dictionary of all prices over 200
p1 = {key: value for key, value in prices.items() if value > 200}
# Make a dictionary of tech stocks
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}
p2 = {key: value for key, value in prices.items() if key in tech_names}

In most cases, what a dictionary comprehension does can also be achieved by creating a sequence of tuples and passing it to the dict() function. For example:

p1 = dict((key, value) for key, value in prices.items() if value > 200)

However, the dictionary comprehension is clearer and also runs faster (in the source's test on this example it was almost twice as fast as the dict() approach).

Sometimes there are multiple ways to accomplish the same thing. For example, the second example can be rewritten as follows:

 

# Make a dictionary of tech stocks
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}
p2 = {key: prices[key] for key in prices.keys() & tech_names}

However, timing tests show that this solution is about 1.6 times slower than the first one. If performance matters to your program, it is worth spending some time on timing tests.
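One way to run such a timing test is with the standard timeit module. The sketch below is only an illustration: the function names and the repetition count are assumptions, and the ratios you measure will depend on your Python version and machine.

```python
from timeit import timeit

prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}

def with_comprehension():
    return {key: value for key, value in prices.items() if key in tech_names}

def with_dict_call():
    return dict((key, value) for key, value in prices.items() if key in tech_names)

def with_key_intersection():
    return {key: prices[key] for key in prices.keys() & tech_names}

# All three must build the same dictionary before the timings mean anything.
assert with_comprehension() == with_dict_call() == with_key_intersection()

for func in (with_comprehension, with_dict_call, with_key_intersection):
    elapsed = timeit(func, number=20_000)   # arbitrary repetition count
    print(f'{func.__name__}: {elapsed:.3f}s')
```

No expected timings are shown because they vary between runs; compare the three numbers relative to each other on your own system.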

1.18 Mapping names to sequence elements

The collections.namedtuple() function lets you access elements by name instead of by position while still using an ordinary tuple object. It is actually a factory function that returns a subclass of Python's standard tuple type. You pass it a type name and the fields it should have, and it returns a class that you can instantiate by passing in values for the fields you defined. Sample code:

>>> from collections import namedtuple
>>> Subscriber = namedtuple('Subscriber', ['addr', 'joined'])
>>> sub = Subscriber('jonesy@example.com', '2012-10-19')
>>> sub
Subscriber(addr='jonesy@example.com', joined='2012-10-19')
>>> sub.addr
'jonesy@example.com'
>>> sub.joined
'2012-10-19'
>>>

Although a namedtuple instance looks like an ordinary class instance, it is interchangeable with a tuple and supports all the usual tuple operations, such as indexing and unpacking. For example:

>>> len(sub)
2
>>> addr, joined = sub
>>> addr
'jonesy@example.com'
>>> joined
'2012-10-19'
>>>

 

One of the main uses of named tuples is to free your code from positional indexing. If you get back a large list of tuples from a database call and access the elements by position, your code may break when you add a new column to the table. If you use named tuples, you do not have this concern.

To illustrate, here is code that uses ordinary tuples:

def compute_cost(records):
    total = 0.0
    for rec in records:
        total += rec[1] * rec[2]
    return total

Positional indexing usually makes the code less expressive and heavily dependent on the structure of the records. Here is a version that uses a named tuple:

from collections import namedtuple

Stock = namedtuple('Stock', ['name', 'shares', 'price'])

def compute_cost(records):
    total = 0.0
    for rec in records:
        s = Stock(*rec)
        total += s.shares * s.price
    return total
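As a variation on the version above, the sketch below uses the _make() class method, which builds a named tuple from any iterable, together with _fields and _asdict(). The rows are made-up sample data standing in for the result of a database query.

```python
from collections import namedtuple

Stock = namedtuple('Stock', ['name', 'shares', 'price'])

# Hypothetical rows, standing in for the result of a database query
rows = [('ACME', 100, 10.5), ('IBM', 50, 20.25)]

def compute_cost(records):
    # _make() builds a Stock from any iterable of the right length
    return sum(s.shares * s.price for s in map(Stock._make, records))

total = compute_cost(rows)
first = Stock._make(rows[0])

print(total)
print(Stock._fields)     # ('name', 'shares', 'price')
print(first._asdict())   # field names mapped to values
```

Stock(*rec) and Stock._make(rec) produce the same instance here; _make() is simply convenient when used with map().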

Another use of a named tuple is as a replacement for a dictionary, which requires more memory to store. If you need to build a very large data structure containing dictionaries, named tuples are more efficient. Unlike a dictionary, however, a named tuple is immutable. For example:

>>> s = Stock('ACME', 100, 123.45)
>>> s
Stock(name='ACME', shares=100, price=123.45)
>>> s.shares = 75
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>>

If you really need to change an attribute, you can use the _replace() method of the named tuple instance. It creates a completely new named tuple with the specified fields replaced by new values. For example:

>>> s = s._replace(shares=75)
>>> s
Stock(name='ACME', shares=75, price=123.45)
>>>

Another useful feature of the _replace() method is that it offers a convenient way to populate named tuples that have optional or missing fields. You first create a prototype tuple containing the default values, and then use _replace() to create new instances with some values replaced. For example:

 

from collections import namedtuple

Stock = namedtuple('Stock', ['name', 'shares', 'price', 'date', 'time'])

# Create a prototype instance
stock_prototype = Stock('', 0, 0.0, None, None)

# Function to convert a dictionary to a Stock
def dict_to_stock(s):
    return stock_prototype._replace(**s)

The following is how to use it:

>>> a = {'name': 'ACME', 'shares': 100, 'price': 123.45}
>>> dict_to_stock(a)
Stock(name='ACME', shares=100, price=123.45, date=None, time=None)
>>> b = {'name': 'ACME', 'shares': 100, 'price': 123.45, 'date': '12/17/2012'}
>>> dict_to_stock(b)
Stock(name='ACME', shares=100, price=123.45, date='12/17/2012', time=None)
>>>
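On Python 3.7 and later, namedtuple() also accepts a defaults argument, which can stand in for the prototype instance. The sketch below is an alternative to the approach above, not the source's method; it reuses the same field names and default values.

```python
from collections import namedtuple

# Requires Python 3.7+; defaults are matched to the rightmost fields
Stock = namedtuple('Stock', ['name', 'shares', 'price', 'date', 'time'],
                   defaults=['', 0, 0.0, None, None])

def dict_to_stock(s):
    # Missing keys fall back to the defaults declared above
    return Stock(**s)

a = {'name': 'ACME', 'shares': 100, 'price': 123.45}
stock = dict_to_stock(a)
empty = Stock()

print(stock)
print(empty)
```

Both approaches raise an error if the dictionary contains a key that is not a field name.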

Finally, if your goal is to define an efficient data structure in which many instance attributes will be updated, a named tuple is not your best choice. Instead, consider defining a class that uses __slots__.
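A minimal sketch of such a class follows. The class body is an illustration that reuses the Stock fields from above; it is not taken from the source.

```python
class Stock:
    # __slots__ fixes the set of attributes and avoids a per-instance __dict__
    __slots__ = ('name', 'shares', 'price')

    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

s = Stock('ACME', 100, 123.45)
s.shares = 75   # updated in place, unlike a named tuple
print(s.shares)

try:
    s.date = '2012-10-19'   # not listed in __slots__
except AttributeError as exc:
    print('rejected:', exc)
```

The trade-off is that instances can no longer be given arbitrary new attributes, and they lose the tuple behavior (indexing, unpacking) that named tuples provide.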

1.19 Transforming and reducing data at the same time

A very elegant way to combine a data reduction with a transformation is to pass a generator expression as the argument. For example, if you want to calculate the sum of squares, you can do the following:

nums = [1, 2, 3, 4, 5]
s = sum(x * x for x in nums)

Here are more examples:

 

# Determine if any .py files exist in a directory
import os
files = os.listdir('dirname')
if any(name.endswith('.py') for name in files):
    print('There be python!')
else:
    print('Sorry, no python.')

# Output a tuple as CSV
s = ('ACME', 50, 123.45)
print(','.join(str(x) for x in s))

# Data reduction across fields of a data structure
portfolio = [
    {'name': 'GOOG', 'shares': 50},
    {'name': 'YHOO', 'shares': 75},
    {'name': 'AOL', 'shares': 20},
    {'name': 'SCOX', 'shares': 65}
]
min_shares = min(s['shares'] for s in portfolio)

You could pass a list comprehension instead, such as sum([x * x for x in nums]), but that builds a temporary list first. For a small list this hardly matters, but if the number of elements is very large, it creates a huge temporary data structure that is used only once and then discarded. The generator version transforms the data iteratively and therefore uses far less memory.

When using aggregate functions such as min() and max(), you may prefer the generator version, but the key keyword argument they accept can also be very helpful. For example, in the portfolio example above, you might consider the following alternative:

# Original: Returns 20
min_shares = min(s['shares'] for s in portfolio)
# Alternative: Returns {'name': 'AOL', 'shares': 20}
min_shares = min(portfolio, key=lambda s: s['shares'])
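The key function can also be written with operator.itemgetter() instead of a lambda. The sketch below applies it to the same portfolio data; the variable names smallest, largest, and total_shares are illustrative.

```python
from operator import itemgetter

portfolio = [
    {'name': 'GOOG', 'shares': 50},
    {'name': 'YHOO', 'shares': 75},
    {'name': 'AOL', 'shares': 20},
    {'name': 'SCOX', 'shares': 65},
]

# key= returns the whole record rather than just the extracted value
smallest = min(portfolio, key=itemgetter('shares'))
largest = max(portfolio, key=itemgetter('shares'))

# A generator expression reduces a single field across all records
total_shares = sum(s['shares'] for s in portfolio)

print(smallest['name'], largest['name'], total_shares)
```

Use the key form when you need the record itself and the generator form when you only need the reduced value.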

1.20 Merging multiple dictionaries or mappings

Assume that you have the following two dictionaries:

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}

Now suppose you must perform lookups across both dictionaries (for example, looking in a first and then in b if the key is not found). A very simple solution is to use the ChainMap class from the collections module. For example:

from collections import ChainMap
c = ChainMap(a, b)
print(c['x'])  # Outputs 1 (from a)
print(c['y'])  # Outputs 2 (from b)
print(c['z'])  # Outputs 3 (from a)

A ChainMap takes multiple mappings and makes them logically appear as one. However, the mappings are not actually merged; the ChainMap class simply keeps a list of the underlying mappings and redefines the common dictionary operations to scan that list. Most dictionary operations work as expected. For example:

>>> len(c)
3
>>> list(c.keys())
['x', 'y', 'z']
>>> list(c.values())
[1, 2, 3]
>>>
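The internal list is exposed as the maps attribute. The sketch below inspects it and mimics a lookup with a hypothetical helper named lookup(); the helper is a simplification for illustration, not ChainMap's actual implementation.

```python
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
c = ChainMap(a, b)

print(c.maps)            # the underlying list of mappings, searched in order
print(c.maps[0] is a)    # the original dict object, not a copy

# Hypothetical helper that mimics how a lookup walks the list
def lookup(chain, key):
    for mapping in chain.maps:
        if key in mapping:
            return mapping[key]
    raise KeyError(key)

print(lookup(c, 'z') == c['z'])
```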

  

If there are duplicate keys, the value from the first mapping that contains the key is returned. Therefore, c['z'] in the example always refers to the value in dictionary a, not the value in b.

Operations that update or delete keys always affect the first mapping in the list. For example:

>>> c['z'] = 10
>>> c['w'] = 40
>>> del c['x']
>>> a
{'w': 40, 'z': 10}
>>> del c['y']
Traceback (most recent call last):
...
KeyError: "Key not found in the first mapping: 'y'"
>>>

ChainMap is particularly useful for working with scoped values such as variables in a programming language (for example globals, locals, and so on). In fact, there are methods that make this easy:

>>> values = ChainMap()
>>> values['x'] = 1
>>> # Add a new mapping
>>> values = values.new_child()
>>> values['x'] = 2
>>> # Add a new mapping
>>> values = values.new_child()
>>> values['x'] = 3
>>> values
ChainMap({'x': 3}, {'x': 2}, {'x': 1})
>>> values['x']
3
>>> # Discard last mapping
>>> values = values.parents
>>> values['x']
2
>>> # Discard last mapping
>>> values = values.parents
>>> values['x']
1
>>> values
ChainMap({'x': 1})
>>>
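The same layering idea can be applied to settings with a precedence order. The sketch below is a hypothetical illustration: the dictionaries defaults and overrides and their keys are made up for the example.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}   # hypothetical settings
overrides = {'user': 'admin'}

settings = ChainMap(overrides, defaults)
print(settings['user'])    # found in overrides
print(settings['color'])   # falls through to defaults

# Entering a nested scope: writes go to the new child mapping only
local = settings.new_child({'color': 'blue'})
local['debug'] = True
print(local['color'], 'debug' in settings)
```

Discarding the child (for example with local.parents) restores the outer view without any cleanup, because the outer mappings were never modified.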

As an alternative to ChainMap, you might consider using the update() method to merge the two dictionaries. For example:

>>> a = {'x': 1, 'z': 3}
>>> b = {'y': 2, 'z': 4}
>>> merged = dict(b)
>>> merged.update(a)
>>> merged['x']
1
>>> merged['y']
2
>>> merged['z']
3
>>>

This works, but it requires you to create a completely separate dictionary object (or destructively alter one of the existing dictionaries). Also, if either of the original dictionaries is later updated, the change is not reflected in the merged dictionary.
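The stale-copy behavior can be checked with a short sketch that reuses a and b from above; the new value 13 is arbitrary.

```python
a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}

merged = dict(b)
merged.update(a)

a['x'] = 13            # change the original after merging
print(merged['x'])     # merged is an independent copy and keeps the old value
```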

A ChainMap uses the original dictionaries; it does not create a new one. It therefore does not have this problem. For example:

>>> a = {'x': 1, 'z': 3}
>>> b = {'y': 2, 'z': 4}
>>> merged = ChainMap(a, b)
>>> merged['x']
1
>>> a['x'] = 42
>>> merged['x']  # Notice change to merged dicts
42
>>>

  

 

  

 

 

 
