The Flexibility of Dynamic Languages Is a Double-Edged Sword: Python as an Example

This article collects a few odds and ends around two problems: (1) mutable objects (most commonly list and dict) being modified by accident, and (2) parameter checking. Both stem from the nature of dynamic languages (dynamically typed languages). I will not go into the benefits of dynamic languages in detail. Instead, this article discusses some of the problems caused by that dynamism, that is, by this flexibility.

What is a dynamic programming language? Static languages do much of their work at compile time, and a dynamic language defers much of that work to run time. Code in a dynamic language can change its own behavior at run time: it can add new objects and functions, modify the functionality of existing code, or change types. Most dynamic languages are dynamically typed. Data types are determined at run time, a variable needs no type declaration before it is used, and a variable's type is usually the type of the value assigned to it. Python is a typical dynamic language.
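Here is a minimal sketch of both ideas, assuming Python 3. It shows that the type belongs to the value rather than to the name, and that a class can gain behavior while the program is running. `Greeter` and `shout` are hypothetical names used only for illustration.

```python
# The type belongs to the value, not to the name: x can be rebound freely.
x = 5
print(type(x).__name__)   # int
x = "five"
print(type(x).__name__)   # str


class Greeter:
    def greet(self):
        return "hello"


def shout(self):
    # A plain function that gets attached to Greeter at run time below.
    return self.greet().upper()


g = Greeter()
Greeter.shout = shout     # modify the class while the program is running
print(g.shout())          # HELLO: an already-existing instance sees the new method
```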

The charm of dynamic languages is that developers can focus on the problem they need to solve. They do not have to deal with miscellaneous language rules or write a class that does nothing. Changing code behavior at run time is also useful. Python's hot update, for example, can replace code logic without shutting down the server, which is hard to do in a static language such as C++. I use both C++ and Python. Some of the more complex parts of C++, such as templates (generic programming) and design patterns (such as the template method), feel very natural in Python. I have seen articles arguing that design patterns are often patches for specific static languages, compensating for the flaws or limitations of the language.
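Here is a rough sketch of why generic code feels natural in Python. The hypothetical function `total` plays the role that a C++ function template would, yet it needs no type parameters. It works for any values that support `+` (duck typing). A type that does not support `+` is only detected when the call actually runs.

```python
def total(items, start):
    """Add up any items that support +, beginning with start."""
    result = start
    for item in items:
        result = result + item
    return result


print(total([1, 2, 3], 0))         # 6
print(total(["a", "b", "c"], ""))  # abc
print(total([[1], [2]], []))       # [1, 2]
```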

My knowledge is far from sufficient to judge the merits of dynamic versus static languages. This article only records the pitfalls I stepped into while using Python, caused by the flexibility of the language and by dynamic typing, along with some thoughts and open questions.

Original address: http://www.cnblogs.com/xybaby/p/7208496.html

First problem: a mutable object modified by mistake

This is a bug that happened in the production environment.

In hindsight the cause is easy to state: server data (stored in a dict) was accidentally modified. Tracking it down, however, took a lot of time. The pseudocode is as follows:

def routine(dct):
    if high_probability:
        sub_routine_no_change_dct(dct)
    else:
        sub_routine_will_change_dct(dct)

The code is simple. dct is a dict. With high probability, routine calls a sub-function that does not modify dct; with low probability, it calls a sub-function that may modify dct. The problem is that the argument passed to routine is a server-side global variable that, in theory, must never be modified. Code this simple makes the problem easy to spot, but in the real case the call chain was seven or eight levels deep. The docstring of routine stated that it does not modify dct, and routine itself did not modify it. However, the sub-functions it called, or their sub-functions, did not follow this convention.
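Below is a small, self-contained reproduction of this kind of bug, not the original production code. The global `SERVER_CONFIG`, the helper names, and the rule that every 100th request takes the rare branch are all hypothetical. The point is that nothing in the language stops a helper two calls deep from mutating a dict that every caller believes is read-only.

```python
# Hypothetical server-side global that is supposed to be read-only.
SERVER_CONFIG = {"max_players": 100}


def sub_routine_no_change_dct(dct):
    return dct["max_players"]          # only reads


def sub_sub_routine(dct):
    dct["max_players"] -= 1            # the hidden mutation, one level further down
    return dct["max_players"]


def sub_routine_will_change_dct(dct):
    return sub_sub_routine(dct)


def routine(dct, high_probability):
    """Documented as: does not modify dct."""
    if high_probability:
        return sub_routine_no_change_dct(dct)
    return sub_routine_will_change_dct(dct)


# Simulate 1000 requests; every 100th one takes the rare branch.
for i in range(1000):
    routine(SERVER_CONFIG, high_probability=(i % 100 != 0))

print(SERVER_CONFIG)  # {'max_players': 90}: the "read-only" global has drifted
```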

Looking at the problem through Python's language features

This section explains why the code above is a problem. In short, there are two reasons: dict is a mutable object, and a dict instance passed into a function as an argument can be modified by that function.

Everything in Python is an object ("everything is an object"), whether it is an int, a str, a dict, or a class. For example, in a = 5, 5 is an object (instance) of type int. What is a, then? Is a the object 5? No. a is just a name, and that name is currently bound (mapped) to the object 5. What does b = a mean? b is bound to the object that a refers to, so a and b both refer to the integer object 5.
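A quick way to see names versus objects, assuming Python 3 (actual id values differ between runs and interpreters):

```python
a = 5
b = a                   # b is bound to the same object that a refers to
print(a is b)           # True
print(id(a) == id(b))   # True
a = 6                   # rebinding a does not touch b or the object 5
print(b)                # 5
```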

So what is mutable and what is immutable? A mutable object can be modified, and an immutable object cannot (obviously). Here is what the official Python glossary says:

Mutable: Mutable objects can change their value but keep their id().

Immutable: An object with a fixed value. Immutable objects include numbers, strings and tuples. Such an object cannot be altered. A new object has to be created if a different value has to be stored. They play an important role in places where a constant hash value is needed, for example as a key in a dictionary.

Take the earlier example (a = 5): int is immutable. You may object that after assigning a = 6, a has become 6. Yes, a now "becomes" 6, but what really happens is that a is rebound to the object 6. a no longer refers to 5.

The only criterion for telling them apart is the object's id. id() returns the identity of an object (in CPython, its memory address), and every live object has a unique id. The following two examples make this clear.

>>> a = 5; id(a)
35170056
>>> a = 6; id(a)
35170044
>>> lst = [1, 2, 3]; id(lst)
39117168
>>> lst.append(4); id(lst)
39117168

In other words, for an immutable object there is no way to change the value stored at its memory address during the object's lifetime.

In Python, the immutable types include int, long (Python 2), float, bool, str, tuple and frozenset. Other types, such as dict, list and custom classes, are mutable. Note: str is also immutable, which is why join is recommended over + when concatenating many strings.
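A small sketch of what str immutability means in practice, assuming Python 3. `+=` on a string builds a new object instead of changing the old one. For many pieces, a single `join` avoids building a chain of intermediate strings.

```python
s = "abc"
t = s            # t and s name the same str object
s += "d"         # str is immutable: this builds a new object and rebinds s
print(t)         # abc (the original object is untouched)
print(s)         # abcd

parts = [str(i) for i in range(5)]

# Repeated +: each step may create a new intermediate string
# (CPython sometimes optimizes this in place, but that is an implementation detail).
joined_plus = ""
for p in parts:
    joined_plus = joined_plus + p

# join: builds the result string in one pass over all the pieces.
joined = "".join(parts)
print(joined)    # 01234
```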

Also, Python has no built-in mechanism for forbidding modification of a mutable object (compare const in C++).

dict is a mutable object!

When a function is called in Python, are arguments passed by value or by reference? Neither description is really accurate. I do not know whether there is a standard, agreed-upon term. A simple way to understand it is that the parameter and the argument refer to the same object, and that is all. Look at the following code:

def double(v):
    print('parameter before', id(v))
    v *= 2
    print('parameter after', id(v))
    return v

def test_double(a):
    print('argument before', id(a), a)
    double(a)
    print('argument after', id(a), a)

if __name__ == '__main__':
    print('test_double with int')
    test_double(1)
    print('test_double with list')
    test_double([1])

Output:

test_double with int
argument before 30516936 1
parameter before 30516936
parameter after 30516924
argument after 30516936 1
test_double with list
argument before 37758256 [1]
parameter before 37758256
parameter after 37758256
argument after 37758256 [1, 1]

As you can see, a and v refer to the same object (same id) when double is first entered. In the int test, v *= 2 rebinds v to another object, which has no effect on the argument a. In the list test, v *= 2 modifies the object that v refers to, which is also the object that a refers to. So after the call returns, the contents of the object a refers to have changed.
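The int/list difference comes from `*=` updating a list in place. The sketch below (hypothetical function names) contrasts that with plain rebinding, which never affects the caller.

```python
def double_in_place(v):
    v *= 2        # for a list this mutates the caller's object in place
    return v


def double_rebind(v):
    v = v * 2     # builds a new list and rebinds only the local name v
    return v


a = [1]
double_rebind(a)
print(a)          # [1]: the caller's list is unchanged
double_in_place(a)
print(a)          # [1, 1]: the caller's list was modified
```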

How to keep mutable objects from being modified by functions by mistake

The simplest way to keep a mutable object passed into a sub-function from being modified is to copy it with the copy module. copy.copy makes a shallow copy and copy.deepcopy makes a deep copy. The Python documentation describes the difference as follows:

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

Put simply, a deep copy is recursive: it traverses every compound object and copies it. For example:

>>> lst = [1, [2]]
>>> import copy
>>> lst1 = copy.copy(lst)
>>> lst2 = copy.deepcopy(lst)
>>> print(id(lst[1]), id(lst1[1]), id(lst2[1]))
4402825264 4402825264 4402988816
>>> lst[1].append(3)
>>> print(lst, lst1, lst2)
[1, [2, 3]] [1, [2, 3]] [1, [2]]

The example shows the limitation of a shallow copy. Note that copying a built-in container through its constructor also produces only a shallow copy, e.g. dct = {1: [1]}; dct1 = dict(dct).
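The short check below, assuming Python 3, shows which common copying idioms are shallow. `dict(...)`, `dict.copy()`, `copy.copy()` and list slicing all create a new outer container but share the inner objects; `copy.deepcopy()` does not.

```python
import copy

dct = {1: [1]}
copies = [dict(dct), dct.copy(), copy.copy(dct)]
for c in copies:
    # New outer dict, but the inner list is the very same object.
    print(c is dct, c[1] is dct[1])   # False True

lst = [[1], [2]]
sliced = lst[:]                       # slicing is a shallow copy too
print(sliced[0] is lst[0])            # True

deep = copy.deepcopy(dct)
print(deep[1] is dct[1])              # False: the inner list was copied
```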

Because shallow and deep copies are fundamentally different, their performance costs can differ greatly, even for the same object:

import copy

def test_copy(inv):
    return copy.copy(inv)

def test_deepcopy(inv):
    return copy.deepcopy(inv)

dct = {str(i): i for i in range(100)}

def timeit_copy():
    import timeit
    print(timeit.Timer('test_copy(dct)', 'from __main__ import test_copy, dct').timeit(100000))
    print(timeit.Timer('test_deepcopy(dct)', 'from __main__ import test_deepcopy, dct').timeit(100000))

if __name__ == '__main__':
    timeit_copy()

Output (seconds for 100,000 calls each):

1.19009837668
113.11954377

In the example above, all values in dct are ints, which are immutable. A deep copy therefore gives the same result as a shallow copy, yet it takes roughly 95 times as long. If dct contained custom objects, the difference would be even larger.

For safety, use a deep copy; for performance, use a shallow copy. If the compound object contains only immutable elements, a shallow copy is both safe and efficient. But in a language as flexible as Python, it is quite likely that someday someone will add a mutable element.

A good API

A good API should be easy to use and hard to use wrong. It should provide a contract: if the caller invokes it in the specified way, the API achieves the expected effect.

In static languages such as C++, the function signature is the best contract.

In C++, parameters are passed in roughly three ways: by value, by pointer, and by reference (rvalue references are not considered here). Pointers and references differ in form but are similar in effect, so here we mainly consider passing by value and passing by reference. Take the following four function signatures:

int func(int a);
int func(const int a);
int func(int &a);
int func(const int &a);

For the 1st and 2nd functions there is no difference to the caller: the argument is copied, so nothing func does affects it. The difference between the two is whether a can be modified inside the function, e.g. whether a *= 2 may be written.

The 3rd function takes a non-const reference, so any modification to a affects the argument. The caller sees the API and knows the expected behavior: the function may change the value of the argument.

The 4th function takes a const reference. The function promises never to modify the argument, so the caller can safely pass it by reference without making a copy.

From these signatures alone, the caller can tell whether a call affects the arguments passed in.

Python checks types dynamically, so parameters cannot be checked at all except at run time. Some people suggest expressing the contract through docstrings or variable names, for example:

def func(dct_only_read):
    """param: dct_only_read will only be read, never updated"""

But people are unreliable. A sub-function of this function (or a sub-function of that sub-function, and so on) may still modify the dict. What can be done? You can force a copy (deepcopy) of mutable arguments, but copying is very time-consuming...
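Here is a minimal sketch of the "force a copy" option. The hypothetical decorator `copy_args` deep-copies every argument before calling the wrapped function, so the function and everything it calls only ever see private copies. It is safe, but it pays the deepcopy cost on every call, which is exactly the trade-off described above.

```python
import copy
import functools


def copy_args(func):
    """Hypothetical decorator: hand func deep copies of all its arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # One deepcopy of (args, kwargs) shares a single memo dict,
        # so aliasing between arguments is preserved inside the copies.
        args, kwargs = copy.deepcopy((args, kwargs))
        return func(*args, **kwargs)
    return wrapper


@copy_args
def careless(dct):
    dct["items"].append("oops")   # what a sub-function deep in the chain might do
    return len(dct["items"])


shared = {"items": ["a"]}
print(careless(shared))  # 2
print(shared)            # {'items': ['a']}: the caller's data is untouched
```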

Second problem: parameter checking

The previous section showed how uncomfortable it is for the caller of a function when there is no signature. This section shows how uncomfortable it is for the provider of a function. Having no type checking is a real pain. I once saw someone, for convenience, pass a list of ints to a parameter that was agreed to be an int. The terrible part was that the code raised no error; it just behaved abnormally.
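A tiny illustration of how such misuse can pass silently, assuming Python 3. `scale` is a hypothetical API whose parameter is agreed to be an int. Because lists also support `*`, passing a list raises nothing and simply returns the wrong kind of result.

```python
def scale(count):
    """Hypothetical API: count is agreed to be an int."""
    return count * 2


print(scale(3))        # 6
print(scale([3, 4]))   # [3, 4, 3, 4]: no error, just a wrong result
```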

Take a look at an example:

def func(arg):
    if arg:
        print('Do lots of things here')
    else:
        print('Do other things')

This code is bad: you cannot infer anything about the parameter arg from its name. But such code really exists, and some code is even worse, full of tricks.

This raises a question. If a function expects arg to be of a certain type, should it check, e.g. with isinstance(arg, str)? No compiler checks parameters statically, so whether and how to check is entirely up to the function's provider. Checking costs performance and can easily undermine Python's flexibility (duck typing). Not checking makes misuse easy.
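This sketch shows the trade-off, assuming Python 3 (both function names are hypothetical). The strict version rejects bad input early with a clear error. It also rejects a list of lines that the duck-typed version handles perfectly well. The duck-typed version only fails at the point where the object cannot be iterated.

```python
import io


def count_lines_strict(f):
    # Explicit check: only text streams are accepted.
    if not isinstance(f, io.TextIOBase):
        raise TypeError("expected a text stream, got %s" % type(f).__name__)
    return sum(1 for _ in f)


def count_lines_duck(f):
    # Duck typing: anything that iterates over lines works.
    return sum(1 for _ in f)


print(count_lines_duck(io.StringIO("a\nb\n")))    # 2
print(count_lines_duck(["a\n", "b\n"]))           # 2: a list works as well
print(count_lines_strict(io.StringIO("a\nb\n")))  # 2
# count_lines_strict(["a\n", "b\n"]) raises TypeError
# count_lines_duck(42) raises TypeError too, but only when iteration is attempted
```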

There is another issue here. Look at the second line of the code: if arg:. In Python almost anything can be evaluated as a Boolean expression, and arg can be any Python object: a bool, int, dict, list, or any custom object. Different types become "true" under different conditions. Numeric types (int, float) are true when nonzero, and sequences and containers (str, list, dict) are true when non-empty. For a custom object, Python 2.7 checks __nonzero__ and then __len__ (Python 3 uses __bool__ instead of __nonzero__). If neither is defined, the instance always evaluates to True.
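A short demonstration of these rules, assuming Python 3. The classes are hypothetical. `__nonzero__` is included only to show the Python 2 name of the same hook; Python 3 ignores it.

```python
class Empty:
    pass                        # no __bool__ and no __len__: always true


class Bag:
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):          # used for truth testing when __bool__ is absent
        return len(self.items)


class Flag:
    def __init__(self, on):
        self.on = on

    def __bool__(self):         # Python 3 truth hook
        return self.on

    __nonzero__ = __bool__      # Python 2 name for the same hook


print(bool(Empty()))                                     # True
print(bool(Bag([])), bool(Bag([1])))                     # False True
print(bool(Flag(False)))                                 # False
print(bool(0), bool(0.0), bool(""), bool([]), bool({}))  # False False False False False
```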

PEP 8 gives the following rule for the Boolean evaluation of sequences:

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.

Yes: if not seq:
     if seq:
No:  if len(seq):
     if not len(seq):

The Google Python Style Guide also has a section on Boolean expressions, which says to "use the implicit false if at all possible." Its recommendation for sequences is the same as PEP 8, and two of its other points are more interesting:

1. If you need to distinguish False from None, chain the expressions, for example if not x and x is not None:.

2. When handling integers, implicit false may involve more risk than benefit (for example, accidentally handling None as 0). You may compare a value that is known to be an integer (and is not the result of len()) against the integer 0.

I personally agree with point 2, but point 1 is awkward: such a statement is not intuitive and does not express its real intent.

PEP 20, The Zen of Python, says:

Explicit is better than implicit.

This sentence is simple but practical! Code is written for people, and expressing the code's intent clearly matters more than anything else. Some people like to write complex, obscure code, such as Python list comprehensions nested several levels deep, without realizing that this hurts both others and themselves.

Back to Boolean evaluation: I think that writing if arg: directly is often not a good idea, because it is not intuitive and is error-prone. For example, suppose the parameter is an int:

def handle_age(age):
    if not age:
        return
    # do lots with age

It is hard to say whether age = 0 is a valid input. The code above treats None and 0 the same way, so a reader cannot tell whether passing 0 is correct.
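Here is one way to make the intent explicit, sketched under the assumption that None means "no age given" and 0 is a valid age. The real work is elided in the original, so the returned string is only a hypothetical placeholder.

```python
def handle_age(age):
    """Hypothetical explicit version: None means unknown, 0 is a valid age."""
    if age is None:
        return None                      # explicitly: no age given, nothing to do
    if age < 0:
        raise ValueError("age must be non-negative, got %r" % (age,))
    return "handling age %d" % age       # placeholder for "do lots with age"


print(handle_age(None))  # None
print(handle_age(0))     # handling age 0
```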

Another controversial case is the Boolean evaluation of sequences. The recommended form is if seq:, but this form violates "Explicit is better than implicit," because it cannot distinguish None from an empty sequence. That distinction often matters: an empty sequence is frequently a valid input, while None is not. There is a related Stack Overflow discussion on how to check whether a list is empty. Admittedly, writing seq == [] is not good code either, because it is inflexible: if seq is a tuple, the check no longer works.

Python is a typical duck-typing language. Whatever type you pass in, the code runs as long as the object supports the needed operations, but whether it runs correctly is entirely up to the user. Personally, I think broad constraints are a good middle ground. Python's ABCs (abstract base classes), for example, keep the flexibility while still allowing some checking.
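A sketch of such a broad constraint, assuming Python 3. The hypothetical `summarize` rejects None explicitly and uses the `collections.abc.Sequence` ABC instead of comparing with `[]`. As a result, lists, tuples, and other registered sequences are accepted, an empty sequence is treated as valid input, and non-sequences such as sets are rejected.

```python
from collections.abc import Sequence


def summarize(seq):
    """Hypothetical API: None is invalid, an empty sequence is valid."""
    if seq is None:
        raise ValueError("seq must not be None")
    if not isinstance(seq, Sequence):
        raise TypeError("expected a sequence, got %s" % type(seq).__name__)
    if len(seq) == 0:
        return "empty"
    return "%d items" % len(seq)


print(summarize([]))      # empty
print(summarize(()))      # empty: a tuple works too, unlike seq == []
print(summarize((1, 2)))  # 2 items
# summarize(None) raises ValueError; summarize({1, 2}) raises TypeError
```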

Summary

The two problems above are just two of the many I have run into since I started using Python, and I have fallen into each of them twice. Python is known for its development efficiency, but I think good conventions are needed to use it in large online projects. Moreover, I tend to assume that people are unreliable: they do not always follow agreed conventions, and they do not update the docstring after every code change...

Therefore, to keep the code maintainable over time, the following points are needed:

First: establish and follow coding conventions

Coding conventions are best established when the project starts, and PEP 8 and the Google Python Style Guide are good references. Often no style is inherently better or worse, but consistency within a project is essential. Also hold regular code reviews, and review all new code!

Second: static code analysis

Any bug that can be found statically should never reach production. For checking parameters and return values, Python 3.x offers function annotations, and in Python 2.x you can write your own decorator to do the checking. For analyzing code behavior, you can use high-end commercial software such as Coverity, use PySonar2, or write a simple checker with the ast module.
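A minimal sketch of annotation-driven checking in Python 3. The decorator `check_types` is hypothetical, not a standard library feature. It uses `inspect.signature` to bind the call's arguments and enforces only annotations that are plain classes such as `int` or `str`. Unannotated parameters, `*args`/`**kwargs`, defaults that were not passed, and non-class annotations are skipped. Strictly speaking this checks at run time, but it turns silent misuse into an immediate error at the call boundary.

```python
import functools
import inspect


def check_types(func):
    """Hypothetical decorator: enforce annotations that are plain classes."""
    sig = inspect.signature(func)
    empty = inspect.Parameter.empty

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            expected = param.annotation
            # Skip *args/**kwargs, unannotated parameters and non-class annotations.
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            if expected is empty or not isinstance(expected, type):
                continue
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        result = func(*args, **kwargs)
        ret = sig.return_annotation
        if ret is not empty and isinstance(ret, type) and not isinstance(result, ret):
            raise TypeError(f"return value must be {ret.__name__}, got {type(result).__name__}")
        return result

    return wrapper


@check_types
def scale(count: int, factor: int = 2) -> int:
    return count * factor


print(scale(3))  # 6
# scale([3]) raises TypeError: count must be int, got list
```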

Third: unit testing

Everyone presumably knows how important unit testing is. Python ships with doctest and unittest, and there are many more powerful frameworks and tools, such as nose and mock.
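Tying this back to the first problem, a unit test can pin down a "does not modify its argument" promise. It takes a deep-copied snapshot before the call and compares the argument with the snapshot afterward. `routine` here is a hypothetical stand-in for a function documented as read-only, and the test uses only the standard unittest module.

```python
import copy
import unittest


def routine(dct):
    """Hypothetical function documented as not modifying dct."""
    return sum(dct.values())


class RoutineDoesNotMutateTest(unittest.TestCase):
    def test_argument_unchanged(self):
        dct = {"a": 1, "b": 2}
        snapshot = copy.deepcopy(dct)   # independent copy taken before the call
        self.assertEqual(routine(dct), 3)
        self.assertEqual(dct, snapshot)  # fails if routine (or a callee) mutated dct


if __name__ == "__main__":
    unittest.main(argv=["routine_test"], exit=False)
```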

Fourth: 100% test coverage

For a dynamic language like Python, there is hardly any good way to find errors in code other than executing it, so coverage testing is very important. You can use Python's built-in sys.settrace and sys.gettrace, or more advanced tools such as coverage.py.
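Here is a rough sketch of the mechanism behind line coverage, built on `sys.settrace` and assuming Python 3. `executed_lines` and `classify` are hypothetical names. The helper records which lines of one function ran, as offsets from its `def` line. Real tools such as coverage.py are far more complete; this only shows how an untested branch becomes visible.

```python
import sys


def executed_lines(func, *args, **kwargs):
    """Call func and return the line offsets (relative to its def line) that ran."""
    code = func.__code__
    hits = set()

    def local_tracer(frame, event, arg):
        if event == "line":
            hits.add(frame.f_lineno - code.co_firstlineno)
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only trace frames executing func's own code object.
        return local_tracer if frame.f_code is code else None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)      # restore any tracer that was active before
    return hits


def classify(x):
    if x > 0:
        return "positive"
    return "non-positive"


# Calling with 1 covers the `return "positive"` line (offset 2) but not offset 3;
# calling with -1 covers offset 3 but not offset 2.
print(sorted(executed_lines(classify, 1)))
print(sorted(executed_lines(classify, -1)))
```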

Although I have been writing Python for several years, I still lack experience with Python coding conventions. I also do not know how other companies and projects use Python well and avoid its weaknesses. Pythonistas, please leave a comment and share your advice!

References

Dynamic programming language

instagram-pycon-2017

PEP 8, Style Guide for Python Code: https://www.python.org/dev/peps/pep-0008/

Google Python Style Guide

The Zen of Python (PEP 20)

Best way to check if a list is empty (Stack Overflow)
