A summary of experience porting projects from Python 2.x to Python 3.x

After the painful port of Jinja2 to Python 3, I left the project alone for a while because I was afraid of breaking Python 3 support. My approach had been to keep a single Python 2 codebase and translate it to Python 3 with the 2to3 tool at installation time. Unfortunately, this means that even a small change has to go through the translation step again, which destroys iteration speed. If you target the right Python versions, however, you can concentrate on your work and avoid this problem.

Thomas Waldmann from the MoinMoin project ran Jinja2 through my python-modernize and unified the codebase so that it runs on Python 2.6, 2.7 and 3.3 at the same time. After a little cleanup, the code is clean, runs on all of these Python versions, and for the most part looks like ordinary Python code.

Inspired by this, I went over the code a few more times and started porting other code the same way to enjoy the benefits of a unified codebase.

Let me share some tips to get a similar experience.

Drop Python 2.5, 3.1 and 3.2

This is the most important point. Dropping 2.5 is easy, since hardly anyone uses it any more, and dropping 3.1 and 3.2 is not a big problem either, given how few people use Python 3 at the moment. But why drop these versions at all? Because 2.6 and 3.3 share a lot of syntax and features, which lets a single codebase be compatible with both:

    • String literals are compatible. 2.6 and 3.3 support the same string literal syntax: 'foo' is the native string (bytes on 2.x, Unicode on 3.x), u'foo' is a Unicode string, and b'foo' is a byte string (str on 2.x, bytes on 3.x).
    • The print function is compatible. If you have only a few print statements, you can add from __future__ import print_function and start using print as a function, instead of binding it to some other name or living with other inconsistencies.
    • Exception syntax is compatible. The except Exception as e syntax introduced in Python 2.6 is also the 3.x syntax for catching exceptions.
    • Class decorators are available. They can fix up interfaces without leaving traces in the class definition. For example, they can rename the iteration method (__next__ to next) or the string conversion method (__str__ to __unicode__) for Python 2.x.
    • The builtin next() calls __next__ or next as appropriate. This is useful because it is about as fast as calling the method directly, so you pay little for it compared with adding runtime checks or writing your own wrapper function.
    • Python 2.6 added the bytearray type, which has the same interface there as in Python 3.3. This matters because Python 2.6 lacks the Python 3.3 bytes object: there is a builtin with that name, but it is only an alias for str and behaves very differently.
    • Python 3.3 reintroduces the bytes-to-bytes and string-to-string codecs that were missing from 3.0 through 3.2. Unfortunately, their interface is clunkier now and the aliases are gone, but it is much closer to 2.x than what came before.

The last point is especially useful for stream-based encoding and decoding, which was missing from 3.0 until 3.3.
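As a rough illustration of that common subset, here is a hypothetical module that uses each feature from the list in a way that should run unchanged on 2.6, 2.7 and 3.3+. The function name demo and all values are made up for the example.

```python
from __future__ import print_function  # must be the first statement in the module


def demo():
    native = 'foo'   # native string: bytes on 2.x, text on 3.x
    text = u'foo'    # always a Unicode string
    raw = b'foo'     # always a byte string (str on 2.x, bytes on 3.x)

    # "except ... as e" is valid syntax on 2.6+ and on 3.x
    try:
        int('not a number')
    except ValueError as e:
        error = e

    # The builtin next() calls next() on 2.x and __next__() on 3.x
    first = next(iter([1, 2, 3]))

    # bytearray has the same interface on 2.6 and 3.3
    buf = bytearray(raw)
    buf.extend(b'bar')

    # print is a function on both versions thanks to the __future__ import
    print('first item:', first)
    return native, text, raw, error, first, buf
```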


Yes, the six module can get you quite far, but don't underestimate the value of clean-looking code. During the Python 3 port I nearly lost interest in Jinja2 because the codebase started to annoy me. Back then, a unified codebase either looked cluttered (six.b('foo') and six.u('foo') everywhere) or suffered from the slow iteration of 2to3. Not having to deal with any of that brings the joy back. Jinja2's code is now very clean, and you do not have to think much about Python 2/3 compatibility; only a few places contain statements like if PY2:.

The rest of this article assumes these are the Python versions you want to support. Trying to support Python 2.5 as well is very painful, and I strongly recommend against it. Supporting 3.2 is possible if you are willing to wrap your string literals in function calls, but for aesthetic and performance reasons I do not recommend it.

Skip six

six is a nice library, and it is what the Jinja2 port started out with. In the end, however, there is not much in six that a Python 3 port actually requires, and a few things are missing. You do need six if you want to support Python 2.5, but from 2.6 onwards there is little reason to use it. Jinja2 has a compatibility module that contains the few helpers it needs; including a few lines of code that do not run on Python 3, the whole module is less than 80 lines.

Skipping six also spares you the trouble of users needing a different version of six because of some other library, and it avoids adding another dependency to your project.

Start with modernize

python-modernize is a good place to start the port. It works like 2to3, but generates code that runs on both 2.x and 3.x. It is still fairly buggy and its default options are not ideal, but it does much of the tedious work and gets you quite far. You still need to review the result and clean up some imports and awkward leftovers.

Fix your tests

Before doing anything else, go through your tests and make sure they still make sense and pass. Many problems in the Python 3.0 and 3.1 standard library came from test-suite behavior that changed during the conversion to Python 3.

Write a compatibility module

So you're going to skip six; can you do without helpers entirely? Of course not. You still need a small compatibility module, but it is small enough to keep inside your package. Here is a basic example of what such a module can look like:

import sys

PY2 = sys.version_info[0] == 2

if not PY2:
    # Python 3
    text_type = str
    string_types = (str,)
    unichr = chr
else:
    # Python 2: these names only exist here and are never evaluated on 3.x
    text_type = unicode
    string_types = (str, unicode)
    unichr = unichr

The exact contents of that module depend on how much actually changed for you. In Jinja2 I put a whole bunch of functions in it, including ifilter, imap and similar itertools functions that became builtins in 3.x. (I stuck with the Python 2.x names to make it clear to the reader that the iterator behavior is intentional and not a bug.)
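For the itertools part, a minimal sketch could look like the following. The exact set of names (imap, ifilter, izip) is an assumption based on the description above, not a copy of Jinja2's module.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # On 2.x the lazy versions live in itertools
    from itertools import imap, ifilter, izip
else:
    # On 3.x the builtins are already lazy, so simply alias them
    imap = map
    ifilter = filter
    izip = zip
```

Code that uses imap or ifilter then gets an iterator on every version, and the 2.x-style names signal that laziness is intended.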

Test for 2.x rather than 3.x

At some point you will need to check whether you are running on 2.x or 3.x. In that case I recommend checking for Python 2 and putting the Python 3 code in the else branch, not the other way round. That way you will get fewer unpleasant surprises when a Python 4 eventually comes along.

Good:

if PY2:
    def __str__(self):
        return self.__unicode__().encode('utf-8')

Less good:


if not PY3:
    def __str__(self):
        return self.__unicode__().encode('utf-8')
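To make the Python 4 argument concrete, the hypothetical function below evaluates both styles of check against a given version tuple; (4, 0) stands in for a future release and is an assumption for illustration only.

```python
def pick_branch(version_info):
    """Return which code path each style of check selects for a version tuple."""
    PY2 = version_info[0] == 2
    PY3 = version_info[0] == 3

    # Recommended: test for 2.x, every other version takes the 3.x path
    recommended = 'py2 code' if PY2 else 'py3 code'

    # Less good: "not PY3" is also true for any later major version
    discouraged = 'py2 code' if not PY3 else 'py3 code'
    return recommended, discouraged
```

For 2.7 and 3.3 both styles agree; for (4, 0) only the recommended style still picks the modern code path.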

String handling

The biggest change in Python 3 is without doubt the change to the Unicode interface. Unfortunately, these changes are painful in some places and are handled inconsistently across the standard library. Most of the porting time will be spent on string handling. The topic deserves an article of its own, but here is a quick cheat sheet that Jinja2 and Werkzeug follow:

A literal of the form 'foo' always refers to what we call the native string. This is the string type used for identifiers, source code, filenames and other low-level functions. In addition, on 2.x such a literal may be used where a Unicode string is expected, as long as it contains only ASCII characters.
This property is very useful for a unified codebase, because the general trend in Python 3 is to introduce Unicode into interfaces that previously did not support it, never the reverse. Since native string literals are "upgraded" to Unicode on 3.x while still working with Unicode to some extent on 2.x, this kind of literal is very flexible.
For example, datetime.strftime strictly does not support Unicode on Python 2, while on 3.x it is Unicode only. However, because the return value on 2.x is ASCII-only in most cases, code like this works well on both 2.x and 3.x:

>>> u'<p>Current time: %s' % datetime.datetime.utcnow().strftime('%H:%M')
u'<p>Current time: 23:52'

The string passed to strftime is a native string (bytes on 2.x, Unicode on 3.x). The return value is also a native string and contains only ASCII characters. So on both 2.x and 3.x, once it has been formatted into the u'...' template, the result is a Unicode string.
A literal of the form u'foo' always refers to a Unicode string. Many libraries already had very good Unicode support on 2.x, so this literal should not surprise many people.

A literal of the form b'foo' always refers to something that can hold arbitrary bytes. Since 2.6 has no bytes object like the one in Python 3.3, and Python 3.3 lacks a real byte string, the usefulness of this literal is limited. It becomes much more useful once paired with the bytearray object, which has the same interface on 2.x and 3.x.
Since a bytearray is mutable, it is quite efficient for modifying raw bytes, and you can turn the final result into something more conventional by wrapping it in bytes() again.
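A short sketch of that workflow; the byte values are illustrative.

```python
# bytearray behaves the same on 2.6+/2.7 and 3.x
data = bytearray(b'  foo  ')

# Methods such as strip() return a new bytearray ...
data = data.strip()

# ... and because bytearray is mutable, bytes can be patched in place
data[0:1] = b'F'
data.extend(b'!')

# Wrap the final result in bytes() to get a conventional immutable value
result = bytes(data)
```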


In addition to these basic rules, I add the text_type, unichr and string_types variables to my compatibility module, as shown above. With those in place, the big changes are:

    • isinstance(x, basestring) becomes isinstance(x, string_types).
    • isinstance(x, unicode) becomes isinstance(x, text_type).
    • isinstance(x, str), when the intent is to catch bytes, becomes isinstance(x, bytes) or isinstance(x, (bytes, bytearray)).
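A minimal sketch of the three checks side by side, assuming the compatibility names from the module above; the helper describe is hypothetical. On 2.x a native str counts as both a string and bytes, while on 3.x text and bytes are disjoint.

```python
import sys

PY2 = sys.version_info[0] == 2
if PY2:
    text_type = unicode              # only evaluated on 2.x
    string_types = (str, unicode)
else:
    text_type = str
    string_types = (str,)


def describe(x):
    """Hypothetical helper: report which of the three checks x passes."""
    is_string = isinstance(x, string_types)        # was: isinstance(x, basestring)
    is_text = isinstance(x, text_type)             # was: isinstance(x, unicode)
    is_bytes = isinstance(x, (bytes, bytearray))   # was: isinstance(x, str)
    return is_string, is_text, is_bytes
```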

I also created an implements_to_string class decorator that helps implement classes with __unicode__ or __str__ methods:

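if PY2:
    def implements_to_string(cls):
        # Move the Unicode-returning __str__ to __unicode__ ...
        cls.__unicode__ = cls.__str__
        # ... and add a __str__ that returns the UTF-8 encoded bytes
        cls.__str__ = lambda x: x.__unicode__().encode('utf-8')
        return cls
else:
    implements_to_string = lambda x: x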

The idea is that you implement only __str__ on both 2.x and 3.x and let it return a Unicode string (yes, this looks a little strange on 2.x). On 2.x the decorator renames it to __unicode__ and adds a new __str__ that calls __unicode__ and encodes its return value as UTF-8. This pattern was already fairly common in 2.x modules; Jinja2 and Django, for example, use it.

Here is an example of this usage:

@implements_to_string
class User(object):
    def __init__(self, username):
        self.username = username

    def __str__(self):
        return self.username
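To see what the 2.x branch does without a Python 2 interpreter, one can apply the same decorator body on 3.x under a hypothetical name, implements_to_string_py2, and call the resulting methods directly. (Calling str() would not work here, because on 3.x str() insists that __str__ return text, not bytes.) The username is illustrative.

```python
def implements_to_string_py2(cls):
    # Hypothetical name; same body as the 2.x branch of implements_to_string
    cls.__unicode__ = cls.__str__
    cls.__str__ = lambda x: x.__unicode__().encode('utf-8')
    return cls


@implements_to_string_py2
class User(object):
    def __init__(self, username):
        self.username = username

    def __str__(self):
        return self.username


user = User(u'J\u00fcrgen')
text = user.__unicode__()   # the original __str__, returning text
encoded = user.__str__()    # the generated __str__, returning UTF-8 bytes
```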

Metaclass syntax changes

Because Python 3 changed the syntax for specifying a metaclass in an incompatible way, porting is a bit harder than it needs to be. six has a with_metaclass function that works around this, but it leaves a dummy class in the inheritance tree. For Jinja2 I was not happy with that solution and modified it slightly. The external API is the same, but the implementation uses a temporary class to hook in the metaclass. The benefit is that you pay no performance penalty for using it and your inheritance tree stays clean.
The code is a bit hard to understand. The basic idea exploits the fact that a metaclass can customize class creation and is picked up from the parent class. This particular implementation uses a metaclass that removes its own class from the inheritance tree when it is subclassed. The function creates a dummy class with a dummy metaclass. When the dummy class is subclassed, the dummy metaclass is used, and its constructor creates a new class from the original bases and the metaclass that was actually intended. That way, neither the dummy class nor the dummy metaclass ever shows up.
The workaround looks like this:

def with_metaclass(meta, *bases):
    class metaclass(meta):
        __call__ = type.__call__
        __init__ = type.__init__

        def __new__(cls, name, this_bases, d):
            # Creating the temporary class itself: make a plain instance
            if this_bases is None:
                return type.__new__(cls, name, (), d)
            # Subclassing the temporary class: build the real class with the
            # intended metaclass and bases, leaving the temporary class out
            return meta(name, bases, d)
    return metaclass('temporary_class', None, {})

Below is how you use it:

class BaseForm(object):
    pass

class FormType(type):
    pass

class Form(with_metaclass(FormType, BaseForm)):
    pass
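One way to check the claim that the inheritance tree stays clean is to put the helper and the example classes into one module and inspect the result. This is a sketch run on 3.x; the checks look at type(Form) and Form.__mro__.

```python
def with_metaclass(meta, *bases):
    # Same helper as above
    class metaclass(meta):
        __call__ = type.__call__
        __init__ = type.__init__

        def __new__(cls, name, this_bases, d):
            if this_bases is None:
                return type.__new__(cls, name, (), d)
            return meta(name, bases, d)
    return metaclass('temporary_class', None, {})


class BaseForm(object):
    pass


class FormType(type):
    pass


class Form(with_metaclass(FormType, BaseForm)):
    pass


# The real metaclass is used, and only the intended bases remain
form_metaclass = type(Form)
form_mro_names = [c.__name__ for c in Form.__mro__]
```

form_mro_names should come out as Form, BaseForm, object, with no temporary_class in between.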


Dictionaries

One of the more annoying changes in Python 3 is the change to the dictionary iteration protocol. In Python 2, all dictionaries have keys(), values() and items(), which return lists, and iterkeys(), itervalues() and iteritems(), which return iterators. In Python 3, the iter* methods no longer exist, and keys(), values() and items() return view objects instead of lists.

keys() returns a keys view, which behaves like a read-only set; values() returns a read-only container that is iterable (but not an iterator!); and items() returns a read-only set-like object. Unlike a regular set, however, an items view can contain mutable objects, in which case some of its methods fail at runtime.
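The behavior described above can be observed directly on Python 3. This snippet is 3.x only, since on 2.x keys() returns a list; the dictionaries are illustrative.

```python
d = {'a': 1, 'b': 2}
keys = d.keys()

# Views are iterable but not iterators themselves
try:
    next(keys)
except TypeError:
    keys_is_iterator = False
else:
    keys_is_iterator = True

# A keys view supports set operations
common = keys & {'a', 'z'}

# Views are live: they reflect later changes to the dict
d['c'] = 3
key_count = len(keys)

# items() is set-like too, but set operations hash each (key, value) pair,
# so they fail once a value is unhashable
e = {'x': []}
try:
    e.items() | set()
except TypeError:
    items_union_failed = True
else:
    items_union_failed = False
```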

On the positive side, many people never noticed that views are not iterators, so in many cases you can simply ignore the difference.
Werkzeug and Django implement a large number of custom dictionary objects, and in both cases the decision was to ignore view objects and simply let keys() and friends return iterators.

Because of limitations in the Python interpreter, this is currently the only sensible thing to do. There are a few problems with implementing views:

    • Because views are not iterators themselves, in the common case you create a temporary object for no good reason.
    • The set-like behavior of the builtin dictionary views cannot be replicated in pure Python because of interpreter limitations.
    • Implementing views for 3.x and iterators for 2.x would mean a lot of duplicated code.

This is what the Jinja2 codebase uses for iterating over dictionaries:

if PY2:
    iterkeys = lambda d: d.iterkeys()
    itervalues = lambda d: d.itervalues()
    iteritems = lambda d: d.iteritems()
else:
    iterkeys = lambda d: iter(d.keys())
    itervalues = lambda d: iter(d.values())
    iteritems = lambda d: iter(d.items())

For implementing dictionary-like objects, a class decorator is again a viable approach:

if PY2:
    def implements_dict_iteration(cls):
        cls.iterkeys = cls.keys
        cls.itervalues = cls.values
        cls.iteritems = cls.items
        cls.keys = lambda x: list(x.iterkeys())
        cls.values = lambda x: list(x.itervalues())
        cls.items = lambda x: list(x.iteritems())
        return cls
else:
    implements_dict_iteration = lambda x: x

In that case, all you have to do is implement keys() and its friends as iterators, and the rest happens automatically:

@implements_dict_iteration
class MyDict(object):
    ...

    def keys(self):
        for key, value in iteritems(self):
            yield key

    def values(self):
        for key, value in iteritems(self):
            yield value

    def items(self):
        ...
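Filling in the elided parts, a complete version might look like this. The backing dict _data and the extra methods are assumptions made for the sake of a runnable example; only the decorator pattern comes from the text above.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    iteritems = lambda d: d.iteritems()

    def implements_dict_iteration(cls):
        cls.iterkeys = cls.keys
        cls.itervalues = cls.values
        cls.iteritems = cls.items
        cls.keys = lambda x: list(x.iterkeys())
        cls.values = lambda x: list(x.itervalues())
        cls.items = lambda x: list(x.iteritems())
        return cls
else:
    iteritems = lambda d: iter(d.items())
    implements_dict_iteration = lambda x: x


@implements_dict_iteration
class MyDict(object):
    """Hypothetical read-only mapping backed by a plain dict."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

    def items(self):
        # The single primitive: a generator of (key, value) pairs
        for key in self._data:
            yield key, self._data[key]

    def keys(self):
        for key, value in iteritems(self):
            yield key

    def values(self):
        for key, value in iteritems(self):
            yield value
```

On 3.x, keys(), values() and items() return generators; on 2.x the decorator turns them into iterkeys() and friends and adds list-returning keys(), values() and items().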

General iterator changes

Since iterators changed in general, a little help is needed to make this change painless. The only real change is the rename from next() to __next__, and thankfully this is already handled transparently: all you really need to change is x.next() to next(x), and the language does the rest.

If you plan to define iterators, a class decorator is again a viable option:

if PY2:
    def implements_iterator(cls):
        cls.next = cls.__next__
        del cls.__next__
        return cls
else:
    implements_iterator = lambda x: x

To implement such a class, just define the iteration step method as __next__ on all versions:
 
@implements_iterator
class UppercasingIterator(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter).upper()
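Putting the decorator and the class together with a short usage example gives a module like the following; the input list is illustrative.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    def implements_iterator(cls):
        cls.next = cls.__next__
        del cls.__next__
        return cls
else:
    implements_iterator = lambda x: x


@implements_iterator
class UppercasingIterator(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the wrapped iterator ends our iteration too
        return next(self._iter).upper()


it = UppercasingIterator(['a', 'b'])
first = next(it)   # the builtin next() finds __next__ on 3.x, next on 2.x
rest = list(it)    # consumes whatever is left
```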

Transformation codecs

One of the nice features of the Python 2 encoding protocol was that it was independent of types. If you wanted, you could register an encoding that converted a CSV file into a numpy array. This feature was not widely known, however, because the main public interface of codecs was attached to string objects. Since codecs became stricter in 3.x, much of this functionality was removed in 3.0, but it was reintroduced in 3.3 because it had proven useful. Essentially, all codecs that did not convert from Unicode to bytes or the other way round were unavailable until 3.3; the hex and base64 codecs are among them.

There are two use cases for these codecs: operations on strings and operations on streams. The former is well known as str.encode() on 2.x, but because of the changed string API it looks a little different if you want to support both 2.x and 3.x:

>>> import codecs
>>> codecs.encode(b'Hey!', 'base64_codec')
'SGV5IQ==\n'

(This is the 2.x output; on 3.x the result is the bytes object b'SGV5IQ==\n'.) You will also notice that in 3.3 the codecs are missing their aliases, so you have to write 'base64_codec' instead of 'base64'.

(These codecs are preferable to the functions in the binascii module because their incremental encoding and decoding support lets them operate on streams as well.)
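Both use cases can be sketched with the codecs module on 3.x. The string case uses codecs.encode and codecs.decode with base64_codec as above. For the stream case, this sketch uses an incremental encoder from codecs.getincrementalencoder with hex_codec, chosen because every input byte maps to exactly two hex digits, so where the chunks are split does not affect the output. The chunks are illustrative.

```python
import codecs

# String-style use: a one-shot transformation of a complete bytes value
encoded = codecs.encode(b'Hey!', 'base64_codec')
decoded = codecs.decode(encoded, 'base64_codec')

# Stream-style use: feed the data to an incremental encoder chunk by chunk
encoder = codecs.getincrementalencoder('hex_codec')()
parts = [
    encoder.encode(b'He'),
    encoder.encode(b'y!', final=True),  # final=True marks the last chunk
]
streamed = b''.join(parts)
```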

Other considerations

There are still a few areas for which I have no good solution yet, or which are generally annoying to deal with, but they are becoming fewer. Unfortunately, some of them are now part of the Python 3 API and are hard to discover until you hit an edge case.

File system and file I/O access on Linux is still annoying to deal with, because it is not based on Unicode. The open() function and the file system layer have dangerous platform-specific defaults. For example, if I SSH from a de_AT machine into an en_US machine, Python likes to fall back to ASCII for both file system and file operations.

I have found that the most reliable way to handle text on Python 3 that also works reasonably on 2.x is to open files in binary mode and decode explicitly. Alternatively, use codecs.open or io.open on 2.x and the builtin open with an explicit encoding argument on Python 3.
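Both approaches in a small sketch; the temporary file and sample text are illustrative.

```python
import io
import os
import tempfile

text = u'Gr\u00fc\u00dfe aus Wien'
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    # Option 1: binary mode, with explicit encoding and decoding
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))
    with open(path, 'rb') as f:
        via_binary = f.read().decode('utf-8')

    # Option 2: io.open with an explicit encoding
    # (on 3.x io.open is the same function as the builtin open)
    with io.open(path, 'r', encoding='utf-8') as f:
        via_io_open = f.read()
finally:
    os.remove(path)
```

Either way, the locale of the current session never decides how the bytes on disk are interpreted.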

URLs in the standard library are represented incorrectly as Unicode, which means some URLs are not handled correctly on 3.x.

Raising an exception with an explicit traceback object requires a helper function, because the syntax changed. This is rare in practice and easy to wrap. Since the syntax itself differs, this is one of the situations where you have to move code into an exec block:


if PY2:
    exec('def reraise(tp, value, tb):\n raise tp, value, tb')
else:
    def reraise(tp, value, tb):
        raise value.with_traceback(tb)
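A sketch of how reraise is typically used: capture sys.exc_info() inside an except block and re-raise it later with its original traceback. The functions fail, cleanup_then_reraise and run are hypothetical.

```python
import sys
import traceback

PY2 = sys.version_info[0] == 2

if PY2:
    # 2.x-only raise syntax is hidden in a string so 3.x can still parse the file
    exec('def reraise(tp, value, tb):\n raise tp, value, tb')
else:
    def reraise(tp, value, tb):
        raise value.with_traceback(tb)


def fail():
    raise KeyError('missing')


def cleanup_then_reraise():
    try:
        fail()
    except KeyError:
        exc_info = sys.exc_info()
    # ... hypothetical cleanup code would run here ...
    reraise(*exc_info)


def run():
    try:
        cleanup_then_reraise()
    except KeyError:
        # Function names in the traceback, outermost first
        return [frame[2] for frame in traceback.extract_tb(sys.exc_info()[2])]
```

The innermost entry is still fail, so the traceback points at where the error originally happened.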

The exec trick above is generally useful whenever some code depends on syntax that differs between versions. Because exec itself now has a different syntax, however, you cannot use it directly to execute code against an arbitrary namespace on both versions. That is not a big problem: eval combined with compile works as a drop-in replacement on both versions. Alternatively, you can bootstrap an exec_ function through exec itself.

exec_ = lambda s, *a: eval(compile(s, '<string>', 'exec'), *a)
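For example, exec_ can run a code string against an explicit namespace dictionary; the code string here is illustrative.

```python
# Valid syntax on both 2.x (where exec is a statement) and 3.x
exec_ = lambda s, *a: eval(compile(s, '<string>', 'exec'), *a)

namespace = {}
# Definitions and assignments in the code string land in `namespace`
exec_('def double(x):\n    return x * 2\nresult = double(21)', namespace)
```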

If you have a C module written against the Python C API, you are in for a hard time. As far as I know, there is still no tooling for this, and a great deal has changed. Take this as an opportunity to drop the way you built the module and redo it on top of cffi or ctypes. If that is not an option because you are something like numpy, you will just have to accept the pain. Perhaps try writing some abomination on top of the C preprocessor that makes porting easier.

Use tox for local testing. Being able to run your tests against all Python versions at once is very useful and will find many problems for you.

Outlook

Unified codebases for 2.x and 3.x are now definitely within reach. Most of the porting time will still be spent figuring out how APIs behave with regard to Unicode and how they interact with other modules that may have changed their APIs. In any case, if you are considering porting a library, don't bother with 2.5 and older or with 3.0 to 3.2, and it will not hurt as much.
