Experience Porting a Project from Python 2.x to Python 3.x
After the pain of porting Jinja2 to Python 3, I put the project aside for a while, because I was afraid of breaking Python 3 compatibility. My approach had been to keep a single Python 2 codebase and translate it to Python 3 with the 2to3 tool at installation time. Unfortunately, that means every change, however small, has to go through the translation step again, which kills iterative development. Fortunately, if you target the right Python versions, you can avoid this problem and concentrate on your work.
Thomas Waldmann from the MoinMoin project ran Jinja2 through my python-modernize and unified the codebase so that it runs on Python 2.6, 2.7 and 3.3 at the same time. After only a small amount of cleanup, the code was clean, ran on all of those versions, and for the most part looked like ordinary Python code.
Inspired by this, I went over the code several more times and began migrating other code as well, to enjoy the benefits of a unified codebase.
Below I will share some tips to achieve a similar experience.
Drop Python 2.5, 3.1 and 3.2
This is the most important point. Dropping 2.5 is easy, because almost nobody uses it anymore. Dropping 3.1 and 3.2 is not a big problem either, given how few people use Python 3 so far. But why drop these versions? The answer is that 2.6 and 3.3 share a lot of syntax and features, which makes it possible to write code that works well on both:
- Compatible string literals. 2.6 and 3.3 support the same string syntax. You can use 'foo' for a native string (a byte string on 2.x, a Unicode string on 3.x), u'foo' for a Unicode string, and b'foo' for a byte string or bytes object.
- Compatible print syntax. If you have print statements around, you can add from __future__ import print_function and start using the print function, without binding it to a different name or running into other odd inconsistencies.
- Compatible exception syntax. The except Exception as e syntax introduced in Python 2.6 is also the syntax 3.x uses to catch exceptions.
- Class decorators are available. They are very useful for fixing up moved interfaces automatically without leaving a trace in the class definition. For example, on Python 2.x they can rename __next__ to next, or __str__ to __unicode__.
- The built-in next() function calls either __next__ or next. This is useful because it is about as fast as calling the method directly, so you don't pay much compared with adding runtime checks or writing your own wrapper function.
- Python 2.6 has the bytearray type, with the same interface as in Python 3.3. This is useful because 2.6 lacks Python 3.3's bytes object: it does have a built-in with that name, but it is only an alias for str and behaves very differently.
- Python 3.3 reintroduces the bytes-to-bytes and string-to-string codecs that were broken in 3.1 and 3.2. Unfortunately, their interface is clunkier now and the aliases are missing, but it is much closer to what we had in 2.x than before.
The last point is especially useful for stream-based encoding and decoding; that functionality was simply missing from 3.0 until 3.3 restored it.
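As a quick illustration of that common subset, here is a small sketch that uses only syntax accepted by both 2.6+ and 3.3+. The Countdown class, the PY2 flag and this cut-down implements_iterator decorator are made up for the example; on 3.x the decorator leaves the class untouched, while on 2.x it adds a next alias so the built-in next() finds the method.

```python
from __future__ import print_function  # no-op on 3.x, print() function on 2.6+

import sys

PY2 = sys.version_info[0] == 2


def implements_iterator(cls):
    # Class decorators work on 2.6+ and 3.x; on 2.x alias __next__ as next.
    if PY2:
        cls.next = cls.__next__
    return cls


@implements_iterator
class Countdown(object):
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


native = 'foo'  # native string: bytes on 2.x, text on 3.x
text = u'foo'   # text on both (u'' is valid again from 3.3)
data = b'foo'   # bytes on both (b'' is accepted from 2.6)

try:
    int('not a number')
except ValueError as exc:  # "except ... as" works on 2.6+ and 3.x
    error_name = type(exc).__name__

buf = bytearray(data)  # same mutable interface on 2.6 and 3.3
buf[0:1] = b'g'

it = Countdown(3)
first = next(it)  # built-in next() calls next on 2.x, __next__ on 3.x
print(native, text, bytes(buf), first, error_name)
```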
Yes, the six module can get you a bit further, but do not underestimate the value of looking at clean code. During the Python 3 port I almost lost interest in maintaining Jinja2, because the codebase started to frustrate me. Back then, a unified codebase looked ugly and cost performance (six.b('foo') and six.u('foo') everywhere), or else suffered from the slow iteration of 2to3. Not having to deal with any of that brings back the joy of coding. Jinja2's code is now very clean, and you have to look hard to find the Python 2/3 compatibility support; only a few places use a statement like if PY2:.
The rest of this article assumes these are the Python versions you want to support. Trying to support Python 2.5 as well is very painful, and I strongly recommend against it. Supporting 3.2 is possible if you are willing to wrap all your strings in function calls, but I don't recommend that for aesthetic and performance reasons.
Skip six
Six is a nice library, and the Jinja2 port started out with it, but in the end it does not pull its own weight for a Python 3 port, since some things are still missing. You do need six if you want to support Python 2.5 as well, but from 2.6 on there is little reason to use it. Jinja2 has its own compatibility module with the few helpers it needs; including a few lines of non-Python 3 code, the whole module is less than 80 lines.
Skipping six also spares you the trouble of users expecting a different version of six because another library depends on it, and of pulling yet another dependency into your project.
Start using Modernize
python-modernize is a good place to start the port. Like 2to3, it rewrites your code, but it produces code that runs on both versions. It is still fairly buggy and its default options are not great, but it can take you quite far by doing the boring parts for you. You still need to review the result and clean up some imports and ugly output.
Fix your tests
Before doing anything else, go through your tests and make sure they still make sense. Many of the problems in the Python 3.0 and 3.1 standard library came from the fact that the behavior of the test suite changed during the conversion to Python 3.
Write a compatibility module
So you are going to skip six. Can you do without helpers entirely? The answer is no. You still need a small compatibility module, but it is small enough to keep inside your package. Here is a basic example of what such a module might look like:
```python
import sys

PY2 = sys.version_info[0] == 2

if not PY2:
    text_type = str
    string_types = (str,)
    unichr = chr
else:
    text_type = unicode
    string_types = (str, unicode)
    unichr = unichr
```
The exact contents of the module depend on how much actually changed for you. In Jinja2 I put a bunch of functions in it, including ifilter, imap and similar itertools functions that became built-ins in 3.x. (I stuck with the Python 2.x names to make it clear to readers of the code that the iterator behavior is intended and not a bug.)
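For illustration, the itertools part of such a module might look like the following sketch. Only ifilter and imap are named above; izip and range_type are assumptions added to show the same pattern. The 2.x branch is never executed on 3.x, so its names do not need to exist there.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # 2.x: the lazy variants live in itertools; xrange is the lazy range.
    from itertools import imap, ifilter, izip
    range_type = xrange  # noqa: F821 (only evaluated on 2.x)
else:
    # 3.x: the built-ins already return lazy iterators.
    imap = map
    ifilter = filter
    izip = zip
    range_type = range
```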
Test for version 2.x instead of 3.x
At some point you will need to check whether you are running on 2.x or 3.x. In that case, I recommend checking for Python 2 and putting Python 3 in the else branch, rather than the other way round. That way, when Python 4 comes along, you will get fewer unpleasant surprises.
Good:
```python
if PY2:
    def __str__(self):
        return self.__unicode__().encode('utf-8')
```
Less ideal:
```python
if not PY3:
    def __str__(self):
        return self.__unicode__().encode('utf-8')
```
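To make the difference concrete, here is a tiny sketch that evaluates both styles of check for a given major version number. pick_branches is a hypothetical helper, and a Python 4 is of course hypothetical too.

```python
def pick_branches(major):
    """Hypothetical helper: which branch each style of check selects."""
    PY2 = major == 2
    PY3 = major == 3

    # Recommended: test for 2.x and let everything newer take the else branch.
    good = '2.x code' if PY2 else '3.x code'

    # Less ideal: "not PY3" is also true for any later major version.
    less_ideal = '2.x code' if not PY3 else '3.x code'

    return good, less_ideal


for major in (2, 3, 4):
    print(major, pick_branches(major))
```

With major version 4, the recommended check falls through to the 3.x code, while the not PY3 check would silently run the 2.x definition of __str__.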
String processing
The biggest change in Python 3 is undoubtedly the change to the Unicode interface. Unfortunately, these changes are very painful in places, and they are handled inconsistently throughout the standard library. Most of the porting time will be spent on string handling. The topic could fill an article of its own, but here is a quick cheat sheet for porting that Jinja2 and Werkzeug follow:
'foo' always refers to the native string of the implementation. This is the string type used for identifiers, source code, file names and other low-level functions. In addition, on 2.x such a literal can be used together with Unicode strings as long as it contains only ASCII characters.
This property is very useful for unified codebases, because the general trend in Python 3 is to introduce Unicode into interfaces that previously did not support it, never the reverse. Since native string literals "upgrade" to Unicode on 3.x while still mixing reasonably well with Unicode on 2.x, this literal is very flexible.
For example, the datetime.strftime function strictly does not support Unicode in Python 2, and is Unicode-only in 3.x. However, because in most cases the return value on 2.x is ASCII only, code like this works well on both 2.x and 3.x:
```
>>> u'<p>Current time: %s' % datetime.datetime.utcnow().strftime('%H:%M')
u'<p>Current time: 23:52'
```
The string passed to strftime is a native string (bytes on 2.x, Unicode on 3.x). The return value is again a native string containing only ASCII characters. Therefore, once it has been formatted into the Unicode string, the result is a Unicode string on both 2.x and 3.x.
u'foo' always refers to a Unicode string. Many libraries already had very good Unicode support on 2.x, so this literal should not surprise many people.
b'foo' always refers to something that holds arbitrary bytes. Since 2.6 has no bytes object like the one in Python 3.3, and Python 3.3 lacks a real byte string, the usefulness of this literal is somewhat limited. It becomes much more useful when paired with the bytearray object, which has the same interface on 2.x and 3.x.
Since a bytearray is mutable, it is quite efficient for modifying raw bytes, and you can turn the final result into something more conventional by wrapping it in bytes() again.
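A short sketch of that bytearray workflow, using only operations that behave the same on 2.6+ and 3.x (the byte values are arbitrary):

```python
raw = b' foo '
buf = bytearray(raw)

stripped = buf.strip()  # a new bytearray(b'foo'); buf itself is unchanged
buf[1:4] = b'bar'       # mutable: splice new bytes in place -> b' bar '
buf.extend(b'!')        # append more raw bytes -> b' bar !'
result = bytes(buf)     # back to an immutable bytes object
```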
In addition to these basic rules, I also added the text_type, unichr and string_types variables to the compatibility module shown above. With those in place, the big changes are:
- isinstance(x, basestring) becomes isinstance(x, string_types).
- isinstance(x, unicode) becomes isinstance(x, text_type).
- isinstance(x, str), where the intention is to catch bytes, becomes isinstance(x, bytes) or isinstance(x, (bytes, bytearray)).
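Putting these names to use, a helper that accepts either text or bytes might look like the sketch below. to_text and is_string are hypothetical helpers, and UTF-8 as the default encoding is an assumption. On 2.x, bytes is just an alias for str, so the bytes branch catches native strings there.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (only evaluated on 2.x)
    string_types = (str, unicode)  # noqa: F821
else:
    text_type = str
    string_types = (str,)


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as text, decoding bytes if needed."""
    if isinstance(value, text_type):           # was: isinstance(value, unicode)
        return value
    if isinstance(value, (bytes, bytearray)):  # was: isinstance(value, str)
        return value.decode(encoding)
    raise TypeError('expected text or bytes, got %s' % type(value).__name__)


def is_string(value):
    """Hypothetical helper: True for any string type of this interpreter."""
    return isinstance(value, string_types)     # was: isinstance(value, basestring)
```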
I also created an implements_to_string class decorator that helps implement classes with __unicode__ or __str__ methods:
```python
if PY2:
    def implements_to_string(cls):
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda x: x.__unicode__().encode('utf-8')
        return cls
else:
    implements_to_string = lambda x: x
```
The idea is that you implement only __str__ on both 2.x and 3.x and have it return a Unicode string (yes, that looks a little odd on 2.x). On 2.x the decorator automatically renames it to __unicode__ and adds a new __str__ that calls __unicode__ and encodes its return value as UTF-8. This pattern has been quite common in 2.x modules in the past; Jinja2 and Django, for instance, use it.
The following is an example of this usage:
```python
@implements_to_string
class User(object):
    def __init__(self, username):
        self.username = username

    def __str__(self):
        return self.username
```
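To see what the decorator does in each branch, the sketch below builds both flavours and applies them to the same User class on a 3.x interpreter. The py2 switch and the two factory functions are hypothetical scaffolding. Python 3's str() refuses a __str__ that returns bytes, so the 2.x-style methods are called directly instead of through str().

```python
def make_implements_to_string(py2):
    """Return the 2.x or 3.x flavour of the decorator (py2 is a hypothetical
    switch so that both branches can be exercised on a 3.x interpreter)."""
    if py2:
        def implements_to_string(cls):
            cls.__unicode__ = cls.__str__  # the text-returning method
            cls.__str__ = lambda x: x.__unicode__().encode('utf-8')  # bytes
            return cls
        return implements_to_string
    return lambda cls: cls


def make_user_class(decorator):
    @decorator
    class User(object):
        def __init__(self, username):
            self.username = username

        def __str__(self):
            return self.username  # always return text
    return User


name = u'J\xfcrgen'  # "Jürgen", non-ASCII on purpose

Py3User = make_user_class(make_implements_to_string(py2=False))
Py2User = make_user_class(make_implements_to_string(py2=True))

as_text = str(Py3User(name))               # 3.x: __str__ returns text directly
py2_unicode = Py2User(name).__unicode__()  # 2.x flavour: renamed text method
py2_str = Py2User(name).__str__()          # 2.x flavour: UTF-8 encoded bytes
```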
Changes to the metaclass syntax
Because Python 3 changed the syntax for declaring a metaclass in an incompatible way, porting is a bit harder than it should be. Six has a with_metaclass function that works around this, but it generates a dummy class that shows up in the inheritance tree. For the Jinja2 port I was not happy with that solution and modified it a bit. The external API is the same, but the implementation uses a temporary class to hook in the metaclass. The benefit is that you don't pay a performance penalty for using it, and your inheritance tree stays clean.
The code is a bit hard to understand. The basic idea exploits the fact that a metaclass can customize class creation and is picked up from the parent class. This particular implementation uses a metaclass that removes its own class from the inheritance tree when a subclass is created. The end result is that the function creates a dummy class with a dummy metaclass. Once the dummy class is subclassed, the dummy metaclass is used, and its constructor builds a new class from the original parent classes and the metaclass that was actually intended. That way, neither the dummy class nor the dummy metaclass ever shows up.
The solution looks as follows:
```python
def with_metaclass(meta, *bases):
    class metaclass(meta):
        __call__ = type.__call__
        __init__ = type.__init__

        def __new__(cls, name, this_bases, d):
            if this_bases is None:
                return type.__new__(cls, name, (), d)
            return meta(name, bases, d)
    return metaclass('temporary_class', None, {})
```
And here is how you use it:
```python
class BaseForm(object):
    pass

class FormType(type):
    pass

class Form(with_metaclass(FormType, BaseForm)):
    pass
```
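A quick way to check the claim that neither helper class shows up is to compare the resulting MRO with that of a simple dummy-base approach. with_metaclass_dummy below is a simplified stand-in for that approach, not six's actual source, and the created_by attribute exists only to make the metaclass's effect visible.

```python
def with_metaclass(meta, *bases):
    # The helper shown above: the temporary class removes itself on subclassing.
    class metaclass(meta):
        __call__ = type.__call__
        __init__ = type.__init__

        def __new__(cls, name, this_bases, d):
            if this_bases is None:
                return type.__new__(cls, name, (), d)
            return meta(name, bases, d)
    return metaclass('temporary_class', None, {})


def with_metaclass_dummy(meta, *bases):
    # Simplified dummy-base approach (illustrative): the intermediate class
    # created here stays in the inheritance tree.
    return meta('NewBase', bases, {})


class FormType(type):
    def __new__(mcs, name, bases, d):
        d.setdefault('created_by', mcs.__name__)  # visible metaclass side effect
        return type.__new__(mcs, name, bases, d)


class BaseForm(object):
    pass


class Form(with_metaclass(FormType, BaseForm)):
    pass


class DummyForm(with_metaclass_dummy(FormType, BaseForm)):
    pass


def mro_names(cls):
    return [c.__name__ for c in cls.__mro__]
```

Form ends up with FormType as its metaclass and BaseForm as its only base, while DummyForm carries the extra NewBase class in its MRO.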
Dictionary
One of the more annoying changes in Python 3 is the change to the dictionary iteration protocol. In Python 2, all dictionaries have keys(), values() and items() methods that return lists, and iterkeys(), itervalues() and iteritems() methods that return iterators. In Python 3 this split is gone: the iter* methods no longer exist, and keys(), values() and items() return view objects instead.
keys() returns a key view that behaves like a read-only set; values() returns a read-only container that is iterable (but not an iterator!); and items() returns a read-only, set-like object. Unlike a regular set, however, an items view can also refer to mutable objects, in which case some of its methods fail at runtime.
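The following sketch shows these view behaviors on a Python 3 interpreter; the dictionaries are arbitrary examples.

```python
d = {'a': 1, 'b': 2}
keys = d.keys()

# A view is iterable, but not an iterator: next() on it fails.
try:
    next(keys)
    keys_is_iterator = True
except TypeError:
    keys_is_iterator = False

# The key view supports set operations with ordinary sets ...
common = keys & {'b', 'c'}  # {'b'}

# ... and it is live: it reflects later changes to the dictionary.
d['c'] = 3
live_len = len(keys)  # 3

# The items view is set-like too, but set operations have to hash the
# (key, value) pairs, so they fail at runtime when a value is mutable.
m = {'a': [1]}
try:
    m.items() | set()
    items_union_failed = False
except TypeError:
    items_union_failed = True
```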
On the positive side, many people never noticed that views are not iterators, so in many cases you can simply ignore the difference.
Werkzeug and Django implement a number of custom dictionary objects, and in both cases the decision was to ignore the existence of view objects and simply let keys() and its friends return iterators.
Due to limitations of the Python interpreter, this is currently the only sensible thing to do. It does have a few problems, though:
- Because views are not iterators themselves, iterating over one in the general case creates a temporary object for no good reason.
- The set-like behavior of the built-in dictionary views cannot be replicated in pure Python, due to limitations of the interpreter.
- Implementing views for 3.x and iterators for 2.x would mean a lot of duplicated code.
This is how the Jinja2 codebase iterates over dictionaries:
```python
if PY2:
    iterkeys = lambda d: d.iterkeys()
    itervalues = lambda d: d.itervalues()
    iteritems = lambda d: d.iteritems()
else:
    iterkeys = lambda d: iter(d.keys())
    itervalues = lambda d: iter(d.values())
    iteritems = lambda d: iter(d.items())
```
For implementing dictionary-like objects, a class decorator once again becomes a viable approach:
```python
if PY2:
    def implements_dict_iteration(cls):
        cls.iterkeys = cls.keys
        cls.itervalues = cls.values
        cls.iteritems = cls.items
        cls.keys = lambda x: list(x.iterkeys())
        cls.values = lambda x: list(x.itervalues())
        cls.items = lambda x: list(x.iteritems())
        return cls
else:
    implements_dict_iteration = lambda x: x
```
In that case, all you need to do is implement keys() and its friends as iterators, and the rest happens automatically:
```python
@implements_dict_iteration
class MyDict(object):
    ...

    def keys(self):
        for key, value in iteritems(self):
            yield key

    def values(self):
        for key, value in iteritems(self):
            yield value

    def items(self):
        ...
```
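Because the class above leaves its storage elided, here is a self-contained variant with hypothetical storage (a list of key/value pairs) so that it actually runs. The OrderedPairs name and its __getitem__ and __iter__ methods are assumptions for the example; the helpers are the ones shown above. __iter__ is written as its own generator because on 2.x the decorator turns keys() into a list-returning method.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    iteritems = lambda d: d.iteritems()

    def implements_dict_iteration(cls):
        cls.iterkeys = cls.keys
        cls.itervalues = cls.values
        cls.iteritems = cls.items
        cls.keys = lambda x: list(x.iterkeys())
        cls.values = lambda x: list(x.itervalues())
        cls.items = lambda x: list(x.iteritems())
        return cls
else:
    iteritems = lambda d: iter(d.items())
    implements_dict_iteration = lambda x: x


@implements_dict_iteration
class OrderedPairs(object):
    """Hypothetical read-only mapping stored as a list of (key, value) pairs."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __iter__(self):
        # Its own generator, so it stays an iterator on both 2.x and 3.x.
        for key, value in self._pairs:
            yield key

    def keys(self):
        for key, value in iteritems(self):
            yield key

    def values(self):
        for key, value in iteritems(self):
            yield value

    def items(self):
        for pair in self._pairs:
            yield pair


mapping = OrderedPairs([('a', 1), ('b', 2)])
```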
General iterator changes
Since iterators changed in general, a little help is needed to make the transition painless. The only real change is the move from next() to __next__(), and fortunately that is already handled transparently. The only thing you really need to change is going from x.next() to next(x); the language does the rest.
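In practice that mostly means calling the built-in next() on an explicit iterator, optionally with a default. The two helpers below are hypothetical examples; the next(iterator, default) form of the built-in exists from 2.6 on.

```python
def first(iterable, default=None):
    """Hypothetical helper: first item of any iterable, or a default."""
    # The 2.x-only spelling would be iter(iterable).next().
    return next(iter(iterable), default)


def pairs(iterable):
    """Hypothetical helper: group an iterable into consecutive pairs."""
    it = iter(iterable)
    result = []
    for left in it:
        right = next(it, None)  # instead of it.next() (2.x) / it.__next__() (3.x)
        result.append((left, right))
    return result
```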
If you plan to define iterators, a class decorator once again becomes a viable option:
```python
if PY2:
    def implements_iterator(cls):
        cls.next = cls.__next__
        del cls.__next__
        return cls
else:
    implements_iterator = lambda x: x
```
To implement such a class, you only need to define the iteration step method, __next__, in all versions:
```python
@implements_iterator
class UppercasingIterator(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter).upper()
```
Conversion codecs
One of the nice features of the Python 2 encoding protocol is that it is independent of types. You could, for instance, register an encoding that converts a CSV file into a NumPy array if you wanted to. However, this feature was not widely known, because the main public interface of encodings was attached to string objects. Since codecs became stricter in 3.x, much of this functionality was removed in 3.0, but it was reintroduced in 3.3 after it proved useful. Essentially, all codecs that did not convert between Unicode and bytes were unavailable before 3.3; the hex and base64 codecs are among them.
There are two use cases for these codecs: operations on strings and operations on streams. The former is well known as str.encode() in 2.x, but if you want to support both 2.x and 3.x, it looks a bit different now because of the changed string API:
```
>>> import codecs
>>> codecs.encode(b'Hey!', 'base64_codec')
'SGV5IQ==\n'
```
You will also notice that in 3.3 the codecs lack their aliases, so you have to write 'base64_codec' instead of 'base64'.
(These codecs are preferable to the functions in the binascii module because their incremental encoding and decoding support also makes stream-based operations possible.)
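Here is a sketch of both use cases side by side, using hex_codec rather than base64_codec. Hex is chosen because it encodes each byte independently, so writing the data in chunks yields the same output as encoding it all at once; that is not true of base64, whose padding depends on where the chunks break. The chunk boundaries are arbitrary.

```python
import codecs
import io

# One-shot transformation: bytes in, bytes out, using the full codec name.
one_shot = codecs.encode(b'Hey!', 'hex_codec')

# Stream-based transformation: wrap a binary stream in the codec's writer
# and feed it chunks; each chunk is encoded as it is written.
sink = io.BytesIO()
writer = codecs.getwriter('hex_codec')(sink)
for chunk in (b'He', b'y!'):
    writer.write(chunk)
streamed = sink.getvalue()

# Decoding reverses the transformation.
decoded = codecs.decode(streamed, 'hex_codec')
```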
Other considerations
There are still a few areas for which I don't have a good solution, or that are generally annoying to deal with, but they are getting fewer. Unfortunately, some of them are now part of the Python 3 API and are hard to discover until you trigger an edge case.
File system and file I/O access remain annoying on Linux, because they are not based on Unicode. The open() function and the file system layer have dangerous platform-specific defaults. For example, if I SSH from a de_AT machine into an en_US machine, Python likes to fall back to ASCII for both file system and file operations.
In general, I have found that the most reliable way to handle text on Python 3, which also works well on 2.x, is to open files in binary mode and decode explicitly. Alternatively, you can use the codecs.open or io.open functions on 2.x and the built-in open function with an explicit encoding on Python 3.
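A minimal sketch of both approaches, assuming UTF-8 as the file encoding and using a throwaway temporary directory (the file name is made up):

```python
import io
import os
import shutil
import tempfile

text = u'Gr\xfc\xdfe'  # "Grüße": non-ASCII, so an ASCII default would break
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'greeting.txt')

try:
    # Approach 1: binary mode, explicit encode on write and decode on read.
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))
    with open(path, 'rb') as f:
        via_binary = f.read().decode('utf-8')

    # Approach 2: io.open with an explicit encoding (io.open is the built-in
    # open on 3.x and is also available on 2.6+).
    with io.open(path, 'r', encoding='utf-8') as f:
        via_io_open = f.read()
finally:
    shutil.rmtree(workdir)  # remove the temporary directory again
```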
URLs in the standard library are represented incorrectly as Unicode, which causes some URLs not to be handled correctly on 3.x.
Raising an exception with a traceback object requires a helper function, because the syntax changed. This is rare in general and easy to wrap. Since it is a syntax change, this is one of the situations where you have to move code into an exec block:
```python
if PY2:
    exec('def reraise(tp, value, tb):\n raise tp, value, tb')
else:
    def reraise(tp, value, tb):
        raise value.with_traceback(tb)
```
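A typical use is to capture an exception, run some other code, and re-raise the exception later with its original traceback. The sketch below shows that on 3.x with the 3.x branch of reraise; parse and capture are hypothetical functions that exist only to produce a traceback.

```python
import sys
import traceback


def reraise(tp, value, tb):
    # 3.x branch of the helper; on 2.x the exec'd version is used instead.
    raise value.with_traceback(tb)


def parse(text):
    return int(text)  # raises ValueError for non-numeric input


def capture():
    """Catch an exception and hand back its (type, value, traceback) triple."""
    try:
        parse('not a number')
    except ValueError:
        return sys.exc_info()


exc_info = capture()

# Later, possibly after cleanup code has run, re-raise with the original traceback.
try:
    reraise(*exc_info)
except ValueError as e:
    frame_names = [frame.name for frame in traceback.extract_tb(e.__traceback__)]
```

The innermost traceback entry still points into parse, even though the exception was raised again from reraise.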
The exec trick above is generally useful if some of your code depends on different syntax. However, because exec itself has different syntax on the two versions, you cannot use it directly to execute code against an arbitrary namespace. This is not a big deal, because eval combined with compile works as a drop-in replacement on both versions. Alternatively, you can bootstrap an exec_ function through exec itself.
```python
exec_ = lambda s, *a: eval(compile(s, '<string>', 'exec'), *a)
```
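A short usage sketch of that exec_ shim, running source code against explicit namespaces (the source strings are arbitrary examples):

```python
# The shim from above: compile in 'exec' mode, then run the code object via eval().
exec_ = lambda s, *a: eval(compile(s, '<string>', 'exec'), *a)

namespace = {}
source = 'def double(x):\n    return x * 2\n\nresult = double(21)\n'
exec_(source, namespace)  # executes the statements inside namespace

# A separate locals mapping can be passed as a third argument, as with exec.
local_ns = {}
exec_('y = z + 1', {'z': 1}, local_ns)
```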
If you have a C module written against the Python C API, brace yourself. As far as I know, there is still no tooling to handle the problem, and a great deal has changed. Take this as an opportunity to ditch the way you build the module and rewrite it on top of cffi or ctypes. If that is not an option, you will simply have to accept the pain. Perhaps some creative abuse of the C preprocessor can make porting easier.
Use tox for local testing. Being able to run your tests against all Python versions at once is very helpful and will find many problems for you.
Outlook
Unified codebases for 2.x and 3.x are now well within reach. Most of the porting time will still be spent figuring out how APIs behave with respect to Unicode, and how they interact with other modules that may have changed their own APIs. In any case, if you are planning to port a library, don't bother supporting 2.5 or earlier, or 3.0 to 3.2, and it will not hurt as much.