Google's Python engineers have released a new project that aims to make Python at least five times faster. The project is named Unladen Swallow, and it sets out to build a new virtual machine for the Python interpreter together with a new JIT compilation engine. The first-quarter goal, a 25-35% performance improvement, has already been met, and the code has been published on Google Code. For more information, see the project plan:

Goals

We want to make Python faster. At the same time, we want large, well-established applications to be able to switch to Unladen Swallow painlessly.

1. Produce a version of Python that is at least five times faster than CPython.
2. The performance of Python applications should be stable.
3. Maintain source-level compatibility with CPython applications.
4. Maintain source-level compatibility with CPython extension modules.
5. We do not want to maintain a Python implementation in the long term. We regard this project as a development branch, not a fork.

Project Overview

To meet our combined performance and compatibility goals, we chose to modify CPython rather than write a new implementation from scratch. In particular, we chose to start from CPython 2.6.1: Python 2.6 sits comfortably between 2.4/2.5 (which most applications of interest currently use) and 3.0 (the eventual future). Starting from a CPython release lets us avoid reimplementing a large number of built-in functions, objects, and standard library modules, and lets us reuse the existing, widely used CPython C extension API. Starting from a 2.x CPython release also makes it easier to migrate existing applications; starting from 3.x and asking the maintainers of large applications to port their programs first would be impractical for our intended audience.

Our main task is to improve the execution speed of Python code, while spending comparatively little effort on the Python runtime library.
Our long-term plan is to replace CPython's traditional virtual machine with a JIT built on top of LLVM, while leaving the rest of the Python runtime as untouched as possible. We have observed that Python applications spend a large share of their running time in the main eval loop. In particular, even minor changes to virtual machine components such as opcode dispatch can have a major effect on Python's performance. We believe that compiling Python code to machine code through LLVM's JIT engine will bring even greater benefits.

Some of the notable benefits:

* Moving to a JIT lets us convert Python from a stack-based machine to a register-based machine, a change that has been shown to improve performance in other similar languages.
* Even if we did nothing else, simply eliminating the need to fetch and dispatch opcodes would be a win. See http://bugs.python.org/issue4753 for more information.
* The opcode fetch/dispatch overhead of the current CPython virtual machine makes further performance optimizations nearly impossible. For example, we would like to implement type feedback and dynamic recompilation in the style of SELF-93, but we believe that implementing polymorphic inline caches in terms of CPython bytecode would be unacceptably slow.
* LLVM is particularly interesting because its code generation (codegen) is easy to use and available for multiple platforms, and because it can compile C and C++ to the same intermediate representation that we want to target from Python. This makes inlining and analysis possible across what is currently a barrier between Python and C.

With the infrastructure for generating machine code in place, we can compile Python into a more efficient implementation than the current bytecode representation allows.
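The first two benefits can be made concrete with a toy comparison. The sketch below is not CPython's eval loop or anything from Unladen Swallow; the opcode names, the tuple encodings, and the functions `run_stack` and `run_register` are all hypothetical, chosen only to show that a register-style instruction set needs fewer dispatches than a stack-style one for the same statement, `c = a + b`.

```python
# Toy illustration only: neither interpreter mirrors CPython's real eval loop.
# Opcode names and instruction encodings here are hypothetical.

def run_stack(code, env):
    """Run stack-machine instructions; return how many were dispatched."""
    stack = []
    dispatches = 0
    for op, arg in code:
        dispatches += 1  # one fetch/dispatch per instruction
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[arg] = stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return dispatches


def run_register(code, env):
    """Run three-address instructions; return how many were dispatched."""
    dispatches = 0
    for op, dest, src1, src2 in code:
        dispatches += 1  # one fetch/dispatch per instruction
        if op == "ADD":
            env[dest] = env[src1] + env[src2]
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return dispatches


# The statement `c = a + b` in each encoding.
stack_code = [("LOAD", "a"), ("LOAD", "b"), ("ADD", None), ("STORE", "c")]
register_code = [("ADD", "c", "a", "b")]

stack_env = {"a": 2, "b": 3}
register_env = {"a": 2, "b": 3}

stack_dispatches = run_stack(stack_code, stack_env)
register_dispatches = run_register(register_code, register_env)

# prints: stack=4 register=1
print("stack=%d register=%d" % (stack_dispatches, register_dispatches))
```

Both runs leave `c` equal to 5, but the stack encoding goes through the dispatch branch four times and the register encoding once. Compiling to machine code goes a step further and removes the dispatch loop altogether.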
As an example of the more efficient code that machine-code generation makes possible, take the following snippet:

    for i in range(3):
        foo(i)

At present, it is translated into something like this inefficient form, where $x stands for a hidden temporary variable:

    $x = range(3)
    while True:
        try:
            i = $x.next()
        except StopIteration:
            break
        foo(i)

Once we have a way to know that range() refers to the built-in range() function, we can turn this into something closer to the following C code, possibly using unboxed data types for the arithmetic:

    for (i = 0; i < 3; i++)
        foo(i);

The loop can then be unrolled to:

    foo(0)
    foo(1)
    foo(2)

We are deliberately designing Unladen Swallow's internals on the assumption that multiple cores are available. Servers will only gain more cores in the future, and we want to exploit that so that more work can be done in parallel. For example, one core could run a concurrent optimizer that applies increasingly expensive (and increasingly valuable) optimizations while the program runs, and another core could execute the code itself. We are also considering a concurrent garbage collector that uses yet another core to reclaim memory. Since most production servers ship with 4 to 32 cores, we believe this line of optimization could pay off handsomely. However, we still need to be sensitive to the needs of highly parallel applications rather than consuming extra cores blindly.

Note that many of these areas have already been considered or implemented by other dynamic language implementations such as JRuby, Rubinius, and Parrot, and in particular by other Python implementations such as Jython, PyPy, and IronPython. We are looking to these implementations for ideas on debugging information, regular expression performance, and improving dynamic language performance in general. This is well-trodden ground, and we want to avoid reinventing the wheel.

Milestones

Unladen Swallow will release a new version every three months, with bug-fix releases in between.

2009 Phase 1 (Q1)

Q1 is mainly devoted to making relatively minor changes to the existing CPython implementation.
Our goal is a 25-35% performance improvement over the current baseline. The goals for this phase are deliberately conservative: we want to deliver visible performance gains to client applications as soon as possible, rather than making them wait until the entire project is complete.

2009 Phase 2 (Q2)

Q2 will focus on retiring the Python virtual machine and replacing it with a functionally equivalent implementation based on LLVM. We expect some performance improvement, but that is not the main task for 2009 Q2. The main aim is to get something that runs on top of LLVM; making it fast is a matter for later phases.

2009 Phase 3 (Q3) and beyond

From Q3 onward, the task is "simply" to do this work well. We do not aim to do original work; instead we will try to use as much of the last 30 years of research as possible. See the Relevant Papers page for a partial list (far from complete) of the papers we intend to implement.

We plan to address performance in the regular expression engine, as well as in any other extension modules identified as performance bottlenecks. Regular expressions have already been identified as a good target and will be the first area we optimize.

In addition, we intend to remove the GIL and fix the state of multithreading in Python. We believe this can be achieved by implementing a more advanced garbage collector, similar to IBM's Recycler.

Our long-term goal is to make Python fast enough that types currently implemented in C for speed can be moved back to Python.

The precise performance goals for 2009 Q3 will be determined during Q2.

Original: Google, http://code.google.com/p/unladen-swallow/wiki/Proj-ectplan
Translator: danmarner, http://danmarner.yo2.cn/unladen-swallow-project-pl

Reprinting is welcome; please include links to the original and the translation.

======================================

Unladen Swallow Project Plan: a plan for optimizing Python

Note: For links to all referenced materials, see the Relevant Papers page.

Source: http://danmarner.yo2.cn/