Python status: Why PyPy is the future of Python?

Source: Internet
Author: User
Tags: continuum analytics, numba

Python is now more than just a glue scripting language. Don't believe me? Take a look at the following success stories using Python:

    • YouTube: written primarily in Python
    • NASA
    • Industrial Light & Magic: a film company that runs on Python
    • OpenStack
    • Sage: scientific software, and others (SciPy, Python(x,y))
    • Web frameworks: Django, Pyramid, Bottle ...
    • Revision control systems
    • Other good software

If you are looking for a quick introduction to the beautiful Python language, I recommend my-favorite-python-things.

High-level languages are the mainstream

Today's high-level languages let you write code that is simple and flexible. That makes them a good choice for building an application quickly, without spending time on type declarations (all the interface boilerplate that exists only to satisfy the compiler). Some people will argue that this leads to buggy code. But as Guido van Rossum asked, who doesn't test production code? Static languages can catch some errors at compile time, but they cannot catch all of them. In the end you have to write tests anyway, and in that same time you can write the tests for a dynamic language. Besides, nobody can design a perfect type system. Jim Treavor wrote a summary of this.

New techniques let us build efficient runtimes for dynamic languages (JavaScript V8, LuaJIT, Racket, Common Lisp ...) that can compete with the big platforms (JVM, .NET, ...).

All of this makes high-level languages increasingly popular in large enterprises and daily life.

Can Python continue to be legendary?

Python is very popular now, but its position is being challenged by its competitors. Python has a good ecosystem, a lot of software, and strong community support, but it lacks the efficient, advanced runtime that its competitors have.

Python as the glue language

As I said at the beginning, one of Python's features is how easily it connects to compiled libraries, which is an important reason why it became popular as a glue language 20 years ago. But the tools still in common use are old, and using them takes a lot of effort.

    • ctypes
    • C extensions are evil. They are bound to a specific version of Python and cannot be reused. What's worse, the C extension APIs of CPython 2 and CPython 3 are different. Think of what it takes to port a library to Python 3.
    • Cython: designed for writing C extensions. But I'm sure that writing a C extension is the last thing you want to do. Cython is an external tool with a compilation step. The resulting code has no dynamic behavior, yet you still have to learn its syntax. Cython does not support type inference. To use it with CPython you have to compile. Cython is also not a standard, and its code cannot be run as interpreted code. Kay Hayen, the author of Nuitka, sums this up very well in his article on static compilation.
    • SWIG, Boost: these usually require modifying the C++ code or writing some schema files.
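
To make the effort concrete, here is a minimal sketch of calling the C library's strlen through ctypes. It assumes a Unix-like system where the C library can be found with find_library (falling back to the symbols of the current process); the wrapper name c_strlen is made up for this illustration.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the current process.
# (Assumption: Unix-like system.)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# ctypes does not read C headers, so the signature has to be spelled out
# by hand. Without this, arguments and results default to C int.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: length of a NUL-terminated byte string."""
    return libc.strlen(data)


print(c_strlen(b"PyPy"))
```

Every function used this way needs its own argtypes and restype lines, which is where the manual work adds up for a large library.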

In contrast, there are a lot of new tools that handle these tasks better, with the same performance or even better.

    • cffi: a package that handles your C libraries with ease. You often need this for things like database clients and drivers, when you touch hardware, or when you interface with other software. Try it and see how easy it is to use from Python. You don't need to write any wrapper or typed code, and it supports both CPython and PyPy.
    • bitey
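
As an illustration of the cffi bullet, the sketch below calls the C library's strlen in cffi's ABI mode. It assumes the cffi package is installed and a Unix-like system; the wrapper name c_strlen is hypothetical.

```python
from cffi import FFI

ffi = FFI()

# Paste the C declaration as it appears in the header; cffi parses it, so
# no per-argument type objects have to be built by hand.
ffi.cdef("size_t strlen(const char *s);")

# ABI mode: open the C standard library namespace of the current process.
# (Assumption: Unix-like system; dlopen(None) does not work on Windows
# with Python 3.)
libc = ffi.dlopen(None)


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper around the C strlen function."""
    return libc.strlen(data)


print(c_strlen(b"PyPy"))
```

The same file runs unchanged on CPython and on PyPy, which is the portability point made above.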

Use Python as the core of your code: the other side of the glue language

The glue language has another side too. Consider how low-level, high-performance programming usually goes. The process looks something like this:

    • Idea
    • A lot of complex low-level code and scaffolding code, likely a pile of obscure generic code (for reusability).
    • Writing the glue code
    • Compile
    • Run
    • Very likely a lot of debugging, then going back to change things, given how much low-level code there is.

Thanks to Python's simplicity, its scripting nature, and its many tools, you can use Python as the template and the core of your code. This means you only need to write a minimal amount of low-level code and let Python do the rest: build the scaffolding and the environment your low-level code needs.

This is close to the Lisp idea that code is data and can be understood by other executing code (code is processed as data). The machine knows about the code it is executing at runtime and can optimize it with all the data at hand, in an ordinary way, rather than through something like C++ templates. C++ and other popular programming languages do not work this way. In the end we have fewer low-level abstractions and richer runtime information, which allows the compiler to:

    • Target hardware that was unknown when the code was written, including its supported data types and available optimizations.
    • Auto-tune (for example, for the data provided to a library such as ATLAS ...)
    • Receive more information and therefore make better inferences.
    • Free people from worrying about data types (the runtime can guarantee fast and correct use of data types).
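
One way to read the list above: once the data is known at run time, Python can generate code specialized for it. The sketch below only illustrates that idea and is not how PyPy or any tool named here works internally. make_poly is a hypothetical helper that builds the source of a polynomial evaluator for known coefficients and compiles it with the built-in compile and exec.

```python
def make_poly(coeffs):
    """Generate a function evaluating a polynomial with fixed coefficients.

    coeffs are ordered from the highest power down, e.g. [2, 0, 1] means
    2*x**2 + 1. The loop is unrolled into a single Horner expression.
    """
    expr = repr(coeffs[0])
    for c in coeffs[1:]:
        expr = f"({expr}) * x + {c!r}"
    source = f"def poly(x):\n    return {expr}\n"

    namespace = {}
    # The generated source is ordinary data until it is compiled here.
    exec(compile(source, "<generated>", "exec"), namespace)
    func = namespace["poly"]
    func.source = source  # keep the text around for inspection
    return func


poly = make_poly([2, 0, 1])
print(poly.source)
print(poly(3))
```

The coefficients end up as constants inside the generated function, so the specialized code contains no loop and no list lookups.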

So the whole process is like this:

    • Idea
    • A little Python code (the best part) to build the whole architecture, plus some low-level code, which is also pleasant because it is free of ugly templates and scaffolding. In fact, the low-level code can itself be generated from Python code.
    • Run
    • Debugging, shorter than in the previous process

In terms of performance, this process has a better future than the previous approach.

These tools already work this way: PyPy, cffi, PyOpenCL, PyCUDA, Numba, Theano ...

Think of Python as a high-speed language

There are many ways to write fast code in Python. The most popular and still widely promoted approach is to write the most demanding parts of the application in a low-level language and then call them from Python, which is unfortunate for Python.

All the great and efficient tools in Python require a lot of complex C code, which keeps other contributors away. What we want now is to write fast and beautiful Python code.

There are many tools that compile Python code into machine code, such as Nuitka, Python2C, Shed Skin, and Pythran. I think they are all failures: when you use them, you have to say goodbye to dynamic behavior. They support only a subset of the Python language and are still a long way from full support. I don't even think they will get there in the future. In addition, they do not use the advanced techniques and runtime information that make a JIT (just-in-time compilation) solution so much better.
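
For illustration, this is the kind of dynamic behavior in question. The sketch is plain Python with made-up names (Sensor, read_calibrated). Which of these features a particular compiler rejects is not covered here; the point is only that none of it can be fully resolved before the program runs.

```python
class Sensor:
    def read(self):
        return 20.0


def read_calibrated(self):
    # Replacement method, attached to the class while the program runs.
    return 20.0 + self.offset


s = Sensor()
before = s.read()

# 1. Attributes can appear on an instance at any time.
s.offset = 1.5

# 2. Methods can be swapped on the class, affecting existing instances.
Sensor.read = read_calibrated
after = s.read()

# 3. The name of the attribute to use may only be known at run time.
field = "off" + "set"
setattr(s, field, getattr(s, field) * 2)

print(before, after, s.read())
```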

Multi-core programming

On this topic, Armin Rigo's article is very good: Multicore Programming in PyPy and CPython

The design of the interpreter

To make future development easier and to reach the state of the art for dynamic languages, Python needs a suitable architecture. The current CPython architecture is too simple and too limiting, which makes it hard to build things like a JIT compiler. Here are some of the failed efforts to improve the performance of the CPython interpreter:

    • Psyco (replaced by PyPy)
    • Unladen Swallow
    • Many failed attempts to remove the GIL
    • There were also attempts to fix some of CPython's flaws, Stackless and HotPy, but because of the resistance of Guido (the creator of Python and its benevolent dictator) these projects were not merged into Python. (To be clear, HotPy is not production-ready.)

The biggest problem with CPython is its C API, which is not well designed. The implementation of other parts suffers because of it.

What can we do?

    1. Promote the use of the new tools for glue code (cffi, bitey).
    2. Stop relying on CPython internals (the C API, C extensions) in public libraries. Use intermediate tools instead:
        • cffi: simplifies working with C libraries
        • Cython: for writing portable C extensions. I don't recommend it for ordinary programming, but it really is a better and simpler way to maintain C extensions. Cython already has CPython and PyPy back ends.

Why is PyPy the trend?

PyPy provides a better architecture for optimization and further language development. For most of Python's existing problems, PyPy has provided a solution:

    • An advanced runtime and design, described in this article in The Architecture of Open Source Applications.
    • Speed: PyPy's built-in JIT is great, and sometimes (rarely, in fact) even comparable to C.
    • GIL problem: PyPy introduces a great STM (software transactional memory) implementation, described in Armin Rigo's article.
    • Glue code: cffi handles C libraries simply, and is even faster than ctypes on CPython!
    • Asynchronous programming: PyPy's built-in greenlets are a better fit than the C extension for CPython. In fact, the Stackless concept (also known as greenlets) continues to evolve in PyPy (see https://ep2012.europython.eu/conference/talks/the-story-of-stackless-python)
    • Sandboxing
    • Applications on the web and on mobile. See Dusty's article: Pushing Python Past the Present
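
A rough way to check the speed point for yourself is to time a plain Python loop under both interpreters. The sketch below uses only the standard library. count_primes is a hypothetical workload, and no timings are quoted because they depend on the machine and the interpreter version.

```python
import platform
import timeit


def count_primes(limit):
    """Count primes below limit with naive trial division (pure Python)."""
    count = 0
    for n in range(2, limit):
        d = 2
        is_prime = True
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    # Run the same file under CPython and under PyPy, then compare.
    seconds = timeit.timeit(lambda: count_primes(20000), number=5)
    print(platform.python_implementation(), f"{seconds:.3f}s")
```

Tight integer loops like this one are the kind of code a JIT can turn into machine code, while code dominated by I/O or by calls into C libraries will show much less difference.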

PyPy already supports multiple platforms (x86, x86_64, ARM).

PyPy also has an excellent modern architecture, presented in Jim Huang's talk. Its main points are:

    • A framework for interpreted languages
    • Composable components for research and production (different data models, garbage collectors; these can be swapped for specific use cases)
    • Built as a functional architecture based on a chain of components (the translation toolchain). Each step extends or transforms the program model and introduces features, and there are various back ends (JVM, JavaScript, LLVM, GCC IR, etc.). An example translation chain runs from Python code through bytecode and function objects to type inference and the JIT.
    • Includes a large number of modern optimization techniques developed at different levels of the architecture (which simplifies the task)

Getting all software to support PyPy takes a lot of effort: existing libraries need a lot of work. With the new tools, however, writing software that supports both PyPy and CPython is somewhat easier than using C extensions (as described in "What can we do?").

CPython Legacy Issues

Now let's look at the legacy of code that depends on CPython, which stems from its tight dependence on C extensions. This is mainly scientific software (NumPy, SciPy, etc.). Python was used in scientific computing long before PyPy became production-ready (which I think was about 2 years ago), and the software has evolved a lot in terms of tools, code, and community. Together, these packages form a great platform that is often used as a replacement for software such as MATLAB (some might even consider it a better choice). To get there, C extensions were the only solution available. The development of this software is still tightly bound to CPython, because it would take a lot of work to make the whole scientific stack support PyPy.

The alternative approach is to JIT on demand: decorate specific functions, compile them dynamically to machine code, and plug them in as C extensions. This does not require rewriting the whole scientific platform, and the result is just as fast. The typical project of this kind is Numba, sponsored by Continuum Analytics, which builds a strong scientific computing platform on Python libraries. Numba takes this route because its fast code has to stay compatible with the rest of the scientific code that depends on CPython. Numba is worth studying; the Numba talk at the SciPy conference is a very good introduction.
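
The decorate-and-compile approach can be sketched as follows. numba.njit is Numba's own decorator; the fallback that turns it into a no-op when Numba is not installed, and the function sum_of_squares, are assumptions added so the example runs anywhere.

```python
try:
    from numba import njit  # compiles the decorated function on first call
except ImportError:
    def njit(func):
        # Fallback (assumption for this sketch): run as ordinary Python.
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in range(n); a tight numeric loop suits a JIT."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


# The rest of the program calls it like any other Python function.
print(sum_of_squares(4))
```

Only the decorated function is compiled. Everything around it stays ordinary CPython code, which is how this approach keeps compatibility with the existing C-extension-based libraries.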

I have to say that Python's scientific computing community is great. It cares a lot about quality, ease of use, and promoting its products (many conferences are organized for this purpose: the SciPy conference, PyData, etc.). Thanks to this community, Python has become the first choice for a free scientific analysis platform. Travis Oliphant also deserves a mention for the effort he has put into uniting the community around the platform. See this blog post: Why Python is the last programming language you'll have to learn

Where's PyPy?

Unfortunately, PyPy had not yet reached production quality at that time.

Follow-up

There is an interesting follow-up discussion of this article on Reddit, about the pros and cons of using PyPy commercially. It summarizes how to build high-performance libraries on PyPy. The main advantage is that the PyPy software stack (pure Python, cffi, etc.) keeps maintenance and optimization simple (for example, lazy evaluation). The drawbacks are the ones mentioned above, mostly related to the CPython legacy.

English original: The Python condition. Why PyPy is the future of Python
