Using a simple example to peek into the internals of CPython

Source: Internet
Author: User
Tags goto

I recently spent some time exploring CPython, and I want to share some of my adventures here. Allison Kaptur's excellent guide to getting started with Python internals covers a lot of ground, and I'd like to step through my own exploration in a more organized way, so that other curious Python users can follow along.

1. Notice something strange.

In the beginning, I was just setting up nose to test some Python 3 code I had written. When I ran the tests, I got a baffling error message: "TypeError: bad argument type for built-in operation", which I had never seen from this program before.

The immediate cause of the error was obvious: I had accidentally left a pdb breakpoint ('import pdb; pdb.set_trace()') in the program. When I removed it, the tests ran fine.

However, I had also used nose to test Python 2 repos, and there a stray breakpoint did not crash nose; it just appeared to hang. The program wasn't really hung: it simply wasn't showing anything on stdout (standard output). nose does this on purpose, and it makes sense when I'm running a suite of tests: I probably just want to see the test results, not everything the programs print along the way. If you type "c" at that point, nose continues past the breakpoint as usual.

Normally, I might just shrug my shoulders, remove the breakpoint, and get on with my work. But! I was at Hacker School and had time to delve into anything that caught my interest, so I decided to take this opportunity to peek into Python's internals.

2. Make the simplest possible test case.

It turned out that the problem was a little complicated: I wasn't sure whether the cause was in nose, in pdb, or in CPython's own code. And, of course, I couldn't use any breakpoints, because breakpoints were exactly what made my program crash.

Eventually, after testing a few hypotheses, it appeared that pdb's call to 'input()' caused the crash. So: is there a difference between the implementations of input in Python 2 and Python 3? Or is something else different?

I was debugging with Jesse, and finally we realized that nose handles standard output in an interesting way:

self._buf = StringIO()
sys.stdout = self._buf

In Python, sys.stdout represents standard output, meaning that everything printed to the terminal is sent there. But since sys.stdout is accessible like any other Python variable, we can also reassign it. That is what nose does: it sets sys.stdout to a StringIO object, which is just an in-memory text buffer.

If you do this, the print function no longer shows anything in the terminal!

import sys, io
sys.stdout = io.StringIO()
print("Hello")
# Oh no, nothing printed!
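The printed text is not lost; it ends up in the buffer. Here is a minimal sketch of the same swap that reads the text back and restores the real stream afterwards. It uses only the standard library, and the helper name capture_print is made up for illustration.

```python
import io
import sys

def capture_print(message):
    """Print message while sys.stdout is swapped for an in-memory buffer."""
    original = sys.stdout          # remember the real stream
    buffer = io.StringIO()
    sys.stdout = buffer            # the same kind of swap nose performs
    try:
        print(message)             # goes into the buffer, not the terminal
    finally:
        sys.stdout = original      # always put the real stream back
    return buffer.getvalue()

captured = capture_print("Hello")
print(repr(captured))
```

The try/finally matters: without it, an exception inside the block would leave sys.stdout pointing at the buffer, and everything printed afterwards would silently disappear.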

We wondered if that line was the problem, so we built a simple test sample:

import sys, io
sys.stdout = io.StringIO()
print("hello!")  # Nothing will appear
input("Input:")  # raises a TypeError

Running this in Python 3 produces the "bad argument type for built-in operation" error we saw earlier. So now we know where to look! When sys.stdout is replaced, the built-in function 'input()' breaks in a strange way.

3. Learn a little CPython!

So we want to see how 'input' is implemented. Python has a very cool module called 'inspect' that lets you look at source code, like this:

>>> from collections import namedtuple
>>> import inspect
>>> print(inspect.getsource(namedtuple))
def namedtuple(typename, field_names, verbose=False, rename=False):
    """Returns a new subclass of tuple with named fields.
.....

However, when you call 'inspect.getsource' on 'input', the result is: "TypeError: <built-in function input> is not a module, class, method, function, traceback, frame, or code object". This means that the function is not implemented in Python. It is implemented in C, so the 'inspect' module cannot show us its code.
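The same check can be scripted. The sketch below wraps 'inspect.getsource' so that a C-implemented built-in produces a readable reason instead of a traceback; the helper name source_or_reason is invented for this example, and the exact wording of the error text varies between Python versions.

```python
import inspect
from collections import namedtuple

def source_or_reason(obj):
    """Return obj's Python source, or the reason inspect cannot provide it."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError) as exc:
        # TypeError: obj is a built-in implemented in C.
        # OSError: a Python object whose source file cannot be located.
        return "no Python source: {}".format(exc)

print(inspect.isbuiltin(input))                       # implemented in C?
print(source_or_reason(input))                        # the reason text
print(source_or_reason(namedtuple).splitlines()[0])   # first line of the source
```

'inspect.isbuiltin' is a quick way to tell in advance whether 'getsource' has any chance of working for a given function.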

However, using the magic of the cinspect module, we can view the C source code too!

>>> import cinspect; print(cinspect.getsource(input))
static PyObject *
builtin_input(PyObject *self, PyObject *args)
{
    PyObject *line;
    char *str;
.....

Well, now we know that the function we're looking for is called 'builtin_input'. From this point on we'll be reading C code rather than Python code, and debugging in the terminal rather than in the Python interpreter. You don't need to be a C expert to follow what comes next: most of the time I'm just guessing from the function names.

So, let's get the CPython source code. There we find that 'builtin_input' is a wrapper around 'builtin_input_impl', and that 'builtin_input_impl' is a function implemented in bltinmodule.c. Let's load Python into lldb, a debugger for C, and set a breakpoint at the start of that function:

flowerhack$ lldb -- /Users/flowerhack/cpython/python.exe
(lldb) breakpoint set --file bltinmodule.c --line 2337

Stepping through the source code (the process is just like in pdb: you keep pressing "n" to run the next line), we find the point where the problem first occurs:

stdout_encoding_str = _PyUnicode_AsString(stdout_encoding);
stdout_errors_str = _PyUnicode_AsString(stdout_errors);
if (!stdout_encoding_str || !stdout_errors_str)
    goto _readline_errors;  // "throws" an exception

The third line confused me at first. I read it as: "if the encoding string is empty or the errors string is empty, then we have an error." But wait a minute, shouldn't an empty errors string mean that there were no errors?

Because of this, I looked further, at the definition of _PyUnicode_AsString (another C function):

#define _PyUnicode_AsString PyUnicode_AsUTF8

That's just a macro: "Hey, whenever we call _PyUnicode_AsString, we're really calling PyUnicode_AsUTF8." So what we actually want to find is the definition of PyUnicode_AsUTF8:

char*
PyUnicode_AsUTF8(PyObject *unicode)
{
    return PyUnicode_AsUTF8AndSize(unicode, NULL);
}

It seems that all this function does is call PyUnicode_AsUTF8AndSize, so that is the function we really want to read.

There are several error cases in PyUnicode_AsUTF8AndSize, and each of them returns NULL. It seemed strange to me to return NULL in the error case instead of an error code like -1. Maybe there is a convention here that I don't know about?

Anyway, to find out exactly which error case I was hitting, I did some "print debugging": I added a print statement after every possible error condition and then ran the program. That showed that the failure happened at the call to PyUnicode_Check.
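The rejection can also be observed from Python without rebuilding the interpreter. The sketch below assumes CPython, where 'ctypes.pythonapi' exposes the interpreter's own C functions; the helper name try_as_utf8 is invented for this example. It calls PyUnicode_AsUTF8AndSize once with a str and once with an object that is not a str.

```python
import ctypes

# Assumption: running on CPython, so the C API is reachable via ctypes.pythonapi.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
as_utf8.restype = ctypes.c_char_p

def try_as_utf8(obj):
    """Return the UTF-8 bytes for obj, or the exception the C function set."""
    try:
        return as_utf8(obj, None)   # None is passed as a NULL size pointer
    except TypeError as exc:
        return exc

ok = try_as_utf8("utf-8")           # a real str passes the type check
failed = try_as_utf8(None)          # anything that is not a str is rejected

print(ok)
print(type(failed).__name__, failed)
```

This also sheds some light on the NULL question: a C function that returns a pointer signals failure by returning NULL and recording an exception, and 'ctypes.pythonapi' checks for that recorded exception after the call and raises it in Python.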

So is there a check in Python 3 that Python 2 doesn't perform? Well, we can compare the two versions of the source code to find out. It turns out that the Python 2 source has no such check, but Python 3 does. So if sys.stdout is replaced with something that doesn't report a proper encoding, 'input()' fails in Python 3 but not in Python 2.
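A quick look at the replacement stream from the Python side makes this plausible. This is only an illustration: it assumes the replacement is an 'io.StringIO', as in the test case above, and it does not step through the C code.

```python
import io

buffer = io.StringIO()

# An in-memory text buffer never encodes its contents to bytes, so it has
# no encoding name and no error-handler name to report.
print("encoding:", buffer.encoding)
print("errors:", buffer.errors)
print("encoding is a str:", isinstance(buffer.encoding, str))
```

A real terminal stream reports strings such as an encoding name here, which is what code expecting a str would be looking for.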

4. Reap the rewards!

That seems like a lot of work just to find the cause of an ordinary, easily fixed bug. And maybe it was. But! We learned some cool things along the way. While testing my hypotheses, I found out a lot about how Python handles standard input and output. I gained experience reading a large, macro-heavy C project. I was surprised to learn that the goto statement is still in use. Used consistently, though, it makes sense: doing something like exception handling in C without goto looks tedious. And browsing (strictly speaking, comparing) the input functions in bltinmodule.c for Python 2 and Python 3 was really cool. The refactoring and cleanup look neat.

Disclaimer: setting up cinspect is a bit complicated. The project's README will help, but pay attention to the "indexing your sources" step, which can take a long time.

If you've used gdb before, all you need to know is that lldb is very similar. If you've used neither, both work a bit like pdb as far as debugging goes.
