Cgo and Python


If you look at the new Datadog Agent, you might notice that most of the codebase is written in Go, although the checks we use to gather metrics are still written in Python. This is possible because the Datadog Agent, a regular Go binary, embeds a CPython interpreter that can be called whenever it needs to execute Python code. This process can be made transparent using an abstraction layer so that you can still write idiomatic Go code even when there's Python running under the hood.

There are a number of reasons why you might want to embed Python in a Go application:

    • It is useful during a port: you can gradually move portions of an existing Python project to the new language without losing any functionality during the process.
    • You can reuse existing Python software or libraries without re-implementing them in the new language.
    • You can dynamically extend your software by loading and executing regular Python scripts, even at runtime.

The list could go on, but for the Datadog Agent the last point is crucial: we want you to be able to execute custom checks or change existing ones without forcing you to recompile the Agent, or in general, to compile anything.

Embedding CPython is quite easy and well documented. The interpreter itself is written in C, and a C API is provided to programmatically perform operations at a very low level, like creating objects, importing modules, and calling functions.

In this article we'll show some code examples, and we'll focus on keeping the Go code idiomatic while interacting with Python at the same time. But before we proceed we need to address a small gap: the embedding API is C but our main application is Go, so how can this possibly work?

Introducing cgo

There are a number of good reasons why you might not want to introduce cgo in your stack, but embedding CPython is one of those cases where you must. Cgo is not a language nor a compiler. It's a Foreign Function Interface (FFI), a mechanism we can use in Go to invoke functions and services written in a different language, specifically C.

When we say "cgo" we're actually referring to a set of tools, libraries, functions, and types that are used by the Go toolchain under the hood so we can keep running go build to get our Go binaries. An absolutely minimal example of a program using cgo looks like this:

    package main

    // #include <float.h>
    import "C"

    import "fmt"

    func main() {
        fmt.Println("Max float value of float is", C.FLT_MAX)
    }

The comment block right above the import "C" instruction is called a "preamble" and can contain actual C code, in this case a header inclusion. Once imported, the "C" pseudo-package lets us "jump" to the foreign code, accessing the FLT_MAX constant. You can build the example by invoking go build, the same as if it were plain Go.

If you want to have a look at all the work cgo does under the hood, run go build -x. You'll see that the "cgo" tool is invoked to generate some C and Go modules, then the C and Go compilers are invoked to build the object modules, and finally the linker puts everything together.

You can read more about cgo on the Go blog. The article contains more examples and a few useful links to get further into the details.

Now that we have an idea of what cgo can do for us, let's see how we can run some Python code using this mechanism.

Embedding CPython: a primer

A Go program that, technically speaking, embeds CPython is not as complicated as you might expect. In fact, at the bare minimum, all we have to do is initialize the interpreter before running any Python code and finalize it when we're done. Please note that we're going to use Python 2.x throughout all the examples, but everything we'll see can be applied to Python 3.x as well with very little adaptation. Let's look at an example:

    package main

    // #cgo pkg-config: python-2.7
    // #include <Python.h>
    import "C"

    import "fmt"

    func main() {
        C.Py_Initialize()
        fmt.Println(C.GoString(C.Py_GetVersion()))
        C.Py_Finalize()
    }

The example above does exactly what the following Python code would do:

    import sys
    print(sys.version)

You can see we put a #cgo directive in the preamble; those directives are passed to the toolchain to let you change the build workflow. In this case, we tell cgo to invoke "pkg-config" to gather the flags needed to build and link against a library called "python-2.7" and pass those flags to the C compiler. If you have the CPython development libraries installed in your system along with pkg-config, this lets you keep using a plain go build to compile the example above.

Back to the code, we use Py_Initialize() and Py_Finalize() to set up and shut down the interpreter, and the Py_GetVersion C function to retrieve the string containing the version information for the embedded interpreter.

If you're wondering, all the cgo bits we need to put together to invoke the C Python API are boilerplate code. This is why the Datadog Agent relies on go-python for all the embedding operations; the library provides a Go-friendly thin wrapper around the C API and hides the cgo details. This is another basic embedding example, this time using go-python:

    package main

    import (
        python "github.com/sbinet/go-python"
    )

    func main() {
        python.Initialize()
        python.PyRun_SimpleString("print 'hello, world!'")
        python.Finalize()
    }

This looks closer to regular Go code: no more cgo exposed, and we can use Go strings back and forth while accessing the Python API. Embedding looks powerful and developer friendly. Time to put the interpreter to good use: let's try to load a Python module from disk.

We don't need anything complex on the Python side; the ubiquitous "Hello World" will serve the purpose:

    # foo.py
    def hello():
        """
        Print hello world for fun and profit.
        """
        print "hello, world!"

The Go code is slightly more complex but still readable:

    // main.go
    package main

    import "github.com/sbinet/go-python"

    func main() {
        python.Initialize()
        defer python.Finalize()

        fooModule := python.PyImport_ImportModule("foo")
        if fooModule == nil {
            panic("Error importing module")
        }

        helloFunc := fooModule.GetAttrString("hello")
        if helloFunc == nil {
            panic("Error importing function")
        }

        // The Python function takes no params but when using the C api
        // we're required to send (empty) *args and **kwargs anyways.
        helloFunc.Call(python.PyTuple_New(0), python.PyDict_New())
    }

Once built, we need to set the PYTHONPATH environment variable to the current working directory so that the import statement is able to find the foo.py module. From a shell, the command would look like this:

    $ go build main.go && PYTHONPATH=. ./main
    hello, world!

The dreadful Global Interpreter Lock

Having to bring in cgo in order to embed Python is a tradeoff: builds will be slower, the garbage collector won't help us manage memory used by the foreign system, and cross compilation will be non-trivial. Whether or not these are concerns for a specific project can be debated, but there's something I deem not negotiable: the Go concurrency model. If we couldn't run Python from a goroutine, using Go altogether would make very little sense.

Before playing with concurrency, Python, and cgo, there's something we need to know about: the Global Interpreter Lock, also known as the GIL. The GIL is a mechanism widely adopted in language interpreters (CPython is one of those) that prevents more than one thread from running Python code at the same time. This means that no Python program executed by CPython will ever be able to run in parallel within the same process. Concurrency is still possible and, in the end, the lock is a good tradeoff between speed, security, and implementation simplicity. So why should this pose a problem when it comes to embedding?

When a regular, non-embedded Python program starts, there's no GIL involved, to avoid useless overhead in locking operations; the GIL starts the first time some Python code requests to spawn a thread. For each thread, the interpreter creates a data structure to store information about the current state and locks the GIL. When the thread has finished, the state is restored and the GIL unlocked, ready to be used by other threads.

When we run Python from a Go program, none of the above happens automatically. Without the GIL, multiple Python threads could be created by our Go program. This could cause a race condition leading to fatal runtime errors, and most likely a segmentation fault bringing down the whole Go application.

The solution to this problem is to explicitly invoke the GIL whenever we run multithreaded code from Go; the code is not complex because the C API provides all the tools we need. To better expose the problem, we need to do something CPU bound from Python. Let's add these functions to our foo.py module from the previous example:

    # foo.py
    import sys

    def print_odds(limit=10):
        """
        Print odds numbers < limit
        """
        for i in range(limit):
            if i%2:
                sys.stderr.write("{}\n".format(i))

    def print_even(limit=10):
        """
        Print even numbers < limit
        """
        for i in range(limit):
            if i%2 == 0:
                sys.stderr.write("{}\n".format(i))

We'll try to print odd and even numbers concurrently from Go, using two different goroutines (thus involving threads):

    package main

    import (
        "sync"

        "github.com/sbinet/go-python"
    )

    func main() {
        // The following will also create the GIL explicitly
        // by calling PyEval_InitThreads(), without waiting
        // for the interpreter to do that
        python.Initialize()

        var wg sync.WaitGroup
        wg.Add(2)

        fooModule := python.PyImport_ImportModule("foo")
        odds := fooModule.GetAttrString("print_odds")
        even := fooModule.GetAttrString("print_even")

        // Initialize() has locked the GIL, but at this point we don't need it
        // anymore. We save the current state and release the lock
        // so that goroutines can acquire it
        state := python.PyEval_SaveThread()

        go func() {
            _gstate := python.PyGILState_Ensure()
            odds.Call(python.PyTuple_New(0), python.PyDict_New())
            python.PyGILState_Release(_gstate)

            wg.Done()
        }()

        go func() {
            _gstate := python.PyGILState_Ensure()
            even.Call(python.PyTuple_New(0), python.PyDict_New())
            python.PyGILState_Release(_gstate)

            wg.Done()
        }()

        wg.Wait()

        // At this point we know we won't need Python anymore in this
        // program, we can restore the state and lock the GIL to perform
        // the final operations before exiting.
        python.PyEval_RestoreThread(state)
        python.Finalize()
    }

While reading the example you might notice a pattern, the pattern that will become our mantra to run embedded Python code:

    1. Save the state and lock the GIL.
    2. Do Python.
    3. Restore the state and unlock the GIL.

The code should be straightforward, but there's a subtle detail we want to point out: notice that despite following the GIL mantra, in one case we operate the GIL by calling PyEval_SaveThread() and PyEval_RestoreThread(), and in another (look inside the goroutines) we do the same with PyGILState_Ensure() and PyGILState_Release().

We said that when multithreading is operated from Python, the interpreter takes care of creating the data structure needed to store the current state; when the same happens from the C API, we're responsible for that.

When we initialize the interpreter with go-python, we're operating in a Python context. So when PyEval_InitThreads() is called, it initializes the data structure and locks the GIL. We can use PyEval_SaveThread() and PyEval_RestoreThread() to operate on the already existing state.

Inside the goroutines, we're operating from a Go context and we need to explicitly create the state and remove it when done, which is what PyGILState_Ensure() and PyGILState_Release() do for us.
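The lock, run, release mantra lends itself to a small helper that makes it hard to forget the final step. The sketch below is a pure Go analogy, not go-python code: a sync.Mutex named gil stands in for the interpreter lock, and withGIL is a hypothetical helper. With go-python, the Lock and Unlock calls would presumably be replaced by the PyGILState_Ensure() and PyGILState_Release() pair used in the goroutines above.

```go
package main

import (
	"fmt"
	"sync"
)

// gil is a stand-in for the interpreter lock (an assumption of this
// sketch; the real GIL lives inside CPython).
var gil sync.Mutex

// withGIL runs fn while holding the lock and always releases it,
// even if fn panics.
func withGIL(fn func()) {
	gil.Lock()         // 1. take the lock
	defer gil.Unlock() // 3. release the lock
	fn()               // 2. do the protected work
}

// countConcurrently increments a shared counter from n goroutines,
// each one going through withGIL.
func countConcurrently(n int) int {
	counter := 0
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			withGIL(func() { counter++ })
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countConcurrently(100))
}
```

Using defer for the release step means an early return or a panic inside the protected function cannot leave the lock held.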

Unleash the Gopher

At this point we know how to deal with multithreaded Go code executing Python in an embedded interpreter, but after the GIL, another challenge is right around the corner: the Go scheduler.

When a goroutine starts, it's scheduled for execution on one of the GOMAXPROCS threads available. If a goroutine happens to perform a syscall or call C code, the current thread hands over the other goroutines waiting to run in the thread queue to another thread so they can have better chances to run; the current goroutine is paused, waiting for the syscall or the C function to return. When this happens, the thread tries to resume the paused goroutine, but if this is not possible, it asks the Go runtime to find another thread to complete the goroutine and goes to sleep. The goroutine is finally scheduled to another thread and it finishes.

With this in mind, let's see what can happen to a goroutine running some Python code when the goroutine is moved to a new thread:

    1. Our goroutine starts, performs a C call, and pauses. The GIL is locked.
    2. When the C call returns, the current thread tries to resume the goroutine, but it fails.
    3. The current thread tells the Go runtime to find another thread to resume our goroutine.
    4. The Go scheduler finds an available thread and the goroutine is resumed.
    5. The goroutine is almost done and tries to unlock the GIL before returning.
    6. The thread ID stored in the current state is from the original thread and is different from the ID of the current thread.
    7. Panic!

Luckily for us, we can force the Go runtime to always keep our goroutine running on the same thread by calling the LockOSThread function from the runtime package from within a goroutine:

    go func() {
        runtime.LockOSThread()

        _gstate := python.PyGILState_Ensure()
        odds.Call(python.PyTuple_New(0), python.PyDict_New())
        python.PyGILState_Release(_gstate)

        wg.Done()
    }()

This will interfere with the scheduler and might introduce some overhead, but it's a price that we're willing to pay to avoid random panics.
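A related way to use LockOSThread, sketched here as an alternative design rather than what the Agent does, is to dedicate one long-lived goroutine to a single OS thread and send it all the thread-sensitive work over a channel. The pinnedWorker type and its methods are hypothetical names; the example is pure Go and calls no Python.

```go
package main

import (
	"fmt"
	"runtime"
)

// pinnedWorker runs every submitted function on one goroutine that
// stays locked to a single OS thread for its whole life.
type pinnedWorker struct {
	jobs chan func()
}

func newPinnedWorker() *pinnedWorker {
	w := &pinnedWorker{jobs: make(chan func())}
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for job := range w.jobs {
			job()
		}
	}()
	return w
}

// Do runs fn on the pinned thread and waits for it to finish.
func (w *pinnedWorker) Do(fn func()) {
	done := make(chan struct{})
	w.jobs <- func() {
		defer close(done)
		fn()
	}
	<-done
}

// Close stops the worker goroutine.
func (w *pinnedWorker) Close() { close(w.jobs) }

// sumOnPinnedThread adds 1..n, performing each addition on the worker.
func sumOnPinnedThread(n int) int {
	w := newPinnedWorker()
	defer w.Close()

	sum := 0
	for i := 1; i <= n; i++ {
		w.Do(func() { sum += i })
	}
	return sum
}

func main() {
	fmt.Println(sumOnPinnedThread(3))
}
```

The tradeoff is that all the submitted work is serialized on one thread, which may be acceptable when the foreign code is guarded by a global lock anyway.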

Conclusions

In order to embed Python, the Datadog Agent has to accept a few tradeoffs:

    • The overhead introduced by CGO.
    • The task of manually handling the GIL.
    • The limitation of binding goroutines to the same thread during execution.

We're happy to accept each of these for the convenience of running Python checks in Go. But by being conscious of the tradeoffs, we're able to minimize their effect. Regarding other limitations introduced to support Python, we have a few countermeasures to contain potential issues:

    • The build is automated and configurable so that devs still have something very similar to go build.
    • A lightweight version of the Agent can be built, stripping out Python support entirely, simply by using Go build tags.
    • Such a version only relies on core checks hardcoded in the Agent itself (system and network checks mostly), but it is cgo free and can be cross compiled.

We'll re-evaluate our options in the future and decide whether keeping cgo around is still worth it; we could even reconsider whether Python as a whole is still worth it, waiting for the Go plugin package to be mature enough to support our use case. But for now the embedded Python is working well, and transitioning from the old Agent to the new one couldn't be easier.

Are you a polyglot who loves mixing different programming languages and learning about the inner workings of languages to make your code more performant? Join us at Datadog!
