Python Cookbook (3rd Edition), Chinese version: 15.13 Passing a null-terminated string to a C library

Source: Internet
Author: User

15.13 Passing a null-terminated string to a C library

You are writing an extension module that needs to pass a null-terminated string to a C library.
However, you are not sure how to do this with Python's Unicode string implementation.

Solution

Many C libraries include functions that operate on null-terminated strings declared as type char *.
Consider the following C function, which we will use for demonstration and testing:

    void print_chars(char *s) {
        /* Print each byte as hex until the terminating null is reached */
        while (*s) {
            printf("%2x ", (unsigned char) *s);
            s++;
        }
        printf("\n");
    }

This function prints the hexadecimal representation of each character in the string passed to it, which makes the result easy to debug. For example:

    print_chars("Hello");   // Outputs: 48 65 6c 6c 6f
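To check the expected output of the later examples without compiling anything, the loop above can be modeled roughly in Python. This is only a sketch: the name hex_chars is hypothetical, and unlike the C version it returns the text instead of printing it and leaves out the trailing space.

```python
def hex_chars(data):
    """Model of print_chars(): hex-format each byte, stopping at the first null."""
    out = []
    for byte in data:
        if byte == 0:          # a C string ends at the first null byte
            break
        out.append(format(byte, "2x"))
    return " ".join(out)


print(hex_chars(b"Hello"))
print(hex_chars("Spicy Jalape\u00f1o".encode("utf-8")))
```

Feeding it b"Hello\x00World" gives the same result as b"Hello", which is exactly the truncation that C code would see.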

You have several options for calling such a C function from Python.
First, you can restrict it to operate only on bytes by calling PyArg_ParseTuple() with the "y" conversion code, as follows:

    static PyObject *py_print_chars(PyObject *self, PyObject *args) {
        char *s;

        /* "y" accepts only bytes objects without embedded null bytes */
        if (!PyArg_ParseTuple(args, "y", &s)) {
            return NULL;
        }
        print_chars(s);
        Py_RETURN_NONE;
    }

The resulting function is used as follows. Look closely at how bytes with embedded null bytes and Unicode strings are both rejected:

    >>> print_chars(b'Hello World')
    48 65 6c 6c 6f 20 57 6f 72 6c 64
    >>> print_chars(b'Hello\x00World')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: must be bytes without null bytes, not bytes
    >>> print_chars('Hello World')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' does not support the buffer interface
    >>>

If you want to pass Unicode strings instead, use the "s" format code with PyArg_ParseTuple(), as follows:

    static PyObject *py_print_chars(PyObject *self, PyObject *args) {
        char *s;

        /* "s" accepts str objects and yields a null-terminated UTF-8 buffer */
        if (!PyArg_ParseTuple(args, "s", &s)) {
            return NULL;
        }
        print_chars(s);
        Py_RETURN_NONE;
    }

When used, this automatically converts every string to a null-terminated UTF-8 encoding. For example:

    >>> print_chars('Hello World')
    48 65 6c 6c 6f 20 57 6f 72 6c 64
    >>> print_chars('Spicy Jalape\u00f1o')  # Note: UTF-8 encoding
    53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
    >>> print_chars('Hello\x00World')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: must be str without null characters, not str
    >>> print_chars(b'Hello World')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: must be str, not bytes
    >>>

If for some reason you are working directly with a PyObject * and cannot use PyArg_ParseTuple(),
the following example shows how to check and extract a suitable char * reference from both a bytes object and a string object:

    /* Some Python Object (obtained somehow) */
    PyObject *obj;

    /* Conversion from bytes */
    {
        char *s;
        s = PyBytes_AsString(obj);
        if (!s) {
            return NULL;   /* TypeError already raised */
        }
        print_chars(s);
    }

    /* Conversion to UTF-8 bytes from a string */
    {
        PyObject *bytes;
        char *s;
        if (!PyUnicode_Check(obj)) {
            PyErr_SetString(PyExc_TypeError, "Expected string");
            return NULL;
        }
        bytes = PyUnicode_AsUTF8String(obj);
        if (!bytes) {
            return NULL;   /* Encoding failed; exception already raised */
        }
        s = PyBytes_AsString(bytes);
        print_chars(s);
        Py_DECREF(bytes);
    }

Both of these conversions guarantee that the data is null-terminated,
but they do not check for null bytes embedded in the middle of the string.
If that matters, you need to check for it yourself.
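One place to perform that check is on the Python side, before the data ever reaches C. The following is a minimal sketch under that assumption; the helper name to_c_bytes is hypothetical and not part of any library. The last lines use ctypes.c_char_p only to illustrate what a C function would see if the check were skipped.

```python
import ctypes


def to_c_bytes(value, encoding="utf-8"):
    """Return bytes that are safe to hand to C code expecting a null-terminated string."""
    if isinstance(value, str):
        value = value.encode(encoding)
    elif not isinstance(value, bytes):
        raise TypeError("expected str or bytes")
    if b"\x00" in value:
        # C would silently stop reading at this byte, so refuse the value instead
        raise ValueError("embedded null byte")
    return value


safe = to_c_bytes("Spicy Jalape\u00f1o")

# Without the check, a char * view of the data ends at the first null byte
truncated = ctypes.c_char_p(b"Hello\x00World").value
print(truncated)   # b'Hello'
```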

Discussion

If possible, you should avoid writing code that relies on null-terminated strings, because Python has no such requirement.
It is almost always better to handle strings using a combination of a pointer and a length.
However, sometimes you have to work with legacy C code and have no other choice.
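The difference between the two conventions can be seen from Python with ctypes.string_at(), which reads either up to the first null byte or exactly as many bytes as it is told. This is only an illustration of the idea, using a buffer created in Python as a stand-in for memory owned by C code.

```python
import ctypes

data = b"Hello\x00World"
buf = ctypes.create_string_buffer(data)   # copies data and appends a trailing null

# Null-terminated convention: reading stops at the first null byte
as_c_string = ctypes.string_at(buf)

# Pointer-plus-length convention: every byte is read, embedded nulls included
as_ptr_and_len = ctypes.string_at(buf, len(data))

print(as_c_string)      # b'Hello'
print(as_ptr_and_len)   # b'Hello\x00World'
```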

Although it is easy to use, the "s" format code to PyArg_ParseTuple() carries a hidden memory overhead that is easy to overlook.
When this conversion is used, a UTF-8 string is created and permanently attached to the original string object.
If the original string contains non-ASCII characters, this makes the size of the string increase until it is garbage collected. For example:

    >>> import sys
    >>> s = 'Spicy Jalape\u00f1o'
    >>> sys.getsizeof(s)
    87
    >>> print_chars(s)     # Passing string
    53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
    >>> sys.getsizeof(s)   # Notice increased size
    103
    >>>
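The same effect can probably be observed without building an extension, assuming a CPython interpreter. The sketch below calls the C-API function PyUnicode_AsUTF8() through ctypes.pythonapi; the assumption here is that this call fills the same cached UTF-8 buffer that the "s" format code relies on. The helper name utf8_cache_growth is hypothetical, and the exact sizes depend on the Python version and platform, so none are shown.

```python
import ctypes
import sys

# CPython-only: PyUnicode_AsUTF8() returns a pointer to a UTF-8 buffer
# that is cached on the string object itself.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = (ctypes.py_object,)
as_utf8.restype = ctypes.c_char_p


def utf8_cache_growth(text):
    """Return (size before, size after) requesting the cached UTF-8 form."""
    before = sys.getsizeof(text)
    as_utf8(text)
    after = sys.getsizeof(text)
    return before, after


s = "".join(["Spicy Jalape", "\u00f1", "o"])   # built at run time, non-ASCII
before, after = utf8_cache_growth(s)
print(before, after)
```

For a pure ASCII string the two numbers should be equal, because the UTF-8 form is then identical to the string's own storage.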

If this growth in memory use is a concern, you should rewrite your C extension code to use the PyUnicode_AsUTF8String() function, as follows:

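    static PyObject *py_print_chars(PyObject *self, PyObject *args) {
        PyObject *o, *bytes;
        char *s;

        /* "U" accepts a str object and returns it without conversion */
        if (!PyArg_ParseTuple(args, "U", &o)) {
            return NULL;
        }
        bytes = PyUnicode_AsUTF8String(o);
        if (!bytes) {
            return NULL;   /* Encoding failed; exception already raised */
        }
        s = PyBytes_AsString(bytes);
        print_chars(s);
        Py_DECREF(bytes);   /* Temporary UTF-8 copy is released here */
        Py_RETURN_NONE;
    }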

With this modification, a UTF-8 encoded string is created when needed and discarded after use. Here is the modified behavior:

    >>> import sys
    >>> s = 'Spicy Jalape\u00f1o'
    >>> sys.getsizeof(s)
    87
    >>> print_chars(s)
    53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
    >>> sys.getsizeof(s)
    87
    >>>

If you are trying to pass a null-terminated string to a function wrapped via ctypes,
be aware that ctypes only allows bytes to be passed and does not check for embedded null bytes. For example:

    >>> import ctypes
    >>> lib = ctypes.cdll.LoadLibrary("./libsample.so")
    >>> print_chars = lib.print_chars
    >>> print_chars.argtypes = (ctypes.c_char_p,)
    >>> print_chars(b'Hello World')
    48 65 6c 6c 6f 20 57 6f 72 6c 64
    >>> print_chars(b'Hello\x00World')
    48 65 6c 6c 6f
    >>> print_chars('Hello World')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type
    >>>

If you want to pass a string instead of bytes, you need to perform a manual UTF-8 encoding first. For example:

    >>> print_chars('Hello World'.encode('utf-8'))
    48 65 6c 6c 6f 20 57 6f 72 6c 64
    >>>
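If the manual encoding becomes repetitive, one option is to let ctypes do it through the from_param hook that argtypes entries may provide. The sketch below is an assumption-laden illustration, not part of the original recipe: Utf8CString is a hypothetical adapter class, and because libsample.so is not available here, the C-API function PyBytes_FromString() (which takes a null-terminated char *) is used as a stand-in for the wrapped C function.

```python
import ctypes


class Utf8CString:
    """Hypothetical argtypes adapter: accept str or bytes, pass a char * to C."""

    @classmethod
    def from_param(cls, value):
        if isinstance(value, str):
            value = value.encode("utf-8")   # the manual step, done once here
        if not isinstance(value, bytes):
            raise TypeError("expected str or bytes")
        if b"\x00" in value:
            raise ValueError("embedded null byte")
        return ctypes.c_char_p(value)


# Stand-in for a wrapped C function that takes a null-terminated char *
c_func = ctypes.pythonapi.PyBytes_FromString
c_func.argtypes = (Utf8CString,)
c_func.restype = ctypes.py_object

result = c_func("Spicy Jalape\u00f1o")
print(result)   # the UTF-8 bytes that the C side received
```

Errors raised inside from_param reach the caller wrapped in ctypes.ArgumentError, in the same way as the "wrong type" error shown earlier.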
       

For other extension tools (such as Swig or Cython),
you need to study carefully how they handle strings before you use them to pass strings to C code.
