Various ways to print dicts and lists that contain Chinese characters

Source: Internet
Author: User

During development we often print variable values to help with debugging. We then find that if a dict or list contains Chinese characters, whether as str or as unicode, they are not printed in readable form. For example:

>>> print {'name': '张三'}
{'name': '\xd5\xc5\xc8\xfd'}
>>> print {'name': u'张三'}
{'name': u'\u5f20\u4e09'}

Of course, as a mere mortal, I cannot read these hexadecimal escapes by eye, and converting them by hand every time is tedious. Is there a once-and-for-all fix? A quick search on Google shows that there are quite a few approaches.

Note: The experiments in this article were run on Windows 7 with Python 2.7. The running environment is as follows:

>>> import sys,locale
>>> sys.getdefaultencoding()
'ascii'
>>> locale.getdefaultlocale()
('zh_CN', 'cp936')
>>> sys.stdin.encoding
'cp936'
>>> sys.stdout.encoding
'cp936'

Original address of this article: http://www.cnblogs.com/xybaby/p/7854126.html

Chinese in str

First, let's analyze why a container (dict, list, tuple) that contains Chinese cannot be printed directly.

>>> data = {'严': 1, 2: ['如'], 3: '玉'}

>>> data
{2: ['\xc8\xe7'], 3: '\xd3\xf1', '\xd1\xcf': 1}
>>> print data
{2: ['\xc8\xe7'], 3: '\xd3\xf1', '\xd1\xcf': 1}
>>> print data[3]
玉

The data above has Chinese in a key, in a value, and inside a nested list. It is reused in the examples below.

You can see that whether data is echoed directly (which calls dict.__repr__) or printed with print data (which calls dict.__str__), the output is the repr form. When a container's __str__ is called, it actually calls the __repr__ method of each element. This is easy to verify:

>>> class OBJ(object):
...     def __str__(self):
...         return 'OBJ str'
...     def __repr__(self):
...         return 'OBJ repr'
...
>>> lst = [OBJ()]
>>> print lst
[OBJ repr]

OBJ is a custom class whose __str__ and __repr__ are implemented differently. When an instance is an element of a container (here a list), it is clearly OBJ.__repr__ that gets called.

The answer to the Stack Overflow question print-a-list-that-contains-chinese-characters-in-python explains it:

When you print foo, what gets printed out is str(foo).
However, if foo is a list, str(foo) uses repr(bar) for each element bar, not str(bar).

Of course, this issue was noticed long ago. PEP 3140, "str(container) should call str(item), not repr(item)", proposed using __str__ instead of __repr__ on the items when printing a container. But Guido (the father of Python) rejected it, because:

Guido said this would cause too much disturbance too close to beta

Although the proposal was rejected, the demand still exists, so there are various solutions.

First approach: print one by one

Print the elements of the container directly

>>> lst = ['张三', '李四']
>>> print '[' + ', '.join(["asdf", "中文"]) + ']'
[asdf, 中文]
>>> for k, v in {'name': '张三'}.items():
...     print k, v
...
name 张三

It is very convenient for simple container objects, but it is very troublesome for nested container objects, such as the data example above.
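For nested data, the one-by-one idea can be wrapped in a small recursive helper. The following is only a sketch: format_nested and its encoding parameter are hypothetical names, not part of any library, and it assumes byte strings are GBK-encoded as in the environment above. It avoids the print statement so that it should also run on Python 3.

```python
# -*- coding: utf-8 -*-

def format_nested(obj, encoding='gbk'):
    """Return readable text for nested dict/list/tuple data (hypothetical helper).

    Byte strings (str in Python 2) are decoded with `encoding`; text strings
    are used as they are, so no string leaf goes through repr() escaping.
    """
    if isinstance(obj, dict):
        pairs = [u'%s: %s' % (format_nested(k, encoding), format_nested(v, encoding))
                 for k, v in obj.items()]
        return u'{' + u', '.join(pairs) + u'}'
    if isinstance(obj, (list, tuple)):
        left, right = (u'[', u']') if isinstance(obj, list) else (u'(', u')')
        return left + u', '.join(format_nested(x, encoding) for x in obj) + right
    if isinstance(obj, bytes):
        # Assumption: byte strings use the console encoding (GBK here).
        return obj.decode(encoding)
    return u'%s' % (obj,)


data = {u'严': 1, 2: [u'如'], 3: u'玉'}
text = format_nested(data)
```

Passing text to print shows the Chinese characters directly. The price is that strings lose their quotes, so a string '1' and the number 1 look the same in the output.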

Second approach: json.dumps

This is the method most often recommended online.

>>> import json
>>> dumped_data = json.dumps(data, encoding='gbk', ensure_ascii=False)
>>> print dumped_data
{"2": ["如"], "3": "玉", "严": 1}

As you can see, the Chinese characters are printed, but the keys 2 and 3 are now wrapped in quotation marks, which looks odd. This is because JSON object keys must be strings, so json.dumps converts the integer keys.
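The effect on the keys can be checked with a round trip. This is a minimal sketch, not the code used in this article: it passes unicode strings, so the encoding argument is not needed, and the variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
import json

data = {u'严': 1, 2: [u'如'], 3: u'玉'}

dumped = json.dumps(data, ensure_ascii=False)
restored = json.loads(dumped)

# JSON object keys are always strings, so the integer keys 2 and 3
# come back as the text keys "2" and "3".
int_keys_survived = any(isinstance(k, int) for k in restored)
```

So the dumped text is fine for reading in a log, but it is not a faithful picture of the key types in the original dict.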

Note that both parameters used above, encoding and ensure_ascii, have default values (encoding='utf-8', ensure_ascii=True) that differ from what we pass here. With the defaults:

>>> dumped_data = json.dumps(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python27.9\lib\json\__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "D:\Python27.9\lib\json\encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "D:\Python27.9\lib\json\encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc8 in position 0: invalid continuation byte

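As for why a UnicodeDecodeError is raised here: with the default encoding='utf-8', json.dumps tries to decode the GBK-encoded str values as UTF-8 and fails. For background, see the article "Don't you want to be despised again? Let's see it! Understand Python2 character encoding".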
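The failure can be reproduced without json at all. The sketch below takes the two bytes that repr showed for the list element in data and decodes them both ways; the variable names are only for illustration.

```python
raw = b'\xc8\xe7'  # the GBK bytes shown by repr for the list element in data

try:
    raw.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as exc:
    # 0xc8 announces a two-byte UTF-8 sequence, but 0xe7 is not a valid
    # continuation byte, so the decode fails at position 0.
    utf8_error = exc

as_gbk = raw.decode('gbk')  # decoding with the real encoding works
```

This is the same "invalid continuation byte" error as in the traceback above, and it is why encoding='gbk' has to be passed explicitly in this environment.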

The ensure_ascii parameter is also critical.

>>> dumped_data = json.dumps(data, encoding = 'gbk')
>>> print dumped_data
{"2": ["\u5982"], "3": "\u7389", "\u4e25": 1}

The Python documentation describes it as follows:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only.
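A small sketch of the difference, using a unicode string so that no encoding argument is involved (the variable names are made up for illustration):

```python
import json

value = [u'\u5982']  # a list holding the single character U+5982

escaped = json.dumps(value)                       # default: ensure_ascii=True
readable = json.dumps(value, ensure_ascii=False)  # keep the character as is

# Both forms are valid JSON and load back to the same data.
same_data = json.loads(escaped) == json.loads(readable) == value
```

Here escaped contains the six ASCII characters \u5982 between the quotes, while readable contains the Chinese character itself.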

Third approach: repr with string_escape

>>> decoded_data = repr(data).decode('string_escape')
>>> print decoded_data
{2: ['如'], 3: '玉', '严': 1}

Since repr returns a str full of hexadecimal escapes, the string_escape codec can turn those escapes back into the original bytes. For details, see the character encoding article mentioned above.

Fourth approach: PEP 3140

Although PEP 3140 was rejected, we can still borrow its idea, that is, force str.__str__ to be used instead of str.__repr__.

class ForceStr(str):
    # repr() of a ForceStr returns the raw string instead of the escaped form
    def __repr__(self):
        return super(ForceStr, self).__str__()

def switch_container(data):
    # Recursively rebuild the container, wrapping every str in ForceStr
    ret = None
    if isinstance(data, str):
        ret = ForceStr(data)
    elif isinstance(data, list) or isinstance(data, tuple):
        ret = [switch_container(var) for var in data]
    elif isinstance(data, dict):
        ret = dict((switch_container(k), switch_container(v)) for k, v in data.iteritems())
    else:
        ret = data
    return ret

>>> switched_data = switch_container(data)
>>> print switched_data
{2: [如], 3: 玉, 严: 1}
>>> switched_data
{2: [如], 3: 玉, 严: 1}

ForceStr inherits from str, and ForceStr.__repr__ calls str.__str__. switch_container then recursively replaces every str element in the container with a ForceStr. As shown, the Chinese characters are printed and the layout matches the original data (note that tuples are converted to lists along the way).
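If the tuple-to-list conversion matters, the container type can be kept. The following variant is only a sketch under a few assumptions: switch_container_keep_type is a hypothetical name, it handles plain lists and tuples only, and it uses items() rather than iteritems() so that it should also run on Python 3.

```python
# -*- coding: utf-8 -*-

class ForceStr(str):
    """A str whose repr() is the raw string, without quotes or escapes."""
    def __repr__(self):
        return str.__str__(self)


def switch_container_keep_type(data):
    """Wrap every str in ForceStr, keeping lists as lists and tuples as tuples."""
    if isinstance(data, str):
        return ForceStr(data)
    if isinstance(data, (list, tuple)):
        # type(data) is list or tuple, so the original container type is rebuilt
        return type(data)(switch_container_keep_type(x) for x in data)
    if isinstance(data, dict):
        return dict((switch_container_keep_type(k), switch_container_keep_type(v))
                    for k, v in data.items())
    return data


nested = {2: ('如', ['玉'])}
shown = str(switch_container_keep_type(nested))
```

With this variant the tuple in nested is still shown with parentheses, so the printed layout stays closer to the original data.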

Chinese in unicode

The basic approaches are the same as in the previous section, so the results are given directly:

Same as the second approach

>>> udata = {u'严': 1, 2: [u'如'], 3: u'玉'}
>>> print json.dumps(udata, encoding='gbk', ensure_ascii=False)
{"2": ["如"], "3": "玉", "严": 1}

Same as the third approach

>>> print repr(udata).decode('unicode_escape')
{2: [u'如'], 3: u'玉', u'严': 1}
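What unicode_escape does here can be seen on a plain ASCII string that looks like the repr of such data. This is a minimal sketch with made-up variable names; the encode('ascii') step is only there so that the same lines should also run on Python 3.

```python
# ASCII-only text, written the way repr() shows unicode strings
escaped_text = "{2: [u'\\u5982'], 3: u'\\u7389'}"

# unicode_escape turns each \uXXXX escape into the character it stands for
decoded = escaped_text.encode('ascii').decode('unicode_escape')
```

Only the escapes are converted. The u prefixes and the quotes are ordinary characters of the repr text, which is why they remain in the printed result.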

Same as the fourth approach

import sys

def switch_container(data):
    ret = None
    if isinstance(data, unicode):
        # encode to the console encoding first, then wrap in ForceStr
        ret = ForceStr(data.encode(sys.stdout.encoding))
    elif isinstance(data, list) or isinstance(data, tuple):
        ret = [switch_container(var) for var in data]
    elif isinstance(data, dict):
        ret = dict((switch_container(k), switch_container(v)) for k, v in data.iteritems())
    else:
        ret = data
    return ret

>>> print switch_container(udata)
{2: [如], 3: 玉, 严: 1}

When str and unicode Chinese coexist

Same as the second approach

>>> data[4] = u'啊'
>>> print json.dumps(data, encoding='gbk', ensure_ascii=False)
{"2": ["如"], "3": "玉", "4": "啊", "严": 1}

Same as the third approach

>>> print repr(data).decode('string_escape')
{2: ['如'], 3: '玉', 4: u'\u554a', '严': 1}

Well, the unicode Chinese is still not printed.

>>> print repr(data).decode('unicode_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character u'\xc8' in position 6: illegal multibyte sequence

There may be a correct way to do this, but I did not find one.
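The reason for the UnicodeEncodeError can be reproduced on a small scale. The sketch below is my reading of the traceback rather than something stated in the references: unicode_escape treats each \xNN escape of a GBK byte as a separate code point, and those code points cannot be encoded back to GBK for printing. The variable names are made up.

```python
# How repr() shows a list holding the two GBK bytes of one Chinese character
escaped_text = "['\\xc8\\xe7']"

# unicode_escape maps \xc8 and \xe7 to the code points U+00C8 and U+00E7,
# i.e. two Latin letters instead of one Chinese character
decoded = escaped_text.encode('ascii').decode('unicode_escape')

try:
    decoded.encode('gbk')  # what printing to a GBK console has to do
    gbk_error = None
except UnicodeEncodeError as exc:
    gbk_error = exc  # U+00C8 has no GBK encoding
```

So string_escape suits the str elements and unicode_escape suits the unicode elements, but neither codec alone handles a repr that mixes both kinds of escape.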

Same as the fourth approach

import sys

def switch_container(data):
    ret = None
    if isinstance(data, str):
        ret = ForceStr(data)
    elif isinstance(data, unicode):
        # encode to the console encoding first, then wrap in ForceStr
        ret = ForceStr(data.encode(sys.stdout.encoding))
    elif isinstance(data, list) or isinstance(data, tuple):
        ret = [switch_container(var) for var in data]
    elif isinstance(data, dict):
        ret = dict((switch_container(k), switch_container(v)) for k, v in data.iteritems())
    else:
        ret = data
    return ret

>>> print switch_container(data)
{2: [如], 3: 玉, 4: 啊, 严: 1}

Summary

The json.dumps version handles str Chinese, unicode Chinese, and a mix of the two, but its output differs slightly from the real data (for example, non-string keys are quoted).

string_escape (unicode_escape) only works for str (unicode) Chinese, so its use is limited.

The self-implemented switch_container version supports str Chinese, unicode Chinese, and a mix of the two.

The coexistence of str and unicode is really a headache!

Reference

Print-a-list-that-contains-chinese-characters-in-python

Don't you want to be despised again? Let's see it! Understand Python2 character encoding

 
