Pylons Unicode document

Introduction to Unicode in Pylons: http://wiki.pylonshq.com/display/pylonsdocs/Unicode

1.3 Unicode literals in Python source code

In Python source code, Unicode literals are written as strings prefixed with
the 'u' or 'U' character:

>>> u'abcdefghijk'
>>> U'lmnopqrstuv'

You can also use ", """ or ''' versions too. For example:

>>> u"""This
... is a really long
... Unicode string"""

Specific code points can be written using the \u escape sequence, which is
followed by four hex digits giving the code point. If you use \U instead,
you specify 8 hex digits instead of 4. Unicode literals can also use the same
escape sequences as 8-bit strings, including \x, but \x only takes two
hex digits so it can't express all the available code points. You can add
characters to Unicode strings using the unichr() built-in function and find
out what the ordinal is with ord().

Here is an example demonstrating the different alternatives:

>>> s = u"\x66\u0072\u0061\U0000006e" + unichr(231) + u"ais"
>>> #     ^^^^ two-digit hex escape
>>> #         ^^^^^^ four-digit Unicode escape
>>> #                     ^^^^^^^^^^ eight-digit Unicode escape
>>> for c in s: print ord(c),
...
102 114 97 110 231 97 105 115
>>> print s
français

Using escape sequences for code points greater than 127 is fine in small doses,
but Python 2.4 and above support writing Unicode literals in any encoding as
long as you declare the encoding being used by including a special comment as
either the first or second line of the source file:

#!/usr/bin/env python
# -*- coding: latin-1 -*-
u = u'abcdé'
print ord(u[-1])

If you don't include such a comment, the default encoding used will be ASCII.
Versions of Python before 2.4 were Euro-centric and assumed Latin-1 as the
default encoding for string literals; in Python 2.4, characters greater than
127 still work but result in a warning. For example, the following program has
no encoding declaration:

#!/usr/bin/env python
u = u'abcdé'
print ord(u[-1])

When you run it with Python 2.4, it will output the following warning:

sys:1: DeprecationWarning: Non-ASCII character '\xe9' in file testas.py on line
2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for
details

And then the following output:

233

For real-world use it is recommended that you use the UTF-8 encoding for your
file, but you must be sure that your text editor actually saves the file as
UTF-8. Otherwise the Python interpreter will try to parse UTF-8 characters
that are actually stored as something else.
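One way to catch this editor problem is to read a file's raw bytes and try decoding them as UTF-8. The sketch below uses a hypothetical helper, `looks_like_utf8`, which is not part of Python or Pylons. It simulates the two editor settings with in-memory byte strings instead of reading a real file, and it is written to run on both Python 2.6+ and Python 3.3+.

```python
def looks_like_utf8(raw):
    """Return True if the byte string `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True

text = u'abcd\xe9'                      # u'abcdé'
utf8_bytes = text.encode('utf-8')       # what an editor saving UTF-8 writes
latin1_bytes = text.encode('latin-1')   # what an editor saving Latin-1 writes

print(looks_like_utf8(utf8_bytes))      # True
print(looks_like_utf8(latin1_bytes))    # False: a lone 0xe9 byte is not valid UTF-8

# Reading UTF-8 bytes with the wrong codec does not fail, it silently garbles:
garbled = utf8_bytes.decode('latin-1')  # u'abcd\xc3\xa9', displayed as "abcdÃ©"
```

Note that the wrong-codec case raises no error at all, which is why a mismatch between the declared and the saved encoding can go unnoticed.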

Note

Windows users who use the SciTE editor can specify the encoding of their file
from the menu using File -> Encoding.

Note

If you are working with Unicode in detail you might also be interested in
the unicodedata module, which can be used to find out Unicode properties
such as a character's name, category, numeric value and the like.
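For example, the following sketch asks the `unicodedata` module about three characters: one from the Latin-1 range, one Arabic digit and one fraction. It uses only the standard `name()`, `category()` and `numeric()` functions; the choice of characters is arbitrary, and printing a single formatted string keeps it runnable on both Python 2 and 3.

```python
import unicodedata

# U+00E7 (c with cedilla), U+0661 (Arabic-Indic digit one), U+00BD (one half)
for ch in (u'\u00e7', u'\u0661', u'\u00bd'):
    print('U+%04X %s %s %s' % (ord(ch),
                               unicodedata.name(ch),
                               unicodedata.category(ch),
                               unicodedata.numeric(ch, None)))

# U+00E7 LATIN SMALL LETTER C WITH CEDILLA Ll None
# U+0661 ARABIC-INDIC DIGIT ONE Nd 1.0
# U+00BD VULGAR FRACTION ONE HALF No 0.5
```

The second argument to `numeric()` is a default returned for characters that have no numeric value; without it, `numeric()` raises `ValueError` for a letter such as U+00E7.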

 

2 Applying this to Web Programming

So far we've seen how to use encodings in source files and how to decode
text to Unicode and encode it back to text. We've also seen that Unicode
objects can be manipulated in similar ways to strings, and we've seen how to
perform input and output operations on files. Next we are going to look at how
best to use Unicode in a web app.

The main rule is this:

Your application should use Unicode for all strings internally, decoding any
input to Unicode as soon as it enters the application and encoding the Unicode
to UTF-8 or another encoding only on output.
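A minimal sketch of this rule follows. The function names `decode_input`, `encode_output` and `greet` are hypothetical (they are not Pylons APIs), and the incoming request is simulated with a literal byte string rather than a real request object. The code runs on Python 2.6+ and 3.3+.

```python
def decode_input(raw, encoding='utf-8'):
    """Edge of the app: turn incoming bytes into Unicode straight away."""
    return raw.decode(encoding)

def encode_output(text, encoding='utf-8'):
    """Edge of the app: turn Unicode back into bytes only on output."""
    return text.encode(encoding)

def greet(name):
    """Application logic: works purely on Unicode strings."""
    return u'Bonjour, ' + name + u'!'

raw_request = b'Fran\xc3\xa7ois'   # 9 UTF-8 bytes as sent by a browser
name = decode_input(raw_request)   # u'Fran\xe7ois', 8 characters
body = encode_output(greet(name))  # b'Bonjour, Fran\xc3\xa7ois!'
```

Only the two edge functions know about encodings; `greet` never sees a byte string, so it cannot trigger an implicit decode.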

If you fail to do this you will find that UnicodeDecodeErrors will start
popping up in unexpected places when Unicode strings are used with normal 8-bit
strings, because Python's default encoding is ASCII and it will try to decode
the text as ASCII and fail. It is always better to do any encoding or decoding
at the edges of your application; otherwise you will end up patching lots of
different parts of your application unnecessarily as and when errors pop up.
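The failure described above comes from an implicit ASCII decode that Python 2 performs when it has to combine a byte string with a Unicode string. The sketch below makes that `.decode('ascii')` call explicit so it fails the same way on any Python version (Python 3 instead raises `TypeError` if you mix `bytes` and `str` directly). The variable names are illustrative only.

```python
utf8_bytes = u'fran\xe7ais'.encode('utf-8')   # b'fran\xc3\xa7ais'

# Python 2 effectively does this when you write u'...' + utf8_bytes:
try:
    utf8_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print('UnicodeDecodeError: %s' % exc.reason)   # ordinal not in range(128)

# Decoding once, at the edge, with the right codec avoids the error:
text = utf8_bytes.decode('utf-8')
combined = u'Language: ' + text
```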

Unless you have a very good reason not to, it is wise to use UTF-8 as the
default encoding, since it is so widely supported.
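UTF-8 is also a safe default because it can represent every Unicode code point, whereas ASCII and Latin-1 cannot. The sketch below checks a short sample containing an accented Latin letter and an Arabic word against three encodings; `can_encode` is a hypothetical helper written for this illustration.

```python
def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

sample = u'caf\xe9 \u0645\u0631\u062d\u0628\u0627'   # "café" plus Arabic "marhaba"

for enc in ('ascii', 'latin-1', 'utf-8'):
    print('%-8s %s' % (enc, can_encode(sample, enc)))
# ascii    False
# latin-1  False
# utf-8    True
```

Latin-1 handles the "é" but fails on the Arabic letters, which lie above code point 255.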

The second rule is:

Always test your application with characters above 127 and above 255 wherever
possible.

If you fail to do this you might think your application is working fine, but
as soon as your users do put in non-ASCII characters you will have problems.
Using Arabic is always a good test, and www.google.ae is a good source of sample
text.
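As an illustration of the kind of bug this rule catches, consider a hypothetical helper that shortens text to a byte limit by slicing its UTF-8 encoding. It passes any test that uses only ASCII, but breaks on characters above 127 (both "ç" and these Arabic letters take two bytes each in UTF-8). The `truncate_safe` variant, also hypothetical, shows one common fix: decoding with the `'ignore'` error handler so that a partial character left at the cut is dropped.

```python
def truncate_naive(text, max_bytes):
    """Buggy: slicing the encoded bytes can cut a character in half."""
    return text.encode('utf-8')[:max_bytes].decode('utf-8')

def truncate_safe(text, max_bytes):
    """Drop any partial character left at the end of the slice."""
    return text.encode('utf-8')[:max_bytes].decode('utf-8', 'ignore')

samples = [
    u'pylons',                              # ASCII only
    u'fran\xe7ais',                         # contains a character in 128-255
    u'\u0645\u0631\u062d\u0628\u0627',      # Arabic, all above 255
]

failures = []
for s in samples:
    try:
        truncate_naive(s, 5)
    except UnicodeDecodeError:
        failures.append(s)

# An ASCII-only test suite would never hit these two failures.
print(len(failures))                             # 2
print(truncate_safe(samples[1], 5) == u'fran')   # True
```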

The third rule is:

Always do any checking of a string for illegal characters once it's in the
form that will be used or stored; otherwise the illegal characters might be
disguised.

For example, let's say you have a content management system that takes a
Unicode filename, and you want to disallow paths with a '/' character. You
might write this code:

def read_file(filename, encoding):
    if '/' in filename:
        raise ValueError("'/' not allowed in filenames")
    unicode_name = filename.decode(encoding)
    f = open(unicode_name, 'r')
    # ... return contents of file ...

This is incorrect. If an attacker could specify the 'base64' encoding, they
could pass L2V0Yy9wYXNzd2Q=, which is the base-64 encoded form of the string
'/etc/passwd', a file you clearly don't want an attacker to get hold of. The
above code looks for '/' characters in the encoded form and misses the
dangerous character in the resulting decoded form.
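The difference between the two orders can be sketched with Python's standard `codecs.decode()`, which also accepts the bytes-to-bytes `'base64'` codec (on Python 3, `bytes.decode()` rejects it). The two check functions are hypothetical stand-ins for the `'/'` test in `read_file`; no file is opened.

```python
import codecs

def checked_before_decoding(raw_name):
    """The flawed order: look for '/' in the still-encoded bytes."""
    return b'/' not in raw_name

def checked_after_decoding(raw_name, encoding):
    """Rule three: decode first, then check the form that will be used."""
    name = codecs.decode(raw_name, encoding)
    # Text codecs return Unicode; bytes-to-bytes codecs such as base64 return bytes.
    slash = b'/' if isinstance(name, bytes) else u'/'
    return slash not in name

attack = b'L2V0Yy9wYXNzd2Q='   # base-64 encoded form of '/etc/passwd'

print(checked_before_decoding(attack))            # True: the slash is hidden
print(checked_after_decoding(attack, 'base64'))   # False: the slash is found
```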

Those are the three basic rules, so now we will look at some of the places you
might want to perform Unicode decoding in a Pylons application.
