Answering this for latecomers because I don't think the posted answers get to the root of the problem, which is the lack of locale environment variables in a CGI context. I'm using Python 3.2.
open() opens file objects in text (string) or binary (bytes) mode for reading and/or writing. In text mode, the encoding used to encode strings written to the file, and to decode bytes read from the file, may be specified in the call; if it isn't, it is determined by locale.getpreferredencoding(), which on Linux uses the encoding from your locale environment settings, normally UTF-8 (from e.g. LANG=en_US.UTF-8).
    >>> f = open('foo', 'w')   # open file for writing in text mode
    >>> f.encoding
    'UTF-8'                    # encoding is from the environment
    >>> f.write('€')           # write a Unicode string
    1
    >>> f.close()
    >>> exit()
    $ hd foo
    00000000  e2 82 ac                                          |...|   # data is UTF-8 encoded
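The same behaviour can be checked from a script rather than the REPL. This is only a minimal sketch: the helper names `write_text` and `read_bytes` are made up for illustration, and it passes `encoding="utf-8"` explicitly so the result does not depend on the locale it runs under.

```python
import locale
import os
import tempfile


def write_text(path, text, encoding=None):
    """Write text to path in text mode.

    encoding=None lets open() fall back to locale.getpreferredencoding().
    Returns the encoding the file object actually used.
    """
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
        return f.encoding


def read_bytes(path):
    """Read the raw bytes back, bypassing any text decoding."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo")
    used = write_text(path, "€", encoding="utf-8")
    euro_bytes = read_bytes(path)

# What open() would have picked if no encoding had been given.
default_encoding = locale.getpreferredencoding()

print("explicit encoding:", used)
print("bytes on disk:    ", euro_bytes)
print("locale default:   ", default_encoding)
```

Calling `write_text(path, "€")` without the encoding argument would use the locale default instead, which is exactly what fails when that default turns out to be ASCII.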
sys.stdout is in fact a file opened for writing in text mode with an encoding based on locale.getpreferredencoding(). You can write strings to it just fine and they'll be encoded to bytes based on sys.stdout's encoding. print() by default writes to sys.stdout; print() itself has no encoding, rather it's the file it writes to that has an encoding.
    >>> import sys
    >>> sys.stdout.encoding
    'UTF-8'                    # encoding is from the environment
    >>> exit()
    $ python3 -c 'print("€")' > foo
    $ hd foo
    00000000  e2 82 ac 0a                                       |....|   # data is UTF-8 encoded; \n is from print()
You cannot write bytes to sys.stdout; use sys.stdout.buffer.write() for that. If you try to write bytes to sys.stdout using sys.stdout.write() it will raise an error, and if you try using print() then print() will simply turn the bytes object into a string object, so an escape sequence like \xff will be treated as the four characters \, x, f, f.
    $ python3 -c 'print(b"\xe2\xf82\xac")' > foo
    $ hd foo
    00000000  62 27 5c 78 65 32 5c 78  66 38 32 5c 78 61 63 27  |b'\xe2\xf82\xac'|
    00000010  0a                                                |.|
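To see the three cases side by side without touching the real sys.stdout, here is a small sketch that uses an in-memory stand-in. The names `raw` and `text` are illustrative only: `raw` plays the role of sys.stdout.buffer and `text` the role of sys.stdout.

```python
import io

raw = io.BytesIO()  # stands in for sys.stdout.buffer
# Stands in for sys.stdout; newline="\n" disables platform newline translation.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

# 1. print() calls str() on the bytes object, so the repr is written as text.
print(b"\xff", file=text)
text.flush()
printed = raw.getvalue()

# 2. Writing bytes to the text layer is refused outright.
try:
    text.write(b"\xff")
    write_error = None
except TypeError as exc:
    write_error = exc

# 3. Writing bytes to the underlying binary buffer works.
raw.write(b"\xff")
final = raw.getvalue()

print(printed)      # the repr characters b, ', \, x, f, f, ' and a newline
print(write_error)  # the TypeError raised by the text layer
print(final)        # same as above plus one real 0xff byte at the end
```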
In a CGI script you need to write to sys.stdout and you can use print() to do it. But a CGI script process in Apache has no locale environment settings, since they are not part of the CGI specification. The sys.stdout encoding therefore defaults to ANSI_X3.4-1968, in other words ASCII. If you try to print() a string that contains non-ASCII characters to sys.stdout you'll get "UnicodeEncodeError: 'ascii' codec can't encode character ...: ordinal not in range(128)".
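The failure can be reproduced outside Apache by printing to a stream that is forced to ASCII. This is a sketch under stated assumptions: `try_print` is a hypothetical helper, and the in-memory stream only imitates what sys.stdout looks like in a process with no locale settings.

```python
import io


def try_print(text, encoding):
    """Print text to an in-memory text stream using the given encoding.

    Returns the encoded bytes on success, or the UnicodeEncodeError raised.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return raw.getvalue()


# Imitates sys.stdout in a CGI process without locale variables.
ascii_result = try_print("h€lló wörld", "ascii")
# Imitates sys.stdout when LANG names a UTF-8 locale.
utf8_result = try_print("h€lló wörld", "utf-8")

print(type(ascii_result).__name__, ascii_result)
print(utf8_result)
```

Plain ASCII text goes through either stream unchanged, which is why the problem often stays hidden until the first non-ASCII character appears in the output.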
A simple solution is to pass the Apache process's LANG environment variable through to the CGI script using Apache's mod_env PassEnv command in the server or virtual host configuration: PassEnv LANG. On Debian/Ubuntu, make sure that in /etc/apache2/envvars you have uncommented the line ". /etc/default/locale" so that Apache runs with the system default locale and not the C (POSIX) locale (which is also ASCII encoding). The following CGI script should run without errors in Python 3.2:
    #!/usr/bin/env python3
    import sys
    print('Content-Type: text/html; charset=utf-8')
    print()
    print('<html><body><pre>' + sys.stdout.encoding + '</pre>h€lló wörld</body></html>')
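If changing the Apache configuration is not an option, the sys.stdout.buffer.write() route mentioned earlier can also be used for the whole response. To be clear, this is a different technique from the PassEnv fix above, shown only as a sketch: `build_response` and `send` are hypothetical helper names, and the script encodes to UTF-8 itself so it never depends on sys.stdout's encoding.

```python
import io
import sys


def build_response(body_html):
    """Return a complete CGI response (header, blank line, body) as UTF-8 bytes."""
    header = "Content-Type: text/html; charset=utf-8\r\n\r\n"
    return (header + body_html).encode("utf-8")


def send(response, out=None):
    """Write response bytes to out, defaulting to the binary layer of stdout."""
    if out is None:
        out = sys.stdout.buffer
    out.write(response)
    out.flush()


response = build_response("<html><body>h€lló wörld</body></html>")

# Demonstration against an in-memory sink; a real CGI script would call
# send(response) with no second argument.
sink = io.BytesIO()
send(response, sink)
```

The trade-off is that every piece of output has to go through the bytes path; a stray print() of non-ASCII text elsewhere in the script would still fail under an ASCII stdout.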
https://stackoverflow.com/questions/9322410/set-encoding-in-python-3-cgi-scripts