Python: converting the encoding format of code files

Source: Internet
Author: User


I recently changed jobs and have not had much time to write things up. Most of my time has gone into getting to know the new company's business and code framework, and above all there is still a lot of new material to learn. I used to do PHP back-end development; here I have to learn C++ from scratch. Haha, in short, the days are very full (and I sleep well every night after work ~). About the job change: at the start of this year I had been out of school for half a year. I felt my skills were growing quickly, and programmers at my company were not as well off as the operations staff, so I wanted to move. I interviewed with three companies (two big ones and one small one) and received offers from all of them. Of course I picked one of the big companies after weighing everything (salary, the work itself, the commute, and so on), and it has turned out well. The whole process felt smooth, much easier than when I graduated. Haha, the harder I work, the luckier I get! Starting this week I will keep updating this blog so that I do not get lazy.

When I first joined and was still getting familiar with the environment, my boss asked me to migrate and modify some code. Honestly, this kind of work is really boring ~~: reading other people's code, changing other people's code, renaming a variable here and a file there. None of it is technically demanding, just tedious, but migrating code is a good way to get familiar with the environment. Now for today's topic: changing the encoding format of code files. For various reasons the code has to be migrated from data center A to data center B, and the two cannot reach each other. The code in data center A is UTF-8 encoded, while data center B requires GBK. Let's see how to solve this.

Encoding Problems

Let's start with the encoding problem. In the example above, the database in data center B is entirely GBK encoded, so all data retrieved from it is GBK, and it has to be displayed without garbled characters. If the data is not converted after it is retrieved, the header must declare GBK as the encoding, and the output files (such as html and tpl) must all be GBK as well. The following diagram makes this clearer:

    DB (GBK) => php (the encoding format is not limited, but if the code file contains Chinese characters, the file must be gbk encoded or converted to gbk when the Chinese characters are output) => header (GBK) => html, tpl (GBK)
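As a rough illustration of this first pipeline, written in Python rather than PHP, the sketch below encodes the page body as GBK and declares the same charset in the Content-Type header. The helper name `build_gbk_response` and the sample text are assumptions made for illustration; they are not part of the original project.

```python
def build_gbk_response(html_text):
    """Return (headers, body) with the body encoded as GBK.

    Hypothetical helper: the declared charset and the bytes actually
    sent must agree, otherwise the browser shows garbled characters.
    """
    body = html_text.encode("gbk")
    headers = {
        "Content-Type": "text/html; charset=GBK",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_gbk_response("<p>你好</p>")
print(headers["Content-Type"])
print(body.decode("gbk"))
```

The point is only that the header and the bytes use the same encoding; a real web framework would set these headers in its own way.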

Alternatively, convert between GBK and UTF-8 in the code only at the point where the database is accessed, and use UTF-8 everywhere else. In general, UTF-8 is more widely used and causes fewer problems.

   DB (GBK) => php (utf8, and convert the data retrieved from the database to utf8) => header (utf8) => html, tpl (utf8)
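The second pipeline can be sketched in Python as follows. This is a minimal sketch, assuming the database driver hands back raw GBK bytes; the helper names `from_db` and `to_db` and the simulated row value are hypothetical.

```python
def from_db(raw):
    """Decode GBK bytes read from the database into a Python str."""
    return raw.decode("gbk")


def to_db(text):
    """Encode a Python str as GBK bytes before writing to the database."""
    return text.encode("gbk")


# Simulated column value as it might come back from a GBK database.
row = "编码".encode("gbk")

text = from_db(row)          # work with Unicode text inside the program
page = text.encode("utf-8")  # send UTF-8 bytes to the browser

print(len(row), len(page))
```

Keeping the conversion in two small functions at the database boundary means the rest of the code never has to know that the database uses GBK.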

As long as one of these two conventions is followed consistently, there will be no garbled characters. I have tested the first approach and it works, so I expect the second to work as well. Now let's write a small script to convert the encoding of files:

    #!/usr/bin/python3
    # -*- coding: UTF-8 -*-
    # Filename: changeEncode.py
    import os
    import sys


    def ChangeEncode(file, fromEncode, toEncode):
        """Re-encode one file in place. Return 0 on success, -1 on failure."""
        try:
            # Read raw bytes so that decoding uses fromEncode explicitly.
            with open(file, "rb") as f:
                s = f.read()
            u = s.decode(fromEncode)
            s = u.encode(toEncode)
            with open(file, "wb") as f:
                f.write(s)
            return 0
        except (OSError, UnicodeError):
            return -1


    def Do(dirname, fromEncode, toEncode):
        """Convert every file under dirname and report the result per file."""
        for root, dirs, files in os.walk(dirname):
            for _file in files:
                _file = os.path.join(root, _file)
                if ChangeEncode(_file, fromEncode, toEncode) != 0:
                    print("[Conversion failed:]" + _file)
                else:
                    print("[success:]" + _file)


    def CheckParam(dirname, fromEncode, toEncode):
        """Validate the arguments. Return 0 if valid, otherwise an error code."""
        encode = ["UTF-8", "GBK", "gbk", "utf-8"]
        if fromEncode not in encode or toEncode not in encode:
            return 2
        # Compare case-insensitively so "UTF-8" and "utf-8" count as the same.
        if fromEncode.lower() == toEncode.lower():
            return 3
        if not os.path.isdir(dirname):
            return 1
        return 0


    if __name__ == "__main__":
        error = {
            1: "The first parameter is not a valid folder",
            3: "The source and target encodings are the same",
            2: "The encoding must be one of: UTF-8, GBK",
        }
        if len(sys.argv) != 4:
            sys.exit("Usage: changeEncode.py target_dir fromEncode toEncode")
        dirname = sys.argv[1]
        fromEncode = sys.argv[2]
        toEncode = sys.argv[3]
        ret = CheckParam(dirname, fromEncode, toEncode)
        if ret != 0:
            print(error[ret])
        else:
            Do(dirname, fromEncode, toEncode)

The script is simple to use:

    ./changeEncode.py target_dir fromEncode toEncode

  
Note the following common encoding relationships:

US-ASCII is a subset of UTF-8. I learned this from Stack Overflow, where the original text reads: "ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded."

I tried it, and it is true: a file with no Chinese characters is reported as us-ascii, and once Chinese characters are added it is reported as UTF-8.
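The same relationship can be checked from Python. The snippet below is a small sketch with made-up sample strings: ASCII-only text yields identical bytes under both encodings, while text containing Chinese characters cannot be encoded as ASCII at all.

```python
ascii_only = "print('hello')"
with_chinese = "print('你好')"

# ASCII-only text produces identical bytes in ASCII and in UTF-8.
same_bytes = ascii_only.encode("ascii") == ascii_only.encode("utf-8")

# Once Chinese characters appear, the text no longer fits in ASCII.
try:
    with_chinese.encode("ascii")
    fits_ascii = True
except UnicodeEncodeError:
    fits_ascii = False

print(same_bytes, fits_ascii)
```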

There is also the ANSI encoding, which stands for the system's local encoding. For example, on a Simplified Chinese operating system, ANSI means GBK.
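To see which local encoding a given machine uses, Python's `locale` module can be queried. The sketch below also shows why mixing the two encodings produces garbled text: the same character maps to different bytes. What `local_encoding` prints depends on the system, so no particular value is assumed here.

```python
import locale

# Encoding the current system prefers for text files. On a Simplified
# Chinese Windows system this is typically a GBK code page, while most
# Linux systems report UTF-8.
local_encoding = locale.getpreferredencoding(False)
print(local_encoding)

# The same character has different bytes under GBK and UTF-8.
gbk_bytes = "中".encode("gbk")
utf8_bytes = "中".encode("utf-8")
print(gbk_bytes, utf8_bytes)
```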

Another point: on Linux, the command for viewing the encoding of files is:

file -i *

You can see the encoding format of the file.
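A similar check can be approximated in Python by trying candidate encodings in order. This is only a heuristic sketch, not how `file` itself works; the function name `guess_encoding` and the candidate list are assumptions, limited to the encodings discussed in this post.

```python
def guess_encoding(data):
    """Return the first candidate encoding that decodes data without error.

    Order matters: ASCII is tried before UTF-8, and UTF-8 before GBK,
    because each later encoding accepts more byte sequences.
    """
    for name in ("ascii", "utf-8", "gbk"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


print(guess_encoding(b"hello"))
print(guess_encoding("你好".encode("utf-8")))
print(guess_encoding("你好".encode("gbk")))
```

A successful decode does not prove the guess is right, since some byte sequences are valid in more than one encoding, so the result is best treated as a hint.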

Of course, files that contain special characters may fail to convert with the script above, but ordinary program files are no problem.
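One way such a failure shows up is a character that exists in UTF-8 but has no GBK equivalent. The sketch below uses a hypothetical helper, `convert_bytes`, and an emoji as the sample character: strict conversion raises an error, while `errors="replace"` converts lossily instead.

```python
def convert_bytes(data, from_encoding, to_encoding, errors="strict"):
    """Re-encode data; errors is passed to both decode and encode."""
    return data.decode(from_encoding, errors).encode(to_encoding, errors)


# The emoji is valid UTF-8 but cannot be represented in GBK.
sample = "ok 😀".encode("utf-8")

try:
    convert_bytes(sample, "utf-8", "gbk")
    strict_ok = True
except UnicodeError:
    strict_ok = False

# With errors="replace" the unmappable character becomes "?".
lossy = convert_bytes(sample, "utf-8", "gbk", errors="replace")
print(strict_ok, lossy)
```

Whether a lossy conversion is acceptable depends on the file; for source code it is usually safer to let the conversion fail and fix the file by hand.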

Reference:

http://stackoverflow.com/questions/11303405/force-encode-from-us-ascii-to-utf-8-iconv

  

 

The copyright of this article belongs to the author, iforever (luluyrt@163.com). Reprinting in any form without the author's consent is prohibited. Any repost must credit the author and link to the original article in a prominent position on the page; otherwise the author reserves the right to pursue legal liability.
