Solution to garbled Chinese file names across platforms

Source: Internet
Author: User
Tags: filezilla

Here is how it started.

About two years ago, I installed FreeBSD as a server on a P3 machine and often used FileZilla's SFTP to back up local files, including some documents with Chinese file names. Everything worked: uploads and downloads were fine, and so was packing files over SSH with tar, compressing them with gzip, downloading the archive, and unpacking it.

But the good times did not last. A while ago I installed Ubuntu on another P3 machine, and the problems started:

First, when FileZilla on Windows connects to Ubuntu, all the Chinese file names are garbled, yet they display correctly in PuTTY over SSH (at first they were garbled there too, until I set PuTTY's character encoding to UTF-8; FileZilla has no comparable setting). If I use FileZilla to upload a local text file with a Chinese name to Ubuntu, the remote name looks correct in FileZilla but garbled in PuTTY.

Then came the more serious problem: when I used FileZilla's SFTP on Ubuntu to download files from FreeBSD, the Chinese file names came out garbled. I tried again with tar and tar.gz archives, and the names were still garbled. I searched Google, but found nothing satisfactory. I tried many methods without success, and even installed p7zip on both FreeBSD and Ubuntu.

After a few days of struggling, it occurred to me to check the locale settings on each system. Frankly, if the material I found had not mentioned this, I would never have noticed it. Running the locale command on FreeBSD and Ubuntu showed:

FreeBSD's LC_ALL is "C", an alias for "POSIX", which is the default setting. It does not impose any fixed encoding: file names are treated as plain bytes, so in practice they keep whatever (local) encoding they were written in.
Ubuntu's LC_ALL is "zh_CN.UTF-8", which I set myself.
Windows hardly needs mentioning: it certainly uses the local encoding (the ANSI code page). What I did not expect is that even the English version of Windows we use at work uses a local encoding. I always thought it used Unicode; apparently this is kept for compatibility.

It is easy to see that FileZilla does not convert the encoding, so whenever the remote and local encodings differ, garbled names are inevitable. That is why FileZilla on Windows shows garbled names when connected to Ubuntu, while PuTTY, set to UTF-8, shows them correctly: the Ubuntu side uses UTF-8. Likewise, when a Chinese-named file is uploaded with FileZilla, its name is still in the Windows local encoding (the GB family) after the upload, and since Ubuntu uses UTF-8, the name shows up garbled there.
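The mismatch can be reproduced without any network transfer. The following is a minimal Python sketch, assuming Python 3 and its built-in gbk codec as a stand-in for the Windows GB code page; the helper show_as and the sample name "中文.txt" are made up for illustration. It takes the raw bytes of a name as one side writes them and decodes them the way the other side would display them:

```python
def show_as(raw: bytes, encoding: str) -> str:
    """Decode a raw file name the way a program assuming `encoding` would show it."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    return raw.decode(encoding, errors="replace")


name = "中文.txt"

# A name uploaded from Windows: GBK bytes, viewed in a UTF-8 terminal on Ubuntu
gbk_bytes = name.encode("gbk")
print(gbk_bytes)                       # b'\xd6\xd0\xce\xc4.txt'
print(show_as(gbk_bytes, "utf-8"))     # garbled: replacement characters

# A name created on Ubuntu: UTF-8 bytes, viewed by a client that assumes GBK
utf8_bytes = name.encode("utf-8")
print(show_as(utf8_bytes, "gbk"))      # also garbled

# Only when both sides agree on the encoding does the name come out right
print(show_as(gbk_bytes, "gbk") == name)   # True
```

Note that the bytes themselves are never damaged; only the decoding step differs, which is why the names can later be repaired by re-encoding them.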

FreeBSD has the same problem. Those files had been uploaded from Windows earlier, so their names are all GB-encoded. Back then PuTTY used the default local encoding, so I never noticed anything wrong. Now, because FreeBSD keeps the names in the local encoding while Ubuntu uses UTF-8, the two are inconsistent, and no matter how I transfer the files, the names end up garbled.

I had read that packing with a tool such as 7-Zip solves this kind of encoding mismatch. I tried, but it did not work. I later realized that 7-Zip interprets file names according to the system's locale settings: the names were originally GB-encoded while the system was set to UTF-8, so the result was always wrong.

The final solution is:

Run the following commands on Ubuntu:

 
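    export LC_ALL="zh_CN.GBK"        # temporarily switch to a GBK locale
    scp raptor@freebsdserver:/home/raptor/myfiles/*.* .
    export LC_ALL="zh_CN.UTF-8"      # switch back to UTF-8
    convmv -f gbk -t utf-8 --notest *.*   # re-encode the file names from GBK to UTF-8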

The idea is to switch Ubuntu's locale to GBK, copy the files over SSH with scp, switch the locale back to UTF-8, and then convert the file-name encoding with convmv.
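The last step, what convmv does, can be approximated in a few lines. The sketch below is a hypothetical illustration in Python, not convmv itself: the function name convert_names is made up, it only handles GBK to UTF-8 in a single directory (no recursion, unlike convmv -r), and it works on raw bytes, so the current locale does not matter. Like convmv without --notest, it only reports what it would do unless dry_run=False is passed.

```python
import os


def convert_names(directory, src="gbk", dry_run=True):
    """Re-encode file names in `directory` from `src` (e.g. GBK) to UTF-8.

    Hypothetical helper loosely modelled on `convmv -f gbk -t utf-8`.
    Returns the list of (old, new) byte-name pairs; renames only when
    dry_run is False.
    """
    base = os.fsencode(directory)   # bytes path, so listdir returns raw byte names
    changes = []
    for old in os.listdir(base):
        try:
            old.decode("utf-8")     # already valid UTF-8 (including plain ASCII): skip
            continue
        except UnicodeDecodeError:
            pass
        try:
            new = old.decode(src).encode("utf-8")
        except UnicodeDecodeError:
            continue                # not valid in `src` either: leave it alone
        if os.path.exists(os.path.join(base, new)):
            continue                # never overwrite an existing file
        changes.append((old, new))
        if not dry_run:
            os.rename(os.path.join(base, old), os.path.join(base, new))
    return changes


# Usage: preview first, then apply (like re-running convmv with --notest)
# print(convert_names("myfiles"))
# convert_names("myfiles", dry_run=False)
```

Skipping names that are already valid UTF-8 makes the function safe to run twice; a GBK-encoded Chinese name is almost never valid UTF-8, so the check rarely misfires.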

However, testing showed that running scp directly without changing the locale also works, because scp does not change the file-name encoding. Changing the locale first simply makes it easy to check the copied file names before conversion (otherwise they look garbled until they are converted).

As a result, the files look fine on Ubuntu, though getting them back to Windows later will be troublesome: you need to pack them with a tool such as 7-Zip, download the archive, and unpack it.

In the final analysis, all these problems stem from the damn Windows system recording file names in the local encoding; its Unicode support is not thorough enough.
