I. BACKGROUND
We use an SVN pre-commit hook to implement a commit lock: it controls commits in the run-up to a release, so that developers and planners cannot commit at will and cause bugs in production. The platform has a web interface where you can grant commit permission to specific people and record the files and log message of each of their commits. This time garbled text appeared: all Chinese shown in the web interface was garbled, including log messages and committed file names.
The garbled text looks like this: /design/x_?229?175?188?229?133?165?230?149?176?230?141?174/04_?233?129?147?229?133?183?230?149?176?230?141?174?232?161?168/?229?149?134?229?186?151?230?149?176?230?141?174?232?161?168.xls
For example, the web page shows garbled text:
[Screenshot: SVN file list with garbled Chinese]
The corresponding actual Chinese characters are:
[Screenshot: 11.png]
II. CAUSE
This is a character encoding problem: as usual, it comes down to conversions between ASCII, Unicode, and UTF-8.
The garbled text above is made up of groups like ?229. The ? appears to be a delimiter added by SVN, and the three digits after it are the decimal value of one byte (for example, 229 175 188 is 0xE5 0xAF 0xBC, the UTF-8 encoding of a single Chinese character). Converting each value to a character with the usual char-code function and joining the results gives a string whose character codes are the raw byte values. Passing that string through the Unicode-to-UTF-8 step yields the URL-encoded form of the Chinese text, and UrlDecode then recovers the Chinese characters.
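The byte interpretation above can be checked directly. The following is a minimal sketch rather than the approach used later in this article: it assumes every ?NNN group is one UTF-8 byte written in decimal, and that TextDecoder is available (modern browsers, recent Node.js). The name decodeSvnEscapes is hypothetical.

```javascript
// Sketch: decode SVN-style "?NNN" escapes, assuming each NNN is one UTF-8 byte in decimal.
function decodeSvnEscapes(text) {
  // Match whole runs of ?NNN groups so the bytes of one character are decoded together.
  return text.replace(/(?:\?\d{3})+/g, function (run) {
    // "?229?175?188" -> ["", "229", "175", "188"] -> [229, 175, 188]
    var bytes = run.split("?").slice(1).map(Number);
    return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
  });
}

console.log(decodeSvnEscapes("x_?229?175?188?229?133?165")); // prints "x_导入"
```

A file name that really contains a question mark followed by three digits would be misread by this sketch, so it is only suitable where such names do not occur.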
III. SOLUTION
Knowing the principle above, I solved it in JavaScript. That way the back-end code does not have to be modified, the server does not need to be reloaded, and debugging is more convenient. If you would rather parse the garbled text in another language, such as Python, Java, or PHP, you can find the corresponding functions and substitute them.
3.1 UrlDecode and UrlEncode
For example, in the link http://50vip.com/%E5%88%86%E5%B8%83%E5%BC%8F.html the part %E5%88%86%E5%B8%83%E5%BC%8F is URL-encoded Chinese. Running it through UrlDecode gives back the Chinese word 分布式 ("distributed").
There are plenty of blog posts on this topic and they are easy to find.
The JavaScript method is as follows:
function UrlDecode(zipStr) {
    var uzipStr = "";
    for (var i = 0; i < zipStr.length; i++) {
        var chr = zipStr.charAt(i);
        if (chr == "+") {
            // '+' stands for a space in URL-encoded form data
            uzipStr += " ";
        } else if (chr == "%") {
            var asc = zipStr.substring(i + 1, i + 3);
            if (parseInt(asc, 16) > 0x7f) {
                // Non-ASCII lead byte: assume a three-byte UTF-8 sequence (%XX%XX%XX)
                uzipStr += decodeURI("%" + asc + zipStr.substring(i + 3, i + 9));
                i += 8;
            } else {
                // Plain ASCII escape such as %2F
                uzipStr += String.fromCharCode(parseInt(asc, 16));
                i += 2;
            }
        } else {
            uzipStr += chr;
        }
    }
    return uzipStr;
}
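For comparison, the same decoding can be sketched with the built-in decodeURIComponent. This is an alternative under stated assumptions, not the function used in the rest of this article: the wrapper name urlDecodeBuiltin is hypothetical, and the '+' handling assumes form-style encoding where '+' means a space.

```javascript
// Sketch: URL-decode with the built-in decodeURIComponent.
function urlDecodeBuiltin(encoded) {
  // decodeURIComponent leaves '+' alone, so convert it to an escaped space first.
  return decodeURIComponent(encoded.replace(/\+/g, "%20"));
}

console.log(urlDecodeBuiltin("%E5%88%86%E5%B8%83%E5%BC%8F.html")); // prints "分布式.html"
```

Unlike the hand-written loop, the built-in handles UTF-8 sequences of any length, but it throws a URIError on malformed input, so callers may want a try/catch around it.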
3.2 Unicode to UTF-8: EncodeUtf8
I found this code online. It may contain bugs, although it has worked without problems for me. Other languages probably have a simpler way to do this.
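function EncodeUtf8(s1) {
    // escape() yields %XX for code units below 0x100 and %uXXXX for the rest
    var s = escape(s1);
    var sa = s.split("%");
    var retv = "";
    if (sa[0] != "") {
        retv = sa[0];
    }
    for (var i = 1; i < sa.length; i++) {
        if (sa[i].substring(0, 1) == "u") {
            // %uXXXX: re-encode the 16-bit code unit as three UTF-8 bytes
            retv += Hex2utf8(Str2hex(sa[i].substring(1, 5)));
            // Keep any unescaped characters that follow the four hex digits
            retv += sa[i].substring(5);
        } else {
            retv += "%" + sa[i];
        }
    }
    return retv;
}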
function Str2hex(s) {
    // Convert a hex string such as "4E2D" into a string of binary digits
    var c = "";
    var n;
    var ss = "0123456789ABCDEF";
    var digs = "";
    for (var i = 0; i < s.length; i++) {
        c = s.charAt(i);
        n = ss.indexOf(c);
        digs += Dec2dig(n);
    }
    return digs;
}
function Dec2dig(n1) {
    // Convert a value from 0 to 15 into a four-digit binary string
    var s = "";
    var n2 = 0;
    for (var i = 0; i < 4; i++) {
        n2 = Math.pow(2, 3 - i);
        if (n1 >= n2) {
            s += '1';
            n1 = n1 - n2;
        } else {
            s += '0';
        }
    }
    return s;
}
function Dig2dec(s) {
    // Convert a four-digit binary string into its value (0 to 15); -1 on bad length
    var retv = 0;
    if (s.length == 4) {
        for (var i = 0; i < 4; i++) {
            retv += parseInt(s.charAt(i), 10) * Math.pow(2, 3 - i);
        }
        return retv;
    }
    return -1;
}
function Hex2utf8(s) {
    // s holds 16 binary digits; pack them into the three-byte UTF-8 layout
    // 1110xxxx 10xxxxxx 10xxxxxx and return the bytes percent-encoded
    var retS = "";
    var tempS = "";
    var ss = "";
    if (s.length == 16) {
        tempS = "1110" + s.substring(0, 4);
        tempS += "10" + s.substring(4, 10);
        tempS += "10" + s.substring(10, 16);
        var sss = "0123456789ABCDEF";
        for (var i = 0; i < 3; i++) {
            retS += "%";
            ss = tempS.substring(i * 8, (i + 1) * 8);
            retS += sss.charAt(Dig2dec(ss.substring(0, 4)));
            retS += sss.charAt(Dig2dec(ss.substring(4, 8)));
        }
        return retS;
    }
    return "";
}
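As a possible simpler route for ordinary text, the built-in encodeURIComponent produces the same percent-encoded UTF-8 for Chinese characters. The sketch below is an illustration only, and the wrapper name toUtf8PercentEncoded is hypothetical. It is not a drop-in replacement for the SVN case: EncodeUtf8 relies on escape(), which emits character codes 0x80 to 0xFF as a single %XX and so keeps raw byte values intact, whereas encodeURIComponent re-encodes them as two bytes.

```javascript
// Sketch: percent-encoded UTF-8 via the built-in encodeURIComponent.
function toUtf8PercentEncoded(text) {
  return encodeURIComponent(text);
}

console.log(toUtf8PercentEncoded("分布式")); // prints "%E5%88%86%E5%B8%83%E5%BC%8F"

// A character whose code is a raw byte value (0xE5) is re-encoded as two bytes,
// which is why this cannot stand in for EncodeUtf8 when decoding SVN output.
console.log(toUtf8PercentEncoded(String.fromCharCode(0xE5))); // prints "%C3%A5"
```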
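3.3 Parsing the SVN garbled text
To parse the SVN-style garbled text, take the numeric code of each group, convert it to a character, join the characters into a string, convert that string to UTF-8 with EncodeUtf8, and finally run UrlDecode. This successfully recovers the Chinese characters.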
function Svn_ascii_to_utf8(ori) {
    var s = ori.split('?');
    // Three codes make up one Chinese character; with too few '?' there is nothing to decode
    if (s.length < 3) {
        return ori;
    }
    // Text before the first '?' is literal
    var ascii = s[0];
    for (var i = 1; i < s.length; i++) {
        var x = s[i];
        if (/^\d{3}/.test(x)) {
            // The first three digits are one byte value; anything after them is literal text
            ascii += String.fromCharCode(parseInt(x.substr(0, 3), 10));
            ascii += x.substr(3);
        } else {
            // Not an escape group: restore the '?' and keep the text
            ascii += '?' + x;
        }
    }
    return UrlDecode(EncodeUtf8(ascii));
}
Then, wherever the web page displays text that may be garbled, call s = Svn_ascii_to_utf8(s). Garbled strings are decoded, and strings that are not garbled are left unchanged.
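The "decode only when garbled" behaviour can also be made explicit with a small guard in front of the decoder. This is a minimal sketch under assumptions: looksSvnEscaped and decodeIfGarbled are hypothetical helper names, the guard assumes garbled text always contains at least three consecutive ?NNN groups (one three-byte character), and the decoder is passed in as a parameter so the sketch stays self-contained.

```javascript
// Sketch: decode a string only when it looks like SVN-escaped text.
function looksSvnEscaped(text) {
  // Three consecutive ?NNN groups correspond to one three-byte UTF-8 character.
  return /(?:\?\d{3}){3}/.test(text);
}

function decodeIfGarbled(text, decode) {
  // `decode` is whatever decoder the page uses, for example the function from section 3.3.
  return looksSvnEscaped(text) ? decode(text) : text;
}

// Stand-in decoder, used here only to show which inputs are passed through.
function standInDecoder(text) {
  return "<decoded>";
}

console.log(decodeIfGarbled("readme.txt", standInDecoder));     // prints "readme.txt"
console.log(decodeIfGarbled("x_?229?175?188", standInDecoder)); // prints "<decoded>"
```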