This article covers the following topics:
1. Repeated encoding
2. Multiple encoding formats
3. Several FAQs about encoding
[Description]
The "encoding" discussed in this article means encoding in the sense of escaping, not writing program code.
Encoding, or escaping, solves two problems for us:
A. Avoiding conflicts with reserved characters. For web applications, XSS is one of these problems.
B. Expressing characters that cannot be typed. For example, if a program needs to contain a character that cannot be entered from the keyboard, encoding solves the problem.
[Getting to the Point]
1. Repeated encoding:
I have read a number of third-party encoder/decoder library functions, and a considerable share of them refuse to encode twice. That is, when such a function finds [&#] followed by digits and then [;] (the form of an existing HTML character reference such as &#60;), it leaves that text as it is instead of HTML-encoding it. Many URL and JS encoders behave the same way. This is definitely wrong: it violates the browser's decoding rules, because the browser decodes once wherever decoding is required and does not care what format the string you hand it happens to be in. So why should an encoder avoid encoding again?
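The following minimal Python sketch illustrates the point. `skip_entities_encode` is a hypothetical stand-in for the kind of library function described above: it leaves anything that already looks like `&#digits;` untouched. `always_encode` uses the standard library's `html.escape`, which encodes every `&`. The browser's single decoding pass is simulated with `html.unescape`; none of these names come from the original article.

```python
import html
import re

# Matches text that already looks like a decimal character reference, e.g. "&#60;"
_ENTITY = re.compile(r"&#\d+;")

def skip_entities_encode(text: str) -> str:
    """Hypothetical 'no double encoding' encoder: leaves &#NN; sequences as they are."""
    out = []
    pos = 0
    for m in _ENTITY.finditer(text):
        out.append(html.escape(text[pos:m.start()]))  # encode the text before the match
        out.append(m.group())                         # keep the existing reference untouched
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)

def always_encode(text: str) -> str:
    """Encode every special character, including the '&' of existing references."""
    return html.escape(text)

# The user literally types the text "&#60;script&#62;" and expects to see it back.
user_input = "&#60;script&#62;"
print(html.unescape(skip_entities_encode(user_input)))  # browser shows <script>
print(html.unescape(always_encode(user_input)))         # browser shows &#60;script&#62;
```

Only the encoder that treats the input as opaque data and encodes every `&` gives the user back exactly what was typed.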
I know colleagues avoid re-encoding for many reasons: some simply rely on third-party library functions, others do not understand what fundamental problem encoding solves, and so on; the list is not complete. In short, it is fine to encode again wherever re-encoding is needed. An argument without an example is not very convincing, so here is a small one:
http://a.b.c/admin.jsp?d=x&e=6
------ Redirect-1 -------> http://a.b.c/login.jsp?backurl_1=http%3A%2F%2Fa.b.c%2Fadmin.jsp%3Fd%3Dx%26e%3D6
------ Redirect-2 -------> http://a.b.c/sso.jsp?backurl_2=http%3A%2F%2Fa.b.c%2Flogin.jsp%3Fbackurl_1%3Dhttp%253A%252F%252Fa.b.c%252Fadmin.jsp%253Fd%253Dx%2526e%253D6
In the flow above, when Redirect-2 returns to Redirect-1, the URL is decoded once. If Redirect-1 does not re-encode backurl_1 when it builds Redirect-2, then on the way back from Redirect-2 to Redirect-1, backurl_1 is decoded all the way back to its raw form. If a parameter value inside backurl_1 contains URL-sensitive characters (such as & or =), the URL is split in the wrong places: here login.jsp would receive backurl_1=http://a.b.c/admin.jsp?d=x plus a stray e=6.
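The redirect chain can be reproduced with Python's `urllib.parse`. This is a sketch under assumptions: `quote(value, safe="")` stands in for whatever URL-parameter encoder the pages use, `skip_percent_encode` is a hypothetical encoder that refuses to touch existing `%XX` escapes, and `parse_qs` simulates the single decode each page performs when reading its own parameter.

```python
from urllib.parse import parse_qs, quote, urlsplit

def encode_param(value: str) -> str:
    """Encode a value for a query parameter; existing '%' signs are encoded too."""
    return quote(value, safe="")

def skip_percent_encode(value: str) -> str:
    """Hypothetical 'no double encoding' encoder: leaves existing %XX escapes alone."""
    return quote(value, safe="%")

def returned_backurl_1(sso_url: str) -> str:
    """Simulate the way back: sso.jsp decodes backurl_2, then login.jsp decodes backurl_1."""
    login_url = parse_qs(urlsplit(sso_url).query)["backurl_2"][0]  # first decode
    return parse_qs(urlsplit(login_url).query)["backurl_1"][0]     # second decode

admin = "http://a.b.c/admin.jsp?d=x&e=6"
login = "http://a.b.c/login.jsp?backurl_1=" + encode_param(admin)         # Redirect-1

good_sso = "http://a.b.c/sso.jsp?backurl_2=" + encode_param(login)        # re-encoded
bad_sso = "http://a.b.c/sso.jsp?backurl_2=" + skip_percent_encode(login)  # not re-encoded

print(returned_backurl_1(good_sso))  # http://a.b.c/admin.jsp?d=x&e=6
print(returned_backurl_1(bad_sso))   # http://a.b.c/admin.jsp?d=x  ("e=6" is lost)
```

The re-encoded chain yields exactly the URLs shown above, and only that chain returns the original admin.jsp URL intact.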
2. Multiple encoding formats
Take a look at this sensitive string: [</>"&#=-'"]
HTML encoding-1: [&#60;&#47;&#62;&#34;&#38;&#35;&#61;&#45;&#39;&#34;]
HTML encoding-2: [&#x3C;&#x2F;&#x3E;&#x22;&#x26;&#x23;&#x3D;&#x2D;&#x27;&#x22;]
HTML encoding-3: [&lt;.....]
JS encoding-1: [\<\/\>\"\&\#\=\-\'\"]
JS encoding-2: [\x3C\x2F\x3E\x22\x26\x23\x3D\x2D\x27\x22]
JS encoding-3: [\u003C\u002F\u003E\u0022\u0026\u0023\u003D\u002D\u0027\u0022]
URL encoding: [%3C%2F%3E%22%26%23%3D-%27%22%3F]
Look at the encodings above carefully. Points worth noting: in HTML encoding-1, the number between [&#] and [;] is the character's code value (for these characters it is the same in ISO-8859-1 and Unicode); HTML encoding-2 is the hexadecimal form of HTML encoding-1; HTML encoding-3 uses named aliases.
In a JavaScript string, a backslash followed by a punctuation character represents that character itself, so any sensitive visible character can be written as [\] + [character]. That is why JS encoding-1 looks the way it does. JS encoding-2 (\xHH) and JS encoding-3 (\uHHHH) should be clear at a glance: they are hexadecimal forms, similar to HTML encoding-2.
URL encoding should not be hard to work out: each byte is written as % followed by two hexadecimal digits.
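To check the list above, a small Python sketch can generate each form from the sensitive string. The helper names are illustrative only; `js_hex` assumes code points below 256 and `js_unicode` assumes characters in the Basic Multilingual Plane, which holds for this string. URL encoding uses `urllib.parse.quote`, which percent-encodes UTF-8 bytes and leaves `-` unencoded.

```python
import html
from urllib.parse import quote

SENSITIVE = "</>\"&#=-'\""

def html_dec(s: str) -> str:      # HTML encoding-1: &#NN;
    return "".join(f"&#{ord(c)};" for c in s)

def html_hex(s: str) -> str:      # HTML encoding-2: &#xHH;
    return "".join(f"&#x{ord(c):X};" for c in s)

def js_backslash(s: str) -> str:  # JS encoding-1: backslash + character (punctuation only)
    return "".join("\\" + c for c in s)

def js_hex(s: str) -> str:        # JS encoding-2: \xHH (code points < 256)
    return "".join(f"\\x{ord(c):02X}" for c in s)

def js_unicode(s: str) -> str:    # JS encoding-3: \uHHHH (BMP characters)
    return "".join(f"\\u{ord(c):04X}" for c in s)

def url_enc(s: str) -> str:       # URL encoding: %HH per UTF-8 byte
    return quote(s, safe="")

for name, fn in [("HTML-1", html_dec), ("HTML-2", html_hex), ("JS-1", js_backslash),
                 ("JS-2", js_hex), ("JS-3", js_unicode), ("URL", url_enc)]:
    print(f"{name}: [{fn(SENSITIVE)}]")

# Both numeric HTML forms decode back to the same characters.
assert html.unescape(html_dec(SENSITIVE)) == SENSITIVE
assert html.unescape(html_hex(SENSITIVE)) == SENSITIVE
```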
To sum up: by default the browser decodes on the basis of ISO-8859-1, and HTML, JS, and URL encoding are simply different ways of writing the same character values. A few questions for you:
A. If a web application specifies a character set in its code but not in the HTTP response, what happens? Will the browser decode incorrectly?
B. Encoding is a small hurdle you have to get over. I hope you will do your best to understand how character sets and encodings evolved historically and how the different encodings relate to one another; once you do, many problems are already solved.
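One way to explore question A is to simulate it. The Python sketch below assumes a page produced as UTF-8 and a browser that, lacking a charset in the response, falls back to ISO-8859-1; real browsers may sniff the content or pick another default, so treat this only as an illustration of a charset mismatch.

```python
# Assumption: the server writes UTF-8 bytes, but the response declares no charset
# and the browser decodes them as ISO-8859-1.
text = "café <é>"
sent_bytes = text.encode("utf-8")
seen_by_browser = sent_bytes.decode("iso-8859-1")

print(seen_by_browser)                     # cafÃ© <Ã©>
print(sent_bytes.decode("utf-8") == text)  # True: the right charset round-trips

# ASCII-range characters such as '<' have the same byte value in both charsets,
# so the sensitive characters from section 2 are unaffected by this mismatch.
print("<".encode("utf-8") == "<".encode("iso-8859-1"))  # True
```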
3. Several common encoding questions
A. Sometimes a single encoding, although incorrect, appears to solve the XSS problem. Why do we need combined encoding?
A) The sets of sensitive characters differ. For HTML, [?] is not a sensitive character, but for a URL it is.
B) To avoid altering the user's input, encoding must follow the browser's decoding mechanism; only then is it completely correct. Otherwise, problems like the ones above will occur.
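A minimal Python sketch of point A): `html.escape` leaves `?` alone while `urllib.parse.quote` encodes it, so a parameter value placed in an HTML attribute gets both encodings, innermost context (URL) first. The page `search.jsp` and the parameter names `q` and `page` are hypothetical.

```python
import html
from urllib.parse import parse_qs, quote, urlsplit

value = 'a?b&c"'
print(html.escape(value))     # a?b&amp;c&quot;  ('?' untouched by HTML encoding)
print(quote(value, safe=""))  # a%3Fb%26c%22     ('?' encoded for a URL)

# Combined encoding: URL-encode the parameter value, then HTML-encode the whole URL
# because it is placed inside an HTML attribute.
href = "http://a.b.c/search.jsp?q=" + quote(value, safe="") + "&page=2"
attr = html.escape(href, quote=True)
print(f'<a href="{attr}">search</a>')

# Simulate the browser: HTML-decode the attribute, then parse the query string.
decoded_href = html.unescape(attr)
received = parse_qs(urlsplit(decoded_href).query)["q"][0]
print(received == value)  # True: the user's input arrives unchanged
```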
B. Questions about URL encoding versus URL parameter encoding:
A) Checking the validity of a URL is important: URLs of the form [javascript:] + [JavaScript code] must not be allowed. Other protocols, such as ftp://, also need to be considered.
B) For URL encoding versus URL parameter encoding, I encourage you to experiment repeatedly. In my view, only URL parameter values need encoding in the true sense; for a top-level URL, such as one in [href] or [src] or passed to [replace], we mainly check its validity. Why this is so is hard to explain in a few words; you will understand it better by doing the experiments yourself.
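The two ideas in point B can be sketched in Python. `is_allowed_url` applies a scheme allowlist to a top-level URL; the allowlist (`http`, `https`, plus relative URLs) is an assumption, and whether `ftp` or other schemes belong on it depends on the application. `build_url` is a hypothetical helper that encodes only the parameter names and values with `urllib.parse.urlencode` and validates, rather than encodes, the base URL.

```python
from urllib.parse import quote, urlencode, urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: adjust to the application's needs
_C0_AND_SPACE = "".join(chr(i) for i in range(0x21))  # U+0000..U+0020

def is_allowed_url(url: str) -> bool:
    """Scheme allowlist check for a top-level URL (href, src, location.replace)."""
    # Browsers remove tabs/newlines inside a URL and trim leading/trailing control
    # characters and spaces, so normalize the same way before reading the scheme.
    cleaned = "".join(ch for ch in url if ch not in "\t\r\n").strip(_C0_AND_SPACE)
    scheme = urlsplit(cleaned).scheme.lower()
    # An empty scheme means a relative URL, which this sketch accepts.
    return scheme == "" or scheme in ALLOWED_SCHEMES

def build_url(base: str, params: dict) -> str:
    """Validate the base URL and encode only the parameter names and values."""
    if not is_allowed_url(base):
        raise ValueError(f"disallowed URL: {base!r}")
    return base + "?" + urlencode(params)

print(build_url("http://a.b.c/admin.jsp", {"d": "x", "e": "6&f=7"}))
# http://a.b.c/admin.jsp?d=x&e=6%26f%3D7
print(is_allowed_url(" JaVaScRiPt:alert(1)"))   # False
print(is_allowed_url("java\tscript:alert(1)"))  # False
# Encoding the whole URL instead would turn it into a relative path:
print(quote("http://a.b.c/admin.jsp", safe=""))  # http%3A%2F%2Fa.b.c%2Fadmin.jsp
```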
C. You can encode repeatedly: as long as the encoding context is not misjudged, it does not matter how many times you encode.
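Point C can be illustrated with an inline event handler: the value sits in a JavaScript string inside an HTML attribute, so the browser decodes twice (HTML first, then JavaScript) and the value must be encoded twice in the reverse order. `js_escape_minimal` and the handler name `show` are hypothetical; the escaper handles only backslashes and quotes, which is enough for this quoted literal but is not a complete defense (line breaks, for example, are not handled).

```python
import html
import re

def js_escape_minimal(s: str) -> str:
    """Hypothetical JS string escaper: escapes only backslash, ' and "."""
    return s.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')

def js_unescape_minimal(s: str) -> str:
    """Simulate the JS engine undoing the three escapes above."""
    return re.sub(r"\\(.)", lambda m: m.group(1), s)

value = 'He said "hi" & left'

# Encode in reverse order of decoding: JS string context first, then the HTML attribute.
js_code = "show('" + js_escape_minimal(value) + "')"
attr_value = html.escape(js_code, quote=True)
print(f'<button onclick="{attr_value}">x</button>')

# Simulate the browser: HTML-decode the attribute, then JS-decode the string literal.
decoded_js = html.unescape(attr_value)
literal = decoded_js[len("show('"):-len("')")]
print(js_unescape_minimal(literal) == value)  # True

# Misjudging the context: JS encoding alone leaves a raw '"' that ends the attribute early.
print('"' in js_code)  # True
```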
D. For web applications with complex front-end UIs built on JSON data, the XSS principles are exactly the same. The difficulty is that the syntax context is no longer clear at a glance, which makes it harder for developers to judge. Once you understand the principles, however, you will get it right, and your XSS fixes will not suffer from repeated bugs or incomplete fixes.
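For the JSON case in point D, one common approach (sketched here in Python, not taken from the original article) is to serialize with `json.dumps` and then replace `<`, `>` and `&` with their `\uXXXX` escapes, the same `\u` form listed in section 2, so the HTML parser cannot see `</script>` inside the data. The variable name `config` is hypothetical.

```python
import json

def json_for_script(data) -> str:
    """Serialize data as JSON that can be embedded in an HTML <script> element."""
    # ensure_ascii=True (the default) already turns U+2028/U+2029 into \u escapes.
    text = json.dumps(data)
    # '<', '>' and '&' can only occur inside JSON strings, where \u003c etc. are
    # equivalent, so this keeps the data identical while hiding "</script>" from HTML.
    return text.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")

payload = {"name": "</script><script>alert(1)</script>", "note": "a & b"}
safe = json_for_script(payload)
print(f"<script>var config = {safe};</script>")
print(json.loads(safe) == payload)  # True
```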
That brings the XSS topic to a close. I believe this series of outlines, even if it does not make you fluent yet, at least gives you a clear route to follow.
To close, a quote:
The relationship between "knowledge" and "applying knowledge" is like that between the Cowherd and the Weaver Girl: the two seem close, but they are actually far apart!