When Google Analytics, Firefox, and IIS come together...


Today a colleague noticed something strange while running an AdWords campaign:

After clicking the AdWords ad in Firefox and landing on the customer's website, refreshing the page or browsing to any other page fails with an HTTP error: status code 400, "Bad Request".

This problem does not occur in IE and Chrome.

Cookie

Because HTTP itself is stateless, state is usually maintained with cookies. I had already run into a couple of access failures caused by cookies: once a colleague could not open the New Oriental website in Firefox (see my earlier article, "Firefox could not access a specific website"), and another time I could not log in to my Gmail account. Both problems were eventually solved by clearing cookies. So this time, with that experience behind me, I used the Web Developer toolbar to inspect the cookies set for the customer's website, and found one that was garbled.

The cause is easy to guess: Chinese characters were written into the cookie without being encoded. The cookie's name is __utmz, which means it was set by Google Analytics (GA). After this cookie was deleted, access returned to normal.

Cookie encoding of Google Analytics

The ad URL my colleague was testing carried the tracking parameters supported by Google Analytics, and the GA code was also deployed on the customer's website.

When it runs, GA checks whether the current URL contains ad tracking parameters (at least utm_source must be present). If it does, GA treats the visit as paid traffic. It then extracts the source (utm_source), campaign (utm_campaign), and medium (utm_medium) from the URL, decodes them (first with decodeURIComponent and, if that fails, with unescape), and stores them persistently in the __utmz cookie. However, GA skips the encoding step when it writes the cookie. In other words, if the campaign or medium name contains Chinese, GA writes the Chinese text into the cookie as-is.
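The decode step described above can be sketched roughly as follows. This is an illustration only, not GA's actual source: the function names decodeTrackingValue and readTrackingParams and the example.com URL are made up for this sketch.

```javascript
// Hypothetical helper: decode one raw query-string value the way the article
// describes (decodeURIComponent first, unescape as the fallback).
function decodeTrackingValue(raw) {
  try {
    return decodeURIComponent(raw);
  } catch (e) {
    // decodeURIComponent throws a URIError when the escaped bytes are not valid UTF-8.
    return unescape(raw);
  }
}

// Hypothetical helper: collect the utm_* parameters from a landing-page URL.
function readTrackingParams(url) {
  const params = {};
  const queryStart = url.indexOf("?");
  if (queryStart === -1) return params;
  for (const pair of url.slice(queryStart + 1).split("&")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    const key = pair.slice(0, eq);
    if (key.startsWith("utm_")) {
      params[key] = decodeTrackingValue(pair.slice(eq + 1));
    }
  }
  return params;
}

// Made-up landing URL whose campaign name is two percent-encoded Chinese characters.
const tracked = readTrackingParams(
  "http://example.com/?utm_source=google&utm_medium=ppc&utm_campaign=%E5%8D%9A%E5%AE%A2"
);
console.log(tracked.utm_source, tracked.utm_medium, tracked.utm_campaign);
```

The point to notice is that the decoded values are no longer ASCII, so whatever writes them into a cookie has to encode them again.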

For example, I want to advertise my blog:

  • Ad series: Kevin blog promotion
  • Ad Source: google
  • Advertising media: PPC
  • Landing page Url with tracking parameters: http://www.imkevinyang.com /? Utm_source = google & utm_medium = ppc & utm_campaign = Kevin % E5 % 8D % 9A % E5 % AE % A2 % E5 % AE % A3 % E4 % BC % A0

Then GA will execute code similar to the following when writing cookies (the value of _ utmz is simplified here ):

Var data = "Kevin blog promotion"; // GA indicates that the Cookie operation document is incorrect. cookie = "_ utmz =" + data; // The correct Cookie storage Operation document. cookie = "_ utmz =" + encodeURI (data );

When using Javascript to access cookies, the standard operation should be to compile a code at the time of storage, and decode it at the time of retrieval. This ensures that all characters stored in cookies are ASCII characters. Early JS uses escape/unescape for codec, And now usually uses the encodeURI or encodeURIComponent function, both functions are UTF-8 encoding.
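A minimal sketch of the encode-on-write, decode-on-read pattern follows. The helper names serializeCookie and readCookie are invented for illustration, and the functions work on plain strings so the sketch also runs outside a browser; in a page you would assign the result of serializeCookie to document.cookie and pass document.cookie to readCookie.

```javascript
// Build the string to assign to document.cookie, encoding the value first.
function serializeCookie(name, value) {
  return name + "=" + encodeURIComponent(value);
}

// Find a cookie in a document.cookie-style string and decode its value.
function readCookie(cookieString, name) {
  for (const part of cookieString.split("; ")) {
    const eq = part.indexOf("=");
    if (eq !== -1 && part.slice(0, eq) === name) {
      return decodeURIComponent(part.slice(eq + 1));
    }
  }
  return null; // no cookie with that name
}

// The campaign name from the example landing URL (Kevin + four Chinese characters).
const stored = serializeCookie("__utmz", "Kevin\u535a\u5ba2\u5ba3\u4f20");
console.log(stored);
console.log(readCookie("other=1; " + stored, "__utmz"));
```

This sketch uses encodeURIComponent rather than encodeURI because encodeURI leaves characters such as ";", "," and "=" unescaped, and those have a special meaning inside a cookie string.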

Potential Chinese Cookie Problems

So what happens when Chinese characters are stored in a cookie directly, and how do IE and Firefox differ? We ran a few experiments with IE8 and Firefox 3.6.

IE8 processing of Chinese cookies

Experiment steps:

  • Open IE8, clear all cookies and caches, and create a clean test environment.
  • Access http://www.imkevinyang.com/
  • In the address bar, run javascript:alert(document.cookie="mycookie=缂栫爜编码; expires=Mon, 25 May 2020 10:31:49 GMT") to write a persistent cookie.

This sets a cookie on my blog that expires on May 25, 2020. I use a persistent cookie rather than a session cookie because IE writes persistent cookies to disk, which lets us observe what happens; I do not know where session cookies are kept.

If you look closely, the cookie value above is odd: it starts with three garbled characters. They are what you get when the UTF-8 bytes of the two characters "编码" ("encoding"; 6 bytes) are decoded as GB2312, where every two bytes map to one character. Why the test is set up this way will become clear later.

The IE address bar uses ANSI encoding. That is, when you type Chinese into the address bar, IE encodes it with the system's default character set. On a Chinese system, "编码" in the address bar is encoded as the four bytes B1 E0 C2 EB. On an English system, the default is a Western character set that contains no Chinese characters, so "编码" is replaced with "?", that is, 3F.

IE picks the most suitable encoding automatically when it creates the cookie file. When we write "缂栫爜编码" (the byte stream E7 BC 96 E7 A0 81 B1 E0 C2 EB once encoded as GB2312), the last four bytes cannot be decoded as UTF-8, so IE stores the file as GB2312. (If you test with "缂栫爜" alone, IE stores the file as UTF-8.)
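The claim about the byte stream can be checked with a small sketch. It does not reproduce IE's file-encoding logic, which is not public; it only tests whether the bytes quoted above are well-formed UTF-8. The name isValidUtf8 is made up for this example.

```javascript
// Return true when the byte sequence is well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from(bytes));
    return true;
  } catch (e) {
    return false; // in fatal mode the decoder throws a TypeError on malformed input
  }
}

// The byte stream from the experiment, split the way the article discusses it.
const firstSix = [0xe7, 0xbc, 0x96, 0xe7, 0xa0, 0x81];
const lastFour = [0xb1, 0xe0, 0xc2, 0xeb];

console.log(isValidUtf8(firstSix));                  // the first six bytes alone
console.log(isValidUtf8(firstSix.concat(lastFour))); // all ten bytes
```

The first six bytes pass the check on their own, while the full ten-byte stream fails, which matches the two storage outcomes described above.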

Now let's take a look at what is in the file.

Open the Everything search tool and search for "www.imkevinyang txt" to list all files whose names contain both "www.imkevinyang" and "txt".

Open the file that turns up; it holds IE's persistent cookie data.

At this point, running javascript:alert(document.cookie) in the address bar shows that the cookie value IE reports is the same as the one we set.

Having looked at the cookie stored locally, let's see what IE actually sends to the server.

We use Fiddler to monitor the whole HTTP exchange. (HttpWatch is not used here because it decodes the HTTP message, so the raw bytes cannot be seen, which makes analysis awkward.)

We request the home page of my blog. In Fiddler we see:

(Text form)

(Binary Raw Data)

 

We are surprised to see that IE does not send the character "zookeeper encoding" We set (the binary value is E7 BC 96 E7 A0 81 B1 E0 C2 EB ), "encoding" (now I know why I used "encoding" for testing ). The binary value is E7 BC 96 E7 A0 81 ef bf bd. Note that IE replaces the last four bytes of the original information with ef bf bd.

This is because when IE sends an HTTP message, it will detect whether the byte stream can be decoded in UTF-8. If not, the corresponding exception byte is replaced with ef bf bd (that is, the corresponding character ). This is a bit similar to what we mentioned earlier. Does the English system use missing characters? .
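The replacement behaviour can be imitated with a lenient UTF-8 decode followed by a re-encode. This is a sketch under the assumption that a standard decoder behaves similarly; it is not IE's implementation, and replaceInvalidUtf8 and toHex are names invented here. How many EF BF BD sequences appear depends on how the decoder groups the malformed bytes, so the count may differ from the Fiddler capture described above.

```javascript
// Decode leniently as UTF-8, then re-encode: malformed bytes come back as EF BF BD.
function replaceInvalidUtf8(bytes) {
  const text = new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
  return Array.from(new TextEncoder().encode(text));
}

// Format bytes as upper-case hex pairs for comparison with the article.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

// The ten bytes of the cookie value used in the IE experiment.
const cookieBytes = [0xe7, 0xbc, 0x96, 0xe7, 0xa0, 0x81, 0xb1, 0xe0, 0xc2, 0xeb];
console.log(toHex(replaceInvalidUtf8(cookieBytes)));
```

The first six bytes survive unchanged because they are valid UTF-8, and the trailing four are replaced, which is the pattern seen in the capture.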

Firefox's processing of Chinese cookies

Unlike IE, Firefox does not store cookies as plain files, so it is less convenient to study.

We follow the same steps anyway, but with a simpler cookie value this time.

  • Open Firefox, clear all cookies and caches, and create a clean test environment.
  • Access http://www.imkevinyang.com/
  • In the address bar, run javascript:alert(document.cookie="mycookie=1编码1")
    The first dialog box Firefox shows suggests that the cookie was set successfully: the string containing "1编码1" is echoed back.

However, if you then run javascript:alert(document.cookie), the content of the dialog box has changed:

Looking at the cookies for the current domain in the Web Developer Toolbar confirms that the cookie really is what the second dialog box shows, garbled characters included:

The question now is where the garbled characters come from.

First we copy the string into Notepad++ (note: Notepad++ must be switched to UCS-2 encoding) to see the bytes behind it.

 

31 is the ASCII code of the character "1". But where do 16 and 01 come from?

They come from Unicode code points. The code points of "编码" are U+7F16 and U+7801, and the 16 and 01 above are what is left after each one is truncated. To confirm this, I tested several other Chinese cookie values.

In other words, the Firefox address bar works in Unicode. When you type a string such as "mycookie=1编码1", what Firefox sees is:

\u006d\u0079\u0063\u006f\u006f\u006b\u0069\u0065\u003d\u0031\u7f16\u7801\u0031

When it stores a cookie containing Chinese, Firefox drops the high byte of each code unit, keeps the low byte, and writes the result to the cookie store. This is why "编码" in the cookie becomes 16 01.
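The truncation is easy to reproduce. The sketch below imitates the observed behaviour and is not Firefox's code; keepLowBytes and toHexCodes are names made up for this illustration.

```javascript
// Keep only the low byte of each UTF-16 code unit, as described above.
function keepLowBytes(text) {
  let result = "";
  for (let i = 0; i < text.length; i++) {
    result += String.fromCharCode(text.charCodeAt(i) & 0xff);
  }
  return result;
}

// List the code units in hex so that control characters are visible.
function toHexCodes(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0).toString(16).padStart(2, "0")).join(" ");
}

// The cookie value from the experiment: "1", U+7F16, U+7801, "1".
const storedValue = keepLowBytes("1\u7f16\u78011");
console.log(toHexCodes(storedValue)); // 31 16 01 31
```

ASCII characters pass through unchanged because their high byte is already zero; only characters above U+00FF lose information.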

When Firefox sends an HTTP request to the server, it encodes the message the same way IE does: it also checks whether the byte stream can be decoded as UTF-8. If you are interested, you can verify this with the method used above.

Why Firefox cannot access the site

From what we now know about how IE and Firefox handle Chinese cookies: IE uses ANSI encoding for them, so the cookie never contains non-printable ASCII characters (every byte of a GB2312-encoded character is A0 or above). Firefox works in Unicode but drops the high byte, so the cookie may end up containing non-printable ASCII characters.

IE and Firefox encode the byte stream in the same way when they build the HTTP message. Byte sequences that cannot be decoded as UTF-8 are replaced with EF BF BD, as we saw in Fiddler. Non-printable ASCII characters are not touched and are sent to the server as they are.

Therefore, a request sent from Firefox may reach the server with non-printable characters in it, which does not happen with IE.

For example, suppose a Chinese cookie is set in Firefox. The code point of "我" ("I") is U+6211; once Firefox drops the high byte, 11 is left, which is Device Control 1 (DC1) in the ASCII table, a control character. When a request carrying this cookie reaches the server, the server may reject it outright with a Bad Request error, telling the client that the request does not comply with the HTTP specification.
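Not every Chinese character triggers the problem; it depends on the low byte. The following sketch flags the characters that would become ASCII control characters under the truncation described above. The function name riskyCharacters is hypothetical, and the check simply treats 0x00 to 0x1F and 0x7F as control characters.

```javascript
// Hypothetical check: which characters would turn into ASCII control characters
// if only the low byte of each UTF-16 code unit were kept?
function riskyCharacters(text) {
  const risky = [];
  for (let i = 0; i < text.length; i++) {
    const lowByte = text.charCodeAt(i) & 0xff;
    if (lowByte < 0x20 || lowByte === 0x7f) {
      risky.push({ char: text[i], lowByte: lowByte });
    }
  }
  return risky;
}

console.log(riskyCharacters("\u6211")); // U+6211 keeps 0x11 (DC1), so it is flagged
console.log(riskyCharacters("\u535a")); // U+535A keeps 0x5A ("Z"), so nothing is flagged
```

A check like this could be run over campaign names before they go into an ad URL, to see whether a given name is likely to produce the 400 error.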

This restriction is not specific to cookies: no other HTTP header may contain such non-printable characters either. We can use WFetch to construct an "invalid" request of this kind directly:

The server will throw a 400 Bad Request.

Different Processing Methods for IIS and Apache

How a server reacts to a malformed client request depends on the server implementation. The problem discussed above affects only IIS; sites served by Apache or LiteSpeed are not affected. This suggests that IIS is somewhat less fault-tolerant, though from a security standpoint I cannot say whether that is a good or a bad thing.

Summary

After all this detail you may have lost the thread, so let's retell the story.

An advertiser places an ad and appends Google's tracking parameters, which contain Chinese text, to the landing page URL. The customer's website has the GA code deployed; GA reads the Chinese text and writes it into a cookie without encoding it. Firefox truncates the Unicode code of each Chinese character and keeps only the low byte. When the page is refreshed, Firefox sends the truncated character to the IIS server. The truncated character is a non-printable character that IIS cannot handle, so it returns a Bad Request, telling the client that the request is invalid and cannot be processed.

That is the whole story.

What should we do? To be safe, if the customer's website runs on IIS, do not serve ads whose URLs contain Chinese (even UTF-8 percent-encoded Chinese) to Firefox users. Otherwise the ad spend may be wasted, because the user may not be able to visit the site a second time. (Now I understand why my colleague could not access New Oriental in Firefox back then...)

I hope the entire analysis process will be helpful to you.

--Kevin Yang
