Data collection and processing in Google Analytics

Source: Internet
Author: User

Figure 3-1 describes how Google Analytics collects, processes, and displays data. Google Analytics uses a common technology called page tags. A page tag is a piece of JavaScript code that you must place on every page of the website that you want to track. We call it the Google Analytics Tracking Code, or GATC for short. If you do not place this code on a page, Google Analytics will not track that page.



The process starts when a visitor requests a page from the web server. The server responds by sending the requested page to the visitor's browser (step 1 in Figure 3-1). As the browser processes the page, it may also contact other servers that host data, video, or script files that the page needs. This is what happens with the GATC.





Figure 3-1 Google Analytics flowchart


When the visitor's browser reaches the GATC, the tracking code begins to execute. During execution, the GATC identifies attributes of the visitor and her browsing environment, such as how many times she has visited the site, where she came from, and which operating system and browser she uses.


After the appropriate data is collected, the GATC sets (or updates, depending on the situation) several first-party cookies (step 2), which are discussed later. The cookies store information about the visitor. After these cookies are created on the visitor's machine, the tracking code sends the data to the Google Analytics server.
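To illustrate the kind of visitor information such a cookie can hold, here is a minimal sketch that splits a `__utma` value of the kind carried in this section's sample request. The six-part layout (domain hash, visitor ID, first, previous, and current visit timestamps, visit count) is the commonly described ga.js layout and is an assumption here, not something this article specifies. `UtmaCookie` and `parse_utma` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UtmaCookie:
    domain_hash: str
    visitor_id: str
    first_visit: datetime
    previous_visit: datetime
    current_visit: datetime
    visit_count: int


def parse_utma(value: str) -> UtmaCookie:
    """Split a __utma value into six dot-separated parts (assumed layout)."""
    parts = value.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated parts, got {len(parts)}")
    domain_hash, visitor_id, first, previous, current, count = parts

    def to_utc(seconds: str) -> datetime:
        # Assumption: the timestamps are Unix epoch seconds.
        return datetime.fromtimestamp(int(seconds), tz=timezone.utc)

    return UtmaCookie(
        domain_hash,
        visitor_id,
        to_utc(first),
        to_utc(previous),
        to_utc(current),
        int(count),
    )


cookie = parse_utma("32856364.1914213586.1269919681.1269919681.1269919681.1")
print(cookie.visit_count, cookie.first_visit.isoformat())
```

When all three timestamps are equal and the count is 1, as in this value, the visitor is on a first visit under the assumed layout.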


While the data is collected and the cookies are set, the browser also downloads a file named ga.js from a Google Analytics server (still in step 2). All of the functional code that Google Analytics requires is contained in this ga.js file.



Once the ga.js file is downloaded, the collected data is sent to Google in the form of a pageview. A pageview indicates that the visitor has viewed a page on the website. Other types of data, such as events and e-commerce data, can also be sent to Google Analytics (we will discuss this later). A pageview is transmitted through a request for an invisible GIF file named __utm.gif (step 4) on a Google Analytics server. Each piece of information collected by the GATC is carried in the query string of the __utm.gif request, as follows:


http://…/__utm.gif?utmwv=4.6.5&utmn=1881501226&utmhn=&utmcs=UTF-8&utmsr=1152x720&utmsc=24-bit&utmul=en-US&utmje=1&utmfl=10.0%20r42&utmdt=analytics%20talk%20by%20Justin%20cutroni&utmhid=465405990&utmr=-&utmp=%2fblog%2f&utmac=UA-XXXX-1&utmcc=__utma%3d32856364.1914213586.1269919681.1269919681.1269919681.1%3B%2B__utmz%3d32856364.1269919681.1.1.utmcsr%3d(direct)%7cutmccn%3d(direct)%7cutmcmd%3d(none)%3B&gaq=1
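A request like this can be taken apart with any standard URL library. The sketch below uses Python's `urllib.parse` on a shortened version of the request. The host `collector.example` is a placeholder, because the sample URL in this article does not show one, and `parse_pageview` is a hypothetical helper name.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit

# Placeholder host: the article's sample URL does not show the real one.
SAMPLE_URL = (
    "http://collector.example/__utm.gif?utmwv=4.6.5&utmn=1881501226&utmhn="
    "&utmcs=UTF-8&utmsr=1152x720&utmsc=24-bit&utmul=en-US&utmje=1"
    "&utmfl=10.0%20r42&utmdt=Analytics%20Talk&utmhid=465405990&utmr=-"
    "&utmp=%2Fblog%2F&utmac=UA-XXXX-1"
    "&utmcc=__utma%3D32856364.1914213586.1269919681.1269919681.1269919681.1%3B"
)


def parse_pageview(url: str) -> Dict[str, str]:
    """Return the query-string parameters of a __utm.gif request, percent-decoded."""
    query = urlsplit(url).query
    # keep_blank_values=True preserves empty parameters such as utmhn.
    return dict(parse_qsl(query, keep_blank_values=True))


params = parse_pageview(SAMPLE_URL)
for name in ("utmp", "utmsr", "utmdt", "utmcc"):
    print(name, "=", params[name])
```

After decoding, values such as the page path (`utmp`) and the cookie string (`utmcc`) become readable. The parameter meanings are inferred from the sample values, not from an official reference.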


When the Google Analytics server receives this pageview, it stores the data in some type of temporary data store. Google does not describe the exact storage format, but common storage formats for raw data are well known. Assume that the data is stored in a large text file or logfile (step 5). Each line in the logfile contains numerous attributes of the pageview, including:

• Where the visitor came from (for example, a referring website or a search engine)

• How many times the visitor has been to the site (number of visits)

• Where the visitor is located (geographic information)

• Who the visitor is (IP address)




The next step is data processing. Google Analytics processes the data in the logfile roughly every three hours, although the interval fluctuates. Google Analytics does not process data in real time. Even though data is processed about every three hours, processing is usually not complete until about 24 hours after the data is collected. This is because once a full day of data is complete, that day is processed again.


Because this processing schedule can make same-day metrics inaccurate, it is best to avoid using Google Analytics for real-time or intraday reporting.
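As a rough way to reason about this, the sketch below classifies a hit by its age. The three-hour and 24-hour thresholds are the approximate figures given in the text, not guaranteed values. The status names and the `report_status` function are hypothetical.

```python
from datetime import datetime, timedelta

# Approximate figures taken from the text; real processing times fluctuate.
PROCESSING_INTERVAL = timedelta(hours=3)
FINALIZATION_DELAY = timedelta(hours=24)


def report_status(collected_at: datetime, now: datetime) -> str:
    """Classify how trustworthy a hit's reported data is likely to be."""
    age = now - collected_at
    if age < PROCESSING_INTERVAL:
        return "pending"       # may not have been processed at all yet
    if age < FINALIZATION_DELAY:
        return "provisional"   # processed, but the day may be processed again
    return "final"


collected = datetime(2010, 1, 21, 19, 5, 6)
for hours in (1, 6, 30):
    print(hours, report_status(collected, collected + timedelta(hours=hours)))
```

A report built only from hits classified as final avoids the intraday fluctuation described above, at the cost of being about a day behind.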


During processing, each line in the logfile is split into pieces, one piece for each attribute of the pageview. The following is an example of a logfile line. It is not actual data from Google Analytics, but it illustrates the idea:

[21/Jan/2010:19:05:06 -0600] "GET /__utm.gif?utmwv=4.6.5&utmn=1881501226&utmhn=&utmcs=UTF-8&utmsr=1152x720&utmsc=24-bit&utmul=en-US&utmje=1&utmfl=10.0%20r42&utmdt=Analytics%20talk%20by%20Justin%20cutroni&utmhid=465405990&utmr=-&utmp=%2fblog%2f&utmac=UA-XXXX-11&utmcc=__utma%3d32856364.1914213586.1269919681.1269919681.1269919681.1%3B%2B__utmz%3d32856364.1269919681.1.1.utmcsr%3d(direct)%7cutmccn%3d(direct)%7cutmcmd%3d(none)%3B&gaq=1" "__utma=32856364.1914213586.1269919681.1269919681.1269919681.1; __utmb=100957269; __utmc=100957269; __utmz=100957269.1164157501.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)"


Of course, most of this data is hard to read, and only a few elements are obvious: the date and time (Jan 21, 2010 at 19:05:06) and the IP address are easy to recognize.
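Splitting such a line into pieces can be sketched with a regular expression. The line below is a shortened, illustrative record in the same spirit as the sample. The layout (timestamp in brackets, quoted request, quoted cookie string) is assumed for the sketch and is not Google's actual format. `LINE_RE` and `parse_log_line` are hypothetical names.

```python
import re
from datetime import datetime
from urllib.parse import parse_qsl

# Shortened, illustrative line; not actual Google Analytics data.
LOG_LINE = (
    '[21/Jan/2010:19:05:06 -0600] '
    '"GET /__utm.gif?utmwv=4.6.5&utmsr=1152x720&utmp=%2Fblog%2F&utmac=UA-XXXX-11" '
    '"__utma=32856364.1914213586.1269919681.1269919681.1269919681.1; '
    '__utmb=100957269; __utmc=100957269"'
)

LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\]\s+'
    r'"GET\s+(?P<path>[^?"\s]+)\?(?P<query>[^"\s]*)"\s+'
    r'"(?P<cookies>[^"]*)"'
)


def parse_log_line(line: str) -> dict:
    """Split one log line into a timestamp, request parameters, and cookies."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("line does not match the assumed layout")
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    params = dict(parse_qsl(match.group("query"), keep_blank_values=True))
    cookies = dict(
        pair.strip().split("=", 1)
        for pair in match.group("cookies").split(";")
        if pair.strip()
    )
    return {
        "timestamp": timestamp,
        "path": match.group("path"),
        "params": params,
        "cookies": cookies,
    }


record = parse_log_line(LOG_LINE)
print(record["timestamp"].isoformat(), record["params"]["utmp"])
```

Each key in `record["params"]` and `record["cookies"]` corresponds to one of the pieces that later becomes a field.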


Google Analytics turns each piece of the logfile record into a data element called a field. Each field is then converted into a dimension. For example, the IP address becomes the Visitor IP field, and the visitor's city becomes the Visitor City field and the City dimension.


It is important to understand that each pageview has many attributes and that each attribute is stored in its own field or dimension. Google Analytics then uses the fields to manipulate the data and the dimensions to build reports.
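The relationship between fields, dimensions, and reports can be sketched as follows. Everything here is illustrative: the IP-to-city table is invented, the addresses come from ranges reserved for documentation, and the function names are hypothetical rather than anything Google Analytics exposes.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical IP-to-city table, for illustration only.
CITY_BY_IP = {"203.0.113.7": "Boston", "198.51.100.4": "Denver"}


def to_fields(ip: str, page: str) -> Dict[str, str]:
    """Turn raw logfile elements into named fields."""
    return {
        "Visitor IP": ip,
        "Visitor City": CITY_BY_IP.get(ip, "(not set)"),
        "Page": page,
    }


def to_dimensions(fields: Dict[str, str]) -> Dict[str, str]:
    """Expose the fields that reports group by as dimensions."""
    return {"City": fields["Visitor City"], "Page": fields["Page"]}


def pageviews_by(dimension: str, rows: List[Dict[str, str]]) -> Counter:
    """Build a simple report: pageview counts per value of one dimension."""
    return Counter(row[dimension] for row in rows)


raw_hits = [
    ("203.0.113.7", "/blog/"),
    ("203.0.113.7", "/about/"),
    ("198.51.100.4", "/blog/"),
]
rows = [to_dimensions(to_fields(ip, page)) for ip, page in raw_hits]
by_city = pageviews_by("City", rows)
print(by_city)
```

The same rows can be grouped by any other dimension, for example `pageviews_by("Page", rows)`, which is the sense in which dimensions drive reports.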

After each line is split into fields and dimensions, the configuration settings are applied to the data (steps 6-9). These settings include:

• Site search

• Goals and funnels

• Filters

See Step 7


Finally, after all the settings are applied, the data is stored in the database (step 10).
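The idea of applying configuration settings to each record before it is stored can be sketched as a small pipeline. This is an illustration under stated assumptions, not how Google implements it: the internal address range, the `/thanks` goal page, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
# A step returns the (possibly modified) record, or None to drop it.
Step = Callable[[Record], Optional[Record]]


def exclude_internal_traffic(record: Record) -> Optional[Record]:
    """Filter: drop hits from a hypothetical internal address range."""
    return None if record["Visitor IP"].startswith("10.") else record


def mark_goal(record: Record) -> Optional[Record]:
    """Goal: flag pageviews of a hypothetical confirmation page."""
    reached = record["Page"] == "/thanks"
    return {**record, "Goal Reached": "yes" if reached else "no"}


def process(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply every step to every record and collect what survives."""
    database: List[Record] = []
    for record in records:
        current: Optional[Record] = record
        for step in steps:
            current = step(current)
            if current is None:
                break  # a filter excluded this record
        if current is not None:
            database.append(current)
    return database


hits = [
    {"Visitor IP": "10.0.0.5", "Page": "/blog/"},
    {"Visitor IP": "203.0.113.7", "Page": "/thanks"},
    {"Visitor IP": "203.0.113.8", "Page": "/blog/"},
]
stored = process(hits, [exclude_internal_traffic, mark_goal])
print(stored)
```

Only the records that pass the filter reach the stored list, and each carries the goal flag computed at processing time.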


Once the data is stored in the database, processing is complete. When you (or any other user) request a report, the appropriate data is retrieved from the database and sent to the browser. After Google Analytics has processed the data and saved it to the database, the data cannot be changed. This means that historical data is never modified or reprocessed. Any error in the settings or configuration permanently affects the quality of the data, and there is no way to undo it, so it is important to avoid configuration mistakes. It also means that configuration changes in Google Analytics do not alter historical data: they affect only data collected after the change.
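The consequence for configuration changes can be shown with a toy store that applies whatever setting is in force when a hit is processed and never revisits stored rows. `ReportStore` and its `excluded_prefix` setting are hypothetical and serve only to illustrate the behavior described above.

```python
from typing import Dict, List

Record = Dict[str, str]


class ReportStore:
    """Applies the configuration in force at processing time, then never revisits rows."""

    def __init__(self) -> None:
        # Hypothetical filter setting; an empty string means "exclude nothing".
        self.excluded_prefix = ""
        self._rows: List[Record] = []

    def process(self, hit: Record) -> None:
        if self.excluded_prefix and hit["Page"].startswith(self.excluded_prefix):
            return  # filtered out for good; it cannot be recovered later
        self._rows.append(dict(hit))  # store a copy

    def report(self) -> List[Record]:
        return [dict(row) for row in self._rows]


store = ReportStore()
store.process({"Page": "/admin/login"})  # processed before the filter exists
store.excluded_prefix = "/admin"         # configuration change
store.process({"Page": "/admin/users"})  # excluded by the new filter
store.process({"Page": "/blog/"})

pages = [row["Page"] for row in store.report()]
print(pages)
```

The `/admin/login` row stays in the report even though the later filter would have excluded it, and the `/admin/users` hit is gone permanently, which mirrors both halves of the warning above.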
