A Google search for "Alexa" returns more than 70,000 results in Simplified Chinese. "Alexa ranking" has become one of the phrases domestic webmasters mention most often. So what exactly does Alexa do? Why has it made domestic websites sit up and take notice? What little-known secrets lie behind it? Through in-depth investigation and interviews, this reporter has tried to give readers a true picture of Alexa, and much of what follows is being disclosed by the media for the first time.
Betraying Alexa
In writing this piece, the reporter worried that it might fall short in places. Looking back on the investigation and interviews of recent days, the disabled friends who built a website for the reporter, the friends who helped the reporter, and the reporter himself have all poured great enthusiasm into Alexa. In a sense, we are all jointly "betraying Alexa."
A few days ago, a Beijing weekly released its list of the "Top 100 Chinese Commercial Websites." The list itself had little "commercial" flavor, but when introducing some of the sites, the magazine cited another ranking to illustrate their value: the global website ranking of Alexa, described as "the world-renowned American third-party evaluation agency." Of course, the magazine cited Alexa rankings only to bolster the authority of its own list. Using one "authority" to establish its own "authority" simply shows how much the media recognizes another "authority": the Alexa ranking.
Who is Alexa?
Alexa was founded in the United States in April 1996. At the time it was a small site that mainly offered categorized navigation. Many sites back then were Yahoo imitators, and Alexa was just one of many followers. But Alexa had a technology of its own: collecting and analyzing traffic statistics for the web sites users visited. Later, Alexa also offered users a search engine service it had developed itself.
In July 1997, Alexa released a piece of software that is now the famous Alexa Toolbar, which embeds itself in Microsoft's Internet Explorer browser. Each time the user visits a web page, it sends a string of code back to Alexa describing that visit. In return, the toolbar shows the user information about the site being visited, including its global ranking; of course, this is simply the ranking Alexa assigns to the site.
In 1999, Alexa was acquired by Amazon, the flagship of American e-commerce, and became its wholly owned subsidiary.
In the spring of 2002, Alexa gave up its own search engine and began working with Google instead. The huge database of page information built by Google's crawler, which roams the entire web, greatly enriched Alexa's URL database. At the same time, Alexa quietly released its own crawler to search the Internet for unknown URLs. After years of accumulation, Alexa's URL library stores information on 40 billion web sites, more than Google or any other search engine, making it the most complete database of web site information on the Internet. The Alexa database grows by an average of up to 1 TB per day, and the whole database can be comprehensively refreshed every two months.
Today the Alexa toolbar has an "installed base" of tens of millions worldwide. Alexa's main work is to monitor the browsing habits of Internet users around the world from the information sent back by toolbars on their desktops, and to develop and sell a range of related products. These include a global ranking of up to 100,000 web sites, ranking analyses of sites in specific industries, and customized traffic monitoring reports for individual web sites. Alexa also provides a good deal of basic information free of charge, such as the "Global Top 500" and "Simplified Chinese Top 100" lists. On Alexa, users can view the ranking history chart of a single site, and can even compare the traffic, ranking, and other data of up to 5 sites side by side.
Although Alexa's website offers much valuable information, what really made it famous is its global website ranking, which the media have quoted and hyped repeatedly and which has caused a great deal of controversy. According to the "official statement" on the Alexa website, Alexa's overall global site ranking is a composite ranking calculated from all the information returned by Alexa toolbars. From this statement, two factors determine an Alexa ranking: the information Alexa collects, and the method Alexa uses to process that information.
On the Alexa website, you can look up a specific web site to see how many visitors it has. Alexa expresses visitor numbers with a figure called reach (users reached per million, "reach" for short), defined as the average number of daily visitors to a web site per 1 million Alexa toolbar users. For example, on November 3, 2004, the reach of google.com was 178,500; that is, on that day nearly 180,000 of every 1 million Alexa toolbar users visited google.com. Visitor numbers alone do not fully show how netizens use a web site, so Alexa provides a second figure that reflects how intensively visitors use a site: page views per user (PV for short). A site's PV value is the average number of pages viewed on the site each day by the Alexa toolbar users who visit it, with repeated views of the same page by the same person counted only once per day. Again taking google.com on November 3, 2004 as an example, its PV value that day was 4.0; that is, visitors viewed an average of 4 pages on google.com.
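To make the two definitions concrete, here is a minimal Python sketch that computes reach and PV for one day from a list of (toolbar ID, site, page) views. The record layout, the function name `daily_reach_and_pv`, and the sample data are hypothetical; Alexa has not published how it stores or counts these records. Repeated views of the same page by the same toolbar are collapsed, as the PV definition requires.

```python
from collections import defaultdict


def daily_reach_and_pv(views, total_toolbar_users):
    """Return {site: (reach_per_million, pv_per_user)} for one day of views.

    views: iterable of (toolbar_id, site, page) tuples (hypothetical layout).
    Reach = distinct visitors per million toolbar users.
    PV    = distinct (visitor, page) pairs divided by distinct visitors.
    """
    visitors = defaultdict(set)      # site -> toolbar IDs that visited it
    unique_views = defaultdict(set)  # site -> (toolbar_id, page) pairs
    for toolbar_id, site, page in views:
        visitors[site].add(toolbar_id)
        # A set drops repeat views of the same page by the same user.
        unique_views[site].add((toolbar_id, page))

    result = {}
    for site, ids in visitors.items():
        reach = len(ids) * 1_000_000 / total_toolbar_users
        pv = len(unique_views[site]) / len(ids)
        result[site] = (reach, pv)
    return result


# Illustrative data: u1 reloads /a, then views /b; u2 views /a once.
views = [
    ("u1", "example.com", "/a"),
    ("u1", "example.com", "/a"),
    ("u1", "example.com", "/b"),
    ("u2", "example.com", "/a"),
]
print(daily_reach_and_pv(views, total_toolbar_users=10))
```

With ten toolbar users, two of whom visit example.com, reach is 2 / 10 of a million, or 200,000, and the three distinct (user, page) pairs spread over two visitors give a PV of 1.5.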
With reach and PV, Alexa can rank the world's web sites by overall traffic; it calls this ranking Traffic Rank ("rank" for short). According to the explanation on Alexa's website, rank is determined by the geometric mean of reach and PV, that is, the square root of their product. Clearly, the higher a site's reach and PV, the higher its rank. Taking google.com as an example again: its PV of 4.0 is not high compared with ordinary sites, but because Google's search engine has so many users, its reach is far higher than that of ordinary sites, so google.com ranked 3rd, meaning it was the third-ranked site in the world that day. The highest-ranked site in the world is yahoo.com, whose reach and PV are both relatively high among global sites.
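The geometric-mean rule can be sketched as follows. Only the square root of reach times PV comes from Alexa's stated explanation; the two `.example` sites and their figures are made up for comparison, and any further adjustments Alexa may apply are unknown.

```python
import math


def traffic_score(reach, pv):
    """Geometric mean of reach and PV: the square root of their product."""
    return math.sqrt(reach * pv)


def traffic_rank(sites):
    """Order site names from rank 1 downward by traffic score.

    sites: {name: (reach, pv)}
    """
    return sorted(sites, key=lambda name: traffic_score(*sites[name]), reverse=True)


sites = {
    "google.com": (178_500, 4.0),      # figures quoted for November 3, 2004
    "site-a.example": (50_000, 10.0),  # hypothetical
    "site-b.example": (200_000, 5.0),  # hypothetical
}
print(traffic_rank(sites))
```

With google.com's figures the score is the square root of 714,000, about 845. Because the square root is monotonic, sorting by the product alone would give the same order; the geometric mean just keeps the score on a readable scale.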
Use and suspicion
At every moment, computers around the world with the Alexa Toolbar installed report their Internet activity to Alexa's servers. Based on this information, Alexa recalculates its global site rankings every day; that is, Alexa rankings are updated daily. Because almost every web site in the world falls within Alexa's monitoring range, the daily-updated list is highly intuitive, and the public can check the data at any time, Alexa rankings have become a very sensitive matter on many occasions.
Since 2003, the domestic Internet industry has begun to warm up again. Site CEOs who were once ruthlessly abandoned by capital have regained some of their old confidence, but venture capitalists are much smarter than they were during the last dot-com bubble. To attract investment, a site must first reassure its backers, and even a listed company needs to give shareholders clear data showing what it is worth. It was at this time that Alexa rankings were introduced into China. Although the Alexa site itself had never attracted much attention, the list it provides quickly showed great commercial value.
When submitting business plans to investors, some websites began, intentionally or not, to mention their Alexa rankings. By a tacit logic, a site ranked around 300th in the world is always worth more than one ranked outside the top 1,000. Eventually, through word of mouth among "insiders," Alexa rankings became famous on the Chinese Internet, while Alexa, far away across the ocean, did not even know it had Chinese fans.
Once Alexa rankings gained general recognition in the industry, some people began using them to create "value." Many individual webmasters began studying Alexa's ranking rules, all kinds of Alexa cheating tools spread across the Internet, many websites published articles on how to cheat Alexa, and discussions of Alexa rankings appeared in forums everywhere. For a time, an "Alexa whirlwind" swept through China's Internet industry.
At the end of 2003, in reaction to the Alexa fever, some in the industry began to question the credibility of Alexa rankings and exposed many "inside stories" about website cheating. In fact, quite a number of users had already voiced suspicion about Alexa rankings, because for most of 2003 the world's third- and fourth-ranked sites on Alexa were two South Korean sites. Alexa's explanation was that a higher proportion of South Korean Internet users had installed the Alexa toolbar, so Alexa obtained relatively more sample data in South Korea, which pushed South Korea's two portals into the world's top five.
However, Alexa's explanation only seemed to make matters worse, because the Alexa toolbar had always been available only in English, with neither a Chinese nor a Korean version; if penetration were the reason, Europe and the United States should rank first. Some netizens believed the South Korean sites had boosted their rankings by cheating, and more argued that since Alexa had not released a localized version of the toolbar, Asian countries should resolutely boycott Alexa rankings.
In any case, whether it was being courted or attacked, Alexa has never received less attention from the industry. In 2004, many site owners were still racking their brains to push their Alexa rankings up, while the critics who had exposed Alexa kept attacking it for various purposes. After this year's National Day (October 1) holiday, domestic sites suddenly suffered a large-scale, collective drop in the Alexa rankings, and all kinds of rumors began to circulate among netizens. A popular explanation was that Alexa had finally adjusted its algorithm to counter the growing number of Chinese sites that cheat. People saw this as Alexa's "self-defense": after all, a list whose biggest selling point is its ranking can only survive if that ranking is fair.
None of these rumors can resolve the public's doubts about Alexa's credibility, because mainstream domestic media have never analyzed or reported on Alexa's technology in depth. As the sections below show, South Korea's excessively high rankings may have another cause; cheating Alexa is not simply a matter of refreshing pages, as earlier media coverage suggested; and Alexa also has its own very clever anti-cheating measures. However, the cheating method described in this article is fundamentally different from the "cheat books" found everywhere on the Internet, so it is hard to say how well Alexa can resist this unusual kind of cheating.
The DNA of the Alexa Toolbar
To judge fully whether Alexa rankings are trustworthy, one has to dissect, technically and comprehensively, how Alexa monitors the traffic of web sites around the world. Of course, Alexa has never published its technical details, so the reporter decided to turn Alexa's own methods against it: since Alexa claims its data comes from the toolbar, the reporter decided to start by cracking the toolbar.
At the reporter's request, Kobayashi, a web technology expert well known in the field, spent nearly a whole night analyzing the Alexa toolbar and the data it sends back to Alexa, obtaining a great deal of valuable first-hand information. Kobayashi, who had studied Alexa a few years earlier, said the recent analysis revealed some of Alexa's more secretive technical details.
Kobayashi told the reporter that the operating mechanism of the latest version of the Alexa toolbar has not changed much. Whenever a user with the Alexa toolbar opens a new page in IE, the Alexa server (data.alexa.com) receives an encrypted packet. The core of this packet is more than 10 parameters, including the current page address, the time the page was opened, the client's display resolution, the Alexa Toolbar version number, and whether the user is an Amazon customer. Among them is an important hidden parameter which, according to Kobayashi's analysis, is an ID number Alexa generates automatically for each installed toolbar; this number should be globally unique. Through this ID, Alexa can uniquely identify the sender of every feedback packet, which is an important means of avoiding double-counting of page views and of preventing a single user from cheating by refreshing repeatedly.
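The role of the per-install ID can be illustrated with a short sketch. Every field name below is hypothetical, since the real packet is encrypted and undocumented; the point is only that once each report carries a unique toolbar ID, a server can keep one report per (ID, day, page) and discard the rest, so a single user refreshing a page fifty times counts once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolbarReport:
    # Hypothetical field names; the real packet format is not public.
    toolbar_id: str        # per-install ID, assumed globally unique
    url: str               # current page address
    opened_at: str         # ISO timestamp, e.g. "2004-11-03T10:15:00"
    resolution: str        # client display resolution
    toolbar_version: str
    amazon_customer: bool


def ingest(reports):
    """Keep one report per (toolbar_id, day, url); return (kept, discarded)."""
    seen = set()
    kept, discarded = [], 0
    for r in reports:
        key = (r.toolbar_id, r.opened_at[:10], r.url)  # first 10 chars = date
        if key in seen:
            discarded += 1  # same toolbar reloading the same page that day
            continue
        seen.add(key)
        kept.append(r)
    return kept, discarded


# One toolbar refreshing the same page 50 times, plus one other visitor.
reports = [
    ToolbarReport("id-1", "http://example.com/", f"2004-11-03T10:{m:02d}:00",
                  "1024x768", "1.0", False)
    for m in range(50)
]
reports.append(ToolbarReport("id-2", "http://example.com/", "2004-11-03T11:00:00",
                             "800x600", "1.0", True))
kept, discarded = ingest(reports)
print(len(kept), discarded)
```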
Kobayashi told the reporter that, judging from current research, any skilled assembly-language programmer could easily work out the secret of the packets the Alexa toolbar sends back; if that person is also an expert network programmer, cheating Alexa becomes even easier. The reporter's later interview with an Alexa-cheating expert in Shanghai confirmed that Kobayashi's analysis was entirely correct. The cheater, a senior web development engineer, used methods that essentially matched Kobayashi's analysis: he wrote a generator for Alexa toolbar return codes, produced batches of code strings that Alexa can recognize, and then sent them to data.alexa.com while posing as many virtual users, deceiving the Alexa server into believing the data came from different users. (With the cheater's consent, the reporter publishes the online chat record with him at the end of this feature.)
Kobayashi believes the most important step in cheating by programmatically simulating many users is cracking the algorithm Alexa uses to generate the ID numbers that uniquely identify each toolbar. This requires sniffing enough Alexa toolbars and capturing their packets for a quantitative analysis of the algorithm. Kobayashi also pointed out that this work is nothing special for a master programmer; what may be difficult is that the final cheat still needs a program that can generate ID numbers fast enough. Even so, by Kobayashi's estimate, quite a few people in China could do it; it is just that the experts in the Internet community rarely bother.
From the technical analysis above, what the Alexa server does every day is continuously receive packets returned by users worldwide, extract the 10-plus parameters and write them into a dedicated database, then at a specific time analyze and calculate the data collected that day and update the site rankings with the new results. As the reporter understands it, the analyzed data is kept for at least three years, because the rank trend chart for each site on the Alexa website can show up to three years of data.
After deciphering the DNA of the toolbar, the reporter also discussed the global distribution of the Alexa toolbar with some friends. According to Alexa, the toolbar is its only source of information, so the toolbar's distribution among users worldwide becomes another factor that can affect Alexa rankings. If the toolbar really is Alexa's only data source, then had Chinese netizens not installed the Alexa toolbar, portals such as Sina and Sohu would not appear in the Alexa rankings at all. Yet in the second half of this year, Sina and Sohu ranked fourth and fifth in the world, right behind Google. This seems to indicate that the Alexa toolbar has a high penetration rate in China; otherwise Sina, Sohu, and the Chinese sites that follow them, which occupy nearly one third of the Alexa Global 500, would all be suspected of cheating.
Fortunately, the reporter's friend Cao provided reassuring data. An expert in web traffic analysis, Cao provides long-term traffic monitoring and statistical analysis services for as many as 2,000 domestic websites through www.tong123.com. Cao's traffic analysis differs from Alexa's: Alexa does nothing to the sites it monitors, whereas tong123.com performs third-party traffic statistics by embedding code in the sites' pages. At the reporter's request, Cao temporarily added Alexa toolbar detection to his data-sampling analyzer. After a week of statistics, Cao concluded that among all users of the 2,000 sites monitored by the www.tong123.com system, the Alexa toolbar installation rate was about 1.5%.
Just before press time, Cao sent the reporter a note on this result: because the tong123 system calculates a cumulative average and the new monitoring item had only just been added, the actual installation rate of the Alexa toolbar should be higher than the current figure.
Because tong123.com monitors 2,000 websites of all kinds, the risk that a narrow set of monitored sites would skew the user sample is largely ruled out, so the result is relatively credible. Even at the underestimated installation rate of 1.5%, Alexa's reach among domestic users is staggering: counting domestic Internet users at 90 million, the Alexa toolbar may have more than 1.3 million users in China. Against Alexa's global total of roughly 10 million toolbar downloads, Chinese netizens appear to pay unusually close attention to Alexa, which may also explain the overall rise of domestic websites in this year's Alexa rankings.
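The estimate above is simple arithmetic, sketched here with the article's own figures (90 million netizens, a 1.5% installation rate, and about 10 million toolbars worldwide). The function name is illustrative only.

```python
def china_toolbar_estimate(netizens, install_rate, global_base):
    """Return (estimated Chinese toolbar users, share of the global toolbar base)."""
    users = netizens * install_rate
    return users, users / global_base


# Figures from the article: 90 million netizens, 1.5% installation rate,
# roughly 10 million toolbar downloads worldwide.
users, share = china_toolbar_estimate(90_000_000, 0.015, 10_000_000)
print(f"{users:,.0f} users, {share:.1%} of the global base")
```

This prints 1,350,000 users, 13.5% of the global base, consistent with "more than 1.3 million" above. Since Cao considers 1.5% an underestimate, both numbers are likely on the low side.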
Since the toolbar's distribution greatly affects Alexa's monitoring results, Alexa rankings may show a strong geographical bias; if so, Alexa's authority is indeed questionable. But things are far from that simple. On the Alexa website, the reporter saw that the rankings of the top 100,000 web sites worldwide are offered for 499 US dollars. Evidently, across the ocean, some people trust Alexa's ranking data enough to pay for it.
Hypothetical technical secrets
Neither Alexa's public information nor netizens' discussions of it have ever explicitly suggested that Alexa has any way of monitoring public network traffic other than the toolbar. But Kobayashi, who has long provided technical support for many well-known websites, concluded from analyzing those sites' logs that Alexa also has many technical secrets. Since we could not obtain an official response from Alexa, we can only describe the technical means it may use as "hypothetical."
To explore Alexa's secrets further, we first need to look at how web browsing works. When a user opens IE, types a URL into the address bar, and presses Enter, the browser first asks a DNS server to resolve the domain name into an IP address, and then sends out packets containing the HTTP request. Like other traffic, these packets pass through the user's network gateway onto the public network, go through a telecom machine room, and are finally routed to the destination IP address. Given this process, if Alexa could sniff all the DNS servers in the world, it could obtain data very close to the real pattern of global HTTP requests, showing clearly which users care about which sites.
But sniffing HTTP requests around the globe is almost impossible for Alexa. And even if, for the sake of argument, Alexa could capture that data, it would not have the capacity to compute on it. However, some "jitter" in the Alexa rankings seems to suggest that, in addition to the toolbar, Alexa may indeed be using other technical means.
In the summer of 2004, Kobayashi found that the rankings of Hong Kong sites rose abnormally; small sites such as "sun" even outranked many large mainland sites. Because of telecom gateways and the split between simplified and traditional Chinese, mainland and Hong Kong netizens generally do not visit each other's sites much, and the absolute number of Hong Kong Internet users with the Alexa toolbar installed is not large. From the analysis and calculation earlier in this article, the number of Alexa toolbars installed by mainland netizens is probably no lower than the total number of Internet users in Hong Kong. In that case, a general and substantial rise of Hong Kong sites in the Alexa rankings is unreasonable.
After two months at high rankings, Hong Kong sites began to fall gradually, though some still rank above their actual position. Kobayashi believes this closely resembles what happened to South Korean sites in 2003, and many South Korean sites still rank high today. The phenomenon is hard to explain by the toolbar's penetration rate, but it makes sense if Alexa has added another source of samples.
Kobayashi infers that Alexa may have set up a number of sampling machines in different parts of the world, cooperating openly or covertly with some telecom operators, or even sniffing the network near telecom gateways, to obtain "compensating" sample data. The purpose of such data would be to offset the sampling gaps in regions where the toolbar installation rate is low. For example, if Alexa believes Internet use is already widespread in South Korea but Korean users are not in the habit of installing the Alexa toolbar, it may install some sampling machines in Korea to make up for the toolbar's undersampling there. The same mechanism may explain why Hong Kong sites surged in the rankings right after a sampler was set up. After a period of observation, Alexa would gradually adjust the amount of sampling in each region to get what it considers the most reasonable results.
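How a "compensating" sampler could move rankings can be sketched with a simple weighted estimate. This follows Kobayashi's inference, not any documented Alexa mechanism, and the region names, sample sizes, and weights are invented for illustration: `region_b` plays the role of a lightly sampled region such as South Korea or Hong Kong. Giving that region's samples more weight (equivalently, collecting more samples there) raises the measured reach of sites visited mainly by its users.

```python
def weighted_reach(visitors, sample_sizes, weights):
    """Reach per million sampled users, with each region's samples weighted.

    visitors:     {region: sampled users in that region who visited the site}
    sample_sizes: {region: total sampled users in that region}
    weights:      {region: weight given to that region's samples}
    """
    weighted_visitors = sum(visitors.get(r, 0) * w for r, w in weights.items())
    weighted_sample = sum(sample_sizes[r] * w for r, w in weights.items())
    return weighted_visitors * 1_000_000 / weighted_sample


# Invented figures: a site visited only by users in the lightly sampled region.
visitors = {"region_a": 0, "region_b": 5_000}
sample_sizes = {"region_a": 900_000, "region_b": 10_000}

before = weighted_reach(visitors, sample_sizes, {"region_a": 1, "region_b": 1})
after = weighted_reach(visitors, sample_sizes, {"region_a": 1, "region_b": 10})
print(round(before), round(after))
```

In this toy case the site's reach jumps roughly ninefold without a single extra real visit, which is the kind of sudden, region-wide shift described for Hong Kong and South Korea.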
Kobayashi even believes that sampling machines were also behind this year's large-scale rise of mainland rankings, and that the general decline of mainland sites after National Day may be because Alexa, after a year of observation, readjusted the weight given to its samples from mainland China.
Of course, to keep its rankings authoritative and impartial, Alexa must not only improve its traffic-sampling process but also work hard to prevent cheating, and its anti-cheating techniques are its secret. So what methods does Alexa actually use to prevent cheating? In fact, against the cheating method mentioned earlier, in which a program simulates the packets returned by the Alexa toolbar, Alexa has almost no good defense; even detecting this kind of cheating is difficult.
Against the cheating methods currently circulating online, however, Alexa is largely immune. Generally speaking, because most search engines cannot process JavaScript, cheating aimed at search engines is often done with malicious code written in JS, and Alexa's and Google's crawlers are currently the only programs of this kind that can recognize JS scripts.
While monitoring the traffic of several large websites, Kobayashi found that Alexa uses a robot program called ia_archiver. Similar to the spider programs used by Google and other search engines, it crawls the Internet and gathers traffic information about each web page. In particular, when a website's traffic exceeds a threshold set by Alexa, ia_archiver immediately crawls the site's server to check whether its traffic is normal and whether there is any cheating. According to Kobayashi's monitoring of ia_archiver, the robot can already identify most traffic cheating carried out on the web server side. However, awareness of ia_archiver in the industry is generally poor: the reporter found only one help page about the ia_archiver robot on the Alexa website, few people in China know about it, and relevant technical research is even scarcer.
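Kobayashi's method for monitoring ia_archiver was not described, but a webmaster can look for its visits in an ordinary server log. The sketch below assumes the common Apache/Nginx "combined" log format and counts requests per client IP whose User-Agent contains `ia_archiver`. User-Agent strings can be forged, so this only finds requests that declare themselves as ia_archiver; the sample log lines are made up.

```python
import re
from collections import Counter

# "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) \S+ "[^"]*" "([^"]*)"$'
)


def ia_archiver_hits(lines):
    """Count requests per client IP whose User-Agent mentions ia_archiver."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and "ia_archiver" in m.group(5).lower():
            hits[m.group(1)] += 1
    return hits


sample = [
    '1.2.3.4 - - [03/Nov/2004:10:15:00 +0800] "GET / HTTP/1.0" 200 5120 "-" "ia_archiver"',
    '1.2.3.4 - - [03/Nov/2004:10:15:05 +0800] "GET /news.html HTTP/1.0" 200 2048 "-" '
    '"ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"',
    '5.6.7.8 - - [03/Nov/2004:10:16:00 +0800] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]
print(ia_archiver_hits(sample))
```

Comparing the timestamps of these hits with the site's own traffic peaks is one way to test whether crawls follow traffic surges, as Kobayashi describes.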
Trust Alexa?
People hold different views on the credibility of Alexa rankings. Those who work in web technology research and applications tend to say that Alexa rankings are not necessarily absolutely accurate, but they are relatively credible. Below, the reporter gives several examples that examine the credibility of Alexa rankings from different angles.
Admittedly, although Alexa uses many technical means to improve the validity of its traffic monitoring data, some inherent technical flaws inevitably cause problems in the rankings of a small number of web sites, although these problems can largely be blamed on the sites' own poorly structured domain names.
For example, Alexa ranks by URL and ignores IP addresses, which greatly simplifies the ranking calculation but inevitably introduces some bias. In general, Alexa cares only about second-level domain names, and traffic to a third-level domain name is counted toward the domain one level above it. For large web sites this is the right strategy, but for a site whose third-level domains serve different groups of users, should the traffic of all its sub-sites be added together? The answer clearly depends on the case. A very clear example of this kind of unreasonable statistic is cninfo.net. Veteran Chinese netizens will still remember this domain suffix: it was the shared second-level domain of the national Public Information Network, under which sh.cninfo.net and gd.cninfo.net, for example, were the Shanghai and Guangdong sites respectively. Alexa merged them all into www.cninfo.net. As a result, www.cninfo.net became the 40th-ranked Simplified Chinese website, even though it is not a site that can actually be visited.
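The aggregation problem can be sketched with a naive folding rule that credits every host to its last two labels. This rule is an assumption for illustration, not Alexa's published logic, and the hit counts are invented. It also shows the rule's weakness with suffixes such as `.com.cn`, for which a real system would need a public-suffix list.

```python
from collections import Counter


def fold_host(host):
    """Credit a host name to its last two labels (hypothetical folding rule)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def fold_traffic(hits_by_host):
    """Sum hits per folded name, as a domain-level ranking would see them."""
    totals = Counter()
    for host, hits in hits_by_host.items():
        totals[fold_host(host)] += hits
    return totals


# Invented hit counts for separate regional sites under one shared domain.
hits = {"sh.cninfo.net": 300, "gd.cninfo.net": 200, "www.google.com": 50}
print(dict(fold_traffic(hits)))
```

The Shanghai and Guangdong sites are merged into a single 500-hit entry that no user ever visits as such, which is how an unreachable name can climb the rankings.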
Another challenge to the credibility of Alexa rankings is the huge difference in traffic patterns between different types of websites. For example, a portal's traffic can hardly be compared with that of a specialized website or forum, because different kinds of users visit different kinds of sites at different times and in different settings, and their browsing habits differ greatly.
Even if two sites receive identical overall traffic scores under Alexa's algorithm, their influence cannot be considered equal. For example, a site with a reach of 1,000 and a PV of 1 scores the same as a site with a reach of 100 and a PV of 10, but in most cases the latter's content will be more valuable. Visitors to the former view only one page before leaving, which suggests the site is not very attractive, while visitors to the latter view an average of 10 pages before leaving, which suggests they value its content. Of course, more extreme cases can arise: the former may have only one page but with very rich content, while each page of the latter may have very little content, or the latter may force users through many unnecessary redirects, or it may even be a site serializing novels.
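Under the geometric-mean rule Alexa describes for rank, the two sites in this example receive exactly the same score, because their products of reach and PV are equal:

```latex
\text{score} = \sqrt{\text{reach} \times \text{PV}}, \qquad
\sqrt{1000 \times 1} = \sqrt{100 \times 10} = \sqrt{1000} \approx 31.6
```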
These complex situations make it hard to believe that Alexa can measure every site on the Internet with just two values, reach and PV. However, if we look at Alexa from a different angle, we find that it has good reasons to exist, and it may indeed be trustworthy for netizens.
If we regard Alexa simply as a traffic analysis tool, its value stands out. The Alexa website offers a very intuitive traffic trend chart, a service that elsewhere usually has to be paid for, while Alexa provides it free as a public service.
Fig. 1 shows the global traffic rank of ccw.com.cn over one year. The chart clearly shows that from November 3, 2003 to November 3, 2004, the site's rank rose from about 7,500 to about 2,000, and that the three very sudden drops in the curve fall on the Spring Festival, May Day (May 1), and National Day (October 1), the three long holidays. As is well known, the site's users are concentrated in the IT industry, and the drops caused by the three holidays faithfully record how those netizens were browsing at the time: during the holidays most users traveled or rested at home, so far fewer of them went online, traffic fell, and the site's rank fell with it. After each long holiday, IT people return to their offices, and on the first day back they always go online for the latest industry news. So after each holiday, ccw.com.cn's traffic rose to varying degrees above its pre-holiday level, which shows up in the rank curve as a strong recovery and continued climb after each of the three big troughs.
In fact, by analyzing a site's traffic trend in detail, one can also identify sites that cheat, because the traffic of a cheating site usually changes abnormally and its curve is bound to differ from that of a normal site. Figures 2 and 3 show the six-month traffic rank trends of 265.com and dsdiy.com respectively. 265.com is a well-known web navigation site whose global rank has been stable between 70th and 120th over the last six months. The rank curve of dsdiy.com is suspicious: in the first week of September, the site jumped from outside the world's top 100,000 to around 200th; for the following month, its rank wandered between 100th and 400th; on the first day of the National Day holiday, it suddenly jumped to around 30th; and by the second week of October it had quickly dropped back outside the top 100,000 and no longer appeared on the rank chart. Sites with similar rank curves include sinapet.com and haohz.com.
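A crude version of this visual check can be automated: flag any day on which the rank changes by more than a chosen factor from the previous day. The threshold of 10 and both sample series below are assumptions; the "spiky" series is only shaped like the dsdiy.com description, with 150,000 standing in for "outside the top 100,000." A ratio is used rather than a difference because moving from 120th to 70th is ordinary, while moving from 100,000th to 200th is not.

```python
def suspicious_jumps(ranks, max_ratio=10.0):
    """Return indices i where the rank changed by more than max_ratio from day i-1.

    ranks: daily traffic ranks (1 = best), all positive.
    """
    flagged = []
    for i in range(1, len(ranks)):
        prev, cur = ranks[i - 1], ranks[i]
        ratio = max(prev, cur) / min(prev, cur)
        if ratio > max_ratio:
            flagged.append(i)
    return flagged


stable = [70, 95, 120, 88, 110]               # shaped like 265.com
spiky = [150_000, 200, 380, 110, 30, 150_000]  # shaped like dsdiy.com
print(suspicious_jumps(stable), suspicious_jumps(spiky))
```

A real check would need to account for seasonal dips such as the holiday troughs of ccw.com.cn, which is why a loose threshold is used here.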
As a web access monitoring tool, Alexa records the real traffic of sites and, at the same time, the traffic changes of cheating sites. In this sense, Alexa is clearly credible; the key is that we choose a perspective from which Alexa can play its proper role.
Whatever their purpose, people who "betray" Alexa always hope to benefit from it. The deeper one studies Alexa, the better one understands its value, and the more one can benefit from it.