Search engines and search engine optimization (SEO) have always been at odds. Reasonable optimization helps search engines identify a site's content and makes the site easier to promote. But every coin has two sides: some SEO practitioners use a variety of deceptive techniques to fool search engines, aiming to increase the number of indexed pages and raise page rankings.
One early form of spam was keyword stuffing: software pulled words directly from a Chinese word list and pieced them together into an article. Such an article has no real meaning and exists only for search engines to read. So how does a search engine identify this kind of article?
We know that every search engine has a web quality monitoring department. For search engines that rely on manual processing, such as Baidu, users who find such a site complain to Baidu, and Baidu blocks the site directly. But for a search engine like Google, where even blocking a site is automated, recognizing keyword stuffing automatically matters much more.
To identify keyword stuffing, search engines usually rely on statistical analysis.
The search engine first segments the page into words. Once segmentation is complete, it has the number of words N and the length of the article L. Statistics over a large number of articles show that L and N follow a certain distribution: in general L/N falls between 4 and 8, with a mean of about 5 to 6. In other words, an article 1000 bytes long should contain 125 to 250 words. Because Chinese and English words are composed differently, the range of this ratio differs between the two languages. If the search engine finds that L/N is particularly large, the article probably contains keyword stuffing; if L/N is particularly small, the article may be a meaningless one pieced together from isolated words.
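The L/N check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, whitespace splitting stands in for a real word segmenter (Chinese text would need a proper one), L is measured in UTF-8 bytes, and the default bounds of 4 and 8 are simply the figures quoted above, not values taken from any actual search engine.

```python
def length_to_word_ratio(text, tokenize=str.split):
    """Return L/N: article length in bytes divided by the word count.

    `tokenize` is a placeholder for a real word segmenter; whitespace
    splitting is only adequate for space-delimited languages.
    """
    words = tokenize(text)
    if not words:
        return None  # no words, so the ratio is undefined
    length = len(text.encode("utf-8"))  # L, measured in bytes
    return length / len(words)          # L / N


def classify_by_ratio(text, low=4.0, high=8.0, tokenize=str.split):
    """Label a text by where its L/N ratio falls relative to [low, high].

    The default bounds are the 4 to 8 range quoted in the article and
    would have to be re-estimated for each language and corpus.
    """
    ratio = length_to_word_ratio(text, tokenize)
    if ratio is None:
        return "empty"
    if ratio > high:
        return "high"    # unusually long words on average
    if ratio < low:
        return "low"     # unusually short words on average
    return "normal"
```

A page labeled "high" or "low" is only a candidate for closer inspection; the ratio by itself does not prove cheating.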
Further, statistics over a large number of normal articles show that the counts of the few highest-density keywords and N/L also follow a certain distribution. The search engine can compare a page's distribution against the statistical distribution to judge whether keyword stuffing is present.
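One simple way to picture the keyword-density comparison is to measure what share of all words is taken up by the few most frequent ones and compare it with a value estimated from normal articles. The sketch below is an assumption-laden simplification: the function names, the `reference_share` parameter, and the `tolerance` margin are all hypothetical, and a real system would compare full distributions rather than a single number.

```python
from collections import Counter


def top_k_share(words, k=3):
    """Fraction of all words accounted for by the k most frequent words."""
    if not words:
        return 0.0
    counts = Counter(words)
    top_total = sum(count for _, count in counts.most_common(k))
    return top_total / len(words)


def looks_stuffed(words, reference_share, tolerance=0.15, k=3):
    """Flag a page whose top-k share exceeds the reference by a margin.

    `reference_share` is assumed to come from statistics over normal
    articles; `tolerance` is an arbitrary illustrative margin.
    """
    return top_k_share(words, k) > reference_share + tolerance


# Usage: a page dominated by two repeated keywords
page = "cheap shoes cheap shoes buy cheap shoes now".split()
share = top_k_share(page, k=2)  # 6 of the 8 words are "cheap" or "shoes"
```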
In addition, the search engine also uses the proportion of stop words to judge whether an article is natural. Stop words are words such as "I" and "is" that appear very frequently in ordinary writing. If the proportion of stop words in the text falls outside the normal range, the page should be submitted to the web quality monitoring department for review.
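The stop-word check can be sketched as follows, flagging a page for review when its stop-word proportion falls outside an expected range. Everything specific here is an assumption for illustration: the tiny English stop-word set, the function names, and the 0.2 to 0.7 bounds are hypothetical and would need to be derived from a real corpus in the target language.

```python
# A deliberately tiny, illustrative stop-word list (not a real lexicon).
STOP_WORDS = frozenset(
    {"the", "a", "an", "is", "are", "i", "of", "to", "and", "in"}
)


def stop_word_ratio(words, stop_words=STOP_WORDS):
    """Fraction of words that are stop words (case-insensitive)."""
    if not words:
        return 0.0
    hits = sum(1 for word in words if word.lower() in stop_words)
    return hits / len(words)


def needs_review(words, low=0.2, high=0.7, stop_words=STOP_WORDS):
    """True when the stop-word ratio lies outside the range [low, high].

    The bounds are placeholders; natural text in a given language would
    have to be measured to find its normal range.
    """
    ratio = stop_word_ratio(words, stop_words)
    return ratio < low or ratio > high
```

With these placeholder bounds, a natural sentence such as "I think the cat is in the garden" passes, while a string of bare keywords with no stop words at all is flagged.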
Of course, there are many other algorithms that compare a web page against natural-language text to judge whether the article is natural.
As the saying goes, when virtue rises one foot, vice rises ten. Some cheaters have given up composing articles from individual words and instead compose them from sentences: they use crawlers or other means to collect sentences from articles online, then use software to stitch a few sentences from each of dozens of articles into a new one. Detecting this kind of cheating requires the search engine to perform semantic analysis, but semantic analysis is still at the research stage and is a direction for the next generation of intelligent search engines.
However, we still cannot dismiss automatically generated articles outright. Article generation based on artificial intelligence remains an important direction for humans studying their own language and their own intelligence. The contest between cheating and anti-cheating will push artificial intelligence research forward.
If software can eventually generate articles that humans can understand, are they spam or are they valuable content? Can you say for sure that today's RSS aggregation articles, whose level of intelligence is not yet very high, must be spam? And if such articles appear in huge numbers, how should we deal with this phenomenon?
Author: Clay Figurine
Source: http://www.nipei.com