Last week I wrote an article about keeping your site's news data in sync with Sina's. Some readers were interested, so I decided to share the pseudo-original system mentioned in it and explain how it works. The system is also introduced on my Sisyphus Studio site.
A search engine is still just a machine. By changing the title, replacing some words, shuffling some paragraphs, inserting some links and so on, you can make collected content pass as original (so-called pseudo-original content). Similar pseudo-original tools already exist online, but they still need manual operation to generate content. So I wanted to build a fully automated, unattended pseudo-original system. Combined with an automatic collection program, it gives a collect -> store -> pseudo-original pipeline that runs unattended and in real time.
To change the wording without affecting the meaning of an article, the best approach is synonym substitution. So the first step I thought of was to build a thesaurus. I searched online for a ready-made database without success, so I decided to collect the data from a suitable site. PowerWord turned out to meet my needs well, and by scraping it I built a thesaurus with tens of thousands of entries.
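As an illustration of how such a thesaurus might be stored, here is a minimal sketch. The table and column names (`tool_words.id`, `tool_words.name`, `tool_wordslike.wid`, `tool_wordslike.name`) are inferred from the queries in the code further down. Everything else is an assumption: SQLite stands in for whatever database the original system used, and `build_thesaurus` and `lookup_synonyms` are hypothetical helper names.

```python
import sqlite3

def build_thesaurus(conn, groups):
    """Store each head word in tool_words and its synonyms in tool_wordslike.

    `groups` maps a head word to a list of synonyms (hypothetical input format).
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS tool_words "
                "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    cur.execute("CREATE TABLE IF NOT EXISTS tool_wordslike "
                "(wid INTEGER, name TEXT)")
    for word, synonyms in groups.items():
        cur.execute("INSERT OR IGNORE INTO tool_words (name) VALUES (?)", (word,))
        cur.execute("SELECT id FROM tool_words WHERE name = ?", (word,))
        wid = cur.fetchone()[0]
        cur.executemany("INSERT INTO tool_wordslike (wid, name) VALUES (?, ?)",
                        [(wid, s) for s in synonyms])
    conn.commit()

def lookup_synonyms(conn, word):
    """Return all stored synonyms of `word` in insertion order, or [] if unknown."""
    cur = conn.cursor()
    cur.execute("SELECT l.name FROM tool_wordslike l "
                "JOIN tool_words w ON l.wid = w.id "
                "WHERE w.name = ? ORDER BY l.rowid", (word,))
    return [row[0] for row in cur.fetchall()]
```

For example, after `build_thesaurus(conn, {"fast": ["quick", "rapid"]})` on an in-memory connection, `lookup_synonyms(conn, "fast")` returns both synonyms, and an unknown word returns an empty list.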
Next comes keyword replacement: what to replace, and how? My idea is to segment the article into words first, take the words that are at least two characters long, and look them up in the thesaurus. If a match exists, the word is replaced. I implemented this in Python. To speed up synonym lookups, you can also use key-value storage. Some of the key code is as follows:
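    # Assumptions: cxn is an open MySQL-style database cursor and seg is a
    # Chinese word segmenter whose cut() returns a list; both are created elsewhere.

    def getnewword(text, synonyms):
        # Look up the word's id in the thesaurus (parameterized to avoid SQL injection).
        cxn.execute("select id from tool_words where name=%s limit 1", (text,))
        result = cxn.fetchone()
        if result is not None:
            # Pick one of the word's synonyms at random.
            cxn.execute("select name from tool_wordslike where wid=%s order by rand() limit 1", (result[0],))
            result4 = cxn.fetchone()
            if result4 is not None:
                synonyms[text] = result4[0]

    def cuttest(text, flag):
        synonyms = {}
        wlist = seg.cut(text)
        wlist.reverse()
        result = ""
        for tmp in wlist:
            # Only consider words of two or more characters.
            if len(tmp) > 1:
                if flag == 1:
                    getnewword(tmp, synonyms)
                else:
                    # flag != 1: just return the segmented words.
                    result += tmp + ";"
        if flag == 1:
            # Serialize the replacements as "word,synonym;" pairs.
            for k in synonyms:
                result += k + "," + synonyms[k] + ";"
        return result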
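The code above queries the database once per word. The key-value idea mentioned earlier could be sketched as follows, with a plain Python dict standing in for a key-value store. This is only an illustration under stated assumptions: `replace_with_synonyms` is a hypothetical name, the input is assumed to be already segmented into words, and the thesaurus is assumed to map each word to a list of synonyms.

```python
import random

def replace_with_synonyms(words, thesaurus, rng=None):
    """Replace each word of two or more characters that has synonyms.

    words     -- list of already-segmented words
    thesaurus -- dict mapping a word to a list of synonyms (the key-value store)
    rng       -- optional random.Random instance, useful for reproducible output
    """
    rng = rng or random.Random()
    out = []
    for w in words:
        candidates = thesaurus.get(w)
        if len(w) > 1 and candidates:
            # Pick one synonym at random, like "order by rand() limit 1".
            out.append(rng.choice(candidates))
        else:
            # Single-character or unknown words are kept unchanged.
            out.append(w)
    return out
```

A dict lookup avoids a database round trip for every word, which matters when each article contains hundreds of words. The same interface would also work with an external key-value store loaded from the thesaurus tables.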
However, a pseudo-original system is still just a program, so it certainly cannot guarantee that the meaning stays correct or that the sentences read smoothly. It is aimed mainly at people who run spam sites, haha. I remember one rather funny result on my own site, http://www.xxfsw.com/Show24047.html: in the news that Nobel physics laureate Ginzburg had died, the word for his death was replaced with a blunt "dead". I was speechless.
Of course, besides synonym substitution there are also paragraph reordering, link insertion and so on. These are easier to implement, so I won't go into detail; choose among them according to your own situation. I have also thought of ways to serve the pseudo-original content only to search engines while users still see the original content, which achieves the goal without hurting the user experience. I just don't know how risky that is, or whether Baidu would catch it through manual review.
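The paragraph reordering and link insertion mentioned above are not shown in the original code. A minimal sketch might look like this, assuming paragraphs are separated by newlines and that a link is added by wrapping the first occurrence of a keyword in an HTML anchor. The function names and the example URL are hypothetical.

```python
def reverse_paragraphs(text):
    """Return the text with its non-empty paragraphs in reverse order."""
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return "\n".join(reversed(paragraphs))

def insert_link(text, keyword, url):
    """Wrap the first occurrence of `keyword` in an HTML link to `url`.

    The text is returned unchanged if the keyword does not occur.
    """
    anchor = '<a href="%s">%s</a>' % (url, keyword)
    return text.replace(keyword, anchor, 1)
```

For example, `insert_link("visit the studio today", "studio", "http://example.com/")` links only the first "studio" and leaves the rest of the sentence untouched.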
After all this tinkering, when Baidu Spider visits your site it gets a big surprise: wow, I have never seen this article before!
This article was first published by Sisyphus Studio (Beijing website construction, http://www.beijingjianzhan.com/). Please credit the source when reposting. Thank you.