Last week I wrote an article about keeping your website's news data synchronized with Sina. Some readers were interested, so I decided to share the pseudo-original system mentioned in it and explain how it works. This system is also introduced on my siphus studio site.
After all, a search engine is still a machine. By modifying the title, replacing some words, shuffling some paragraphs, inserting some links, and so on, you can make an article look original to it. There are already pseudo-original tools on the Internet, but they all need someone to operate them by hand. So I wanted to build a fully automatic, unattended pseudo-original system. Combined with an automatic collection program, it handles collection, storage in the database, and pseudo-original rewriting, and the whole pipeline runs unattended and in real time.
To change words without affecting the meaning of the document, the better method is to replace them with synonyms, so the first step is to build a synonym library. After looking online for a ready-made database, I decided to collect the data from related websites instead, and found that Kingsoft PowerWord met my requirements well. By collecting from it, I built a dictionary with tens of thousands of entries.
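As a rough illustration of what such a synonym library might look like, here is a minimal sketch using an in-memory SQLite database. The table names tool_words and tool_wordslike come from the code later in this article; the exact columns, the use of SQLite instead of the real database, and the helper names build_library and synonyms_of are assumptions made only for this example.

```python
import sqlite3

def build_library(pairs):
    """Create an in-memory synonym library from (word, synonym) pairs."""
    db = sqlite3.connect(":memory:")
    # Schema inferred from the queries in the article; the real one may differ.
    db.execute("create table tool_words (id integer primary key, name text unique)")
    db.execute("create table tool_wordslike (wid integer, name text)")
    for word, synonym in pairs:
        db.execute("insert or ignore into tool_words (name) values (?)", (word,))
        wid = db.execute(
            "select id from tool_words where name = ?", (word,)
        ).fetchone()[0]
        db.execute(
            "insert into tool_wordslike (wid, name) values (?, ?)", (wid, synonym)
        )
    return db

def synonyms_of(db, word):
    """Return all stored synonyms for a word, sorted, or an empty list."""
    rows = db.execute(
        "select l.name from tool_wordslike l "
        "join tool_words w on w.id = l.wid "
        "where w.name = ? order by l.name",
        (word,),
    ).fetchall()
    return [row[0] for row in rows]
```

Each headword is stored once in tool_words, and every synonym collected for it becomes one row in tool_wordslike pointing back at it.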
Next comes the replacement itself. How do we decide what to replace, and with what? My idea is to first split the article into words, then take every word longer than one character and look it up in the synonym library. If there is a match, replace it. I implemented this in Python. The synonyms can also be kept in a key-value store to speed up lookups. Some of the key code is as follows:
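# cxn is assumed to be an open MySQL cursor (a DB-API driver with %s
# placeholders, such as MySQLdb) and seg a word segmenter; neither is
# defined in this excerpt.

def getnewword(text, replacements):
    """Look up text in the synonym library and record one random synonym."""
    cxn.execute("select id from tool_words where name = %s limit 1", (text,))
    result = cxn.fetchone()
    if result is not None:
        cxn.execute(
            "select name from tool_wordslike where wid = %s order by rand() limit 1",
            (result[0],),
        )
        result4 = cxn.fetchone()
        if result4 is not None:
            replacements[text] = result4[0]

def cuttest(text, flag):
    """Segment text; flag == 1 returns "word,synonym;" pairs, otherwise the words."""
    replacements = {}
    wlist = list(seg.cut(text))
    wlist.reverse()
    result = ""
    for tmp in wlist:
        if len(tmp) > 1:
            if flag == 1:
                getnewword(tmp, replacements)
            else:
                # The original indentation was lost; this placement is a best guess.
                result += tmp + ";"
    if flag == 1:
        for k in replacements:
            result += k + "," + replacements[k] + ";"
    return result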
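The key-value idea mentioned above can be sketched with a plain in-memory dict, which avoids two database queries per word. This is only an illustration: the sample data, the function name replace_synonyms, and the assumption that the article has already been segmented into a list of words are all mine, not part of the original system.

```python
import random

# Hypothetical sample data; a real library would be loaded from the database.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "big": ["large"],
}

def replace_synonyms(words, synonyms, rng=random):
    """Return a new word list with each known word swapped for a random synonym."""
    result = []
    for word in words:
        candidates = synonyms.get(word)
        # Same rule as the article: only words longer than one character
        # are looked up; everything else passes through unchanged.
        if len(word) > 1 and candidates:
            result.append(rng.choice(candidates))
        else:
            result.append(word)
    return result
```

Calling replace_synonyms(["a", "big", "dog"], SYNONYMS) leaves "a" and "dog" alone and swaps "big" for its only synonym, "large".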
However, a pseudo-original system is still just a program. It cannot guarantee that the meaning is preserved or that the sentences read fluently, so it is mainly useful to people who run junk sites. Haha, I remember one quite funny result on my own site after conversion, http://www.xxfsw.com/show24047.html: in the headline about the death of the Russian academician and Nobel physics laureate Ginzburg, the word for "death" was turned into something like "silence". I was speechless... Of course, besides synonym replacement there are also techniques such as reversing paragraphs and inserting links. These are easier to implement, so I will not go into detail, and you can choose among them based on your own situation. Later I also thought about ways to show the pseudo-original content to search engines while showing users the content as it was before rewriting, so that the user experience is not affected. I just don't know how risky that is. Would Baidu catch it through manual review...
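The paragraph shuffling and link insertion mentioned above could be sketched roughly like this. It is not the code from my system: the function names, the choice to keep the first paragraph in place, the blank-line paragraph separator, and the keyword-to-URL mapping are all assumptions made for the example.

```python
import random

def shuffle_paragraphs(text, rng=random):
    """Split on blank lines, keep the first paragraph in place, shuffle the rest."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    if len(paragraphs) < 3:
        # Nothing meaningful to shuffle with fewer than two trailing paragraphs.
        return text
    head, rest = paragraphs[0], paragraphs[1:]
    rng.shuffle(rest)
    return "\n\n".join([head] + rest)

def insert_links(text, links):
    """Wrap the first occurrence of each keyword in an HTML anchor.

    Naive on purpose: a later keyword that also appears inside an
    already inserted URL or anchor would be matched there as well.
    """
    for keyword, url in links.items():
        anchor = '<a href="%s">%s</a>' % (url, keyword)
        text = text.replace(keyword, anchor, 1)
    return text
```

Passing a seeded random.Random instance as rng makes the shuffle repeatable, which helps when you want the same article to come out the same way on every run.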
The result: the Baidu spider comes to your site and is shocked: "Oh, I have never seen this article before! Indexed." If anything is unclear, add me on QQ (3700004340) to discuss it.