The Principle and Implementation of a Pseudo-Original System

Source: Internet
Author: User


Last week I wrote an article about keeping your site's news data in sync with Sina's. Some readers showed interest, so I decided to share the pseudo-original system mentioned in it and explain how it works. The system is also introduced on my Sisyphus Studio site.

A search engine is still a machine. By changing the title, replacing some words, shuffling paragraphs, inserting links, and so on, you can pass collected content off as original. Similar pseudo-original tools already exist online, but they still require manual operation, so I wanted to build a fully automated, unattended pseudo-original system. Combined with an automatic collection program, it covers the whole process of collection -> storage -> pseudo-original rewriting, runs unattended, and works in real time.
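The collection -> storage -> pseudo-original flow can be sketched as a single pass over newly collected articles. This is only an illustration under assumptions: `run_once`, the `collect` and `rewrite` callables, and the plain list used as storage are hypothetical names, not part of the system described here.

```python
def run_once(storage, collect, rewrite):
    """Run one unattended pass: collect, store, then rewrite what is new.

    storage is a plain list standing in for a database table (assumption);
    collect() returns (title, body) pairs; rewrite(body) returns new text.
    """
    # Steps 1 and 2: collect articles and store the raw copies.
    for title, body in collect():
        storage.append({"title": title, "body": body, "rewritten": False})

    # Step 3: rewrite every stored article that has not been processed yet.
    for record in storage:
        if not record["rewritten"]:
            record["body"] = rewrite(record["body"])
            record["rewritten"] = True
    return storage


# Hypothetical stand-ins for the collector and the rewriting step.
def sample_collect():
    return [("Sample title", "sample body")]


def sample_rewrite(body):
    return body.replace("sample", "example")


articles = run_once([], sample_collect, sample_rewrite)
```

Running such a pass on a schedule would be one way to get the unattended, near-real-time behavior; the scheduling itself is left out of the sketch.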

To change the wording without affecting the meaning of an article, the best approach is synonym substitution, so the first step is to build a thesaurus. I searched the Internet for a ready-made database without success, so I decided to collect the data from a suitable site. PowerWord met my requirements well, and by collecting from it I built a thesaurus with tens of thousands of entries.
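As a rough sketch of how such a thesaurus might be stored, the example below uses the two table names that appear in the code later in this article, `tool_words` and `tool_wordslike`. The exact column layout, the sample words, the helper names `build_thesaurus` and `lookup_synonyms`, and the use of an in-memory SQLite database are all assumptions made for illustration.

```python
import sqlite3


def build_thesaurus(groups):
    """Create an in-memory thesaurus from {headword: [synonyms]}."""
    conn = sqlite3.connect(":memory:")
    # One row per headword.
    conn.execute(
        "CREATE TABLE tool_words (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    # One row per synonym, pointing back to its headword through wid.
    conn.execute("CREATE TABLE tool_wordslike (wid INTEGER, name TEXT)")
    for headword, synonyms in groups.items():
        cur = conn.execute(
            "INSERT INTO tool_words (name) VALUES (?)", (headword,)
        )
        wid = cur.lastrowid
        conn.executemany(
            "INSERT INTO tool_wordslike (wid, name) VALUES (?, ?)",
            [(wid, synonym) for synonym in synonyms],
        )
    conn.commit()
    return conn


def lookup_synonyms(conn, word):
    """Return all stored synonyms of word, sorted alphabetically."""
    rows = conn.execute(
        "SELECT l.name FROM tool_wordslike AS l "
        "JOIN tool_words AS w ON w.id = l.wid "
        "WHERE w.name = ? ORDER BY l.name",
        (word,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data.
conn = build_thesaurus({"quick": ["fast", "rapid"], "big": ["large"]})
```

Splitting headwords and synonyms into two tables lets one headword map to any number of synonyms, from which one can later be picked at random.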

Next comes keyword replacement: how to replace, and what? My idea is to segment the article into words first, then take each word of two or more characters and look it up in the thesaurus; if a match exists, replace it. I implemented this in Python. To speed up synonym lookup, you can also use a key-value store. Some of the key code is as follows:

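    # cxn is a database cursor (the queries use MySQL syntax) and seg is a
    # word segmenter; both are created elsewhere in the program.

    def getnewword(text, replacements):
        """Look up text in the thesaurus and record one random synonym for it."""
        # %s placeholders assume a MySQL driver such as MySQLdb; passing the
        # value separately keeps quotes in the word from breaking the query.
        cxn.execute("select id from tool_words where name=%s limit 1", (text,))
        result = cxn.fetchone()
        if result is not None:
            cxn.execute(
                "select name from tool_wordslike where wid=%s order by rand() limit 1",
                (result[0],),
            )
            result4 = cxn.fetchone()
            if result4 is not None:
                replacements[text] = result4[0]

    def cuttest(text, flag):
        """Segment text; return "word,synonym;" pairs if flag == 1, else the words."""
        replacements = {}
        wlist = seg.cut(text)
        wlist.reverse()
        result = ""
        for tmp in wlist:
            # Only words of two or more characters are considered.
            if len(tmp) > 1:
                if flag == 1:
                    getnewword(tmp, replacements)
                else:
                    result += tmp + ";"
        if flag == 1:
            for k in replacements:
                result += k + "," + replacements[k] + ";"
        return result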
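The key-value idea mentioned above can be illustrated with an in-memory dictionary in place of the two database queries per word. This is a minimal sketch, not the author's implementation: `SYNONYMS` and `replace_synonyms` are hypothetical names, and a simple regular expression on English text stands in for a real Chinese word segmenter.

```python
import re

# Hypothetical thesaurus loaded into memory as word -> synonym.
SYNONYMS = {"quick": "fast", "large": "big"}


def replace_synonyms(text, synonyms):
    """Replace each word of two or more characters that has a synonym."""
    # Split into alternating runs of word and non-word characters so that
    # spacing and punctuation are preserved when the pieces are rejoined.
    tokens = re.findall(r"\w+|\W+", text)
    out = []
    for token in tokens:
        if len(token) > 1 and token in synonyms:
            out.append(synonyms[token])
        else:
            out.append(token)
    return "".join(out)


print(replace_synonyms("a quick test of a large site", SYNONYMS))
```

A dictionary lookup avoids a database round trip for every word, which is where most of the time goes when an article has hundreds of words.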

However, a pseudo-original system is still just a program, and it cannot fully guarantee that the meaning stays correct or that the sentences read smoothly. It is mainly for people who run spam sites, haha. I remember one rather funny result on my own site, http://www.xxfsw.com/Show24047.html: in a story about the death of Ginzburg, the Nobel laureate in physics, the word for his passing was replaced with a blunt synonym for "dead", and I was left speechless.

Besides synonym substitution, there are also paragraph reversal, link insertion, and so on. These are easier to implement, so I won't go on about them; choose among them according to your own situation. I have also thought of some ways to show pseudo-original content to search engines while showing the original content to users, which achieves the goal without hurting the user experience. I just don't know how risky that is, or whether Baidu would catch it through manual review.
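The paragraph reversal and link insertion mentioned above could look roughly like this. It is a sketch under assumptions: paragraphs are taken to be separated by blank lines, links are plain HTML anchors, and the function names and the example URL are hypothetical.

```python
def reverse_paragraphs(text):
    """Return text with its blank-line-separated paragraphs in reverse order."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return "\n\n".join(reversed(paragraphs))


def insert_link(text, keyword, url):
    """Wrap the first occurrence of keyword in an HTML anchor."""
    anchor = '<a href="{0}">{1}</a>'.format(url, keyword)
    # The count argument of 1 limits the replacement to the first match.
    return text.replace(keyword, anchor, 1)


article = "first paragraph\n\nsecond paragraph about our studio"
rewritten = insert_link(
    reverse_paragraphs(article), "studio", "http://example.com/"
)
```

Reversal is deterministic, so the same input always gives the same output; a random shuffle would vary the result between runs at the cost of repeatability.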

So after all this effort, Baidu Spider arrives at your site and gets a big surprise: "Wow, I have never seen this article before!"

This article was first published by Sisyphus Studio (Beijing website construction, http://www.beijingjianzhan.com/). Please credit the source when reposting. Thank you.
