Worked example: a detailed tutorial on writing collection rules for Jieqi Novels

Source: Internet
Author: User

Adding Collection Rules
Rule description
System default variables: <{ArticleID}> is the article ID (sequence number), <{chapterid}> is the chapter ID, <{subarticleid}> is the article sub-ID, and <{subchapterid}> is the chapter sub-ID.
The system tag * matches any string.
The system tag ! matches any string that does not contain < or >.
The system tag ~ matches any string except <>.
The system tag ^ matches any string that contains no digits and no <>.
The system tag $ matches a string of digits.
In a collection rule, the content to be captured is marked with four or more system tags in a row, such as !!!!
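To make the tag descriptions concrete, here is a minimal Python sketch that translates a rule into a regular expression. It is not Jieqi's actual implementation: the regex chosen for each tag, the function name rule_to_regex, and the decision to let * span line breaks are all assumptions based only on the descriptions above. Any single tag character that occurs literally in the page code (for example the ! in an HTML comment) would be misread by this sketch.

```python
import re

# Assumed regex equivalents of the system tags, based only on the descriptions above.
TAG_PATTERNS = {
    "*": r".*?",       # any string
    "!": r"[^<>]*",    # any string without < or >
    "~": r"[^<>]*",    # described almost identically to "!" in this tutorial
    "^": r"[^<>\d]*",  # no digits and no < or >
    "$": r"\d+",       # a string of digits
}

def rule_to_regex(rule):
    """Translate a collection rule into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch in TAG_PATTERNS:
            # Measure the run of identical tag characters.
            j = i
            while j < len(rule) and rule[j] == ch:
                j += 1
            pattern = TAG_PATTERNS[ch]
            # Four or more identical tags in a row mark the content to capture;
            # a shorter run is treated as a single wildcard.
            parts.append("(" + pattern + ")" if j - i >= 4 else pattern)
            i = j
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.DOTALL)

html = '<li class="L6"><a href="/author/WB/144238.html">Li xingyu</a></li>'
rule = '<li class="L6"><a href="/author/WB/$.html">!!!!</a></li>'
match = rule_to_regex(rule).search(html)
author = match.group(1) if match else None
print(author)
```

Here $ skips over the number in the link, and the run of four ! tags is the only part that is captured.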

Basic settings

Identifier: the identifier added in configs\article\collectsite.php. It can be anything you like; usually it is an abbreviation of the target site's domain name, enough to tell this rule apart from the others. Example: feiku

Site name: the name of the website to be collected. Example: Feiku

Site URL: the address of the website to be collected. Example: http://www.feiku.com

Article sub-ID calculation method: optional. I leave it blank here.
Supports arithmetic on the <{ArticleID}> tag (+ plus, - minus, * multiply, / divide, % remainder).

Chapter sub-ID calculation method: optional. I leave it blank here. (Who knows how many books the site puts in one folder? It does not follow a fixed pattern, so I cannot collect it this way.)
Supports arithmetic on the <{ArticleID}> tag (+ plus, - minus, * multiply, / divide, % remainder).
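For sites that do follow a fixed pattern, a sub-ID formula is just arithmetic on the article ID. The Python sketch below evaluates such a formula. The two expressions are hypothetical examples, the helper names are made up for illustration, and treating / as integer division is an assumption; the tutorial does not say how Jieqi rounds.

```python
import ast
import operator

# Operators the setting is said to support.
# Mapping "/" to integer division is an assumption, not documented behaviour.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,
    ast.Mod: operator.mod,
}

def evaluate(node):
    """Evaluate a parsed expression made only of integers and the operators above."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("unsupported expression")

def sub_id(expression, article_id):
    """Substitute the article ID into the expression and compute the result."""
    text = expression.replace("<{ArticleID}>", str(article_id))
    return evaluate(ast.parse(text, mode="eval").body)

# Hypothetical formulas: "one folder per 1000 books" and "position inside the folder".
print(sub_id("<{ArticleID}>/1000", 144238))
print(sub_id("<{ArticleID}>%1000", 144238))
```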

Proxy server address: leave empty.

Proxy server port

When existing chapters cannot be matched, clear everything and re-collect: choose according to your needs.

Mark collected articles as completed (full-text) by default: choose according to your needs. If you select "yes", every article is shown as completed on your site, whether it is actually still being serialized or is finished. I recommend selecting "no".

Send the HTTP_REFERER flag: used to get past anti-collection settings. I keep the default selection, as I am not sure what it does; I will experiment with it later.

Encoding of the collected pages (auto-detect, gb2312, utf8, big5): the default is "auto-detect". If the encoding differs from this site's, the content is converted automatically.
Article information page collection rules

Article information page address: the URL of the book's information page, with the book ID replaced by <{ArticleID}>. Example:
http://feiku.com/Book/<{ArticleID}>/index.html

Article title collection rule: this requires viewing the page source. If you do not know how to do that, you can stop here. View the source of the information page and find where the article title appears in it (we use Feiku as the example). Taking "My Missy" as the example, the code found is <div id="crbooktitle"><span class="booktitle">my Missy</span></div>. Copy this code into the article title collection rule box, then replace the actual title, my Missy, with !!!!. You can also use another replacement tag, such as ****, but the narrower the match, the better. This is partly a matter of habit; the aim is to collect only the article title and nothing you do not want.

Author collection rule: the code is <li class="L6"><a href="/author/WB/144238.html">Li xingyu</a></li>. The content to collect, Li xingyu, is replaced with !!!!. The number 144238 applies only to this article; other articles have other numbers, so it is replaced with $, which matches any string of digits. The author collection rule is therefore
<li class="L6"><a href="/author/WB/$.html">!!!!</a></li>

Article type collection rule: the code is <li class="L2"><a href="/book/ln/133.html">city</a></li>, so the rule here is <li class="L2"><a href="/book/ln/$.html">!!!!</a></li>

Article type mapping: write this yourself.
fantasy => 1 | Fantasy => 1 | martial arts => 2 | Xianxia => 2 | romance => 3 | city => 3 | science fiction => 7 | flexibility => 8 | game => 6 | competition => 6 | history => 4 | military => 4 | meiwen => 10 | same person => 9 | biography => 10 | famous books => 10 | note => 10 | joke => 10 | foreign countries => 10 | classical => 10 | children => 10 | detective => 5 | administrator => 10 | fashion => 10 | English => 10 | computer => 10 | study => 10 | law => 10 | Other => 10
"=>" separates a type name on the target site from the type ID on this site, and "|" separates one mapping from the next. "default" denotes the default mapping.
The types and their IDs on this site are as follows:
Fantasy magic => 1 | martial arts truth => 2 | urban romance => 3 | historical military => 4 | detective reasoning => 5 | online games animation => 6 | sci-fi novels => 7 | terrorism and flexibility => 8 | prose and poetry => 9 | Other types => 10
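The mapping format can be read as a small lookup table. The Python sketch below parses a shortened mapping string and looks up a collected type name; the function names are made up for illustration, and falling back to the "default" entry for unknown names is how this sketch interprets the description, not confirmed Jieqi behaviour.

```python
def parse_type_mapping(text):
    """Split 'name => id | name => id' into a dictionary."""
    mapping = {}
    for pair in text.split("|"):
        if "=>" not in pair:
            continue  # ignore empty or malformed entries
        name, type_id = pair.split("=>", 1)
        mapping[name.strip()] = int(type_id.strip())
    return mapping

def lookup_type(mapping, source_type):
    """Return the local type ID, falling back to the 'default' entry if present."""
    return mapping.get(source_type, mapping.get("default"))

mapping = parse_type_mapping("fantasy => 1 | martial arts => 2 | city => 3 | default => 10")
print(lookup_type(mapping, "city"))
print(lookup_type(mapping, "cookery"))
```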

Keyword collection rule: find the code near the keywords: protagonist search keywords--my Missy Li xingyu beauty city<br/>. Here "my Missy Li xingyu beauty city" is the content to collect, so it is replaced. The resulting rule is protagonist search keywords--*****<br/>

Content Overview collection rules

'taobao' and 'taobao' elder sister, do you want to leave me alone? I beg you ~~!
Liu Xing, a leader with hundreds of millions of people, does not want his family's big company to abandon his life. Instead, he chooses to become an ordinary white-collar employee in a small company.
A hero saved the beauty of a restaurant made him meet a big beauty. This beauty turned out to be the daughter of the boss of Liu Xing's company in Shanghai head office, in other words, this is his lady.
But on the surface, the pretty and elegant lady has some unknown side. It is really a matter of life!
Give me a nanny? Miss Da, are you kidding me? You don't do anything. Are you still a nanny?
The boss has two daughters? In this case, the girl who is difficult during the day is Miss 2nd?
En? What? You decided to live here too? Ah! Don't bother me ~~! One is enough for me. It's really a big girl!
Miss 'day' looks elegant and gentle, but she is very confused. She looks pretty cool, but she is very hot, and she can't afford to hear from her sisters, I live in my house this time ...... It's really fun!
The girl who wants to soak in the beauty is given by the big girl and the little girl! Ah ~~! It also makes people unable to live ~~!

Handled the same way as above, the resulting rule is
*****


Note: some code in the source file contains line breaks and similar formatting. When you copy it, replace only the content to be collected with a replacement tag and leave the formatting alone; do not, for example, delete line breaks to join a line onto the previous one.

Cover image collection rule: the code is <div id="crbtlbookimg"></div>, and the resulting rule is <div id="crbtlbookimg"></div>. width="100" height="125" can be changed to width="$" height="$", but if all the cover images on the target site are the same size you need not bother. To locate the cover image in the source file, go to the information page, view the image's properties to get the image file name, and then search for that name in the source file.

Filtered cover image: find an article that has no cover image, look at what appears between img src=" and " in its source, and enter that here. In this case it is /img/noimg.gif.

Directory page link collection rule: because we did not fill in the sub-ID calculation method earlier, we can use this rule to collect the sub-ID instead. In the source file of the article information page, find the code near the directory page link (usually close to "click to read"; on Feiku the nearby code is [click to read]):
[<a href="/html/book/168/144238/list.shtm"><font color="#cc0000">click to read</font></a>]
The numbers 168 and 144238 can be replaced by any number, so the resulting rule is:
[<a href="/html/book/$/list.shtm"><font color="#cc0000">click to read</font></a>]
The content collected by this rule is stored as the tag <{indexlink}>, which stands in for the sub-ID and can be used below in "Article directory page address".
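As an illustration of what ends up in <{indexlink}>, the Python sketch below pulls the sub-ID out of the sample link with an ordinary regular expression. The pattern is hypothetical and written for this one link layout: it assumes the first number in the path (168) is the value wanted and that the second number is the article ID, which matches how <{indexlink}> is used in the directory page address.

```python
import re

# Hypothetical pattern: capture the first path segment, skip over the article ID.
INDEX_LINK = re.compile(r'<a href="/html/book/(\d+)/\d+/list\.shtm">')

def extract_indexlink(html):
    """Return the sub-ID found in the directory link, or None if there is no link."""
    match = INDEX_LINK.search(html)
    return match.group(1) if match else None

page = '[<a href="/html/book/168/144238/list.shtm"><font color="#cc0000">click to read</font></a>]'
print(extract_indexlink(page))
```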

Completed (full-text) flag collection rule: you need to find an article that is already complete. In the source file of its information page, find the code near the writing progress (where the progress reads "finished"):
<li class="L3">Writing Process</li>
<li class="L4">finished</li>
"Writing Process" is replaced with !!!!, so the resulting rule is
<li class="L3">!!!!</li>
<li class="L4">finished</li>
This rule does not save the collected content; it only checks whether the page matches. If the page matches, the article is treated as completed; if it does not match, the article is treated as still being serialized.
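The match-or-no-match behaviour can be sketched in Python as a simple boolean test. The regular expression is a hand-written stand-in for the rule, the function name is made up, and the status text "serializing" in the second sample is hypothetical; allowing any whitespace between the two list items is also an assumption.

```python
import re

# Hand-written stand-in for the rule: any label, then the literal status "finished".
COMPLETED = re.compile(r'<li class="L3">[^<>]*</li>\s*<li class="L4">finished</li>')

def is_completed(html):
    """True when the page matches the rule; nothing from the match is stored."""
    return COMPLETED.search(html) is not None

finished_page = '<li class="L3">Writing Process</li>\n<li class="L4">finished</li>'
ongoing_page = '<li class="L3">Writing Process</li>\n<li class="L4">serializing</li>'
print(is_completed(finished_page), is_completed(ongoing_page))
```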
Article directory page collection rules

Article directory page address: the URL of the directory page, for example
http://feiku.com/html/book/168/144238/List.shtm
Here 168, the article sub-ID, is replaced with <{indexlink}>, and 144238, the article ID, is replaced with <{ArticleID}>. The resulting rule is
http://www.feiku.com/Html/Book/<{indexlink}>/<{ArticleID}>/list.shtm

View the source file of the directory page and find the code <div id="nclasstitle">. The text that follows it is what we want to collect, so replace it with !!!!. The resulting rule is <div id="nclasstitle">!!!!

Chapter name collection rule: find the code near the chapter name: update words: 3402">Chapter 1 elephant~~ Elephant~~!</a></li>. "Chapter 1 elephant~~ Elephant~~!" is the content to collect and is replaced with !!!! or ****, and 3402 is replaced with $, which matches any number. The resulting rule is update words: $">!!!!</a></li>

Chapter ID collection rule: find the code near the chapter ID:
<li><a href="3320510.shtm" title="updated on:
3320510 is the chapter ID to collect, so it is replaced with $. The resulting rule is
<li><a href="$.shtm" title="updated on
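Taken together, the chapter name and chapter ID rules turn each directory entry into an (ID, name) pair. The Python sketch below shows that pairing with a plain regular expression. The sample markup is an assumed reconstruction stitched together from the two fragments quoted above, the second entry is invented, and the function name is hypothetical.

```python
import re

# One directory entry: chapter ID in the href, chapter name as the link text.
CHAPTER = re.compile(r'<li><a href="(\d+)\.shtm" title="[^"]*">([^<>]*)</a></li>')

def list_chapters(html):
    """Return (chapter_id, chapter_name) pairs in page order."""
    return [(int(cid), name.strip()) for cid, name in CHAPTER.findall(html)]

# Assumed markup; the real title attribute text on the site may differ.
directory = (
    '<li><a href="3320510.shtm" title="updated on: (date) update words: 3402">'
    'Chapter 1 elephant~~ Elephant~~!</a></li>\n'
    '<li><a href="3320511.shtm" title="updated on: (date) update words: 2980">'
    'Chapter 2 (hypothetical)</a></li>'
)
for chapter_id, name in list_chapters(directory):
    print(chapter_id, name)
```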
Chapter content page collection rules

Chapter content page address:
http://feiku.com/html/book/168/144238/3320510.shtm
Here 168, the article sub-ID, is replaced with <{indexlink}> as before; 144238, the article ID, is replaced with <{ArticleID}>; and 3320510, the chapter ID, is replaced with <{chapterid}>. The resulting rule is
http://www.feiku.com/Html/Book/<{indexlink}>/<{ArticleID}>/<{chapterid}>.shtm
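Filling in an address rule is plain placeholder substitution. The Python sketch below shows one way to do it; fill_template is a hypothetical helper, not part of Jieqi, and it raises KeyError if the template contains a tag that has no value.

```python
import re

def fill_template(template, values):
    """Replace every <{name}> placeholder with the matching entry from values."""
    def substitute(match):
        return str(values[match.group(1)])
    return re.sub(r"<\{(\w+)\}>", substitute, template)

template = "http://www.feiku.com/Html/Book/<{indexlink}>/<{ArticleID}>/<{chapterid}>.shtm"
url = fill_template(template, {"indexlink": 168, "ArticleID": 144238, "chapterid": 3320510})
print(url)
```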

Chapter content collection rule: find the code near the chapter content. There is too much of it, so I have been lazy and trimmed it down:
</div>
<div id="booktext">Chapter content
</div>
Not every book or chapter uses <div id="booktext"> as shown above; some use a different id, for example <div id="ssmmkkg">. What they all share is
</div>
<div id="
so that is what I use. The chapter content is replaced with * tags. The resulting rule is as follows; study it for yourself.
</div>
<div id="*****</div>

Anything in the code above that you do not want collected can be written here. These are some of the items I removed; set them as needed.

cmfu.com
Multiple filter rules are allowed, one rule per line. Replacement tags may be used in filter rules.
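A minimal Python sketch of the one-rule-per-line idea follows. It handles literal strings only and does not interpret replacement tags; the function name and the second filter line are hypothetical.

```python
def apply_filters(content, filter_rules):
    """Remove every literal filter string (one per line) from the content."""
    for rule in filter_rules.splitlines():
        rule = rule.strip()
        if rule:  # skip blank lines
            content = content.replace(rule, "")
    return content

filters = "cmfu.com\n(hypothetical site banner)"
text = "Chapter text cmfu.com continues here (hypothetical site banner)."
cleaned = apply_filters(text, filters)
print(cleaned)
```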


Collect image content to the local server: choose as needed. (Everything below is optional too. I am worn out, so I will stop here.)
Processing of locally collected images: requires GD library support.
Enable image processing: choose as needed. Enabling image processing slows collection down somewhat.
Add a watermark to collected images: choose as needed.
Watermarking itself is configured in this module's parameter settings and works the same way as watermarking manually uploaded images.
Background color of collected images
If this field is left blank, the system determines it automatically.
Erase the original image watermark by region
Erases the content of a region given by rectangle coordinates in the image. A rectangle is represented by four values separated by ",": the X and Y of its upper-left corner followed by the X and Y of its lower-right corner. A value greater than 0 is counted in pixels from the upper-left corner of the image; a value less than 0 is counted in pixels back from the lower-right corner of the image. Multiple regions are separated by "|".
For example, this item is set to "100, 100 |-,-50,-1,-1", indicating the rectangular area of * 50 in the upper left corner and lower right corner, respectively.
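The coordinate convention can be sketched in Python as follows. The setting string and the 400 x 300 image size are hypothetical values chosen to fit the description, and resolving a negative value as "image size plus value" is this sketch's interpretation of counting back from the lower-right corner, not confirmed Jieqi behaviour.

```python
def resolve(value, size):
    """Positive values count from the top-left; negative ones back from the bottom-right."""
    return value if value >= 0 else size + value

def parse_regions(setting, width, height):
    """Turn 'x1,y1,x2,y2|x1,y1,x2,y2' into absolute pixel rectangles."""
    regions = []
    for chunk in setting.split("|"):
        x1, y1, x2, y2 = (int(v) for v in chunk.split(","))
        regions.append((resolve(x1, width), resolve(y1, height),
                        resolve(x2, width), resolve(y2, height)))
    return regions

# Hypothetical setting: one rectangle near each of the two corners.
regions = parse_regions("1,1,100,50|-100,-50,-1,-1", 400, 300)
print(regions)
```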
Erase the original image watermark by color
The watermark color is usually different from the colors of the image background and content. You can list several watermark colors to erase, separated by "|", for example "#fafafa|#ff0000|#00ff00".
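For reference, a color list in this format breaks down into red, green, and blue values as in the short Python sketch below. The function name is made up, and the sketch only parses the setting; it does not show how the colors are erased.

```python
def parse_colors(setting):
    """Turn '#rrggbb|#rrggbb' into a list of (red, green, blue) tuples."""
    colors = []
    for item in setting.split("|"):
        digits = item.strip().lstrip("#")
        colors.append(tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4)))
    return colors

colors = parse_colors("#fafafa|#ff0000|#00ff00")
print(colors)
```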
