This is the last step in building the crawl module. The earlier steps are covered in "Using the Knight Station Group System to build a crawl module for a health network's health care column (I)" and "(II)"; if anything is unclear, go back and read them again. I'm a little proud of what these few days have produced. Two places in this step are easy to mix up: the "paging extraction rule" controls the paging links, while the "content model extraction rule" below it controls the title and the body. In fact we need three rules in total: one for the title, one for the body, and one for paging. Each rule can be visual or a regular expression; anything that gets the required content will do, as long as the three are kept clearly separate.
1. Title extraction
After entering step 3 of the process, select Title as the content model and open the unnamed rule.
Create a new regular expression and test it. The test shows that the expression is valid and the title has been extracted.
After saving, go back to the main page.
That gives us the title.
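The actual expression isn't reproduced in this post. As a rough sketch of what a title rule of this kind does, assuming the title sits inside an `<h1>` tag, here is the same idea in Python. The sample HTML, the `TITLE_RULE` pattern, and the `extract_title` helper are all made up for illustration; they are not the Knight software's own rule syntax.

```python
import re

# Hypothetical page source; the real column's markup is not shown in this post.
SAMPLE_HTML = """
<html><head><title>Five tips for winter health - Health Network</title></head>
<body><h1 class="title">Five tips for winter health</h1></body></html>
"""

# Assumed rule: capture whatever sits between the opening and closing <h1> tags.
# re.S lets "." match newlines in case the title wraps onto another line.
TITLE_RULE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.S)


def extract_title(html):
    """Return the captured title text, or None when the rule does not match."""
    match = TITLE_RULE.search(html)
    return match.group(1).strip() if match else None


print(extract_title(SAMPLE_HTML))
```

A rule that returns nothing on a test page is the regex equivalent of the test button showing no extraction, so it is worth trying the pattern on more than one article before saving.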
2. Content extraction
Using the "Knight's regex test tool" that Knight provides, test the regular expression written to capture the body. Once the extraction succeeds, copy the expression.
Go back to the main page:
Under content model, select Body.
Open the extraction rule. Here I renamed the rule.
Create a new regular expression and paste in the tested one. The test succeeds.
Save and return to the main page.
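For readers who want to see the shape of a body rule, here is a minimal Python sketch. It assumes the article text lives in a `<div id="content">` container with no nested `<div>` inside it; that container name, the sample page, and the `extract_body` helper are assumptions for illustration, since the real page source and the real expression are not shown here.

```python
import re

# Hypothetical article page: navigation, body container, footer.
SAMPLE_PAGE = """
<div class="nav">Home &gt; Health care</div>
<div id="content">
<p>Drink warm water in the morning.</p>
<p>Sleep before eleven.</p>
</div>
<div class="footer">Copyright</div>
"""

# Assumed rule: a start marker, a non-greedy capture, and an end marker.
# The non-greedy (.*?) stops at the first closing </div> after the start marker.
BODY_RULE = re.compile(r'<div id="content">(.*?)</div>', re.S)


def extract_body(html):
    """Return the HTML between the start and end markers, or None if absent."""
    match = BODY_RULE.search(html)
    if match is None:
        return None
    return match.group(1).strip()


body = extract_body(SAMPLE_PAGE)
print(body)
```

Because the capture stops at the first `</div>`, a body that contains nested `<div>` blocks would be cut short; in that case a more distinctive end marker from the page source is the safer choice.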
3. Paging extraction
Enable paging crawl and open the paging extraction rule. Don't mix things up this time: the problem we are dealing with now is paging, not the title or the body. So tick "enable paging" at the top, and then go into the paging extraction rule.
Here the source file reveals a sad problem: the paging links are, of all things, relative addresses. I nearly fainted. The tutorial uses regex extraction at this step, but the Sohu Women column it works on has absolute addresses. If I copied that approach here, my regex skills would only get me as far as the relative addresses. No way around it, so I used visual extraction instead, which can convert them to absolute addresses. Filter the extracted links according to what the page actually contains.
Well, test it, and it's done. Time to applaud myself for 5 minutes.
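To make the relative-address problem concrete, here is a small Python sketch of the conversion that visual extraction takes care of: each relative paging link is resolved against the address of the page it was found on. The page URL, the paging markup, and the `paging_links` helper are hypothetical; `urljoin` is the standard-library function for this kind of resolution.

```python
import re
from urllib.parse import urljoin

# Hypothetical address of the article's first page.
PAGE_URL = "http://www.example.com/health/care/article_1.html"

# Hypothetical paging block with relative and root-relative links.
PAGING_HTML = """
<div class="pages">
<a href="article_1_2.html">2</a>
<a href="article_1_3.html">3</a>
<a href="/health/care/article_1_4.html">4</a>
</div>
"""

BLOCK_RULE = re.compile(r'<div class="pages">(.*?)</div>', re.S)
LINK_RULE = re.compile(r'<a href="([^"]+)"')


def paging_links(html, page_url):
    """Find links inside the paging block and resolve them to absolute URLs."""
    block = BLOCK_RULE.search(html)
    if block is None:
        return []
    # Restricting the search to the paging block screens out unrelated links.
    hrefs = LINK_RULE.findall(block.group(1))
    return [urljoin(page_url, href) for href in hrefs]


for link in paging_links(PAGING_HTML, PAGE_URL):
    print(link)
```

Limiting the link search to the paging block plays the same role as the screening step mentioned above: only links that belong to the article's own pages are kept.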
OK, save step by step, then save the whole thing as a module:
Set up a task, and you can see that articles have been crawled into the article library:
Keep clapping for myself for another 5 minutes ~
After a few days of following the tutorials at the A5 Warrior Software station (www.xiake5.com) and learning the Knight Station Group software, I have come to appreciate how powerful Knight is. It felt like suddenly having a weapon in hand and the whole world within reach. A bit narcissistic, I know. This first crawl module still has rough edges, and the content still needs a replacement library to tidy it up, but starting from zero, it gave me a small taste of success. Thanks to Knight for giving me the chance to run multiple sites. Work at my unit keeps piling up and the house needs decorating, so I don't know how many more tutorial posts I'll manage to put up. God bless.