Zheng @ playpoly Sr 20091127
Using Parsley in Scrapy
Parsley is an interesting little tool. It uses CSS, XPath, regular expressions, and JSON to describe how to extract structured data from web pages. Presumably every crawler/spider ends up defining a similar set of templates. Parsley, however, also gives you concrete implementations in several programming languages.
Basic Facts
A parselet is a snippet written in the Parsley language.
You can think of a parselet as defining a set of actions that describe how code should accurately extract data, such as where the title is, how to get the title link, and how to extract the number of comments.
Parsley has implementations in various languages, including Ruby, Python, and C/C++.
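To make the idea of a template concrete, here is a minimal sketch in Python that uses only the standard library. It is not Parsley's syntax or the pyparsley API. The `TEMPLATE` layout, the `extract` helper, and the sample `PAGE` markup are all hypothetical and only illustrate mapping output field names to selectors for the title, the title link, and the comment count.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical template (NOT real Parsley syntax):
# field name -> (ElementTree path, attribute name or None for text content).
TEMPLATE = {
    "title": (".//a[@class='title']", None),
    "link": (".//a[@class='title']", "href"),
    "comments": (".//span[@class='comments']", None),
}

# Made-up, well-formed sample page used only for this illustration.
PAGE = """
<html><body>
  <a class="title" href="http://example.com/story">Example story</a>
  <span class="comments">42 comments</span>
</body></html>
""".strip()


def extract(template, markup):
    """Apply each selector in the template and collect the results in a dict."""
    root = ET.fromstring(markup)
    result = {}
    for field, (path, attr) in template.items():
        node = root.find(path)
        if node is None:
            result[field] = None          # selector matched nothing
        elif attr is None:
            result[field] = (node.text or "").strip()
        else:
            result[field] = node.get(attr)
    return result


data = extract(TEMPLATE, PAGE)
print(json.dumps(data, indent=2))
```

The template stays purely declarative, and the same `extract` function can be reused with a different template for a different page layout. That separation is roughly what a parselet provides, although a real parselet supports much richer selectors (CSS, full XPath, regular expressions) than this sketch does.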
Pyparsley is the corresponding Python library.
Sample Code and Result
See: http://parselets.com/parselets/yc/15
The code on the left is what we usually call a template, and the result on the right is the extracted structured data.
So how do you put this into practice?
Implementation
Install parsley and then uninstall.
Zhengyun 20091127 Beijing