# htmlparse: a powerful Go tool to parse an HTML document
https://github.com/tancehao/htmlparse

htmlparse is a Go tool for parsing an HTML document. It converts an HTML document into a tree in which each node is either a tag or a text. Given a tag, a programmer can easily get its original information, including its metadata, its children, its siblings and the text wrapped in it. One can also modify a tree by writing something into a tag or by deleting a tag. It can be used in web crawlers, content analysis, batch formatting and so on.

---

* [Install](#install)
* [API](#api)

---

## Install

```sh
go get -u github.com/tancehao/htmlparse
```

## API

### Parser

* #### `Parse() *Tree`

The only method needed to convert the original bytes into a tree. Example:

```go
import (
	"io/ioutil"
	"log"

	"github.com/tancehao/htmlparse"
)

// ...
content, err := ioutil.ReadFile("index.html")
if err != nil {
	log.Fatal(err)
}
parser := htmlparse.NewParse(content)
tree := parser.Parse()
```

### Tree

* #### `Filter(filter map[string]string) []*Tag`

Find some tags in the document with a filter, which is a key-value map. Example:

```go
products := tree.Filter(map[string]string{"tagName": "div", "class": "product"})
```

* #### `Find(conditions map[string]string) *TagSets`

Similar to the Filter method, except that its return value is a TagSets, which has some useful methods.

* #### `String() string`

Return the original document.

* #### `Modify() string`

Return the modified document.

### TagSets

* #### `Find(map[string]string) *TagSets`

Return the tags, taken from a set of tags or their children, that match a filter. It can be used in a chained style. Example:

```go
photos := tree.Find(map[string]string{"tagName": "div", "class": "product"}).
	Find(map[string]string{"tagName": "img", "class": "photo"})
```

* #### `All() []*Tag`

Get all the tags in this set.

* #### `GetAttributes(attr ...string) []map[string]string`

Get some attributes from each tag in this set. Example:

```go
inputs := form.Find(map[string]string{"tagName": "input"}).GetAttributes("type", "name", "value", "data-id")
for _, input := range inputs {
	fmt.Printf("%s,%s,%s,%s\n", input["type"], input["name"], input["value"], input["data-id"])
}
```

* #### `String() string`

### Tag

* #### `Find(map[string]string) *TagSets`

Find tags among a tag's children.

* #### `GetContent() []byte`

Return the original bytes of a tag in the document, including the tag's own markup (its metadata). By design, each tag or text holds a pair of pointers (offsets) that determine its absolute position in the document. So whenever one gets the original content of a tag or text, it just fetches the subslice `document[head:tail]`, which cannot be any faster.

* #### `String() string`

Satisfies the `fmt.Stringer` interface.

* #### `Extract() []byte`

Filter the text out of a tag's original data. Tags won't be included.

* #### `Index() int64`

Get the index of a tag among its parent's children.

* #### `Prev() *Tag`

Get the previous tag under the same parent.

* #### `Next() *Tag`

Similar to `Prev()`: get the next tag under the same parent.

* #### `Modify() string`

Return the modified data of a tag. Call this after writing to a tag.

* #### `WriteText(position int64, data []byte) (*Text, error)`

Write text into a tag at the given index among the tag's children. Example:

```go
names := products.Find(map[string]string{"class": "productName"})
fmt.Println(names)
// prints:
// <div class="productName">product1</div>
// <div class="productName">product2</div>
// <div class="productName">product3</div>
for _, name := range names.All() {
	name.WriteText(0, []byte("[ONSALE]"))
	fmt.Println(name.Modify())
}
// prints:
// <div class="productName">[ONSALE]product1</div>
// <div class="productName">[ONSALE]product2</div>
// <div class="productName">[ONSALE]product3</div>
```

* #### `WriteTag(position int64, tagName string) (*Tag, error)`

Write a tag into a tag at the given index among the tag's children. Example:

```go
// 100 is an illustrative position; if it is greater than the number of
// body's children, the new tag is placed last.
script, err := body.WriteTag(100, "script")
if err != nil {
	log.Fatal(err)
}
script.Attributes["src"] = "http://www.foo.com"
body.Modify()
```

* #### `Delete() error`

Delete a tag. Example:

```go
ads := tree.Find(map[string]string{"class": "advertisement"}).All()
if len(ads) > 0 { // guard: All()[0] would panic on an empty set
	if err := ads[0].Delete(); err != nil {
		log.Println(err)
	}
}
tree.Modify()
```

### Text

* #### `String() string`

Similar to Tag.

* #### `Index() int64`

Similar to Tag.

* #### `Modify() string`

Similar to Tag.

* #### `Delete()`

Similar to Tag.
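The `Filter` and `Find` entries describe matching by a key-value map and chaining `Find` calls on a set. The following is a minimal, standard-library-only sketch of how such matching could work, not htmlparse's actual code. Assumptions: `node`, `matches` and `tagSet` are hypothetical names; the `"tagName"` key is compared with the tag name, and every other key is compared as an exact string with the attribute of that name (the real library may, for example, match individual class tokens instead).

```go
package main

import "fmt"

// node is a hypothetical stand-in for an htmlparse tag.
type node struct {
	name     string
	attrs    map[string]string
	children []*node
}

// matches reports whether n satisfies every key-value pair in filter.
// Assumption: "tagName" is checked against the tag name, any other key
// is an exact match on the attribute with that name.
func matches(n *node, filter map[string]string) bool {
	for k, v := range filter {
		if k == "tagName" {
			if n.name != v {
				return false
			}
			continue
		}
		if got, ok := n.attrs[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// tagSet is a hypothetical stand-in for TagSets.
type tagSet []*node

// find searches every tag in s and all of its descendants and returns the
// matches as a new set, so calls can be chained. The seen map drops
// duplicates that appear when one member of s is nested inside another.
func (s tagSet) find(filter map[string]string) tagSet {
	var out tagSet
	seen := make(map[*node]bool)
	var walk func(n *node)
	walk = func(n *node) {
		if matches(n, filter) && !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
		for _, c := range n.children {
			walk(c)
		}
	}
	for _, n := range s {
		walk(n)
	}
	return out
}

func main() {
	photo := &node{name: "img", attrs: map[string]string{"class": "photo"}}
	body := &node{name: "body", children: []*node{
		{name: "div", attrs: map[string]string{"class": "product"}, children: []*node{photo}},
		{name: "img", attrs: map[string]string{"class": "photo"}}, // not inside a product
	}}
	photos := tagSet{body}.
		find(map[string]string{"tagName": "div", "class": "product"}).
		find(map[string]string{"tagName": "img", "class": "photo"})
	fmt.Println(len(photos), photos[0] == photo) // 1 true
}
```

Because each `find` returns a fresh set, the second call only looks inside the product `div`, which is why the stray `img` is not picked up.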
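The `GetContent()` entry explains that every tag or text keeps a head/tail pair and returns `document[head:tail]`. A minimal sketch of that idea, assuming a hypothetical `span` type (not part of htmlparse): the returned value is a subslice that shares memory with the document, so no bytes are copied.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a parsed tag or text: it records only
// where the node starts and ends in the original document.
type span struct {
	head, tail int // byte offsets: doc[head:tail] is the node's source
}

// content returns the node's original bytes as a subslice of doc.
// Nothing is copied, so the cost does not depend on the node's size.
func (s span) content(doc []byte) []byte {
	return doc[s.head:s.tail]
}

func main() {
	doc := []byte(`<div class="product">product1</div>`)
	tag := span{head: 0, tail: len(doc)}
	text := span{head: 21, tail: 29} // offsets of "product1"
	fmt.Println(string(tag.content(doc)))  // <div class="product">product1</div>
	fmt.Println(string(text.content(doc))) // product1
}
```

The trade-off of this design is that the subslice aliases the document: callers that want to keep or mutate the bytes independently should copy them first.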
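`Index()`, `Prev()` and `Next()` all rely on a node knowing its position among its parent's children. The sketch below shows one way to get that from a parent pointer. Assumptions: `elem` and `appendChild` are hypothetical, `-1` marks the root, and the sketch ignores the tag/text distinction (the real library may or may not count text nodes when computing an index).

```go
package main

import "fmt"

// elem is a hypothetical tag that knows its parent.
type elem struct {
	name     string
	parent   *elem
	children []*elem
}

// index returns e's position among its parent's children, or -1 for the root.
func (e *elem) index() int64 {
	if e.parent == nil {
		return -1
	}
	for i, c := range e.parent.children {
		if c == e {
			return int64(i)
		}
	}
	return -1
}

// prev returns the sibling just before e, or nil if e is first or the root.
func (e *elem) prev() *elem {
	i := e.index()
	if i <= 0 {
		return nil
	}
	return e.parent.children[i-1]
}

// next returns the sibling just after e, or nil if e is last or the root.
func (e *elem) next() *elem {
	i := e.index()
	if i < 0 || int(i)+1 >= len(e.parent.children) {
		return nil
	}
	return e.parent.children[i+1]
}

// appendChild links a new child under parent and returns it.
func appendChild(parent *elem, name string) *elem {
	c := &elem{name: name, parent: parent}
	parent.children = append(parent.children, c)
	return c
}

func main() {
	ul := &elem{name: "ul"}
	a := appendChild(ul, "li#a")
	b := appendChild(ul, "li#b")
	fmt.Println(b.index(), b.prev().name, a.next().name) // 1 li#a li#b
	fmt.Println(a.prev() == nil, b.next() == nil)         // true true
}
```

The linear scan in `index` keeps the sketch short; a node could instead cache its index, at the cost of updating it on every insert or delete.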
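`GetAttributes` returns one map per tag holding only the requested attributes, which the README's example then prints with `Printf`. A minimal sketch of that projection, assuming a hypothetical `getAttributes` function over plain attribute maps; it assumes a missing attribute is simply absent, so indexing it yields `""`.

```go
package main

import "fmt"

// getAttributes keeps, for each tag's attribute map, only the requested
// names. Missing attributes are left out, so looking them up yields "".
func getAttributes(tags []map[string]string, attr ...string) []map[string]string {
	out := make([]map[string]string, 0, len(tags))
	for _, t := range tags {
		m := make(map[string]string, len(attr))
		for _, a := range attr {
			if v, ok := t[a]; ok {
				m[a] = v
			}
		}
		out = append(out, m)
	}
	return out
}

func main() {
	inputs := []map[string]string{
		{"type": "text", "name": "user", "class": "wide"},
		{"type": "hidden", "name": "token", "value": "abc", "data-id": "7"},
	}
	for _, in := range getAttributes(inputs, "type", "name", "value", "data-id") {
		fmt.Printf("%s,%s,%s,%s\n", in["type"], in["name"], in["value"], in["data-id"])
	}
	// text,user,,
	// hidden,token,abc,7
}
```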