Htmlparse: a powerful Go tool to parse an HTML document

Last Update: 2018-01-26 Source: Internet

Author: User


[htmlparse](https://github.com/tancehao/htmlparse)
===
htmlparse is a Go tool for parsing an HTML document. It converts an HTML document into a tree in which each node is either a tag or a text. Given a tag, a programmer can easily get its original information, including its metadata, its children, its siblings and the text wrapped in it. One can also modify a tree by writing something into a tag or deleting a tag. It can be used in web crawlers, analysis, batch formatting, etc.

---

* [Install](#install)
* [Api](#api)

---

## Install

```sh
go get -u github.com/tancehao/htmlparse
```

## api

### parser

* #### Parse() *tree
  The only method needed to convert the original bytes to a tree. Example:

```go
import (
    "io/ioutil"

    "github.com/tancehao/htmlparse"
)

//...
content, _ := ioutil.ReadFile("index.html")
parser := htmlparse.NewParse(content)
tree := parser.Parse()
```

### tree

* #### Filter(filter map[string]string) []*tag
  Find some tags in the document with a filter, which is a key-value map. Example:

```go
products := tree.Filter(map[string]string{"tagName": "div", "class": "product"})
```

* #### Find(conditions map[string]string) *tagsets
  Similar to the Filter method, except that its return value is a tagsets, a type that has some useful methods.
* #### String() string
  Return the original document.
* #### Modify() string
  Return the modified document.

### tagsets

* #### Find(map[string]string) *tagsets
  Return a set of tags selected by a filter from a set of tags or their children. Calls can be chained. Example:

```go
photos := tree.Find(map[string]string{
    "tagName": "div",
    "class":   "product",
}).Find(map[string]string{
    "tagName": "img",
    "class":   "photo",
})
```

* #### All() []*tag
  Get all the tags in this set.
* #### GetAttributes(attr ...string) []map[string]string
  Get the given attributes from each tag in this set. Example:

```go
inputs := form.Find(map[string]string{"tagName": "input"}).GetAttributes("type", "name", "value", "data-id")
for _, input := range inputs {
    fmt.Printf("%s,%s,%s,%s\n", input["type"], input["name"], input["value"], input["data-id"])
}
```

* #### String() string

### tag

* #### Find(map[string]string) *tagsets
  Find tags among a tag's children.
* #### GetContent() []byte
  Return the original bytes of a tag in the document, including the tag's metadata. By design, each tag or text has a pair of pointers that determine its absolute position in the document. So whenever one gets the original content of a tag or text, it just fetches the subslice document[head:tail], which could hardly be faster.
* #### String() string
  Satisfies the Stringer interface.
* #### Extract() []byte
  Extract the text from the original data of a tag. Tags won't be included.
* #### Index() int64
  Get the index of a tag among its siblings.
* #### Prev() *tag
  Get the previous tag under the same parent.
* #### Next() *tag
  Similar to Prev().
* #### Modify() string
  Return the modified data of a tag. One should call this after writing to a tag.
* #### WriteText(position int64, data []byte) (*text, error)
  Write text into a tag at the given index among the tag's children. Example:

```go
names := products.Find(map[string]string{"class": "ProductName"})
fmt.Println(names)
// prints:
// <div class="ProductName">Product1</div>
// <div class="ProductName">Product2</div>
// <div class="ProductName">Product3</div>
for _, name := range names.All() {
    name.WriteText(0, []byte("[onsale]"))
    fmt.Println(name.Modify())
}
// prints:
// <div class="ProductName">[onsale]Product1</div>
// <div class="ProductName">[onsale]Product2</div>
// <div class="ProductName">[onsale]Product3</div>
```

* #### WriteTag(position int64, tagname string) (*tag, error)
  Write a tag into a tag at the given index among the tag's children. Example:

```go
// If the position is greater than the number of the tag's children,
// the new tag becomes the last child. The value 100 is only illustrative.
script, _ := body.WriteTag(100, "script")
script.Attributes["src"] = "http://www.foo.com"
body.Modify()
```

* #### Delete() error
  Delete a tag. Example:

```go
garbage := tree.Find(map[string]string{"class": "advertisement"}).All()[0]
garbage.Delete()
tree.Modify()
```

### text

* #### String() string
  Similar to tag.
* #### Index() int64
  Similar to tag.
* #### Modify() string
  Similar to tag.
* #### Delete()
  Similar to tag.
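
The two ideas described above, key-value filters and fetching a tag's content as `document[head:tail]`, can be illustrated with a small standalone sketch. It uses only the standard library and does not call htmlparse. The `node` type, its fields, and the `locate`, `content` and `matches` helpers are hypothetical names chosen for illustration, and treating `tagName` as a special filter key is an assumption based on the examples above, not a description of the library's internals.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a hypothetical stand-in for a parsed tag (not htmlparse's type).
// head and tail are byte offsets into the original document.
type node struct {
	tagName string
	attrs   map[string]string
	head    int // offset of the first byte of the opening tag
	tail    int // offset just past the last byte of the closing tag
}

// content returns the node's original bytes. A subslice shares the
// document's backing array, so nothing is copied.
func (n node) content(doc []byte) []byte {
	return doc[n.head:n.tail]
}

// matches reports whether the node satisfies every pair in filter.
// Assumption: the key "tagName" is compared with the tag name and
// every other key is compared with an attribute of the same name.
func (n node) matches(filter map[string]string) bool {
	for key, want := range filter {
		if key == "tagName" {
			if n.tagName != want {
				return false
			}
			continue
		}
		if got, ok := n.attrs[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// locate finds the offsets of the first element that starts with opening
// and ends with closing. It assumes both markers occur in doc.
func locate(doc []byte, opening, closing string) (head, tail int) {
	head = bytes.Index(doc, []byte(opening))
	tail = bytes.Index(doc, []byte(closing)) + len(closing)
	return head, tail
}

func main() {
	doc := []byte(`<body><div class="product">A</div><p>B</p></body>`)

	head, tail := locate(doc, "<div", "</div>")
	div := node{
		tagName: "div",
		attrs:   map[string]string{"class": "product"},
		head:    head,
		tail:    tail,
	}

	filter := map[string]string{"tagName": "div", "class": "product"}
	if div.matches(filter) {
		fmt.Printf("%s\n", div.content(doc)) // prints: <div class="product">A</div>
	}
}
```

Because `content` only reslices the document, its cost does not depend on the size of the tag, which is the point the GetContent description makes. The trade-off is that the returned bytes alias the document, so a caller who intends to mutate them should copy first.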

