Modifying document content with Python's Beautiful Soup module: detailed sample code

Source: Internet
Author: User
Beautiful Soup is a Python library for extracting data from HTML and XML files. It works with the parser of your choice to provide familiar ways of navigating, searching, and modifying a document. It can also modify the contents of an HTML/XML document. This article introduces how to modify content with the Beautiful Soup module in Python; readers who need it can use it as a reference.

Preface

Besides searching and navigating, the Beautiful Soup module can also modify the contents of an HTML/XML document. This means that you can add or remove tags, rename tags, change tag attribute values, modify text content, and more. The sections below walk through each of these operations in detail.

Modifying a tag

The sample HTML document used throughout is as follows:

    html_markup = """<div class="ecopyramid">
    <ul id="producers">
      <li class="producerlist">
        <p class="name">plants</p>
        <p class="number">100000</p>
      </li>
      <li class="producerlist">
        <p class="name">algae</p>
        <p class="number">100000</p>
      </li>
    </ul>
    </div>"""

Modifying the tag name

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_markup, 'lxml')
    producer_entries = soup.ul
    # Print the current tag name, then rename the tag
    print(producer_entries.name)
    producer_entries.name = "p"
    print(producer_entries.prettify())
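As a minimal, self-contained sketch of renaming a tag: the one-line markup below is a made-up sample, and the built-in html.parser is assumed instead of lxml so that nothing beyond bs4 has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line sample; html.parser ships with Python.
soup = BeautifulSoup('<ul id="producers"><li>plants</li></ul>', "html.parser")

tag = soup.ul
old_name = tag.name   # tag name before the change
tag.name = "ol"       # rename in place; attributes and children are kept
renamed = str(soup)

print(old_name, "->", tag.name)
print(renamed)
```

Only the name changes: the id attribute and the li child stay attached to the renamed tag.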

Modifying tag attribute values

    # Modify tag attributes
    # Update the value of an existing attribute
    producer_entries['id'] = "producers_new_value"
    print(producer_entries.prettify())
    # Add a new attribute to the tag
    producer_entries['class'] = "newclass"
    print(producer_entries.prettify())
    # Delete an attribute from the tag
    del producer_entries['class']
    print(producer_entries.prettify())
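The same three attribute operations can be checked without printing whole documents. This is a small sketch on a made-up one-tag sample, again assuming html.parser; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with a single tag.
soup = BeautifulSoup('<ul id="producers"></ul>', "html.parser")
tag = soup.ul

tag["id"] = "producers_new_value"   # update an existing attribute
tag["class"] = "newclass"           # add a new attribute
with_class = dict(tag.attrs)        # snapshot of the attributes at this point

del tag["class"]                    # delete the attribute again
final_markup = str(tag)

print(with_class)
print(final_markup)
```

A tag behaves like a dictionary for its attributes, so tag.get("class") returns None once the attribute has been deleted.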

Adding a new tag

We can use the new_tag() method to create a new tag, and then use the append(), insert(), insert_after(), or insert_before() method to add it to the HTML tree.

For example, to add an li tag to the ul tag in the HTML document above, first create a new li tag and then insert it into the HTML tree. After that, insert the corresponding p tags into the new li tag.

    # Add a new tag
    # Re-parse the sample so that we start from the original tree
    soup = BeautifulSoup(html_markup, 'lxml')
    producer_entries = soup.ul
    # new_tag() creates a Tag object
    new_li_tag = soup.new_tag("li")
    # Attributes can be passed to new_tag() as keyword arguments
    # (new_atag only demonstrates this; it is not inserted into the tree)
    new_atag = soup.new_tag("a", href="www.example.com", rel="external nofollow")
    new_li_tag.attrs = {'class': 'producerlist'}
    # Use append() to add the tag at the end
    producer_entries.append(new_li_tag)
    print(producer_entries.prettify())
    # Create two p tags and insert them into the li tag
    new_p_name_tag = soup.new_tag("p")
    new_p_name_tag['class'] = "name"
    new_p_number_tag = soup.new_tag("p")
    new_p_number_tag["class"] = "number"
    # Use insert() to specify the position of each insertion
    new_li_tag.insert(0, new_p_name_tag)
    new_li_tag.insert(1, new_p_number_tag)
    print(new_li_tag.prettify())
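To make the difference between append() and insert() easier to see, here is a short sketch on a made-up two-item list (html.parser assumed; the tag texts "first" and "last" are illustrative only).

```python
from bs4 import BeautifulSoup

# Hypothetical sample list with two existing items.
soup = BeautifulSoup("<ul><li>plants</li><li>algae</li></ul>", "html.parser")

first = soup.new_tag("li")
first.string = "first"
last = soup.new_tag("li")
last.string = "last"

soup.ul.insert(0, first)   # position 0: becomes the first child
soup.ul.append(last)       # append(): becomes the last child

items = [li.get_text() for li in soup.ul.find_all("li")]
print(items)
```

insert() takes a position among the parent's children, while append() always adds at the end.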

Modifying string contents

String contents can be modified with the .string attribute and with the new_string(), append(), and insert() methods.

    # Modify string contents
    # Use the .string attribute to set the string content
    new_p_name_tag.string = 'new_p_name'
    # Use the append() method to add string content
    new_p_name_tag.append("producer")
    # Use the soup object's new_string() method to create a string
    new_string_toappend = soup.new_string("producer")
    new_p_name_tag.append(new_string_toappend)
    # Use the insert() method to insert a string at a given position
    new_string_toinsert = soup.new_string("10000")
    new_p_number_tag.insert(0, new_string_toinsert)
    print(producer_entries.prettify())
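A compact sketch of the same string operations on a made-up single p tag (html.parser assumed) shows how the pieces combine into the tag's text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with one p tag.
soup = BeautifulSoup('<p class="name">plants</p>', "html.parser")
tag = soup.p

tag.string = "algae"                      # replaces the existing text
tag.append(" producer")                   # adds a second string after it
tag.insert(0, soup.new_string("green "))  # inserts a string at position 0

text = tag.get_text()
print(repr(text))
```

Assigning to .string discards whatever the tag held before, whereas append() and insert() add to the existing children.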

Deleting a tag node

The Beautiful Soup module provides the decompose() and extract() methods to remove nodes.

The decompose() method removes a node from the tree and destroys it, together with all of its child nodes.

The extract() method removes a node or string from the HTML tree and returns the removed element.

    # Delete nodes
    third_producer = soup.find_all("li")[2]
    # Use the decompose() method to remove the p node
    p_name = third_producer.p
    p_name.decompose()
    print(third_producer.prettify())
    # Use the extract() method to remove the node
    third_producer_removed = third_producer.extract()
    print(soup.prettify())
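The practical difference between the two methods is what you are left with afterwards. The sketch below uses a made-up two-item list (html.parser assumed; the ids "a" and "b" are illustrative only).

```python
from bs4 import BeautifulSoup

# Hypothetical sample with two list items.
soup = BeautifulSoup(
    '<ul><li id="a"><p>plants</p></li><li id="b"><p>algae</p></li></ul>',
    "html.parser",
)

soup.find("li", id="a").p.decompose()         # the p tag and its text are destroyed
removed = soup.find("li", id="b").extract()   # detached from the tree and returned

remaining = str(soup)
print(remaining)
print(removed)
```

The extracted li is still a usable Tag object with no parent, so it can be inspected or inserted elsewhere; a decomposed tag should not be used again.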

Deleting a tag's contents

A tag may have NavigableString objects or Tag objects as its child nodes, and all of these child nodes can be removed with the clear() method. This removes everything in the tag's .contents while keeping the tag itself.
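A minimal sketch of clear() on a made-up li tag that holds both a child tag and a bare string (html.parser assumed):

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one child tag plus one bare string inside the li.
soup = BeautifulSoup(
    '<li class="producerlist"><p>plants</p>100000</li>', "html.parser"
)
li = soup.li

before = len(li.contents)   # number of children before clearing
li.clear()                  # remove every child, keep the li itself
after = str(li)

print(before, after)
```

The tag and its attributes survive; only its children are removed.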

Other ways to modify content

In addition to the methods mentioned above, there are other ways to modify the content.

insert_after() and insert_before() methods

These two methods insert a tag or string immediately after or immediately before the tag or string they are called on. The element passed in is either a NavigableString object or a Tag object.
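As a sketch of both methods, the example below places a string next to a string and a tag on either side of a tag. The markup and the item texts are made up for illustration, and html.parser is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with a single list item.
soup = BeautifulSoup("<ul><li>algae</li></ul>", "html.parser")
anchor = soup.li

# A string can be placed next to another string inside the same tag.
anchor.string.insert_after(soup.new_string(" (producer)"))

# A tag can be placed on either side of another tag.
plants = soup.new_tag("li")
plants.string = "plants"
anchor.insert_before(plants)

fungi = soup.new_tag("li")
fungi.string = "fungi"
anchor.insert_after(fungi)

result = str(soup)
print(result)
```

Unlike insert(), these methods are called on the sibling that serves as the reference point rather than on the parent.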

replace_with() method

This method replaces the original tag or string with a new tag or string. It accepts a tag or a string as input.
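A short sketch of replace_with() applied first to a tag and then to a string. The markup is a made-up sample, the span tag is chosen only for illustration, and html.parser is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with one p tag inside an li.
soup = BeautifulSoup('<li><p class="name">plants</p></li>', "html.parser")

# Replace a tag with a newly created tag; the replaced element is returned.
new_tag = soup.new_tag("span")
new_tag.string = "algae"
old = soup.p.replace_with(new_tag)
after_tag_swap = str(soup)

# Replace a string with a new string.
soup.span.string.replace_with(soup.new_string("fungi"))
after_string_swap = str(soup)

print(after_tag_swap)
print(after_string_swap)
```

Because the replaced element is returned, it can be kept and reinserted somewhere else if needed.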

wrap() and unwrap() methods

The wrap() method wraps a tag or string in another tag.

The unwrap() method is the opposite of wrap(): it replaces a tag with that tag's contents.

    # wrap() method: wrap every li tag in a new p tag
    li_tags = soup.find_all('li')
    for li in li_tags:
        new_p_tag = soup.new_tag('p')
        li.wrap(new_p_tag)
    print(soup.prettify())
    # unwrap() method: replace the first p tag inside each li with its contents
    li_tags = soup.find_all("li")
    for li in li_tags:
        li.p.unwrap()
    print(soup.prettify())
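To see that the two methods undo each other, the sketch below wraps a tag and then unwraps the wrapper again. The markup is a made-up sample, the div wrapper is chosen only for illustration, and html.parser is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with a single list item.
soup = BeautifulSoup("<ul><li>plants</li></ul>", "html.parser")
original = str(soup)

wrapper = soup.li.wrap(soup.new_tag("div"))   # wrap() returns the new wrapper
wrapped = str(soup)

wrapper.unwrap()                              # the wrapper is replaced by its contents
unwrapped = str(soup)

print(wrapped)
print(unwrapped)
```

Keeping a reference to the wrapper returned by wrap() makes it easy to remove exactly that tag later.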
