Beautiful Soup is a Python library for extracting data from HTML and XML files. It works with your parser of choice to provide idiomatic ways of navigating, searching, and modifying the document tree. This article describes how to use the Beautiful Soup module in Python to modify the contents of an HTML/XML document.
Introduction
Besides searching and navigating, the Beautiful Soup module can also modify the contents of an HTML/XML document. This means you can add or remove tags, rename tags, change tag attribute values, modify text content, and more. The sections below walk through each of these operations in detail.
Modifying a tag
The sample HTML document used throughout is as follows:
html_markup = """<p class="ecopyramid">
<ul id="producers">
  <li class="producerlist">
    <p class="name">plants</p>
    <p class="number">100000</p>
  </li>
  <li class="producerlist">
    <p class="name">algae</p>
    <p class="number">100000</p>
  </li>
</ul>
</p>"""
Modifying the tag name
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_markup, 'lxml')
producer_entries = soup.ul
print(producer_entries.name)  # the current tag name
# Rename the tag by assigning to .name
producer_entries.name = "p"
print(producer_entries.prettify())
Modifying tag attribute values
# Update the value of an existing attribute
producer_entries['id'] = "producers_new_value"
print(producer_entries.prettify())
# Add a new attribute
producer_entries['class'] = "newclass"
print(producer_entries.prettify())
# Delete an attribute
del producer_entries['class']
print(producer_entries.prettify())
Adding a new tag
We can use the new_tag() method to create a new tag, and then use the append(), insert(), insert_after(), or insert_before() methods to add it to the HTML tree.
For example, to add an li tag to the ul tag in the HTML document above, first create a new li tag and insert it into the HTML tree, then insert the corresponding p tags into the li tag.
# new_tag() creates a new Tag object
new_li_tag = soup.new_tag("li")
# new_tag() also accepts attributes as keyword arguments
new_atag = soup.new_tag("a", href="www.example.com", rel="external nofollow")
# Attributes can also be assigned through .attrs
new_li_tag.attrs = {'class': 'producerlist'}

# Re-parse the sample markup to start from an unmodified tree
soup = BeautifulSoup(html_markup, 'lxml')
producer_entries = soup.ul
# Use append() to add the new tag at the end
producer_entries.append(new_li_tag)
print(producer_entries.prettify())

# Create two p tags and insert them into the li tag
new_p_name_tag = soup.new_tag("p")
new_p_name_tag['class'] = "name"
new_p_number_tag = soup.new_tag("p")
new_p_number_tag["class"] = "number"
# Use insert() to specify the insertion position
new_li_tag.insert(0, new_p_name_tag)
new_li_tag.insert(1, new_p_number_tag)
print(new_li_tag.prettify())
Modifying string content
String content can be modified with the new_string(), append(), and insert() methods, or by assigning to the .string attribute.
# Use the .string attribute to set the string content
new_p_name_tag.string = 'new_p_name'
# Use append() to add more string content
new_p_name_tag.append("producer")
# Use the soup object's new_string() method to create a string
new_string_toappend = soup.new_string("producer")
new_p_name_tag.append(new_string_toappend)
# Use insert() to insert a string at a given position
new_string_toinsert = soup.new_string("10000")
new_p_number_tag.insert(0, new_string_toinsert)
print(producer_entries.prettify())
Deleting a tag node
The Beautiful Soup module provides the decompose() and extract() methods for removing nodes.
The decompose() method removes a node from the tree and destroys it together with all of its child nodes.
The extract() method removes a node or string from the HTML tree and returns it.
# Select the third li node
third_producer = soup.find_all("li")[2]
# Use decompose() to remove its p node
p_name = third_producer.p
p_name.decompose()
print(third_producer.prettify())
# Use extract() to remove the li node itself
third_producer_removed = third_producer.extract()
print(soup.prettify())
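The difference between the two methods can be sketched as follows. This is a minimal illustration rather than part of the original example: the shortened markup and the use of the built-in "html.parser" (instead of lxml) are assumptions made to keep it self-contained.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reduced from the sample document
markup = ('<ul id="producers">'
          '<li class="producerlist"><p class="name">plants</p></li>'
          '<li class="producerlist"><p class="name">algae</p></li>'
          '</ul>')
soup = BeautifulSoup(markup, "html.parser")
first_li, second_li = soup.find_all("li")

# extract() detaches the node and returns it, so it stays usable
removed_li = first_li.extract()

# decompose() detaches the node and destroys it
second_li.decompose()

# The extracted node can be put back into the tree later
soup.ul.append(removed_li)
print(soup)
```

extract() is the better fit when the removed node is needed again, for example to move it elsewhere in the tree; decompose() suits content that should simply be discarded.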
Deleting tag contents
A tag may have NavigableString objects or Tag objects as child nodes, and all of these child nodes can be removed with the clear() method. This empties the tag's .contents while leaving the tag itself in place.
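A minimal sketch of clear() is shown here. The shortened markup and the built-in "html.parser" are assumptions chosen so the snippet runs on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reduced from the sample document
markup = ('<li class="producerlist">'
          '<p class="name">plants</p><p class="number">100000</p>'
          '</li>')
soup = BeautifulSoup(markup, "html.parser")

li_tag = soup.li
children_before = len(li_tag.contents)  # the two child p tags

# clear() removes every child (tags and strings) but keeps the tag itself
li_tag.clear()
print(li_tag.contents)
print(soup)
```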
Other ways to modify content
In addition to the methods mentioned above, there are other ways to modify the content.
insert_after() and insert_before() methods
These two methods insert a tag or string immediately after or before an existing tag or string. The node passed in can be either a NavigableString object or a Tag object.
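As a rough illustration of both methods, the sketch below inserts one Tag and one NavigableString. The shortened markup, the "green " prefix, and the built-in "html.parser" are assumptions for the sake of a self-contained example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reduced from the sample document
markup = ('<ul id="producers"><li class="producerlist">'
          '<p class="name">plants</p></li></ul>')
soup = BeautifulSoup(markup, "html.parser")

name_tag = soup.find("p", class_="name")

# Insert a new Tag right after the existing <p class="name">
number_tag = soup.new_tag("p")
number_tag["class"] = "number"
number_tag.string = "100000"
name_tag.insert_after(number_tag)

# Insert a NavigableString right before the existing text in the tag
name_tag.string.insert_before(soup.new_string("green "))

print(soup.ul.prettify())
```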
replace_with() method
This method replaces the original tag or string with a new tag or string. It accepts either a tag or a string as input.
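The following sketch shows replace_with() applied to a string and to a tag. The shortened markup, the span replacement tag, and the built-in "html.parser" are assumptions used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reduced from the sample document
markup = ('<li class="producerlist">'
          '<p class="name">plants</p><p class="number">100000</p>'
          '</li>')
soup = BeautifulSoup(markup, "html.parser")

# Replace a string: swap the text inside <p class="number">
number_tag = soup.find("p", class_="number")
number_tag.string.replace_with(soup.new_string("200000"))

# Replace a tag: swap <p class="name"> for a new span tag
name_tag = soup.find("p", class_="name")
new_tag = soup.new_tag("span")
new_tag.string = "algae"
old_tag = name_tag.replace_with(new_tag)  # the replaced node is returned

print(soup)
print(old_tag)
```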
wrap() and unwrap() methods
The wrap() method wraps a tag or string in another tag.
The unwrap() method is the opposite of wrap(): it removes a tag but keeps its contents in place.
# wrap(): wrap each li tag in a new p tag
li_tags = soup.find_all('li')
for li in li_tags:
    new_p_tag = soup.new_tag('p')
    li.wrap(new_p_tag)
print(soup.prettify())

# unwrap(): remove the first p tag inside each li, keeping its contents
li_tags = soup.find_all("li")
for li in li_tags:
    li.p.unwrap()
print(soup.prettify())
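Since wrap() also works on strings, a small round trip may help show how the two methods relate. The one-line markup, the b wrapper tag, and the built-in "html.parser" are assumptions made for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document
soup = BeautifulSoup('<p class="name">plants</p>', "html.parser")

# wrap() works on strings as well as tags and returns the new wrapper
wrapper = soup.p.string.wrap(soup.new_tag("b"))
after_wrap = str(soup)
print(after_wrap)

# unwrap() removes the wrapper tag but leaves its contents in place
soup.p.b.unwrap()
after_unwrap = str(soup)
print(after_unwrap)
```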