While parsing XML with Python, I ran into the following error:
ParseError: not well-formed (invalid token): line 1, column 17
The cause is the following text in the XML file:
<news>
<title>January 2, 2016 if (window.yzq_d==null) window.yzq_d=new Object();
window.yzq_d['vytlawrhzd8-']='&u=13jt854h2%2fn%3dvytlawrhzd8-%2fc%3d300908984.301767463.303376606.311556217%2fd%3dult%2fb%3d302054045';</title>
</news>
The "&u" breaks the XML: a bare "&" must begin an entity reference such as "&amp;", so the parser rejects the document as not well-formed.
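The failure can be reproduced in isolation. The following is a minimal sketch, assuming shortened stand-in strings (`bad` and `good` are illustrative and not taken from the real file) and a hypothetical helper `try_parse`. Note that minidom reports the problem as `xml.parsers.expat.ExpatError`.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Hypothetical, shortened stand-ins for the real file contents
bad = "<news><title>x=' &u=13jt854h2 ';</title></news>"
good = bad.replace("&u", "&amp;u")


def try_parse(xml_text):
    """Return the title text, or None if the XML is not well-formed."""
    try:
        dom = xml.dom.minidom.parseString(xml_text)
    except ExpatError as err:
        print("parse failed:", err)
        return None
    title = dom.getElementsByTagName("title")[0]
    # Join all text nodes in case the parser delivered the text in pieces
    return "".join(node.data for node in title.childNodes)


print(try_parse(bad))   # the bare "&" is rejected
print(try_parse(good))  # the escaped form parses, and "&amp;" is read back as "&"
```

Once the ampersand is escaped, the parsed title text contains a plain "&" again, so nothing is lost by escaping.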
The original code was written like this:
import xml.dom.minidom

domtree = xml.dom.minidom.parse(file_path)
collection = domtree.documentElement
categorynamestr = collection.getAttribute("name")
To escape the special character before parsing, the code needs to be changed as follows:
import codecs
import xml.dom.minidom

def replace_special_character(content):
    # Escape the bare ampersand so that "&u" becomes valid XML
    content = content.replace("&u", "&amp;u")
    return content

# Read the whole file as text instead of letting minidom open it
datasource = codecs.open(readfileaddress, 'r', 'utf-8')
xml_str = ""
for line in datasource:
    xml_str += line
datasource.close()
xml_str = replace_special_character(xml_str)
domtree = xml.dom.minidom.parseString(xml_str)
collection = domtree.documentElement
categorynamestr = collection.getAttribute("name")
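The replacement above only handles the literal "&u". A somewhat more general variant, offered as a sketch and not a definitive fix, is to escape every "&" that does not already begin an entity or character reference. The pattern `BARE_AMP`, the function `escape_bare_ampersands`, and the `sample` string are assumptions made for illustration.

```python
import re
import xml.dom.minidom

# Match "&" unless it already starts a reference such as &amp; &#38; or &#x26;
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(content):
    """Replace every bare "&" with "&amp;", leaving existing references alone."""
    return BARE_AMP.sub("&amp;", content)


# Hypothetical sample mixing a bare "&" with references that are already escaped
sample = '<news name="demo"><title>a &u=1 &amp; b &#38; c</title></news>'
domtree = xml.dom.minidom.parseString(escape_bare_ampersands(sample))
collection = domtree.documentElement
print(collection.getAttribute("name"))
```

This still will not repair other problems, such as a stray "<" in the text or a named entity like "&nbsp;" that XML does not define, so it is best treated as a workaround for input that cannot be fixed at its source.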
If there is a better way, suggestions are welcome.