My first task recently was to process a batch of patent files in XML format and extract the useful information from them.
Company policy does not allow the files to be downloaded for local processing, so the code had to be written on the remote server provided by the other party.
The elements in the XML carry a namespace prefix, in the form xxx:yyyy, and no matter what I tried, ElementTree from xml.etree could not find them. I finally found the explanation on Stack Overflow:
ElementTree is not too smart about namespaces. You need to give the .find(), findall() and iterfind() methods an explicit namespace dictionary. This isn't documented very well:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'}  # add more as needed
root.findall('owl:Class', namespaces)
Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like. The API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead.
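To make the prefix lookup concrete, here is a minimal sketch using only the standard library. The sample document is made up for illustration (the real patent files are not shown in this post), and the prefix "o" is chosen deliberately to show that the prefix in the dictionary does not have to match the one in the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, standing in for a real namespaced XML file.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Patent"/>
  <owl:Class rdf:about="#Claim"/>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)

# Only the dictionary keys matter for the lookup; "o" is not the prefix
# used in the file, but it maps to the same namespace URL.
namespaces = {
    "o": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Search with a prefix plus the namespace dictionary.
classes = root.findall("o:Class", namespaces)

# The same search written with the fully expanded tag name.
expanded = root.findall("{http://www.w3.org/2002/07/owl#}Class")

# Attribute names are namespaced as well, so the key needs the expanded form.
about_key = "{%s}about" % namespaces["rdf"]
names = [c.get(about_key) for c in classes]
print(names)
```

Both searches return the same two elements, which is why an unprefixed search for "Class" comes back empty on a file like this.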
If you can switch to the lxml library things are better; that library supports the same ElementTree API, but collects namespaces for you in a .nsmap attribute on elements.
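Where lxml cannot be installed, a similar prefix map can probably be built with the standard library alone. The sketch below is not lxml's .nsmap. It is a stand-in that collects the "start-ns" events from ElementTree's iterparse, and both the helper name collect_namespaces and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, standing in for a real namespaced XML file.
SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Patent"/>
  <owl:Class rdf:about="#Claim"/>
</rdf:RDF>"""


def collect_namespaces(xml_bytes):
    """Return a {prefix: url} dict of the namespaces declared in the document."""
    found = {}
    # Each "start-ns" event carries a (prefix, url) pair for one declaration.
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, url) in events:
        found[prefix] = url
    return found


namespaces = collect_namespaces(SAMPLE)
root = ET.fromstring(SAMPLE)

# The collected map can be passed straight to find(), findall() and iterfind().
class_count = len(root.findall("owl:Class", namespaces))
print(class_count)
```

If a prefix is declared more than once with different URLs, the last declaration wins in this sketch, so it suits files that declare their namespaces once on the root element.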
So I decided to install lxml.
Python 2.7 was already installed on the server, but nothing else had been set up, so I first installed pip by hand.
Running pip install lxml produced a long list of errors, including this message:
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
It turns out that before installing lxml you also need the development packages of the libraries it depends on, libxml2 and libxslt:
yum install libxml2-devel
yum install libxslt-devel
After installing the dependencies, pip still failed.
This time I read the error output more carefully and found another line further up:
unable to execute gcc: No such file or directory
Well, gcc was not even installed, so no wonder nothing could be compiled.
yum install gcc
After installing gcc I ran pip install again and, incredibly, it still failed:
src/lxml/lxml.etree.c:84:20: fatal error: Python.h: No such file or directory
After some more searching, it turned out that python-devel, the package containing Python's header files and static libraries, was needed as well.
yum install python-devel
Once that was installed, I ran pip install lxml again and finally saw the long-awaited message:
Successfully installed lxml
With that, the installation of lxml on Linux is complete.
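Each failure above pointed to one missing piece: the libxml2 and libxslt development files, gcc, and Python.h. A rough sketch of a check that could be run before pip install lxml follows. The function names find_on_path and check_build_prerequisites are made up for illustration. Looking for xml2-config and xslt-config assumes the devel packages ship those helper scripts, which may not hold on every distribution or version.

```python
import os
import sysconfig


def find_on_path(program):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_build_prerequisites():
    """Report which pieces needed to compile lxml appear to be present."""
    # Directory where this interpreter expects its C header files.
    include_dir = sysconfig.get_paths()["include"]
    return {
        "gcc": find_on_path("gcc") is not None,
        # Assumption: these scripts are installed by libxml2-devel and libxslt-devel.
        "xml2-config": find_on_path("xml2-config") is not None,
        "xslt-config": find_on_path("xslt-config") is not None,
        "Python.h": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }


report = check_build_prerequisites()
for name in sorted(report):
    print("%-12s %s" % (name, "found" if report[name] else "MISSING"))
```

Anything reported as MISSING corresponds to one of the yum install steps above. A clean report does not guarantee that the build succeeds, it only rules out the three problems hit in this post.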
Installing lxml on a Linux server