Installing lxml on a Linux server

Source: Internet
Author: User

My first task recently was to process a batch of patent files in XML format and extract the useful information from them.

Company policy does not allow the files to be downloaded for local processing, so the code had to be written on the remote server provided by the other party.

The elements in the XML use a prefixed format such as xxx:yyyy, and no matter what I tried, ElementTree from xml.etree could not find them. I finally found an explanation on Stack Overflow:

ElementTree is not too smart about namespaces. You need to give the .find(), findall() and iterfind() methods an explicit namespace dictionary. This is not documented very well:

  namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'}  # add more as needed
  root.findall('owl:Class', namespaces)

Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like. The API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead.
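The lookup described above can be tried with a small, self-contained sketch. The sample document and the alternative prefix "o" are made up for illustration; only the owl namespace URI comes from the quoted answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the owl namespace URI is taken from the text.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Patent"/>
  <owl:Class rdf:about="#Claim"/>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)

# Search using the same prefix as the document.
namespaces = {"owl": "http://www.w3.org/2002/07/owl#"}
by_prefix = root.findall("owl:Class", namespaces)

# Any prefix works, as long as the dictionary maps it to the same URI.
by_other_prefix = root.findall("o:Class", {"o": "http://www.w3.org/2002/07/owl#"})

# The fully expanded {uri}tag form needs no dictionary at all.
by_uri = root.findall("{http://www.w3.org/2002/07/owl#}Class")

for element in by_prefix:
    print(element.tag, element.attrib)
```

All three searches return the same two elements, because ElementTree stores every tag in the expanded {uri}name form internally.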

If you can switch to the lxml library things are better; that library supports the same ElementTree API, but collects namespaces for you in a .nsmap attribute on elements.
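As a rough sketch of what that buys you, the helper below builds the prefix dictionary automatically instead of typing it by hand. The function name collect_namespaces and the sample document are hypothetical. It uses lxml's .nsmap when lxml can be imported, and otherwise falls back to the standard library's 'start-ns' parse events, which is a substitute technique and not something lxml itself does.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, with both namespaces declared on the root element.
SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Patent"/>
</rdf:RDF>"""


def collect_namespaces(xml_bytes):
    """Return a {prefix: uri} dictionary for the given XML document."""
    try:
        from lxml import etree  # third-party; may not be installed yet
    except ImportError:
        # Standard-library fallback: every namespace declaration in the
        # document is reported as a ('start-ns', (prefix, uri)) event.
        events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
        return dict(ns for _event, ns in events)
    # lxml exposes the namespaces in scope on an element as .nsmap;
    # here only the root element's mapping is used.
    return dict(etree.fromstring(xml_bytes).nsmap)


namespaces = collect_namespaces(SAMPLE)
root = ET.fromstring(SAMPLE)
classes = root.findall("owl:Class", namespaces)
```

The two branches agree here because the sample declares its namespaces on the root. For documents that declare prefixes deeper in the tree, or that use a default namespace, the results can differ.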

So I decided to install lxml.

Python 2.7 was already installed on the server, but nothing else was, so I installed pip manually.

Running pip install lxml produced a lot of errors, including this message:

Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?

Before lxml can be installed, the development packages of its dependencies libxml2 and libxslt need to be installed as well:

yum install libxml2-devel

yum install libxslt-devel

After installing the dependencies, pip still failed.

This time I read the output more carefully and found another line earlier in the error messages:

unable to execute gcc: No such file or directory

Well, not even gcc was installed; no wonder nothing could be compiled.

yum install gcc

After installing it I ran pip install again and, incredibly, it still failed!

src/lxml/lxml.etree.c:84:20: fatal error: Python.h: No such file or directory

After more searching, it turned out that python-devel is needed too; this package contains Python's header files and static libraries.

yum install python-devel

Once that was installed, I ran pip install lxml again.

And finally the long-awaited message appeared:

Successfully installed lxml

At this point, the installation of lxml on Linux is complete.
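The three failures above (missing libxml2/libxslt development files, missing gcc, missing Python.h) can be checked before running pip. The following is only a sketch: the function name check_build_prerequisites is made up, it assumes that the -devel packages put xml2-config and xslt-config on the PATH, and it is written for Python 3, since shutil.which does not exist on Python 2.7.

```python
import os
import shutil
import sysconfig


def check_build_prerequisites():
    """Return a {name: found} dictionary for the tools needed to compile lxml."""
    # Python.h is expected in the interpreter's include directory (python-devel).
    include_dir = sysconfig.get_paths()["include"]
    return {
        "gcc": shutil.which("gcc") is not None,
        # Assumption: libxml2-devel provides the xml2-config helper script.
        "xml2-config": shutil.which("xml2-config") is not None,
        # Assumption: libxslt-devel provides the xslt-config helper script.
        "xslt-config": shutil.which("xslt-config") is not None,
        "Python.h": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }


results = check_build_prerequisites()
for name, found in sorted(results.items()):
    print("%-12s %s" % (name, "found" if found else "MISSING"))
```

Anything reported as MISSING corresponds to one of the yum install steps above. A clean report does not guarantee that the build succeeds; it only rules out the errors seen in this walkthrough.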
