Introduction to the 4Suite Server XML Repository in Python


Before continuing with this article, it is important to understand some of the technologies discussed in this column: Extensible Stylesheet Language Transformations (XSLT), XML Path Language (XPath), and Resource Description Framework (RDF). There are links to information about all of these technologies in the Resources section.
4Suite Server Overview

We will use the XML repository 4Suite Server (4SS), which is developed by the author of this article, as the foundation for the examples in this article. 4Suite Server is an XML repository with a number of features for managing XML data and metadata. These features make it well suited to the rapid development of Web services, whether or not you use Python.

The examples in this article were written for 4Suite Server 0.11 and require Python 1.5.2 or later and 4Suite 0.11. There are links to download all of these applications in the Resources section.
Online Software Repository

This is the second installment of the "Python Web Services Developer" column, and the first of a three-part series on building an online software repository. In this installment, we build the infrastructure. In subsequent columns, we will describe how to use various protocols, for example Simple Object Access Protocol (SOAP), HTTP, and WWW Distributed Authoring and Versioning (WebDAV), to search the indexed content and to add or retrieve content through agents.

The model for our online software repository service is loosely based on the RDF schema used by rpmfind.net. Rpmfind is a system that catalogs UNIX and Linux packages in the popular Red Hat Package Manager (RPM) format. It holds key metadata about each package (including author, version, and description) in the form of RDF, as detailed in Listing 1. For a short definition of RDF, read the previous installment of this column, or go to the Resources section to find a link to a basic introduction to this simple format.

The actual format of the XML is irrelevant. In fact, because the techniques described here apply to any type of XML content, the content does not need to describe software at all. You could use the same approach to describe a book catalog, employee information, or even a restaurant's wine list.

All of the code and data files used in this example can be downloaded from the links in the Resources section.
Document Definition

In the 4SS XML repository, a document definition lets you specify a mapping between XML content and RDF metadata. To do this, you define sets of three XPath expressions: a subject expression, a predicate expression, and an object expression. XPath expressions let you describe node relationships in a document and return the subset of the document's content that matches those relationships. Whenever an XML document is added to, modified in, or deleted from the repository, these XPath expressions are evaluated against the document. The resulting statements, also called triples, are automatically added to or removed from the RDF database (called the model). If you modify a document, its triples are changed to reflect the modification, and if you delete the document, its triples are removed from the RDF model. Document definitions can inherit from other document definitions, which lets you define complex mappings from XML content to RDF metadata.
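
The mapping described above can be sketched in plain Python. This is not 4Suite Server code: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The function names, the "s" namespace URI, the #Creator predicate, and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the paths below. The "s" URI is a made-up
# placeholder; the real namespace of the software schema is not given here.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "s": "http://example.org/software#",
}

# Hypothetical sample document standing in for a software description.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:s="http://example.org/software#">
  <s:software>
    <dc:title>Pong</dc:title>
    <dc:creator>Jane Doe</dc:creator>
  </s:software>
</rdf:RDF>
"""

# Each mapping pairs a predicate URI with a path that selects the object value.
# The subject is always the document's own URI, like '$uri' in a docdef.
MAPPINGS = [
    ("http://purl.org/dc/elements/1.1#Title", "./s:software/dc:title"),
    ("http://purl.org/dc/elements/1.1#Creator", "./s:software/dc:creator"),
]


def extract_triples(uri, xml_text, mappings=MAPPINGS):
    """Evaluate every mapping against the document and return a set of triples."""
    root = ET.fromstring(xml_text)
    triples = set()
    for predicate, path in mappings:
        for node in root.findall(path, NAMESPACES):
            triples.add((uri, predicate, (node.text or "").strip()))
    return triples


def sync_document(model, uri, xml_text=None):
    """Replace all triples about `uri`; passing None models a deleted document."""
    model.difference_update({t for t in model if t[0] == uri})
    if xml_text is not None:
        model.update(extract_triples(uri, xml_text))
    return model


model = set()
sync_document(model, "/softrepo/pong.xml", SAMPLE)  # document added
added = sorted(model)
sync_document(model, "/softrepo/pong.xml", None)    # document deleted
```

Treating an update as "drop every triple about this document, then re-evaluate the mappings" keeps the model consistent with the content without tracking which individual values changed.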

In our sample application, we will extend one of the default document definitions. The default document definition maps Dublin Core tags embedded in the XML content to Dublin Core statements. Dublin Core is a metadata initiative that defines a set of standard properties for general-purpose Web objects (for example, Creator, Title, and Date). The derived document definition adds one more statement for each document.

As shown below, a simple declaration sets the Creator metadata for a document to the result of evaluating an XPath expression:

RdfStatement(subject='$uri',
 predicate="http://purl.org/dc/elements/1.1#Title",
 object="/rdf:RDF/s:software/dc:creator")

(The above code is a single-line statement, wrapped here to fit the page.)

To add or update the system default data, run the 4SS script populate.py. It downloads useful data from ftp://ftp.fourthought.com to update your server. The downloaded data contains some commonly used items, such as the Dublin Core document definition and DocBook style sheets (DocBook is a popular XML format for technical documents).

When 4SS is installed, the populate script is installed automatically along with the demo applications. On UNIX-based machines, the script is typically stored in /usr/doc/4suiteserver-0.11 or /usr/local/doc/4suiteserver-0.11. On Windows machines, the directory is typically C:\Program Files\python or c:\Python20. Listing 2 shows the process of populating your 4SS-based application.
Listing 2: Populating a 4SS application

[Molson@penny example]$ python /usr/doc/4suiteserver-0.11/demo/populate.py
Downloading XML Documents
Downloading stylesheets
Downloading Docdefs
Adding XML document: 'null'
Adding stylesheet: 'docbook_html1.xslt'
Adding stylesheet: 'presentation_toc.xslt'
Adding stylesheet: 'presentation.xslt'
Adding stylesheet: 'docbook_text1.xslt'
Adding document definition: 'Dublin_core'
Adding document definition: 'Docbook1'

Next, we must create a document definition for the software listings. To add a definition, we use the command-line script 4ss deserialize docdef, passing the file name of the serialized document definition as its only argument. For example:

[Molson@penny example]$ 4ss deserialize docdef software.docdef

Content

We add new content to the system from the command line with 4ss create document. The example download contains two software listings, which are XML files named software1.rdf and software2.rdf. To add these files to the system, we run 4ss create document, specifying the document definition to use, the name of the file to add, and an alias for the resource within the system.

First, we create a container for the software repository on our server, then set permissions on the container that grant write access to the "uo" group and read access to everyone (because we want to serve Web pages from this container):

[Molson@penny example]$ 4ss create container /softrepo
[Molson@penny example]$ 4ss set acl --write=uo --world-read /softrepo

Then, we add our sample download file to the repository. Although the 4SS repository can store data in any format, it is highly optimized for storing XML data. When we add the tar archive to the repository, we specify the --imt option to set the file's Internet media type (IMT), which here is application/x-gzip. Among other uses, the IMT is what the HTTP server reports when the content is retrieved over the Web. Note that the IMT is sometimes called the "MIME type". See Listing 3 for the commands that add the content. In a more complex project, you might consider placing the binaries in a separate container.
Fetching content

Fetching content is as simple as adding content. However, we must first add a style sheet to the repository. Our sample files include a very simple style sheet. To add it, use 4ss create document and give it the alias software.xslt. For example:

[Molson@penny example]$ 4ss create document BASE_XSLT software.xslt softrepo/software.xslt

BASE_XSLT is a special document definition that tells 4SS to optimize this document as an XSLT style sheet.

After you add a document, you can connect to a 4SS HTTP server with your Web browser (both a plain Python server and Apache are supported) and go to http://localhost:8080/softrepo/pong.xml. This retrieves the Pong software description document from the repository. If you are using a browser that supports the text/xml IMT (such as Internet Explorer or Mozilla), you can view the XML that was added to the repository. To tell the HTTP listener to process the page with XSLT before returning it, specify the xslt URI query parameter: http://localhost/softrepo/pong.xml?xslt=software.xslt.
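
The two kinds of request described above can be sketched from Python with the standard library. The helper names repo_url and fetch and the default host localhost:8080 are assumptions for illustration, and fetch only works against a running 4SS HTTP listener.

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def repo_url(path, stylesheet=None, host="localhost:8080"):
    """Build a repository URL, optionally asking for an XSLT transform."""
    query = urlencode({"xslt": stylesheet}) if stylesheet else ""
    return urlunsplit(("http", host, path, query, ""))


def fetch(path, stylesheet=None):
    """Return (media type, body) for a resource; needs a running server."""
    with urlopen(repo_url(path, stylesheet)) as response:
        return response.headers.get_content_type(), response.read()


# Raw XML versus the same document rendered through a style sheet.
raw_url = repo_url("/softrepo/pong.xml")
styled_url = repo_url("/softrepo/pong.xml", "software.xslt")
```

Building the query string with urlencode rather than by string concatenation means a style sheet name containing reserved characters is still escaped correctly.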

Note that the link to the download package on the page also points to localhost. This link also goes through the HTTP listener, which retrieves the resource we added as pong-0.0.2.tgz. When the resource is returned to the browser, it carries the IMT that we specified when we added it to the system.
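
As a rough illustration of why the IMT is stored with each resource, here is a toy in-memory store that returns the recorded media type as the Content-Type of a response. The Repository class and its methods are hypothetical and are not part of 4Suite Server.

```python
class Repository:
    """Toy in-memory store that remembers each resource's Internet media type."""

    def __init__(self):
        self._resources = {}

    def add(self, path, data, imt="text/xml"):
        """Store raw bytes together with the media type given at add time."""
        self._resources[path] = (imt, data)

    def respond(self, path):
        """Return (status, headers, body) the way a simple HTTP listener might."""
        if path not in self._resources:
            return 404, {"Content-Type": "text/plain"}, b"not found"
        imt, data = self._resources[path]
        headers = {"Content-Type": imt, "Content-Length": str(len(data))}
        return 200, headers, data


repo = Repository()
repo.add("/softrepo/pong.xml", b"<rdf:RDF/>")
repo.add("/softrepo/pong-0.0.2.tgz", b"\x1f\x8b", imt="application/x-gzip")
status, headers, body = repo.respond("/softrepo/pong-0.0.2.tgz")
```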

Generating an Index Page

To generate an index page, we will use some 4SS extension functions that give XSLT access to the RDF model. There are other ways to generate an index page in 4SS. One is to write a custom handler for HTTP GET messages in Python, which would query the RDF model whenever index.html is requested. Another is to use the 4SS event system to update the index.html document whenever a document is added to or deleted from the system.

Because XSLT always needs to be applied to a source document, we will add a dummy source document to the system. The example download includes a source document called index.doc. Add this document to the repository using 4ss create document, as follows:

[Molson@penny example]$ 4ss create document index.doc BASE_XML softrepo/index.doc
[Molson@penny example]$ 4ss set acl --world-read softrepo/index.doc

We'll use the extension function rdf.complete in the style sheet to collect information about all the software in the system. The extension function invokes the complete method on the RDF model. The complete method searches the RDF model for statements that match a specified pattern. It takes up to three parameters: a subject, a predicate, and an optional object. Any of these parameters can be an empty string, which matches any value. The method returns a list of the statements that match all of the specified values. For example, if you pass the subject foo and the object bar, it returns every statement with subject foo, any predicate, and object bar.
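
The matching rule can be sketched in a few lines of Python. This is only an illustration of the pattern semantics described above, not the 4Suite implementation; the function signature and the sample statements are assumptions.

```python
def complete(statements, subject="", predicate="", obj=""):
    """Return the statements matching the pattern; "" matches any value."""
    pattern = (subject, predicate, obj)
    return [
        stmt
        for stmt in statements
        if all(want == "" or want == got for want, got in zip(pattern, stmt))
    ]


# Hypothetical (subject, predicate, object) statements.
STATEMENTS = [
    ("foo", "likes", "bar"),
    ("foo", "knows", "bar"),
    ("foo", "likes", "baz"),
    ("qux", "likes", "bar"),
]

# Subject foo, any predicate, object bar.
matches = complete(STATEMENTS, subject="foo", obj="bar")
```

Here matches holds the two statements about foo whose object is bar, whatever their predicate, while calling complete(STATEMENTS) with no pattern returns every statement.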

4SS automatically creates an RDF statement that links each document to its document definition. The subject of these statements is the URI of the document, the predicate is http://schemas.4suite.org/4ss#metaxml.docdef, and the object is the name of the document definition. Knowing this, we can get a list of the software documents in our system with a single complete call that specifies this predicate and, as the object, the software document definition.

The style sheet we use to generate the index is called index.xslt. The template that matches the root of the source document first invokes rdf.complete. This function performs a complete operation on the RDF model for all statements that have http://schemas.4suite.org/4ss#metaxml.docdef as the predicate and software as the object. The result of the rdf.complete call is a node set of Statement elements. Each Statement element has three child elements: Subject, Predicate, and Object. As shown in Listing 4, we call xsl:apply-templates on the result of the function and display each software item in the template that matches Statement.
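
For readers who find the logic easier to follow outside XSLT, here is a rough Python equivalent of what the index style sheet does: select the documents whose document definition is software, then render one list item per document. The sample model, the tetris.xml entry, and the function names are hypothetical; only the docdef predicate URI and the definition names come from this article.

```python
from html import escape

DOCDEF_PREDICATE = "http://schemas.4suite.org/4ss#metaxml.docdef"

# Hypothetical model contents as (subject, predicate, object) statements.
MODEL = [
    ("/softrepo/pong.xml", DOCDEF_PREDICATE, "software"),
    ("/softrepo/index.doc", DOCDEF_PREDICATE, "BASE_XML"),
    ("/softrepo/tetris.xml", DOCDEF_PREDICATE, "software"),
    ("/softrepo/pong.xml", "http://purl.org/dc/elements/1.1#Title", "Pong"),
]


def documents_using(model, docdef):
    """Return the URIs of documents linked to the given document definition."""
    return [s for (s, p, o) in model if p == DOCDEF_PREDICATE and o == docdef]


def render_index(model, docdef="software", stylesheet="software.xslt"):
    """Render an HTML list that links each matching document via a style sheet."""
    items = [
        '<li><a href="{0}?xslt={1}">{0}</a></li>'.format(
            escape(uri, quote=True), escape(stylesheet, quote=True)
        )
        for uri in documents_using(model, docdef)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


index_html = render_index(MODEL)
```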

To view the generated index page, point your browser to http://localhost:8080/softrepo/index.doc?xslt=index.xslt.

Nothing about this index is fixed, however, and it is easy to see how this simple page could be extended to display the software titles in any style. We can change how each item is displayed by modifying the style sheet, and we can make more data available to the style sheet by adding more mappings to the document definition.

Conclusion

We still haven't written any Python code, but we have taken a first look at some of the features of 4Suite Server. In next month's column, we will expand on this example to give the software repository the ability to manage content and to search all of the metadata that is generated.
