Introduction to the 4Suite Server XML repository in Python

Before you continue reading this article, you should be familiar with the technologies discussed in this column: Extensible Stylesheet Language Transformations (XSLT), the XML Path Language (XPath), and the Resource Description Framework (RDF). The Resources section links to information about all of these technologies.
4Suite Server Overview

We will use the XML repository 4Suite Server (4SS), which was developed by the authors of this article, as the application foundation for the example in this article. 4Suite Server is an XML repository with many features for XML data and metadata management, which makes it well suited to rapid development of Web services, whether or not Python is used.

The examples in this article were written for 4Suite Server 0.11 and require Python 1.5.2 or later and 4Suite 0.11. The Resources section links to downloads for all of these applications.
Online software repository

This article is the second installment of the "Python Web Services Developer" column, but it is the first of a three-part series on building an online software repository. In this installment, we build our infrastructure. In subsequent columns, we will describe how to use various protocols (for example, Simple Object Access Protocol (SOAP), HTTP, and WWW Distributed Authoring and Versioning (WebDAV)) to search indexed content and to perform agent-based content addition and retrieval.

Our online software repository service is loosely modeled on the RDF schema used by rpmfind.net. Rpmfind is a system that catalogs UNIX and Linux packages in the popular Red Hat Package Manager (RPM) format. It holds key metadata about each package (including author, version, and description) in RDF form; see Listing 1. For a short definition of RDF, read the previous installment of this column, or see the Resources section for a link to a basic description of this simple format.

The actual format of the XML is not important. Because the techniques described apply to any type of XML content, the content does not need to describe software at all. You could use the same techniques to describe a book catalog, employee information, or even a restaurant's wine list.

All the code and data files used in this example can be downloaded from the links in the Resources section.
Document definition

In the 4SS XML repository, a document definition lets you specify a mapping between XML content and RDF metadata. To do this, you define a set of three XPath expressions: a subject expression, a predicate expression, and an object expression. XPath expressions let you describe relationships among the nodes of a document and return the subset of its content that matches those relationships. Whenever an XML document is added to, modified in, or deleted from the repository, these XPath expressions are evaluated against the document. The resulting statements, also known as triples, are automatically added to or removed from the RDF database (called the model). If you modify the document, its triples are updated to reflect the changes; if you delete the document, its triples are removed from the RDF model. Document definitions can inherit from other document definitions, which lets you define complex mappings from XML content to RDF metadata.
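
The mapping described above can be sketched in plain Python. This is a minimal illustration and not the 4SS implementation: it uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath), it simplifies the rule to a fixed subject (the document URI) and a literal predicate, and the names Mapping, triples_for, and update_model, the sample document, and the "s" namespace URI are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Hypothetical sample document; the "s" namespace URI is made up for this sketch.
SAMPLE = """\
<s:software xmlns:s="http://example.org/software#"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:Title>Pong</dc:Title>
  <dc:Creator>Example Author</dc:Creator>
</s:software>
"""

NAMESPACES = {
    "s": "http://example.org/software#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


class Mapping(NamedTuple):
    """One docdef-style rule: a literal predicate plus a path selecting objects."""
    predicate: str
    object_path: str


def triples_for(uri, xml_text, mappings):
    """Evaluate each mapping against the document and return (s, p, o) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for mapping in mappings:
        for node in root.findall(mapping.object_path, NAMESPACES):
            triples.append((uri, mapping.predicate, (node.text or "").strip()))
    return triples


def update_model(model, uri, new_triples):
    """Replace every triple whose subject is `uri` with the freshly computed ones."""
    kept = {triple for triple in model if triple[0] != uri}
    return kept | set(new_triples)


mappings = [Mapping("http://purl.org/dc/elements/1.1#Creator", "dc:Creator")]
triples = triples_for("/softrepo/pong.xml", SAMPLE, mappings)
model = update_model(set(), "/softrepo/pong.xml", triples)
print(triples)
```

In this sketch, modifying a document means calling update_model again with the recomputed triples, and deleting a document means calling it with an empty list.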

In our sample application, we extend one of the default document definitions. The default document definition maps Dublin Core tags embedded in the XML content to Dublin Core statements. Dublin Core is a metadata initiative that defines a set of standard properties for common Web-based objects (for example, Creator, Title, and Date). The derived document definition adds another statement to each document.

As shown below, a simple declaration sets the Creator metadata of the document to the result of evaluating an XPath expression:

RdfStatement(subject='$uri', predicate="http://purl.org/dc/elements/1.1#Title", object="/rdf:RDF/s:software/dc:Creator")

(The code above is a single statement; it may be wrapped to fit this format.)

To add or update the system default data, run the 4SS populate.py script. It downloads useful data from ftp://ftp.fourthought.com to update your server. The downloaded data includes some commonly used items, such as the Dublin Core document definition and DocBook style sheets (DocBook is a popular XML format for technical documents).

When you install 4SS, the populate script is installed automatically along with the demo applications. On Unix-based machines, it is typically stored in /usr/doc/4suiteserver-0.11 or /usr/local/doc/4suiteserver-0.11. On Windows machines, the directory is typically C:\Program Files\Python or C:\Python20. Listing 2 shows the process of populating your 4SS-based application.
Listing 2: Populating 4SS applications

[Molson@penny example]$ python /usr/doc/4suiteserver-0.11/demo/populate.py
Downloading XML Documents
Downloading stylesheets
Downloading Docdefs
Adding XML document: 'null'
Adding stylesheet: 'Docbook_html1.xslt'
Adding stylesheet: 'Presentation_toc.xslt'
Adding stylesheet: 'Presentation.xslt'
Adding stylesheet: 'Docbook_text1.xslt'
Adding document definition: 'Dublin_core'
Adding document definition: 'Docbook1'

Next, we must create a document definition for the software listings. To add the definition, we use the command-line script 4ss deserialize docdef, which takes the file name of the serialized document definition as its only argument. For example:

[Molson@penny example]$ 4ss deserialize docdef software.docdef

Content

We add new content to the system from the command line with 4ss create document. The example download contains two software listings, which are XML files named software1.rdf and software2.rdf. To add these files to the system, we run 4ss create document, specifying the document definition to use, the name of the file to add, and a list of aliases to give the resource within the system.

First, we create a container for the software repository on our server and set the container's permissions to grant write access to the "uo" group and read access to everyone (because we want to serve Web pages from this directory):

[Molson@penny example]$ 4ss create container /softrepo
[Molson@penny example]$ 4ss set acl --write=uo --world-read /softrepo

We then add our sample download file to the repository. Although the 4SS repository can store data in any format, it is highly optimized for storing XML data. When we add the .tar file to the repository, we specify the --imt option to set the file's Internet media type (IMT), here application/x-gzip. Among other uses, the HTTP server relies on this IMT when the content is retrieved over the Web. Note that an IMT is sometimes also called a "MIME type". See Listing 3 for the commands that add the content. In a more complex project, you might consider putting binary files in a separate container.
Fetch content

Fetching content is as simple as adding it. However, we must first add a style sheet to the repository. Our sample files include a very simple style sheet. To add it, use 4ss create document and give it the alias software.xslt. For example:

[Molson@penny example]$ 4ss create document BASE_XSLT software.xslt softrepo/software.xslt

BASE_XSLT is a special document definition that tells 4SS to optimize this document as an XSLT style sheet.

After adding the document, you can point your Web browser at the 4SS HTTP server (both the plain Python server and Apache are supported) and go to http://localhost:8080/softrepo/pong.xml. This retrieves the Pong software description document from the repository. If your browser supports the IMT text/xml, as Internet Explorer and Mozilla do, you can view the XML that was added to the repository. To tell the HTTP listener to process the page with XSLT before returning it, specify the xslt URI query parameter: http://localhost:8080/softrepo/pong.xml?xslt=software.xslt.
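
A client script can build such a URL instead of typing it by hand. The following is a small sketch using only Python's standard urllib.parse module; the helper name transformed_url is hypothetical, and the host, port, and document path are simply the ones used in this example.

```python
from urllib.parse import urlencode


def transformed_url(document_url, stylesheet):
    """Return document_url with an `xslt` query parameter naming the style sheet."""
    return document_url + "?" + urlencode({"xslt": stylesheet})


url = transformed_url("http://localhost:8080/softrepo/pong.xml", "software.xslt")
print(url)  # http://localhost:8080/softrepo/pong.xml?xslt=software.xslt
```

Using urlencode rather than string concatenation keeps the parameter value correctly escaped if the style sheet alias ever contains characters such as spaces or slashes.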

Note that the link to the download package on the page also points to localhost. This link also goes through the HTTP listener, which retrieves the resource we added for pong-0.0.2.tgz. When the resource is returned to the browser, it carries the IMT that we specified when we added it to the system.

Build index page

To generate the index page, we use some of the 4SS extension functions that give XSLT access to the RDF model. There are other ways to generate an index page in 4SS. One is to write a custom handler for HTTP GET messages in Python; the handler would query the RDF model whenever index.html is requested. Another is to use the 4SS event system to update the index.html document whenever a document is added to or deleted from the system.

Because XSLT must always be applied to a source document, we add a dummy document to the system. The example download includes a source document called index.doc. Add it to the repository with 4ss create document, as follows:

[Molson@penny example]$ 4ss create document BASE_XML index.doc softrepo/index.doc
[Molson@penny example]$ 4ss set acl --world-read softrepo/index.doc

We use the extension function rdf.complete in the style sheet to gather information about all the software in the system. The extension function invokes the complete method on the RDF model, which searches the model for statements that match a specified pattern. The method takes up to three parameters: a subject, a predicate, and an optional object. Any of them can be an empty string, which matches anything. It returns a list of the statements that match all of the specified values. For example, if you pass the subject foo, an empty predicate, and the object bar, it returns every statement with subject foo, any predicate, and object bar.
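
The matching rule can be illustrated with a few lines of Python. This is a sketch of the semantics described above over an in-memory list of tuples, not the 4SS RDF model API; the function signature and the sample statements are assumptions made for the illustration.

```python
def complete(model, subject="", predicate="", obj=""):
    """Return statements matching every non-empty argument; "" matches anything."""
    pattern = (subject, predicate, obj)
    return [
        statement
        for statement in model
        if all(want == "" or want == got for want, got in zip(pattern, statement))
    ]


# Hypothetical (subject, predicate, object) statements.
model = [
    ("foo", "likes", "bar"),
    ("foo", "knows", "bar"),
    ("foo", "likes", "baz"),
    ("qux", "likes", "bar"),
]

matches = complete(model, subject="foo", obj="bar")
print(matches)  # [('foo', 'likes', 'bar'), ('foo', 'knows', 'bar')]
```

Leaving all three arguments empty returns every statement in the model.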

4SS automatically creates an RDF statement that links each document to its document definition. The subject of such a statement is the URI of the document, the predicate is http://schemas.4suite.org/4ss#metaxml.docdef, and the object is the document definition name. Knowing this, we can use a single complete call that specifies the predicate and the object (the software document definition name) to get a list of the software documents in our system.

The style sheet we use to generate the index is called index.xslt. The template that matches the root of the source document first calls rdf.complete. This call performs a complete operation on the RDF model for all statements that have http://schemas.4suite.org/4ss#metaxml.docdef as the predicate and software as the object. The result of the rdf.complete call is a node set of Statement elements. Each Statement element has three child elements: Subject, Predicate, and Object. As shown in Listing 4, we call xsl:apply-templates on the result of the function and display each software item in a template that matches Statement.
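
The shape of that result can be mimicked in Python to make the data flow concrete. The sketch below uses the standard xml.etree.ElementTree module rather than XSLT or any 4SS API; the document URIs are hypothetical, and the helper names to_statement_elements and index_links are invented for this illustration.

```python
import xml.etree.ElementTree as ET

DOCDEF_PREDICATE = "http://schemas.4suite.org/4ss#metaxml.docdef"

# Hypothetical statements shaped like the (subject, predicate, object) results
# described above; both document URIs are made up for this sketch.
statements = [
    ("/softrepo/pong.xml", DOCDEF_PREDICATE, "software"),
    ("/softrepo/other.xml", DOCDEF_PREDICATE, "software"),
]


def to_statement_elements(triples):
    """Build one Statement element, with three children, per triple."""
    elements = []
    for subject, predicate, obj in triples:
        statement = ET.Element("Statement")
        ET.SubElement(statement, "Subject").text = subject
        ET.SubElement(statement, "Predicate").text = predicate
        ET.SubElement(statement, "Object").text = obj
        elements.append(statement)
    return elements


def index_links(statement_elements):
    """Mimic the per-Statement template: one link target per Subject."""
    return [statement.findtext("Subject") for statement in statement_elements]


links = index_links(to_statement_elements(statements))
print(links)
```

Each Subject value is the URI of a software document, which is what the index page needs in order to link to it.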

To view the generated index page, go to http://localhost:8080/softrepo/index.doc?xslt=index.xslt in your browser.

This index is not set in stone, and it is easy to see how to extend this simple page to present the software titles in any style. We can change how each item is displayed by editing the style sheet, and we can adjust the data available to the style sheet by adding more mappings to the document definition.

Conclusion

We still haven't written any Python code, but we did get a rough look at some of the features of 4Suite Server. In next month's column, we will expand on this example to give the software repository the ability to manage content and to search all of the generated metadata.
