XML and Web-oriented data mining technology

Source: Internet
Author: User

Web-oriented data mining

There is a vast amount of data on the Web, and how to put it to use in complex applications has become a hot research topic in modern database technology. Data mining is the process of discovering hidden regularities (patterns) in large volumes of data so that applications can rely on higher-quality information. Its most important use is to make full use of useful data while discarding false and useless data. Compared with Web data, data in a traditional database is strongly structured, that is, fully structured, whereas the defining feature of Web data is that it is semi-structured. "Semi-structured" is defined relative to the fully structured data of a traditional database. Consequently, Web-oriented data mining is much more complex than data mining over a single data warehouse.
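To make the contrast concrete, here is a minimal Python sketch. The records and field names are hypothetical, and "fully structured" is approximated as "every record has exactly the same set of fields", which is a simplification of a relational schema (it ignores types and constraints).

```python
# Hypothetical records: field names and values are illustrative only.
relational_rows = [
    {"id": 1, "name": "Alice", "city": "Beijing"},
    {"id": 2, "name": "Bob", "city": "Shanghai"},
]

web_records = [
    {"name": "Alice", "city": "Beijing"},
    {"name": "Bob", "email": "bob@example.com"},  # no city, extra email
    {"title": "Product page", "price": "9.99", "tags": ["xml", "xsl"]},
]

def field_sets(records):
    """Return the distinct sets of field names used by the records."""
    return {frozenset(r) for r in records}

def is_fully_structured(records):
    """True when every record has exactly the same fields (a fixed schema)."""
    return len(field_sets(records)) <= 1

print(is_fully_structured(relational_rows))  # True
print(is_fully_structured(web_records))      # False
```

The relational rows fit one fixed set of columns; the Web records each bring their own fields, which is what makes them semi-structured.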

1. Heterogeneous database environment

From the perspective of database research, the information on Web sites can also be viewed as a database, only a larger and more complex one. Each site on the Web is a data source, and these sources are heterogeneous: each site organizes its information differently, so together they form a huge heterogeneous database environment. To use this data for data mining, we must first study how to integrate the heterogeneous data across sites; only when the sites' data is integrated and presented to users as a unified view is it possible to extract what is needed from these huge data resources. Second, we need to solve the problem of querying data on the Web, because if the required data cannot be retrieved effectively, there is no point in discussing its analysis, integration, and processing.
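One simple way to picture integration into a unified view is to write a per-source mapping function (a wrapper) that converts each site's records into a common schema, and then to query only the unified result. The sketch below assumes two hypothetical sites (`site_a`, `site_b`) and a hypothetical unified schema of title, authors, and price; real integration systems must also handle schema matching, duplicate detection, and conflicting values, all of which are omitted here.

```python
# Two hypothetical sites describing books with different field names and formats.
site_a = [{"bookTitle": "XML Basics", "writer": "Li", "cost": "35.00"}]
site_b = [{"title": "Data Mining", "authors": ["Wang", "Zhao"], "price_cents": 4200}]

# Per-source mapping functions into one unified schema: title, authors, price.
def from_site_a(rec):
    return {"title": rec["bookTitle"], "authors": [rec["writer"]], "price": float(rec["cost"])}

def from_site_b(rec):
    return {"title": rec["title"], "authors": list(rec["authors"]), "price": rec["price_cents"] / 100}

SOURCES = [(site_a, from_site_a), (site_b, from_site_b)]

def unified_view():
    """Integrate all sources into a single list of records with a common shape."""
    return [mapper(rec) for records, mapper in SOURCES for rec in records]

# A query can now be written once, against the unified view only.
cheap = [b["title"] for b in unified_view() if b["price"] < 40]
print(cheap)  # ['XML Basics']
```

Adding a third site means writing one more mapping function; queries against the unified view do not change.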

2. Semi-structured data structure

Data on the Web differs from data in a traditional database. A traditional database has a fixed data model, according to which specific data can be described. Web data is very complex and has no such model: each site's data is designed independently, and the data itself is self-describing and changes dynamically. Web data therefore does have some structure, but because that structure is described within the data itself (it is self-describing), it is not fully structured; such data is called semi-structured data. Being semi-structured is the most important feature of data on the Web.
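The self-describing nature of semi-structured data can be seen with Python's standard `xml.etree.ElementTree` parser. In this hypothetical document, each value carries its own label, and two records of the same kind have different structures (a repeated `phone`, an optional `email`) without any external schema declaring this.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the two <person> records do not share one structure.
DOC = """<people>
  <person><name>Alice</name><phone>123</phone><phone>456</phone></person>
  <person><name>Bob</name><email>bob@example.com</email></person>
</people>"""

def labels_per_record(xml_text, record_tag="person"):
    """Return, for each record, the labels it carries (taken from the data itself)."""
    root = ET.fromstring(xml_text)
    return [[child.tag for child in rec] for rec in root.findall(record_tag)]

for labels in labels_per_record(DOC):
    print(labels)
# ['name', 'phone', 'phone']
# ['name', 'email']
```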

3. Solving the semi-structured data source problem

Web data mining technology mainly has to solve the problems of querying and integrating semi-structured data sources and of modeling semi-structured data. To integrate and query heterogeneous data on the Web, there must be a model that describes that data clearly. Given the semi-structured nature of Web data, finding a suitable semi-structured data model is the key to the problem. In addition to defining such a model, we also need a semi-structured model extraction technique that automatically extracts the model from existing data. Web-oriented data mining must therefore be built on a semi-structured data model and on semi-structured model extraction technology.
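A very simplified form of model extraction is to scan existing documents and summarize the label paths that occur in them, noting which paths appear in every document and which are optional. The sketch below assumes small hypothetical `<book>` documents and counts, for each root-to-element path, how many documents contain it. It is one illustrative technique, not a complete semi-structured model extraction method: it ignores data types, element order, and recursive structures.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical documents with irregular structure.
DOCS = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><author>Y</author><author>Z</author><year>2001</year></book>",
    "<book><title>C</title></book>",
]

def label_paths(elem, prefix=""):
    """Yield every root-to-element label path, e.g. 'book/author'."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    yield path
    for child in elem:
        yield from label_paths(child, path)

def extract_schema(docs):
    """Return {path: number of documents that contain that path}."""
    counts = Counter()
    for text in docs:
        counts.update(set(label_paths(ET.fromstring(text))))  # count once per document
    return dict(counts)

schema = extract_schema(DOCS)
for path, n in sorted(schema.items()):
    status = "required" if n == len(DOCS) else "optional"
    print(f"{path}: {status} ({n}/{len(DOCS)})")
# book: required (3/3)
# book/author: optional (2/3)
# book/title: required (3/3)
# book/year: optional (1/3)
```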

XML and Web data mining technology

The new-generation WWW environment based on XML works directly with Web data: it is compatible with existing Web applications and also enables better information sharing and exchange on the Web. XML can be regarded as a semi-structured data model; the elements of an XML document can easily be mapped to attributes in a relational database, which makes accurate querying and model extraction possible.
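The correspondence between XML elements and relational attributes can be sketched by loading XML into an in-memory SQLite table with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The document, the `book` table, and its columns are hypothetical, and the mapping shown (one element per row, one XML attribute or child element per column) is only the simplest possible scheme, suited to regular data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document.
DOC = """<catalog>
  <book id="b1"><title>XML Basics</title><price>35</price></book>
  <book id="b2"><title>Data Mining</title><price>42</price></book>
</catalog>"""

def load_books(xml_text, conn):
    """Map each <book> element to a row: its attribute and child elements become columns."""
    conn.execute("CREATE TABLE book (id TEXT PRIMARY KEY, title TEXT, price REAL)")
    rows = [
        (b.get("id"), b.findtext("title"), float(b.findtext("price")))
        for b in ET.fromstring(xml_text).iter("book")
    ]
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load_books(DOC, conn)
result = conn.execute("SELECT title FROM book WHERE price > 40").fetchall()
print(result)  # [('Data Mining',)]
```

Once the data is in relational form, an exact SQL query replaces text searching over the raw document.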

1. The origin and development of XML

XML (Extensible Markup Language) is a simplified subset of SGML (Standard Generalized Markup Language) designed by the World Wide Web Consortium (W3C) specifically for Web applications. In general terms, XML is a meta-markup language that provides a format for describing structured data; more concretely, it is a language similar to HTML but designed to describe data. XML provides a platform-independent way for programs to share data and is a new standard language for describing information, which allows computer communication to extend the Internet's role from information transfer to other kinds of human activity.

XML consists of a set of rules that can be used to create new markup languages, and all of these newly created languages can be processed by a compact program called a parser, just as HTML gave early computer users a way to read documents on the Internet. In this sense XML provides a kind of Esperanto that anyone can read and write. XML addresses two Web problems that HTML cannot: the Internet has grown rapidly but access remains slow, and although a great deal of information is available, it is hard to find the part you need. XML adds structural and semantic information that enables computers and servers to process many forms of information immediately. As a result, XML's extensibility allows a large amount of information to be downloaded from a Web server while greatly reducing network traffic.
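The idea that one generic parser can process any newly invented markup vocabulary can be illustrated with Python's standard XML parser. The two vocabularies below (`recipe` and `forecast`) are made up for the example; the same code walks both, and the same parser rejects markup that is not well-formed.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, independently invented vocabularies.
RECIPE = "<recipe><ingredient qty='2'>egg</ingredient><step>Beat the eggs.</step></recipe>"
WEATHER = "<forecast city='Hangzhou'><temp unit='C'>21</temp><sky>cloudy</sky></forecast>"

def describe(xml_text):
    """Walk any well-formed vocabulary with one generic parser: (tag, attributes, text)."""
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib), (el.text or "").strip()) for el in root.iter()]

def is_well_formed(xml_text):
    """The same parser rejects markup that breaks XML's syntax rules."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

for doc in (RECIPE, WEATHER):
    print(describe(doc))
print(is_well_formed("<recipe><step>unclosed</recipe>"))  # False
```

No code in `describe` knows about recipes or forecasts; the tags themselves carry the meaning.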

The tags in XML are not predefined; users must define the tags they need, and XML is a self-describing language. XML uses DTDs (Document Type Definitions) to define the structure of this data, while XSL (Extensible Stylesheet Language) is the mechanism that describes how such documents are displayed; it is the stylesheet language for XML. XSL is considerably more powerful than the CSS (Cascading Style Sheets) used with HTML, and it consists of two parts: a language for transforming XML documents and a vocabulary for formatting them. XLL (Extensible Linking Language) is XML's linking language; it provides links in XML similar to HTML hyperlinks, but more powerful. With XLL, links can be multidirectional and can exist at the object level, not just at the page level. Because XML can mark up more information, it makes it easier for users to find the information they need. With XML, Web designers can not only create text and graphics but also build multi-level, interdependent systems, data trees, metadata, hyperlink structures, and style sheets defined according to the document type.
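The separation XSL provides, with content in the XML document and presentation rules kept apart, can be sketched in plain Python. This is an analogy, not XSLT: `RULES` is a hypothetical rule table that plays the role of a stylesheet's templates, mapping each element name to the HTML that wraps its content. A real application would express these rules in an XSLT stylesheet and run it with an XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content only: a hypothetical catalogue with no presentation information.
DOC = (
    "<catalog>"
    "<book><title>XML Basics</title><price>35</price></book>"
    "<book><title>Data Mining</title><price>42</price></book>"
    "</catalog>"
)

# Presentation rules kept apart from the data (plain Python, not XSLT).
RULES = {
    "catalog": lambda inner: f"<ul>{inner}</ul>",
    "book": lambda inner: f"<li>{inner}</li>",
    "title": lambda inner: f"<b>{inner}</b>",
    "price": lambda inner: f" ({inner})",
}

def render(elem, rules=RULES):
    """Recursively render an element: build its content, then apply its rule."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child, rules) + escape(child.tail or "")
    rule = rules.get(elem.tag, lambda s: s)  # unknown tags pass their content through
    return rule(inner)

print(render(ET.fromstring(DOC)))
# <ul><li><b>XML Basics</b> (35)</li><li><b>Data Mining</b> (42)</li></ul>
```

Swapping in a different rule table changes the output format without touching the XML data, which is the point of keeping the stylesheet separate.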
