1 Introduction
Network information retrieval has become our main means of obtaining information. According to CNNIC statistics [1], 42.3% of Chinese users currently name obtaining information as their most important purpose for going online, which puts it at the top of the list. 98.7% of users say that they obtain information through the Internet, and 71.9% of them find relevant websites through search engines. However, network information retrieval faces two key problems that need to be resolved:
(1) the search results have low relevance and contain too much redundant information;
(2) search engines cannot answer common-sense questions, and their level of intelligence is low.
These problems arise because current retrieval technology relies mainly on encoding techniques that describe information through classification schemes, and answers the keywords submitted by users through full-text retrieval based on string matching. Because an encoded description can reflect only part of the semantics, semantic matching cannot be guaranteed. The retrieval process compares the user's query keywords with each word in the full text, and does not consider the semantic match between the query and the document. Drawing on ontology, this paper proposes an ontology-based semantic search engine model. The model can perform knowledge-based reasoning on the user's query keywords or question, so as to improve the relevance of the search results and achieve a certain degree of semantic search.
2 Ontology
2.1 Concept of ontology
Ontology was originally a branch of metaphysics. Ontologies have since been widely studied and applied in the AI field, but there is not yet a unified definition. The most widely used definition is [2]:
Definition 1: an ontology is a formal, explicit specification of a shared conceptualization. This definition has several key points:
★Conceptualization: a model obtained by abstracting the relevant concepts of some phenomenon in the objective world. Its meaning is independent of any specific environmental state;
★Explicit: the concepts used and the constraints on their use are explicitly defined;
★Formal: the ontology is machine-readable;
★Shared: the ontology captures shared knowledge, that is, a set of concepts accepted in the relevant field. It belongs to a group rather than to an individual.
Put simply, an ontology provides the basic terms and relations that make up the vocabulary of a field, together with the rules for combining these terms and relations to define extensions of the vocabulary. Its goal is to capture the knowledge of the field, provide a common understanding of that knowledge, determine the commonly accepted terms of the field, and give clear definitions of these terms and of the relations between them.
Definition 2: an ontology is a theory about the terms or concepts used to build an artificial intelligence system. Under this definition, an ontology is essentially a representation vocabulary, which can be specialized to a particular field. For example, an ontology for the electronic equipment field contains terms that describe basic concepts, such as transistor, operational amplifier, and voltage; it also contains the relations between these basic terms: an operational amplifier is a type of electronic equipment, and a transistor is a component of an operational amplifier. In general, identifying such terms and the underlying concepts requires careful analysis of the objects and relations in the field. The ontology construction method implied by this definition has some similarities with object-oriented methods.
Definition 3: an ontology is a body of knowledge about a certain field. Under this definition, the ontology describes the knowledge of a domain. It is not just a vocabulary, but an entire upper-level knowledge base (including the terms used to describe that knowledge base).
To sum up, an ontology is a semantic basis for communication (such as dialogue, interoperation, and sharing) between different subjects (people, agents, machines, etc.) in a certain field (which may be specialized or broad). It provides an explicitly defined vocabulary that describes the relations between concepts and serves as a consensus among its users.
2.2 Role of ontology
The role of an ontology can be summarized as communication, interoperability, and system engineering.
(1) Communication: the ontology provides a common vocabulary for communication between people and between organizations, that is, a basis for communication.
(2) Interoperability: the ontology establishes a mechanism for translation and mapping between different modeling methods, paradigms, languages, and software tools, in order to integrate different systems.
(3) System engineering: ontology analysis provides the following benefits for system engineering:
★Reusability: the ontology is the basis for a formal description of the important entities, attributes, and processes in the domain and of the relations between them. This formal description can become a reusable, shared component of a software system.
★Knowledge acquisition: when building a knowledge-based system, using an existing ontology as the starting point and foundation for knowledge acquisition improves its speed and reliability.
★Reliability: because the ontology is described formally, automatic consistency checking becomes possible, which improves the reliability of the software system.
★Specification: ontology analysis helps determine the requirements and specification of a system (such as a knowledge base).
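As one illustration of the reliability point, a formally stated concept hierarchy can be checked mechanically. The sketch below is only an assumption about what such a check might look like: it represents a hypothetical is-a hierarchy as a dictionary from child concept to a single parent concept and reports a concept that lies on a circular definition. The function name `find_cycle` and the sample concepts are illustrative and do not come from the paper.

```python
def find_cycle(is_a):
    """Return a concept that lies on an is-a cycle, or None if the hierarchy is acyclic.

    `is_a` maps each child concept to its single parent concept (an assumed,
    simplified representation of a concept hierarchy).
    """
    for start in is_a:
        seen = set()
        node = start
        # Walk up the hierarchy until we leave it or revisit a concept.
        while node in is_a:
            if node in seen:
                return node
            seen.add(node)
            node = is_a[node]
    return None


# Hypothetical hierarchies used only to exercise the check.
consistent = {"transistor": "component", "component": "electronic equipment"}
inconsistent = {"a": "b", "b": "a"}

print(find_cycle(consistent))    # None
print(find_cycle(inconsistent))  # a
```

A natural-language description of the same hierarchy could not be checked this way, which is the sense in which formality supports reliability.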
3 Ontology-based semantic search engine
3.1 Ontology-based search engine design
The ontology provides a mechanism for human-machine communication, allowing machines to understand semantics, thus laying the foundation for search engines to improve efficiency.
The basic design of an ontology-based search engine is as follows:
(1) with the help of domain experts, establish the ontology of the relevant field;
(2) collect data from the information sources and, with reference to the established ontology, store the collected data in the metadata repository (relational database, knowledge base, etc.) in the specified format;
(3) for a query request obtained from the user search interface, the query converter converts the request into the specified format according to the ontology and, with the help of the ontology, matches a qualifying data set from the metadata repository;
(4) the retrieval result is customized and returned to the user.
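The four steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the ontology is reduced to a hypothetical mapping from each concept to a few surface terms (the terms are illustrative, not medically authoritative), the metadata repository is a dictionary, and all function names are invented for the example.

```python
# Step 1 (assumed): a tiny hand-built ontology mapping each concept to surface terms.
ONTOLOGY = {
    "leukemia": {"leukemia", "blood cancer"},
    "skin disease": {"skin disease", "dermatosis"},
}


def annotate(text):
    """Return every ontology concept mentioned in `text`."""
    lowered = text.lower()
    return {c for c, terms in ONTOLOGY.items() if any(t in lowered for t in terms)}


def build_metadata(docs):
    """Step 2: store the concept tags of each document (the metadata repository)."""
    return {doc_id: annotate(text) for doc_id, text in docs.items()}


def convert_query(query):
    """Step 3: convert a free-text query into ontology concepts."""
    return annotate(query)


def search(query, metadata):
    """Steps 3-4: return the ids of documents that share a concept with the query."""
    wanted = convert_query(query)
    return sorted(d for d, concepts in metadata.items() if concepts & wanted)


# Hypothetical documents used only for illustration.
DOCS = {
    "d1": "New treatment options for blood cancer",
    "d2": "Caring for dermatosis in children",
    "d3": "Weather report",
}
METADATA = build_metadata(DOCS)
print(search("leukemia", METADATA))  # ['d1']
```

Document d1 is returned even though it never contains the string "leukemia", which is the difference between matching on concepts and matching on strings.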
3.2 Knowledge Base
To implement an ontology-based semantic search engine, a knowledge base must be established. The knowledge base is the foundation of, and the key to, reasoning and knowledge accumulation in a semantic search engine, and the ontology is in turn the foundation of the knowledge base. Generally, an ontology provides a set of terms and concepts for describing a domain, and the knowledge base uses these terms to express facts about the domain. For example, a medical ontology may contain definitions of terms such as "leukemia" and "skin disease", but it does not contain the diagnosis of a specific patient, which is exactly what the knowledge base expresses. Suppose that Zhang San suffers from a skin disease, Li Si suffers from a skin disease and leukemia, and Wang Wu suffers from leukemia. Here, "skin disease" and "leukemia" belong to the ontology, while the individual cases (Zhang San, Li Si, and Wang Wu) and their descriptions are the content of the knowledge base.
There are several key points in the relationship between the ontology and the knowledge base:
★The ontology provides the basic structure on which a knowledge base is built;
★The ontology provides a set of concepts and terms for describing a domain and captures the essential conceptual structure of the domain;
★The knowledge base uses these terms to express true knowledge about the real world or a virtual world.
Therefore, the first step in building a knowledge base is to conduct effective ontology analysis in this field.
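The division of labor between ontology and knowledge base in the medical example can be sketched as follows. The data structures and function names are assumptions chosen for illustration: the ontology is reduced to a set of terms, and the knowledge base to a list of (person, disease) facts.

```python
# Ontology: the terms of the domain (taken from the example in the text).
ONTOLOGY_TERMS = {"skin disease", "leukemia"}

# Knowledge base: facts about individuals, expressed with ontology terms.
KNOWLEDGE_BASE = [
    ("Zhang San", "skin disease"),
    ("Li Si", "skin disease"),
    ("Li Si", "leukemia"),
    ("Wang Wu", "leukemia"),
]


def validate(facts, terms):
    """Check that every fact uses only terms defined in the ontology."""
    return all(disease in terms for _, disease in facts)


def patients_with(disease):
    """Return the individuals recorded as suffering from `disease`."""
    return [person for person, d in KNOWLEDGE_BASE if d == disease]


print(validate(KNOWLEDGE_BASE, ONTOLOGY_TERMS))  # True
print(patients_with("leukemia"))                 # ['Li Si', 'Wang Wu']
```

The `validate` step reflects why ontology analysis comes first: facts can only be recorded once the terms they rely on have been defined.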
3.3 Constructing an ontology
Ontology construction is the foundation of an ontology-based information retrieval system. It determines the system's performance, generality, and operating quality. Creating a correct, effective, and logical ontology is the key to building such a system.
(1) Ontology construction principles
Different people often build different ontologies for the same domain and the same things. Because an ontology should be a standardized description, unified construction principles need to be followed. The most widely cited guidelines are the five principles proposed by Gruber [3]:
Clarity: the ontology must effectively convey the intended meaning of the terms it defines. Definitions should be objective and independent of context. When a definition can be expressed in logical axioms, it should be formalized. Definitions should be as complete as possible, and all definitions should be documented in natural language.
Coherence: the ontology should be coherent, that is, it should support inferences that are consistent with its definitions. The axioms it defines should be consistent with the documentation written in natural language.
Extendibility: the ontology should provide a conceptual foundation for anticipated tasks. It should be possible to define new terms from existing concepts to meet special requirements without revising the existing definitions.
Minimal encoding bias: the description of a concept should not depend on a particular symbol-level representation, because actual systems may use different knowledge representation methods.
Minimal ontological commitment: the ontology should make as few commitments as possible, as long as they meet the specific knowledge-sharing needs. This can be ensured by specifying the weakest axioms and defining only the terms essential for communication.
(2) Ontology representation
Two ontology representation methods are currently in wide use: the traditional four-element representation and the newer six-tuple representation. The former is widely recognized internationally, but its form is too loose and hard to apply consistently. The latter has been welcomed by researchers in China because of its standardized definition and operability.
★Four-element representation
The basic idea of the four-element representation is that an ontology has four main elements: concepts, relations, instances, and axioms.
A concept represents a set of entities or things in a domain. Concepts fall into two categories: primitive concepts and defined concepts. Primitive concepts are those which only have necessary conditions (in terms of their properties) for membership of the class. Defined concepts are those whose description is both necessary and sufficient for a thing to be a member of the class. For example, "a square is a quadrilateral with four right angles" describes a primitive concept, because having four right angles is necessary but not sufficient (a rectangle also satisfies it). "A square is a quadrilateral with four right angles and four sides of equal length" describes a defined concept, because these conditions together are necessary and sufficient for a quadrilateral to be a square.
A relation describes the interaction between concepts and between the attributes of concepts.
An instance is a concrete individual that belongs to a concept. For example, Shandong University is an instance of the concept "university". Strictly speaking, an ontology should not include any instances, because it is meant to be a conceptualization of a specific domain. The combination of an ontology with its associated instances is what we now call a knowledge base.
Axioms are used to constrain the value ranges of classes and instances. They include many specific rules and constraints.
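The difference between a primitive and a defined concept in the square example can be shown with two predicates. This is an illustrative sketch only; the `Quadrilateral` class and both function names are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Quadrilateral:
    angles: tuple  # four interior angles, in degrees
    sides: tuple   # four side lengths


def has_four_right_angles(q):
    """Necessary condition for being a square (the primitive description)."""
    return all(a == 90 for a in q.angles)


def is_square(q):
    """Necessary and sufficient condition (the defined description)."""
    return has_four_right_angles(q) and len(set(q.sides)) == 1


rectangle = Quadrilateral(angles=(90, 90, 90, 90), sides=(2, 3, 2, 3))
square = Quadrilateral(angles=(90, 90, 90, 90), sides=(2, 2, 2, 2))

print(has_four_right_angles(rectangle), is_square(rectangle))  # True False
print(has_four_right_angles(square), is_square(square))        # True True
```

The rectangle passes the necessary condition without being a square, so only the defined description can be used to classify an arbitrary quadrilateral.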
★Six-tuple representation
The basic idea of this method is to represent an ontology as a six-tuple:
Ontology = {C, AC, R, AR, H, X}
C is the set of concepts. AC is a set of attribute sets, each of which corresponds to one concept. R is the set of relations. AR is a set of attribute sets, each of which corresponds to one relation in R. H is the hierarchical relationship between concepts, and X is the set of axioms.
To make this representation concrete, a family ontology is described below as an example.
Family_ontology = {C_family, AC_family, R_family, AR_family, H_family, X_family}, where
C_family = {father, mother, children}
AC_family = {AC_family(father), AC_family(mother), AC_family(children)}
AC_family(father) = {name, age, job, salary, ......}
AC_family(mother) = {name, age, job, salary, ......}
AC_family(children) = {name, age, sex, ......}
R_family = {takecareof(father, mother, children),
educate(father, mother, children),
help(children, mother), ......}
AR_family = {AR_family(takecareof), AR_family(educate), AR_family(help), ......}
AR_family(takecareof) = {feed, clothing, seedoctor, ......}
AR_family(educate) = {teach, exercise, ......}
......
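The six-tuple lends itself to a direct encoding as a data structure. The following is a minimal sketch under several assumptions: the class name, field types, and the `is_well_formed` check are illustrative; only the items actually listed in the family example are included (the elided entries are left out); H and X are left empty because the example does not specify them; and takecareof is assumed, like educate, to relate father and mother to children.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    C: set                                  # concepts
    AC: dict                                # concept -> set of attributes
    R: dict                                 # relation name -> participating concepts
    AR: dict                                # relation name -> set of relation attributes
    H: dict = field(default_factory=dict)   # concept hierarchy (not given in the example)
    X: set = field(default_factory=set)     # axioms (not given in the example)

    def is_well_formed(self):
        """Attribute sets and relations may only refer to declared concepts and relations."""
        return (
            set(self.AC) <= self.C
            and all(set(args) <= self.C for args in self.R.values())
            and set(self.AR) <= set(self.R)
        )


family = Ontology(
    C={"father", "mother", "children"},
    AC={
        "father": {"name", "age", "job", "salary"},
        "mother": {"name", "age", "job", "salary"},
        "children": {"name", "age", "sex"},
    },
    R={
        "takecareof": ("father", "mother", "children"),  # assumed argument list
        "educate": ("father", "mother", "children"),
        "help": ("children", "mother"),
    },
    AR={
        "takecareof": {"feed", "clothing", "seedoctor"},
        "educate": {"teach", "exercise"},
    },
)

print(family.is_well_formed())  # True
```

Keeping the six components as separate fields mirrors the operability claimed for this representation: each part can be validated or extended independently.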
(3) Lifecycle of ontology construction
Once the principles and methods of ontology construction are understood, the next step is to build the ontology. The construction process can be described with the concept of a lifecycle. Uschold and Gruninger proposed a methodological framework for ontology construction [4] (Figure 1). The framework consists of the following parts:
First, the purpose and scope of the ontology must be identified. The ontology is then built, a process that can be divided into three stages:
Ontology capture identifies the key concepts and relationships, gives them precise definitions, and determines other related terms. Ontology coding selects a suitable representation language to express the concepts and terms. Integration of existing ontologies reuses and modifies ontologies that already exist. This building phase is also an iterative process.
Finally, in the evaluation phase, the ontology, its software environment, and the related documentation are evaluated against the requirements specification and the competency questions.
3.4 Introduction to the ontology-based semantic search engine model (OntoSSE, Ontology-Based Semantic Search Engine)
OntoSSE is an ontology-based search engine that supports semantic search, knowledge search, and a certain amount of inference. The model assumes that the search engine works on existing web pages, which do not themselves contain semantic tags.
The system should also have the basic functions of a search engine, such as page traversal and retrieval, index creation, and page search algorithms; for these, the structure and implementation of popular search engines can serve as a reference.
The communication between the information base and the knowledge base is central to OntoSSE. The knowledge base is the core of intelligent search. Like a human brain, it needs a natural cycle of growth. The richness of the knowledge base determines the system's retrieval and question-answering capability. The information base is the space in which the knowledge base exists and develops, and the knowledge base is the result of judging, extracting, analyzing, and generalizing from the information base. The intelligent search engine uses the knowledge base to raise the user's question to the knowledge level, and then uses this knowledge to search the information base [5]. Semantic analysis and knowledge management are essential for combining the two. Ontology therefore serves as an important foundation for semantic analysis and for knowledge sharing and reuse. Together with the knowledge base and the information base, it forms the three pillars of OntoSSE.
The system structure and workflow of the ontology-based semantic search engine OntoSSE are shown in the following figure.
Figure: OntoSSE architecture and its workflow
The working principle and retrieval steps of the OntoSSE model can be summarized as follows:
(1) The search engine crawls web pages with an automatic web page collector (web spider), classifies the page information with reference to a specific word list, and adds it to the index library.
(2) A domain ontology or a general ontology is established manually, automatically, or semi-automatically.
(3) Documents are annotated with an ontology description language (such as DAML and RDF).
(4) Each annotated document (a set of RDF triples) is equivalent to an ontology instance and is stored in the knowledge base.
(5) The user enters a query request in natural language. The request may be a keyword or a question.
(6) The query filter (analyzer) performs semantic analysis on the user's query request and extracts the values of the relevant attributes.
(7) The retrieval agent performs logical reasoning based on the class and relation information embodied in the RDF triples and on the attribute values submitted by the query filter, and generates a query instance.
(8) The query instance is passed to the information base and searched for in the different directories, and the results are processed and returned to the user.
For example, suppose we want to search for "who is the president of Microsoft". After the question is entered into the model, the query filter interprets it semantically on the basis of word segmentation: the sentence actually means that "the value of an attribute named 'title' is 'President of Microsoft'". Through the ontology and the knowledge base, the system can infer that the class named "person" has a "title" attribute. During semantic reasoning, an instance of the person class is therefore generated with the attribute "title = President of Microsoft". The knowledge base shows that the name attribute of this instance is "Bill Gates", which gives us the answer. Finally, we can also retrieve other potentially relevant information about Bill Gates from the information base and the knowledge base.
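The question-answering example can be sketched with a small in-memory triple store. Everything here is a simplifying assumption rather than the OntoSSE implementation: the knowledge base is a hypothetical list of RDF-style (subject, predicate, object) tuples, the second person is invented filler data, the query filter is reduced to a fixed "who is the ..." pattern instead of real word segmentation, and the function names are illustrative.

```python
# Hypothetical knowledge base of RDF-style (subject, predicate, object) triples.
TRIPLES = [
    ("person_1", "type", "Person"),
    ("person_1", "name", "Bill Gates"),
    ("person_1", "title", "President of Microsoft"),
    ("person_2", "type", "Person"),
    ("person_2", "name", "Zhang San"),   # invented filler instance
    ("person_2", "title", "Engineer"),
]


def analyze(question):
    """Toy query filter: map 'who is the X' to (class, attribute, value)."""
    prefix = "who is the "
    text = question.strip().rstrip("?").lower()
    if not text.startswith(prefix):
        return None
    return ("Person", "title", text[len(prefix):])


def answer(question):
    """Toy retrieval agent: find matching instances and return their names."""
    parsed = analyze(question)
    if parsed is None:
        return []
    cls, attribute, value = parsed
    # Instances of the requested class.
    subjects = {s for s, p, o in TRIPLES if p == "type" and o == cls}
    # Instances whose attribute has the requested value.
    matches = {
        s for s, p, o in TRIPLES
        if s in subjects and p == attribute and o.lower() == value
    }
    return sorted(o for s, p, o in TRIPLES if s in matches and p == "name")


print(answer("Who is the president of Microsoft?"))  # ['Bill Gates']
```

The answer is obtained from the attribute values of an instance rather than from a page that happens to contain the query words, which is the behavior the example in the text describes.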
It can be seen that the OntoSSE model improves the search engine in three respects: result relevance, semantic reasoning, and knowledge retrieval.
4 Conclusion
By using ontologies to support semantics and communication between humans and machines, machine intelligence can be realized, which brings new opportunities for the development of the Web. Applying ontologies to search engines will greatly improve their ease of use and efficiency, so that web users can navigate the vast ocean of information freely.
[References]
1 China Internet Network Information Center. 14th China Internet Network Development Statistics Report [R]. 2004/7. http://www.cnnic.net.cn/
2 Ontologies: description and applications. http://wiki.w3china.org/wiki/index.php
3 Gruber T. Toward principles for the design of ontologies used for knowledge sharing [J]. International Journal of Human-Computer Studies, 1995, 43 (5/6): 907-928
4 Uschold M. Building ontologies: Towards a unified methodology [J]. In: Expert Systems 96, 1996 (3)
5 Wu Dan. Research on the intelligence of search engines [J]. Intelligence Theory and Practice, 2002 (4)