Introduction to Knowledge Graphs (Part 8): Semantic Search

Last Update: 2018-07-26 Source: Internet

Author: User



Welcome to my blog at http://pelhans.com/, where all articles are published first.

This section gives a brief introduction to semantic search and then covers semantic data search and hybrid search. My understanding of this part is still shallow, and it will be supplemented later.

Introduction to Semantic Search

What is semantic search? In the words of Tim Berners-Lee, the father of the World Wide Web: "The essence of semantic search is to use mathematics to get rid of the guesswork and approximation used in today's search, and to introduce a clear understanding of the meaning of words and of how they relate to what we type into a search engine's input box."

Technically, search paradigms differ along three dimensions: how user needs are represented (the query model), how the underlying data is represented (the data model), and how the two are matched (the matching technique).

Traditional search is document-based (document retrieval). Information retrieval (IR) systems retrieve documents using lightweight syntactic models to represent both the user's information need and the resource content, for example Boolean queries combining terms with AND and OR. The dominant keyword model today is the bag-of-words model. It works well for topical search but not for more complex information needs.
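To make the bag-of-words model concrete, here is a minimal Python sketch of Boolean retrieval over an inverted index. The function names and toy documents are hypothetical, and real IR engines add tokenization, ranking, and compression. Each document is reduced to a set of words, so word order and relationships between words are lost.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids containing it (bag of words)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Documents that contain every query term."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *terms):
    """Documents that contain at least one query term."""
    return set().union(*(index.get(t, set()) for t in terms))

docs = {
    1: "american war film",
    2: "french romance film",
    3: "war history book",
}
index = build_inverted_index(docs)
print(sorted(boolean_and(index, "war", "film")))     # [1]
print(sorted(boolean_or(index, "romance", "book")))  # [2, 3]
```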

Databases (DB) and knowledge-based expert systems can return more precise answers (data retrieval). They use more expressive models to represent user needs, exploit the intrinsic structure of the data and the semantic associations within it, support complex queries, and return specific answers that exactly match the query.
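The contrast with keyword search shows up in a minimal sketch of data retrieval: a conjunctive query over a toy set of triples, evaluated as a chain of joins, returns exact answers rather than ranked documents. The triples and the `?variable` convention (loosely modeled on SPARQL) are illustrative assumptions; real DB/KB systems use indexes and query optimizers rather than this nested-loop scan.

```python
# Toy knowledge base of (subject, predicate, object) triples (hypothetical data).
TRIPLES = {
    ("Saving_Private_Ryan", "type", "AmericanFilm"),
    ("Saving_Private_Ryan", "director", "Spielberg"),
    ("Jaws", "type", "AmericanFilm"),
    ("Jaws", "director", "Spielberg"),
    ("Spielberg", "bornIn", "Cincinnati"),
    ("Amelie", "type", "FrenchFilm"),
}

def match(pattern, binding):
    """Yield extended bindings for one triple pattern; names starting with '?' are variables."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for p, v in zip(pattern, triple):
            if p.startswith("?"):
                # A variable must agree with any value it is already bound to.
                if new.setdefault(p, v) != v:
                    ok = False
                    break
            elif p != v:
                ok = False
                break
        if ok:
            yield new

def query(patterns):
    """Evaluate a conjunction of triple patterns as a sequence of joins."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings

# "American films whose director was born in Cincinnati"
results = query([
    ("?film", "type", "AmericanFilm"),
    ("?film", "director", "?d"),
    ("?d", "bornIn", "Cincinnati"),
])
print(sorted(r["?film"] for r in results))  # ['Jaws', 'Saving_Private_Ryan']
```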

Semantic search systems can be divided into two categories:

DB and KB systems are heavyweight semantic search systems. They rely on explicit, formal semantic models such as ER diagrams or the knowledge models of RDF(S) and OWL, and they are primarily semantic data retrieval systems.

Semantics-based IR systems are lightweight semantic search systems. They adopt lightweight semantic models such as taxonomies or thesauri, and semantic data (RDF) is embedded in documents or associated with them. They are semantics-based document retrieval systems.
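One simple way a lightweight semantic model can be used is query expansion over a taxonomy: a query for a broad concept also matches documents annotated with narrower concepts. The sketch below assumes documents are already annotated with taxonomy terms. The taxonomy, the annotations, and the function names are hypothetical, and real systems differ in how they embed or link semantic data.

```python
# Hypothetical lightweight semantic model: broader term -> narrower terms.
TAXONOMY = {
    "film": ["war film", "romance film"],
    "war film": ["wwii film"],
}

# Documents annotated with taxonomy terms (toy data).
DOC_TAGS = {
    "d1": {"wwii film"},
    "d2": {"romance film"},
    "d3": {"novel"},
}

def expand(term):
    """Return the term plus every narrower term below it in the taxonomy."""
    result, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(TAXONOMY.get(t, []))
    return result

def semantic_search(term):
    """Match documents whose annotations fall under the expanded query term."""
    terms = expand(term)
    return sorted(d for d, tags in DOC_TAGS.items() if tags & terms)

print(semantic_search("film"))      # ['d1', 'd2']
print(semantic_search("war film"))  # ['d1']
```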

As structured and semantic data become increasingly available, data web search and document web search are gradually converging.

For data web search, more scalable methods traditionally used in IR are adopted to deal with the quality problems of Web data and with data elements that carry long textual descriptions.

For document web search, database and semantic search techniques are applied to IR systems so that the growing amount of highly structured, expressive data can be incorporated into the search process.

The flowchart for semantic search is shown in the following diagram:

Semantic Data Search

Semantic data search faces the following difficulties:

Scalability: to make effective use of Linked Data, semantic data search needs an infrastructure that scales to large and growing amounts of interlinked data.

Heterogeneity: data sources are heterogeneous, so queries must span multiple data sources and their results must be merged.

Uncertainty: the representation of user needs is incomplete.

Below are some representative approaches to semantic data search based on triple stores, together with the principles behind them.

IR-based (Sindice, Falcons): a single data structure and query algorithm, optimized for ranking and retrieving text data. The data is highly compressible and efficiently accessible, and ranking is built in. However, these systems cannot handle even simple selections, joins, and similar operations.

DB-based (Oracle's RDF extension, IBM DB2 SOR): multiple index structures and query algorithms accommodate a wide variety of complex queries over structured data. The advantages are support for complex selections, joins, ... (SQL, SPARQL) and for highly dynamic scenarios (many insertions/deletions). The disadvantages are the space overhead and restricted access patterns caused by the use of B+ trees; in addition, the results read from the leaf nodes are not integrated with result ranking.

Native stores (Dataplore, YARS, RDF-3X): the data is highly compressible and efficiently accessible; they offer IR-like ranked retrieval as well as DB-like selections and joins; terabytes of data can be queried on a single machine with sub-second response times; and highly dynamic workloads are supported. The disadvantage is the lack of transactions, recovery, and similar features.

Storage and Indexing (Semplore, the predecessor of Dataplore)

This approach reuses an IR index to index semantic data. The core idea is to convert RDF into virtual documents made up of fields and terms. An IR index is built on the following concepts: document fields, such as title, abstract, body, author, ...; terms; and posting lists and position lists.
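Here is a minimal sketch of the "RDF as virtual documents" idea. Each subject becomes a virtual document, `rdf:type` objects become terms in a `type` field, and the words of literals become terms in a `text` field. An inverted index then maps each (field, term) pair to a posting list of document ids, each with a position list. The field names, the quote-based literal check, and ignoring all other object properties are simplifying assumptions, not necessarily the exact layout used by the systems above.

```python
from collections import defaultdict

# Toy RDF triples; literals are marked by surrounding double quotes (an assumption).
TRIPLES = [
    ("ex:SavingPrivateRyan", "rdf:type", "ex:AmericanFilm"),
    ("ex:SavingPrivateRyan", "ex:abstract", '"a world war two film"'),
    ("ex:Jaws", "rdf:type", "ex:AmericanFilm"),
    ("ex:Jaws", "ex:abstract", '"a shark thriller"'),
]

def to_virtual_documents(triples):
    """Group triples by subject: one virtual document per entity, split into fields."""
    docs = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p == "rdf:type":
            docs[s]["type"].append(o)                      # class names as terms
        elif o.startswith('"'):
            docs[s]["text"].extend(o.strip('"').split())   # literal words as terms
        # Other object properties are ignored in this sketch.
    return docs

def build_index(docs):
    """Inverted index: (field, term) -> posting list of (doc_id, position list)."""
    # Assigning ids in sorted order keeps every posting list in ascending id order.
    ids = {entity: i for i, entity in enumerate(sorted(docs))}
    index = defaultdict(list)
    for entity in sorted(docs):
        for field, terms in docs[entity].items():
            positions = defaultdict(list)
            for pos, term in enumerate(terms):
                positions[term].append(pos)
            for term, plist in positions.items():
                index[(field, term)].append((ids[entity], plist))
    return ids, index

ids, index = build_index(to_virtual_documents(TRIPLES))
print(index[("type", "ex:AmericanFilm")])  # [(0, [0]), (1, [0])]
print(index[("text", "war")])              # [(1, [2])]
```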

The following example illustrates this terminology:

When new elements are inserted, the index cannot be rebuilt from scratch, so a delta (incremental) index is required. Incremental updates must traverse the posting lists, which is time-consuming, so posting lists are split into blocks; however, locating the blocks then requires many more random accesses, and making that lookup faster costs additional space. Update speed, search speed, and index size must therefore be traded off against one another.

Ranking and Indexing

With the index built and stored, we now need to query it. Retrieval must support four basic operations:

Basic retrieval: $B(f, t)$ returns the posting list of term $t$ in field $f$.

Merge: $M(S_1, op, S_2)$ combines two result sets with a set operation $op$, such as $\cap$ or $\cup$.

Concept expression evaluation: $\lambda(c)$ evaluates a concept expression $c$, for example:

$$\lambda(\mathit{AmericanFilm} \cap \text{"war"}) = M\big(B(\mathit{type}, \mathit{AmericanFilm}),\ \cap,\ B(\mathit{text}, \text{"war"})\big)$$

Relation expansion: $\bowtie(S_1, R, S_2) = \{\, y \mid \exists x : x \in S_1 \wedge (x, R, y) \wedge y \in S_2 \,\}$. This is the operation that most needs to be optimized.
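The semantics of the four operations can be prototyped with plain Python sets. In this sketch the postings and relation triples are hypothetical toy data. `lam` stands in for $\lambda$ because `lambda` is a Python keyword, and the operator strings `"&"` and `"|"` stand for $\cap$ and $\cup$. It ignores efficiency and ranking entirely. Note that `join` scans every relation triple, which illustrates why relation expansion is the costly step.

```python
# Hypothetical toy data: posting sets keyed by (field, term), plus relation triples.
POSTINGS = {
    ("type", "AmericanFilm"): {1, 2, 3},
    ("type", "Director"): {10, 11},
    ("text", "war"): {2, 3, 7},
}
RELATIONS = {(2, "directedBy", 10), (3, "directedBy", 11), (1, "directedBy", 11)}

def B(f, t):
    """Basic retrieval: ids of entities whose field f contains term t."""
    return set(POSTINGS.get((f, t), set()))

def M(s1, op, s2):
    """Merge two result sets with a set operation ('&' = intersection, '|' = union)."""
    if op == "&":
        return s1 & s2
    if op == "|":
        return s1 | s2
    raise ValueError(f"unknown op {op!r}")

def lam(c):
    """Evaluate a concept expression: a (field, term) leaf or a (left, op, right) tuple."""
    if len(c) == 2:
        return B(*c)
    left, op, right = c
    return M(lam(left), op, lam(right))

def join(s1, r, s2):
    """Relation expansion: {y | exists x in s1 with (x, r, y) and y in s2}."""
    return {y for (x, rel, y) in RELATIONS if rel == r and x in s1 and y in s2}

war_films = lam((("type", "AmericanFilm"), "&", ("text", "war")))
print(sorted(war_films))                                             # [2, 3]
print(sorted(join(war_films, "directedBy", B("type", "Director"))))  # [10, 11]
```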

How are complex queries evaluated? The following figure shows an example:

Roughly, evaluation proceeds from x0 to x1; x1 returns its results to x0; the search then continues from those results to x2, and the final results are returned to x0. The query graph is thus traversed depth-first.
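Below is a minimal sketch of depth-first evaluation. It assumes a tree-shaped query graph in which each node carries (field, term) constraints and each edge carries a relation name. The data, the dict-based query format, and `back_expand` are illustrative assumptions rather than the exact algorithm behind the figure. `back_expand` maps a child's results back onto its parent, that is, relation expansion in the inverse direction.

```python
# Hypothetical toy data (self-contained; not taken from the figure).
POSTINGS = {
    ("type", "Film"): {1, 2, 3},
    ("type", "Director"): {10, 11},
    ("type", "Actor"): {20, 21},
    ("text", "Spielberg"): {10},
}
RELATIONS = {
    (1, "directedBy", 10), (2, "directedBy", 10), (3, "directedBy", 11),
    (1, "starring", 20), (2, "starring", 21), (3, "starring", 20),
}

def B(f, t):
    """Basic retrieval over the toy postings."""
    return set(POSTINGS.get((f, t), set()))

def back_expand(s_parent, r, s_child):
    """Keep parents x that have some (x, r, y) with y among the child's results."""
    return {x for (x, rel, y) in RELATIONS if rel == r and x in s_parent and y in s_child}

def evaluate(node):
    """Depth-first evaluation of a tree-shaped query rooted at `node`.

    Every node must have at least one (field, term) constraint.
    """
    candidates = set.intersection(*(B(f, t) for f, t in node["constraints"]))
    for relation, child in node.get("edges", []):
        child_results = evaluate(child)                                # descend to x1, x2, ...
        candidates = back_expand(candidates, relation, child_results)  # return results to the parent
    return candidates

# x0: films directed by x1 (a director matching "Spielberg") and starring x2 (an actor).
query = {
    "constraints": [("type", "Film")],
    "edges": [
        ("directedBy", {"constraints": [("type", "Director"), ("text", "Spielberg")]}),
        ("starring", {"constraints": [("type", "Actor")]}),
    ],
}
print(sorted(evaluate(query)))  # [1, 2]
```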

Query results also need to be ranked, following two principles:

Quality propagation: an element's score can be regarded as a measure of its quality, and quality is propagated by updating an element's score with the scores of its adjacent elements.

Quantity aggregation: besides quality, the number of neighbors is also taken into account, so an element with more neighbors is ranked higher.
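As one simple, hypothetical instantiation of these two principles (not the published scoring formula), relation expansion can carry scores along. Each target y sums the scores of its matching neighbors (quality propagation), and because the scores are summed, more neighbors yield a higher total (quantity aggregation). Multiplying by y's own score keeps y's local relevance. All names and numbers here are illustrative.

```python
from collections import defaultdict

def ranked_expand(scored_s1, r, scored_s2, relations):
    """Relation expansion with score propagation (hypothetical scoring scheme).

    scored_s1, scored_s2: dicts mapping element id -> score.
    Each y in s2 accumulates the scores of its neighbors x in s1 along relation r.
    """
    incoming = defaultdict(float)
    for x, rel, y in relations:
        if rel == r and x in scored_s1 and y in scored_s2:
            incoming[y] += scored_s1[x]   # quality of neighbors flows to y; the sum rewards quantity
    return {y: scored_s2[y] * total for y, total in incoming.items()}

relations = {("f1", "directedBy", "d1"), ("f2", "directedBy", "d1"), ("f3", "directedBy", "d2")}
films = {"f1": 0.5, "f2": 0.5, "f3": 0.9}
directors = {"d1": 1.0, "d2": 1.0}
scores = ranked_expand(films, "directedBy", directors, relations)
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # [('d1', 1.0), ('d2', 0.9)]
```

Here d1, with two moderately scored neighbors, outranks d2, which has a single higher-scored neighbor. Whether that is the desired trade-off depends on how scores are normalized, which this sketch does not address.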


How can ranking be tightly integrated into the basic operations? The operations are reformulated over ascending integer streams (AIS):

Basic retrieval: given field $f$ and term $t$, $B(f, t)$ retrieves the posting list from the inverted index and outputs it as an ascending integer stream (AIS).

Merge: given two AISs $S_1$ and $S_2$, $M(S_1, \cup, S_2)$ or $M(S_1, \cap, S_2)$ computes their union or intersection.

Relation expansion: given relation $R$ and an AIS $S$, compute the set
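A minimal sketch shows why ascending integer streams are convenient. Because every operand is sorted, a merge reduces to a single linear two-pointer pass over generators, and the output is again an AIS that can feed the next operation. The function names and the dict-of-lists posting store are hypothetical. The sketch covers basic retrieval and merge only; relation expansion and scores are left out.

```python
from typing import Iterator

def B(postings: dict, f: str, t: str) -> Iterator[int]:
    """Basic retrieval: stream a posting list as ascending integer ids (an AIS)."""
    yield from sorted(postings.get((f, t), []))  # sorting guarantees ascending order

def intersect(s1: Iterator[int], s2: Iterator[int]) -> Iterator[int]:
    """M(S1, intersection, S2): two-pointer merge; the output is ascending."""
    a, b = next(s1, None), next(s2, None)
    while a is not None and b is not None:
        if a < b:
            a = next(s1, None)
        elif b < a:
            b = next(s2, None)
        else:
            yield a
            a, b = next(s1, None), next(s2, None)

def union(s1: Iterator[int], s2: Iterator[int]) -> Iterator[int]:
    """M(S1, union, S2): merge two ascending streams, emitting shared ids once."""
    a, b = next(s1, None), next(s2, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a < b):
            yield a
            a = next(s1, None)
        elif a is None or b < a:
            yield b
            b = next(s2, None)
        else:
            yield a
            a, b = next(s1, None), next(s2, None)

postings = {("type", "AmericanFilm"): [1, 2, 3, 5], ("text", "war"): [2, 3, 7]}
print(list(intersect(B(postings, "type", "AmericanFilm"), B(postings, "text", "war"))))  # [2, 3]
print(list(union(B(postings, "type", "AmericanFilm"), B(postings, "text", "war"))))      # [1, 2, 3, 5, 7]
```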

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
